Saturday, October 1, 2022

 

This article discusses using the checklist for architecting and building multitenant solutions. Administrators will find much of this list familiar.

 

The checklist is structured around business and technical considerations as well as the five pillars of the Azure Well-Architected Framework: 1) Reliability, 2) Security, 3) Cost Optimization, 4) Operational Excellence, and 5) Performance Efficiency. The elements that support these pillars are the Azure Well-Architected Review, Azure Advisor, documentation, patterns, support and service offers, reference architectures, and design principles. Among these, cost optimization is one of the primary benefits of using the right tool for the right solution. It helps to analyze spend over time as well as the effects of scaling out and scaling up. Azure Advisor can help improve reusability, on-demand scaling, and reduced data duplication, among others. Performance is usually driven by external factors and is closely tied to customer satisfaction. Continuous telemetry and responsiveness are essential to tuning performance. The shared environment controls for management and monitoring create alerts, dashboards, and notifications specific to the performance of the workload. Performance considerations include storage and compute abstractions, dynamic scaling, partitioning, storage pruning, enhanced drivers, and multilayer caching.
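To make the multilayer-caching consideration concrete, here is a minimal sketch of a read-through cache with a fast in-memory layer backed by a slower shared layer. All names are illustrative assumptions; in a real deployment the second layer would typically be a distributed cache such as Azure Cache for Redis.

```python
# Minimal sketch of a multilayer (L1 in-memory, L2 shared) read-through cache.
# Purely illustrative; both layers are plain dicts here.

class MultilayerCache:
    def __init__(self, backing_store):
        self.l1 = {}                      # per-instance memory cache
        self.l2 = {}                      # stands in for a shared cache tier
        self.backing_store = backing_store

    def get(self, key):
        if key in self.l1:                # fastest path
            return self.l1[key]
        if key in self.l2:                # shared tier; promote to L1
            self.l1[key] = self.l2[key]
            return self.l1[key]
        value = self.backing_store(key)   # slowest path: the database
        self.l2[key] = value
        self.l1[key] = value
        return value

cache = MultilayerCache(backing_store=lambda k: f"row-for-{k}")
print(cache.get("tenant-42"))   # miss on both layers, reads the store
print(cache.get("tenant-42"))   # served from L1
```

The point of the layering is that repeated reads for a hot tenant never reach the database, which is what makes dynamic scaling of the compute tier cheaper.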

Operational excellence comes with security and reliability. Security and data management must be built into the system at every layer, for every application and workload. The data management and analytics scenario focuses on establishing a foundation for security. Although workload-specific solutions might be required, the foundation for security is built with Azure landing zones and managed independently from the workload. Confidentiality and integrity of data, including privilege management, data privacy, and appropriate controls, must be ensured. Network isolation and end-to-end encryption must be implemented. SSO, MFA, conditional access, and managed service identities secure authentication. Separation of concerns between the Azure control plane and data plane, along with role-based access control (RBAC), must be enforced.

The checklist of business considerations includes: 1) understanding what kind of solution is being created, such as business-to-business, business-to-consumer, or enterprise software; 2) defining the tenants in terms of number and growth plans; 3) defining the pricing model and ensuring it aligns with the tenants' consumption of Azure resources; 4) understanding whether the tenants need to be separated into different tiers and, based on the customers' requirements, deciding on the tenancy model; and finally, promoting the multitenant solution in the commercial marketplace.

The technical considerations emphasize design and service-level objectives, as well as the scale of the solution. The checklist also suggests applying chaos engineering to test the reliability of the solution. The security considerations involve the Zero Trust and least-privilege principles.

  

 

 

Friday, September 30, 2022

 

Recovery and Replication of data in Multitenant Applications:

 

This is a continuation of the detailed article on recovery options. In the next section, we discuss replication options.

 

The example used here is a full disaster recovery scenario for a multitenant SaaS application implemented with the database per tenant model. The concepts introduced here are geo-restore and geo-replication.

 

Geo-replication creates a continuously synchronized, readable secondary database for a primary database. It is preferable, although not required, to have the secondary database in a different region. Since this kind of secondary database is read-only, it is called a geo-replica. This option enables quick recovery of individual databases in case of a regional disaster or a large-scale outage. Once geo-replication is set up, a geo-failover helps maintain business continuity.

 

There can be at most four geo-secondaries for a single primary. Multiple geo-secondaries provide tactical redundancy. Additional secondaries can also be used to scale out read-only workloads. If there is only one secondary and it fails, the application is exposed to higher risk until a new secondary is created.

Each geo-secondary can be a single database or a database in an elastic pool. The elastic pool choice for each geo-secondary database is separate and does not depend on the configuration of any other replica. Each elastic pool is contained within a single logical server, and database names must be unique within a pool, so multiple geo-secondaries of the same primary cannot share a pool.
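The two constraints above can be sketched as a simple validation function. This is an illustrative model only, not an Azure API; the tuple shapes and names are assumptions made for the example.

```python
# Illustrative check of the two constraints described above: at most four
# geo-secondaries per primary, and no two geo-secondaries of the same
# database in the same elastic pool (database names are unique in a pool).

MAX_GEO_SECONDARIES = 4

def can_add_geo_secondary(db_name, existing_secondaries, target_pool):
    """existing_secondaries: list of (db_name, pool) tuples for this primary."""
    if len(existing_secondaries) >= MAX_GEO_SECONDARIES:
        return False, "primary already has four geo-secondaries"
    if any(pool == target_pool for _, pool in existing_secondaries):
        return False, "a geo-secondary of this database already uses that pool"
    return True, "ok"

secondaries = [("tenantdb", "pool-eastus2")]
print(can_add_geo_secondary("tenantdb", secondaries, "pool-westus"))   # allowed
print(can_add_geo_secondary("tenantdb", secondaries, "pool-eastus2"))  # rejected
```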

 

A geo-secondary that has finished initial seeding can be failed over on demand by the user. If the primary is unavailable, only an unplanned geo-failover can be used. The geo-secondary becomes the new primary, and once the outage is mitigated, the system makes the recovered primary a geo-secondary. All secondaries are automatically linked to the new primary, and replication relationships are reconfigured. After the outage that caused the geo-failover is mitigated, it may be desirable to return the primary to its original region.
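The role switch and relinking can be modeled as below. This is a sketch of the relationship reconfiguration the service performs automatically; region names are illustrative assumptions.

```python
# Sketch of the role switch after a geo-failover: the chosen secondary becomes
# the new primary, and the remaining replicas (including the old primary, once
# recovered) are relinked to it.

def geo_failover(topology, new_primary):
    old_primary = topology["primary"]
    secondaries = set(topology["secondaries"])
    assert new_primary in secondaries, "failover target must be a geo-secondary"
    secondaries.discard(new_primary)
    secondaries.add(old_primary)   # recovered primary rejoins as a secondary
    return {"primary": new_primary, "secondaries": sorted(secondaries)}

topology = {"primary": "eastus", "secondaries": ["westus", "northeurope"]}
print(geo_failover(topology, "westus"))
# {'primary': 'westus', 'secondaries': ['eastus', 'northeurope']}
```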

Preparing for a geo-failover involves validating that authentication and network access for the secondary server are properly configured, and that the backup retention policy on the secondary database matches that of the primary. This setting is not part of the database and is not replicated from the primary. By default, a geo-secondary has a point-in-time restore (PITR) retention period of 7 days.
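Because the retention setting is not replicated, a pre-failover check along these lines can catch a mismatch. The function and its parameters are illustrative; only the 7-day default comes from the text above.

```python
# Pre-failover validation sketch: the PITR retention setting is per-database
# and is not replicated, so it must be checked on the secondary explicitly.

DEFAULT_PITR_RETENTION_DAYS = 7

def retention_mismatch(primary_retention_days, secondary_retention_days=None):
    """Return True if the secondary's retention needs to be adjusted."""
    if secondary_retention_days is None:   # secondary still on the default
        secondary_retention_days = DEFAULT_PITR_RETENTION_DAYS
    return secondary_retention_days != primary_retention_days

print(retention_mismatch(35))   # True: primary keeps 35 days, secondary 7
print(retention_mismatch(7))    # False: both at the 7-day default
```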

 



Wednesday, September 28, 2022

 

Recovery and Replication of data in Multitenant Applications:

This is a continuation of the detailed article on recovery options. In the next section, we discuss replication options.

The example used here is a full disaster recovery scenario for a multitenant SaaS application implemented with the database per tenant model. The concepts introduced here are geo-restore and geo-replication.

Geo-replication creates a continuously synchronized, readable secondary database for a primary database. It is preferable, although not required, to have the secondary database in a different region. Since this kind of secondary database is read-only, it is called a geo-replica. This option enables quick recovery of individual databases in case of a regional disaster or a large-scale outage. Once geo-replication is set up, a geo-failover helps maintain business continuity.

Geo-replication can also be used for database migration with minimal downtime, and for application upgrades by creating an extra secondary as a fail-back copy. An end-to-end recovery requires recovery of all components and dependent services: all components must be resilient to the same failures and must become available within the recovery time objective of the application. Designing cloud solutions for disaster recovery includes scenarios that use two Azure regions for business continuity with minimal downtime, use regions with maximum data preservation, or replicate an application to different geographies to follow demand.

Geo-secondaries use automatic asynchronous replication: transactions are committed on the primary database before they are replicated. When the geo-secondary is created, the replica is populated with the data of the primary database, a process known as seeding. After the geo-secondary has been created and seeded, updates to the primary database are automatically and asynchronously replicated to the replica.

An application can access the geo-secondary replica to execute read-only queries. The security principal used to access the geo-secondary can be the same as the one used for the primary, or different.

A planned geo-failover switches the roles of the primary and geo-secondary databases after completing full data synchronization, so it does not result in data loss. Since the replication is based on well-known log-shipping techniques, its duration depends on the size of the log at the origin. This kind of failover is applicable to performing disaster recovery drills, relocating the database to a different region, and returning the database to the primary region after an outage has been mitigated.

On the other hand, an unplanned geo-failover immediately switches the geo-secondary to the primary role without any synchronization with the primary. Any transactions committed on the primary but not yet replicated to the secondary are lost. An unplanned geo-failover should be used only when the primary is unavailable.
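The contrast between the two failover modes can be sketched in terms of replicated log positions. The LSN values are made-up illustrations; only the synchronize-first vs. promote-immediately distinction comes from the text above.

```python
# Contrast of the two failover modes, modeled on log positions. A planned
# failover completes synchronization first, so nothing is lost; an unplanned
# failover promotes the secondary immediately, losing any transactions not
# yet replicated.

def planned_failover(primary_lsn, secondary_lsn):
    # Full data synchronization happens first, so the secondary catches up.
    return {"new_primary_lsn": primary_lsn, "lost_transactions": 0}

def unplanned_failover(primary_lsn, secondary_lsn):
    # The secondary is promoted as-is; the replication gap is lost.
    return {"new_primary_lsn": secondary_lsn,
            "lost_transactions": primary_lsn - secondary_lsn}

print(planned_failover(primary_lsn=1000, secondary_lsn=940))
print(unplanned_failover(primary_lsn=1000, secondary_lsn=940))  # 60 lost
```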

There can be at most four geo-secondaries for a single primary. Multiple geo-secondaries provide tactical redundancy. Additional secondaries can also be used to scale out read-only workloads.


Tuesday, September 27, 2022

 

Recovery and Replication of data in Multitenant Applications:

This is a continuation of the detailed article on Azure Monitor and Azure Monitor Logs. These are different services, and Logs are incredibly useful for troubleshooting and notifications. In the next section, we discuss recovery options.

The example used here is a full disaster recovery scenario for a multitenant SaaS application implemented with the database per tenant model. The concepts introduced here are geo-restore and geo-replication.

A geo-restore can be used to recover the catalog and tenant databases from automatically maintained geo-redundant backups into an alternate recovery region. After the outage is resolved, geo-replication can be used to repatriate the changed databases to their original region.

A database can be restored to an earlier point in time within its retention period. This works for any service tier or compute size of the restored database. If the database is restored into an elastic pool, there must be sufficient resources in the pool to accommodate it. There is no charge incurred during the restoration, and the restored database is charged at normal rates thereafter.
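Validating a restore request against the retention window can be sketched as below. The dates and the 7-day window are illustrative assumptions.

```python
# Sketch of validating a point-in-time restore request: the restore point
# must fall within the retention period, regardless of tier or compute size.

from datetime import datetime, timedelta

def can_pitr(restore_point, now, retention_days=7):
    earliest = now - timedelta(days=retention_days)
    return earliest <= restore_point <= now

now = datetime(2022, 9, 27, 12, 0)
print(can_pitr(now - timedelta(days=3), now))    # True: inside the window
print(can_pitr(now - timedelta(days=10), now))   # False: beyond retention
```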

A point-in-time restore does not support cross-server restoration, and it cannot restore a geo-secondary database. Hyperscale databases are not subject to a backup frequency and must be restored on demand. A restored database can be used to replace the original database by renaming it. If the database is restored only for its data, a recovery script must extract and apply that data to the original database.

A restore operation on a long-term backup can be performed from the logical server via the user interface, command line, programmability interfaces, or scripts. It is not applicable to Hyperscale databases.

A deleted database can be restored to its deletion time or to an earlier point in time, on the same server. A geo-restore can perform a cross-server, cross-region restore from the most recent backups. It is typically used when the database or the entire region is inaccessible.

There is usually a delay between when a backup is taken and when it is geo-replicated, so the restored database can be up to one hour behind the original database. Geo-restore relies on automatically created geo-replicated backups with a recovery point objective (RPO) of up to 1 hour and an estimated recovery time objective (RTO) of up to 12 hours. It does not guarantee that the target region will have the capacity to restore the database after a regional outage, because a sharp increase in demand is likely. Therefore, it is mostly used for small databases. Business continuity for larger databases is ensured via auto-failover groups, which have a much lower RPO and RTO, and whose capacity is guaranteed.
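The RPO bound above can be illustrated by computing the worst-case data loss from the age of the newest geo-replicated backup. Timestamps are made up for the example.

```python
# Illustrative check of geo-restore staleness: data written after the last
# geo-replicated backup is lost, bounded here by the up-to-1-hour RPO.

from datetime import datetime, timedelta

GEO_RESTORE_RPO = timedelta(hours=1)

def worst_case_data_loss(last_geo_backup, outage_time):
    loss = outage_time - last_geo_backup
    return min(loss, GEO_RESTORE_RPO) if loss > timedelta(0) else timedelta(0)

outage = datetime(2022, 9, 27, 9, 0)
print(worst_case_data_loss(datetime(2022, 9, 27, 8, 20), outage))  # 0:40:00
```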

Geo-replication creates a continuously synchronized, readable secondary database for a primary database. It is preferable, although not required, to have the secondary database in a different region. Since this kind of secondary database is read-only, it is called a geo-replica. This option enables quick recovery of individual databases in case of a regional disaster or a large-scale outage. Once geo-replication is set up, a geo-failover helps maintain business continuity.

Geo-replication can also be used for database migration with minimal downtime, and for application upgrades by creating an extra secondary as a fail-back copy. An end-to-end recovery requires recovery of all components and dependent services: all components must be resilient to the same failures and must become available within the recovery time objective of the application. Designing cloud solutions for disaster recovery includes scenarios that use two Azure regions for business continuity with minimal downtime, use regions with maximum data preservation, or replicate an application to different geographies to follow demand.


Monday, September 26, 2022

 Recovery in Multitenant Applications:

 

This is a continuation of the detailed article on Azure Monitor and Azure Monitor Logs. Azure Monitor and Azure Monitor Logs are different services, and Logs are incredibly useful for troubleshooting and notifications.

This article discusses recovery.

The example used here is a full disaster recovery scenario for a multitenant SaaS application implemented with the database-per-tenant model. In this case, a geo-restore can be used to recover the catalog and tenant databases from automatically maintained geo-redundant backups into an alternate recovery region. After the outage is resolved, geo-replication can be used to repatriate the changed databases to their original region.

Geo-restore can recover any database from a backup in Azure SQL Database, including Hyperscale databases. A database can also be restored from a backup in Azure SQL Managed Instance.

Backups are also taken on a scheduled basis, which helps protect databases from user and application errors, accidental database deletion, and prolonged outages. This built-in capability is available for all service tiers and compute sizes of Azure SQL Managed Instance. Automated recovery helps in these cases: 1) recovering to a specific point in time within the retention period; 2) recovering to the deletion time for a deleted database; 3) recovering to the time of a recent backup; and 4) recovering to the point of the most recent replicated backups.
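The four automated-recovery cases above can be summarized as a mapping from scenario to restore option. The scenario keys are illustrative names invented for this sketch; the mapping itself follows the text.

```python
# Sketch mapping the four automated-recovery cases to a restore option.

def pick_restore_option(scenario):
    options = {
        "point_in_time": "point-in-time restore within the retention period",
        "deleted_database": "restore to the deletion time",
        "recent_backup": "restore from the most recent backup",
        "regional_outage": "geo-restore from the most recent replicated backup",
    }
    return options.get(scenario, "no automated option; check long-term retention")

print(pick_restore_option("deleted_database"))
print(pick_restore_option("regional_outage"))
```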

If long-term retention is configured, long-term retention backups can also be used to restore databases.

Several factors affect the recovery time, including the size of the database, its compute size, the number of transaction logs involved, the amount of activity that must be replayed to reach the restore point, the network bandwidth if restoring to a different region, and the number of concurrent restore requests being processed in the target region.

Geo-replication creates a continuously synchronized, readable secondary database for a primary database. It is preferable, although not required, to have the secondary database in a different region. Since this kind of secondary database is read-only, it is called a geo-replica. This option enables quick recovery of individual databases in case of a regional disaster or a large-scale outage. Once geo-replication is set up, a geo-failover helps maintain business continuity. It can be initiated programmatically or manually. If a stable connection endpoint with automatic geo-failover is required, an auto-failover group can be used; it provides endpoint redirection. Auto-failover groups provide read-write and read-only listener endpoints, so the connection string for the application does not change.

Sunday, September 25, 2022

 

Logging in Multitenant Applications:

 

This is a continuation of the detailed article on Azure monitoring, with an emphasis on logs. Azure Monitor and Azure Monitor Logs are different services, and Logs are incredibly useful for troubleshooting and notifications.

 

When a cloud monitoring service such as Azure Monitor Logs is used to monitor elastic pools and databases for a multitenant application, typically only one account and namespace are needed for the entire integration, unlike key vaults, storage accounts, application instances, and databases, which proliferate in size and number with the number of tenants. A single geographical region and a single instance of Azure Monitor Logs are sufficient for the multitenant solution.

 

Azure Monitor Logs supports monitoring thousands of elastic pools and hundreds of thousands of databases. It provides a single monitoring solution that can integrate monitoring of different applications and Azure services across multiple Azure subscriptions.

 

A single cloud resource, such as an Azure SQL database, can have monitoring and alerting made available via the Portal, but it is not convenient to query those logs for large installations or for a unified view across resources and subscriptions. Azure Monitor Logs can collect logs from various resources and their services. The SQL Analytics solution provides several predefined elastic pool and database monitoring and alerting views and queries, and it also provides a custom view designer.
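The kind of cross-resource aggregation such workspace queries enable can be illustrated in miniature: one query over records collected from many databases and subscriptions, rather than per-resource views. The data and field names below are entirely made up for the sketch.

```python
# Illustrative cross-resource aggregation over collected log records,
# grouped by subscription, standing in for a workspace query.

from collections import defaultdict

records = [
    {"subscription": "sub-a", "database": "tenant1", "dtu_percent": 85},
    {"subscription": "sub-a", "database": "tenant2", "dtu_percent": 40},
    {"subscription": "sub-b", "database": "tenant3", "dtu_percent": 92},
]

def max_dtu_by_subscription(rows):
    peaks = defaultdict(int)
    for r in rows:
        peaks[r["subscription"]] = max(peaks[r["subscription"]], r["dtu_percent"])
    return dict(peaks)

print(max_dtu_by_subscription(records))   # {'sub-a': 85, 'sub-b': 92}
```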

 

Platform diagnostics data can be created by simulating a workload on the tenant. Provisioning a batch of tenants prior to the simulation is recommended because it brings the simulation closer to the real-world scenario.

Also, the Log Analytics workspace and the Azure SQL Analytics solution must be installed and configured prior to running the workload. Azure Monitor Logs is different from Azure Monitor and collects logs and telemetry data in a Log Analytics workspace. The workspace can be created in a dedicated resource group. With the help of the Azure Portal, a Log Analytics workspace and the SQL Analytics solution can be activated; it might take a couple of minutes before the solution is active. The tiles for individual databases can then be opened from a drop-down menu.

 

The Portal allows for date-range-based filtering, and it shows the pools and databases on the server. Pool-level metrics are also available. Monitoring and alerting in Azure Monitor Logs are based on queries over the data in the workspace rather than being specific to a resource, which enables querying across databases and subscriptions. Alert rules can also be set up in Azure Monitor Logs. Billing is based on the data volume in the workspace. A free workspace is limited to 500 MB of data per day; after that limit is reached, data is no longer added to the workspace. This is not the case with the premium tiers.
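The free-workspace cap behavior can be sketched as below: once the cap is reached, further records are dropped rather than ingested. The cap value follows the text above; the record sizes and class shape are illustrative assumptions.

```python
# Sketch of the free-workspace behavior: data over the cap is not added.

class Workspace:
    def __init__(self, cap_mb=500):
        self.cap_mb = cap_mb
        self.used_mb = 0.0
        self.dropped = 0

    def ingest(self, record_mb):
        if self.used_mb + record_mb > self.cap_mb:
            self.dropped += 1   # over the cap: the record is not added
            return False
        self.used_mb += record_mb
        return True

ws = Workspace(cap_mb=500)
print(ws.ingest(499.5))   # True
print(ws.ingest(1.0))     # False: would exceed the 500 MB cap
```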