Thursday, October 6, 2022

 

The previous articles on Multitenant Applications talked about recovery and replication. This article talks about the organization of data, particularly keys and values.

Configuration data is stored as key-values, which are a simple and flexible representation of application settings. Keys serve as identifiers for retrieving the corresponding values. A best practice is to organize keys into hierarchical namespaces by using a character delimiter. No particular convention is mandated across tenants, but adopting one helps. Regardless of their format, keys must be treated as a whole; when parsing is avoided, it is easier not to break existing usages. Application frameworks that consume the data might dictate specific naming schemes for key-values. A combined size limit of 10 KB usually applies to a key-value.
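For illustration, here is a small Python sketch of composing tenant-scoped keys with a delimiter. The tenant and component names, the ":" delimiter, and the rough size check are assumptions for this sketch, not a prescribed scheme; the real limit applies to the key-value as a whole, including label and tags.

# Illustrative only: compose hierarchical keys with a ":" delimiter.
DELIMITER = ":"
MAX_KEY_VALUE_BYTES = 10 * 1024  # combined size limit mentioned above

def make_key(tenant: str, component: str, setting: str) -> str:
    """Build a namespaced key such as 'Contoso:Checkout:MaxRetries'."""
    return DELIMITER.join([tenant, component, setting])

def within_size_limit(key: str, value: str) -> bool:
    """Rough check that a key-value stays under the combined 10 KB limit."""
    return len(key.encode("utf-8")) + len(value.encode("utf-8")) <= MAX_KEY_VALUE_BYTES

key = make_key("Contoso", "Checkout", "MaxRetries")
print(key, within_size_limit(key, "5"))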

Key namespaces should be easy to read, with delimiters used consistently to convey hierarchical information, and they should be easy to manage. A key-name hierarchy should represent logical groups of configuration data and should be easy to query using pattern matching.
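As a minimal sketch of such a pattern-matched query, assuming the azure-appconfiguration Python package and an illustrative connection string and key namespace (none of these names come from the article):

# Sketch: query a logical group of settings by key pattern.
from azure.appconfiguration import AzureAppConfigurationClient

client = AzureAppConfigurationClient.from_connection_string("<app-config-connection-string>")

# The '*' wildcard matches every key under the Contoso:Checkout namespace.
for setting in client.list_configuration_settings(key_filter="Contoso:Checkout:*"):
    print(setting.key, setting.value)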

Key-values can optionally have a label attribute. Labels are used to differentiate key-values that share the same key. No label is associated initially, and key-values can be referenced without one. A common use of labels is to denote environments for the same key. Labels can also be used to create versions, making it possible to roll forward or fall back between versions of a key. Values are interpreted according to their content type, with Unicode text being the most common form. MIME types also apply to feature flags, Key Vault references, and JSON key-values.
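A sketch of using labels to separate environments, with the same assumed client and illustrative key and label names:

# Sketch: one key, differentiated by label per environment.
from azure.appconfiguration import AzureAppConfigurationClient, ConfigurationSetting

client = AzureAppConfigurationClient.from_connection_string("<app-config-connection-string>")

client.set_configuration_setting(
    ConfigurationSetting(key="Contoso:Checkout:MaxRetries", value="3", label="Development"))
client.set_configuration_setting(
    ConfigurationSetting(key="Contoso:Checkout:MaxRetries", value="5", label="Production"))

# Read the value for a specific environment; omitting the label reads the unlabeled key-value.
prod = client.get_configuration_setting(key="Contoso:Checkout:MaxRetries", label="Production")
print(prod.value)

Rolling back a configuration change then amounts to reading the key-value under an earlier label.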

An App Configuration store can be soft-deleted, which means it can be recovered. Soft delete acts as a safeguard both when a deleted store needs to be recovered within the retention period and when a store must be permanently deleted. A soft-deleted store is retained for a short time known as the retention period; when it elapses, the store is deleted permanently. A soft-deleted store can also be purged before the retention period expires. Permission to read and purge deleted stores is granted to the Owner and Contributor roles by default.

The JSON content type is preferable over other formats for key-values because it provides simpler data management, better data export, and native support in the App Configuration provider. When configuration data changes, Event Grid can be used to receive change notifications. These events can trigger webhooks, Azure Functions, Azure Storage queues, or any other event handler. Typically, a resource group and a message endpoint are created in order to subscribe to the topic.
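A sketch of storing a JSON value with its MIME content type, again with the assumed azure-appconfiguration package and illustrative names:

# Sketch: store a JSON value with an explicit MIME content type.
import json
from azure.appconfiguration import AzureAppConfigurationClient, ConfigurationSetting

client = AzureAppConfigurationClient.from_connection_string("<app-config-connection-string>")

client.set_configuration_setting(ConfigurationSetting(
    key="Contoso:Checkout:RetryPolicy",
    value=json.dumps({"maxRetries": 5, "backoffSeconds": 2}),
    content_type="application/json"))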


Tuesday, October 4, 2022

 

Some terminology for geo-replication:

Automatic asynchronous replication – a geo-secondary is created for an existing database on any logical server other than the one hosting the primary database. When it is created, it is populated with the data of the primary database by a process known as seeding. Updates to the primary are then replicated automatically and asynchronously. Asynchronous replication means that transactions are committed on the primary database before they are replicated.
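A sketch of creating a geo-secondary through the management SDK, assuming the azure-mgmt-sql and azure-identity packages; the subscription, resource groups, servers, and database names are placeholders:

# Sketch: create a geo-secondary by seeding it from an existing primary database.
from azure.identity import DefaultAzureCredential
from azure.mgmt.sql import SqlManagementClient
from azure.mgmt.sql.models import Database

client = SqlManagementClient(DefaultAzureCredential(), "<subscription-id>")

primary_id = (
    "/subscriptions/<subscription-id>/resourceGroups/<primary-rg>"
    "/providers/Microsoft.Sql/servers/<primary-server>/databases/<tenant-db>"
)

# create_mode="Secondary" asks the service to seed this database from the primary
# and keep it asynchronously replicated afterwards; location is the secondary region.
poller = client.databases.begin_create_or_update(
    "<secondary-rg>", "<secondary-server>", "<tenant-db>",
    Database(location="westus2", create_mode="Secondary", source_database_id=primary_id))
geo_secondary = poller.result()
print(geo_secondary.status)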

Readable geo-secondary replicas – an application can access a geo-secondary replica to execute read-only queries using the same or different security principals. Geo-secondary replicas are chargeable after replication and failover, but as long as they remain read-only geo-secondaries of the primary, they are free of charge.
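For example, a reporting job could connect directly to the secondary server for read-only queries (a pyodbc sketch; the server, database, table, and credentials are placeholders):

# Sketch: run a read-only query against the geo-secondary.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=<secondary-server>.database.windows.net;"
    "DATABASE=<tenant-db>;UID=<reader-login>;PWD=<password>")

cursor = conn.cursor()
cursor.execute("SELECT COUNT(*) FROM dbo.Orders")  # illustrative read-only query
print(cursor.fetchone()[0])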

A planned geo-failover switches the roles of the primary and geo-secondary databases after completing full data synchronization, so it does not result in data loss. Since the replication is based on well-known log shipping techniques, the duration of a planned failover depends on the size of the log at the origin. This kind of failover is applicable to performing disaster recovery drills, relocating the database to a different region, and returning the database to the primary region after an outage has been mitigated.

On the other hand, an unplanned geo-failover immediately switches the geo-secondary to the primary role without any synchronization with the primary. Any transactions committed on the primary but not yet replicated to the secondary are lost. An unplanned geo-failover should be used only when the primary is not available.
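A sketch of both failover flavors, assuming azure-mgmt-sql exposes replication-link operations as shown; the resource groups, servers, and database names are placeholders:

# Sketch: planned vs. unplanned geo-failover, initiated from the secondary server.
from azure.identity import DefaultAzureCredential
from azure.mgmt.sql import SqlManagementClient

client = SqlManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Find the replication link of the geo-secondary back to its primary
# (the link's resource name is assumed here to serve as its link id).
link = next(iter(client.replication_links.list_by_database(
    "<secondary-rg>", "<secondary-server>", "<tenant-db>")))

# Planned failover: waits for full synchronization, no data loss.
client.replication_links.begin_failover(
    "<secondary-rg>", "<secondary-server>", "<tenant-db>", link.name).result()

# Unplanned failover: switches roles immediately and may lose unreplicated transactions.
# client.replication_links.begin_failover_allow_data_loss(
#     "<secondary-rg>", "<secondary-server>", "<tenant-db>", link.name).result()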


There can be at most four geo-secondaries for a single primary. Multiple geo-secondaries provide additional redundancy, and additional secondaries can also be used to scale out read-only workloads. If there is only one secondary and it fails, the application is exposed to higher risk until a new secondary is created.

Each geo-secondary can be a single database or a database in an elastic pool. The elastic pool choice for each geo-secondary database is separate and does not depend on the configuration of any other replica. Each elastic pool is contained within a single logical server. Database names must be unique within a pool, so multiple geo-secondaries of the same primary cannot share the same pool.

 

A geo-secondary that has finished initial seeding can be failed over on demand by the user. If the primary is unavailable, only an unplanned geo-failover can be used; the geo-secondary then becomes the new primary. Once the outage is mitigated, the system makes the recovered primary a geo-secondary. All secondaries are automatically linked to the new primary, and replication relationships are reconfigured. After the outage that caused the geo-failover is mitigated, it may be desirable to return the primary to its original region.

 

Preparing for a geo-failover involves validating that authentication and network access for the secondary server are properly configured, and that the backup retention policy on the secondary database matches that of the primary. This setting is not part of the database, and it is not replicated from the primary. The default configuration of a geo-secondary has a Point-in-time Recovery (PITR) retention period of 7 days.
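A sketch of verifying that the primary and secondary retention settings match, assuming the azure-mgmt-sql operation and the "default" policy name used below; all names are placeholders:

# Sketch: compare the short-term (PITR) backup retention of primary and geo-secondary.
from azure.identity import DefaultAzureCredential
from azure.mgmt.sql import SqlManagementClient

client = SqlManagementClient(DefaultAzureCredential(), "<subscription-id>")

primary = client.backup_short_term_retention_policies.get(
    "<primary-rg>", "<primary-server>", "<tenant-db>", "default")
secondary = client.backup_short_term_retention_policies.get(
    "<secondary-rg>", "<secondary-server>", "<tenant-db>", "default")

if primary.retention_days != secondary.retention_days:
    print(f"Mismatch: primary {primary.retention_days}d, secondary {secondary.retention_days}d")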

Monday, October 3, 2022

Recovery and Replication of data in Multitenant Applications:  

This is a continuation of the detailed article on recovery options. In the next section, we discuss replication options.

 

The example used here is a full disaster recovery scenario for a multitenant SaaS application implemented with the database per tenant model. The concepts introduced here are geo-restore and geo-replication.  

 

Geo-replication helps to create a continuously synchronized, readable secondary database for a primary database. It is preferable, although not required, to have the secondary database in a different region. Since this kind of secondary database is only readable, it is called a geo-replica. This option serves to perform quick recovery of individual databases in case of a regional disaster or a large-scale outage. Once geo-replication is set up, a geo-failover helps maintain business continuity.

 

There can be at most four geo-secondaries for a single primary. Multiple geo-secondaries provide additional redundancy, and additional secondaries can also be used to scale out read-only workloads. If there is only one secondary and it fails, the application is exposed to higher risk until a new secondary is created.

Each geo-secondary can be a single database or a database in an elastic pool. The elastic pool choice for each geo-secondary database is separate and does not depend on the configuration of any other replica. Each elastic pool is contained within a single logical server. Database names must be unique within a pool, so multiple geo-secondaries of the same primary cannot share the same pool.

 

A geo-secondary that has finished initial seeding can be failed over on demand by the user. If the primary is unavailable, only an unplanned geo-failover can be used; the geo-secondary then becomes the new primary. Once the outage is mitigated, the system makes the recovered primary a geo-secondary. All secondaries are automatically linked to the new primary, and replication relationships are reconfigured. After the outage that caused the geo-failover is mitigated, it may be desirable to return the primary to its original region.

Preparing for a geo-failover involves validating that authentication and network access for the secondary server are properly configured, and that the backup retention policy on the secondary database matches that of the primary. This setting is not part of the database, and it is not replicated from the primary. The default configuration of a geo-secondary has a PITR retention period of 7 days.

 

In an on-premises Availability Group setup, the secondary database is not charged if read queries are not offloaded to it and a Software Assurance (SA) agreement is in place. Secondary databases of geo-replication or failover groups, however, are always charged.

 

Azure Hybrid Benefit works such that secondary databases of geo-replication or failover groups are still charged, even though they operate in read-only mode.

 

Failover groups initiate automatic failover after a grace period of 60 minutes. A forced failover can be initiated at any time if customers trigger it themselves or set up their own automation. To monitor the replication lag between the primary database and the geo-secondary, Azure SQL exposes a dynamic management view called sys.dm_geo_replication_link_status.
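A sketch of polling that view from the primary with pyodbc (connection details are placeholders, and the selected columns are assumed from the view's documented schema):

# Sketch: monitor replication lag from the primary database via the DMV.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=<primary-server>.database.windows.net;"
    "DATABASE=<tenant-db>;UID=<admin-login>;PWD=<password>")

cursor = conn.cursor()
cursor.execute("""
    SELECT partner_server, partner_database, replication_state_desc, replication_lag_sec
    FROM sys.dm_geo_replication_link_status
""")
for row in cursor.fetchall():
    print(row.partner_server, row.partner_database, row.replication_state_desc,
          row.replication_lag_sec)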


Saturday, October 1, 2022

 

This article discusses using the checklist for architecting and building multitenant solutions. Administrators will find this list familiar.

 

The checklist is structured around business and technical considerations as well as the five pillars of the Azure Well-Architected Framework: 1) Reliability, 2) Security, 3) Cost Optimization, 4) Operational Excellence, and 5) Performance Efficiency. The elements that support these pillars are the Azure Well-Architected Review, Azure Advisor, documentation, patterns, support and service offers, reference architectures, and design principles. Of these, cost optimization is one of the primary benefits of using the right tool for the right solution; it helps to analyze spend over time as well as the effects of scaling out and scaling up. Azure Advisor can help improve reusability, on-demand scaling, and reduced data duplication, among many other aspects. Performance is usually influenced by external factors and is closely tied to customer satisfaction. Continuous telemetry and reactiveness are essential to well-tuned performance. The shared environment controls for management and monitoring create alerts, dashboards, and notifications specific to the performance of the workload. Performance considerations include storage and compute abstractions, dynamic scaling, partitioning, storage pruning, enhanced drivers, and multilayer caching.

Operational excellence goes hand in hand with security and reliability. Security and data management must be built into the system at every layer, for every application and workload. The data management and analytics scenario focuses on establishing a foundation for security. Although workload-specific solutions might be required, the foundation for security is built with Azure landing zones and managed independently from the workload. Confidentiality and integrity of data, including privilege management, data privacy, and appropriate controls, must be ensured. Network isolation and end-to-end encryption must be implemented. SSO, MFA, conditional access, and managed service identities are involved in securing authentication. Separation of concerns between the Azure control plane and data plane, as well as RBAC access control, must be used.

The checklist of business considerations includes: 1) understanding what kind of solution is being created, such as business-to-business, business-to-consumer, or enterprise software; 2) defining the tenants in terms of number and growth plans; 3) defining the pricing model and ensuring it aligns with the tenants' consumption of Azure resources; 4) understanding whether the tenants need to be separated into different tiers and, based on the customer's requirements, deciding on the tenancy model; and finally, promoting the multitenant solution in the commercial marketplace.

The technical considerations emphasize design and service-level objectives, as well as the scale of the solution. They also suggest applying chaos engineering to test the reliability of the solution. The security considerations involve Zero Trust and least-privilege principles.

  

 

 
