Cluster computing

Monday, October 3, 2022

Recovery and Replication of data in Multitenant Applications:  

This is a continuation of the detailed article on recovery options. In this section, we discuss replication options.  

The example used here is a full disaster recovery scenario for a multitenant SaaS application implemented with the database per tenant model. The concepts introduced here are geo-restore and geo-replication.  

Geo-replication creates a continuously synchronized, readable secondary database for a primary database. It is preferable, although not required, to place the secondary database in a different region. Since this kind of secondary database is read-only, it is called a geo-replica. This option allows quick recovery of individual databases in case of a regional disaster or a large-scale outage. Once geo-replication is set up, a geo-failover helps maintain business continuity.  

There can be at most four geo-secondaries for a single primary. Multiple geo-secondaries provide additional redundancy, and they can also be used to scale out read-only workloads. If there is only one secondary and it fails, the application is exposed to higher risk until a new secondary is created. 

Each geo-secondary can be a single database or a database in an elastic pool. The elastic pool choice for each geo-secondary database is independent of the configuration of any other replica. Each elastic pool is contained within a single logical server. Because database names must be unique on a logical server, and therefore within a pool, multiple geo-secondaries of the same primary cannot share the same pool. 
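
The placement rules above can be sketched as a small validation routine. This is only an illustration in Python, not an Azure API: the `GeoSecondary` type, the `validate_topology` function, and the server and pool names are hypothetical, and the sketch assumes each geo-secondary carries the same database name as its primary.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_GEO_SECONDARIES = 4  # limit stated in the text above


@dataclass(frozen=True)
class GeoSecondary:
    """Hypothetical description of where one geo-secondary is placed."""
    server: str                 # logical server hosting the secondary
    pool: Optional[str] = None  # elastic pool name, or None for a single database


def validate_topology(secondaries: List[GeoSecondary]) -> List[str]:
    """Return the problems found in a planned set of geo-secondaries."""
    problems = []
    if len(secondaries) > MAX_GEO_SECONDARIES:
        problems.append(
            f"{len(secondaries)} geo-secondaries planned; "
            f"at most {MAX_GEO_SECONDARIES} are allowed"
        )
    seen_servers = set()
    for sec in secondaries:
        # Assumption: all geo-secondaries share the primary's database name,
        # so two of them cannot sit on one logical server, and hence cannot
        # sit in one elastic pool either.
        if sec.server in seen_servers:
            problems.append(f"more than one geo-secondary on server '{sec.server}'")
        seen_servers.add(sec.server)
    return problems
```

An empty list means the plan respects both limits; each string in a non-empty list names one violation.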

A geo-secondary that has finished initial seeding can be failed over on demand by the user. If the primary is unavailable, only an unplanned geo-failover can be used. The geo-secondary then becomes the new primary. Once the outage is mitigated, the system makes the recovered primary a geo-secondary. All other secondaries are automatically linked to the new primary, and the replication relationships are reconfigured. After the outage that caused the geo-failover is mitigated, it may be desirable to return the primary to its original region by failing back. 
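
The role changes during a geo-failover and a later failback can be modeled as a simple swap. The sketch below is a toy in-memory model for illustration, not how the service is implemented; `Topology`, `geo_failover`, and the region names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Topology:
    """Hypothetical model of one primary and its geo-secondaries."""
    primary: str
    secondaries: List[str] = field(default_factory=list)


def geo_failover(topology: Topology, target: str) -> Topology:
    """Promote `target` to primary and demote the old primary to a secondary."""
    if target not in topology.secondaries:
        raise ValueError(f"'{target}' is not a geo-secondary of '{topology.primary}'")
    # The remaining secondaries are relinked to the new primary.
    remaining = [s for s in topology.secondaries if s != target]
    # In an unplanned failover the old primary rejoins as a secondary only
    # after the outage is mitigated; this model skips that waiting period.
    return Topology(primary=target, secondaries=[topology.primary] + remaining)


before = Topology(primary="east", secondaries=["west", "north"])
after = geo_failover(before, "west")      # west is now the primary
restored = geo_failover(after, "east")    # failback to the original region
```

Failing back is the same operation applied in the opposite direction, which is why returning the primary to its original region requires a second failover.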

Preparing for a geo-failover involves validating that the authentication and network access for the secondary server are properly configured. It also involves checking that the backup retention policy on the secondary database matches that of the primary. This setting is not part of the database, and it is not replicated from the primary. By default, a geo-secondary has a PITR (point-in-time restore) retention period of 7 days. 
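
A pre-failover check along these lines could be sketched as follows. This is a minimal sketch that compares settings already collected by some other means; `ServerSettings`, `failover_readiness_gaps`, and the sample login and firewall rule names are hypothetical and do not correspond to any Azure SDK type.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

DEFAULT_PITR_DAYS = 7  # default retention of a geo-secondary, per the text above


@dataclass(frozen=True)
class ServerSettings:
    """Hypothetical snapshot of the settings that are not replicated."""
    logins: FrozenSet[str]
    firewall_rules: FrozenSet[str]
    pitr_retention_days: int = DEFAULT_PITR_DAYS


def failover_readiness_gaps(primary: ServerSettings, secondary: ServerSettings) -> List[str]:
    """List the differences that should be fixed before relying on a failover."""
    gaps = []
    missing_logins = primary.logins - secondary.logins
    if missing_logins:
        gaps.append(f"logins missing on secondary: {sorted(missing_logins)}")
    missing_rules = primary.firewall_rules - secondary.firewall_rules
    if missing_rules:
        gaps.append(f"firewall rules missing on secondary: {sorted(missing_rules)}")
    if primary.pitr_retention_days != secondary.pitr_retention_days:
        gaps.append(
            f"PITR retention differs: primary={primary.pitr_retention_days} days, "
            f"secondary={secondary.pitr_retention_days} days"
        )
    return gaps
```

For example, a primary configured with 14 days of retention paired with a secondary left at the default would be reported as a retention gap.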

In an on-premises Availability Group setup, the secondary database is not charged if read queries are not offloaded to it and a Software Assurance (SA) agreement is in place. Secondary databases of geo-replication or failover groups, by contrast, are always charged.

Azure Hybrid Benefit does not change this: secondary databases of geo-replication or failover groups are still charged, even though they are in read-only mode.

Failover groups initiate an automatic failover after a 60-minute grace period. A forced failover can be initiated at any time, either manually by the customer or through the customer's own automation. To monitor the replication lag between the primary database and the geo-secondary, Azure SQL, including Azure SQL Managed Instance, provides a dynamic management view called sys.dm_geo_replication_link_status.
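
As a rough illustration of lag monitoring, the sketch below pairs a query against that view with a small Python helper that flags links whose lag exceeds a threshold. The helper name `lagging_links`, the threshold, and the sample rows are hypothetical. Running the query requires a database driver, which is left out here; rows are assumed to arrive as dictionaries keyed by column name.

```python
from typing import Any, Iterable, List, Mapping

# Query to run on the primary database; the columns come from the view itself.
LAG_QUERY = """
SELECT partner_server, partner_database, replication_state_desc,
       replication_lag_sec, last_replication
FROM sys.dm_geo_replication_link_status;
"""


def lagging_links(rows: Iterable[Mapping[str, Any]], max_lag_sec: int) -> List[str]:
    """Return 'server/database' for every link whose lag exceeds the threshold."""
    lagging = []
    for row in rows:
        lag = row.get("replication_lag_sec")
        if lag is None:
            # The lag can be NULL, for example when the view is queried on
            # a secondary; such rows are skipped rather than guessed at.
            continue
        if lag > max_lag_sec:
            lagging.append(f"{row['partner_server']}/{row['partner_database']}")
    return lagging


# Sample rows for illustration only, not real measurements.
sample = [
    {"partner_server": "srv-west", "partner_database": "tenant1", "replication_lag_sec": 3},
    {"partner_server": "srv-west", "partner_database": "tenant2", "replication_lag_sec": 45},
    {"partner_server": "srv-west", "partner_database": "tenant3", "replication_lag_sec": None},
]
over_limit = lagging_links(sample, max_lag_sec=30)
```

A check like this could feed the customer's own automation when deciding whether to initiate a forced failover, since a large lag indicates how much recent data might be lost.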
