Wednesday, June 9, 2021

Migrating every region of a multi-region Cosmos DB account to availability-zone redundancy.

 

Introduction: 

Azure is a public cloud that offers Cosmos DB as a resource for storing documents. This service is the equivalent of a global database for organizations that want to store semi-structured data. When a Cosmos DB account is provisioned as a singleton instance, it is hosted in a single region with read-and-write capabilities. It becomes zone-redundant when there are multiple instances of the Cosmos DB server, each in a separate availability zone, so that if one fails, another can take over. All the instances in these zones behave the same way, and there is no data loss between failovers. When the account is provisioned in multiple regions, some regions can serve as read-only regions while others accept writes. It is possible to write to any region that accepts writes and to read that data back from a read-only region. Making such a multi-region account zone-redundant requires each of its regions to be zone-redundant. This article explains the steps to do so and the considerations involved.

Description: 

Migrating a single-region account to zone redundancy differs from migrating a multi-region account because the regions of a multi-region account may not all behave the same way. They may work in read-only or read-write mode, and the summary information of the account lists them under the sections titled Locations, Read Locations, and Write Locations.
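
As a minimal sketch of how automation might tell these roles apart, the snippet below treats every region in the Locations list that is not in the Write Locations list as a read region. The function name `split_regions` and the placeholder region names are hypothetical and only serve the illustration; they are not part of any Azure SDK.

```python
from typing import List, Tuple


def split_regions(locations: List[str],
                  write_locations: List[str]) -> Tuple[List[str], List[str]]:
    """Split the consolidated region list into (write_regions, read_regions).

    The order of `locations` is preserved in both results.
    """
    write_set = set(write_locations)
    unknown = write_set - set(locations)
    if unknown:
        # A write region that is missing from Locations means the input is inconsistent.
        raise ValueError(f"write regions not in Locations: {sorted(unknown)}")
    writes = [region for region in locations if region in write_set]
    reads = [region for region in locations if region not in write_set]
    return writes, reads


if __name__ == "__main__":
    # Placeholder region names, not real Azure regions.
    writes, reads = split_regions(
        ["region-a", "region-b", "region-c"],
        ["region-a", "region-b"],
    )
    print("write regions:", writes)
    print("read regions:", reads)
```

Keeping the original order matters later, because the write regions are expected to keep their original priority order.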

PowerShell is a scripting language suitable for automation, although it is not the only way to automate the migration of accounts from zonal to zone-redundant configurations. This example is described with the help of PowerShell. The PowerShell command that specifies zone redundancy for the account takes only the LocationObject parameter for this purpose. The LocationObject can enumerate all the regions that host the Cosmos DB account, but it does not differentiate between read regions and write regions. There are no other parameters or options to specify separate lists for read locations and write locations, so it can be difficult to tell the regions apart in the consolidated list.

The list of regions to migrate could come from the account's users, but if the automation is to migrate the account in place, it needs to make a few decisions itself. Certainly, all regions can be enabled for zone redundancy. There is a cost aspect, but without input from the user, let us see what the tradeoffs are in converting more regions than the primary to be zone-redundant.

If there is more than one write region, the account allows direct database access to those regions for reads and writes and distributes the traffic geographically. Each such region appears as its own instance to its users. Syncing data across regions is costly, so converting the write regions to sync over low-latency networks reduces the cost while improving redundancy. Enabling the redundancy on each write region increases billing as a matter of policy, but technically it costs less than cross-region replication. The read regions rely on replication anyway, so the choice of converting them to zone-redundant regions rests exclusively on the customer's acceptance of the billing increase. The priority for automation has always been to secure the additional write regions first, with the read regions following next.

The steps to convert the regions use the Locations list and the Write Locations list to determine the difference, which can be treated as the read regions. We start with a locations list that has just the primary and toggle the boolean option that makes all regions in the account write regions. Then we incrementally add each remaining write region with the zone-redundancy flag set to true. As each write region is added, it is set to be zone-redundant and to support writes. Then the primary region is removed and added back with zone redundancy set. This converts all the write regions, including the primary, to be zone-redundant. Removing and re-adding a region is required to turn on zone redundancy for that region. The priority order of the write regions can then be fixed to match the original configuration. Next, the flag that makes all regions write regions can be disabled, so that subsequent region additions become read regions. The list of read regions is already known at this point, so they are added back to the new configuration one by one with zone redundancy enabled.
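
The ordering of these steps can be sketched as a small planner that only produces the sequence of operations and does not call Azure. It is a rough illustration under stated assumptions: the first entry of the Write Locations list is taken to be the primary, the step names such as `reduce_to_primary` and `add_zone_redundant` are hypothetical labels rather than real cmdlets, and no check is made that the service accepts each step.

```python
from typing import List, Tuple

# A step is (action, target); the action names are hypothetical labels.
Step = Tuple[str, str]


def build_migration_plan(locations: List[str],
                         write_locations: List[str]) -> List[Step]:
    """Return the ordered steps described in the text for one account."""
    if not write_locations:
        raise ValueError("an account needs at least one write region")

    primary = write_locations[0]          # assumption: first write region is the primary
    other_writes = write_locations[1:]
    write_set = set(write_locations)
    reads = [region for region in locations if region not in write_set]

    plan: List[Step] = []
    # Start from a locations list that holds only the primary.
    plan.append(("reduce_to_primary", primary))
    # Toggle the option that makes every region a write region.
    plan.append(("enable_multi_write", ""))
    # Add the remaining write regions with zone redundancy turned on.
    for region in other_writes:
        plan.append(("add_zone_redundant", region))
    # Remove and re-add the primary so that it becomes zone-redundant too.
    plan.append(("remove", primary))
    plan.append(("add_zone_redundant", primary))
    # Put the write regions back in their original priority order.
    plan.append(("restore_write_priorities", ",".join(write_locations)))
    # Turn the option off so that later additions become read regions.
    plan.append(("disable_multi_write", ""))
    for region in reads:
        plan.append(("add_zone_redundant", region))
    return plan


if __name__ == "__main__":
    for step in build_migration_plan(
        ["region-a", "region-b", "region-c"], ["region-a", "region-b"]
    ):
        print(step)
```

Separating the plan from its execution lets the ordering be reviewed and unit-tested before any change is applied to a live account.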

This makes the automation to migrate an account to zone redundancy easy: start with a LocationObject that has only one region as a write region, even if it is not zone-redundant; add all the other regions incrementally with the zone-redundant option specified; and remove and re-add the starting region between the write regions and the read regions. Testing this fix is just as easy as the automation because all the cases fall under three categories: single, mixed, and all regions as write. Cosmos DB is never provisioned without at least one write region, so we always start out with at least one. Lastly, it must be called out that specifying the zone-redundant option on a region may throw an error if the region does not support availability-zone redundancy. Older regions are known to be so. Therefore, the redundancy flag must be set to true only when the region supports it and false otherwise. The net effect of all these steps is that the older configuration of separate read and write regions is maintained, but each region that supports it now has cost-effective availability-zone redundancy set to true.
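
The last two points can be illustrated with a short sketch. It assumes the caller supplies the set of regions known to support availability zones (the article does not say where that information comes from), and it reads the three test categories as: one write region, several write regions alongside read regions, and every region being a write region. Both function names and the placeholder region names are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def zone_redundancy_flags(regions: Iterable[str],
                          az_supported: Set[str]) -> Dict[str, bool]:
    """Map each region to True only if it is known to support availability zones."""
    return {region: region in az_supported for region in regions}


def account_category(locations: List[str], write_locations: List[str]) -> str:
    """Classify an account into one of the three test categories.

    This classification is one reading of "single, mixed, and all regions as write".
    """
    writes = set(write_locations)
    if len(writes) == 1:
        return "single"
    if writes == set(locations):
        return "all-write"
    return "mixed"


if __name__ == "__main__":
    # Placeholder names; the supported set is an input, not a fact about Azure.
    print(zone_redundancy_flags(["region-a", "region-b"], {"region-b"}))
    print(account_category(["region-a", "region-b", "region-c"],
                           ["region-a", "region-b"]))
```

A test suite for the automation could then use one representative account per category.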


