Tuesday, May 30, 2023

Pattern #1: Back up your MySQL database

  • Overview/Purpose:

This pattern describes how to provide business continuity and disaster recovery for MySQL databases deployed on a single server or on a cluster in Azure, so that data can be recovered after a user or application error, a regional data center outage, or another unplanned disruption.

  • Concepts to Understand

Paired Region: Azure supports cross-region replication pairings in all geographies. Regions are paired based on proximity and other factors. The Azure regional pairs in North America include East US – West US, East US 2 – Central US, North Central US – South Central US, West US 2 – West Central US, and West US 3 – East US. One benefit of choosing from these pairings is that, during a broad outage, recovery of at least one region in each pair is prioritized. Without pairings, the default region used across many deployments is Central US, but in that case it is recommended to achieve high availability through availability zones with locally redundant or zone-redundant storage. Regions without a pair do not offer geo-redundant storage.

Geo-Restore: This feature of Azure Database for MySQL allows a server to be restored from geo-redundant backups. The backups are stored in the server’s paired region.

RTO: The Recovery Time Objective is the maximum amount of time that a resource can be down without causing significant damage to the business, including the time spent restoring it to normal operations after the incident.

RPO: The Recovery Point Objective is the maximum amount of data, measured as the time elapsed since the last recoverable point, that can be lost during a disruption before the loss exceeds the allowable threshold.

  • Solution Design

Configure your MySQL server in one of two ways:

1.     enable geo-redundant backups, which gives you the ability to initiate a geo-restore, or

2.     deploy read replicas in a different region.

With geo-restore, a new server is created from backup data that has been replicated to another region. The overall time to restore and recover depends on the size of the database and the number of logs to recover, and ranges from a few minutes to a few hours.

With read replicas, transaction logs from the primary are streamed asynchronously to the replica. If the primary database becomes unavailable because of a zone-level or region-level fault, failing over to the replica provides a shorter RTO and less data loss than restoring from a backup.

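Feature       | Cost (pricing tier)                          | RTO                                                              | RPO
--------------+----------------------------------------------+------------------------------------------------------------------+--------
Geo-restore   | Only on General-purpose/memory-optimized SKU | Varies                                                           | < 1 h
Read replicas | Available on Basic                           | Minutes, but depends on latency, size of data and write workload | < 5 min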

 

Terraform to apply:

Option 1:

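resource "azurerm_mysql_flexible_server" "default" {
  # ... (other arguments omitted)

  create_mode                  = "GeoRestore"
  geo_redundant_backup_enabled = true
  source_server_id             = "other_server" # placeholder for the resource ID of the source server

  # ... (other arguments omitted)
}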

 

Changing the backup attribute from the default of locally redundant to geo-redundant via Terraform, so that there is protection against region-level failures, destroys the existing instance and creates it again.
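Because that change forces a replacement, it helps to set the attribute when the server is first created. The following is a minimal sketch rather than a definitive configuration: the resource labels, server names, resource group, regions, SKU, login and the admin_password variable are all assumptions chosen for illustration. It shows a source server with geo-redundant backups, a lifecycle guard that makes Terraform refuse any plan that would destroy that server, and a second server geo-restored into the paired region.

```hcl
# Hypothetical variable; supply the value through your own secret handling.
variable "admin_password" {
  type      = string
  sensitive = true
}

# Source server: geo-redundant backups are enabled at creation time,
# because changing this attribute later forces a destroy and recreate.
resource "azurerm_mysql_flexible_server" "source" {
  name                         = "example-mysql-source" # assumed name
  resource_group_name          = "example-rg"           # assumed resource group
  location                     = "East US"
  administrator_login          = "mysqladmin"           # assumed login
  administrator_password       = var.admin_password
  sku_name                     = "GP_Standard_D2ds_v4"  # assumed General Purpose SKU
  backup_retention_days        = 7
  geo_redundant_backup_enabled = true

  lifecycle {
    # Fail the plan instead of destroying the server by accident.
    prevent_destroy = true
  }
}

# Restored server: created from the geo-replicated backup in the paired region.
resource "azurerm_mysql_flexible_server" "restored" {
  name                = "example-mysql-restored" # assumed name
  resource_group_name = "example-rg"
  location            = "West US"                # paired with East US
  create_mode         = "GeoRestore"
  source_server_id    = azurerm_mysql_flexible_server.source.id
}
```

With prevent_destroy set, an edit that would recreate the source server makes terraform plan stop with an error, which gives you a chance to take a backup or schedule the change deliberately.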

Option 2:

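resource "azurerm_mysql_flexible_server" "example" {
  # ... (other arguments omitted)

  create_mode      = "Replica"
  source_server_id = "other_server"   # placeholder for the resource ID of the primary server
  sku_name         = "B_Standard_B1s" # verify that the chosen SKU supports read replicas
}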

 

·        If possible, run a test drill for your changes.

Recovery plan: Applications do not see the failure of a database or storage because the configured MySQL server recovers automatically, but user action is required after a region failure or a user error. A region failure is a rare event and requires promoting a read replica to master: replication to the replica is stopped, and the replica is then promoted.
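The promotion step can also be expressed in Terraform. The sketch below assumes the azurerm provider's replication_role argument, whose only documented value is "None" and which can be updated on an existing replica but not set at creation. The resource label, server name, resource group, region and the primary_server_id variable are hypothetical.

```hcl
# Hypothetical input: resource ID of the primary server the replica follows.
variable "primary_server_id" {
  type = string
}

resource "azurerm_mysql_flexible_server" "replica" {
  name                = "example-mysql-replica" # assumed name
  resource_group_name = "example-rg"            # assumed resource group
  location            = "West US"               # a region other than the primary's
  create_mode         = "Replica"
  source_server_id    = var.primary_server_id

  # Leave this line commented out while the server should keep replicating.
  # Uncomment it and apply to stop replication and promote the replica
  # to a standalone, writable server. It cannot be set at creation time.
  # replication_role = "None"
}
```

Stopping replication is a one-way operation, so applications need to be repointed to the promoted server, and a new replica has to be created if you want to restore the original topology.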

This pattern holds true for a Cassandra cluster as well, where we can specify hours_between_backups (it defaults to 24 hours) and the cluster takes continuous backups. Paired region support is also available for Kubernetes clusters and persistent volumes.
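For the Cassandra case, a hedged sketch using the azurerm_cosmosdb_cassandra_cluster resource follows. The cluster name, resource group, region, backup interval and both variables are assumptions for illustration, and the delegated subnet is expected to exist already.

```hcl
# Hypothetical inputs; neither name comes from the original pattern.
variable "cassandra_admin_password" {
  type      = string
  sensitive = true
}

variable "management_subnet_id" {
  type = string
}

resource "azurerm_cosmosdb_cassandra_cluster" "example" {
  name                           = "example-cassandra" # assumed name
  resource_group_name            = "example-rg"        # assumed resource group
  location                       = "East US"
  delegated_management_subnet_id = var.management_subnet_id
  default_admin_password         = var.cassandra_admin_password

  # Back up every 12 hours instead of the default of 24.
  hours_between_backups = 12
}
```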

Note that the databases are typically backed up automatically every day, so we only need to choose between geo-restoring from a backup and linking a replica to the original server. This works for both a single server instance and a high-availability flexible server instance.
