Wednesday, May 31, 2023

Pattern #2: Back up your Virtual Machines

 

  • Overview/Purpose:

This pattern describes how to provide business continuity and disaster recovery for your virtual machines, so that their state can be recovered after a user or application error, a regional data center outage, or another unplanned disruption.

 

  • Solution Design

Use Azure Site Recovery as follows:

1.     Continuously replicate the VMs to a different target region.

2.     Set up a replication policy with:

a.      Recovery point retention policy set to 1 day.

b.     App-consistent snapshot frequency set to 0 hours.

3.     Use a Recovery Services vault to store the snapshots.

4.     The replication settings must specify:

a.      Target location.

b.     Target subscription, which can be the same as the source subscription.

c.      Target resource group.

d.     Failover virtual network.

e.      Failover subnet.

f.       New replica managed disk at the destination.

g.     Cache storage at the source, which is a Standard storage account.

h.     Availability options for each VM.

i.       Capacity reservation.

5.     After replication is enabled for the VMs, check their health status under Replicated items.

Do not set up VM replication for Databricks VMs or other commodity compute that does not persist processing state.
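
The replication settings in step 4 can be expressed in Terraform roughly as follows. This is a minimal sketch, not a definitive configuration: the resource labels "cache", "secondary" and "mapping", the storage account name, and the address range are illustrative assumptions, and the resource groups, vault, fabrics, protection containers, policy, VM and network interface are assumed to be defined elsewhere under the names used in the Terraform section of this pattern.

```hcl
# Sketch only: labels, names, and address ranges here are assumptions.

# Cache (staging) storage at the source: a Standard storage account.
resource "azurerm_storage_account" "cache" {
  name                     = "primaryrecoverycache" # placeholder; must be globally unique
  location                 = azurerm_resource_group.primary.location
  resource_group_name      = azurerm_resource_group.primary.name
  account_tier             = "Standard"
  account_replication_type = "LRS"
}

# Failover virtual network and subnet in the target region.
resource "azurerm_virtual_network" "secondary" {
  name                = "failover-network"
  resource_group_name = azurerm_resource_group.secondary.name
  address_space       = ["192.168.2.0/24"]
  location            = azurerm_resource_group.secondary.location
}

resource "azurerm_subnet" "secondary" {
  name                 = "failover-subnet"
  resource_group_name  = azurerm_resource_group.secondary.name
  virtual_network_name = azurerm_virtual_network.secondary.name
  address_prefixes     = ["192.168.2.0/24"]
}

# Links the source container to the target container under the replication policy.
resource "azurerm_site_recovery_protection_container_mapping" "mapping" {
  name                                      = "container-mapping"
  resource_group_name                       = azurerm_resource_group.secondary.name
  recovery_vault_name                       = azurerm_recovery_services_vault.vault.name
  recovery_fabric_name                      = azurerm_site_recovery_fabric.primary.name
  recovery_source_protection_container_name = azurerm_site_recovery_protection_container.primary.name
  recovery_target_protection_container_id   = azurerm_site_recovery_protection_container.secondary.id
  recovery_replication_policy_id            = azurerm_site_recovery_replication_policy.policy.id
}

# Enables replication for the VM with the settings from step 4.
resource "azurerm_site_recovery_replicated_vm" "vm" {
  name                                      = "vm-replication"
  resource_group_name                       = azurerm_resource_group.secondary.name
  recovery_vault_name                       = azurerm_recovery_services_vault.vault.name
  source_recovery_fabric_name               = azurerm_site_recovery_fabric.primary.name
  source_recovery_protection_container_name = azurerm_site_recovery_protection_container.primary.name
  source_vm_id                              = azurerm_virtual_machine.vm.id
  recovery_replication_policy_id            = azurerm_site_recovery_replication_policy.policy.id

  # Target location, resource group, and (implicitly) target subscription.
  target_recovery_fabric_id               = azurerm_site_recovery_fabric.secondary.id
  target_recovery_protection_container_id = azurerm_site_recovery_protection_container.secondary.id
  target_resource_group_id                = azurerm_resource_group.secondary.id
  target_network_id                       = azurerm_virtual_network.secondary.id

  # New replica managed disk at the destination, staged through the cache account.
  managed_disk {
    disk_id                    = azurerm_virtual_machine.vm.storage_os_disk[0].managed_disk_id
    staging_storage_account_id = azurerm_storage_account.cache.id
    target_resource_group_id   = azurerm_resource_group.secondary.id
    target_disk_type           = "Premium_LRS"
    target_replica_disk_type   = "Premium_LRS"
  }

  # Failover subnet for the VM's network interface.
  network_interface {
    source_network_interface_id = azurerm_network_interface.vm.id
    target_subnet_name          = azurerm_subnet.secondary.name
  }

  depends_on = [azurerm_site_recovery_protection_container_mapping.mapping]
}
```

Availability options and capacity reservation are left out of this sketch; check the azurerm provider documentation for the optional arguments your provider version supports before adding them.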

 

Fail over to the secondary region during an outage, and fail back to the primary region after the outage ends.

 

With Azure Site Recovery:

 

| Feature | Cost | RTO | RPO |
| --- | --- | --- | --- |
| Fail over and fail back | Free for 31 days. Incurs charges for Azure storage, storage transactions, and data transfers. A recovered VM might also incur compute charges. | Varies from a few minutes up to 2 hours | Recovery points can be as frequent as every hour. |

 

Azure Backup is complementary to Site Recovery. It provides granular backups and restores of specific data, while Site Recovery protects an entire site, with automation and orchestration that make the failover and failback process seamless.
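
If you also want Azure Backup for the same VM, the configuration might look like the sketch below. It is illustrative rather than prescriptive: the resource labels "backup" and "daily", the vault and policy names, the schedule, and the 7-day retention are all assumptions, not values from this pattern. It uses a separate Recovery Services vault in the primary region because Azure Backup expects the vault to be in the same region as the VM it protects, whereas the Site Recovery vault in this pattern sits in the secondary region.

```hcl
# Sketch only: labels, names, schedule, and retention are assumptions.

# Vault for Azure Backup, in the same region as the source VM.
resource "azurerm_recovery_services_vault" "backup" {
  name                = "example-primary-recovery-vault"
  location            = azurerm_resource_group.primary.location
  resource_group_name = azurerm_resource_group.primary.name
  sku                 = "Standard"
}

# Daily backup at 23:00, keeping 7 daily recovery points.
resource "azurerm_backup_policy_vm" "daily" {
  name                = "daily-vm-backup"
  resource_group_name = azurerm_resource_group.primary.name
  recovery_vault_name = azurerm_recovery_services_vault.backup.name

  backup {
    frequency = "Daily"
    time      = "23:00"
  }

  retention_daily {
    count = 7
  }
}

# Attaches the VM to the backup policy.
resource "azurerm_backup_protected_vm" "vm" {
  resource_group_name = azurerm_resource_group.primary.name
  recovery_vault_name = azurerm_recovery_services_vault.backup.name
  source_vm_id        = azurerm_virtual_machine.vm.id
  backup_policy_id    = azurerm_backup_policy_vm.daily.id
}
```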

 

If possible, run a test drill for your changes.

 

Terraform to apply:
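
The definitions in this section reference azurerm_resource_group.primary, azurerm_resource_group.secondary and azurerm_network_interface.vm, which are not shown. A minimal sketch of those prerequisites follows, assuming West US as the primary region and East US as the secondary; the regions, resource names, and address range are placeholders to adapt, not values prescribed by this pattern.

```hcl
# Sketch only: regions, names, and address ranges are assumptions.

provider "azurerm" {
  features {}
}

# Source (primary) and target (secondary) resource groups in different regions.
resource "azurerm_resource_group" "primary" {
  name     = "example-vm-primary"
  location = "West US"
}

resource "azurerm_resource_group" "secondary" {
  name     = "example-vm-secondary"
  location = "East US"
}

# Network for the source VM.
resource "azurerm_virtual_network" "primary" {
  name                = "primary-network"
  resource_group_name = azurerm_resource_group.primary.name
  address_space       = ["192.168.1.0/24"]
  location            = azurerm_resource_group.primary.location
}

resource "azurerm_subnet" "primary" {
  name                 = "primary-subnet"
  resource_group_name  = azurerm_resource_group.primary.name
  virtual_network_name = azurerm_virtual_network.primary.name
  address_prefixes     = ["192.168.1.0/24"]
}

# Network interface referenced by the VM definition.
resource "azurerm_network_interface" "vm" {
  name                = "vm-nic"
  location            = azurerm_resource_group.primary.location
  resource_group_name = azurerm_resource_group.primary.name

  ip_configuration {
    name                          = "vm"
    subnet_id                     = azurerm_subnet.primary.id
    private_ip_address_allocation = "Dynamic"
  }
}
```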

 

Definitions:

resource "azurerm_virtual_machine" "vm" {
  name                  = "vm"
  location              = azurerm_resource_group.primary.location
  resource_group_name   = azurerm_resource_group.primary.name
  vm_size               = "Standard_B1s"
  network_interface_ids = [azurerm_network_interface.vm.id]
 
  storage_image_reference {
    publisher = "OpenLogic"
    offer     = "CentOS"
    sku       = "7.5"
    version   = "latest"
  }
 
  storage_os_disk {
    name              = "vm-os-disk"
    os_type           = "Linux"
    caching           = "ReadWrite"
    create_option     = "FromImage"
    managed_disk_type = "Premium_LRS"
  }
 
  os_profile {
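    # Placeholder credentials for illustration only; supply real values
    # through variables or a secret store rather than hardcoding them.
    admin_username = "test-admin-123"
    admin_password = "test-pwd-123"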
    computer_name  = "vm"
  }
 
  os_profile_linux_config {
    disable_password_authentication = false
  }
}
 
resource "azurerm_recovery_services_vault" "vault" {
  name                = "example-recovery-vault"
  location            = azurerm_resource_group.secondary.location
  resource_group_name = azurerm_resource_group.secondary.name
  sku                 = "Standard"
}
 
resource "azurerm_site_recovery_fabric" "primary" {
  name                = "primary-fabric"
  resource_group_name = azurerm_resource_group.secondary.name
  recovery_vault_name = azurerm_recovery_services_vault.vault.name
  location            = azurerm_resource_group.primary.location
}
 
resource "azurerm_site_recovery_fabric" "secondary" {
  name                = "secondary-fabric"
  resource_group_name = azurerm_resource_group.secondary.name
  recovery_vault_name = azurerm_recovery_services_vault.vault.name
  location            = azurerm_resource_group.secondary.location
}
 
resource "azurerm_site_recovery_protection_container" "primary" {
  name                 = "primary-protection-container"
  resource_group_name  = azurerm_resource_group.secondary.name
  recovery_vault_name  = azurerm_recovery_services_vault.vault.name
  recovery_fabric_name = azurerm_site_recovery_fabric.primary.name
}
 
resource "azurerm_site_recovery_protection_container" "secondary" {
  name                 = "secondary-protection-container"
  resource_group_name  = azurerm_resource_group.secondary.name
  recovery_vault_name  = azurerm_recovery_services_vault.vault.name
  recovery_fabric_name = azurerm_site_recovery_fabric.secondary.name
}
 
resource "azurerm_site_recovery_replication_policy" "policy" {
  name                                                 = "policy"
  resource_group_name                                  = azurerm_resource_group.secondary.name
  recovery_vault_name                                  = azurerm_recovery_services_vault.vault.name
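  # 1 day of recovery point retention, as in the Solution Design.
  recovery_point_retention_in_minutes                  = 24 * 60
  # App-consistent snapshot every 4 hours; the Solution Design specifies 0 hours, so adjust this value to match.
  application_consistent_snapshot_frequency_in_minutes = 4 * 60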
}