Saturday, June 24, 2023

 

#Requires -Version 2.0

<#

Synopsis: The following PowerShell script serves as a partial example

of backing up and restoring an AKS cluster.

The concept behind this form of BCDR solution is described here:

https://learn.microsoft.com/en-us/azure/backup/azure-kubernetes-service-cluster-backup-concept

#>

param (

    [Parameter(Mandatory=$true)][string]$resourceGroupName,

    [Parameter(Mandatory=$true)][string]$accountName,

    [Parameter(Mandatory=$true)][string]$subscriptionId,

    [Parameter(Mandatory=$true)][string]$aksClusterName,

    [Parameter(Mandatory=$true)][string]$aksClusterRG,

    [string]$backupVaultRG = "testBkpVaultRG",

    [string]$backupVaultName = "TestBkpVault",

    [string]$location = "westus",

    [string]$containerName = "backupc",

    [string]$storageAccountName = "sabackup",

    [string]$storageAccountRG = "rgbackup",

    [string]$environment = "AzureCloud"

)

 

Connect-AzAccount -Environment "$environment"

Set-AzContext -SubscriptionId "$subscriptionId"

$storageSetting = New-AzDataProtectionBackupVaultStorageSettingObject -Type LocallyRedundant -DataStoreType OperationalStore

New-AzDataProtectionBackupVault -ResourceGroupName $backupVaultRG -VaultName $backupVaultName -Location $location -StorageSetting $storageSetting

$TestBkpVault = Get-AzDataProtectionBackupVault -VaultName $backupVaultName

$policyDefn = Get-AzDataProtectionPolicyTemplate -DatasourceType AzureKubernetesService

$policyDefn.PolicyRule[0].Trigger | fl

# Sample output:
# ObjectType: ScheduleBasedTriggerContext
# ScheduleRepeatingTimeInterval: {R/2023-04-05T13:00:00+00:00/PT4H}
# TaggingCriterion: {Default}
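The ScheduleRepeatingTimeInterval value is an ISO 8601 repeating-interval string: a repeat designator, an anchor time, and a period. As an illustrative sketch (not part of the backup script), the value can be decomposed like this in Python:

```python
from datetime import datetime, timedelta

def parse_repeating_interval(value: str):
    """Split an ISO 8601 repeating interval such as
    'R/2023-04-05T13:00:00+00:00/PT4H' into its three parts.
    Only hour periods (PTnH) are handled in this sketch."""
    repeat, anchor, period = value.split("/")
    start = datetime.fromisoformat(anchor)
    hours = int(period.removeprefix("PT").removesuffix("H"))
    return repeat, start, timedelta(hours=hours)

repeat, start, every = parse_repeating_interval("R/2023-04-05T13:00:00+00:00/PT4H")
print(repeat, start.isoformat(), every)  # the policy triggers a backup every 4 hours
```

This makes it easy to see that the default template schedules a backup every four hours starting from the anchor time.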

 

$policyDefn.PolicyRule[1].Lifecycle | fl

# Sample output:
# DeleteAfterDuration: P7D
# DeleteAfterObjectType: AbsoluteDeleteOption
# SourceDataStoreObjectType: DataStoreInfoBase
# SourceDataStoreType: OperationalStore
# TargetDataStoreCopySetting:

 

New-AzDataProtectionBackupPolicy -ResourceGroupName $backupVaultRG -VaultName $TestBkpVault.Name -Name aksBkpPolicy -Policy $policyDefn

 

$aksBkpPol = Get-AzDataProtectionBackupPolicy -ResourceGroupName $backupVaultRG -VaultName $TestBkpVault.Name -Name "aksBkpPolicy"

 

Write-Host "Installing the backup extension with the Azure CLI"

az k8s-extension create --name azure-aks-backup --extension-type microsoft.dataprotection.kubernetes --scope cluster --cluster-type managedClusters --cluster-name $aksClusterName --resource-group $aksClusterRG --release-train stable --configuration-settings blobContainer=$containerName storageAccount=$storageAccountName storageAccountResourceGroup=$storageAccountRG storageAccountSubscriptionId=$subscriptionId

 

az k8s-extension show --name azure-aks-backup --cluster-type managedClusters --cluster-name $aksClusterName --resource-group $aksClusterRG

 

az k8s-extension update --name azure-aks-backup --cluster-type managedClusters --cluster-name $aksClusterName --resource-group $aksClusterRG --release-train stable --configuration-settings blobContainer=$containerName storageAccount=$storageAccountName storageAccountResourceGroup=$storageAccountRG storageAccountSubscriptionId=$subscriptionId # [cpuLimit=1] [memoryLimit=1Gi]

 

az role assignment create --assignee-object-id $(az k8s-extension show --name azure-aks-backup --cluster-name $aksClusterName --resource-group $aksClusterRG --cluster-type managedClusters --query identity.principalId --output tsv) --role 'Storage Account Contributor' --scope /subscriptions/$subscriptionId/resourceGroups/$storageAccountRG/providers/Microsoft.Storage/storageAccounts/$storageAccountName

 

az aks trustedaccess rolebinding create `
    -g $aksClusterRG `
    --cluster-name $aksClusterName `
    -n randomRoleBindingName `
    --source-resource-id $TestBkpVault.Id `
    --roles Microsoft.DataProtection/backupVaults/backup-operator

 

Write-Host "This section is a detailed overview of TrustedAccess"

az extension add --name aks-preview

az extension update --name aks-preview

az feature register --namespace "Microsoft.ContainerService" --name "TrustedAccessPreview"

az feature show --namespace "Microsoft.ContainerService" --name "TrustedAccessPreview"

az provider register --namespace Microsoft.ContainerService

# Create a Trusted Access RoleBinding in an AKS cluster

 

az aks trustedaccess rolebinding create --resource-group $aksClusterRG --cluster-name $aksClusterName -n randomRoleBindingName -s $connectedServiceResourceId --roles backup-operator,backup-contributor #,Microsoft.Compute/virtualMachineScaleSets/test-node-reader,Microsoft.Compute/virtualMachineScaleSets/test-admin

 

 

Write-Host "Update an existing Trusted Access Role Binding with new roles"

# Update RoleBinding command

 

az aks trustedaccess rolebinding update --resource-group $aksClusterRG --cluster-name $aksClusterName -n randomRoleBindingName  --roles backup-operator,backup-contributor

 

 

Write-Host "Configure Backup"

$sourceClusterId = "/subscriptions/$subscriptionId/resourcegroups/$aksClusterRG/providers/Microsoft.ContainerService/managedClusters/$aksClusterName"

 

Write-Host "Snapshot resource group"

$snapshotRG = "/subscriptions/$subscriptionId/resourcegroups/snapshotrg"

 

Write-Host "The configuration of backup is performed in two steps"

$backupConfig = New-AzDataProtectionBackupConfigurationClientObject -SnapshotVolume $true -IncludeClusterScopeResource $true -DatasourceType AzureKubernetesService -LabelSelector "env=$environment"

$backupInstance = Initialize-AzDataProtectionBackupInstance -DatasourceType AzureKubernetesService -DatasourceLocation $location -PolicyId $aksBkpPol.Id -DatasourceId $sourceClusterId -SnapshotResourceGroupId $snapshotRG -FriendlyName "Backup of AKS Cluster $aksClusterName" -BackupConfiguration $backupConfig

 

Write-Host "Assign required permissions and validate"

$aksCluster = $(Get-AzAksCluster -Id $sourceClusterId)

Set-AzDataProtectionMSIPermission -BackupInstance $backupInstance -VaultResourceGroup $backupVaultRG -VaultName $backupVaultName -PermissionsScope "ResourceGroup"

Test-AzDataProtectionBackupInstanceReadiness -ResourceGroupName $backupVaultRG -VaultName $backupVaultName -BackupInstance $backupInstance.Property

 

Write-Host "Protect the AKS cluster"

New-AzDataProtectionBackupInstance -ResourceGroupName $backupVaultRG -VaultName $TestBkpVault.Name -BackupInstance $backupInstance

 

Write-Host "Run on-demand backup"

$instance = Get-AzDataProtectionBackupInstance -SubscriptionId $subscriptionId -ResourceGroupName $backupVaultRG -VaultName $TestBkpVault.Name -Name $aksClusterName

 

Write-Host "Specify Retention Rule"

$policyDefn.PolicyRule | fl

# Sample output:
# BackupParameter: Microsoft.Azure.PowerShell.Cmdlets.DataProtection.Models.Api20210201Preview.AzureBackupParams
# BackupParameterObjectType: AzureBackupParams
# DataStoreObjectType: DataStoreInfoBase
# DataStoreType: OperationalStore
# Name: BackupHourly
# ObjectType: AzureBackupRule
#
# Trigger: Microsoft.Azure.PowerShell.Cmdlets.DataProtection.Models.Api20210201Preview.ScheduleBasedTriggerContext
# TriggerObjectType: ScheduleBasedTriggerContext
#
# IsDefault: True
# Lifecycle: {Microsoft.Azure.PowerShell.Cmdlets.DataProtection.Models.Api20210201Preview.SourceLifeCycle}
# Name: Default
# ObjectType: AzureRetentionRule

 

Write-Host "Trigger on-demand backup"

$AllInstances = Get-AzDataProtectionBackupInstance -ResourceGroupName $backupVaultRG -VaultName $TestBkpVault.Name

 

Backup-AzDataProtectionBackupInstanceAdhoc -BackupInstanceName $AllInstances[0].Name -ResourceGroupName $backupVaultRG -VaultName $TestBkpVault.Name -BackupRuleOptionRuleName "Default"

 

Write-Host "Tracking all the backup jobs"

$job = Search-AzDataProtectionJobInAzGraph -Subscription $subscriptionId -ResourceGroupName $backupVaultRG -Vault $TestBkpVault.Name -DatasourceType AzureKubernetesService -Operation OnDemandBackup

 

Friday, June 23, 2023

 

How to address IaC shortcomings – Part 6b? 

A previous article discussed a resolution to IaC shortcomings for declaring resources with configuration not yet supported by an IaC repository. This article discusses irreversible changes and the manual intervention required for certain IaC deployments.

IaC is an agreement between the IaC provider and the resource provider. An attribute of a resource can only be applied when the IaC-provider applies it in the way the resource provider expects and the resource-provider provisions in the way that the IaC provider expects. In many cases, this is honored but some attributes can get out of sync resulting in unsuccessful deployments of what might seem to be correct declarations. 

One of the limitations arises when one resource is created as part of the configuration of another resource and an association is formed between the two. It should be possible to reverse the rollout by disassociating the resources before deleting the one that was created. However, sometimes the association cannot be broken by IaC or by actions on the management portal, and another management surface, such as an Azure CLI command, must be used. In such cases, manual intervention, or logic introduced into the pipeline, is required to break the impasse. Only by applying the mitigation and running the IaC twice, first to detect the conflict on the existing resources and then to reset the configuration, will subsequent deployments start succeeding.
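The mitigation-and-rerun sequence can be sketched generically. In this hedged sketch, `apply` and `mitigate` are hypothetical stand-ins for the IaC apply step and the out-of-band CLI mitigation, not real provider APIs:

```python
def apply_with_mitigation(apply, mitigate, max_attempts=2):
    """Run an IaC apply; if it reports a conflict, run the
    out-of-band mitigation (e.g., a CLI call that breaks the
    association) and apply again. Returns True on success."""
    for _ in range(max_attempts):
        ok, conflict = apply()
        if ok:
            return True
        if conflict:
            mitigate()  # the manual-intervention step, automated in the pipeline
    return False

# Simulated run: the first apply hits the association conflict,
# the mitigation clears it, and the second apply succeeds.
state = {"associated": True}

def apply():
    if state["associated"]:
        return False, True   # failed, conflict detected
    return True, False       # succeeded

def mitigate():
    state["associated"] = False

print(apply_with_mitigation(apply, mitigate))  # True
```

The point of the sketch is only the control flow: detect the conflict on the first pass, mitigate, and let the second pass converge.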

The destroy of an existing resource and the creation of a new one are also required to keep the state in sync with the IaC. If a resource is missing from the state, it might be interpreted as a resource that was never in the IaC to begin with and require a destroy before the IaC-recognized creation occurs.

It is possible to make use of the best of both worlds with a folder structure that separates the Terraform templates into a folder called ‘module’ and the resource provider templates in another folder at the same level and named something like ‘subscription-deployments’ which includes native blueprints and templates. The GitHub workflow definitions will leverage proper handling of either location or trigger the workflow on any changes to either of these locations.  

The native support for extensibility depends on naming and logic.  



 

 

 

 


Wednesday, June 21, 2023

Large files and Azure Databricks users


When data analytics users want to analyze large amounts of data, they choose an Azure Databricks workspace and browse the data from remote, virtually unlimited storage. Usually this is an Azure storage account or a data lake, and users find it convenient to pass through their Azure Active Directory credentials and address files by location, as:

- wasbs://container@storageaccount.blob.core.windows.net/prefix/to/blob for blobs, and

- abfss://container@storageaccount.dfs.core.windows.net/path/to/file for files

This is sufficient to read with Spark as:

df = spark.read.format("binaryFile").option("pathGlobFilter", "*.p").load(file_location)

and to write, but Spark's configuration limits messages to a 2 GB maximum, so unless the file is partitioned, large files cause trouble when the same statement fails.
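A quick way to reason about the limit: if no message can exceed 2 GB, a file must be split into at least ceil(size / limit) parts before it can move in one pass. A small illustrative calculation in plain Python (not Spark):

```python
import math

SPARK_MESSAGE_LIMIT = 2 * 1024**3  # the 2 GB ceiling noted above, in bytes

def min_partitions(file_size_bytes: int, limit: int = SPARK_MESSAGE_LIMIT) -> int:
    """Smallest number of partitions that keeps every part under the limit."""
    return max(1, math.ceil(file_size_bytes / limit))

print(min_partitions(10 * 1024**3))  # a 10 GiB file needs at least 5 parts
```

Anything below the limit needs no splitting at all, which is why small files read fine with the same statement.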

In such cases, users must make use of a few options if they want to continue using their AAD credentials.

First, they can use a Shared access signature URL that they create with user delegation to read the file. For example,

import os
import requests

# assumes filename and sasUrl are already defined in the notebook
CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB download buffer

if not os.path.isfile("/dbfs/" + filename):
    print("downloading file...")
    with requests.get(sasUrl, stream=True) as resp:
        if resp.ok:
            with open("/dbfs/" + filename, "wb") as f:
                for chunk in resp.iter_content(chunk_size=CHUNK_SIZE):
                    f.write(chunk)
print("file found...")

but it is somewhat hard to write interactively to a SAS URL, because the service appears to reject the request headers that the documentation prescribes for writing with a SAS token. So the writes need to happen locally, and the file is then uploaded:

from azure.storage.blob import ContainerClient

# https://learn.microsoft.com/en-us/python/api/azure-storage-blob/azure.storage.blob.containerclient?view=azure-python

sas_url = "https://account.blob.core.windows.net/mycontainer?sv=2015-04-05&st=2015-04-29T22%3A18%3A26Z&se=2015-04-30T02%3A23%3A26Z&sr=b&sp=rw&sip=168.1.5.60-168.1.5.70&spr=https&sig=Z%2FRHIX5Xcg0Mq2rqI3OlWTjEg2tYkboXr1P9ZUXDtkk%3D"

container_client = ContainerClient.from_container_url(sas_url)

with open(SOURCE_FILE, "rb") as data:
    blob_client = container_client.upload_blob(name="myblob", data=data)

 

Second, they can use the equivalent of mounting the remote storage as a local filesystem, but in this case they must leverage a Databricks access connector, and an admin must grant the Databricks user permission to use it in their workspace. Users can then continue to use their AD credentials across the Databricks workspace, the access connector, and the external storage.

Third-party options like smart_open also support this, for example:

!pip install smart_open[all]
!pip install azure-storage-blob

import azure.storage.blob
from smart_open import open

# stream from Azure Blob Storage
connect_str = "BlobEndpoint=https://storageaccount.blob.core.windows.net;SharedAccessSignature=<sasToken>"
transport_params = {
    'client': azure.storage.blob.BlobServiceClient.from_connection_string(connect_str),
}

filename = 'azure://container/path/to/file'

# stream content *into* Azure Blob Storage (write mode):

with open(filename, 'wb', transport_params=transport_params) as fout:

    fout.write(b'contents written here')

 

for line in open(filename, transport_params=transport_params):

    print(line)

but this fails with an error that the write using the SAS URL is not permitted, even though the SAS URL was generated with write permission.
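Before blaming the client library, one quick sanity check is to inspect the `sp` (signed permissions) field of the SAS token itself; if it lacks `w`, no client can write with it. A sketch using only the standard library:

```python
from urllib.parse import urlparse, parse_qs

def sas_permissions(sas_url: str) -> str:
    """Return the signed-permissions string ('sp') of a SAS URL,
    e.g. 'rw' for read+write, or '' if the field is absent."""
    query = parse_qs(urlparse(sas_url).query)
    return query.get("sp", [""])[0]

# hypothetical token for illustration (signature elided)
url = ("https://account.blob.core.windows.net/mycontainer"
       "?sv=2015-04-05&sp=rw&sig=...")
perms = sas_permissions(url)
print("write allowed" if "w" in perms else "read-only token")
```

When `sp` does include `w` and the write still fails, the problem lies elsewhere, for example in the request headers or in container- versus blob-scoped signatures.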

 

Therefore, leveraging the Spark framework as much as possible, and working with SAS URLs for the upload and download of large files, is preferable. Fortunately, the transfer of a roughly 10 GB file takes only a couple of minutes.

Tuesday, June 20, 2023

How to resolve IaC shortcomings? Part 5

A previous article discussed a resolution to IaC shortcomings for declaring resources with configuration not yet supported by an IaC repository. This article continues that discussion with native support for extensibility with Terraform and discusses the order and repetition involved in IaC deployments. 

IaC is an agreement between the IaC provider and the resource provider. An attribute of a resource can only be applied when the IaC-provider applies it in the way the resource provider expects and the resource-provider provisions in the way that the IaC provider expects. In many cases, this is honored but some attributes can get out of sync resulting in unsuccessful deployments of what might seem to be correct declarations. 

For instance, some attributes of a resource can be specified via the IaC provider but go completely ignored by the resource provider. If two attributes can be specified, the resource provider reserves the right to prioritize one over the other. Even when a resource attribute is correctly specified, the resource provider could mandate the destroy of the existing resource and the creation of a new one. A more common case is one where the IaC wants to add a new property for all resources of a specific resource type, but there are already existing resources that do not have that property initialized. In such a case, applying the IaC change will fail for the existing instances but succeed for the new ones. Only by running the IaC twice, first to detect and initialize the missing property on the existing resources and then to correctly report the new property, will subsequent deployments start succeeding.

The destroy of an existing resource and the creation of a new one are also required to keep the state in sync with the IaC. If a resource is missing from the state, it might be interpreted as a resource that was never in the IaC to begin with and require a destroy before the IaC-recognized creation occurs.

It is possible to make use of the best of both worlds with a folder structure that separates the Terraform templates into a folder called ‘module’ and the resource provider templates in another folder at the same level and named something like ‘subscription-deployments’ which includes native blueprints and templates. The GitHub workflow definitions will leverage proper handling of either location or trigger the workflow on any changes to either of these locations.  

The native support for extensibility depends on naming and logic.  

Naming is facilitated with canned prefixes/suffixes and dynamic random string to make each rollout independent of the previous. Some examples include: 

resource "random_string" "unique" { 

  count   = var.enable_static_website && var.enable_cdn_profile ? 1 : 0 

  length  = 8 

  special = false 

  upper   = false 

} 
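The effect of the random_string resource above, an 8-character lowercase suffix with no specials, can be mimicked outside Terraform when a script needs to mint a matching name. The helper below is a hypothetical sketch, not part of any provider API:

```python
import random
import string

def unique_suffix(length: int = 8) -> str:
    """Lowercase alphanumeric suffix mirroring the random_string
    settings above: special = false, upper = false, length = 8."""
    alphabet = string.ascii_lowercase + string.digits
    return "".join(random.choice(alphabet) for _ in range(length))

# e.g. a storage-account-style name made independent of previous rollouts
name = f"sabackup{unique_suffix()}"
print(name)
```

Appending such a suffix is what makes each rollout independent of the previous one, since no two deployments contend for the same resource name.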

  

Logic can be written in PowerShell, the de facto automation language for the Azure public cloud. A pseudo-resource can then be added to invoke that logic as follows:

resource "null_resource" "add_custom_domain" { 

  count = var.custom_domain_name != null ? 1 : 0 

  triggers = { always_run = timestamp() } 

  depends_on = [ 

    azurerm_app_service.web-app 

  ] 

  

  provisioner "local-exec" { 

    command = "pwsh ${path.module}/Setup-AzCdnCustomDomain.ps1" 

    environment = { 

      CUSTOM_DOMAIN      = var.custom_domain_name 

      RG_NAME            = var.resource_group_name 

      FRIENDLY_NAME      = var.friendly_name 

      STATIC_CDN_PROFILE = var.cdn_profile_name 

    } 

  } 

} 

  

PowerShell scripts can help with both the deployment and the pipeline automations. There are a few caveats with scripts: the general preference is for declarative and idempotent IaC rather than scripts, so extensibility must be given the same due consideration as customization.

All scripts can be stored in folders with names ending with ‘scripts’. 
These are sufficient to address the above-mentioned shortcomings in the Infrastructure-as-Code.