Monday, September 2, 2024

With the surge of data science and analytics projects, many data scientists are required to build a chatbot application for their data. This article covers some of the ways to do that. We assume that these data scientists use a workspace to bring their compute and data together. Let us say that this is a Databricks workspace, that the data is available via the catalog and Delta Lake, and that the compute cluster has been provisioned as dedicated to this effort. The example/tutorial we refer to is published in the official Databricks documentation; we compare it, in terms of ease of use, with exporting the user interface to an app service.

Part 1.

The Databricks example separates the model and the user interface as follows:

Step 1. Set up the environment:

%pip install transformers sentence-transformers faiss-cpu

Step 2. Load the data into a Delta table:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("Chatbot").getOrCreate()

# Load your data

data = [

    {"id": 1, "text": "What is Databricks?"},

    {"id": 2, "text": "How to create a Delta table?"}

]

df = spark.createDataFrame(data)

df.write.format("delta").save("/mnt/delta/chatbot_data")
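
To verify the write, the table can be read back from the same path (a quick check, assuming the mount path above):

df_check = spark.read.format("delta").load("/mnt/delta/chatbot_data")
df_check.show()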


Step 3.  Generate embeddings using a pre-trained model:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')

texts = [row['text'] for row in data]

embeddings = model.encode(texts)

# Save embeddings

import numpy as np

np.save("/dbfs/mnt/delta/embeddings.npy", embeddings)


Step 4. Use FAISS to perform vector search over the embeddings.

import faiss

# Load embeddings

embeddings = np.load("/dbfs/mnt/delta/embeddings.npy")

# Create FAISS index

index = faiss.IndexFlatL2(embeddings.shape[1])

index.add(embeddings)

# Save the index

faiss.write_index(index, "/dbfs/mnt/delta/faiss_index")


Step 5. Create a function to handle user queries and return relevant responses.

def chatbot(query):

    query_embedding = model.encode([query])

    D, I = index.search(query_embedding, k=1)

    response_id = I[0][0]

    response_text = texts[response_id]

    return response_text


# Test the chatbot

print(chatbot("Tell me about Databricks"))


Step 6. Deploy the chatbot in one of two ways:

Option a) a Databricks widget

dbutils.widgets.text("query", "", "Enter your query")

query = dbutils.widgets.get("query")


if query:

    response = chatbot(query)

    print(f"Response: {response}")

else:

    print("Please enter a query.")


Option b) a REST API

from flask import Flask, request, jsonify


app = Flask(__name__)


@app.route('/chatbot', methods=['POST'])

def chatbot_endpoint():

    query = request.json['query']

    response = chatbot(query)

    return jsonify({"response": response})


if __name__ == '__main__':

    app.run(host='0.0.0.0', port=5000)
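
Note that the REST endpoint above assumes that model, index, and texts are already in scope. When running the Flask app outside the notebook session, they can be reloaded first; a minimal sketch, assuming the paths from Steps 2 through 4:

import faiss
from sentence_transformers import SentenceTransformer
from pyspark.sql import SparkSession

# Reload the artifacts the /chatbot endpoint depends on
model = SentenceTransformer('all-MiniLM-L6-v2')
index = faiss.read_index("/dbfs/mnt/delta/faiss_index")
spark = SparkSession.builder.appName("Chatbot").getOrCreate()
# orderBy("id") assumes the ids reflect the order in which the texts were encoded
rows = spark.read.format("delta").load("/mnt/delta/chatbot_data").orderBy("id").collect()
texts = [row['text'] for row in rows]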


Step 7. Test the chatbot:

For option a), use the widgets to interact with the notebook:

# Display the widgets

dbutils.widgets.text("query", "", "Enter your query")

query = dbutils.widgets.get("query")


if query:

    response = chatbot(query)

    displayHTML(f"<h3>Response:</h3><p>{response}</p>")

else:

    displayHTML("<p>Please enter a query.</p>")


For option b), make a web request:

curl -X POST http://<your-databricks-url>:5000/chatbot -H "Content-Type: application/json" -d '{"query": "Tell me about Databricks"}'
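
Equivalently, the endpoint can be exercised from Python; a small illustration using the requests library:

import requests

resp = requests.post(
    "http://<your-databricks-url>:5000/chatbot",
    json={"query": "Tell me about Databricks"},
)
print(resp.json()["response"])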


Part 2. 

The app service example wires up the query and the user interface as follows:

The code hosting the model and completing the query results comprises the following:

import openai, os, requests 

 

openai.api_type = "azure" 

# Azure OpenAI on your own data is only supported by the 2023-08-01-preview API version 

openai.api_version = "2023-08-01-preview" 

 

# Azure OpenAI setup 

openai.api_base = "https://azai-open-1.openai.azure.com/" # Add your endpoint here 

openai.api_key = os.getenv("OPENAI_API_KEY") # Add your OpenAI API key here 

deployment_id = "mdl-gpt-35-turbo" # Add your deployment ID here 

 

# Azure AI Search setup 

search_endpoint = "https://searchrgopenaisadocs.search.windows.net" # Add your Azure AI Search endpoint here

search_key = os.getenv("SEARCH_KEY") # Add your Azure AI Search admin key here

search_index_name = "undefined" # Add your Azure AI Search index name here

 

def setup_byod(deployment_id: str) -> None: 

    """Sets up the OpenAI Python SDK to use your own data for the chat endpoint. 

 

    :param deployment_id: The deployment ID for the model to use with your own data. 

 

    To remove this configuration, simply set openai.requestssession to None. 

    """ 

 

    class BringYourOwnDataAdapter(requests.adapters.HTTPAdapter): 

 

        def send(self, request, **kwargs): 

            request.url = f"{openai.api_base}/openai/deployments/{deployment_id}/extensions/chat/completions?api-version={openai.api_version}" 

            return super().send(request, **kwargs) 

 

    session = requests.Session() 

 

    # Mount a custom adapter which will use the extensions endpoint for any call using the given `deployment_id` 

    session.mount( 

        prefix=f"{openai.api_base}/openai/deployments/{deployment_id}", 

        adapter=BringYourOwnDataAdapter() 

    ) 

 

    openai.requestssession = session 

 

setup_byod(deployment_id) 

 

 

message_text = [{"role": "user", "content": "What are the differences between Azure Machine Learning and Azure AI services?"}] 

 

completion = openai.ChatCompletion.create( 

    messages=message_text, 

    deployment_id=deployment_id, 

    dataSources=[  # camelCase is intentional, as this is the format the API expects 

        { 

            "type": "AzureCognitiveSearch", 

            "parameters": { 

                "endpoint": search_endpoint, 

                "key": search_key, 

                "indexName": search_index_name, 

            } 

        } 

    ]
)

print(completion)
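
Rather than printing the whole completion object, the assistant's reply can be pulled out directly. A minimal sketch, assuming the response shape returned by this SDK version:

print(completion["choices"][0]["message"]["content"])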


The user interface is simpler: the app service hosts a React web app with the following code:

npm install @typebot.io/js @typebot.io/react 

import { Standard } from "@typebot.io/react"; 

 

const App = () => { 

  return ( 

    <Standard 

      typebot="basic-chat-gpt-civ35om" 

      style={{ width: "100%", height: "600px" }} 

    /> 

  ); 

}; 


This concludes the creation of a chatbot function using the workspace.


Sunday, September 1, 2024

This is a summary of the book titled “Small Data: The Tiny Clues That Uncover Huge Trends”, written by Martin Lindstrom and published by St. Martin’s Press in 2017. What Sherlock Holmes is to clues in solving a mystery, Martin Lindstrom strives to be for interpreting the buying preferences of individuals. As a marketing expert, he uses this skill to help individuals be more objective about their own preferences while empowering brands to understand customers’ unfulfilled and unmet desires. While data privacy advocates may balk at the data being scrutinized, the author teaches how small data can uncover insights in a set of 7 steps. Paying attention to cultural imbalances in people’s lives, the freedom to be oneself, embodying one’s perspectives, and owning universal moments helps customers articulate their desires and demands. Small data helps to understand people’s desire-motivated “twin selves”. The narrative can then be tuned to help customers connect with brands.

Small data researchers can uncover insights into consumer desires that big data misses. As an adviser for Lego, Lindstrom used ethnographic insights from an 11-year-old German boy to inform its strategy, reducing the size of its building bricks and increasing the demands of its construction challenges. By 2014, Lego had become the largest global toy maker, surpassing Mattel. Small data can include habits, preferences, gestures, hesitations, speech patterns, decor, and online activity.

Small data can also reveal cultural imbalances that indicate what is missing in people's lives. For example, in the Russian Far East, colorful magnets covered refrigerator doors, symbolizing foreign travel, escape, and freedom. This led to the concept for Mamagazin – Mum’s Store, an e-commerce platform built for and by Russian mothers.

Freedom to be yourself is the greatest untapped American desire. Lindstrom helped Lowes Foods conceive a new strategy for stores in North Carolina and South Carolina, revealing that Americans value security and are often fearful. He concluded that freedom was not prevalent in everyday US culture, making it an untapped desire.

Lindstrom's marketing strategies have been successful in connecting with customers and addressing their unique needs. He helped a global cereal company understand why young women were not buying its top-selling breakfast brand by observing the tense relationships between Indian women and their mothers-in-law. He created a cereal box with two different color palettes, featuring earth tones for taller women and bright colors for mothers-in-law. Lindstrom also appealed to people's tribal need to belong during transformational times, using the Asian custom of passing items of worth to customers. This strategy increased customer retention rates. Lindstrom also tapped into the tribal need of tween and teenage girls by revising the strategy of Switzerland-based fashion brand Tally Weijl. He created a Wi-Fi-enabled smart mirror for young shoppers to share outfit photos on Facebook, allowing others to virtually vote on their choices.

Lindstrom leveraged the concept of "entry points" to boost customer retention rates in various industries. He used the concept of weight loss as a transformational moment to present free charm bracelets to dieters, symbolizing success, experience, and tribal belonging. He also tapped into the desire-motivated "Twin Selves" of consumers, the part of them that desires things they once dreamed of but lost or never had. These contexts or experiences influence behavior by prompting individuals to become someone or something else. For example, he created a live-streamed event on a floating island where participants could embody happier, sexier, and freer versions of themselves. He also used the "Twin-Self" concept to create a brand image for a Chinese car, focusing on the driver's Twin Self and creating a powerful, fast, and masculine car.

The power of narrative can help consumers connect with brands, as demonstrated by Steve Jobs' redesign of Tally Weijl and Devassa's use of brand ambassadors. By creating cohesive narratives that resonate with the stories consumers tell about themselves, brands can reach their target audience. To conduct subtext research, follow the "7C's": collect baseline perspectives, focus on clues, connecting, cause, correlation, compensation, and concept. By understanding the emotions and shifts in consumer behavior, brands can better understand their target audience and develop strategies to compensate for what consumers feel their lives lack. By cultivating a more objective understanding of their own inner motivations and desires, marketers can better assess those of others, ultimately fostering a stronger connection with customers.


Saturday, August 31, 2024

A self-organizing map algorithm for scheduling meeting times as availabilities and bookings. A map is a low-dimensional representation of a training sample comprising elements e. It is represented by nodes n. The map is transformed by a regression operation that modifies the nodes' positions one element of the sample (e) at a time. With preferences translating to nodes and availabilities to elements, the map comes closer to matching the sample space with each epoch/iteration.

from sys import argv


import numpy as np


from io_helper import read_xyz, normalize

from neuron import generate_network, get_neighborhood, get_boundary

from distance import select_closest, euclidean_distance, boundary_distance

from plot import plot_network, plot_boundary


def main():

    if len(argv) != 2:

        print("Correct use: python src/main.py <filename>.xyz")

        return -1


    problem = read_xyz(argv[1])


    boundary = som(problem, 100000)


    problem = problem.reindex(boundary)


    distance = boundary_distance(problem)


    print('Boundary found of length {}'.format(distance))



def som(problem, iterations, learning_rate=0.8):

    """Solve the xyz using a Self-Organizing Map."""


    # Obtain the normalized set of timeslots (w/ coord in [0,1])

    timeslots = problem.copy()

    # print(timeslots)

    #timeslots[['X', 'Y', 'Z']] = normalize(timeslots[['X', 'Y', 'Z']])


    # The population size is 8 times the number of timeslots

    n = timeslots.shape[0] * 8


    # Generate an adequate network of neurons:

    network = generate_network(n)

    print('Network of {} neurons created. Starting the iterations:'.format(n))


    for i in range(iterations):

        if not i % 100:

            print('\t> Iteration {}/{}'.format(i, iterations), end="\r")

        # Choose a random timeslot

        timeslot = timeslots.sample(1)[['X', 'Y', 'Z']].values

        winner_idx = select_closest(network, timeslot)

        # Generate a Gaussian filter centered on the winner neuron

        gaussian = get_neighborhood(winner_idx, n//10, network.shape[0])

        # Update the network's weights (closer to the timeslot)

        network += gaussian[:,np.newaxis] * learning_rate * (timeslot - network)

        # Decay the variables

        learning_rate = learning_rate * 0.99997

        n = n * 0.9997


        # Check for plotting interval

        if not i % 1000:

            plot_network(timeslots, network, name='diagrams/{:05d}.png'.format(i))


        # Check if any parameter has completely decayed.

        if n < 1:

            print('Radius has completely decayed, finishing execution',

            'at {} iterations'.format(i))

            break

        if learning_rate < 0.001:

            print('Learning rate has completely decayed, finishing execution',

            'at {} iterations'.format(i))

            break

    else:

        print('Completed {} iterations.'.format(iterations))


    # plot_network(timeslots, network, name='diagrams/final.png')


    boundary = get_boundary(timeslots, network)

    plot_boundary(timeslots, boundary, 'diagrams/boundary.png')

    return boundary


if __name__ == '__main__':

    main()
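
The helper modules (io_helper, neuron, distance, plot) come from the repository referenced below. For readability, here is a minimal sketch of what two of the neuron helpers might look like, assuming three-dimensional timeslots and a ring-shaped network; the repository's actual versions may differ:

import numpy as np

def generate_network(size):
    # Hypothetical sketch: random initial weights for `size` neurons in 3-D space
    return np.random.rand(size, 3)

def get_neighborhood(center, radix, domain):
    # Hypothetical sketch: Gaussian neighborhood around the winning neuron,
    # wrapping around a ring of `domain` neurons
    radix = max(radix, 1)
    deltas = np.absolute(center - np.arange(domain))
    distances = np.minimum(deltas, domain - deltas)
    return np.exp(-(distances ** 2) / (2 * (radix ** 2)))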


Reference: 

https://github.com/raja0034/som4drones


#codingexercise

https://1drv.ms/w/s!Ashlm-Nw-wnWhPBaE87l8j0YBv5OFQ?e=uCIAp9


This is a summary of the book titled “ESG Mindset”, written by Matthew Sekol and published by Kogan Page in 2024. The author evaluates “Environmental, Social and Governance” aka ESG practices for the long-term sustainability of corporations and their challenge to corporate culture. The author finds that deployments can raise issues which might affect transformation and growth, and that most companies interpret these practices to suit their needs. This poses a challenge to even a standard definition and acceptance of associated norms. Leaders are also quick to get at the intangible behind these practices by cutting them down to the simplest form, which risks diluting their relevance. The author concludes that to realize the ESG mindset fully, companies must be committed to going all the way. He asserts that these practices are not merely data, and that technology is the invisible “fourth” pillar in ESG. There is demonstrated success in the campaigns of companies that have embraced ESG, but the mindset goes beyond operations. As with most practices in the modern world, ESG must remain flexible.

Environmental, Social, and Governance (ESG) practices are rooted in Corporate Social Responsibility (CSR) and Socially Responsible Investing (SRI), but ESG differentiated itself by 2004 with its broader definition of "material value" and willingness to deal with intangibles. ESG is difficult to define as it links intangible values with material results. Companies must align their ESG mindset to manage crises in an increasingly complex world. ESG is not merely data; it requires companies to prioritize, interpret, and communicate their data to stakeholders. Companies must inventory their "data estate" by reviewing internal and external data sets to ensure transparency and sustainability. Challenges faced by companies include global emissions increasing by 70% between 1970 and 2004, climate change, and public pressure from stakeholders. Publicly traded companies can provide guidelines on how their boards make decisions, including those involving ESG or affecting stakeholders.

Globalization has led to systemic issues such as child welfare, climate change, forced labor, equity, and justice, resulting in crises. Boards must shift their decision-making practices from short-term to long-term to pursue their material goals. Technology, such as blockchain, the metaverse, and generative AI, can support ESG transformation by solving problems and facilitating goals. However, companies must modernize legacy technology, break down internal silos, and solve complex cultural fears of change. Technology also produces data that is integral to ESG analysis and decision-making, but it exposes companies to cybersecurity risks. Critics and controversy can hinder ESG, especially in the United States, where polarization and activism from both the left and right complicate the issues ESG already faces. Companies must collaborate to ensure ESG's relevance and address the accuracy and fairness of ESG scores.

ESG pillars interconnect and can be analyzed to uncover new issues and improve resilience in a crisis. Companies must recognize that long-term interconnected crises will become material to every company over time. Changes addressing systemic problems can influence both internal workings and external stakeholders. Companies like PepsiCo, Lego, and Target have successfully leveraged their investment in ESG goals in various ways. PepsiCo founded the Beverage Industry Environmental Roundtable (BIER) to address systemic industry issues, particularly around water use. Lego committed to switching to sustainable materials by 2030, while Target leveraged the social pillar of ESG by hiring a diverse workforce and practicing community outreach. Paramount aligned stakeholder engagement with its core product, storytelling, demonstrating its commitment to addressing systemic issues with an ESG mindset. The ESG mindset goes beyond operations, as large-scale disruptions in the Environmental and Social dimensions may leave businesses struggling to react. Companies can leverage their ESG goals while remaining profitable through B Corps, value chain improvements, and industry collaboration.

ESG must adapt to a complex and volatile world, addressing systemic issues, intangible value, and global economic development. Companies must move from following the data to promoting measurable change. Technology can help address complexity but requires stakeholder buy-in and coordination. Companies face pressure to standardize ESG goals, define the ESG mindset, and demonstrate how to implement it, especially in the face of political agendas and pushback against DEI programs.

It is interesting that there can be so many parallels to draw between organizations and data science projects from an ESG perspective. The same sets of benefits and challenges apply to the long-term sustainability of these projects and charters. It is not just about the analysis, operations and predictions but also how it is presented to stakeholders.


Friday, August 30, 2024

 DevOps for IaC

As with any DevOps practice, the principles on which they are founded must always include a focus on people, process, and technology. With the help of Infrastructure-as-Code and blueprints, resources, policies, and accesses can be packaged together and become a unit for provisioning the environment.

The DevOps Adoption RoadMap has evolved over time. What used to be Feature Driven Development around 1999 gave way to Lean thinking and Lean software development around 2003, which was followed by product development flows in 2009 and Continuous Integration/Delivery in 2010. The DevOps Handbook and the DevOps Adoption Playbook are recent as of the last 5-6 years. Principles inform practices that resolve challenges, and they align accordingly. For example, the elimination of risk happens with automated testing and deployments, which resolves the challenges of manual testing, processes, deployments, and releases.

The people involved in bringing builds and deployments to the cloud and making use of them, instead of outdated and cumbersome enterprise systems, must be given roles and a clear separation of responsibility. For example, developers can initiate the promotion of a code package to the next environment, but only a separate set of people, with sign-offs, should allow it to propagate to production systems. Fortunately, this is well understood and there is existing software such as ITSM, ITBM, ITOM and CMDB. These are fancy acronyms for situations such as:

1. If you have a desired state you want to transition to, use a workflow.

2. If you have a problem, open a service ticket. 

3. If you want orchestration and subscribe to events, use events monitoring and alerts. 

4. If you want a logical model of the inventory, use a configuration management database. 

Almost all IT businesses are concerned about ITOM such as with alerts and events, ITSM such as with incidents and service requests, and intelligence in operations. The only difference is that they have not been used or made available for our stated purposes, but this is still a good start. 

The process that needs to be streamlined is unprecedented at this scale and sensitivity. The unnecessary control points, waste and overhead must be removed, and usability must be one of the foremost considerations for improving adoption. 

The technology is inherently different between cloud and enterprise. While they have a lot in common when it comes to principles of storage, computing and networking, the division and organization in the cloud have many more knobs and levers that require due diligence.

These concerns around people, process and technology are what distinguishes and makes this landscape so fertile for improvements.


Thursday, August 29, 2024

 

Technical Debt in IaC:

A case study might be a great introduction to this subject. A team in an enterprise wanted to set up a new network in compliance with the security standards of the organization and migrate resources from the existing network to the new one. When they started out allocating subnets from the virtual network address space and deploying the first few resources such as an analytical workspace and its dependencies, they found that the exact same provisioning method used for the old network did not create a resource on par with the functionality of the old one. For example, a compute instance could not be provisioned into the workspace in the new subnet because there was an error message that said, “could not get workspace info, please check the virtual network and associated rules”. It turned out that the subnets were created with an old version of their definition from the IaC provider and lacked the new settings that were introduced more recently and were required for compatibility with the recent workspace definitions also published by the same IaC provider. The documentation on the IaC provider’s website suggests that the public cloud that provides those resources had introduced breaking changes and that newer versions required newer definitions. This forced the team to update the subnet definition in its IaC to the most recent one from the provider and redo all the allocations and deployments after a teardown. Fortunately, the resources introduced to the new virtual network were only pilots and represented a tiny fraction of the bulk of the resources supporting the workloads to migrate.

The software engineering industry is rife with versioning problems in all artifacts that are published and maintained in a registry for public consumption, ranging across types as diverse as languages, packages, libraries, jars, vulnerability definitions, and images. In IaC, the challenge is somewhat different because deployments are usually tiered and the priority and severity of a technical debt differs from case to case, with infrastructure teams maintaining a wide inventory of deployments, their constituent resources, and customers. It just so happens in this example that the failures are detected early and the resolutions are narrow and specific; otherwise rehosting, let alone restructuring, is not an easy task because it requires complex deployments and steps.

While cost estimation, ROI and planning apply as usual to any software engineering upgrade and project management, we have the advantage of breaking down deployments and their redeployments into contained boundaries so that they can be independently implemented and tested. Scoping and enumerating dependencies come with this way of handling technical debt in IaC. A graph of dependencies between deployments can be immensely helpful to curate for such efforts, both now and in the near future. A sample way of determining this could be to record the dependency edges alongside the deployments and derive a safe redeployment order from that graph, as sketched below.
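
A minimal sketch, assuming deployments and their dependencies are tracked as a simple edge list (the names and helper below are hypothetical, purely for illustration); topologically ordering the graph yields a redeployment sequence in which every dependency is handled first, and it flags cycles that must be broken before any redeployment:

from collections import defaultdict, deque

# Hypothetical edge list: (deployment, deployment it depends on)
dependencies = [
    ("workspace", "subnet"),
    ("compute-instance", "workspace"),
    ("subnet", "virtual-network"),
]

def redeployment_order(edges):
    """Return deployments ordered so that every dependency is redeployed first."""
    dependents = defaultdict(list)   # dependency -> deployments that rely on it
    indegree = defaultdict(int)      # deployment -> number of unmet dependencies
    nodes = set()
    for node, dependency in edges:
        dependents[dependency].append(node)
        indegree[node] += 1
        nodes.update((node, dependency))

    queue = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while queue:
        current = queue.popleft()
        order.append(current)
        for dependent in dependents[current]:
            indegree[dependent] -= 1
            if indegree[dependent] == 0:
                queue.append(dependent)

    if len(order) != len(nodes):
        raise ValueError("Cyclic dependency detected; break the cycle before redeploying")
    return order

print(redeployment_order(dependencies))
# ['virtual-network', 'subnet', 'workspace', 'compute-instance']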

Wednesday, August 28, 2024

#Requires -Version 2.0

<#

Synopsis: The following Powershell script serves as a partial example 

towards backup and restore of an AKS cluster.

The concept behind this form of BCDR solution is described here:

https://learn.microsoft.com/en-us/azure/backup/azure-kubernetes-service-cluster-backup-concept

#>

param (

    [Parameter(Mandatory=$true)][string]$resourceGroupName,

    [Parameter(Mandatory=$true)][string]$accountName,

    [Parameter(Mandatory=$true)][string]$subscriptionId,

    [Parameter(Mandatory=$true)][string]$aksClusterName,

    [Parameter(Mandatory=$true)][string]$aksClusterRG,

    [string]$backupVaultRG = "testBkpVaultRG",

    [string]$backupVaultName = "TestBkpVault",

    [string]$location = "westus",

    [string]$containerName = "backupc",

    [string]$storageAccountName = "sabackup",

    [string]$storageAccountRG = "rgbackup",

    [string]$environment = "AzureCloud"

)


Connect-AzAccount -Environment "$environment"

Set-AzContext -SubscriptionId "$subscriptionId"

$storageSetting = New-AzDataProtectionBackupVaultStorageSettingObject -Type LocallyRedundant -DataStoreType OperationalStore

New-AzDataProtectionBackupVault -ResourceGroupName $backupVaultRG -VaultName $backupVaultName -Location $location -StorageSetting $storageSetting

$TestBkpVault = Get-AzDataProtectionBackupVault -VaultName $backupVaultName

$policyDefn = Get-AzDataProtectionPolicyTemplate -DatasourceType AzureKubernetesService

$policyDefn.PolicyRule[0].Trigger | fl


# Sample output:
# ObjectType: ScheduleBasedTriggerContext
# ScheduleRepeatingTimeInterval: {R/2023-04-05T13:00:00+00:00/PT4H}
# TaggingCriterion: {Default}


$policyDefn.PolicyRule[1].Lifecycle | fl


# Sample output:
# DeleteAfterDuration: P7D
# DeleteAfterObjectType: AbsoluteDeleteOption
# SourceDataStoreObjectType: DataStoreInfoBase
# SourceDataStoreType: OperationalStore
# TargetDataStoreCopySetting:


New-AzDataProtectionBackupPolicy -ResourceGroupName $backupVaultRG -VaultName $TestBkpVault.Name -Name aksBkpPolicy -Policy $policyDefn


$aksBkpPol = Get-AzDataProtectionBackupPolicy -ResourceGroupName $backupVaultRG -VaultName $TestBkpVault.Name -Name "aksBkpPolicy"


Write-Host "Installing Extension with cli"

az k8s-extension create --name azure-aks-backup --extension-type microsoft.dataprotection.kubernetes --scope cluster --cluster-type managedClusters --cluster-name $aksClusterName --resource-group $aksClusterRG --release-train stable --configuration-settings blobContainer=$containerName storageAccount=$storageAccountName storageAccountResourceGroup=$storageAccountRG storageAccountSubscriptionId=$subscriptionId


az k8s-extension show --name azure-aks-backup --cluster-type managedClusters --cluster-name $aksClusterName --resource-group $aksClusterRG


az k8s-extension update --name azure-aks-backup --cluster-type managedClusters --cluster-name $aksClusterName --resource-group $aksClusterRG --release-train stable --config-settings blobContainer=$containerName storageAccount=$storageAccountName storageAccountResourceGroup=$storageAccountRG storageAccountSubscriptionId=$subscriptionId # [cpuLimit=1] [memoryLimit=1Gi]


az role assignment create --assignee-object-id $(az k8s-extension show --name azure-aks-backup --cluster-name $aksClusterName --resource-group $aksClusterRG --cluster-type managedClusters --query identity.principalId --output tsv) --role 'Storage Account Contributor' --scope /subscriptions/$subscriptionId/resourceGroups/$storageAccountRG/providers/Microsoft.Storage/storageAccounts/$storageAccountName


az aks trustedaccess rolebinding create -g $aksClusterRG --cluster-name $aksClusterName -n randomRoleBindingName --source-resource-id $TestBkpVault.Id --roles Microsoft.DataProtection/backupVaults/backup-operator


Write-Host "This section is detailed overview of TrustedAccess"

az extension add --name aks-preview

az extension update --name aks-preview

az feature register --namespace "Microsoft.ContainerService" --name "TrustedAccessPreview"

az feature show --namespace "Microsoft.ContainerService" --name "TrustedAccessPreview"

az provider register --namespace Microsoft.ContainerService

# Create a Trusted Access RoleBinding in an AKS cluster


# $connectedServiceResourceId should hold the resource ID of the source (here, the Backup vault)

az aks trustedaccess rolebinding create --resource-group $aksClusterRG --cluster-name $aksClusterName -n randomRoleBindingName -s $connectedServiceResourceId --roles backup-operator,backup-contributor #,Microsoft.Compute/virtualMachineScaleSets/test-node-reader,Microsoft.Compute/virtualMachineScaleSets/test-admin



Write-Host "Update an existing Trusted Access Role Binding with new roles"

# Update RoleBinding command


az aks trustedaccess rolebinding update --resource-group $aksClusterRG --cluster-name $aksClusterName -n randomRoleBindingName  --roles backup-operator,backup-contributor



Write-Host "Configure Backup"

$sourceClusterId = "/subscriptions/$subscriptionId/resourcegroups/$aksClusterRG/providers/Microsoft.ContainerService/managedClusters/$aksClusterName"


Write-Host "Snapshot resource group"

$snapshotRG = "/subscriptions/$subscriptionId/resourcegroups/snapshotrg"


Write-Host "The configuration of backup is performed in two steps"

$backupConfig = New-AzDataProtectionBackupConfigurationClientObject -SnapshotVolume $true -IncludeClusterScopeResource $true -DatasourceType AzureKubernetesService -LabelSelector "env=$environment"

$backupInstance = Initialize-AzDataProtectionBackupInstance -DatasourceType AzureKubernetesService -DatasourceLocation $location -PolicyId $aksBkpPol.Id -DatasourceId $sourceClusterId -SnapshotResourceGroupId $snapshotRG -FriendlyName "Backup of AKS Cluster $aksClusterName" -BackupConfiguration $backupConfig


Write-Host "Assign required permissions and validate"

$aksCluster = $(Get-AzAksCluster -Id $sourceClusterId)

Set-AzDataProtectionMSIPermission -BackupInstance $backupInstance -VaultResourceGroup $backupVaultRG -VaultName $backupVaultName -PermissionsScope "ResourceGroup"

Test-AzDataProtectionBackupInstanceReadiness -ResourceGroupName $resourceGroupName -VaultName $backupVaultName -BackupInstance $backupInstance.Property


Write-Host "Protect the AKS cluster"

New-AzDataProtectionBackupInstance -ResourceGroupName $backupVaultRG -VaultName $TestBkpVault.Name -BackupInstance $backupInstance


Write-Host "Run on-demand backup"

$instance = Get-AzDataProtectionBackupInstance -SubscriptionId $subscriptionId -ResourceGroupName $backupVaultRG -VaultName $TestBkpVault.Name -Name $aksClusterName


Write-Host "Specify Retention Rule"

$policyDefn.PolicyRule | fl

# Sample output:
# BackupParameter: Microsoft.Azure.PowerShell.Cmdlets.DataProtection.Models.Api20210201Preview.AzureBackupParams
# BackupParameterObjectType: AzureBackupParams
# DataStoreObjectType: DataStoreInfoBase
# DataStoreType: OperationalStore
# Name: BackupHourly
# ObjectType: AzureBackupRule
# Trigger: Microsoft.Azure.PowerShell.Cmdlets.DataProtection.Models.Api20210201Preview.ScheduleBasedTriggerContext
# TriggerObjectType: ScheduleBasedTriggerContext
# IsDefault: True
# Lifecycle: {Microsoft.Azure.PowerShell.Cmdlets.DataProtection.Models.Api20210201Preview.SourceLifeCycle}
# Name: Default
# ObjectType: AzureRetentionRule


Write-Host "Trigger on-demand backup"

$AllInstances = Get-AzDataProtectionBackupInstance -ResourceGroupName $backupVaultRG -VaultName $TestBkpVault.Name


Backup-AzDataProtectionBackupInstanceAdhoc -BackupInstanceName $AllInstances[0].Name -ResourceGroupName $backupVaultRG -VaultName $TestBkpVault.Name -BackupRuleOptionRuleName "Default"


Write-Host "Tracking all the backup jobs"

$job = Search-AzDataProtectionJobInAzGraph -Subscription $subscriptionId -ResourceGroupName $backupVaultRG -Vault $TestBkpVault.Name -DatasourceType AzureKubernetesService -Operation OnDemandBackup