Wednesday, August 17, 2022

This is a continuation of a series of articles on hosting solutions and services on Azure public cloud with the most recent discussion on Multitenancy here. The previous article introduced serverless batch processing with durable functions in Azure Container Instances. This article describes the components of that approach.

The components of this approach include: an orchestrator function, a durable function that orchestrates and performs the ACI container and app deployment, monitoring, and cleanup; an activity function, also a durable function, which creates the ACI container group and instance; an Azure Container Registry, which stores the batch processing app in a container image; Azure Container Instances, which run the batch processing jobs; Azure Active Directory and Managed Service Identity, which are needed to manage the Azure Container Instances; and Application Insights, which monitors the job progress.
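As a minimal sketch of such an activity function, assuming the fluent Azure management SDK (Microsoft.Azure.Management.Fluent) and hypothetical names for the resource group, registry, and image, creating the container group might look like this:

using Microsoft.Azure.Management.Fluent;
using Microsoft.Azure.Management.ResourceManager.Fluent.Core;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

public static class ContainerActivities
{
    [FunctionName("CreateContainerGroup")]
    public static string CreateContainerGroup([ActivityTrigger] string jobId)
    {
        // Assumption: credentials come from an auth file here; in practice the
        // function app's managed identity would supply them instead.
        IAzure azure = Azure.Authenticate("my.azureauth").WithDefaultSubscription();

        // Create a Linux container group from the batch-processing image in ACR.
        // All names below are hypothetical placeholders.
        var containerGroup = azure.ContainerGroups.Define("batch-job-" + jobId)
            .WithRegion(Region.USEast)
            .WithExistingResourceGroup("my-resource-group")
            .WithLinux()
            .WithPrivateImageRegistry("myregistry.azurecr.io", "user", "password")
            .WithoutVolume()
            .DefineContainerInstance("batch-worker")
                .WithImage("myregistry.azurecr.io/batchapp:latest")
                .WithExternalTcpPort(80)
                .Attach()
            .Create();

        return containerGroup.Id;
    }
}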

These components enable a series of durable functions to be connected, managed, monitored, and retired. When the job completes, the batch processing job invokes the orchestrator function by raising an external event and provides a job status of Completed or Failed. Depending on the job status, the orchestrator function stops, restarts, or deletes the container group.
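A minimal sketch of that handshake, assuming Durable Functions 2.x and hypothetical activity and event names, might look like this:

using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;
using System.Threading.Tasks;

public static class BatchOrchestration
{
    [FunctionName("BatchOrchestrator")]
    public static async Task RunOrchestrator(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        // Deploy the ACI container group that runs the batch job
        // ("CreateContainerGroup" is the hypothetical activity sketched earlier).
        string groupId = await context.CallActivityAsync<string>("CreateContainerGroup", context.InstanceId);

        // Suspend durably until the batch job raises its completion event.
        string status = await context.WaitForExternalEvent<string>("JobStatus");

        // Act on the reported status: clean up on success, restart otherwise.
        if (status == "Completed")
            await context.CallActivityAsync("DeleteContainerGroup", groupId);
        else
            await context.CallActivityAsync("RestartContainerGroup", groupId);
    }
}

Because the orchestrator checkpoints at each await, the wait for the external event consumes no compute while the job runs.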

The scenarios suited to multitenant serverless batch processing are ones where workloads are simple and use only one container image. Another use case is where computing needs vary with each individual job. Multitenant scenarios in which some tenants need large computing power while others have small computing requirements represent hybrid requirements, and serverless batch processing can help in this regard.

Some considerations apply to these batch processing scenarios. Long-term, stable workloads are better served by orchestrating containers in a cluster of dedicated virtual machines than by Azure Container Instances. However, ACI can quickly expand and contract overall capacity to meet surges and peaks in traffic, and for variable loads it is an efficient, fast, and cost-effective way to scale the number of instances.

Rather than scaling out the number of dedicated virtual machines and then deploying more containers onto them, durable functions can be used to schedule and manage the container deployment and deletion. ACI enables a layered approach to orchestration: it provides all of the scheduling and management needed to run single containers, while allowing orchestrator platforms to manage multi-container tasks and capabilities such as scaling and coordinated upgrades.

#codingexercise

Implement simple regular expression matching for '.', '*', '^' and '$' (the classic routine from Kernighan and Pike):

int matchHere(char* regexp, char* text);
int matchStar(int c, char* regexp, char* text);

/* match: search for regexp anywhere in text */
int match(char* regexp, char* text)
{
    if (regexp[0] == '^')
        return matchHere(regexp + 1, text);
    do {    /* must look even if string is empty */
        if (matchHere(regexp, text))
            return 1;
    } while (*text++ != '\0');
    return 0;
}

/* matchHere: search for regexp at beginning of text */
int matchHere(char* regexp, char* text)
{
    if (regexp[0] == '\0')
        return 1;
    if (regexp[1] == '*')
        return matchStar(regexp[0], regexp + 2, text);
    if (regexp[0] == '$' && regexp[1] == '\0')
        return *text == '\0';
    if (*text != '\0' && (regexp[0] == '.' || regexp[0] == *text))
        return matchHere(regexp + 1, text + 1);
    return 0;
}

/* matchStar: search for c*regexp at beginning of text */
int matchStar(int c, char* regexp, char* text)
{
    do {    /* a * matches zero or more instances */
        if (matchHere(regexp, text))
            return 1;
    } while (*text != '\0' && (*text++ == c || c == '.'));
    return 0;
}

 Reference: https://1drv.ms/w/s!Ashlm-Nw-wnWhLYK8RjQu87av0PAPA for this article.

Tuesday, August 16, 2022

This is a continuation of a series of articles on hosting solutions and services on Azure public cloud with the most recent discussion on Multitenancy here. The previous articles discussed Azure Arc instances and this one introduces serverless batch processing with durable functions in Azure Container Instances.

Durable Functions is an extension of Azure Functions that lets us write stateful functions in a serverless compute environment. It involves the use of an orchestration function to orchestrate the execution of other durable functions within a function app. Orchestration functions define function workflows using procedural code, without declarative schemas or designers. Functions can call other durable functions synchronously and asynchronously, and output can be saved to variables. They are durable and reliable because execution is automatically checkpointed when the function awaits or yields; local state is not lost during reboots or failures, and they can be long running.

Durable functions might also involve entity functions, which define operations for reading and updating small pieces of state, known as durable entities. Like orchestrator functions, entity functions have a special trigger type and manage the state of an entity explicitly, rather than implicitly representing state via control flow. Entities provide a means for scaling out applications by distributing the work across many entities, each with a modest-sized state. The extension manages state, checkpoints, and restarts behind the scenes.
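For illustration, a minimal durable entity sketch using the well-known counter pattern (the operation names are illustrative):

using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

public static class CounterEntity
{
    [FunctionName("Counter")]
    public static void Counter([EntityTrigger] IDurableEntityContext ctx)
    {
        // Each operation explicitly reads or updates the entity's small state.
        switch (ctx.OperationName.ToLowerInvariant())
        {
            case "add":
                ctx.SetState(ctx.GetState<int>() + ctx.GetInput<int>());
                break;
            case "reset":
                ctx.SetState(0);
                break;
            case "get":
                ctx.Return(ctx.GetState<int>());
                break;
        }
    }
}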

Durable functions can be used to schedule, manage and deploy serverless batch processing jobs in Azure Container Instances. Containers are popular for packaging, deploying and managing code hosted on orchestration frameworks such as Azure Kubernetes Service and Azure Service Fabric. 

The scenarios for using multitenant serverless batch processing are ones where workloads are simple and use only one container image. Another use case might be the case where computing needs vary depending on each individual job. Multi-tenant scenarios where some tenants need large computing power and other tenants have small computing requirements represent hybrid requirements and serverless batch processing can help in this regard.

The architecture almost always involves a series of durable functions used with a Managed Service Identity. The batch processing job is packaged into a container image stored in an Azure Container Registry. An HTTP trigger invokes the orchestration function to orchestrate the deployment. An activity function uses the container image stored in the ACR to create an ACI container in a container group. The orchestration function uses the container URL to call and start the batch processing job and to monitor the job's progress. When the job completes, it raises an external event to the orchestration function and provides a job status of Completed or Failed. Depending on the job status, the orchestration function stops, restarts, or deletes the container group. Variations of this architecture involve the use of restart policies to control the container instances and the use of a full-fledged orchestration framework to manage complex multi-container tasks and interactions.
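A hedged sketch of the two entry points, with hypothetical function and route names: an HTTP trigger starts the orchestration, and a second trigger lets the batch job report its status, which is forwarded to the orchestrator as an external event:

using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;
using Microsoft.Azure.WebJobs.Extensions.Http;
using System.IO;
using System.Threading.Tasks;

public static class BatchJobApi
{
    [FunctionName("StartBatchJob")]
    public static async Task<IActionResult> Start(
        [HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequest req,
        [DurableClient] IDurableOrchestrationClient client)
    {
        // Kick off the orchestrator that deploys and monitors the ACI job.
        string instanceId = await client.StartNewAsync("BatchOrchestrator");
        return new OkObjectResult(instanceId);
    }

    [FunctionName("ReportJobStatus")]
    public static async Task<IActionResult> Report(
        [HttpTrigger(AuthorizationLevel.Function, "post", Route = "jobs/{instanceId}")] HttpRequest req,
        string instanceId,
        [DurableClient] IDurableOrchestrationClient client)
    {
        // The batch job posts "Completed" or "Failed" in the request body;
        // forward it to the waiting orchestrator as the "JobStatus" event.
        string status = await new StreamReader(req.Body).ReadToEndAsync();
        await client.RaiseEventAsync(instanceId, "JobStatus", status);
        return new AcceptedResult();
    }
}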

 


Monday, August 15, 2022

This is a continuation of a series of articles on hosting solutions and services on Azure public cloud with the most recent discussion on Multitenancy here. This article continues to discuss troubleshooting the Azure Arc instance and introduces common experiences across Azure Arc-enabled data services.

One of the commonly encountered scenarios is resource browsing. Azure Data Studio provides an experience similar to the Azure portal for viewing information, which is useful in cases where a connection to Azure is not available. This requires both Azure Data Studio and the Azure Arc extension to be installed. The connection is made to the controller, and the namespace for the data controller is entered. Azure Data Studio reads from the kube.config file in the default directory and lists the available Kubernetes cluster contexts.

The data controller lists details of the resources such as name, region, connection mode, resource group, subscription, controller endpoint, and namespace. It also has links to open the resource in the Azure portal.

The SQL managed instance dashboard allows us to manage those instances. The overview shows the resource group, data controller, subscription ID, status, region, and other information.  This location also links to the Grafana dashboard for viewing metrics or the Kibana dashboard for viewing logs.

Connection strings are also made available to developers and applications.

The PostgreSQL Hyperscale server group dashboard shows details about the server group such as resource group, data controller, subscription ID, status, region and more. The properties and resource health tabs display additional information and there is an option to diagnose and solve problems.

Billing data can be uploaded to and viewed from the Azure portal. The workflow depends on whether the instance's connectivity mode is indirect or direct. The indirect mode has no automatic export: the billing data must be periodically exported, uploaded to Azure, and processed there. The export and upload can be automated via scripts.

The guidance for uploading billing data mentions installing tools such as the Azure CLI and the arcdata extension; the resource providers must be registered, a service principal must be created, and roles must be assigned to the service principal.

The billing data can be viewed from the Azure portal. The Cost Management tab shows the cost analysis by resource, and filters are available to narrow down the analysis.

Once the billing data is uploaded, it can also be exported with the Exports tab.

The Azure portal is also available for browsing the resources when there is a connection between them. Decommissioning a resource depends on whether it is directly connected or indirectly connected.

#codingexercise

When two nodes of a BST are swapped, they can be found with an in-order traversal that records the inversions:

void InOrderTraverse(Node root, ref Node prev, ref List<Node> pairs)
{
    if (root == null) return;
    InOrderTraverse(root.left, ref prev, ref pairs);
    // An inversion against the in-order predecessor marks a swapped node.
    if (prev != null && root.data < prev.data) { if (pairs.Count == 0) pairs.Add(prev); pairs.Add(root); }
    prev = root; // advance the predecessor before descending right
    InOrderTraverse(root.right, ref prev, ref pairs);
}
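After the traversal, started with prev = null, the swapped nodes are the first and last entries of pairs; exchanging their data values restores the binary search tree.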


Sunday, August 14, 2022

This is a continuation of a series of articles on hosting solutions and services on Azure public cloud with the most recent discussion on Multitenancy here. This article continues to discuss troubleshooting the Azure Arc instance with data collection and reporting.

Data transmitted from the Azure Arc data services can be tremendously helpful for the management of resources. The sources used by Azure Arc-enabled services include SQL MI – Azure Arc, PostgreSQL Hyperscale – Azure Arc, Azure Data Studio, Azure CLI (az), and Azure Data CLI (azdata). When a cluster is configured to be directly connected to Azure, some data is automatically transmitted to Microsoft. Operational data from metrics and logs is automatically uploaded. Billing and inventory data, such as the number of instances and usage such as vCores consumed, is automatically sent to Microsoft and is required from instances. Diagnostics information for troubleshooting purposes is not sent automatically; it must be sent on demand. The Customer Experience Improvement Program (CEIP) summary is sent automatically, but only if it has been opted into.

When a cluster is not configured to be directly connected to Azure, it does not automatically transmit operational or billing and inventory data to Microsoft. Data can be transmitted when it is configured to be exported, and the data and mechanisms are similar to those in the directly connected mode. The CEIP summary, if allowed, can be transmitted automatically.

Metrics include performance- and capacity-related metrics, which are collected in an InfluxDB provided as part of Azure Arc-enabled data services and can be viewed on a Grafana dashboard. This is customary for many Kubernetes products.

Logs emitted by all components are collected in an Elasticsearch database, also provided as part of Azure Arc-enabled data services. These logs can be viewed on the Kibana dashboard.

If the data is sent to Azure Monitor or Log Analytics, the destination region/zone can be specified and access to view can be granted to other regions.

Billing data is collected for all the resources, which fall into the following types: Azure Arc-enabled SQL managed instances, PostgreSQL Hyperscale server groups, SQL Server on Azure Arc-enabled servers, and the data controller. Every database instance and the data controller itself is reflected in Azure as a resource in the Azure Resource Manager.

The JSON data pertaining to a resource has attributes such as customObjectName, uid, instanceName, instanceNamespace, instanceType, location, resourceGroupName, subscriptionId, isDeleted, externalEndpoint, vCores, createTimestamp, and updateTimestamp.
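A hypothetical example of such a payload (attribute names from the list above; all values are placeholders):

{
  "customObjectName": "sqlmi-1",
  "uid": "00000000-0000-0000-0000-000000000000",
  "instanceName": "sqlmi-1",
  "instanceNamespace": "arc",
  "instanceType": "sqlManagedInstance",
  "location": "eastus",
  "resourceGroupName": "my-resource-group",
  "subscriptionId": "00000000-0000-0000-0000-000000000000",
  "isDeleted": false,
  "externalEndpoint": "10.0.0.4:31433",
  "vCores": "2",
  "createTimestamp": "2022-08-01T00:00:00Z",
  "updateTimestamp": "2022-08-14T00:00:00Z"
}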

Diagnostic data spans several attribute categories. Error logs include the log files capturing errors; they are restricted and shared only by the user. DMVs can contain queries and query plans, and views can contain customer data; both are likewise restricted and shared only by users. Crash dumps involving customer data are retained for a maximum of 30 days, and statistics objects and crash dumps involving personal data can include machine names, login names, emails, locations, and other identifiable information.

#codingexercise 

Check if a root-to-leaf path in a binary tree sums to a given value.

bool hasPathSum(Node root, int sum)
{
    if (root == null) return false; // an empty tree has no root-to-leaf path
    int newsum = sum - root.data;   // remaining sum after taking this node
    if (root.left == null && root.right == null) return newsum == 0; // at a leaf, the path is complete
    return hasPathSum(root.left, newsum) || hasPathSum(root.right, newsum);
}
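For example, in a tree with root 5, left child 4, and right child 8, hasPathSum(root, 9) returns true via the path 5 → 4, while hasPathSum(root, 5) returns false because the root alone does not end at a leaf.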


Saturday, August 13, 2022

 A string S containing only the letters "A", "B" and "C" is given. The string can be transformed by removing one occurrence of "AA", "BB" or "CC".

Transformation of the string is the process of removing letters from it, based on the rules described above. As long as at least one rule can be applied, the process should be repeated. If more than one rule can be used, any one of them could be chosen. 

Write a function: 

class Solution { public String solution(String S); } 

that, given a string S consisting of N characters, returns any string that can result from a sequence of transformations as described above. 


For example, given string S = "ACCAABBC" the function may return "AC", because one of the possible sequences of transformations is: ACCAABBC → AAABBC → ABBC → AC.

Also, given string S = "ABCBBCBA" the function may return "", because one possible sequence of transformations is: ABCBBCBA → ABCCBA → ABBA → AA → "".

Finally, for string S = "BABABA" the function must return "BABABA", because no rules can be applied to string S. 

Write an efficient algorithm for the following assumptions: 

the length of string S is within the range [0..50,000]; 

string S consists only of the following characters: "A", "B" and/or "C". 

One approach keeps a reduced prefix, which acts as a stack, while scanning the remaining suffix:

string getReduced(string prefix, string suffix)
{
    while (!string.IsNullOrEmpty(suffix))
    {
        // Case 1: an adjacent pair sits at the front of the suffix.
        if (suffix.Length >= 2 && suffix[0] == suffix[1])
        {
            suffix = suffix.Substring(2);
            continue;
        }
        // Case 2: a pair straddles the prefix/suffix boundary.
        if (prefix.Length > 0 && prefix[prefix.Length - 1] == suffix[0])
        {
            prefix = prefix.Substring(0, prefix.Length - 1);
            suffix = suffix.Substring(1);
            continue;
        }
        // Case 3: no pair to remove; shift one character onto the prefix.
        prefix = prefix + suffix[0];
        suffix = suffix.Substring(1);
    }
    return prefix + suffix;
}
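Calling getReduced(string.Empty, S) returns the fully reduced string. Each iteration either removes a pair or moves one character from the suffix to the prefix, so the number of iterations is linear in the length of S; since the prefix behaves as a stack, an equivalent formulation pushes characters onto a stack and pops whenever the incoming character matches the top.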
