Saturday, August 20, 2022

This is a continuation of a series of articles on hosting solutions and services on Azure public cloud with the most recent discussion on Multitenancy here. The previous articles introduced multitenancy via hyperconvergence.  This article explores it some more.

Hyperconverged infrastructure helps multitenancy by providing high consolidation density. When there are more pluggable tenants on a single platform, there are some benefits, but there are also some barriers. There are ways to overcome the barriers and operating problems. One way to do that is by leveraging software-defined resources so that setup and tear-down are configurable, easy and fast. Even patching and upgrades can be rapid by virtue of the consistency across tenants. Finally, multiple instances can be managed as one. The multitenant architecture thereby provides a single pane of glass for management across resources and their containers.

Until now we have seen several examples of multitenancy considerations based on Microsoft technologies, but for hyperconvergence, let us take a look at technologies that specifically cover Infrastructure-as-a-Service. The VMware storage architecture, for example, delivers HCI via one of two options: first, bolting storage software onto a hypervisor, and second, building storage into the hypervisor. The bolt-on approach runs third-party storage software in virtual machines that sit on top of a hypervisor. This is an easier approach, but it comes with the following limitations: 1. excessive resource usage, 2. lower performance and longer latencies, and 3. limited integration across hybrid and multiple environments. The other approach is the built-in one, where the storage software is in the kernel or built directly into the hypervisor. The convergence does not happen on a hypervisor using a virtual appliance but instead happens inside the hypervisor.

The advantages of this approach include 1. reduced resource usage, 2. better performance and lower latencies, and 3. tight integration enabling end-to-end management from a single tool and a simplified operational model. The advantages of built-in hyperconverged storage are 1. there is no need to dedicate certain virtual central processing units or a virtual storage appliance on a per-host basis, 2. CPU resources are used only when they are needed and CPUs don't need to be reserved for the worst-case scenario, and 3. CPU cycles go through one stack comprising just the hypervisor instead of both the hypervisor and the guest operating system.

VMware does this by providing compute, storage, networking and management in a single integrated layer of software that runs on industry-standard Intel-based hardware. This helps to radically simplify the data center. Fully configured x86 servers have VMware virtual SAN, vSphere and vCenter installed, which then provide a single layer on which VMs can be hosted.

Some of the considerations towards the appropriateness of an HCI for business needs include: licensing and support, use of embedded storage or a virtual storage appliance, combination and scalability with hybrid resources, native backup and disaster recovery capabilities, and integration with cloud computing.


Friday, August 19, 2022

This is a continuation of a series of articles on hosting solutions and services on Azure public cloud with the most recent discussion on Multitenancy here. The previous articles introduced serverless batch processing with durable functions in Azure Container Instances. This article introduces multitenancy via hyperconvergence.  

Hyperconverged infrastructure integrates compute, storage, networking, virtualization and automation into a single system that is delivered via an appliance or software that can be installed on existing hardware.

The advantages of a hyperconverged over conventional infrastructure include the following:

1. It increases agility for IT systems to be brought online and scaled out to support dynamic businesses. When new resources are needed, a hyperconverged infrastructure can transition from power-on to provisioning readiness within a short time.

2. It reduces upfront capital and operational costs because fewer components are required than in conventional infrastructures. There is no more need for specialists in each discipline, as there was with conventional datacenters, which made integration harder. The holistic and integrated approach of hyperconvergence saves costs.

3. This system is prebuilt and workload-optimized, which reduces complexity. The blocks of the architecture can be easily assembled by "snapping" them together, and this permits scaling seamlessly.

4. By providing one platform, it helps to transition applications and resources to the cloud. Deploying applications to a hyperconverged infrastructure paves the way for deploying them to the cloud as cloud-native applications and removes a number of concerns from the applications.

5. This system is performant and resilient, and it is a quick and affordable way to modernize IT infrastructure. This helps businesses operate securely and efficiently.

Virtualization is a tenet of most hyperconverged systems. The hypervisor included for virtualization enables multiple operating systems to be hosted and enables nesting. Clusters are deployed to distribute operational functions across nodes and enable scalability with the addition of new nodes.

Hyperconverged infrastructure is sometimes referred to as a datacenter in a box because it requires only initial cabling and minimal networking configuration. A single vendor can now provide the servers, storage and hypervisor, making it easier to support, patch, upgrade, and manage the resources. This reduces costs, time to deploy, and training requirements for personnel.

Since the storage is directly embedded, many inefficiencies of legacy protocols are avoided, which improves efficiency and performance. Manageability improves with a native hypervisor.

Some of the considerations towards the appropriateness of an HCI for business needs include: licensing and support, use of embedded storage or a virtual storage appliance, combination and scalability with hybrid resources, native backup and disaster recovery capabilities, and integration with cloud computing.

Thursday, August 18, 2022

 

This is a continuation of a series of articles on hosting solutions and services on Azure public cloud with the most recent discussion on Multitenancy here. The previous articles introduced serverless batch processing with durable functions in Azure Container Instances. This article mentions some of the tenancy considerations in Azure Active Directory.

Azure Active Directory organizes objects like users and applications into groups called tenants. These are security and operational management units from Azure Active Directory's point of view. Organizations can set policies on the users pertaining to a tenant and on the applications that the organization owns. Developers can choose to configure applications to be either single-tenant or multi-tenant during the app registration process in the Azure Portal. Single-tenant applications are bound to the home tenant. Multi-tenant applications are available to users across tenants. Azure Active Directory uses the audience property to configure single-tenancy or multi-tenancy. In the single-tenant mode, all the accounts are in one directory only, and the users and guest accounts in that directory can use the application or API. This is the preferred choice for an audience internal to an organization. In the multi-tenant mode, such as for schools and businesses using Microsoft 365, all users and guests can use the application or API even if their accounts are in any Azure AD directory. It is also possible to target an audience that wants to use both work and personal accounts with the multi-tenant setting, and this is the preferred choice for the widest possible audience involving Microsoft accounts.
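The chosen audience surfaces in client code through the authority the application signs in against. As a minimal sketch, assuming MSAL.NET and placeholder client and tenant IDs, a single-tenant application uses its home tenant's authority while a multi-tenant one uses "organizations" or "common":

using Microsoft.Identity.Client;

// Single-tenant: only accounts in the home tenant can sign in.
var singleTenantApp = PublicClientApplicationBuilder
    .Create("<client-id>")                      // placeholder app registration ID
    .WithAuthority("https://login.microsoftonline.com/<tenant-id>")
    .Build();

// Multi-tenant: "organizations" allows any Azure AD tenant;
// "common" additionally allows personal Microsoft accounts.
var multiTenantApp = PublicClientApplicationBuilder
    .Create("<client-id>")
    .WithAuthority("https://login.microsoftonline.com/common")
    .WithRedirectUri("http://localhost")
    .Build();

// Interactive sign-in requesting only the permission actually needed.
var result = await multiTenantApp
    .AcquireTokenInteractive(new[] { "User.Read" })
    .ExecuteAsync();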

When applications are added to Azure AD, there are two representations: 1. application objects, which are the proper definitions of the applications, and 2. service principals, which can be considered representatives of applications; there can be multiple representatives for the same application object. Applications can be managed in the Azure Portal through the App registrations experience.

Application objects can be created by using application registrations in the Azure portal, by creating new applications in Visual Studio configured to use Azure AD authentication, by adding one from the application store, by using the Microsoft Graph API or PowerShell to register one, or via developer centers. Service principals are created when users sign in to a third-party application integrated with Azure AD, when users sign in to Microsoft online services like Microsoft 365, when an application is added from the application store, when an application uses the Azure AD Application Proxy or is configured as part of single sign-on, or programmatically via the Microsoft Graph API or PowerShell.
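As a hedged illustration of the programmatic route, the following sketch uses the Microsoft Graph .NET SDK (v5-style calls) to register an application object and then create its service principal; the display name and credential choice are assumptions:

using Azure.Identity;
using Microsoft.Graph;
using Microsoft.Graph.Models;

// Authenticate interactively; any TokenCredential that carries the
// required Application.ReadWrite permission would do here.
var graph = new GraphServiceClient(new InteractiveBrowserCredential());

// The application object is the definition; signInAudience selects
// single-tenant ("AzureADMyOrg") or multi-tenant ("AzureADMultipleOrgs").
var app = await graph.Applications.PostAsync(new Application
{
    DisplayName = "sample-multitenant-app",    // assumed name
    SignInAudience = "AzureADMultipleOrgs"
});

// The service principal is the representative of that application
// object within this tenant.
var sp = await graph.ServicePrincipals.PostAsync(new ServicePrincipal
{
    AppId = app.AppId
});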

 

The best practices for multi-tenant applications depend on the number of different policies that IT administrators can set in their tenants. Some practices apply generally to multi-tenant applications: testing the application in a tenant that has conditional access policies configured, following the principle of least user access to ensure that the application only requests permissions it actually needs, and providing appropriate names and descriptions for permissions exposed by the application.

 

Reference: Multitenancy: https://1drv.ms/w/s!Ashlm-Nw-wnWhLMfc6pdJbQZ6XiPWA?e=fBoKcN   

 

Wednesday, August 17, 2022

This is a continuation of a series of articles on hosting solutions and services on Azure public cloud with the most recent discussion on Multitenancy here. The previous articles introduced serverless batch processing with durable functions in Azure Container Instances. This article describes the components of that approach and how Azure Container Instances fit in.

The components of this approach include:

1. An orchestration function, a durable function that orchestrates and performs the ACI container and app deployment, monitoring and cleanup.

2. An activity function, also a durable function, that creates the ACI container group and instance.

3. An Azure Container Registry that stores the batch processing app in a container image.

4. Azure Container Instances that run the batch processing jobs.

5. Azure Active Directory and Managed Service Identity, which are needed to manage the Azure Container Instances.

6. Application Insights, which monitors the job progress.

These components enable a series of durable functions to be connected, managed, monitored and retired. When the job completes, the batch processing job invokes the orchestrator function by raising an external event and provides a job status of Completed or Failed. Depending on the job status, the orchestrator function stops, restarts, or deletes the container group.
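As a minimal sketch of that lifecycle, assuming the C# in-process Durable Functions model with hypothetical activity names (CreateContainerGroup, DeleteContainerGroup, RestartContainerGroup) and a hypothetical JobSpec input type:

using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

public static class BatchOrchestrator
{
    [FunctionName("BatchOrchestrator")]
    public static async Task Run(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        // The activity function creates the ACI container group from the
        // image stored in the Azure Container Registry.
        var containerGroup = await context.CallActivityAsync<string>(
            "CreateContainerGroup", context.GetInput<JobSpec>());

        // The batch job reports back by raising an external event
        // carrying a status of Completed or Failed.
        var status = await context.WaitForExternalEvent<string>("JobCompleted");

        // Clean up or retry depending on the reported status.
        if (status == "Completed")
            await context.CallActivityAsync("DeleteContainerGroup", containerGroup);
        else
            await context.CallActivityAsync("RestartContainerGroup", containerGroup);
    }
}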

The scenarios that suit multitenant serverless batch processing are ones where workloads are simple and use only one container image. Another use case might be one where computing needs vary depending on each individual job. Multi-tenant scenarios, where some tenants need large computing power and other tenants have small computing requirements, represent hybrid requirements, and serverless batch processing can help in this regard.

Some considerations apply to these batch processing scenarios. Long-term stable workloads are better served by orchestrating containers in a cluster of dedicated virtual machines rather than on Azure Container Instances. However, ACI can quickly expand and contract the overall capacity to meet surges and peak traffic. For variable load, it is also the most efficient, fast and cost-effective way to scale the number of instances.

Rather than scale out the number of dedicated virtual machines and then deploy more containers onto those machines, durable functions can be used to schedule and manage the container deployment and deletion. ACI enables a layered approach to orchestration by providing all of the scheduling and management needed to run single containers, while allowing orchestrator platforms to manage multi-container tasks and capabilities like scaling and coordinated upgrades.

#codingexercise

A simple regular expression matcher supporting ^, $, . and *:

int matchHere(char* regexp, char* text);
int matchStar(int c, char* regexp, char* text);

/* match: search for regexp anywhere in text */
int match(char* regexp, char* text)
{
    if (regexp[0] == '^')
        return matchHere(regexp+1, text);
    do {    /* must look even if string is empty */
        if (matchHere(regexp, text))
            return 1;
    } while (*text++ != '\0');
    return 0;
}

/* matchHere: search for regexp at beginning of text */
int matchHere(char* regexp, char* text)
{
    if (regexp[0] == '\0')
        return 1;
    if (regexp[1] == '*')
        return matchStar(regexp[0], regexp+2, text);
    if (regexp[0] == '$' && regexp[1] == '\0')
        return *text == '\0';
    if (*text != '\0' && (regexp[0] == '.' || regexp[0] == *text))
        return matchHere(regexp+1, text+1);
    return 0;
}

/* matchStar: search for c*regexp at beginning of text */
int matchStar(int c, char* regexp, char* text)
{
    do {    /* a * matches zero or more instances */
        if (matchHere(regexp, text))
            return 1;
    } while (*text != '\0' && (*text++ == c || c == '.'));
    return 0;
}

 Reference: https://1drv.ms/w/s!Ashlm-Nw-wnWhLYK8RjQu87av0PAPA for this article.

Tuesday, August 16, 2022

This is a continuation of a series of articles on hosting solutions and services on Azure public cloud with the most recent discussion on Multitenancy here. The previous articles discussed Azure Arc instances and this one introduces serverless batch processing with durable functions in Azure Container Instances.

Durable Functions is an extension of Azure Functions that lets us write stateful functions in a serverless compute environment. It involves the use of an orchestration function to orchestrate the execution of other durable functions within a function app. Orchestration functions define function workflows using procedural code, without declarative schemas or designers. Functions can call other durable functions synchronously and asynchronously, and output can be saved to variables. They are durable and reliable because the execution is automatically checkpointed when the function awaits or yields; local state is not lost during reboots or failures, and they can be long running.

Durable functions might also involve entity functions, which define operations for reading and updating small pieces of state, known as durable entities. Like orchestrator functions, entity functions are functions with a special trigger type, and they manage the state of an entity explicitly rather than implicitly representing state via control flow. Entities provide a means for scaling out applications by distributing the work across many entities, each with a modest-sized state. The extension manages state, checkpoints and restarts behind the scenes.
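As a minimal sketch of a durable entity, assuming the C# in-process model and a toy counter as the piece of state:

using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

public static class Counter
{
    [FunctionName("Counter")]
    public static void Run([EntityTrigger] IDurableEntityContext ctx)
    {
        // Each operation reads or updates the entity's persisted state;
        // the extension checkpoints the state between operations.
        switch (ctx.OperationName)
        {
            case "add":
                ctx.SetState(ctx.GetState<int>() + ctx.GetInput<int>());
                break;
            case "get":
                ctx.Return(ctx.GetState<int>());
                break;
        }
    }
}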

Durable functions can be used to schedule, manage and deploy serverless batch processing jobs in Azure Container Instances. Containers are popular for packaging, deploying and managing code hosted on orchestration frameworks such as Azure Kubernetes Service and Azure Service Fabric. 

The scenarios for using multitenant serverless batch processing are ones where workloads are simple and use only one container image. Another use case might be one where computing needs vary depending on each individual job. Multi-tenant scenarios, where some tenants need large computing power and other tenants have small computing requirements, represent hybrid requirements, and serverless batch processing can help in this regard.

The architecture almost always involves a series of durable functions used with a Managed Service Identity. The batch processing job is packaged into a container image stored in an Azure Container Registry. An HTTP trigger invokes the orchestration function to orchestrate the deployment. An activity function uses the container image stored in the ACR to create an ACI container in a container group. The orchestration function uses the container URL to call and start the batch processing job and to monitor the job's progress. When the job completes, it raises an external event toward the orchestration function and provides a job status of Completed or Failed. Depending on the job status, the orchestration function stops, restarts or deletes the container group. Variations of this architecture involve the use of restart policies to control the container instances and the use of a full-fledged orchestration framework to manage complex multi-container tasks and interactions.
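The HTTP trigger that kicks off such an orchestration might look like this minimal sketch (the orchestrator name BatchOrchestrator is an assumption carried over from the sketch in the previous article):

using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;
using Microsoft.Azure.WebJobs.Extensions.Http;

public static class StartBatchJob
{
    [FunctionName("StartBatchJob")]
    public static async Task<HttpResponseMessage> Run(
        [HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequestMessage req,
        [DurableClient] IDurableOrchestrationClient starter)
    {
        // Start a new orchestration instance and hand back the standard
        // status-query response so the caller can poll for progress.
        string instanceId = await starter.StartNewAsync("BatchOrchestrator", null);
        return starter.CreateCheckStatusResponse(req, instanceId);
    }
}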

 


Monday, August 15, 2022

This is a continuation of a series of articles on hosting solutions and services on Azure public cloud with the most recent discussion on Multitenancy here. This article continues to discuss troubleshooting the Azure Arc instance but introduces common experiences across Azure Arc-enabled data services.

One of the commonly encountered scenarios is resource browsing. Azure Data Studio provides an experience similar to the Azure portal for viewing information, and it is useful in cases where a connection to Azure is not available. This requires both Azure Data Studio and the Azure Arc extension to be installed. The connection is made to the controller, and the namespace is entered for the data controller. Azure Data Studio reads from the kube.config file in the default directory and lists the available Kubernetes cluster contexts.

The data controller lists the details of resources such as name, region, connection mode, resource group, subscription, controller endpoint, and namespace. It also has links to open the resource in the Azure Portal.

The SQL managed instance dashboard allows us to manage those instances. The overview shows the resource group, data controller, subscription ID, status, region, and other information. It also links to the Grafana dashboard for viewing metrics or the Kibana dashboard for viewing logs.

Connection strings are also made available to developers and applications.

The PostgreSQL Hyperscale server group dashboard shows details about the server group such as resource group, data controller, subscription ID, status, region and more. The properties and resource health tabs display additional information and there is an option to diagnose and solve problems.

Billing data can be uploaded to and viewed from the Azure Portal. This depends on the connectivity mode of the instance, whether it is indirectly connected or directly connected. The indirect mode does not have an automatic export; the data is periodically exported, uploaded to Azure, and processed. The process of exporting and uploading the data can be automated via scripts.

The guidance for uploading billing data mentions the installation of tools such as the Azure CLI and the arcdata extension. The resource providers must be registered, and a service principal must be created and assigned roles.

The billing data can be viewed from the Azure Portal. The Cost Management tab shows the cost analysis by resource, and filters are available to narrow down the analysis. The billing data can also be exported with the Exports tab.

The Azure Portal is also available to browse the resources when there is a connection between them. Decommissioning a resource depends on whether it is directly connected or indirectly connected.

#codingexercise

When two nodes of a BST are swapped, they can be found with an in-order traversal that records the nodes where the ascending order is violated:

void InOrderTraverse(Node root, ref Node prev, ref List<Node> pairs)
{
    if (root == null) return;
    InOrderTraverse(root.left, ref prev, ref pairs);
    // an in-order predecessor larger than the current node marks a swapped pair
    if (prev != null && root.data < prev.data) { pairs.Add(prev); pairs.Add(root); }
    prev = root;
    InOrderTraverse(root.right, ref prev, ref pairs);
}

// After the traversal, the swapped nodes are the first and last entries in pairs.


Sunday, August 14, 2022

This is a continuation of a series of articles on hosting solutions and services on Azure public cloud with the most recent discussion on Multitenancy here. This article continues to discuss troubleshooting the Azure Arc instance with data collection and reporting.

Data transmitted from the Azure Arc data services can be tremendously helpful for the management of resources. The components involved may include: SQL MI – Azure Arc, PostgreSQL Hyperscale – Azure Arc, Azure Data Studio, Azure CLI (az) and Azure Data CLI (azdata). When a cluster is configured to be directly connected to Azure, some data is automatically transmitted to Microsoft. Operational data from metrics and logs is automatically uploaded. Billing and inventory data, such as the number of instances and usage such as vCores consumed, is automatically sent to Microsoft and is required from instances. Diagnostics information for troubleshooting purposes is not automatically sent; it must be sent on demand. The Customer Experience Improvement Program (CEIP) summary is sent automatically, but only if it has been opted into.

When a cluster is not configured to be directly connected to Azure, it does not automatically transmit operational or billing and inventory data to Microsoft. Data can be transmitted to Microsoft when it is configured to be exported. The data and mechanisms are similar to those in the directly connected mode. The CEIP summary, if allowed, can be automatically transmitted.

Metrics include performance and capacity related metrics, which are collected to an InfluxDB provided as part of Azure Arc-enabled data services, and these can be viewed on a Grafana dashboard. This is customary for many Kubernetes products.

Logs emitted by all components are collected to an Elasticsearch database, also provided as part of Azure Arc-enabled data services. These logs can be viewed on the Kibana dashboard.

If the data is sent to Azure Monitor or Log Analytics, the destination region/zone can be specified and access to view can be granted to other regions.

Billing data is used for all the resources, which can be categorized into the following types: Azure Arc-enabled SQL managed instances, PostgreSQL Hyperscale server groups, SQL Server on Azure Arc-enabled servers, and the data controller. Every database instance and the data controller itself will be reflected in Azure as an Azure resource in the Azure Resource Manager.

The JSON data pertaining to a resource has attributes such as customObjectName, uid, instanceName, instanceNamespace, instanceType, location, resourceGroupName, subscriptionId, isDeleted, externalEndpoint, vCores, createTimestamp, and updateTimestamp.
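As an illustration only, these attributes could map onto a class like the following sketch; the class name and type choices are assumptions, not the documented schema:

public class ArcResourceBillingRecord
{
    public string customObjectName { get; set; }   // name of the custom resource
    public string uid { get; set; }
    public string instanceName { get; set; }
    public string instanceNamespace { get; set; }
    public string instanceType { get; set; }       // e.g., SQL MI or PostgreSQL
    public string location { get; set; }
    public string resourceGroupName { get; set; }
    public string subscriptionId { get; set; }
    public bool isDeleted { get; set; }
    public string externalEndpoint { get; set; }
    public int vCores { get; set; }
    public string createTimestamp { get; set; }
    public string updateTimestamp { get; set; }
}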

Diagnostic data has attributes such as error logs, which include log files capturing errors and are restricted and shared by users. Attributes also include DMVs, which can contain queries and query plans but are restricted and shared by users; views, which can contain customer data but are restricted and shared only by users; crash dumps involving customer data, with a maximum thirty-day retention of crash dumps; and statistics objects and crash dumps involving personal data, such as machine names, login names, emails, locations and other identifiable information.

#codingexercise 

Check if a root-to-leaf path sums to a given value in a binary tree.

bool hasPathSum(Node root, int sum)
{
    if (root == null) return false;
    int newsum = sum - root.data;
    // a match counts only when it ends at a leaf
    if (newsum == 0 && root.left == null && root.right == null) return true;
    return hasPathSum(root.left, newsum) || hasPathSum(root.right, newsum);
}
}