Sunday, July 31, 2022

 

This is a continuation of a series of articles on hosting solutions and services on the Azure public cloud, with the most recent discussion on Multitenancy here. This article discusses Azure Arc-enabled servers.

Azure Arc-enabled servers expose a hybrid inventory to the Azure management plane. Windows and Linux physical servers and virtual machines hosted outside of Azure, whether on the corporate network or in other clouds, become first-class citizens as Azure resources when they are Azure Arc-enabled.

When an Azure Arc-enabled server is connected, it receives a resource ID and can be included in a resource group. Standard Azure constructs, such as Azure Policy and tagging, then apply to it.

These diverse machines are connected by installing the Azure Connected Machine agent on each machine. This agent does not deliver any other functionality, and it does not replace the Azure Log Analytics agent or the Azure Monitor Agent. There are several deployment methods for installing the agent on those external servers.

The supported cloud operations are govern, protect, configure, and monitor. Governance is enabled with Azure Policy guest configurations that audit settings inside the machine. Non-Azure servers can be protected with Microsoft Defender for Endpoint and enrolled through Microsoft Defender for Cloud for threat detection, vulnerability management, and monitoring of potential security threats. Microsoft Sentinel can be used for SIEM purposes. Configuration is enabled with Azure Automation for frequent and time-consuming management tasks. Configuration changes to installed software, Microsoft services, the Windows registry and files, and Linux daemons can be assessed using Change Tracking and Inventory, and Update Management can be used to patch Windows and Linux servers. Post-deployment configuration and automation tasks can be performed using the Azure Arc-enabled servers VM extensions. Operating system performance can be monitored using VM insights, and other log data, such as performance data and events, can be stored in a Log Analytics workspace.

Instance metadata about the connected machines is collected and stored in the region where the Azure Arc machine resource is configured. It includes details such as the operating system name and version, the computer name, the computer's fully qualified domain name, and the Connected Machine agent version.

The status for a connected machine can be viewed in the Azure Portal under Azure Arc -> Servers.

The Connected Machine agent sends a regular heartbeat message from each machine; if the heartbeat stops, the machine is considered disconnected within 15 to 30 minutes. The machine identity's credential is valid for up to 90 days and is renewed every 45 days. Azure Arc-enabled servers have a limit on the number of instances that can be created in each resource group, but there are no limits at the subscription or service level.
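
For illustration, the same inventory and status can be read programmatically once machines are connected. The following is a minimal sketch, assuming the azure-identity and azure-mgmt-hybridcompute Python packages; the resource group name is a hypothetical placeholder, and depending on the SDK version these fields may be nested under machine.properties:

# Minimal sketch: enumerate Azure Arc-enabled servers in a resource
# group and print their connection status. Assumes the azure-identity
# and azure-mgmt-hybridcompute packages; "arc-demo-rg" is a
# hypothetical resource group name.
from azure.identity import DefaultAzureCredential
from azure.mgmt.hybridcompute import HybridComputeManagementClient

client = HybridComputeManagementClient(DefaultAzureCredential(), "<subscription-id>")

for machine in client.machines.list_by_resource_group("arc-demo-rg"):
    # status reflects the agent heartbeat: Connected, Disconnected, or Expired
    print(machine.name, machine.status, machine.os_name, machine.agent_version)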


Saturday, July 30, 2022

 

Monte Carlo methods

Model: Monte Carlo methods try to solve deterministic problems using optimizations based on probabilistic interpretations; they draw samples from a probability distribution, and simulated annealing is a special case. When trials and errors are scattered in their results, an objective function that can measure the cost or benefit will help with convergence. If the samples are large, a batch analysis mode is recommended. Minimizing or maximizing the objective function is also possible via gradient descent methods, but simulated annealing can escape local minima because it will accept a worse solution with a certain probability. In simulated annealing, the current cost and the cost after a change in some direction are computed; an improvement is always accepted, a worse move is accepted only probabilistically, and the temperature decreases on every iteration so that worse moves become less likely over time.

 

Sample implementation follows:

import math
import random

def annealingoptimize(domain, costf, T=10000.0, cool=0.95, step=1):
    # Initialize the values randomly within each dimension's bounds
    vec = [float(random.randint(domain[i][0], domain[i][1]))
           for i in range(len(domain))]
    while T > 0.1:
        # Choose one of the indices
        i = random.randint(0, len(domain) - 1)
        # Choose a direction to change it
        dir = random.randint(-step, step)
        # Create a new list with one of the values changed
        vecb = vec[:]
        vecb[i] += dir
        if vecb[i] < domain[i][0]: vecb[i] = domain[i][0]
        elif vecb[i] > domain[i][1]: vecb[i] = domain[i][1]

        # Calculate the current cost and the new cost
        ea = costf(vec)
        eb = costf(vecb)
        # Accept improvements outright; accept worse solutions with
        # probability e^(-(eb-ea)/T), which shrinks as T cools
        p = pow(math.e, -(eb - ea) / T)
        if (eb < ea or random.random() < p):
            vec = vecb
        # Decrease the temperature
        T = T * cool
    return vec
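
As a quick usage sketch, the optimizer can be exercised with a toy cost function; here the domain is four variables, each constrained to [0, 9], and the cost is simply their sum, so the search should drive the values toward zero:

domain = [(0, 9)] * 4

def sumcost(vec):
    # toy objective: the smaller the sum, the better the solution
    return sum(vec)

best = annealingoptimize(domain, sumcost)
print(best)  # values near [0.0, 0.0, 0.0, 0.0] on most runs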

 

 

Friday, July 29, 2022

 

This is a continuation of a series of articles on hosting solutions and services on the Azure public cloud, with the most recent discussion on Multitenancy here, and picks up the discussion on the checklist for architecting and building multitenant solutions. Administrators will find the list familiar.

While the earlier articles introduced the checklist as structured around business and technical considerations, the more recent articles provide information on a specific technology named Azure Arc.

Azure Arc is a bridge that extends the Azure platform to applications and services with the flexibility to run across datacenters, the edge, and multi-cloud environments. Cloud-native applications can be developed with a consistent development, operations, and security model because Azure Arc runs on new and existing hardware, virtualization and Kubernetes platforms, IoT devices, and integrated systems.

Azure Arc supports custom locations, which provide a reference to a deployment target that administrators set up and users can access when creating a resource. The details of the backing infrastructure are hidden, and only the reference is needed by the users. A custom location is an Azure Resource Manager resource, and it supports Azure role-based access control, so an administrator or operator can determine which users have access through which roles.

Resources can be created on a namespace within a Kubernetes cluster to target the deployment of an Azure Arc-enabled database instance, and custom locations can also be created on other infrastructure platforms, such as vCenter and Azure Stack HCI, to deploy and manage virtual machines.

On a Kubernetes cluster, an Azure Arc custom location references an abstraction of a namespace within the cluster and can be associated with the granular RoleBindings and ClusterRoleBindings necessary for other services. Developers can then deploy applications without having to know the details of the namespace and the Kubernetes cluster.
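
As a rough sketch of what targeting a custom location can look like from code, a resource create request carries an extendedLocation reference alongside the usual properties. The snippet below is illustrative only; it assumes the azure-mgmt-resource Python package surfaces an extended_location field on GenericResource, and every name and API version in it is a hypothetical placeholder:

# Illustrative sketch only: create a resource against an Azure Arc
# custom location by passing an extendedLocation reference through
# Azure Resource Manager. Assumes azure-mgmt-resource exposes
# extended_location on GenericResource; all names are hypothetical.
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient
from azure.mgmt.resource.resources.models import ExtendedLocation, GenericResource

client = ResourceManagementClient(DefaultAzureCredential(), "<subscription-id>")

custom_location_id = (
    "/subscriptions/<subscription-id>/resourceGroups/arc-rg"
    "/providers/Microsoft.ExtendedLocation/customLocations/factory-site"
)

poller = client.resources.begin_create_or_update(
    resource_group_name="arc-rg",
    resource_provider_namespace="Microsoft.AzureArcData",
    parent_resource_path="",
    resource_type="sqlManagedInstances",
    resource_name="demo-sql-mi",
    api_version="<api-version>",
    parameters=GenericResource(
        location="eastus",
        # The reference is all a user needs; the backing namespace stays hidden
        extended_location=ExtendedLocation(type="CustomLocation", name=custom_location_id),
        properties={},
    ),
)
print(poller.result().id)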

On Azure Arc-enabled VMware vSphere, VM lifecycle operations can be executed directly from Azure. VM templates, networks, and storage can be browsed easily from the portal, and guest management can be enabled across Azure and VMware virtual machines.


Reference: Multitenancy: https://1drv.ms/w/s!Ashlm-Nw-wnWhLMfc6pdJbQZ6XiPWA?e=fBoKcN

Thursday, July 28, 2022

 

This is a continuation of a series of articles on hosting solutions and services on the Azure public cloud, with the most recent discussion on Multitenancy here, and picks up the discussion on the checklist for architecting and building multitenant solutions. Administrators will find the list familiar.

While the earlier articles introduced the checklist as structured around business and technical considerations, the more recent articles provide information on a specific technology named Azure Arc.

Azure Arc is a bridge that extends the Azure platform to applications and services with the flexibility to run across datacenters, the edge, and multi-cloud environments. Cloud-native applications can be developed with a consistent development, operations, and security model because Azure Arc runs on new and existing hardware, virtualization and Kubernetes platforms, IoT devices, and integrated systems.

Azure Arc supports custom locations, which provide a reference to a deployment target that administrators set up and users can access when creating a resource. The details of the backing infrastructure are hidden, and only the reference is needed by the users. A custom location is an Azure Resource Manager resource, and it supports Azure role-based access control, so an administrator or operator can determine which users have access through which roles.

Resources can be created on a namespace within a Kubernetes cluster to target the deployment of an Azure Arc-enabled database instance, and custom locations can also be created on other infrastructure platforms, such as vCenter and Azure Stack HCI, to deploy and manage virtual machines.

On a Kubernetes cluster, an Azure Arc custom location references an abstraction of a namespace within the cluster and can be associated with the granular RoleBindings and ClusterRoleBindings necessary for other services. Developers can then deploy applications without having to know the details of the namespace and the Kubernetes cluster.

On Azure Arc-enabled VMware vSphere, VM lifecycle operations can be executed directly from Azure. VM templates, networks, and storage can be browsed easily from the portal, and guest management can be enabled across Azure and VMware virtual machines.

#codingexercise

Find the nodes of a binary tree whose subtrees contain exactly k leaves:

int GetNodeWithKLeaves(Node root, int k, ref List<Node> result)
{
    if (root == null) return 0;
    if (root.left == null && root.right == null)
    {
        // a leaf is itself a subtree with exactly one leaf
        if (k == 1) result.Add(root);
        return 1;
    }
    int left = GetNodeWithKLeaves(root.left, k, ref result);
    int right = GetNodeWithKLeaves(root.right, k, ref result);
    // the leaf count of this subtree is the sum of both sides
    if (left + right == k)
    {
        result.Add(root);
    }
    return left + right;
}

Wednesday, July 27, 2022

 

This is a continuation of a series of articles on hosting solutions and services on the Azure public cloud, with the most recent discussion on Multitenancy here, and picks up the discussion on the checklist for architecting and building multitenant solutions. Administrators will find the list familiar.

While the previous article introduced the checklist as structured around business and technical considerations, this article provides information on a specific technology named Azure Arc.

Azure Arc is a bridge that extends the Azure platform to applications and services with the flexibility to run across datacenters, the edge, and multi-cloud environments. Cloud-native applications can be developed with a consistent development, operations, and security model because Azure Arc runs on new and existing hardware, virtualization and Kubernetes platforms, IoT devices, and integrated systems.

It helps to centrally manage a wide range of resources, including Windows and Linux servers, SQL Server, Kubernetes clusters, and Azure services. It performs virtual machine lifecycle management across platforms such as Azure Stack HCI, VMware, and System Center VMM environments from a centralized location. It helps meet governance and compliance standards for apps, infrastructure, and data with Azure Policy, and other services such as Azure Monitor, Microsoft Defender for Cloud, and Update Management can be enrolled. The Microsoft Cloud Adoption Framework continues to provide guidance for hybrid and multi-cloud management.
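
As a small illustration of the governance point, assigning an Azure Policy at a scope that contains Arc-enabled machines brings them under the same evaluation as native Azure resources. The following minimal sketch assumes the azure-identity and azure-mgmt-resource Python packages; the policy definition GUID, scope, and names are hypothetical placeholders:

# Minimal sketch: assign an Azure Policy at a resource-group scope so
# Arc-enabled machines there are evaluated like native resources.
# Assumes azure-mgmt-resource; the definition ID and scope below are
# hypothetical placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import PolicyClient

client = PolicyClient(DefaultAzureCredential(), "<subscription-id>")

scope = "/subscriptions/<subscription-id>/resourceGroups/arc-rg"
assignment = client.policy_assignments.create(
    scope,
    "audit-arc-machines",
    {
        "policy_definition_id": "/providers/Microsoft.Authorization"
                                "/policyDefinitions/<definition-guid>",
        "display_name": "Audit settings on Arc-enabled machines",
    },
)
print(assignment.id)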

Cloud-native applications can be built anywhere and at scale with Azure Arc because DevOps practices can be applied anywhere and software can be built incrementally. Existing tools and practices involving source control and development environments continue to work with Azure Arc. Errors can be reduced with consistent policy enforcement. The APIs are written once but run anywhere with the help of Kubernetes, and this works even with machine-learning pipelines. Azure Arc takes advantage of elastic scale, consistent on-premises and multi-cloud management, and cloud-style billing models. Sprawling IT assets can now be controlled, organized, and governed. It simplifies governance and management by delivering a consistent multi-cloud and on-premises management platform with the following features:

-          It promotes visibility of non-Azure and on-premises resources in Azure Resource Manager.

-          It helps to manage virtual machines, Kubernetes clusters, and databases as if they were running in Azure.

-          It helps to use familiar Azure services and management capabilities.

-          It supports continued use of traditional ITOps practices while introducing DevOps practices to support new cloud-native patterns in the environment.

-          It helps to configure custom locations as an abstraction layer on top of Azure Arc-enabled Kubernetes clusters.


#codingexercise 

Two nodes of a BST are swapped, find them
void InOrderTraverse(Node root, ref Node prev, ref List<Node> pairs)
{
    if (root == null) return;
    InOrderTraverse(root.left, ref prev, ref pairs);
    // a violation of the sorted in-order sequence marks a swapped pair
    if (prev != null && root.data < prev.data) { pairs.Add(prev); pairs.Add(root); }
    prev = root;
    InOrderTraverse(root.right, ref prev, ref pairs);
}


Given a binary tree and a value, find the root-to-leaf paths whose sum equals the given value.
void getPathSum(Node root, int sum, ref List<Node> path, ref List<List<Node>> paths)
{
    if (root == null) return;
    path.Add(root);
    getPathSum(root.left, sum, ref path, ref paths);
    getPathSum(root.right, sum, ref path, ref paths);
    // at a leaf, compare the accumulated root-to-leaf sum with the target
    if (root.left == null && root.right == null && path.Sum(n => n.data) == sum)
        paths.Add(new List<Node>(path));
    path.RemoveAt(path.Count - 1);
}

Tuesday, July 26, 2022

 

This is a continuation of a series of articles on hosting solutions and services on the Azure public cloud, with the most recent discussion on Multitenancy here, and picks up the discussion on the checklist for architecting and building multitenant solutions. Administrators will find the list familiar.

While the previous article introduced the checklist as structured around business and technical considerations, it provided specific examples in terms of Microsoft technologies. This article focuses on the open-source scenarios on Azure, specifically with the Apache stack. Some open-source products, such as Cassandra and Storm, were studied earlier for hosting on the Azure public cloud. This article focuses on stream processing with fully managed open-source data engines.

Azure offers generally available data services that run open-source engines. These include:

-          Azure Event Hubs, which offers a Kafka interface for stream ingestion.

-          Azure Cosmos DB, which supports event storage through its Cassandra API.

-          Azure Kubernetes Service (AKS), which hosts Kubernetes microservices for stream processing.

-          Azure Database for PostgreSQL, which manages relational tables.

-          Azure Cache for Redis, which provides an in-memory data store.

This brings the benefits of both open source and Azure managed services. The open-source solution is preferred for its ability to migrate existing workloads, tap into the broader open-source community, and limit vendor lock-in. The open-source technologies are made more accessible by public cloud services that offer high availability, high performance, improved scalability, and elasticity. This form of stream-based solution can be used both for new workloads and for migrating existing workloads.

A solution comprising the above-mentioned services and their open-source engines would be laid out this way: streaming sources stream events to a Kafka topic in Azure Event Hubs using one or more Kafka producers. AKS provides a managed environment for Apache Spark, which consumes the events. Microservices hosted on AKS write the events to Azure Cosmos DB using the Cassandra API, and the change feed feature of Cosmos DB processes the events in real time. Applications then batch-process these events and emit enriched information into PostgreSQL, and this relational data store relays the information downstream, for example for reporting. If an in-memory store is required besides persistence, Azure Cache for Redis can be leveraged; websites and other applications use the cached data to improve response times.
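
To make the first hop concrete, here is a minimal sketch of a Kafka producer pointed at the Event Hubs Kafka-compatible endpoint, assuming the kafka-python package; the namespace, topic (event hub) name, and connection string are hypothetical placeholders, with the connection string supplied as the SASL password:

# Minimal sketch: publish events to Azure Event Hubs through its
# Kafka-compatible endpoint. Assumes the kafka-python package; the
# namespace, topic name, and connection string are placeholders.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="<namespace>.servicebus.windows.net:9093",
    security_protocol="SASL_SSL",
    sasl_mechanism="PLAIN",
    sasl_plain_username="$ConnectionString",
    sasl_plain_password="Endpoint=sb://<namespace>.servicebus.windows.net/;...",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

producer.send("telemetry", {"deviceId": "sensor-01", "temperature": 21.7})
producer.flush()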

This kind of solution can have improved performance, scalability, security, and resiliency with just a few considerations. For example, the Cosmos DB datastore can apply a partitioning strategy to boost performance. Similarly, the PostgreSQL server can be set up with connection pooling to avoid the repeated setup and teardown of connections. Scalability can be improved with the premium tier for Event Hubs, and if the ingress exceeds a few gigabytes of data, the dedicated tier can be used to set up and tear down clusters in a single-tenant offering with guaranteed capacity. Azure Cosmos DB supports autoscaling of provisioned throughput when the workloads are unpredictable and spiky. Security can be improved with Azure Private Link so that traffic between the services flows over the Azure backbone without crossing the public internet, and keys can be managed and rotated with a key vault. Availability zones can be used to protect business-critical applications from datacenter failures. Cost optimization can be achieved by regulating throughput and scaling up only when demand increases; with the proper tier and model, costs can be kept within limits.
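
For instance, the connection pooling mentioned above can be approximated client-side. A minimal sketch with psycopg2's built-in pool follows (a dedicated pooler such as PgBouncer is the heavier-duty alternative); the host, database, and credentials are hypothetical placeholders:

# Minimal sketch: reuse PostgreSQL connections instead of paying the
# setup/teardown cost per request. Assumes psycopg2; the host and
# credentials below are hypothetical placeholders.
from psycopg2 import pool

pg_pool = pool.SimpleConnectionPool(
    minconn=1,
    maxconn=10,
    host="<server-name>.postgres.database.azure.com",
    dbname="reporting",
    user="<user>",
    password="<password>",
    sslmode="require",
)

conn = pg_pool.getconn()
try:
    with conn.cursor() as cur:
        cur.execute("SELECT count(*) FROM enriched_events;")
        print(cur.fetchone())
finally:
    pg_pool.putconn(conn)  # return the connection for reuse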

Monday, July 25, 2022

This is a continuation of a series of articles on hosting solutions and services on the Azure public cloud, with the most recent discussion on Multitenancy here, and picks up the discussion on the checklist for architecting and building multitenant solutions. Administrators will find the list familiar.

While the previous article introduced the checklist as structured around business and technical considerations, it provided specific examples in terms of Microsoft technologies. This article focuses on the open-source scenarios on Azure with the Apache stack specifically.

Each open-source product that is used in a multitenant solution must be carefully reviewed for the features it offers to support multitenancy. While the checklist alluded to some of the general requirements in terms of shared resources and tenant isolation, open-source products might articulate isolation simply by naming containers differently. Considerations for overcoming noisy-neighbor problems and scaling out the infrastructure must still be made to the degree that these products permit.

Let us take a few examples from the Apache stack. The data partitioning guidance for Apache Cassandra, for instance, describes how to separate data into partitions that can be managed and accessed separately; horizontal, vertical, and functional partitioning strategies must be suitably applied. Another example is where Azure public multi-access edge compute must provide high availability to the tenants, and Cassandra can be used to support geo-replication.
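
To illustrate the geo-replication point, a keyspace can be declared with NetworkTopologyStrategy so that replicas are placed in each datacenter. A minimal sketch using the cassandra-driver package follows; the contact point and datacenter names are hypothetical placeholders:

# Minimal sketch: create a geo-replicated keyspace with Apache
# Cassandra's NetworkTopologyStrategy. Assumes the cassandra-driver
# package; contact point and datacenter names are placeholders.
from cassandra.cluster import Cluster

cluster = Cluster(["<cassandra-host>"])
session = cluster.connect()

# Keep two replicas in each region so tenants in either geography
# can read and write locally.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS edge_events
    WITH replication = {
        'class': 'NetworkTopologyStrategy',
        'dc-east': 2,
        'dc-west': 2
    };
""")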

Apache Storm is used in edge computing and features true stream processing with low-level APIs. Trained AI models can be brought to the edge with Azure Stack Hub while Storm handles the data. The advantage of hosting the AI models close to the edge is that there is no latency in predictions from the events; the models can be trained on high-performance processors, including GPUs, but they do not need heavy-duty compute to host and run for making predictions. Storm can be the central point that receives all the events from the edge as well as their predictions.

Since neither Big Data nor relational stores are suitable for the ingestion, processing, and analysis of events, and those stores can grow large enough to overwhelm the continuous processing that events require, it is better to let the edge generate the events and use Storm for the stream. Storm is taken here as an example of a stream processing system, but it is not the only one; readers are encouraged to review Apache Kafka, Apache Flink, and Pulsar if they would like to leverage the nuances between their capabilities. There are also options available on the public cloud, such as HDInsight with Storm, which makes it easy to process and query data; these interactive SQL queries can execute fast and at scale over both structured and unstructured data. Stores like Cosmos DB can accommodate diverse and unpredictable IoT workloads without sacrificing ingestion or query performance. If real-time processing is required, then Storm and these stores can help with capturing events, analyzing them, and generating reports or automated responses with minimal latency.

If batch processing is required, Apache Sqoop can help with automation over Big Data. For example, Sqoop jobs can be used to copy data, and data transfer options such as Azure Import/Export, Data Box, and Sqoop can work with databases with little or no impact on performance. Oozie and Sqoop can be used to manage batch workflows for captured real-time data.

#codingexercise

Two nodes of a BST are swapped, find them
void InOrderTraverse(Node root, ref Node prev, ref List<Node> pairs)
{
    if (root == null) return;
    InOrderTraverse(root.left, ref prev, ref pairs);
    // a violation of the sorted in-order sequence marks a swapped pair
    if (prev != null && root.data < prev.data) { pairs.Add(prev); pairs.Add(root); }
    prev = root;
    InOrderTraverse(root.right, ref prev, ref pairs);
}

Sunday, July 24, 2022

Reducing Trials and Errors

Model: When trials and errors are scattered in their results, an objective function that can measure the cost or benefit will help with convergence. If the samples are large, a batch analysis mode is recommended. Minimizing or maximizing the objective function is also possible via gradient descent methods, but simulated annealing can escape local minima because it will accept a worse solution with a certain probability. In simulated annealing, the current cost and the cost after a change in some direction are computed; an improvement is always accepted, a worse move is accepted only probabilistically, and the temperature decreases on every iteration so that worse moves become less likely over time.
Sample implementation follows:

import math
import random

def annealingoptimize(domain, costf, T=10000.0, cool=0.95, step=1):
    # Initialize the values randomly within each dimension's bounds
    vec = [float(random.randint(domain[i][0], domain[i][1]))
           for i in range(len(domain))]
    while T > 0.1:
        # Choose one of the indices
        i = random.randint(0, len(domain) - 1)
        # Choose a direction to change it
        dir = random.randint(-step, step)
        # Create a new list with one of the values changed
        vecb = vec[:]
        vecb[i] += dir
        if vecb[i] < domain[i][0]: vecb[i] = domain[i][0]
        elif vecb[i] > domain[i][1]: vecb[i] = domain[i][1]

        # Calculate the current cost and the new cost
        ea = costf(vec)
        eb = costf(vecb)
        # Accept improvements outright; accept worse solutions with
        # probability e^(-(eb-ea)/T), which shrinks as T cools
        p = pow(math.e, -(eb - ea) / T)
        if (eb < ea or random.random() < p):
            vec = vecb
        # Decrease the temperature
        T = T * cool
    return vec

 

Saturday, July 23, 2022

 

This is a continuation of a series of articles on hosting solutions and services on the Azure public cloud, with the most recent discussion on Multitenancy here, and picks up the discussion on the checklist for architecting and building multitenant solutions. Administrators will find the list familiar.

While the previous article introduced the checklist as structured around business and technical considerations, it provided specific examples in terms of Microsoft technologies. This article focuses on the open-source scenarios on Azure.

Each open-source product that is used in a multitenant solution must be carefully reviewed for the features it offers to support multitenancy. While the checklist alluded to some of the general requirements in terms of shared resources and tenant isolation, open-source products might articulate isolation simply by naming containers differently. Considerations for overcoming noisy-neighbor problems and scaling out the infrastructure must still be made to the degree that these products permit.

Let us take a few examples from the Apache stack. The data partitioning guidance for Apache Cassandra, for instance, describes how to separate data into partitions that can be managed and accessed separately; horizontal, vertical, and functional partitioning strategies must be suitably applied. Another example is where Azure public multi-access edge compute must provide high availability to the tenants, and Cassandra can be used to support geo-replication.

In the analytics space, a typical scenario is to build solutions that integrate data from many IoT devices into a comprehensive data analysis architecture to improve and automate decision making. In this scenario, a Cassandra cluster is used to store data.

If the architecture involves an N-tier application with Apache Cassandra, then Linux virtual machines and a virtual network configured for N-tier applications must be deployed along with Apache Cassandra. If the data is non-relational or NoSQL, stored as key-value pairs, graphs, time-series objects, or other storage models, data access can leverage the Azure Cosmos DB Cassandra API as the service.
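
A minimal sketch of connecting through that API with the standard open-source driver follows, assuming the cassandra-driver package; Cosmos DB's Cassandra endpoint listens on port 10350 over TLS, and the account name and key below are hypothetical placeholders:

# Minimal sketch: use the open-source Cassandra driver against the
# Azure Cosmos DB Cassandra API endpoint. Assumes cassandra-driver;
# the account name and key are hypothetical placeholders.
import ssl
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

ssl_context = ssl.create_default_context()
auth = PlainTextAuthProvider(username="<account-name>", password="<account-key>")

cluster = Cluster(
    ["<account-name>.cassandra.cosmos.azure.com"],
    port=10350,
    auth_provider=auth,
    ssl_context=ssl_context,
)
session = cluster.connect()
print(session.execute("SELECT release_version FROM system.local").one())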

Stream processing for fully managed open-source data engines like Kafka, Kubernetes, Cassandra, PostgreSQL, and Redis components is also a typical scenario. Events could be streamed by using fully managed Azure data services.

Performance considerations for running Apache Cassandra on Azure virtual machines must be examined, and those recommendations can then be used as a baseline to test against the workload.

There must be some safeguards against the noisy neighbor antipattern, which is specific to some workloads. Service level objectives, and even service level agreements, could be defined based on the requirements of the tenants as well as the composite SLAs of the Azure resources. Reliability is easily impacted by scale, and service level agreements can suffer when performance degrades, so testing that the application performs well under load is an important consideration. Finally, chaos engineering can be applied to test the reliability of the solution.

The security checklist applies as early as design time. There must be tenant isolation in a multitenant application, but the right enforcement and hardening are required to realize it at all times. In addition, there must be testing that the tenants are isolated: no cross-tenant access or data leakage, which sometimes involves static and runtime code analysis. These tools can safeguard the security considerations throughout development.

Reference: Multitenancy: https://1drv.ms/w/s!Ashlm-Nw-wnWhLMfc6pdJbQZ6XiPWA?e=fBoKcN