Sunday, October 17, 2021

 This is a continuation of an article that describes operational considerations for hosting solutions on Azure public cloud.

1.       When an IPSec VPN (site-to-site) or ExpressRoute (private peering) is used, the configuration for the self-hosted integration runtime varies. With a site-to-site VPN, the command channel and the data channel from the self-hosted integration runtime cross the Azure virtual network to reach the Data Factory and the Azure managed storage services, respectively. With private peering, the data channel stays entirely within the Azure virtual network in which the self-hosted integration runtime runs.

2.       Windows Firewall runs as a service on the local machine on which the self-hosted integration runtime is installed. The outbound port and domain requirements for corporate firewalls are listed separately; they do not include the rules needed on the machine that runs the self-hosted integration runtime. Outbound port 443 must be open for the self-hosted integration runtime to make internet connections. Inbound port 8060 must be opened only at the machine level. IP configurations and allow lists can be set up in the data stores.

3.       Multi-region clusters increase resiliency. This architecture builds on the AKS Baseline architecture, where AD pod identity, ingress and egress restrictions, resource limits and other secure AKS infrastructure configurations are described. Each cluster is deployed in a separate Azure region and traffic is routed through all regions. Even if one region becomes unavailable, traffic is routed through the region that is closest to the user who issued the request. A regional hub-spoke network pair is deployed for each regional AKS instance. Azure Firewall Manager policies are used to manage firewall rules across all regions. Azure Front Door is used to load balance and route traffic to a regional Azure Application Gateway instance designated for each AKS cluster. A single Azure Container Registry is used for all Kubernetes clusters across the regions.

4.       Multitenant SaaS is excellent for running solutions that can be unbranded and marketed to other businesses. It adds an entirely new revenue stream for a company, but the operational aspects of running such a service are very different from those of a web application. The architecture for hosting it involves creating multiple resource groups. All users access resources through Azure Front Door, which integrates with both Azure DNS and Azure Active Directory. In each resource group, an application gateway routes traffic to multiple app services that are all hosted on infrastructure provided by a layer of Azure Kubernetes Service.


Saturday, October 16, 2021

 This is a continuation of an article that describes operational considerations for hosting solutions on Azure public cloud.

1.       Efficient Docker image deployment for intermittent, low-bandwidth connectivity scenarios requires eliminating the docker pull of images. An alternative deployment mechanism can compensate for these restrictions by using an Azure Container Registry, signature files, a file share, and an IoT hub for pushing manifests to devices. The deployment path involves pushing the containerized image to the device. The devices can send back messages, which are collected in a device-image register. An image is a collection of layers where each layer represents a set of file-system differences and is stored merely as folders and files. A SQL database can be used to track the state of the target devices and of the Azure-based deployment services, which helps both during and after the deployment process.

2.       Data from an on-premises SQL Server can be used in Azure Synapse, which transforms the data for analysis. This involves an ELT pipeline that loads the data into storage blobs, which can then be read by Azure Synapse for analysis and visualization. The analysis stack involving Power BI can be integrated with Azure Active Directory to allow only the members of the organization to sign in and view the dashboards. Azure Analysis Services supports tabular models but not multidimensional models. Multidimensional models use OLAP constructs such as cubes, dimensions and measures, and are better analyzed with SQL Server Analysis Services.

3.       Image processing is one of the core cognitive services provided by Azure. Companies can eliminate the need for managing individual or proprietary servers and leverage the industry standard by using the Computer Vision API, Azure Event Grid to collect images, and Azure Functions to invoke the Vision APIs for analysis or predictions. The blob storage triggers an Event Grid notification that is sent to the Azure Function, which makes an entry in Cosmos DB to persist the results of the analysis along with the image metadata (a minimal sketch of the persistence step appears after this list). The database can autoscale, but Azure Functions has a limit of about 200 instances.

4.       A content-based recommendation uses information about the items to learn customer preferences and recommends items that share properties with items that a customer has previously interacted with. Azure Databricks can be used to train a model that predicts the probability that a user will engage with an item; the MMLSpark library enables training a LightGBM classifier on Azure Databricks to predict the click probability (see the training sketch after this list). The model can then be deployed as a prediction service hosted on Azure Kubernetes Service. Azure ML is used to create a Docker image in the Azure Container Registry that holds the scoring scripts and all necessary dependencies for serving predictions. Azure ML is also used to provision the compute for serving predictions using Azure Kubernetes Service clusters. A cluster with ten standard L8s VMs can handle millions of records. The scoring service runs separately on each node in the Kubernetes cluster, and the training can be handled independently from the production deployment.

5.       Availability Zones can be used to spread a solution across multiple zones within a region, allowing applications to continue functioning even when one zone fails. For example, the VM uptime service level agreement can reach 99.99% because single points of failure are eliminated. Availability zones also have low latency and come at no additional cost compared to deployments that span regions. Designing solutions that continue to function despite failure is key to improving the reliability of a solution. Zonal deployments pin resources to a specific zone to achieve more stringent latency or performance requirements, while zone-redundant deployments make no distinction between the zones.
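
A minimal sketch of the image-analysis persistence step from item 3, assuming the Azure Function passes in the blob URL from the Event Grid notification and the result returned by the Computer Vision API; the Cosmos DB endpoint, key, database and container names are hypothetical placeholders.

# Sketch: persist Computer Vision results for a blob into Cosmos DB.
# Assumes azure-cosmos is installed; endpoint, key and container names
# are placeholders, and the id strategy is an assumption.
import uuid
from azure.cosmos import CosmosClient

COSMOS_ENDPOINT = "https://<account>.documents.azure.com:443/"
COSMOS_KEY = "<cosmos-primary-key>"

client = CosmosClient(COSMOS_ENDPOINT, credential=COSMOS_KEY)
container = client.get_database_client("imagedb").get_container_client("analysis")

def persist_analysis(blob_url, analysis):
    """Store the Vision API analysis along with the image metadata."""
    container.upsert_item({
        "id": str(uuid.uuid4()),
        "blobUrl": blob_url,
        "analysis": analysis,
    })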
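
And a sketch of the click-probability training step from item 4, assuming the MMLSpark library is attached to the Databricks cluster (the import path can vary by MMLSpark version) and that the feature columns and label shown are illustrative.

# Sketch: train a LightGBM click-probability classifier on Azure Databricks.
# Assumes MMLSpark is installed on the cluster; `spark` is the session that
# Databricks provides, and the columns below are illustrative placeholders.
from pyspark.ml.feature import VectorAssembler
from mmlspark.lightgbm import LightGBMClassifier

df = spark.createDataFrame(
    [(34, 9.99, 0.7, 1), (25, 49.50, 0.2, 0)],
    ["user_age", "item_price", "category_score", "clicked"],
)

assembler = VectorAssembler(
    inputCols=["user_age", "item_price", "category_score"], outputCol="features"
)
train_df = assembler.transform(df)

classifier = LightGBMClassifier(
    objective="binary",          # predict click probability
    featuresCol="features",
    labelCol="clicked",
    numIterations=100,
)
model = classifier.fit(train_df)
scored = model.transform(train_df)   # the "probability" column holds the prediction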

Friday, October 15, 2021

 This is a continuation of an article that describes operational considerations for hosting solutions on Azure public cloud.

 

1.       Azure Blueprints can be leveraged to allow an engineer or architect to sketch a project’s design parameters and to define a repeatable set of resources that implements and adheres to an organization’s standards, patterns and requirements. It is a declarative way to orchestrate the deployment of various resource templates and other artifacts such as role assignments, policy assignments, ARM templates, and resource groups. Blueprint objects are stored in Cosmos DB and replicated to multiple Azure regions. Since it is designed to set up the environment, it is different from resource provisioning. This package fits nicely into a CI/CD pipeline and handles both what should be deployed and the assignment of what was deployed.

2.       Moving resources across regions is required by businesses to align to a region launch, to align to services or resources specific to that region, or to align for proximity. Networking resources such as ExpressRoute, VNet peering, gateways and edge routers, as well as multi-tiered web applications running in the cloud environment, are particularly prone to migration across regions. Steps to migrate involve planning downtime, ensuring subscription limits and quotas are met, assigning permissions, performing resource identification and other such prerequisites. Then the components can be moved, with the networking first, followed by the app, then followed by the PaaS services. Considerations include planning for complex infrastructure, moving resource types, moving all resources within an application together, ensuring capacity requirements are met, planning for business continuity, validating the migration, and ensuring due diligence by testing before moving to the target region.

3.       Resource groups are created to group resources that share the same lifecycle. They have no bearing on the cost management of resources other than to help with querying. They can be used with tags to narrow down resources of interest (a tag-filtered query sketch appears after this list). Metadata about the resources is stored in a particular region. Resources can be moved from one resource group to another, or even to another subscription. Finally, resource groups can be locked to prevent actions such as delete or write by users who otherwise have access.
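
A minimal sketch of a tag-filtered resource query with the azure-mgmt-resource SDK, assuming DefaultAzureCredential can authenticate; the subscription id and tag name/value are placeholders.

# Sketch: list resources that carry a given tag, e.g. to narrow down what a
# cost or inventory query should cover. Assumes azure-identity and
# azure-mgmt-resource are installed.
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

client = ResourceManagementClient(DefaultAzureCredential(), "<subscription-id>")

for resource in client.resources.list(
    filter="tagName eq 'environment' and tagValue eq 'production'"
):
    print(resource.name, resource.type, resource.location)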

 


Wednesday, October 13, 2021

 

This is a continuation of an article that describes operational considerations for hosting solutions on Azure public cloud.

 

·        Resources can be locked to prevent unexpected changes. A subscription, resource group or resource can be locked to prevent other users from accidentally deleting or modifying critical resources. The lock overrides any permissions the users may have. The lock level can be set to CanNotDelete or ReadOnly, with ReadOnly being the more restrictive. When a lock is applied at a parent scope, all resources within that scope inherit the same lock. Some considerations still apply after locking. For example, a CanNotDelete lock on a storage account does not prevent data within that account from being deleted, and a ReadOnly lock on an application gateway prevents retrieving its backend health because that operation uses POST. Only members of the Owner and User Access Administrator roles are granted access to the Microsoft.Authorization/locks/* actions. A minimal lock-creation sketch appears after this list.

·        Azure Key Vault can throttle client requests to help maintain optimal performance and reliability when it receives a high volume of concurrent calls. Throttled requests return a 429 error code, and clients must back off exponentially before retrying. Caching the secrets retrieved from the Key Vault in memory and reusing them from memory mitigates the load on the Key Vault server. Encrypt, wrap and verify public-key operations can be performed with no access to the Key Vault, which not only reduces the risk of throttling but also improves reliability. Programmatically, retries can be configured through the client options when the corresponding client is instantiated, where a policy describing the delay, maximum delay, maximum retries and retry mode can be specified (see the caching and retry sketch after this list).

·        Legacy authentication to Azure AD can be blocked with conditional access while still giving users easy access to cloud apps. Azure Active Directory supports a broad variety of authentication protocols including legacy authentication, but protocols such as POP, SMTP, IMAP and MAPI cannot enforce MFA and create a vulnerability for the overall service. A conditional access policy can block legacy authentication. The Azure portal shows Azure Active Directory sign-ins, where the client app column indicates those that use legacy authentication. Policies can then be set to block those applications directly or indirectly (a policy-creation sketch appears after this list).

·        End-to-end data-driven workflows for data processing scenarios can be created using pipelines and activities in Azure Data Factory and Azure Synapse Analytics; a data factory or Synapse workspace can have one or more pipelines. A pipeline consists of a set of activities and helps to manage them as a set instead of each one individually. There are three groupings of activities: data movement activities, data transformation activities and control activities. An activity can take zero or more input datasets and produce one or more output datasets. The pipeline can be exported as JSON (a minimal definition is sketched after this list).
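
A minimal sketch of the resource-lock item above, creating a CanNotDelete lock at resource-group scope with the azure-mgmt-resource SDK; the subscription id, resource group and lock name are placeholders.

# Sketch: apply a CanNotDelete lock on a resource group.
# Assumes azure-identity and azure-mgmt-resource are installed.
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource.locks import ManagementLockClient

lock_client = ManagementLockClient(DefaultAzureCredential(), "<subscription-id>")
lock_client.management_locks.create_or_update_at_resource_group_level(
    "my-resource-group",
    "no-delete-lock",
    {"level": "CanNotDelete", "notes": "Protect critical resources from deletion."},
)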
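
A sketch of the Key Vault item: in-memory caching of secrets plus an exponential retry policy supplied through the client constructor, assuming the azure-keyvault-secrets and azure-identity packages; the vault URL is a placeholder.

# Sketch: cache secrets in memory and configure exponential retries so that
# 429 throttling responses from Key Vault are absorbed gracefully.
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

client = SecretClient(
    vault_url="https://<vault-name>.vault.azure.net",
    credential=DefaultAzureCredential(),
    retry_total=5,               # maximum number of retries
    retry_backoff_factor=0.8,    # exponential backoff between attempts
    retry_mode="exponential",
)

_secret_cache = {}

def get_secret(name):
    """Return a secret, hitting Key Vault only on the first request."""
    if name not in _secret_cache:
        _secret_cache[name] = client.get_secret(name).value
    return _secret_cache[name]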
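
A sketch of the legacy-authentication item, creating a conditional access policy through the Microsoft Graph API; this assumes an identity with the Policy.ReadWrite.ConditionalAccess permission, and the policy body shown is a typical shape rather than the only valid one.

# Sketch: block legacy authentication clients with a conditional access
# policy created via Microsoft Graph. Starts in report-only mode so the
# effect can be reviewed before enforcing.
import requests
from azure.identity import DefaultAzureCredential

token = DefaultAzureCredential().get_token("https://graph.microsoft.com/.default").token

policy = {
    "displayName": "Block legacy authentication",
    "state": "enabledForReportingButNotEnforced",
    "conditions": {
        "users": {"includeUsers": ["All"]},
        "applications": {"includeApplications": ["All"]},
        "clientAppTypes": ["exchangeActiveSync", "other"],  # legacy auth clients
    },
    "grantControls": {"operator": "OR", "builtInControls": ["block"]},
}

response = requests.post(
    "https://graph.microsoft.com/v1.0/identity/conditionalAccess/policies",
    headers={"Authorization": f"Bearer {token}"},
    json=policy,
)
response.raise_for_status()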
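
And a minimal pipeline definition of the kind exported from Azure Data Factory, written here as a Python dictionary; the dataset and activity names are illustrative placeholders.

# Sketch: the JSON shape of a minimal Data Factory pipeline with a single
# copy (data movement) activity. Dataset and activity names are placeholders.
import json

pipeline = {
    "name": "CopySalesDataPipeline",
    "properties": {
        "activities": [
            {
                "name": "CopyFromBlobToSql",
                "type": "Copy",
                "inputs": [{"referenceName": "SourceBlobDataset", "type": "DatasetReference"}],
                "outputs": [{"referenceName": "SinkSqlDataset", "type": "DatasetReference"}],
                "typeProperties": {
                    "source": {"type": "BlobSource"},
                    "sink": {"type": "SqlSink"},
                },
            }
        ]
    },
}

print(json.dumps(pipeline, indent=2))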


Tuesday, October 12, 2021

 Some trivia for Azure public cloud (continued from previous article)

Operational requirements for hosting solutions on Azure public cloud:

·        Applications can be installed on virtual machine scale sets with an Azure template. A custom script extension can be added to the template, and a reference to the location of the script can be passed in as a parameter. Alternatively, HTTP calls can also be made.

·        A site-to-site VPN gateway can be configured between Azure and on-premises. A site-to-site VPN gateway can provide better continuity for the workload in a hybrid cloud setup with Azure. A load balancer front-end IP address cannot be reached over virtual network peering across regions; support for the Basic load balancer only exists within the same region, but gateway transit can be allowed in globally peered networks.

·        An Azure AD Connect server can be set up with either pass-through authentication or password hash authentication. The latter continues to work even when the on-premises environment goes down. Connections must be allowed through *.msappproxy.net and the Azure datacenter IP ranges.

·        Azure Sentinel alerts can be sent to your email automatically. This requires a playbook created with the designer user interface and does not require code to be written. This step is necessary because Sentinel has no built-in email notification feature.

·        Network traffic inbound to and outbound from a virtual network subnet can be filtered with a network security group using the Azure portal. Security rules can be applied to resources deployed in a subnet. Firewall rules and NSGs both support restricting and allowing traffic (a security-rule sketch appears after this list).

·        Servers running on Hyper-V can be discovered and assessed via Azure Migrate. An Azure account, the Hyper-V hosts and the Azure Migrate appliance are required for this purpose.

·        Diagnostic settings to send platform logs and metrics to different destinations can be authored. Logs include Azure activity logs and resource logs. Platform metrics are collected by default and stored in the Azure Monitor metrics database. Each Azure resource requires its own diagnostic setting, and a single setting can define no more than one of each of the destinations. The available categories vary for different resource types. The destinations for the logs can include a Log Analytics workspace, Event Hubs and Azure Storage. Metrics are sent automatically to Azure Monitor Metrics. Optionally, settings can be used to send metrics to Azure Monitor Logs for analysis with other monitoring data using log queries. Multi-dimensional metrics (MDM) are not supported; they must be flattened. A diagnostic-setting sketch appears after this list.

·        Data collection rules (DCR) specify how data coming into Azure Monitor should be collected, and where it should be sent and stored. Input sources include Azure Monitor agents running on virtual machines, virtual machine scale sets, and Azure Arc-enabled servers. A rule includes data sources, streams (each a unique handle that describes how the data is transformed and schematized into one type), destinations for where the data should be sent, and data flows that define which streams are sent to which destinations.

·        Automatic tuning in Azure SQL Database and Azure SQL Managed Instance can help achieve peak performance and stable workloads. There is support for continuous performance tuning based on AI and machine learning.

·        Limited access to Azure Storage resources can be granted using shared access signatures (SAS). With a SAS there is granular control over how a client can access data: what resources it has access to, what permissions it has on those resources, and how long the signature is valid. There are three types of SAS: user delegation SAS, service SAS and account SAS. The SAS token is generated on the client side using one of the Azure Storage libraries (a generation sketch appears after this list). If the token is leaked, it can be used by anyone who holds it, but it is set to expire.
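
A sketch of the network-security-group item above, creating a single inbound deny rule with the azure-mgmt-network SDK; the subscription id, resource group, NSG name and rule values are placeholders.

# Sketch: add an inbound rule to an existing network security group that
# denies RDP (TCP 3389) from any source. Assumes azure-identity and
# azure-mgmt-network are installed.
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

network_client = NetworkManagementClient(DefaultAzureCredential(), "<subscription-id>")
network_client.security_rules.begin_create_or_update(
    "my-resource-group",
    "my-nsg",
    "deny-rdp-inbound",
    {
        "protocol": "Tcp",
        "direction": "Inbound",
        "access": "Deny",
        "priority": 100,
        "source_address_prefix": "*",
        "source_port_range": "*",
        "destination_address_prefix": "*",
        "destination_port_range": "3389",
    },
).result()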
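
A sketch of the diagnostic-settings item, routing one log category and all metrics from a resource to a Log Analytics workspace with the azure-mgmt-monitor SDK; the resource URI, workspace id and category name are placeholders, and the available categories differ per resource type.

# Sketch: create a diagnostic setting that sends logs and metrics to a
# Log Analytics workspace. Assumes azure-identity and azure-mgmt-monitor.
from azure.identity import DefaultAzureCredential
from azure.mgmt.monitor import MonitorManagementClient

monitor_client = MonitorManagementClient(DefaultAzureCredential(), "<subscription-id>")
monitor_client.diagnostic_settings.create_or_update(
    resource_uri="/subscriptions/<subscription-id>/resourceGroups/<rg>/providers/Microsoft.KeyVault/vaults/<vault>",
    name="send-to-law",
    parameters={
        "workspace_id": "/subscriptions/<subscription-id>/resourceGroups/<rg>/providers/Microsoft.OperationalInsights/workspaces/<workspace>",
        "logs": [{"category": "AuditEvent", "enabled": True}],
        "metrics": [{"category": "AllMetrics", "enabled": True}],
    },
)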
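
And a sketch of generating a read-only, time-limited service SAS for a blob with the azure-storage-blob library; the account name, key, container and blob names are placeholders.

# Sketch: generate a service SAS that grants read access to a single blob
# for one hour.
from datetime import datetime, timedelta
from azure.storage.blob import BlobSasPermissions, generate_blob_sas

sas_token = generate_blob_sas(
    account_name="<storage-account>",
    container_name="images",
    blob_name="photo.jpg",
    account_key="<account-key>",
    permission=BlobSasPermissions(read=True),
    expiry=datetime.utcnow() + timedelta(hours=1),   # token stops working after this
)
blob_url = f"https://<storage-account>.blob.core.windows.net/images/photo.jpg?{sas_token}"
print(blob_url)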


Monday, October 11, 2021

 Some trivia for Azure public cloud (continued from previous article)

Operational requirements for hosting solutions on Azure public cloud:

·        Disaster recovery – Azure Site Recovery contributes to application-level protection and recovery. It provides near-synchronous replication for any workload, for single or multi-tier applications, and works with Active Directory and SQL Server replication. It protects SharePoint, Dynamics AX and Remote Desktop Services. It has flexible recovery plans with a rich automation library. One of its biggest use cases is replicating VMs to Azure, and it provides end-to-end recovery plans.

·        Data redundancy – Azure Storage services come with built-in redundancy, which also improves the durability of the existing blob services. Geo-redundant storage (GRS) copies data synchronously three times within a single physical location in the primary region using LRS. Geo-zone-redundant storage (GZRS) copies data synchronously across three availability zones in the primary region using ZRS. In both cases the data is then replicated asynchronously to the secondary region. Read-access GRS (RA-GRS) additionally provides read access to the secondary region.

·        Blob rehydration from the archive tier can target either the hot or the cool tier. There are two options for rehydrating a blob that is stored in the archive tier: a) copy the archived blob to an online tier using a reference to the blob or its URL, or b) change the blob’s access tier to an online tier, which rehydrates the archived blob to hot or cool. Rehydration might take several hours, but several rehydrations can run concurrently, and a rehydration priority can also be set (a sketch appears after this list).

·        Storage costs can be optimized by managing the data lifecycle. Azure Storage lifecycle management offers a rule-based policy that can be used to transition blob data to the appropriate access tiers or to set expiration. The lifecycle policy definition has attributes for actions, base blobs and filters (see the policy sketch after this list).

·        Azure Monitor is a full-stack monitoring service. Many Azure services use it to collect and analyze monitoring data. Blob storage collects the same kind of monitoring data as other Azure resources, and platform metrics and activity logs are collected automatically.

·        Virtual network peering allows us to connect virtual networks in the same region or across regions, as in the case of Global VNet Peering, through the Azure backbone network. When peering is set up, traffic to the remote virtual network, traffic forwarded from the remote virtual network, traffic to or from a virtual network gateway or Route Server, and traffic within the virtual network can be allowed by default (a peering sketch appears after this list).

·        Transaction processing in Azure is not on by default. A transaction locks and logs records so that others cannot use them; it can be bound to partitions or enabled as a distributed transaction with the two-phase commit protocol. Two-phase commit requires a prepare and a commit exchange between the transaction coordinator and each resource manager, which is costly for a datacenter in Azure. It does not scale as the number of resources grows: at roughly four network calls per resource, 2 resources need about 8 network calls, 4 resources about 16 calls, and 100 resources about 400 calls (see the arithmetic sketch after this list). Besides, the datacenter contains thousands of machines, failures are expected, and the system must deal with network partitions. Waiting for responses from all resource managers adds a costly communication overhead.
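
A sketch of the rehydration item above, using the azure-storage-blob library to change an archived blob’s tier back to hot; the connection string, container and blob names are placeholders, and the rehydrate_priority argument is available in recent library versions.

# Sketch: rehydrate an archived blob by changing its access tier to Hot.
# Rehydration itself can take hours to complete.
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<storage-connection-string>")
blob = service.get_blob_client(container="backups", blob="2021-10/archive.tar")

# Option b) from the item above: change the tier in place; "High" priority is optional.
blob.set_standard_blob_tier("Hot", rehydrate_priority="High")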
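
A sketch of the lifecycle-management policy shape from the storage-cost item, written as a Python dictionary; the day thresholds and filter prefix are illustrative, and the policy can be applied through the portal, CLI or management API.

# Sketch: a lifecycle management policy that tiers blobs to cool after 30 days,
# to archive after 90 days, and deletes them after 365 days.
import json

lifecycle_policy = {
    "rules": [
        {
            "enabled": True,
            "name": "age-out-logs",
            "type": "Lifecycle",
            "definition": {
                "filters": {"blobTypes": ["blockBlob"], "prefixMatch": ["logs/"]},
                "actions": {
                    "baseBlob": {
                        "tierToCool": {"daysAfterModificationGreaterThan": 30},
                        "tierToArchive": {"daysAfterModificationGreaterThan": 90},
                        "delete": {"daysAfterModificationGreaterThan": 365},
                    }
                },
            },
        }
    ]
}

print(json.dumps(lifecycle_policy, indent=2))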
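
A sketch of the virtual-network-peering item, creating one side of a peering with the azure-mgmt-network SDK (the matching peering must also be created from the remote network’s side); the subscription id, network and peering names are placeholders.

# Sketch: create one direction of a VNet peering and allow forwarded traffic.
# Assumes azure-identity and azure-mgmt-network are installed.
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

network_client = NetworkManagementClient(DefaultAzureCredential(), "<subscription-id>")
network_client.virtual_network_peerings.begin_create_or_update(
    "my-resource-group",
    "hub-vnet",
    "hub-to-spoke",
    {
        "remote_virtual_network": {
            "id": "/subscriptions/<subscription-id>/resourceGroups/<rg>/providers/Microsoft.Network/virtualNetworks/spoke-vnet"
        },
        "allow_virtual_network_access": True,   # traffic to the remote virtual network
        "allow_forwarded_traffic": True,        # traffic forwarded from the remote network
        "allow_gateway_transit": False,         # virtual network gateway or Route Server
        "use_remote_gateways": False,
    },
).result()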
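
And the two-phase-commit message arithmetic from the last item, assuming the usual four messages per resource manager (prepare, vote, commit, acknowledge).

# Sketch: message count for two-phase commit, assuming four messages per
# resource manager (prepare, vote, commit/abort, acknowledge).
MESSAGES_PER_RESOURCE_MANAGER = 4

def two_phase_commit_messages(resource_managers):
    return MESSAGES_PER_RESOURCE_MANAGER * resource_managers

for n in (2, 4, 100):
    print(f"{n} resources -> {two_phase_commit_messages(n)} network calls")
# 2 resources -> 8 network calls, 4 -> 16, 100 -> 400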