Cluster computing

Tuesday, October 19, 2021

This is a continuation of an article that describes operational considerations for hosting solutions on Azure public cloud.

1. When an application is migrated to Azure, its storage can keep the same format as before. For example, if file storage was accessed over NFSv3, that protocol can continue to be used with an Azure Storage general-purpose v2 account.

2. General-purpose v2 accounts deliver the lowest per-gigabyte capacity prices for Azure Storage, as well as industry-competitive transaction prices. They support a default account access tier of hot or cool and blob-level tiering among hot, cool, and archive.

3. The archive storage tier does not provide immediate data access; rehydrating an archived blob can take hours. If we need immediate access, the blob's access tier should be changed to hot or cool. A general-purpose v1 storage account can be upgraded, with either hot or cool chosen as its access tier.

4. The cost of a storage tier is based on the amount of data stored in that access tier, the data access cost, the transaction cost, the geo-replication data transfer cost, the outbound data transfer cost, and the cost of changing the access tier. The primary access pattern for the blob storage, that is, the balance of reads and writes, determines the cost savings. All storage access can be monitored, and the metrics emitted cover capacity costs, transaction costs, and data transfer costs.

5. Elastic pools can help manage and scale multiple databases in Azure SQL Database. Traditionally, there were two options: over-provision resources based on peak usage and overpay, or under-provision to save cost at the expense of performance and customer satisfaction during peaks. Elastic pools solve this problem by ensuring that databases get the performance resources they need when they need them. They provide a simple resource allocation mechanism within a predictable budget.

6. ExpressRoute, VPN Gateway and virtual network peering provide different levels of functionality. If we want private site-to-site connectivity, we can use ExpressRoute. If we want secure site-to-site VPN connectivity, we can use a virtual network site-to-site connection. If we want secure point-to-site connectivity, we can use a virtual network point-to-site connection. These options should be considered in that order: private site-to-site connectivity first, then secure site-to-site VPN connectivity, then secure point-to-site connectivity.
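
The cost components listed in item 4 above can be added up in a small model to compare access tiers. The following is a minimal sketch only: the unit prices in `HOT` and `COOL` are hypothetical placeholders rather than Azure list prices, the names `TierPrices`, `monthly_cost` and `cheaper_tier` are made up for this example, and the geo-replication, outbound transfer and tier-change costs are left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierPrices:
    """Hypothetical unit prices for one access tier (not real Azure prices)."""
    storage_per_gb: float    # monthly capacity price per GB stored
    read_per_10k: float      # price per 10,000 read operations
    write_per_10k: float     # price per 10,000 write operations
    retrieval_per_gb: float  # data access (retrieval) price per GB read


# Placeholder numbers chosen only to show the trade-off:
# cool is cheaper to store in, but more expensive to access.
HOT = TierPrices(storage_per_gb=0.020, read_per_10k=0.004,
                 write_per_10k=0.05, retrieval_per_gb=0.0)
COOL = TierPrices(storage_per_gb=0.010, read_per_10k=0.010,
                  write_per_10k=0.10, retrieval_per_gb=0.01)


def monthly_cost(prices, stored_gb, reads, writes, retrieved_gb):
    """Sum the capacity, transaction and data access costs for one month."""
    capacity = stored_gb * prices.storage_per_gb
    transactions = ((reads / 10_000) * prices.read_per_10k
                    + (writes / 10_000) * prices.write_per_10k)
    data_access = retrieved_gb * prices.retrieval_per_gb
    return capacity + transactions + data_access


def cheaper_tier(stored_gb, reads, writes, retrieved_gb):
    """Return "hot" or "cool", whichever is cheaper for this access pattern."""
    hot = monthly_cost(HOT, stored_gb, reads, writes, retrieved_gb)
    cool = monthly_cost(COOL, stored_gb, reads, writes, retrieved_gb)
    return "hot" if hot <= cool else "cool"
```

With these placeholder prices, a large and rarely read data set such as `cheaper_tier(1000, 10_000, 10_000, 10)` comes out in favor of cool, while a small and heavily read one such as `cheaper_tier(10, 50_000_000, 1_000_000, 5000)` comes out in favor of hot, which is the access-pattern dependence the item describes.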

Monday, October 18, 2021

1. When an IPSec VPN (site-to-site) or ExpressRoute (private peering) is used, the configuration for the self-hosted integration runtime varies. In the site-to-site case, the command channel and the data channel from the self-hosted integration runtime cross the Azure virtual network to reach Data Factory and the Azure managed storage services, respectively. With private peering, the data channel stays entirely within the Azure virtual network in which the self-hosted integration runtime runs.

2. Windows Firewall runs as a service on the local machine on which the self-hosted integration runtime is installed. The outbound port and domain requirements for the corporate firewall can be listed separately; they do not include the machine-level rules for the self-hosted integration runtime. Outbound port 443 must be open for the self-hosted integration runtime to make internet connections. Inbound port 8060 must be opened only at the machine level. IP configurations and allow lists can be set up in the data stores.

3. Multi-region clusters increase resiliency. This architecture builds on the AKS baseline architecture, which describes Azure AD pod identity, ingress and egress restrictions, resource limits and other secure AKS infrastructure configurations. Each cluster is deployed in a separate Azure region, and traffic is routed through all regions. If one region becomes unavailable, traffic is routed through another region that is closest to the user who issued the request. A regional hub-spoke network pair is deployed for each regional AKS instance. Azure Firewall Manager policies are used to manage firewall policies across all regions. Azure Front Door is used to load balance and route traffic to a regional Azure Application Gateway instance designated for each AKS cluster. A single Azure Container Registry is used for all the Kubernetes clusters in the deployment.

4. Multitenant SaaS is excellent for running solutions that can be unbranded and marketed to other businesses. It adds an entirely new revenue stream for a company. But the operational aspects of running this service are very different from those of a web application. The architecture for hosting it involves creating multiple resource groups. All users access resources through Azure Front Door, which integrates with both Azure DNS and Azure Active Directory. In each resource group, an application gateway routes traffic to multiple app services that are all hosted on infrastructure provided by a layer of Azure Kubernetes Service.

5. It is always good to spot check an AKS cluster against the currently recommended Azure best practices. For example, the AKS baseline cluster architecture offers the best in terms of availability and protection. In addition, AKS workloads can be managed effectively by setting proper resource requests and imposing limits. Configuring the scale-out of containers and using proxies, load balancers and ingress also contribute to the best practices.

6. High availability can be improved with availability zones by using an architecture in which redundant resources are spread across zones to provide high resilience. Most of the resources are actively used because they serve requests. Some backend services or stores, such as the relational store, might have redundant instances that are used only when the active ones fail. The use of availability zones significantly improves the resilience of the IaaS layer, which is critical to hosting web applications that are not managed instances in the cloud. Therefore, zonal and zone-redundant architectures are particularly useful in the Azure public cloud.

7. Identity is a necessary investment for any software application and service hosted in the public cloud. The right choices can endear the software to its users. Seamless integration and SSO enable applications and services to work together with the same notion of a user. Creating a separate Active Directory domain in Azure that is trusted by the domains in the on-premises AD forest is a significant step in this direction.
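
The failover behavior described in item 3 above, where a request goes to the closest region that is still available, can be illustrated with a short sketch. This is a simplification for illustration rather than how Azure Front Door is implemented: the `Region` type, the `pick_region` function, and the region names and latencies in the usage note are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Region:
    """One regional deployment as seen from a particular caller."""
    name: str
    healthy: bool       # result of the most recent health probe
    latency_ms: float   # measured latency from the caller to this region


def pick_region(regions: Iterable[Region]) -> Optional[Region]:
    """Return the healthy region with the lowest latency, or None if all are down."""
    candidates = [r for r in regions if r.healthy]
    if not candidates:
        return None
    return min(candidates, key=lambda r: r.latency_ms)
```

For example, with `Region("eastus", True, 20.0)` and `Region("westus", True, 70.0)` the first is picked; if its `healthy` flag turns false, the request falls through to the second without any change on the caller's side.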

Sunday, October 17, 2021

Saturday, October 16, 2021

1. Efficient Docker image deployment for scenarios with intermittent, low-bandwidth connectivity requires eliminating docker pulls of images. An alternative deployment mechanism can compensate for the restrictions by utilizing an Azure Container Registry, signature files, a file share, and an IoT Hub for pushing the manifest to devices. The deployment path involves pushing the image to a device that runs containers. The devices can send back messages, which are collected in a device-image register. An image is a collection of layers, where each layer represents a set of file-system differences and is stored simply as folders and files. A SQL database can be used to track the state of what’s occurring on the target devices and in the Azure-based deployment services, which helps both during and after the deployment process.

2. Data from an on-premises SQL Server can be used in Azure Synapse, which transforms the data for analysis. This would involve an ELT pipeline that converts the data into storage blobs, which can then be read by Azure Synapse for analysis and visualization. The analysis stack involving Power BI can be integrated with Azure Active Directory to allow only the members of the organization to sign in and view the dashboards. Azure Analysis Services supports tabular models but not multidimensional models. Multidimensional models use OLAP constructs like cubes, dimensions and measures, which are better analyzed with SQL Server Analysis Services.

3. Image processing is one of the core cognitive services provided by Azure. Companies can eliminate the need for managing individual or proprietary servers and leverage the industry standard by using the Computer Vision API, Azure Event Grid to collect images, and Azure Functions to call the Vision APIs for analysis or predictions. The blob storage must trigger an Event Grid notification that is sent to the Azure Function, which then makes an entry in Cosmos DB to persist the results of the analysis along with the image metadata. The database can autoscale, but Azure Functions has a limit of about 200 instances.

4. A content-based recommendation uses information about the items to learn customer preferences and recommends items that share properties with items that a customer has previously interacted with. Azure Databricks can be used to train a model that predicts the probability that a user will engage with an item. The model can then be deployed as a prediction service hosted on Azure Kubernetes Service. The MMLSpark library enables training a LightGBM classifier on Azure Databricks to predict the click probability. Azure ML is used to create a Docker image in the Azure Container Registry that holds the scoring scripts and all the dependencies necessary for serving predictions. Azure ML is also used to provision the compute for serving predictions using Azure Kubernetes Service clusters. A cluster with ten Standard L8s VMs can handle millions of records. The scoring service must run separately on each node in the Kubernetes cluster. The training can be handled independently from the production deployment.

5. Availability zones can be used to spread a solution across multiple zones within a region, allowing applications to function even when one zone fails. For example, the VM uptime service level agreement can reach 99.99% because this eliminates single points of failure. Availability zones also offer low latency and come at no additional cost compared to deployments that span regions. Designing solutions that continue to function despite failure is key to improving the reliability of the solution. Zonal deployments can be pinned to a specific zone to achieve more stringent latency or performance requirements, while zone-redundant deployments make no distinction between the zones.
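
The effect of removing a single point of failure, as in item 5 above, can be made concrete with a little arithmetic. The sketch below assumes that copies in different zones fail independently, which is an idealization, and the input availability used in the usage note is a hypothetical figure rather than a published SLA. The function names are invented for this example.

```python
def redundant_availability(single: float, copies: int) -> float:
    """Availability of `copies` independent replicas where any one suffices.

    The solution is down only when every copy is down at the same time,
    so the combined availability is 1 - (1 - single) ** copies.
    """
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be a probability between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return 1.0 - (1.0 - single) ** copies


def downtime_minutes_per_year(availability: float) -> float:
    """Expected minutes of downtime in a 365-day year."""
    return (1.0 - availability) * 365 * 24 * 60
```

Under the independence assumption, two copies that are each available 99% of the time give `redundant_availability(0.99, 2)`, or about 99.99%, which corresponds to roughly 52.6 minutes of downtime per year instead of more than three and a half days for a single copy.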

Friday, October 15, 2021

1. Azure Blueprints can be leveraged to allow an engineer or architect to sketch a project’s design parameters and define a repeatable set of resources that implements and adheres to an organization’s standards, patterns and requirements. It is a declarative way to orchestrate the deployment of various resource templates and other artifacts such as role assignments, policy assignments, ARM templates, and resource groups. Blueprint objects are stored in Cosmos DB and replicated to multiple Azure regions. Since it is designed to set up the environment, it is different from resource provisioning. This package fits nicely into a CI/CD pipeline and handles both what should be deployed and the assignment of what was deployed.

2. Businesses need to move resources across regions to align to a region launch, to align to services or resources specific to that region, or to align for proximity. Networking resources such as ExpressRoute, VNet peering, gateways and edge routers, as well as multi-tiered web applications running in the cloud, are particularly prone to migration across regions. The prerequisites include planning downtime, ensuring subscription limits and quotas are met, assigning permissions, and identifying the resources to move. Then the components can be moved, with the networking first, followed by the app, and then the PaaS services. Considerations include planning for complex infrastructure, moving resource types, moving all resources within an application together, ensuring capacity requirements are met, planning for business continuity, validating the migration, and ensuring due diligence by testing before moving to the target region.

3. Resource groups are created to group resources that share the same lifecycle. They have no bearing on the cost management of resources other than to help with querying. They can be used with tags to narrow down the resources of interest. Metadata about the resources is stored with the resource group in a particular region. Resources can be moved from one resource group to another or even to another subscription. Finally, resource groups can be locked to prevent actions such as delete or write by users who have access.
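
Item 3 above notes that resource groups and tags mainly help with querying costs. A minimal sketch of such a query over exported cost records follows. The record layout, with the keys `resource_group`, `tags` and `cost`, is an assumption made for this example and not the schema of any Azure export, and `cost_by_resource_group` is a hypothetical helper.

```python
from collections import defaultdict


def cost_by_resource_group(records, required_tags=None):
    """Total cost per resource group, keeping only records that carry
    every tag in `required_tags` with the given value."""
    required = required_tags or {}
    totals = defaultdict(float)
    for record in records:
        tags = record.get("tags", {})
        if all(tags.get(key) == value for key, value in required.items()):
            totals[record["resource_group"]] += record["cost"]
    return dict(totals)
```

Calling it with `required_tags={"env": "prod"}` narrows the totals to production resources, while omitting the argument sums every record in each group.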

Thursday, October 14, 2021

1. Blob rehydration from the archive tier can target either the hot or the cool tier. There are two options for rehydrating a blob that is stored in the archive tier. A) One can copy the archived blob to a new blob in an online tier using a reference to the blob or its URL. B) Or one can change the blob’s access tier to an online tier, which rehydrates the archived blob in place to hot or cool. Rehydration might take several hours, but several blobs can be rehydrated concurrently. A rehydration priority can also be set.

2. Virtual network peering allows us to connect virtual networks in the same region or, with global VNet peering, across regions through the Azure backbone network. When the peering is set up, its settings determine whether traffic to the remote virtual network, traffic forwarded from the remote virtual network, and traffic through a virtual network gateway or Route Server are allowed; traffic to the peered virtual network can be allowed by default.

3. Transaction processing in Azure is not on by default. A transaction locks and logs records so that others cannot use them, but it can be bound to partitions or enabled as a distributed transaction with the two-phase commit protocol. Transaction processing requires two communication steps for each resource manager and a response from the transaction coordinator, which are costly for a datacenter in Azure. It does not scale as the number of resources grows: 2 resources – 4 network calls, 4 resources – 16 calls, 100 resources – 400 calls. Besides, a datacenter contains thousands of machines, failures are expected, and the system must deal with network partitions. Waiting for a response from all resource managers carries a costly communication overhead.

4. Diagnostic settings can be authored to send platform logs and metrics to different destinations. Logs include Azure Activity logs and resource logs. Platform metrics are collected by default and stored in the Azure Monitor metrics database. Each Azure resource requires its own diagnostic setting, and a single setting can define no more than one of each of the destinations. The available categories vary by resource type. The destinations for the logs can include a Log Analytics workspace, Event Hubs and Azure Storage. Metrics are sent automatically to Azure Monitor Metrics. Optionally, settings can be used to send metrics to Azure Monitor Logs for analysis with other monitoring data using log queries, with some restrictions. Multi-dimensional metrics (MDM) are not supported; they must be flattened.

5. Legacy authentication to Azure AD can be blocked with Conditional Access while users keep easy access to their cloud apps. Azure Active Directory supports a broad variety of authentication protocols, including legacy authentication, but legacy protocols such as POP, SMTP, IMAP and MAPI cannot enforce MFA and so create a vulnerability for the overall service. A Conditional Access policy can block legacy authentication. The Azure portal shows Azure Active Directory sign-ins, where the client app column indicates those that use legacy authentication. Policies can then be set to block those applications directly or indirectly.
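
The two rehydration options in item 1 above map onto two Blob service REST operations, and the sketch below only assembles the request descriptions without sending anything. It rests on some assumptions to check against the REST reference: option B is taken to be Set Blob Tier (`PUT` with `comp=tier`), option A is taken to be Copy Blob, and both are taken to accept the `x-ms-access-tier` and `x-ms-rehydrate-priority` headers. Authentication and version headers are omitted, the blob URLs are assumed to carry no query string, and the function names and the example URLs in the usage note are hypothetical.

```python
ONLINE_TIERS = {"Hot", "Cool"}
PRIORITIES = {"Standard", "High"}


def _validate(tier: str, priority: str) -> None:
    """Reject anything that is not an online tier or a known priority."""
    if tier not in ONLINE_TIERS:
        raise ValueError(f"rehydration target must be one of {sorted(ONLINE_TIERS)}")
    if priority not in PRIORITIES:
        raise ValueError(f"priority must be one of {sorted(PRIORITIES)}")


def copy_request(source_url: str, dest_url: str, tier: str,
                 priority: str = "Standard") -> dict:
    """Option A: copy the archived blob to a new blob in an online tier."""
    _validate(tier, priority)
    return {
        "method": "PUT",
        "url": dest_url,
        "headers": {
            "x-ms-copy-source": source_url,
            "x-ms-access-tier": tier,
            "x-ms-rehydrate-priority": priority,
        },
    }


def set_tier_request(blob_url: str, tier: str,
                     priority: str = "Standard") -> dict:
    """Option B: change the archived blob's tier in place."""
    _validate(tier, priority)
    return {
        "method": "PUT",
        "url": f"{blob_url}?comp=tier",
        "headers": {
            "x-ms-access-tier": tier,
            "x-ms-rehydrate-priority": priority,
        },
    }
```

For instance, `set_tier_request("https://example.blob.core.windows.net/logs/2021.tar", "Hot", "High")` describes an in-place rehydration at high priority, while `copy_request` leaves the archived source untouched and produces a second blob in the online tier.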

Wednesday, October 13, 2021

· Resources can be locked to prevent unexpected changes. A subscription, resource group or resource can be locked to prevent other users from accidentally deleting or modifying critical resources. The lock overrides any permissions the users may have. The lock level can be set to CannotDelete or ReadOnly, with ReadOnly being the more restrictive. When a lock is applied at a parent scope, all resources within that scope inherit the same lock. Some considerations still apply after locking. For example, a CannotDelete lock on a storage account does not prevent data within that account from being deleted. A ReadOnly lock on an application gateway prevents you from getting the backend health of the application gateway because that operation uses POST. Only members of the Owner and User Access Administrator roles are granted access to Microsoft.Authorization/locks/* actions.

· Azure Key Vault can throttle client requests to help maintain optimal performance and reliability when it receives a high volume of concurrent calls. Throttled requests return a 429 error code, and clients must back off exponentially before retrying. Caching the secrets retrieved from Azure Key Vault in memory and reusing them from memory mitigates the load on the key vault. Public-key operations such as encrypt, wrap and verify can be performed with no access to Key Vault, which not only reduces the risk of throttling but also improves reliability. Programmatically, retries can be configured with the help of ServiceClientOptions when the corresponding client is instantiated. The ServiceClientOptions takes a retry setting where a policy describing the delay, max delay, maxRetries and RetryMode can be specified.

· End-to-end data-driven workflows for data processing scenarios can be created using pipelines and activities in Azure Data Factory and Azure Synapse Analytics, each of which can have one or more pipelines. A pipeline consists of a set of activities and helps to manage them as a set instead of individually. There are three groupings of activities: data movement activities, data transformation activities and control activities. An activity can take zero or more input datasets and produce one or more output datasets. The pipeline can be exported as JSON.
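
The Key Vault guidance above, caching secrets in memory and backing off exponentially on a 429, can be sketched without any SDK. This is an illustration of the pattern only and not the Azure client library: `Throttled`, `backoff_delays` and `CachedSecretReader` are hypothetical names, the `fetch` callable stands in for whatever call actually reads the secret, and the default delay values are arbitrary.

```python
import time


class Throttled(Exception):
    """Stand-in for an HTTP 429 response from the vault."""


def backoff_delays(base=0.8, max_delay=60.0, max_retries=5):
    """Exponential delays: base, 2*base, 4*base, ... each capped at max_delay."""
    return [min(max_delay, base * 2 ** attempt) for attempt in range(max_retries)]


class CachedSecretReader:
    """Reads each secret at most once and retries throttled reads with backoff."""

    def __init__(self, fetch, sleep=time.sleep, **retry):
        self._fetch = fetch      # callable(name) -> value; may raise Throttled
        self._sleep = sleep      # injectable so the waits can be observed or skipped
        self._delays = backoff_delays(**retry)
        self._cache = {}

    def get(self, name):
        # Serve repeated reads from memory so they never reach the vault.
        if name in self._cache:
            return self._cache[name]
        # One initial attempt plus one retry per delay; None marks the last attempt.
        for delay in [*self._delays, None]:
            try:
                value = self._fetch(name)
            except Throttled:
                if delay is None:
                    raise        # retries exhausted, surface the throttling error
                self._sleep(delay)
            else:
                self._cache[name] = value
                return value
```

A caller would wrap its real secret lookup in a function that raises `Throttled` on a 429 and pass it as `fetch`. A production version would likely add jitter to the delays and an expiry to the cache so that rotated secrets are picked up.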