Thursday, October 21, 2021

This is a continuation of an article that describes operational considerations for hosting solutions on Azure public cloud.

Some of the best practices for Azure Container Registry include network-close deployment, geo-replicated multi-region deployments, maximized pull performance, repository namespaces, a dedicated resource group, individual and headless authentication and authorization, and management of registry size.

When the registry is created in the same region where the containers are deployed, the network proximity of the registry to the hosts helps lower latency and cost. Availability is further improved by making the registry zone redundant. Docker images have a layering construct that facilitates incremental deployments, but new nodes need to pull all layers defined in the Dockerfile. Since there are many fetches, the network round-trip time matters to the design.
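
A minimal sketch of such a network-close, zone-redundant registry, assuming a hypothetical resource group rg-registry and registry name myregistry in eastus (zone redundancy requires the Premium tier):

# create the registry in the same region as the container hosts
az group create --name rg-registry --location eastus
az acr create --resource-group rg-registry --name myregistry --sku Premium --location eastus --zone-redundancy Enabled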

Multi-region deployments can leverage geo-replication, which simplifies registry management and minimizes latency. Geo-replication can also be configured with regional webhooks that notify us of events, such as image pushes, in specific replicas.
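
A sketch of adding a replica and a regional webhook to the registry above; the replica region and the webhook endpoint are assumptions:

# replicate the registry to a second region
az acr replication create --registry myregistry --location westeurope
# regional webhook that fires when images are pushed to the westeurope replica
az acr webhook create --registry myregistry --name pushhookweu --location westeurope --actions push --uri https://example.com/acr/push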

Pull performance can be maximized by reducing the image size and the number of layers. The former is achieved by removing unnecessary layers and using multi-stage Docker builds. Base images can be made smaller by using an Alpine variant. The number of layers should ideally be between 5 and 10.

Repository namespaces allow sharing a single registry with multiple groups within the organization. Nested namespaces support group isolation, but a flat list of repositories is preferred.
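
For illustration, two teams sharing the registry above can keep their images under separate repository namespaces (the team and image names are made up):

az acr login --name myregistry
docker tag orders-api myregistry.azurecr.io/team-a/orders-api:1.0.0
docker push myregistry.azurecr.io/team-a/orders-api:1.0.0
docker tag billing-api myregistry.azurecr.io/team-b/billing-api:1.0.0
docker push myregistry.azurecr.io/team-b/billing-api:1.0.0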

Resource groups tie together resources that share a lifecycle. A registry should reside in its own resource group. Azure Container Instances, on the other hand, can be created or deleted as necessary.
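
A sketch of this separation: the registry stays in its long-lived rg-registry group, while container instances come and go in a separate group (the group, instance, and image names are assumptions):

# container instances are created and deleted in their own group as needed
az group create --name rg-aci --location eastus
az container create --resource-group rg-aci --name worker1 --image mcr.microsoft.com/azuredocs/aci-helloworld
az container delete --resource-group rg-aci --name worker1 --yes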

When an individual uses the registry, the preferred way to authenticate is to use “az acr login”. When a build and deployment pipeline authenticates, it can use a service principal.
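
A sketch of the two authentication paths against the registry above; the service principal name is made up, and it is granted only the built-in acrpull role:

# individual, interactive login
az acr login --name myregistry
# headless login for a pipeline: a service principal scoped to the registry with pull rights
ACR_ID=$(az acr show --name myregistry --query id --output tsv)
az ad sp create-for-rbac --name ci-acr-pull --role acrpull --scopes $ACR_ID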

The service tier of the container registry should align with the typical scenario: Standard for most production applications and Premium for improved performance and geo-replication.
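
If a registry started on the Standard tier, it can be moved up when geo-replication or higher throughput is needed; a one-line sketch:

# upgrade the registry service tier
az acr update --name myregistry --sku Premium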

An Azure Function can help create and delete a container instance when needed, or retrieve the state of or messages from a container instance.
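
The operations such a function performs map to calls like these (shown here as the CLI equivalents, against the instance from the earlier sketch):

# read the current state of a container instance
az container show --resource-group rg-aci --name worker1 --query instanceView.state --output tsv
# fetch its recent log output
az container logs --resource-group rg-aci --name worker1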

Some like to use the ‘latest’ tag when pulling images, but pinning a specific version eliminates uncertainty and keeps deployments on tried and tested images.

 

 

 

Wednesday, October 20, 2021

 This is a continuation of an article that describes operational considerations for hosting solutions on Azure public cloud.

1. Azure (global) supports FileStorage accounts. Azure Stack Hub does not support them.

2. Azure (global) supports general-purpose v1, general-purpose v2, and Blob storage accounts, while Azure Stack Hub supports general-purpose v1 only. Prefer v2 because it provides blob, queue, file, and table storage with LRS, GRS, and RA-GRS redundancy options (a storage account sketch follows this list).

3. Azure Resource Manager provides options for moving a resource to a different subscription or a different resource group. ARM deployments come with full-fledged functionality, scalability, and security (a move sketch follows this list).

4. If a storage account must be moved, a new account is created in the target first and AzCopy is then used to copy the data over.

5. When an application is migrated to Azure, its storage can remain in the same format as before. For example, if file storage over NFSv3 was used, NFSv3 can continue to be used with an Azure Storage v2 account.

6. General-purpose v2 accounts deliver the lowest per-gigabyte capacity prices for Azure Storage, as well as industry-competitive transaction prices. They support default account access tiers of hot or cool and blob-level tiering between hot, cool, and archive.

7. The archive storage tier does not provide immediate data access; rehydrating an archived blob can take hours. If we need immediate access, change the blob's access tier to hot or cool. A v1 storage account can be upgraded and its default access tier set to either hot or cool (see the tiering commands after this list).

8. Costs for a storage tier are based on the amount of data stored in that access tier, the data access cost, the transaction cost, the geo-replication data transfer cost, the outbound data transfer cost, and the cost of changing the access tier. The primary access pattern for the blob storage, in terms of reads and writes and their relative proportions, determines the cost savings. All storage accesses can be monitored, and the emitted metrics cover capacity, transactions, and data transfer.

9. Elastic pools can help manage and scale multiple databases in Azure SQL Database. Traditionally, there were two options: over-provision resources based on peak usage and overpay, or under-provision to save cost at the expense of performance and customer satisfaction during peaks. Elastic pools solve this problem by ensuring that databases get the performance resources they need when they need them. They provide a simple resource allocation mechanism within a predictable budget (an elastic pool sketch follows this list).

10. Conditional access policies can be applied to Azure resources to enforce criteria from a security standpoint. For example, to require MFA for all user authentication, we would create an MFA conditional access policy in Azure AD. Common conditional access policies involve blocking legacy authentication and requiring MFA for all users.

11. Privileged Identity Management, on the other hand, is an Azure AD service that helps with the management, control, and monitoring of important resources in the organization. It provides just-in-time privileged access to Azure resources such as storage accounts, assigns time-bound access, and requires approval. It can also enforce MFA to activate any role.

12. B2B collaboration can be set up with Azure AD. External users can be invited as guests, but they must authenticate against their home organization, so they lose guest access if they no longer have access to their home organization.

13. Securing privileged access for hybrid and cloud deployments in Azure AD requires changes to processes as well as to resources, such as the use of host defenses, user account protections, and identity management.
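
To illustrate items 1, 2, 6, and 7 above, a sketch of creating the two kinds of storage accounts and moving a blob between tiers; the resource group, account, container, and blob names are all assumptions:

# general-purpose v2 account with geo-redundant storage and a cool default access tier
az storage account create --name mygpv2acct --resource-group rg-storage --location eastus --kind StorageV2 --sku Standard_GRS --access-tier Cool
# premium file share account (the FileStorage kind, not available on Azure Stack Hub)
az storage account create --name myfilesacct --resource-group rg-storage --location eastus --kind FileStorage --sku Premium_LRS
# archive a blob, then start rehydrating it back to the hot tier (this can take hours)
az storage blob set-tier --account-name mygpv2acct --container-name backups --name data.bak --tier Archive
az storage blob set-tier --account-name mygpv2acct --container-name backups --name data.bak --tier Hot --rehydrate-priority Standard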
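
For items 3 and 4, a sketch of moving a resource into a different resource group and copying blob data with AzCopy; the resource ID, account names, and SAS tokens are placeholders:

# move a resource (by ID) into a different resource group
az resource move --destination-group rg-target --ids /subscriptions/<sub-id>/resourceGroups/rg-source/providers/Microsoft.Web/sites/myapp
# copy the data from the old storage account to the new one
azcopy copy "https://oldaccount.blob.core.windows.net/backups?<SAS>" "https://newaccount.blob.core.windows.net/backups?<SAS>" --recursive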
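
For item 9, a sketch of creating an elastic pool and moving an existing database into it; the server, pool, and database names are assumptions:

# create an elastic pool on an existing logical server
az sql elastic-pool create --resource-group rg-sql --server my-sql-server --name my-pool --edition GeneralPurpose --family Gen5 --capacity 4
# move an existing database into the pool so it draws from the shared resources
az sql db update --resource-group rg-sql --server my-sql-server --name customersdb --elastic-pool my-pool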




Tuesday, October 19, 2021

 

This is a continuation of an article that describes operational considerations for hosting solutions on Azure public cloud.

1.       When an application is migrated to Azure, its storage can remain in the same format as before. For example, if file storage over NFSv3 was used, NFSv3 can continue to be used with an Azure Storage v2 account.

2.       General-purpose v2 accounts deliver the lowest per-gigabyte capacity prices for Azure Storage, as well as industry-competitive transaction prices. They support default account access tiers of hot or cool and blob-level tiering between hot, cool, and archive.

3.       The archive storage tier does not provide immediate data access; rehydrating an archived blob can take hours. If we need immediate access, change the blob's access tier to hot or cool. A v1 storage account can be upgraded and its default access tier set to either hot or cool.

4.       Costs for a storage tier are based on the amount of data stored in that access tier, the data access cost, the transaction cost, the geo-replication data transfer cost, the outbound data transfer cost, and the cost of changing the access tier. The primary access pattern for the blob storage, in terms of reads and writes and their relative proportions, determines the cost savings. All storage accesses can be monitored, and the emitted metrics cover capacity, transactions, and data transfer.

5.       Elastic pools can help manage and scale multiple databases in Azure SQL Database. Traditionally, there were two options: over-provision resources based on peak usage and overpay, or under-provision to save cost at the expense of performance and customer satisfaction during peaks. Elastic pools solve this problem by ensuring that databases get the performance resources they need when they need them. They provide a simple resource allocation mechanism within a predictable budget.

6.       ExpressRoute, VPN Gateway, and virtual network peering provide different levels of functionality. If we want private site-to-site connectivity, we can use ExpressRoute. If we want secure site-to-site VPN connectivity, we can use a virtual network site-to-site connection. If we want secure point-to-site connectivity, we can use a virtual network point-to-site connection. The choice among them depends on which of these connectivity requirements applies (a peering sketch follows this list).

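For item 6, the simplest of these building blocks to show is virtual network peering; a sketch assuming two existing virtual networks in the same subscription:

# peer vnet1 to vnet2 (a reciprocal peering from vnet2 to vnet1 is also required)
az network vnet peering create --resource-group rg-network --name vnet1-to-vnet2 --vnet-name vnet1 --remote-vnet vnet2 --allow-vnet-access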

 

 

Monday, October 18, 2021

 This is a continuation of an article that describes operational considerations for hosting solutions on Azure public cloud.

1.       When an IPSec VPN (site-to-site) or ExpressRoute (private peering) is used, the configuration for the self-hosted integration runtime varies. In the site-to-site case, the command channel and the data channel from the self-hosted integration runtime cross the Azure virtual network to reach the Data Factory and the Azure-managed storage services respectively. With private peering, the data channel stays entirely within the Azure virtual network in which the self-hosted integration runtime runs.

2.       Windows Firewall runs as a daemon on the local machine on which the self-hosted integration runtime is installed. The outbound port and domain requirements for corporate firewalls can be listed separately; they do not include the machine-level rules for the self-hosted integration runtime. Outbound port 443 must be opened for the self-hosted integration runtime to make internet connections. Inbound port 8060 must be opened only at the machine level. IP configurations and allow lists can be set up in the data stores.

3.       Multi-region clusters increase resiliency. This architecture builds on the AKS baseline architecture, where AD pod identity, ingress and egress restrictions, resource limits, and other secure AKS infrastructure configurations are described. Each cluster is deployed in a separate Azure region and traffic is routed through all regions. If one region becomes unavailable, traffic is routed through another region that is closest to the user who issued the request. A regional hub-spoke network pair is deployed for each regional AKS instance. Azure Firewall Manager policies are used to manage firewall policies across all regions. Azure Front Door is used to load balance and route traffic to a regional Azure Application Gateway instance designated for each AKS cluster. A single Azure container registry is used for all the Kubernetes clusters in the architecture.

4.       Multitenant SaaS is excellent for running solutions that can be unbranded and marketed to other businesses. It adds an entirely new revenue stream for a company. But the operational aspects of running this service are very different from those of a web application. The architecture for hosting it involves creating multiple resource groups. All users access resources through Azure Front Door, which has integration with both Azure DNS and Azure Active Directory. In each resource group, an application gateway routes traffic to multiple app services that are all hosted on the infrastructure provided by a layer of Azure Kubernetes Service.

5.       It is always good to spot-check an AKS cluster against the current recommended Azure best practices. For example, the AKS baseline cluster architecture brings the best in terms of availability and protection. In addition, AKS workloads can be effectively managed by setting proper resource requests and imposing limits (a sketch follows this list). Configuring the scale-out of containers and the use of proxies, load balancers, and ingress also contribute to the best practices.

6.       High availability can be improved with availability zones, using an architecture that spreads redundant resources across zones to provide high resilience. Most of the resources are actively used because they serve requests. Some backend services or stores, such as the relational store, might have redundant instances that are used only when the active ones fail. The use of availability zones significantly improves IaaS resilience, which is critical to hosting web applications that are not managed instances in the cloud. Therefore, using zonal and zone-redundant architecture is especially useful in the Azure public cloud (an AKS zones sketch follows this list).

7.       Identity is a necessary investment for any software application and service hosted in the public cloud. The right choices can endear the software to its users. Seamless integration and SSO enable applications and services to work together with the same notion of the user. Creating a separate Active Directory domain in Azure that is trusted by domains in the on-premises AD forest is a significant step in this direction.
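
For item 5, requests and limits on an AKS workload can be set directly on a deployment; a minimal sketch with a hypothetical deployment name:

# declare what the workload needs for scheduling and cap what it can consume
kubectl set resources deployment orders-api --requests=cpu=250m,memory=256Mi --limits=cpu=500m,memory=512Mi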
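
For item 6, an AKS node pool can be spread across availability zones at cluster creation; a sketch with assumed resource names:

# spread the default node pool across three availability zones in the region
az aks create --resource-group rg-aks --name aks-east --location eastus --node-count 3 --zones 1 2 3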


Sunday, October 17, 2021

 This is a continuation of an article that describes operational considerations for hosting solutions on Azure public cloud.

1.       When an IPSec VPN (site-to-site) or ExpressRoute (private peering) is used, the configuration for the self-hosted integration runtime varies. In the site-to-site case, the command channel and the data channel from the self-hosted integration runtime cross the Azure virtual network to reach the Data Factory and the Azure-managed storage services respectively. With private peering, the data channel stays entirely within the Azure virtual network in which the self-hosted integration runtime runs.

2.       Windows Firewall runs as a daemon on the local machine on which the self-hosted integration runtime is installed. The outbound port and domain requirements for corporate firewalls can be listed separately; they do not include the machine-level rules for the self-hosted integration runtime. Outbound port 443 must be opened for the self-hosted integration runtime to make internet connections. Inbound port 8060 must be opened only at the machine level. IP configurations and allow lists can be set up in the data stores.

3.       Multi-region clusters increase resiliency. This architecture builds on the AKS baseline architecture, where AD pod identity, ingress and egress restrictions, resource limits, and other secure AKS infrastructure configurations are described. Each cluster is deployed in a separate Azure region and traffic is routed through all regions. If one region becomes unavailable, traffic is routed through another region that is closest to the user who issued the request. A regional hub-spoke network pair is deployed for each regional AKS instance. Azure Firewall Manager policies are used to manage firewall policies across all regions. Azure Front Door is used to load balance and route traffic to a regional Azure Application Gateway instance designated for each AKS cluster. A single Azure container registry is used for all the Kubernetes clusters in the architecture.

4.       Multitenant SaaS is excellent for running solutions that can be unbranded and marketed to other businesses. It adds an entirely new revenue stream for a company. But the operational aspects of running this service are very different from those of a web application. The architecture for hosting it involves creating multiple resource groups. All users access resources through Azure Front Door, which has integration with both Azure DNS and Azure Active Directory. In each resource group, an application gateway routes traffic to multiple app services that are all hosted on the infrastructure provided by a layer of Azure Kubernetes Service.


Saturday, October 16, 2021

 This is a continuation of an article that describes operational considerations for hosting solutions on Azure public cloud.

1.       Efficient Docker image deployment for intermittent, low-bandwidth connectivity scenarios requires eliminating docker pulls of images. An alternative deployment mechanism can compensate for the restrictions by utilizing an Azure Container Registry, signature files, a file share, and an IoT Hub for pushing manifests to devices. The deployment path involves pushing the image to the device, where it is containerized. The devices can send back messages, which are collected in a device-image register. An image is a collection of layers, where each layer represents a set of file-system differences and is stored simply as folders and files. A SQL database can be used to track the state of what is occurring on the target devices and in the Azure-based deployment services, which helps both during and after the deployment process.

2.       Data from an on-premises SQL Server can be used in Azure Synapse, which transforms the data for analysis. This involves an ELT pipeline that lands the data in storage blobs, which can then be read by Azure Synapse for analysis and visualization. The analysis stack involving Power BI can be integrated with Azure Active Directory so that only members of the organization can sign in and view the dashboards. Azure Analysis Services supports tabular models but not multidimensional models. Multidimensional models use OLAP constructs like cubes, dimensions, and measures, which are better handled with SQL Server Analysis Services.

3.       Image processing is one of the core cognitive services provided by Azure. Companies can eliminate the need for managing individual or proprietary servers and leverage the industry standard with the use of the Computer Vision API, Azure Event Grid to collect images, and Azure Functions to call the Vision APIs for analysis or predictions. The blob storage must trigger an Event Grid notification that is sent to the Azure Function, which makes an entry in Cosmos DB to persist the results of the analysis along with the image metadata. The database can autoscale, but Azure Functions has a limit of about 200 instances (a provisioning sketch follows this list).

4.       A content-based recommendation uses information about the items to learn customer preferences and recommends items that share properties with items that a customer has previously interacted with. Azure Databricks can be used to train a model that predicts the probability that a user will engage with an item. The model can then be deployed as a prediction service hosted on Azure Kubernetes Service. The MMLSpark library enables training a LightGBM classifier on Azure Databricks to predict the click probability. Azure ML is used to create a Docker image in the Azure Container Registry that packages the scoring scripts and all necessary dependencies for serving predictions. Azure ML is also used to provision the compute for serving predictions on Azure Kubernetes Service clusters. A cluster with ten standard L8s VMs can handle millions of records. The scoring service runs separately on each node in the Kubernetes cluster, and the training can be handled independently from the production deployment.

5.       Availability zones can be used to spread a solution across multiple zones within a region, allowing applications to function even when one zone fails. For example, the VM uptime service level agreement can reach 99.99% because single points of failure are eliminated. Availability zones also have low latency and, unlike deployments that span regions, come at no additional cost. Designing solutions that continue to function despite failures is key to improving the reliability of the solution. Zonal deployments pin resources to a specific zone to achieve more stringent latency or performance requirements, while zone-redundant deployments make no distinction between the zones (a sketch of both follows this list).
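
For item 3, a sketch of provisioning the Computer Vision resource that the Functions would call; the names, SKU, and region are assumptions:

# create a Computer Vision resource; the Function uses its endpoint and key to analyze images
az cognitiveservices account create --name my-vision --resource-group rg-vision --kind ComputerVision --sku S1 --location eastus
az cognitiveservices account keys list --name my-vision --resource-group rg-vision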
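
For item 5, the difference between zonal and zone-redundant shows up directly in the CLI; a sketch with assumed names and image:

# zonal: pin a VM to a specific zone for strict latency or performance requirements
az vm create --resource-group rg-zones --name vm-zonal --image UbuntuLTS --zone 1
# zone-redundant: a standard public IP that spans all three zones in the region
az network public-ip create --resource-group rg-zones --name pip-zr --sku Standard --zone 1 2 3
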

Friday, October 15, 2021

 This is a continuation of an article that describes operational considerations for hosting solutions on Azure public cloud.

 

1.       Azure Blueprints can be leveraged to allow an engineer or architect to sketch a project's design parameters and define a repeatable set of resources that implements and adheres to an organization's standards, patterns, and requirements. It is a declarative way to orchestrate the deployment of various resource templates and other artifacts such as role assignments, policy assignments, ARM templates, and resource groups. Blueprint objects are stored in Cosmos DB and replicated to multiple Azure regions. Since it is designed to set up the environment, it is different from plain resource provisioning. This package fits nicely into a CI/CD pipeline and handles both what should be deployed and the assignment of what was deployed.

2.       Moving resources across regions is required by businesses to align to a region launch, to services or resources specific to that region, or for proximity. Networking resources such as ExpressRoute, VNet peering, gateways, and edge routers, as well as multi-tiered web applications running in the cloud environment, are particularly prone to migrations across regions. Steps to migrate involve planning downtime, ensuring subscription limits and quotas are met, assigning permissions, identifying resources, and other such prerequisites. The components can then be moved, networking first, followed by the application, and then the PaaS services. Considerations include planning for complex infrastructure, the resource types being moved, moving all resources within an application together, ensuring capacity requirements are met, planning for business continuity, validating the migration, and ensuring due diligence by testing before moving to the target region.

3.       Resource groups are created to group resources that share the same lifecycle. They have no bearing on the cost management of resources other than to help with querying; they can be used with tags to narrow down the resources of interest. Metadata about the resources is stored in a particular region. Resources can be moved from one resource group to another, or even to another subscription. Finally, resource groups can be locked to prevent actions such as delete or write by users who otherwise have access (a tagging and locking sketch follows).
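
For item 3, a sketch of tagging a resource group for cost queries and locking it against deletion; the group name and tags are assumptions:

# tag the group so cost queries can be narrowed by tag
az group create --name rg-prod-app --location eastus --tags env=prod costcenter=1234
# prevent deletion by users who otherwise have access to the group
az lock create --name no-delete --resource-group rg-prod-app --lock-type CanNotDelete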