Saturday, October 16, 2021

This is a continuation of an article that describes operational considerations for hosting solutions on the Azure public cloud.

1.       Efficient Docker image deployment in scenarios with intermittent, low-bandwidth connectivity requires eliminating docker pulls of images. An alternative deployment mechanism can compensate for these restrictions by using an Azure Container Registry, signature files, a fileshare, and an IoT hub that pushes manifests to devices. The deployment path involves pushing the containerized image to the device, and the devices send back messages that are collected in a device-image register. An image is a collection of layers, where each layer represents a set of file-system differences and is stored simply as folders and files. A SQL database can track the state of the target devices and of the Azure-based deployment services, which helps both during and after the deployment process. A device-side sketch of the layer-verification step follows.
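A minimal device-side sketch of that verification step, assuming the fileshare sync drops layer tarballs alongside a signature file of SHA-256 digests; the mount point, file names, and signature format below are illustrative assumptions rather than part of the reference deployment.

```python
import hashlib
import json
from pathlib import Path

# Assumed layout (illustrative): the fileshare sync drops layer tarballs and a
# signature file listing the expected SHA-256 digest of each layer.
SHARE_DIR = Path("/mnt/imageshare/myapp")          # hypothetical mount point
SIGNATURE_FILE = SHARE_DIR / "signatures.json"     # e.g. {"layer1.tar": "<sha256>", ...}

def sha256_of(path: Path) -> str:
    """Stream the file so large layers do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_layers() -> bool:
    """Return True only if every layer on the share matches its recorded digest."""
    expected = json.loads(SIGNATURE_FILE.read_text())
    for layer_name, expected_digest in expected.items():
        layer_path = SHARE_DIR / layer_name
        if not layer_path.exists() or sha256_of(layer_path) != expected_digest:
            return False
    return True

if __name__ == "__main__":
    # Only load the image (e.g. `docker load`) after all layers verify cleanly.
    print("layers verified" if verify_layers() else "verification failed; re-sync layers")
```

The device would report the outcome of this check in its device-to-cloud messages so that the SQL tracking database reflects the actual state of each deployment.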

2.       Data from an on-premises SQL Server can be used in Azure Synapse, which transforms the data for analysis. This involves an ELT pipeline that moves the data into storage blobs, which can then be read by Azure Synapse for analysis and visualization. The analysis stack, including Power BI, can be integrated with Azure Active Directory so that only members of the organization can sign in and view the dashboards. Azure Analysis Services supports tabular models but not multidimensional models; multidimensional models use OLAP constructs such as cubes, dimensions, and measures and are better served by SQL Server Analysis Services. A sketch of the load step follows.
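A minimal sketch of that load step, assuming the exported blobs are CSV files already landed in a storage account; the workspace, pool, table, and container names below are placeholders, not part of the original pipeline.

```python
import pyodbc

# Illustrative assumptions: the Synapse workspace, dedicated SQL pool, table, and
# blob path are placeholders; Azure AD interactive auth is one possible option.
CONN_STR = (
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=tcp:myworkspace.sql.azuresynapse.net,1433;"   # hypothetical workspace
    "Database=sales_pool;"                                 # hypothetical dedicated pool
    "Authentication=ActiveDirectoryInteractive;"
)

# COPY INTO is the Synapse T-SQL statement for bulk-loading blobs into a table.
COPY_SQL = """
COPY INTO dbo.SalesStaging
FROM 'https://mystorageacct.blob.core.windows.net/exports/sales/*.csv'
WITH (
    FILE_TYPE = 'CSV',
    FIRSTROW = 2
)
"""

with pyodbc.connect(CONN_STR, autocommit=True) as conn:
    conn.execute(COPY_SQL)
    print("staging table loaded; downstream transforms can run inside Synapse")
```

Once the staging table is loaded, the transform step (the "T" in ELT) runs inside Synapse itself before the results are exposed to Power BI.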

3.       Image processing is one of the core cognitive services provided by Azure. Companies can eliminate the need to manage individual or proprietary servers and instead leverage the industry standard by using the Computer Vision API, Azure Event Grid to collect images, and Azure Functions to call the Vision APIs for analysis or predictions. An upload to blob storage triggers an Event Grid notification that is sent to the Azure Function, which writes an entry to Cosmos DB to persist the results of the analysis along with the image metadata. The database can autoscale, but Azure Functions has a limit of about 200 instances. A sketch of the function body follows.
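A minimal sketch of that function body, assuming the Python v1 programming model with an Event Grid trigger and a Cosmos DB output binding declared in function.json; the binding name, app settings, and Computer Vision API version are assumptions.

```python
import os
import requests
import azure.functions as func

# Illustrative assumptions: VISION_ENDPOINT and VISION_KEY are app settings for a
# Computer Vision resource, and "resultDoc" is a Cosmos DB output binding.
VISION_URL = os.environ["VISION_ENDPOINT"] + "/vision/v3.2/analyze"

def main(event: func.EventGridEvent, resultDoc: func.Out[func.Document]) -> None:
    # Blob-created events carry the blob URL in the event payload.
    blob_url = event.get_json()["url"]

    # Ask Computer Vision for a description and tags of the uploaded image.
    response = requests.post(
        VISION_URL,
        params={"visualFeatures": "Description,Tags"},
        headers={"Ocp-Apim-Subscription-Key": os.environ["VISION_KEY"]},
        json={"url": blob_url},
        timeout=30,
    )
    response.raise_for_status()

    # Persist the analysis together with the image metadata in Cosmos DB.
    resultDoc.set(func.Document.from_dict({
        "id": event.id,
        "imageUrl": blob_url,
        "analysis": response.json(),
    }))
```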

4.       A content-based recommendation uses information about the items to learn customer preferences and recommends items that share properties with items the customer has previously interacted with. Azure Databricks can be used to train a model that predicts the probability that a user will engage with an item, and the model can then be deployed as a prediction service hosted on Azure Kubernetes Service. The MMLSpark library enables training a LightGBM classifier on Azure Databricks to predict the click probability. Azure ML is used to create a Docker image in Azure Container Registry that holds the scoring scripts and all necessary dependencies for serving predictions, and to provision the compute for serving those predictions on Azure Kubernetes Service clusters. A cluster with ten standard L8s VMs can handle millions of records. The scoring service must run separately on each node in the Kubernetes cluster, and training can be handled independently of the production deployment. A training sketch follows.
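A minimal training sketch for Databricks, assuming a pre-built feature table; the table name, column names, and hyperparameters are assumptions, and `spark` is the SparkSession that Databricks notebooks provide.

```python
from pyspark.ml.feature import VectorAssembler
from mmlspark.lightgbm import LightGBMClassifier   # shipped with the MMLSpark library

# Illustrative assumptions: "clickstream_features" is a pre-built table whose numeric
# columns describe the user/item pair and whose "clicked" column is 0/1.
# `spark` is the SparkSession provided by the Databricks notebook environment.
df = spark.table("clickstream_features")
feature_cols = [c for c in df.columns if c not in ("user_id", "item_id", "clicked")]

# Assemble the raw columns into the single vector column LightGBM expects.
assembler = VectorAssembler(inputCols=feature_cols, outputCol="features")
train_df, test_df = assembler.transform(df).randomSplit([0.8, 0.2], seed=42)

# Binary objective so the model outputs a click probability per user/item pair.
classifier = LightGBMClassifier(
    objective="binary",
    featuresCol="features",
    labelCol="clicked",
    numIterations=100,
)
model = classifier.fit(train_df)

# The probability column can be thresholded or ranked to produce recommendations.
model.transform(test_df).select("user_id", "item_id", "probability").show(10)
```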

5.       Availability Zones can be used to spread a solution across multiple zones within a region, allowing applications to continue functioning even when one zone fails. For example, the VM uptime service level agreement can reach 99.99% because single points of failure are eliminated. Availability Zones also offer low latency and, unlike deployments that span regions, come at no additional cost. Designing solutions that continue to function despite failure is key to improving the reliability of the solution. Zonal deployments pin resources to a specific zone to achieve more stringent latency or performance requirements, while zone-redundant deployments make no distinction between the zones. A short sketch of the distinction follows.
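A minimal sketch of that distinction, assuming the azure-mgmt-network SDK and placeholder subscription, resource group, and region names: passing all three zones makes the resource zone-redundant, while passing a single zone pins it for a zonal deployment.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient
from azure.mgmt.network.models import PublicIPAddress, PublicIPAddressSku

# Illustrative assumptions: subscription, resource group, and region names are placeholders.
client = NetworkManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Zone-redundant: the Standard public IP is replicated across all three zones, so it
# keeps serving traffic even if one zone fails. Pass a single zone, e.g. zones=["1"],
# to pin the resource for a zonal deployment instead.
poller = client.public_ip_addresses.begin_create_or_update(
    "my-resource-group",
    "web-frontend-ip",
    PublicIPAddress(
        location="eastus2",
        sku=PublicIPAddressSku(name="Standard"),
        public_ip_allocation_method="Static",
        zones=["1", "2", "3"],
    ),
)
print(poller.result().provisioning_state)
```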
