Cluster computing

Sunday, October 31, 2021

This is a continuation of the article introduced here: https://1drv.ms/w/s!Ashlm-Nw-wnWhKdmXP_0_c--oqotlA?e=KDRRVf. It describes design patterns to consider when hosting solutions on the Azure public cloud.

1. Queue-based load leveling – This uses a queue as a buffer between a task and a service that it calls, so that intermittent heavy workloads can be staged and processed smoothly.

2. Retry - Enable an application to handle anticipated, temporary failures when it tries to connect to a service or network resource by transparently retrying an operation that's previously failed.

3. Scheduler agent supervisor - Coordinate a set of actions across a distributed set of services and other remote resources.

4. Sequential convoy - Process a set of related messages in a defined order, without blocking processing of other groups of messages.

5. Sharding - Divide a data store into a set of horizontal partitions or shards.

6. Sidecar - Deploy components of an application into a separate process or container to provide isolation and encapsulation.

7. Static content hosting - Deploy static content to a cloud-based storage service that can deliver it directly to the client.

8. Strangler fig - Incrementally migrate a legacy system by gradually replacing specific pieces of functionality with new applications and services.

9. Throttling - Control the consumption of resources used by an instance of an application, an individual tenant, or an entire service.

10. Valet Key - Use a token or key that provides clients with restricted direct access to a specific resource or service.

11. Minimal design – Use only what is necessary and no more.
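
As an illustration of the Retry pattern from the list above, here is a minimal sketch in Python. It assumes that transient failures surface as a dedicated exception type and that waiting twice as long after each failed attempt (exponential backoff) is acceptable. The names TransientError, retry, and make_flaky are hypothetical and are not part of any Azure SDK.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def retry(operation, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call operation(); on TransientError, wait and try again.

    The delay doubles after each failed attempt (exponential backoff).
    The last failure is re-raised once the attempts are used up.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))


def make_flaky(failures_before_success):
    """Build a fake remote call that fails a fixed number of times."""
    state = {"calls": 0}

    def call():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise TransientError("temporary failure")
        return "ok after %d calls" % state["calls"]

    return call
```

The sleep function is passed in as a parameter so that the backoff schedule can be observed or skipped in tests. Real clients usually add jitter and a cap on the delay.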

The top ten design principles cited in the documentation for Azure services are:

1. Designing for self-healing when failures occur

2. Making all things redundant so that there is no single point of failure.

3. Minimizing coordination between application services to achieve scalability.

4. Designing to scale out horizontally, adding or removing instances as demand changes.

5. Partitioning around limits to work around database, network, or compute limits.

6. Designing for operations – so that operations teams have the tools they need.

7. Using managed services – to leverage platform as a service (PaaS) rather than infrastructure as a service (IaaS).

8. Using the best data store for the job, so that the store fits the data and how it is used.

9. Designing for evolution so that application changes are easy.

10. Building for the needs of the business, where every decision is driven by business requirements.

Saturday, October 30, 2021

1. Competing Consumers – This enables multiple concurrent consumers to process messages received on the same messaging channel. It addresses specific challenges in the messaging category.

2. Compute Resource Consolidation - This consolidates multiple tasks or operations into a single computational unit. It is widely applicable to many implementations.

3. CQRS – This segregates operations that read data from those that update data by directing them to different interfaces. This helps with performance and efficiency.

4. Deployment stamps – These deploy multiple independent copies of application components, including data stores.

5. Event sourcing – This uses an append only store to record the full series of events that describe the actions taken on data in a specific domain.

6. External configuration store – This moves configuration information out of the application deployment package to a centralized location.

7. Federated Identity – This delegates authentication to an external identity provider.

8. Gatekeeper – This protects applications and services by using a dedicated host instance that sits between clients and the application or service. The host brokers requests from clients, validates and sanitizes them, and passes requests and data between the client and the service.

9. Gateway Aggregation - This uses a gateway to aggregate multiple individual requests into a single request.

10. Gateway offloading – This offloads shared or specialized service functionality to a gateway proxy.

11. Gateway Routing – This routes requests to multiple services using a single endpoint.

12. Geodes – This deploys backend services into a set of geographical nodes. Each node can service any client request in any region.

13. Health Endpoint Monitoring – This is crucial for external tools to continually monitor the health of a service through an exposed endpoint.

14. Index table – This creates indexes over the fields in data stores that queries frequently reference.

15. Leader election – This coordinates the actions performed by a collection of collaborating task instances in a distributed application by electing one instance as the leader that assumes responsibility for managing the other instances.

16. Materialized view – This generates prepopulated views over the data in one or more data stores so that the query for the view does not need to be run again and the view is available to subsequent queries.

17. Pipes and filters – This breaks processing into a series of separate stages, which makes the stages reusable.

18. Priority queue – This prioritizes requests sent to services so that requests with a higher priority are received and processed before those with a lower priority.

19. Publisher/Subscriber - This enables an application to produce events for interested consumers without coupling the producer to the consumers. It also allows fan-out of events to multiple parties.

20. Queue-based load leveling – This uses a queue as a buffer between a task and a service that it calls, so that intermittent heavy workloads can be staged and processed smoothly.
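
The Priority queue pattern from the list above can be sketched with the heapq module from the Python standard library. This is an in-memory illustration only, not a managed messaging service. It assumes that a lower number means a higher priority and that messages with equal priority should keep their arrival order. The class name PriorityMessageQueue is hypothetical.

```python
import heapq
import itertools


class PriorityMessageQueue:
    """In-memory queue that delivers lower priority numbers first."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker keeps arrival order

    def send(self, message, priority):
        # Entries sort by priority first, then by arrival sequence number.
        heapq.heappush(self._heap, (priority, next(self._order), message))

    def receive(self):
        # Pop the entry with the smallest (priority, sequence) pair.
        _, _, message = heapq.heappop(self._heap)
        return message

    def __len__(self):
        return len(self._heap)
```

A consumer that calls receive() in a loop processes urgent messages first, even if they arrived after less urgent ones.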

https://1drv.ms/w/s!Ashlm-Nw-wnWhKdun2-8IUJ19Or-Og

Sample Python implementation:

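#!/usr/bin/python

# The helpers classify, gen_proposals, get_cluster_proposal, api_from,
# cluster_centroids_of_top_clusters, goodness_of_fit and
# clusters_greater_than_goodness_of_fit_weighted_size, and the constants
# FULL and THRESHOLD, are assumed to be defined elsewhere.

def determining_root_cause_for_api(failures):
    # The root cause is taken from the centroids of the top failure clusters.
    return cluster_centroids_of_top_clusters(failures)

def batch_cluster_repeated_pass(api, failures):
    # Start with the cluster built from the full classification.
    api_cluster = classify(api, failures)
    proposals = gen_proposals(api_cluster)
    clusters = [(FULL, api_cluster)]
    # Add one candidate cluster per proposal.
    for proposal in proposals:
        cluster = get_cluster_proposal(proposal, api, failures)
        clusters += [(proposal, cluster)]
    selections = select_top_clusters(THRESHOLD, clusters)
    return api_from(selections)

def select_top_clusters(threshold, clusters, strategy=goodness_of_fit):
    # Keep the clusters whose size-weighted goodness of fit exceeds the threshold.
    return clusters_greater_than_goodness_of_fit_weighted_size(threshold, clusters)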

Friday, October 29, 2021

This is a continuation of an article that describes operational considerations for hosting solutions on Azure public cloud.

These are some of the cloud design patterns that are useful for building reliable, scalable, and secure applications in the cloud. Some of the challenges that these design patterns address include:

Data Management: This is a key element of the cloud application. Data is hosted in different locations and across multiple servers for reasons such as performance, scalability or availability, and this can present a range of challenges. For example, data consistency must be maintained and data will typically need to be synchronized or replicated across different locations.

Design and Implementation: Good design comes with consistency and coherence in component design and deployment. It simplifies administration and deployment. Components and subsystems get reused from scenario to scenario.

Messaging: A messaging infrastructure connects services so that they have flexibility and scalability. It is widely used and provides benefits such as retries, dead-letter queues, and ordered delivery, but it also brings challenges such as services not reacting to changes in message schema, the need for idempotent operations, and periods of inactivity when messages are not delivered.

The patterns used to resolve these kinds of problems include the following:

1. Ambassador: which creates helper services that send network requests on behalf of a consumer service or application.

2. Anti-corruption layer: which implements a façade between a modern application and a legacy system.

3. Asynchronous request-reply: where backend processing is decoupled from a frontend host because the backend needs time to complete, but the frontend still needs a clear response.

4. Backends for Frontends: where separate backend services provide the core functionality that each specific frontend requires.

5. Bulkhead: where elements of an application are isolated into pools so that if one fails, the others will continue

6. Cache-Aside: this loads data on demand into a cache from a data store.

7. Choreography: where each service decides when and how a business operation is processed and eliminates the need for an orchestrator

8. Circuit breaker: which handles faults that might take a variable amount of time to fix when connecting to remote resources.

9. Claim Check: which splits a large message into a claim check and a payload to avoid overwhelming a message bus.

10. Compensating Transaction: which undoes the work performed by a series of steps that together define an eventually consistent operation.
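
To make the Circuit breaker pattern from the list above concrete, here is a minimal sketch in Python. It assumes a simple policy: the circuit opens after a fixed number of consecutive failures, rejects calls while open, and lets a single trial call through once a timeout has elapsed. The class and exception names, the threshold, and the timeout are illustrative choices, not taken from any Azure library.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock          # injectable so tests can control time
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def call(self, operation):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: half-open, so one trial call is let through.
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # A success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Failing fast while the circuit is open spares the remote resource from further load and gives callers an immediate answer instead of a long timeout.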

Thursday, October 28, 2021

This is a continuation of the article on the change feed for Cosmos DB. Specifically, it discusses document versioning support. If we treat each item in Cosmos DB as a standalone document, then it is subject to the same versioning principles as a blob. Blob versioning automatically maintains previous versions of an object. When blob versioning is enabled, an earlier version of a blob can be restored to recover the data if it is accidentally deleted or modified. Edits are copy-on-write: each write operation creates a new version, so it is possible to revert to earlier versions. Similarly, an older version of the data can be promoted to create a new version. Each version has its own identifier. A blob can have only one current version at a time. Blob versions are immutable: the content or metadata of an existing version cannot be modified. A large number of versions for a blob tends to increase the latency of listing operations, so fewer than a thousand versions per blob is preferable. Old versions can be deleted automatically.

In the case of a blob, the version identifier is the timestamp at which the blob was updated. The version ID is assigned at the time that the version is created. Read or write operations can target a specific version given the version ID. If it is omitted, the current version is used instead. The x-ms-version-id header in HTTP responses holds this identifier.

Versioning is not enabled on a per-blob basis. It is set at the account level. Before versioning is enabled at the account level, a blob in that account does not have a version. When versioning is enabled, all write operations create a new version except for the Put Block operation.

When the delete operation is called, the current version becomes a previous version and there is no current version anymore. All the previously existing versions are preserved.
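
The versioning rules described so far (every write creates a new immutable version, a read without a version ID uses the current version, delete leaves no current version, and an older version can be promoted) can be summarized in a toy in-memory model. This sketch does not call Azure Storage. The class VersionedBlob is hypothetical, and a simple counter stands in for the timestamp-based version ID.

```python
import itertools


class VersionedBlob:
    """Toy in-memory model of one blob with versioning enabled."""

    def __init__(self):
        self._ids = itertools.count(1)  # stand-in for timestamp version IDs
        self.versions = {}              # version_id -> content, never edited
        self.current = None             # version_id of the current version

    def write(self, content):
        # Every write creates a new version and makes it current.
        version_id = next(self._ids)
        self.versions[version_id] = content
        self.current = version_id
        return version_id

    def read(self, version_id=None):
        # With no version ID, the current version is used.
        target = self.current if version_id is None else version_id
        if target is None:
            raise KeyError("blob has no current version")
        return self.versions[target]

    def delete(self):
        # The current version becomes a previous version; nothing is removed.
        self.current = None

    def promote(self, version_id):
        # Promoting an older version creates a new current version from it.
        return self.write(self.versions[version_id])
```

After delete(), read() without a version ID fails, but every earlier version is still readable by its ID and can be promoted back.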

Blob versioning can be enabled or disabled. When it is disabled, no new versions are subsequently created. Any existing versions remain accessible.

Blob versioning is frequently used with soft delete, which protects a blob, snapshot, or version from accidental deletes or overwrites by retaining the deleted data in the system for a specified period. During this retention period, a soft-deleted object can be restored to its state at the time it was deleted. After the retention period expires, the object is permanently deleted. Soft delete for blobs is enabled or disabled for the storage account. Attempting to delete a soft-deleted object does not affect its expiry time.

API support for versioning and soft delete is available only in later versions of the Azure Storage REST API, not in earlier ones. If a blob has snapshots, the blob cannot be deleted unless the snapshots are also deleted, and no new snapshots are created in the process. Soft-deleted objects are invisible unless they are explicitly requested when displaying or listing.

Wednesday, October 27, 2021

Versioning stored documents in Azure Cosmos DB

Introduction

The history of data is often as important as the data itself. For example, the finance, healthcare, and insurance industries often track the history of portions of their data for audit and reporting purposes. Cosmos DB forms the storage layer for many microservices in Azure. This article explains the ‘change-feed’ feature associated with this storage.

Description:

Cosmos DB exposes an API for the underlying log of changes to the documents in a collection. For users familiar with the SQL Server relational store, this is comparable to change data capture. The changes are recorded incrementally and can be distributed across one or more consumers for parallel processing, enabling a variety of applications. The change feed works for inserts, updates, and other forms of writes, but not deletions. Usually only the most recent change to an item is available; intermediate changes are not visible.

The change feed is not intended to solve every versioning requirement for data in Cosmos DB. That requires a Document Versioning Pattern, which involves the following:

1. Intent – This ensures that each entity in a collection, when updated, maintains a history of its changes.

2. Motivation – This tracks the history of entities throughout their lifecycle.

3. Applicability – This covers the usages such as auditing, reporting and analysis

4. Structure – In order to keep the state of the objects, every update must be turned into an append operation.

5. Participants - A materialized view is made possible with the change feed

6. Consequences – This should work for short and long histories. If it suffers performance degradation, it might not apply to all use cases for versioning.
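
The Structure item above, where every update is turned into an append, can be sketched with a small in-memory store. This is only an illustration of the idea, not the Cosmos DB SDK. The class VersionedCollection and the field names id and version are hypothetical.

```python
from collections import defaultdict


class VersionedCollection:
    """Toy append-only store: updates never overwrite earlier versions."""

    def __init__(self):
        self._history = defaultdict(list)  # entity id -> list of versions

    def upsert(self, entity_id, body):
        versions = self._history[entity_id]
        # An update becomes an append of a new, numbered version.
        document = {"id": entity_id, "version": len(versions) + 1, **body}
        versions.append(document)
        return document

    def latest(self, entity_id):
        # The most recent version plays the role of the current state.
        return self._history[entity_id][-1]

    def history(self, entity_id):
        # A copy of the full history, oldest version first.
        return list(self._history[entity_id])
```

Because the history of an entity grows with every update, a long-lived entity can accumulate many versions, which is the performance concern raised under Consequences.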

Because the change feed does not record deletes, a “soft marker” attribute can be set on an item through an update, and consumers can filter on that marker when processing items in the change feed. This makes it possible to capture deletes. Inserts and updates are recorded by the change feed automatically.
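
A consumer that honors the soft marker could look like the sketch below. It works on plain dictionaries rather than on the Cosmos DB SDK. The marker attribute name deleted and the function name apply_changes are assumptions made for illustration.

```python
def apply_changes(view, changes, marker="deleted"):
    """Fold a batch of change-feed items into a dict keyed by item id.

    Items carrying a truthy soft marker are treated as deletes.
    """
    for item in changes:
        if item.get(marker):
            view.pop(item["id"], None)   # soft-deleted: drop from the view
        else:
            view[item["id"]] = item      # insert or update: keep latest state
    return view
```

Applied to a batch in which item "b" is later updated with the marker set, the resulting view contains only the items that are still live.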

Change feed items come in the order of their modification time. This sort order is guaranteed per logical partition key.

In a multi-region Azure Cosmos DB account, failover of a write region is supported: the change feed works across the manual failover operation and remains contiguous.

Conclusion:

This approach captures data changes so that they can be used for auditing, reporting, and analysis.

Tuesday, October 26, 2021

This is a continuation of an article that describes operational considerations for hosting solutions on Azure public cloud.

This article focuses on support for containers and Kubernetes in Azure.

The compute requirements of a modern cloud app typically involve load-balanced compute nodes that operate together with control nodes and databases.

VM Scale sets provide scale, customization, availability, low cost and elasticity.

VM scale sets in Azure Resource Manager generally have a type and a capacity. App deployment allows VM extension updates just like OS updates.

Container infrastructure layering allows even more scale because it virtualizes the operating system. While traditional virtual machines provide hardware virtualization and Hyper-V provides isolation plus performance, containers are cheap and amount to little more than the applications they run.

Azure Container Service supports both Linux and Windows containers. It has standard Docker tooling and API support, with streamlined provisioning of DC/OS and Docker Swarm.

Azure is an open cloud because it supports open-source infrastructure tools such as Linux, Ubuntu, and Docker; databases and middleware such as Hadoop, Redis, and MySQL; app frameworks and tools such as Node.js, Java, and Python; applications such as Joomla and Drupal; management applications such as Chef and Puppet; and DevOps tools such as Jenkins, Gradle, and Xamarin.

Job-based computations use larger sets of resources, such as compute pools that provide automatic scaling and regional coverage, with automatic recovery of failed tasks and input/output handling.

Azure applications often involve many fine-grained, loosely coupled microservices such as an HTTP listener, page content, authentication, usage analytics, order management, reporting, product inventory, and customer databases.

Efficient Docker image deployment for scenarios with intermittent, low-bandwidth connectivity requires eliminating the docker pull of images. An alternative deployment mechanism can compensate for these restrictions by using an Azure Container Registry, signature files, a file share, and an IoT Hub for pushing the manifest to devices. The deployment path involves pushing the containerized image to the device. The devices can send back messages, which are collected in a device-image register. An image is a collection of layers, where each layer represents a set of file-system differences and is stored simply as folders and files. A SQL database can be used to track the state of what is occurring on the target devices and in the Azure-based deployment services, which helps both during and after the deployment process.
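
Since an image is a collection of layers, one way to save bandwidth is to send a device only the layers it does not already hold. The sketch below assumes that layers are identified by digest strings and that the device-image register records which digests each device has. Both are assumptions for illustration, and the function name layers_to_send is hypothetical.

```python
def layers_to_send(image_layers, device_layers):
    """Return the image layers the device lacks, in image order."""
    present = set(device_layers)
    return [layer for layer in image_layers if layer not in present]
```

For an image update that changes only the top application layer, the base and runtime layers would not need to be transferred again.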

Resource groups are created to group resources that share the same lifecycle. They have no bearing on the cost management of resources other than to help with querying. They can be used with tags to narrow down the resources of interest. Metadata about the resources is stored in a particular region. Resources can be moved from one resource group to another, or even to another subscription. Finally, resource groups can be locked to prevent actions such as delete or write by users who have access.

As in many deployments, Azure Monitor provides considerable insight into the operations of Azure resources. It is recommended to create multiple Application Insights resources, usually one per environment. This results in better separation of telemetry, alerts, work items, configurations, and permissions. Limits such as web test count, throttling, and data allowance are spread across the resources, and it also helps with cross-resource queries.

Monday, October 25, 2021

This is a continuation of an article that describes operational considerations for hosting solutions on Azure public cloud.

Mistakes on multiple-choice questions in certification examinations are costly, because the questions go beyond cursory knowledge of Azure resources. We recap just a few of the storage-related questions from a recent test.

1. The storage-based questions are somewhat easier to answer because they apply to a lot of common use cases. Some attention to the limits imposed on different types of storage, their access policies, tiers, and retention periods will go a long way toward getting the answers right. Familiarity with the hot, cool, and archive tiers is tested through their use cases. Access control policy enforcement and cost management apply just as much as they do for all Azure resources. Redundancy and availability are special considerations. Geo-replication is a hot topic.

2. Hot, cool, and archive access tiers for blob data are optimized for access patterns. The hot tier has the highest storage cost but the lowest access cost. The cool tier stores data for a minimum of 30 days and the archive tier for a minimum of 180 days. The archive tier is an offline tier for storing data, with data rehydration available at standard or high priority. A blob in the hot tier can be set to the cool or archive tier, and a blob in the cool tier can be set to the archive tier. If a blob is moved from the archive tier to the hot tier, it may be moved back to the archive tier by the lifecycle management engine when a policy rule still matches it. End-to-end latency and server latency metrics are both available for block blobs.
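
The minimum storage durations matter because a blob that leaves the cool or archive tier early is still billed for the remaining days. A small sketch of that arithmetic, using the 30-day and 180-day minimums stated above, follows. The function and constant names are hypothetical, and no prices are modeled.

```python
# Minimum storage duration per tier, in days, as stated in the text above.
MINIMUM_DAYS = {"hot": 0, "cool": 30, "archive": 180}


def early_deletion_days(tier, days_stored):
    """Days still billable if a blob leaves `tier` after `days_stored` days."""
    return max(0, MINIMUM_DAYS[tier] - days_stored)
```

For example, a blob deleted from the archive tier after 45 days still has 135 billable days, while a blob that has been in the cool tier for 30 days or more has none.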

3. Azure Storage events allow applications to react to events such as the creation and deletion of blobs. They are pushed through Azure Event Grid to subscribers such as Azure Functions, Azure Logic Apps, or even a custom HTTP listener. The Blob storage events schema defines Microsoft.Storage.BlobCreated, BlobDeleted, BlobTierChanged, and AsyncOperationInitiated.
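
A subscriber typically inspects the event type and dispatches to the matching handler. The sketch below shows that dispatch on a plain dictionary shaped like an Event Grid event, with eventType, subject, and data fields. It does not use the Azure SDK. The handler names and the sample account, container, and blob names are made up for illustration.

```python
def handle_storage_event(event, handlers):
    """Route one Event Grid event to the handler registered for its type."""
    handler = handlers.get(event["eventType"])
    if handler is None:
        return None  # no handler registered for this event type
    return handler(event["subject"], event.get("data", {}))


def on_blob_created(subject, data):
    return "created: " + data["url"]


def on_blob_deleted(subject, data):
    return "deleted: " + data["url"]


HANDLERS = {
    "Microsoft.Storage.BlobCreated": on_blob_created,
    "Microsoft.Storage.BlobDeleted": on_blob_deleted,
}

# Hypothetical event; the account, container, and blob names are made up.
sample_event = {
    "eventType": "Microsoft.Storage.BlobCreated",
    "subject": "/blobServices/default/containers/images/blobs/cat.png",
    "data": {"url": "https://example.blob.core.windows.net/images/cat.png"},
}
```

Event types without a registered handler are ignored, so new event types can be added to the subscription without breaking the subscriber.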

4. The Network File System (NFS) 3.0 protocol is supported in Azure Blob Storage. Mounting a storage account container involves creating an Azure Virtual Network (VNet) and configuring network security to allow traffic to and from the storage account container via the VNet. The Azurite open-source emulator can be used for local development.

5. Azure (global) supports general-purpose v1, general-purpose v2, and Blob storage accounts, while Azure Stack Hub supports general-purpose v1 only. V2 is preferred because it provides blob, queue, file, and table storage with LRS, GRS, and RA-GRS redundancy options.

6. Storage costs are based on the amount of data stored (which depends on the access tier), data access, transactions, geo-replication data transfer, outbound data transfer, and changes to the storage access tier. The primary access pattern for blob storage, in terms of the relative volume of reads and writes, determines the cost savings. All storage accesses can be monitored, and emitted metrics include capacity costs, transaction costs, and data transfer costs.