Cluster computing

Tuesday, November 2, 2021

This article is a continuation of a series of articles on Azure Cloud architecture, design patterns, technology choices and migration guidance. Specifically, it discusses the Cloud Adoption Framework.

The Cloud Adoption Framework brings these together for the cloud adoption journey. It involves:

1. Getting Started: A good starting point is an existing scenario for which getting-started guides have already been published. For example, an organization trying to accomplish a certain goal might want to choose the cloud adoption scenario that best supports its strategy, examine antipatterns across methodologies and their solutions, align foundational concepts to onboard a person, project, or team, adopt the cloud to deliver business and technical outcomes sooner, improve controls to ensure proper operation of the cloud, or establish teams to support adoption and operations. Specific cloud adoption scenarios include hybrid and multi-cloud adoption, the modern application platform scenario, the SAP adoption scenario, desktop virtualization, and cloud adoption for the retail industry. Cloud adoption requires technical change, but it is never restricted to IT. Other teams might want to migrate existing workloads to the cloud, build new products and services in the cloud, or be unblocked by environment design and configuration. A solid operating model improves controls. Azure Advisor fits this role.

2. Strategy: A cloud adoption strategy explains the adoption to cloud technicians and makes the case to business stakeholders. This is done efficiently only when it encompasses the following: a. the motivations are defined and documented, b. the business outcomes are documented, c. the financial considerations are evaluated, and d. the technical considerations are understood. A published strategy-and-plan-template builds out the cloud adoption strategy and helps to capture the output of each of these steps.

3. Plan: The cloud adoption framework must come with a plan that translates the goals from the strategy document into something that can be executed in the field. The collective cloud teams must put together Specific, Measurable, Achievable, Relevant, and Time-bound (SMART) action items that capture the prioritized tasks to drive adoption efforts and map to the metrics and motivations defined in the cloud adoption framework. This includes a. the digital estate, b. initial organizational alignment, c. a skills readiness plan, and d. the cloud adoption plan.

4. Readiness: Before a plan is enacted, some preparation is required. In this case, the following exercises guide us through the process of creating a landing zone to support cloud adoption. These include: a setup guide that familiarizes the team with the tools and processes involved; a landing zone that establishes a code-based starting point for the infrastructure and environment; its subsequent expansion to meet the platform requirements specified in the plan; and finally, best practices against which landing zone modifications are validated to ensure sound configurations.

Monday, November 1, 2021

This is a continuation of the article introduced here: https://1drv.ms/w/s!Ashlm-Nw-wnWhKdsT81vgXlVl38AfA?e=qzxyuJ. Specifically, it discusses the technology choices from the Application Architecture guide for Azure Public Cloud.

1. Choosing a candidate service:

a. If full control of the compute is required, a virtual machine or a virtual machine scale set is appropriate.

b. If it has an HPC workload, Azure Batch is helpful.

c. If it has a microservice architecture, Azure App Service is a candidate.

d. If it has an event-driven workload with short-lived processes, Azure Functions suits the task.

e. If containers are needed but full-fledged orchestration is not required, Azure Container Instances are helpful.

f. If a managed service is needed with the .NET Framework, Azure Service Fabric is helpful.

g. If Spring Boot applications are required, Azure Spring Cloud is helpful.

h. If Red Hat OpenShift is required, Azure Red Hat OpenShift is dedicated to this purpose.

i. If a managed infrastructure is required, Azure Kubernetes Service does the job.
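A few of the pairings above can be sketched as a simple lookup. This is only an illustration of the decision process: the trait names (`full_control`, `hpc`, and so on) and the `candidate_services` helper are hypothetical and are not part of any Azure API.

```python
# Hypothetical trait names mapped to the candidate services named in the list.
# Only a subset of the list is covered; the order follows the list.
CANDIDATES = [
    ("full_control", "Virtual Machines or Virtual Machine Scale Sets"),
    ("hpc", "Azure Batch"),
    ("event_driven", "Azure Functions"),
    ("spring_boot", "Azure Spring Cloud"),
    ("openshift", "Azure Red Hat OpenShift"),
]


def candidate_services(traits):
    """Return every candidate whose trait is present, in list order."""
    return [service for trait, service in CANDIDATES if trait in traits]
```

For example, `candidate_services({"hpc"})` yields only Azure Batch, while a workload with several traits yields several candidates to compare further.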

There are two ways to migrate on-premises compute to the cloud:

The first involves the ‘lift and shift’ pattern, which is a strategy for migrating a workload to the cloud without redesigning it and is also called the ‘rehosting’ pattern.

The second involves refactoring an application to take advantage of the cloud native features.

2. A microservices architecture has further options for deployment. There are two approaches: the first involves a service orchestrator that manages services running on dedicated nodes (VMs), and the second involves a serverless architecture using Functions-as-a-Service (FaaS). When microservices are deployed as binary executables, also known as Reliable Services, the Reliable Services programming model uses the Service Fabric programming APIs to query the system, report health, receive notifications on configuration and code changes, and discover other services. This is particularly advantageous for building stateful services using so-called Reliable Collections. AKS and Mesosphere provide alternative infrastructures for deployment.

3. The Kubernetes at the edge compute option is excellent for keeping operational costs low, easily configuring and deploying a cluster, gaining flexibility with existing infrastructure at the edge, and running a mixed-node cluster with both Linux and Windows nodes. Of the three options for Kubernetes at the edge (bare metal, Kubernetes on Azure Stack Edge, and AKS on HCI), the last is the easiest to work with.

4. Choosing an identity service involves comparing the options of self-managed Active Directory services, Azure Active Directory, and managed Azure Active Directory Domain Services. Azure Active Directory does away with running one’s own directory services and leverages the cloud-provided one instead.

5. Choosing a data store is not always easy. The term polyglot persistence is used to describe solutions that use a mix of data store technologies. Therefore, it is important to understand the main storage models and their tradeoffs.

6. Choosing an analytical data store is not always about Big Data or lambda architectures, with their speed layer for incremental data processing, serving layer, and batch processing layer, although both require one. The driving force varies on a case-by-case basis.

7. Similarly, AI/ML services can be leveraged directly from the Azure Cognitive Services portfolio, but their applicability varies on a case-by-case basis as well.

Sunday, October 31, 2021

This is a continuation of the article introduced here: https://1drv.ms/w/s!Ashlm-Nw-wnWhKdmXP_0_c--oqotlA?e=KDRRVf. It describes design patterns to consider when hosting solutions on the Azure public cloud.

1. Queue-based load leveling – This uses a queue that acts as a buffer between a task and a service that it calls so that intermittent heavy workloads can be staged and smoothly processed.

2. Retry - Enable an application to handle anticipated, temporary failures when it tries to connect to a service or network resource by transparently retrying an operation that's previously failed.

3. Scheduler agent supervisor - Coordinate a set of actions across a distributed set of services and other remote resources.

4. Sequential convoy - Process a set of related messages in a defined order, without blocking processing of other groups of messages.

5. Sharding - Divide a data store into a set of horizontal partitions or shards.

6. Sidecar - Deploy components of an application into a separate process or container to provide isolation and encapsulation.

7. Static content hosting - Deploy static content to a cloud-based storage service that can deliver them directly to the client.

8. Strangler fig - Incrementally migrate a legacy system by gradually replacing specific pieces of functionality with new applications and services.

9. Throttling - Control the consumption of resources used by an instance of an application, an individual tenant, or an entire service.

10. Valet Key - Use a token or key that provides clients with restricted direct access to a specific resource or service.

11. Minimal design – Use only what is necessary and no more.
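As an illustration of the Retry pattern from the list above, here is a minimal sketch with exponential backoff. The `TransientError` class, the `retry` helper, and its parameters are assumptions made for this example; a real application would catch whatever exception its client library raises for temporary failures.

```python
import time


class TransientError(Exception):
    """Stands in for a temporary failure such as a timeout (hypothetical)."""


def retry(operation, attempts=3, base_delay=0.01, sleep=time.sleep):
    """Call operation(); on TransientError wait and try again, up to `attempts` times."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == attempts:
                raise  # out of retries: surface the failure to the caller
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

The `sleep` parameter is injectable so that the waiting behavior can be replaced in tests. Only failures that are expected to be temporary are retried; any other exception propagates immediately.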

The top ten design principles cited in the documentation for Azure services are:

1. Designing for self-healing when failures occur

2. Making all things redundant so that there is no single point of failure.

3. Minimizing co-ordination between application services to achieve scalability.

4. Designing to scale out horizontally, adding or removing instances as demand changes.

5. Partitioning around limits such that it works around database, network or compute limits.

6. Designing for operations – so that operations teams have the tools they need.

7. Using managed services – to leverage Platform-as-a-Service rather than Infrastructure-as-a-Service.

8. Using the best data store for the job so that the data fits

9. Designing for evolution so that application changes are easy

10. Building for the needs of the business, where every decision is driven by business requirements.

Saturday, October 30, 2021

1. Competing Consumers – This enables multiple concurrent consumers to process messages received on the same messaging channel. It addresses specific challenges in the messaging category.

2. Compute Resource Consolidation - This consolidates multiple tasks or operations into a single computational unit. It is widely applicable to many implementations.

3. CQRS – This segregates operations that read data from those that update data by directing them to different interfaces. This helps with performance and efficiency.

4. Deployment stamps – These deploy multiple independent copies of application components, including data stores.

5. Event sourcing – This uses an append only store to record the full series of events that describe the actions taken on data in a specific domain.

6. External configuration store – This moves configuration information out of the application deployment package to a centralized location.

7. Federated Identity – This delegates authentication to an external identity provider.

8. Gatekeeper – This protects applications and services by using a dedicated host instance that sits between clients and the application or service. The host brokers services to a client, validates and sanitizes requests, and passes requests and data between them.

9. Gateway Aggregation - This uses a gateway to aggregate multiple individual requests into a single request.

10. Gateway offloading – This offloads shared or specialized service functionality to a gateway proxy.

11. Gateway Routing – This routes requests to multiple services using a single endpoint.

12. Geodes – This deploys backend services into a set of geographical nodes. Each node can service any client request in any region.

13. Health Endpoint Monitoring – This is crucial for external tools to continually monitor the health of a service through an exposed endpoint.

14. Index table – This creates indexes over the fields in the data stores that queries frequently reference.

15. Leader election – This coordinates the actions performed by a collection of collaborating task instances in a distributed application by electing one instance as the leader that assumes responsibility for managing the other instances.

16. Materialized view – This generates prepopulated views over the data in one or more data stores so that the query for the view does not need to be run again and the view is available to subsequent queries.

17. Pipes and filters – This separates the execution into a series of stages. Those stages become reusable.

18. Priority queue – This prioritizes requests sent to services so that requests with a higher priority are received and processed before others.

19. Publisher/Subscriber - This enables an application to produce events for interested consumers without coupling the producer to the consumers. It also allows fan-out of events to multiple parties.

20. Queue-based load leveling – This uses a queue that acts as a buffer between a task and a service that it calls so that intermittent heavy workloads can be staged and smoothly processed.
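The Priority queue pattern from the list above can be sketched in memory with the standard library. The `PriorityQueue` class and the convention that a lower number means a higher priority are assumptions of this example; a hosted messaging service would provide its own mechanism.

```python
import heapq
import itertools


class PriorityQueue:
    """In-memory sketch: requests with higher priority are received first."""

    def __init__(self):
        self._heap = []
        # Tie-breaker that keeps first-in, first-out order within one priority.
        self._order = itertools.count()

    def send(self, priority, request):
        # Lower number means higher priority (an assumption of this sketch).
        heapq.heappush(self._heap, (priority, next(self._order), request))

    def receive(self):
        _priority, _order, request = heapq.heappop(self._heap)
        return request
```

The counter avoids comparing the requests themselves when two entries share a priority, so requests of any type can be queued.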

https://1drv.ms/w/s!Ashlm-Nw-wnWhKdun2-8IUJ19Or-Og

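Sample Python implementation:

#!/usr/bin/python
# Sketch of root-cause analysis by clustering API failures. The helpers
# (cluster_centroids_of_top_clusters, classify, gen_proposals,
# get_cluster_proposal, api_from,
# clusters_greater_than_goodness_of_fit_weighted_size) and the FULL marker
# are assumed to be defined elsewhere.

def determining_root_cause_for_api(failures):
    return cluster_centroids_of_top_clusters(failures)

def batch_cluster_repeated_pass(api, failures, threshold):
    # threshold is an assumed parameter, passed through to select_top_clusters.
    # Start with the cluster formed from the full set of failures for the API.
    api_cluster = classify(api, failures)
    proposals = gen_proposals(api_cluster)
    clusters = [(FULL, api_cluster)]
    # Add one candidate cluster per proposal.
    for proposal in proposals:
        cluster = get_cluster_proposal(proposal, api, failures)
        clusters.append((proposal, cluster))
    selections = select_top_clusters(threshold, clusters)
    return api_from(selections)

def select_top_clusters(threshold, clusters, strategy="goodness_of_fit"):
    # Only the goodness-of-fit strategy is sketched here.
    return clusters_greater_than_goodness_of_fit_weighted_size(threshold, clusters)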

Friday, October 29, 2021

This is a continuation of an article that describes operational considerations for hosting solutions on Azure public cloud.

These are some of the cloud design patterns that are useful for building reliable, scalable, secure applications in the cloud. Some of the challenges that these design patterns address include:

Data Management: This is a key element of the cloud application. Data is hosted in different locations and across multiple servers for reasons such as performance, scalability or availability, and this can present a range of challenges. For example, data consistency must be maintained and data will typically need to be synchronized or replicated across different locations.

Design and Implementation: Good design comes with consistency and coherence in component design and deployment. It simplifies administration and deployment. Components and subsystems get reused from scenario to scenario.

Messaging: A messaging infrastructure connects services so that they have flexibility and scalability. It is widely used and provides benefits such as retries, dead-letter queues, and ordered delivery, but it also brings challenges such as services not reacting to changes in message schema, idempotency of operations, and periods of inactivity when messages are not delivered.

The patterns used to resolve these kinds of problems include the following:

1. Ambassador: creates helper services that send network requests on behalf of a consumer service or application.

2. Anti-corruption layer that implements a façade between a modern application and a legacy system.

3. Asynchronous request-reply: where the backend processing is decoupled from a frontend host because it needs time to complete, but the frontend still needs a clear response.

4. Backends for Frontends: where microservices provide the core functionalities that the frontend requires

5. Bulkhead: where elements of an application are isolated into pools so that if one fails, the others will continue

6. Cache-Aside: this loads data on demand into a cache from a data store.

7. Choreography: where each service decides when and how a business operation is processed and eliminates the need for an orchestrator

8. Circuit breaker: which handles faults that might take a variable amount of time to fix when connecting to remote resources.

9. Claim Check: which splits a large message into a claim check and a payload to avoid overwhelming a message bus.

10. Compensating Transaction: which undoes the work performed by a series of steps that together define an eventually consistent operation.
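To make one of these patterns concrete, the following is a minimal sketch of a circuit breaker. The class names, the failure threshold, and the reset timeout are all assumptions chosen for illustration; production libraries typically add richer state handling and per-exception policies.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying it."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, operation):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit is open; failing fast")
            # Timeout elapsed: half-open, so one trial call is allowed through.
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip (or re-trip) the breaker
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

While the circuit is open, callers fail immediately instead of waiting on a remote resource that is unlikely to respond, which gives that resource time to recover.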

Thursday, October 28, 2021

This is a continuation of the article on the change feed for Cosmos DB. Specifically, it discusses document versioning support. If we treat each item in Cosmos DB as a standalone document, then it is subject to the same versioning principles as a blob. Blob versioning automatically maintains previous versions of an object. When blob versioning is enabled, an earlier version of a blob can be restored to recover the data if it is accidentally deleted or modified. If a blob is edited, it is copied on write, so a new version is created. Since each write operation creates a new version, it is possible to revert to earlier versions. Similarly, an older version of the data can be promoted to create a new version. Each version has its own identifier. A blob can have only one current version at a time. Blob versions are immutable: the content or metadata of an existing version cannot be modified. A large number of versions for a blob tends to increase the latency of listing operations, so fewer than a thousand versions per blob is preferable. Old versions can be deleted automatically.

In the case of a blob, the version identifier is the timestamp at which the blob was updated. The version ID is assigned at the time the version is created. Read or write operations can target a specific version by supplying its version ID; if it is omitted, the current version is used. The x-ms-version-id header in HTTP responses holds this identifier.

Versioning is not enabled on a per-blob basis; it is set at the account level. Before versioning is enabled on the account, a blob in that account does not have a version. Once versioning is enabled, all write operations create a new version except for the Put Block operation.

When the delete operation is called, the current version becomes a previous version and there is no current version anymore. All the previously existing versions are preserved.
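The versioning rules described so far can be illustrated with a small in-memory model. This is a toy sketch and not the Azure Storage SDK: the `VersionedBlob` class and its methods are hypothetical, and a simple counter stands in for the timestamp-based version ID.

```python
import itertools


class VersionedBlob:
    """In-memory toy model of the versioning rules above (not the Azure SDK)."""

    _ids = itertools.count(1)  # stand-in for the timestamp-based version ID

    def __init__(self):
        self.versions = {}   # version_id -> content; entries are never modified
        self.current = None  # version_id of the current version, if any

    def write(self, content):
        # Every write creates a new version, which becomes the current one.
        version_id = next(self._ids)
        self.versions[version_id] = content
        self.current = version_id
        return version_id

    def read(self, version_id=None):
        # Omitting the version ID targets the current version.
        target = self.current if version_id is None else version_id
        if target is None:
            raise KeyError("blob has no current version")
        return self.versions[target]

    def delete(self):
        # The current version becomes a previous version; nothing is discarded.
        self.current = None

    def promote(self, version_id):
        # Promoting an old version creates a new current version with its content.
        return self.write(self.versions[version_id])
```

After `delete()`, every earlier version is still readable by its ID, and `promote()` restores the blob by writing a new current version rather than by mutating an old one.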

Blob versioning can be enabled or disabled. When it is disabled, no new versions are subsequently created. Any existing versions remain accessible.

Blob versioning is frequently used with soft delete, which protects a blob, snapshot, or version from accidental deletes or overwrites by maintaining the deleted data in the system for a specified period. During this time, a soft-deleted object can be restored to its state at the time the delete was issued. After the expiry of the retention period, the object is permanently deleted. This can be enabled or disabled at the container level. Attempting to delete a soft-deleted object does not affect its expiry time.

API support for versioning and soft delete is available only in later versions of the API, not in earlier ones. If a blob has snapshots, the blob cannot be deleted without also deleting its snapshots. No new snapshots are created. Soft-deleted objects are invisible unless they are explicitly requested when displaying or listing.

Wednesday, October 27, 2021

Versioning stored documents in Azure Cosmos DB

Introduction

The history of data is often as important as the data itself. For example, the finance, healthcare, and insurance industries often track histories of portions of their data for audit and reporting purposes. Cosmos DB forms the storage layer for many microservices in Azure. This article explains the ‘change-feed’ feature associated with this storage.

Description:

Cosmos DB exposes an API for the underlying log of changes to the documents in its collections. For users familiar with the SQL Server relational store, this is the equivalent of change data capture. The changes are recorded incrementally and can be distributed across one or more consumers for parallel processing, enabling a variety of applications. The change feed works for updates and other forms of writes but not deletions. Usually only the most recent change is available. Intermediate changes are not visible.

The change feed is not targeted at solving all the versioning requirements from the CosmosDB store. That requires a Document Versioning Pattern which involves the following:

1. Intent – This ensures that each entity in a collection, when updated, maintains the history of its changes.

2. Motivation – This tracks the history of entities throughout their lifecycle

3. Applicability – This covers the usages such as auditing, reporting and analysis

4. Structure – In order to keep the state of the objects, every update must be turned into an append operation.

5. Participants - A materialized view is made possible with the change feed

6. Consequences – This should work for short and long histories. If it suffers performance degradation, it might not apply to all use cases for versioning.
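The Structure item above, where every update becomes an append, can be sketched as follows. The `VersionedCollection` class is a hypothetical in-memory stand-in for a collection; it does not use the Cosmos DB SDK and only shows the shape of the pattern.

```python
class VersionedCollection:
    """Toy append-only store: each update appends a new version of the entity."""

    def __init__(self):
        self._log = []  # (entity_id, version, document) tuples in write order

    def upsert(self, entity_id, document):
        # An update never overwrites: it appends the next version number.
        version = len(self.history(entity_id)) + 1
        self._log.append((entity_id, version, dict(document)))
        return version

    def history(self, entity_id):
        """Return (version, document) pairs for one entity, oldest first."""
        return [(v, doc) for eid, v, doc in self._log if eid == entity_id]

    def latest(self, entity_id):
        """Return the most recent document for the entity."""
        return self.history(entity_id)[-1][1]
```

Scanning the whole log on each call is fine for a sketch but would be the kind of performance cost that the Consequences item warns about for long histories.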

The change feed allows the use of a “soft marker” on items: the writer sets the marker as part of an update, and the consumer filters on it when processing items in the change feed. This enables the recording of deletes, since deletes are not captured by the change feed. Inserts and updates are recorded by the change feed automatically.
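A consumer that honors such a soft marker might look like the sketch below. The `deleted` field name and the `apply_changes` helper are assumptions for illustration; the change-feed items are modeled as plain dictionaries rather than read through the Cosmos DB SDK.

```python
def apply_changes(changes, view):
    """Apply change-feed items to a materialized view, honoring a soft-delete marker.

    `changes` is an iterable of dicts with an "id" key. The "deleted" flag is the
    hypothetical soft marker that the writer sets instead of issuing a real delete.
    """
    for item in changes:
        if item.get("deleted"):
            view.pop(item["id"], None)  # record the delete downstream
        else:
            view[item["id"]] = item     # insert or update
    return view
```

Because the marker arrives as an ordinary update, it flows through the change feed like any other write, and the consumer decides how to treat it.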

Change feed items come in the order of their modification time. This sort order is guaranteed per logical partition key.

In a multi-region Azure Cosmos DB account, failover of a write region is supported: the change feed works across the manual failover operation and remains contiguous.

Conclusion:

This approach captures data changes for auditing, reporting and analysis.