Thursday, November 4, 2021

Planning for onboarding a financial calculator service to Azure public cloud:

Problem statement: This article leverages an Azure industry cloud example to onboard a FinTech service to the Azure public cloud. While there are many examples from the Azure industry clouds and verticals, this article is specifically about onboarding a financial calculator service that acts as a broker between external producers and consumers. With this case study, we attempt a dry run of the principles learned from the Azure Cloud Adoption Framework and carry out the migration in the most efficient, reliable, available, and cost-effective manner.

Article: Onboarding a service such as a financial calculator in the Azure public cloud is all about improving its deployment: using the proper subscription, planning for capacity and demand, optimizing the Azure resources, monitoring the service health, setting up management groups and access control, addressing the security and privacy of the service, and configuring the pricing controls and support options. We look at these in more detail now.

Proper subscription: Many of the rate limits, quotas, and service availability levels are quite sufficient in the very first subscription tier. The Azure management console has a specific set of options to determine the scale required for the service.

Resources and resource groups: The allocation of a resource group, identity, and access control is certainly a requirement for onboarding a service. It is equally important to use the pricing calculator and the TCO calculator in the Azure public cloud to determine the costs. Some back-of-the-envelope calculations in terms of bytes per request, requests per second, latency, recovery time, recovery point, MTTR, and MTBF help with determining the requirements and the resource management, as sketched below.
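As an illustration, a minimal back-of-the-envelope sketch in Python follows; all the traffic figures are assumptions chosen for the example, not measurements from a real workload:

#! /usr/bin/python
# Back-of-the-envelope capacity estimate; every input is an assumed figure.
requests_per_second = 500          # expected peak request rate
bytes_per_request = 2 * 1024       # average request payload size
latency_seconds = 0.2              # target per-request latency
requests_per_instance = 100        # assumed capacity of a single instance

bandwidth = requests_per_second * bytes_per_request           # bytes per second
in_flight = requests_per_second * latency_seconds             # Little's law: L = lambda * W
instances = -(-requests_per_second // requests_per_instance)  # ceiling division

print("Bandwidth: %.0f KiB/s" % (bandwidth / 1024.0))
print("Concurrent requests in flight: %.0f" % in_flight)
print("Instances needed at peak: %d" % instances)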

Optimizing the Azure resources: Much of this is automated. If we are deploying a Python Django application and a Node.js frontend application, then it is important to make use of the API gateway, load balancer, proxy and scalability options, certificates, domain name resources, and so on. The use of resources specific to the service, as well as those that enhance its abilities, must be methodically checked off against the checklist that one can draw up from the Azure management portal.

Monitoring the service health: Metrics specific to the financial calculator service, such as the size of the data processed, the mode of delivery, the number of requests submitted to the system, and the load on the service in terms of its distribution statistics, will help determine whether the service requires additional resources or whether something has gone wrong. Alerts can be set up for thresholds so that we can remain hands-off until an alert fires, as sketched below.
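A minimal sketch of such a threshold-based alert follows; get_request_rate and alert are hypothetical helpers (in practice they would query Azure Monitor and notify an action group), and the threshold is an assumed value:

#! /usr/bin/python
import time

THRESHOLD_RPS = 1000   # assumed alert threshold for the calculator service

def get_request_rate():
    # Hypothetical helper; in practice this would query Azure Monitor for
    # the service's requests-per-second metric.
    return 0

def alert(message):
    # Hypothetical sink: an email, a webhook, or an Azure Monitor action group.
    print(message)

def watch(poll_interval_seconds=60):
    # Poll the metric and stay quiet until the threshold is crossed.
    while True:
        rate = get_request_rate()
        if rate > THRESHOLD_RPS:
            alert("Request rate %s exceeded threshold %s" % (rate, THRESHOLD_RPS))
        time.sleep(poll_interval_seconds)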

Management group, identity, and access control: Even if there is only one person in charge of the service, setting up a management group, users, and access control formalizes the role and detaches it from that person, so that anyone else can take on the administrator role. This option also helps set up registrations and notifications to that account so that it is easier to pass the responsibility around.

Security and privacy: The financial calculator service happens to be a stateless, transparent financial proxy that does not retain any data from the customer, so it needs few additional actions toward security and privacy. Setting up TLS on the service and using proper certificates along with domain names will help keep it independent of any specific compute resources.

Advisor: Azure has an Advisor capability that suggests efficiencies possible with the deployment after the above-mentioned steps have been taken. This helps in streamlining the operations and reducing cost.

Conclusion: Proper service onboarding is critical to the functioning of the service, both in terms of its costs and its benefits. When the public cloud knowledge center articles are followed meticulously for the use of the Azure management portal in the deployment of the service, the service is well positioned to improve its return on investment.

Wednesday, November 3, 2021

 This is a continuation from the previous post.

1.       Adoption:  The cloud adoption plan at enterprise scale warrants significant investment in the creation of new business logic. A migration plan moves those workloads to the cloud with one of three approaches: lift and shift, lift and optimize, or modernize. The migration scenarios, best practices, and process improvements are well covered in the published literature. Another area of emphasis is innovation. Unlike migration, this can provide the greatest business value by unlocking new technical skills and expanded business capabilities.

2.       Govern: Governance in the Microsoft Cloud Adoption Framework for Azure is an iterative process. As cloud estates change over time, so do the cloud governance processes and policies, especially for an organization that has been heavily invested in on-premises infrastructure. With the ability to create an entire virtual data center with a few lines of code, the paradigm shifts left in favor of automation. The governance benchmark tool goes a long way toward realizing this vision.

3.       Manage: Cloud management delivers on the strategy through planning, readiness, and adoption, and drives the digital assets toward tangible business outcomes. This form of management requires articulated business commitments, a management baseline, its subsequent expansion, and advanced operations and design principles.

4.       Secure:  Security in the Microsoft Cloud Adoption Framework is a journey. It involves incremental progress and maturity and does not have a static destination. Its end state can be envisioned, and this provides guidance for periodic assessments, alignment, and definite results. Organizations such as NIST, The Open Group, and the Center for Internet Security publish standards to this effect. Establishing security roles and responsibilities helps with resourcing this deeply technical discipline.

5.       Organize: Cloud adoption cannot happen without well-organized people. Successful adoption is the result of properly skilled people doing the appropriate types of work. The approach to establish and maintain the proper organizational structure involves a. defining the type of organizational structure, b. understanding the cloud functions needed to adopt and operate the cloud, c. defining the teams that can provide the various cloud functions, and d. coming up with a Responsible, Accountable, Consulted, and Informed (RACI) matrix.

6.       Resources: The public cloud comes with several tools and templates for each of the above stages, such as the cloud journey tracker, strategy and plan template, readiness checklist, governance benchmark assessment, migration discovery checklist, solution accelerators, operations management workbook, RACI diagram, and others.

Conclusion:  The cloud adoption framework lays the roadmap for a successful cloud adoption journey.

Tuesday, November 2, 2021

This article is a continuation of a series of articles on Azure Cloud architecture, design patterns, technology choices, and migration guidance. Specifically, it discusses the Cloud Adoption Framework.

The Cloud Adoption Framework brings together the guidance for the cloud adoption journey. It involves:

1.       Getting Started:  A great starting point aligns with an existing scenario for which the starting guides have already been published.  For example, an organization trying to accomplish a certain goal might want to choose the cloud adoption scenario that best supports their strategy, examine antipatterns across methodologies and their solutions, align foundational concepts to onboard a person, project, or team, adopt the cloud to deliver business and technical outcomes sooner, improve controls to ensure proper operations of the cloud or establish teams to support adoption and operations. Specific cloud adoption scenarios include hybrid and multi-cloud adoptions, modern application platform scenario, SAP adoption scenario, desktop virtualization, and cloud adoption for the retail industry. Cloud adoption requires technical change, but it is never restricted to IT. Other teams might want to migrate existing workloads to the cloud, build new products and services in the cloud, or might want to be unblocked by environment design and configuration. A solid operating model improves controls. The Azure Advisor fits this role.

2.       Strategy: A cloud adoption strategy describes the business justification to the cloud technicians and makes a case to the business stakeholders. This is done efficiently only when it encompasses the following: a. the motivations are defined and documented, b. the business outcomes are documented, c. the financial considerations are evaluated, and d. the technical considerations are understood. There is a published strategy-and-plan template that builds out the cloud adoption strategy, and it helps to capture the output of each of these steps.

3.       Plan: The cloud adoption framework must come with a plan that translates the goals from the strategy document into something that can be executed in the field. The collective cloud teams must put together Specific, Measurable, Achievable, Relevant, and Time-bound (SMART) action items that capture the prioritized tasks to drive adoption efforts and map to the metrics and motivations defined in the strategy. This includes a. the digital estate, b. initial organizational alignment, c. a skills readiness plan, and d. a cloud adoption plan.

4.       Readiness: Before a plan is enacted, some preparation is required. The following exercises guide us through the process of creating a landing zone to support cloud adoption: a setup guide that familiarizes us with the tools and processes involved, a landing zone that establishes a code-based starting point for the infrastructure and environment, its subsequent expansion to meet the platform requirements specified in the plan and, finally, the best practices that validate landing zone modifications to ensure the best configurations.

Monday, November 1, 2021

This is a continuation of the article introduced here: https://1drv.ms/w/s!Ashlm-Nw-wnWhKdsT81vgXlVl38AfA?e=qzxyuJ. Specifically, it discusses the technology choices from the Application Architecture guide for Azure Public Cloud.

1.       Choosing a candidate service (a small decision helper after this list condenses these rules):

a.       If full control of the compute is required, a virtual machine or a scale set is appropriate.

b.       If it is an HPC workload, Azure Batch is helpful.

c.       If it has a microservice architecture, Azure App Service can host it.

d.       If it has an event-driven workload with short-lived processes, Azure Functions suit the task.

e.       If full-fledged orchestration is required, Azure Kubernetes Service is helpful.

f.        If a managed service is needed with the .NET Framework, Azure Service Fabric is helpful.

g.       If Spring Boot applications are required, Azure Spring Cloud is helpful.

h.       If Red Hat OpenShift is required, Azure Red Hat OpenShift is dedicated to this purpose.

i.         If containers must run without managing the underlying servers or a full orchestrator, Azure Container Instances do the job.
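The choices above can be condensed into a small decision helper. This is an illustrative sketch only; the requirement keys are assumptions made for the example, not an official Azure API:

#! /usr/bin/python
# Illustrative condensation of the candidate-service rules above.
def candidate_service(req):
    if req.get("full_control"):
        return "Virtual Machines / scale sets"
    if req.get("hpc"):
        return "Azure Batch"
    if req.get("spring_boot"):
        return "Azure Spring Cloud"
    if req.get("openshift"):
        return "Azure Red Hat OpenShift"
    if req.get("dotnet_managed"):
        return "Azure Service Fabric"
    if req.get("event_driven_short_lived"):
        return "Azure Functions"
    if req.get("full_orchestration"):
        return "Azure Kubernetes Service"
    if req.get("simple_containers"):
        return "Azure Container Instances"
    return "Azure App Service"   # default for web apps and microservice frontends

print(candidate_service({"hpc": True}))   # -> Azure Batch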

There are two ways to migrate on-premises compute to the cloud:

The first involves the ‘lift and shift’ pattern, which is a strategy for migrating a workload to the cloud without redesigning it; this is also called the ‘rehosting’ pattern.

The second involves refactoring an application to take advantage of the cloud native features.

2.       Microservices architecture can have further options for deployment. There are two approaches here: the first involves a service orchestrator that manages services running on dedicated nodes (VMs), and the second involves a serverless architecture using Functions-as-a-Service (FaaS). When microservices are deployed as binary executables, aka Reliable Services, the Reliable Services programming model makes use of Service Fabric programming APIs to query the system, report health, receive notifications on configuration and code changes, and discover other services. This is tremendously advantageous for building stateful services using the so-called Reliable Collections. AKS and Mesosphere provide alternative infrastructures for deployment.

3.       The Kubernetes at the edge compute option is excellent to keep operational costs low, easily configure and deploy a cluster, find flexibility with existing infrastructure at the edge, and for running a mixed node cluster with both Linux and Windows nodes. Out of the three options for leveraging Kubernetes at the edge, bare-metal Kubernetes, Kubernetes on Azure Stack Edge, and AKS on Azure Stack HCI, the last option is the easiest to work with.

4.        Choosing an identity service involves comparing the options for self-managed Active Directory services, Azure Active Directory, and managed Azure Active Directory Domain Services. Azure Active Directory does away with running one’s own directory services in favor of the cloud-provided one.

5.       Choosing a data store is not always easy. The term polyglot persistence is used to describe solutions that use a mix of data store technologies. Therefore, it is important to understand the main storage models and their tradeoffs.

6.       Choosing an analytical data store is not always about Big Data or lambda architectures, with their speed layer for incremental data processing and their batch processing layer, although both require an analytical store. The driving force varies on a case-by-case basis.

7.       Similarly, AI/ML services can be leveraged directly from the Azure Cognitive Services portfolio, but its applicability varies on a case-by-case basis as well.

Sunday, October 31, 2021

 This is a continuation of the article introduced here: https://1drv.ms/w/s!Ashlm-Nw-wnWhKdmXP_0_c--oqotlA?e=KDRRVf. It describes the design patterns for considerations in hosting solutions on Azure public cloud.

1.       Queue based load leveling – this uses a queue that acts as a buffer between a task and a service that it calls so that intermittent heavy workloads can be staged and smoothly processed.

2.       Retry - Enable an application to handle anticipated, temporary failures when it tries to connect to a service or network resource by transparently retrying an operation that has previously failed (a minimal sketch follows this list).

3.       Scheduler agent supervisor - Coordinate a set of actions across a distributed set of services and other remote resources.

4.       Sequential convoy - Process a set of related messages in a defined order, without blocking processing of other groups of messages.

5.       Sharding - Divide a data store into a set of horizontal partitions or shards.

6.       Sidecar - Deploy components of an application into a separate process or container to provide isolation and encapsulation.

7.       Static content hosting - Deploy static content to a cloud-based storage service that can deliver it directly to the client.

8.       Strangler fig - Incrementally migrate a legacy system by gradually replacing specific pieces of functionality with new applications and services.

9.       Throttling - Control the consumption of resources used by an instance of an application, an individual tenant, or an entire service.

10.   Valet Key - Use a token or key that provides clients with restricted direct access to a specific resource or service.

11.   Minimal design – use only what is necessary and no more.
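As an example of the Retry pattern above, a minimal retry-with-exponential-backoff sketch follows; the attempt count, the delays, and the flaky_call stand-in are all assumed values:

#! /usr/bin/python
import random
import time

def retry(operation, attempts=4, base_delay=0.5):
    # Retry a transient failure with exponential backoff: 0.5s, 1s, 2s, ...
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise                # transient-failure budget exhausted
            time.sleep(base_delay * (2 ** attempt))

def flaky_call():
    # Stand-in for any remote operation that fails intermittently.
    if random.random() < 0.3:
        raise ConnectionError("transient network failure")
    return "ok"

print(retry(flaky_call))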

The top ten design principles cited in the documentation for Azure services are:

1.       Designing for self-healing when failures occur

2.       Making all things redundant so that there is no single point of failure.

3.       Minimizing co-ordination between application services to achieve scalability.

4.       Designing to scale out horizontally, adding or removing instances as demand rises and falls.

5.       Partitioning around limits such that it works around database, network or compute limits.

6.       Designing for operations – so that the operations team has enough tools to use.

7.       Using managed services – to leverage Platform-as-a-Service rather than Infrastructure-as-a-Service.

8.       Using the best data store for the job so that the data fits

9.       Designing for evolution so that application changes are easy

10.   Building for the needs of the business, where every decision is driven by business requirements.

Saturday, October 30, 2021

This is a continuation of the article introduced here: https://1drv.ms/w/s!Ashlm-Nw-wnWhKdmXP_0_c--oqotlA?e=KDRRVf. It describes the design patterns for considerations in hosting solutions on Azure public cloud.

1.       Competing Consumers – This enables multiple concurrent consumers to process messages received on the same messaging channel. It addresses specific challenges in the messaging category.

2.       Compute Resource Consolidation - This consolidates multiple tasks or operations into a single computational unit. It is widely applicable to many implementations.

3.       CQRS – This segregates operations that read data from those that update data by directing them to different interfaces. This helps with performance and efficiency.

4.       Deployment stamps – These deploy multiple independent copies of application components, including data stores.

5.       Event sourcing – This uses an append only store to record the full series of events that describe the actions taken on data in a specific domain.

6.       External configuration store – This moves configuration information out of the application deployment package to a centralized location.

7.       Federated Identity – This delegates authentication to an external identity provider.

8.       Gatekeeper – This protects applications and services by using a dedicated host instance as a broker between clients and the service; the host validates and sanitizes requests and passes requests and data between them.

9.       Gateway Aggregation -  This uses a gateway to aggregate multiple individual requests into a single request.

10.   Gateway offloading – The shared or specialized service functionality is offloaded to a gateway proxy.

11.   Gateway Routing – This routes requests to multiple services using a single endpoint.

12.   Geodes – This deploys backend services into a set of geographical nodes. Each node can service any client request in any region.

13.   Health Endpoint Monitoring – This is crucial for external tools to continually monitor the health of a service through an exposed endpoint.

14.   Index table – This creates indexes over the fields in data stores, which come in useful for querying.

15.   Leader election – This coordinates the actions performed by a collection of collaborating task instances in a distributed application by electing one instance as the leader that assumes responsibility for managing the other instances.

16.   Materialized view – This generates prepopulated views over the data in one or more data stores so that the query for the view does not need to be run again and the view is available to subsequent queries.

17.   Pipes and filters – This separates execution into a series of stages; those stages become reusable.

18.   Priority queue – Requests are sent to services so that requests with a higher priority are received and processed before others.

19.   Publisher/Subscriber - This enables an application to produce events for interested consumers without coupling the producer to the consumers. It also allows fan-out of events to multiple parties.

20.   Queue based load leveling – this uses a queue that acts as a buffer between a task and a service that it calls so that intermittent heavy workloads can be staged and smoothly processed (a minimal sketch follows).
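A minimal sketch of queue-based load leveling with Python's standard queue module follows; the burst size and the processing rate are assumed values:

#! /usr/bin/python
import queue
import threading
import time

buffer = queue.Queue()   # the buffer between the task and the service

def producer():
    for i in range(10):              # a bursty batch of requests
        buffer.put("request-%d" % i)

def consumer():
    while True:
        item = buffer.get()
        time.sleep(0.1)              # the service drains at its own steady pace
        print("processed", item)
        buffer.task_done()

threading.Thread(target=consumer, daemon=True).start()
producer()
buffer.join()                        # wait until the burst is drained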

 https://1drv.ms/w/s!Ashlm-Nw-wnWhKdun2-8IUJ19Or-Og

Sample Python implementation (helpers such as cluster_centroids_of_top_clusters, classify, gen_proposals, get_cluster_proposal, goodness_of_fit, api_from, and clusters_greater_than_goodness_of_fit_weighted_size are assumed to be defined elsewhere):

#! /usr/bin/python
FULL = "full"              # label for the unpartitioned cluster
DEFAULT_THRESHOLD = 0.5    # assumed goodness-of-fit cutoff

def determining_root_cause_for_api(failures):
    return cluster_centroids_of_top_clusters(failures)

def batch_cluster_repeated_pass(api, failures):
    # Cluster the failures for an API, generate partitioning proposals,
    # and keep only the clusters that fit well enough.
    api_cluster = classify(api, failures)
    proposals = gen_proposals(api_cluster)
    clusters = [(FULL, api_cluster)]
    for proposal in proposals:
        cluster = get_cluster_proposal(proposal, api, failures)
        clusters += [(proposal, cluster)]
    selections = select_top_clusters(clusters)
    return api_from(selections)

def select_top_clusters(clusters, threshold=DEFAULT_THRESHOLD, strategy=goodness_of_fit):
    return clusters_greater_than_goodness_of_fit_weighted_size(threshold, clusters)

Friday, October 29, 2021

This is a continuation of an article that describes operational considerations for hosting solutions on Azure public cloud.

These are some of the cloud design patterns that are useful for building reliable, scalable, and secure applications in the cloud. Some of the challenges that these design patterns address include:

Data Management: This is a key element of the cloud application. Data is hosted in different locations and across multiple servers for reasons such as performance, scalability or availability, and this can present a range of challenges. For example, data consistency must be maintained and data will typically need to be synchronized or replicated across different locations.

Design and Implementation: Good design comes with consistency and coherence in component design and deployment. It simplifies administration and deployment. Components and subsystems get reused from scenario to scenario.

Messaging: A messaging infrastructure connects services so that they have flexibility and scalability. It is widely used and provides benefits such as retries, dead-letter queues, and ordered delivery, but it also brings challenges, such as services having to cope with changes in message schema, the idempotency of operations (sketched below), and the periods of inactivity when messages are not delivered.
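One common answer to the idempotency challenge is to deduplicate on a message identifier before processing. A minimal sketch follows; the message shape is an assumption for illustration, and the in-memory set stands in for what would be a durable store in production:

#! /usr/bin/python
processed_ids = set()   # in production, a durable store survives restarts

def apply_business_operation(body):
    print("applied", body)

def handle(message):
    # A redelivered message (for example, after a retry) is detected by its
    # id and processed only once.
    if message["id"] in processed_ids:
        return
    apply_business_operation(message["body"])
    processed_ids.add(message["id"])

handle({"id": "m1", "body": "credit 10"})
handle({"id": "m1", "body": "credit 10"})   # redelivery is a no-op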

The patterns used to resolve these kinds of problems include the following:

1.       Ambassador pattern, which creates helper services that send network requests on behalf of a consumer service or application.

2.       Anti-corruption layer that implements a façade between a modern application and a legacy system.

3.       Asynchronous request-reply, where the backend processing is decoupled from a frontend host because it needs time to complete, but the frontend still needs a clear response.

4.       Backends for Frontends: where microservices provide the core functionalities that the frontend requires

5.       Bulkhead: where elements of an application are isolated into pools so that if one fails, the others will continue

6.       Cache-Aside: this loads data on demand into a cache from a data store (a minimal sketch follows this list).

7.       Choreography: where each service decides when and how a business operation is processed and eliminates the need for an orchestrator

8.       Circuit breaker: which handles faults that take a variable amount of time to recover from when connecting to remote resources.

9.       Claim Check: which splits a large message into a claim check and a payload to avoid overwhelming a message bus.

10.   Compensating Transaction: which undoes the work performed by a series of steps that together define an eventually consistent operation.
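As an example of the Cache-Aside pattern above, a minimal sketch follows; the local dict cache and the load_from_store stand-in are assumptions for illustration:

#! /usr/bin/python
cache = {}   # stand-in for a shared cache such as Azure Cache for Redis

def load_from_store(key):
    return "value-for-%s" % key      # stand-in for a database query

def get(key):
    if key in cache:                 # cache hit
        return cache[key]
    value = load_from_store(key)     # cache miss: read from the data store
    cache[key] = value               # populate the cache on the way out
    return value

print(get("a"))   # loads from the store and caches
print(get("a"))   # served from the cache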