Tuesday, June 28, 2022

 This is a continuation of a series of articles on hosting solutions and services on the Azure public cloud, with the most recent discussion on multitenancy here. This article discusses the architectural approaches for compute in multitenant solutions.

Compute services involve a variety of compute resources such as commodity virtual machines, containers, queue processors, PaaS resources, and specialized resources like GPUs and high-performance compute. Multitenant solutions introduce sharing, which brings a higher density of tenants to the infrastructure and reduces operational cost and management overhead.

When these compute resources are shared, the isolation model, the impact on scaling and performance, state management, and the security of the compute resources must all be considered. These key decisions for planning a multitenant compute solution are discussed below.

Scaling resources helps meet the changing demand from a growing number of tenants and an increase in traffic. We might need to increase the capacity of the resources to maintain acceptable performance. When the number of active users decreases or traffic declines, the compute resources can be released to reduce costs. When resources are dedicated to a tenant, they can be scaled independently to meet that tenant's demands. This is the simplest approach, but it requires a minimum number of resources per tenant. When resources are shared and scaled at the platform level, all tenants are affected: they all suffer when the scale is insufficient to handle their overall load, and if a single tenant uses a disproportionate amount of the resources available in the system, it leads to the well-known noisy neighbor antipattern. When resource usage rises above the total capacity because of the peak load of the tenants involved, failures occur that are not specific to any one tenant and degrade performance for all of them. The total capacity can also be exceeded when individual usage is small but the number of tenants increases dramatically. Performance problems often remain undetected until an application is under load; a load-testing service such as Azure Load Testing can help analyze the behavior of the application under stress, and scaling horizontally or vertically helps correct the observed behavior.
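The capacity reasoning behind these decisions can be illustrated with a small sketch. The following Python snippet is a minimal, hypothetical example (the tenant names, capacity figure, and noisy-neighbor threshold are assumptions for illustration, not values from any Azure service) that flags tenants consuming a disproportionate share of a shared pool and detects when aggregate peak load exceeds total capacity:

```python
# Hypothetical illustration: detect noisy neighbors and capacity exhaustion
# in a shared compute pool. Names and thresholds are assumptions.

TOTAL_CAPACITY_UNITS = 100          # total capacity of the shared pool
NOISY_NEIGHBOR_SHARE = 0.4          # a tenant above 40% of capacity is suspect

def analyze_pool(tenant_peak_usage: dict[str, float]) -> None:
    """tenant_peak_usage maps tenant id -> peak usage in capacity units."""
    aggregate_peak = sum(tenant_peak_usage.values())

    # Aggregate failure: not attributable to one tenant, affects everyone.
    if aggregate_peak > TOTAL_CAPACITY_UNITS:
        print(f"Over capacity: peak {aggregate_peak} > {TOTAL_CAPACITY_UNITS}; "
              "scale out or move tenants to dedicated resources.")

    # Noisy neighbor: one tenant consumes a disproportionate share.
    for tenant, usage in tenant_peak_usage.items():
        if usage / TOTAL_CAPACITY_UNITS > NOISY_NEIGHBOR_SHARE:
            print(f"Noisy neighbor: {tenant} uses {usage} units "
                  f"({usage / TOTAL_CAPACITY_UNITS:.0%} of capacity).")

if __name__ == "__main__":
    # Many small tenants can exceed capacity even without a single noisy neighbor.
    analyze_pool({"contoso": 45, "fabrikam": 30, "tailwind": 35})
```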

The triggers that cause the components to scale must be carefully planned. When scaling depends on the number of tenants, it is best to trigger the next scale-out only after a batch of tenants has been adequately served by the available resources, as sketched below. Many compute services provide autoscaling and require us to specify minimum and maximum levels of scale. Azure App Service, Azure Functions, Azure Container Apps, Azure Kubernetes Service, and Virtual Machine Scale Sets can automatically increase or decrease the number of instances that run the application.
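The batching idea can be expressed as a simple calculation. This Python sketch is illustrative only; the batch size and the minimum and maximum instance counts are assumed values, standing in for the limits an autoscaling service would let us configure:

```python
# Hypothetical sketch of tenant-count-based autoscaling.
import math

TENANTS_PER_INSTANCE = 20   # batch size: tenants one instance can serve adequately
MIN_INSTANCES = 2           # autoscale floor
MAX_INSTANCES = 30          # autoscale ceiling

def desired_instance_count(active_tenants: int) -> int:
    """Scale out one step per full batch of tenants, within the configured bounds."""
    needed = math.ceil(active_tenants / TENANTS_PER_INSTANCE)
    return max(MIN_INSTANCES, min(MAX_INSTANCES, needed))

if __name__ == "__main__":
    for tenants in (5, 20, 21, 400, 1000):
        print(tenants, "tenants ->", desired_instance_count(tenants), "instances")
```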

State-based considerations arise when data must be persisted between requests. From a scalability perspective, stateless components are easy to scale out in terms of workers, instances, or nodes, and a newly added worker can start processing requests immediately. Tenants can also be moved between resources, where that is permitted. Stateful components depend on persisted state; that state is generally kept out of the compute layer and stored in storage services instead, while transient state can be kept in caches or local temporary files. A sketch of this separation follows.
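To make the distinction concrete, the following Python sketch (the store class and function names are hypothetical, not a specific Azure SDK) shows a stateless request handler that keeps durable tenant state in an external store and only transient data in a local cache, so any worker instance can serve any request:

```python
# Hypothetical sketch: stateless handler with externalized durable state.
# ExternalStore stands in for a storage service shared by all worker instances.

class ExternalStore:
    """Stand-in for a durable storage service (e.g., a database or table storage)."""
    def __init__(self):
        self._data: dict[str, dict] = {}

    def load(self, tenant_id: str) -> dict:
        return dict(self._data.get(tenant_id, {}))

    def save(self, tenant_id: str, state: dict) -> None:
        self._data[tenant_id] = dict(state)

# Transient, per-instance cache: safe to lose when the worker is recycled.
_local_cache: dict[str, dict] = {}

def handle_request(store: ExternalStore, tenant_id: str, update: dict) -> dict:
    """Any worker can run this: durable state lives in the store, not the worker."""
    state = _local_cache.get(tenant_id) or store.load(tenant_id)
    state.update(update)
    store.save(tenant_id, state)       # durable state goes back to storage
    _local_cache[tenant_id] = state    # cache is only an optimization
    return state

if __name__ == "__main__":
    store = ExternalStore()
    print(handle_request(store, "contoso", {"theme": "dark"}))
```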

Patterns such as the deployment stamp pattern, the compute resource consolidation pattern, and the dedicated compute resources per tenant pattern help optimize operational cost and management with little or no impact on usage.

