Saturday, July 2, 2022

 Border Gateway Protocol: 

This is a continuation of a series of articles on hosting solutions and services on the Azure public cloud, with the most recent discussion on Multitenancy here. This article discusses networking considerations in multitenant applications.

This protocol can be configured on a Windows Server with the Routing and Remote Access Service (RRAS) Gateway in multitenant mode. It gives the ability to manage the tenants' VM networks and their remote sites.

BGP is a dynamic routing protocol. It learns the routes between sites that are connected using site-to-site VPN connections, which eliminates the need for manual route configuration on routers. When configured as a multitenant BGP router to exchange tenant and Cloud Service Provider subnet routes, the RAS gateway is deployed on a VM, or on a set of VMs for high availability. A single-tenant edge gateway can instead be deployed on a physical computer in a LAN deployment.

The PowerShell script to configure multitenant mode looks like this:

$foo_RoutingDomain = "FooTenant"

$bar_RoutingDomain = "BarTenant"

Install-RemoteAccess -MultiTenancy

Enable-RemoteAccessRoutingDomain -Name $foo_RoutingDomain -Type All -PassThru

Enable-RemoteAccessRoutingDomain -Name $bar_RoutingDomain -Type All -PassThru

There can be several modes of deployment between Enterprise sites and a Cloud Service Provider Datacenter. This involves dynamic routing information exchange between an Enterprise and the multiple gateways of the CSP.  A few modes of deployments are enumerated below: 

RAS VPN site-to-site gateway with BGP at the Enterprise site edge. 

Third Party Gateway with BGP at the Enterprise site edge 

Multiple Enterprise sites with Third Party gateways 

Separate termination points for BGP and VPN

The last mode of deployment supports internal BGP (iBGP) and external BGP (eBGP) segregation; iBGP is only used when the termination points for BGP and VPN are separated. BGP is used for peering and maintains a routing table separate from those of the internal networks. Its route metrics are based on the shortest AS path rather than the distance or cost between hops. Interior gateway protocols such as OSPF provide fault tolerance and redundancy within an Autonomous System but only direct connections to external Autonomous Systems; BGP, by contrast, handles multiple connections to external Autonomous Systems while allowing the existing routers to handle the additional demands. It is a path-vector routing protocol.

The way BGP works is that it establishes neighbor relationships, called peerings, between routers called speakers. If the relationships are all within the same AS, it is called internal BGP; if they connect separate autonomous systems, it is called external BGP. Initially, peers share their full routing tables; afterward, they share only updates.
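The shortest-AS-path preference that drives BGP route selection can be sketched in a few lines of Python. This is a simplified illustration of the path-vector idea, not a BGP implementation; the prefixes and AS numbers are hypothetical.

```python
# Simplified BGP-style route selection: each advertised route carries the
# AS path it traversed; the speaker prefers the path with the fewest hops.
routes = {
    # prefix -> AS paths learned from different peers (hypothetical data)
    "10.1.0.0/16": [[65001, 65002, 65003], [65010, 65003]],
    "10.2.0.0/16": [[65020], [65001, 65020]],
}

def best_path(as_paths):
    """Pick the AS path with the fewest hops, mirroring BGP's path-vector metric."""
    return min(as_paths, key=len)

routing_table = {prefix: best_path(paths) for prefix, paths in routes.items()}
print(routing_table)
```

A real BGP speaker applies several tie-breakers before AS-path length (such as Local-Pref), but the path-vector metric above is the core of the comparison.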

The features of the BGP Router using Windows Server include: 

Independent deployment of just the BGP routing role service, without the Remote Access Service, which leads to improved router performance.

Collection of statistics using message counters and route counters. The Get-BgpStatistics cmdlet provides this information.

Equal Cost multipath routing support for redundant networks 

Hold Time Configuration- The BGP Router supports configuration of the Hold Timer Value according to the network requirements. 

Internal BGP and external BGP segregation – The local and remote BGP routers are distinct supporting iBGP and eBGP peering. The iBGP is only used with the fourth mode of deployment listed which is the separation of termination points for BGP and VPN. 

Latest RFC compliance – compliance with RFC 4271 (the BGP-4 protocol) means the implementation is interoperable with third-party vendors.

IPv4 and IPv6 peering support – both IPv4 and IPv6 peering are available even when the BGP router itself is assigned an IPv4 address.

IPv4 and IPv6 advertisement capability, or multiprotocol Network Layer Reachability Information (NLRI), is supported.

Mixed-mode and passive-mode peering are supported. The former refers to the BGP router serving as both initiator and responder; the latter only responds, which helps with debugging and troubleshooting.

Route attribute rewrite capability is provided.  The BGP routing policies Next-Hop, MED, Local-Pref and Community are supported. 

Route filtering – The BGP router supports filtering ingress or egress route advertisements.  
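The ingress/egress route filtering listed above can be illustrated with a small Python sketch. The allow list and route set here are hypothetical, and a real router would match on prefix ranges rather than exact strings.

```python
# Egress route filtering: advertise only routes whose prefix is on the allow list.
allow_list = {"10.1.0.0/16", "10.2.0.0/16"}   # hypothetical filtering policy
learned_routes = ["10.1.0.0/16", "192.168.0.0/24", "10.2.0.0/16"]

# Routes permitted to leave this speaker; everything else is suppressed.
advertised = [prefix for prefix in learned_routes if prefix in allow_list]
print(advertised)
```

The same pattern applies on ingress, where routes received from a peer are dropped before they enter the local routing table.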

Friday, July 1, 2022

This is a continuation of a series of articles on hosting solutions and services on the Azure public cloud, with the most recent discussion on Multitenancy here. This article discusses the architectural approaches for IoT in multitenant solutions.

IoT services, like storage services, are heterogeneous in their functionality. IoT systems vary in flavor and size, and not all IoT systems have the same certifications or capabilities. Multitenant solutions introduce sharing, which brings a higher density of tenants to the infrastructure and reduces operational cost and management. Unlike compute or storage, the isolation model can be as granular as individual devices. Services whose limits depend on the number of devices supported in a single instance include Azure IoT Central, Azure IoT Hub Device Provisioning Service (DPS), and Azure IoT Hub. Different devices, even in the same solution, might have different throughput requirements. Throughput refers to the total bytes transferred per unit time and is affected by both the number of messages and the size of messages.

When these IoT resources are shared, the isolation model, the impact on scaling performance, state management, and the security of the IoT resources become complex. These key decisions for planning a multitenant IoT solution are discussed below.

Scaling resources helps meet the changing demand from a growing number of tenants and an increase in the amount of traffic. We might need to increase the capacity of the resources to maintain an acceptable performance rate. For example, if a single IoT hub is provisioned for all tenants and the traffic exceeds the allowed number of operations per second, the service will reject the application's requests and all tenants will be impacted. Scaling depends on the number of producers and consumers, payload size, partition count, egress request rate, and the usage of other advanced features. When additional capacity is provisioned or a rate limit is adjusted, the multitenant solution can perform retries to overcome transient failures. When the number of active users or the volume of traffic decreases, the IoT resources can be released to reduce costs.

When resources are dedicated to a tenant, they can be independently scaled to meet that tenant's demands. This is the simplest approach, but it requires a minimum number of resources per tenant. Shared scaling of resources on the platform implies that all tenants are affected, and they all suffer when the scale is insufficient to handle the overall load. If a single tenant uses a disproportionate amount of the resources available in the system, it leads to a well-known problem called the noisy neighbor antipattern. When resource usage rises above the total capacity because of the combined peak load of the tenants, failures occur that are not specific to any one tenant and impact the performance of all of them. The total capacity can also be exceeded when individual usages are small but the number of tenants increases dramatically. Performance problems often remain undetected until an application is under load; a load testing service (in preview) can help analyze the behavior of the application under stress. Scaling horizontally or vertically helps correct this correlated application behavior.
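The retry behavior described above, where requests recover from transient throttling once capacity is adjusted, can be sketched as follows. The operation, error type, and delays are all hypothetical stand-ins, not any Azure SDK's API.

```python
import time

def with_retries(operation, max_attempts=4, base_delay=0.01):
    """Retry a throttled operation with exponential backoff (simplified sketch)."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except RuntimeError:                    # stand-in for a throttling error
            if attempt == max_attempts - 1:
                raise                           # give up after the final attempt
            time.sleep(base_delay * (2 ** attempt))

# Simulated service that throttles the first two calls, then succeeds.
calls = {"count": 0}
def send_telemetry():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("throttled")
    return "accepted"

result = with_retries(send_telemetry)
print(result)  # the third attempt succeeds
```

Production retry policies should also honor any retry-after hint the service returns and add jitter so that many tenants do not retry in lockstep.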

Data isolation depends on the scope of isolation. When the storage for IoT is SQL Server, the IoT solution can make use of IoT Hub; if the storage is Azure Data Explorer, the IoT solution can benefit from IoT Central. Finally, IoT resources can be provisioned within a single subscription or separated into one subscription per tenant.

Varying levels and scopes of sharing of IoT resources demand simplicity from the architecture of the multitenant application, so that data can be stored and accessed with little expertise. A particular concern for a multitenant solution is the level of customization to be supported.

Patterns such as the deployment stamp pattern, the IoT resource consolidation pattern, and the dedicated IoT resources per tenant pattern help optimize operational cost and management with little or no impact on usage.

Reference: Multitenancy: https://1drv.ms/w/s!Ashlm-Nw-wnWhLMfc6pdJbQZ6XiPWA?e=fBoKcN        

 

Thursday, June 30, 2022

This is a continuation of a series of articles on hosting solutions and services on the Azure public cloud, with the most recent discussion on Multitenancy here. This article discusses the architectural approaches for messaging in multitenant solutions.

Messaging services, unlike storage services, are more homogeneous in their functionality. All messaging systems have similar functionality, transport protocols, and usage scenarios, and most modern messaging systems involve asynchronous communication. Multitenant solutions introduce sharing, which brings a higher density of tenants to the infrastructure and reduces operational cost and management. Unlike compute or storage, the isolation model can be as granular as individual messages and events. Using the published information and how data is consumed and processed by the applications, we can distinguish between different kinds: services that deliver an event include Azure Event Grid and Azure Event Hubs, and systems that send a message include Azure Service Bus.

When these messaging resources are shared, the isolation model, the impact on scaling performance, state management, and the security of the messaging resources become complex. These key decisions for planning a multitenant messaging solution are discussed below.

Scaling resources helps meet the changing demand from a growing number of tenants and an increase in the amount of traffic. We might need to increase the capacity of the resources to maintain an acceptable performance rate. For example, if a single messaging topic or queue is provisioned for all tenants and the traffic exceeds the allowed number of messaging operations per second, the service will reject the application's requests and all tenants will be impacted. Scaling depends on the number of producers and consumers, payload size, partition count, egress request rate, and the usage of Event Hubs Capture, Schema Registry, and other advanced features. When additional messaging capacity is provisioned or a rate limit is adjusted, the multitenant solution can perform retries to overcome transient failures. When the number of active users or the volume of traffic decreases, the messaging resources can be released to reduce costs.

When resources are dedicated to a tenant, they can be independently scaled to meet that tenant's demands. This is the simplest approach, but it requires a minimum number of resources per tenant. Shared scaling of resources on the platform implies that all tenants are affected, and they all suffer when the scale is insufficient to handle the overall load. If a single tenant uses a disproportionate amount of the resources available in the system, it leads to a well-known problem called the noisy neighbor antipattern. When resource usage rises above the total capacity because of the combined peak load of the tenants, failures occur that are not specific to any one tenant and impact the performance of all of them. The total capacity can also be exceeded when individual usages are small but the number of tenants increases dramatically. Performance problems often remain undetected until an application is under load; a load testing service (in preview) can help analyze the behavior of the application under stress. Scaling horizontally or vertically helps correct this correlated application behavior.
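One common application-level guard against the noisy neighbor antipattern described above is a per-tenant quota on the shared resource. The sketch below is illustrative only; the quota value and tenant names are hypothetical, and real systems typically use sliding windows or token buckets.

```python
# Per-tenant quota: a tenant may not consume more than its share of the
# shared messaging capacity in a given window.
TENANT_QUOTA = 100          # hypothetical operations per window per tenant
usage = {}

def try_send(tenant_id):
    """Accept the operation only if the tenant is under its quota."""
    used = usage.get(tenant_id, 0)
    if used >= TENANT_QUOTA:
        return False         # reject: this tenant would crowd out its neighbors
    usage[tenant_id] = used + 1
    return True

# Tenant A bursts well past its quota; tenant B is unaffected.
results_a = [try_send("tenant-a") for _ in range(150)]
print(results_a.count(True), try_send("tenant-b"))
```

Rejected operations would then flow through the retry path discussed earlier, or surface as a throttling response to that tenant alone.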

Data isolation depends on the scope of isolation. When Service Bus is used, for instance, separate topics or queues can be deployed for each tenant, while subscriptions can be shared between tenants. Another option is to use some level of sharing for queues and topics and create more instances when utilization exceeds tolerable limits. Finally, messaging resources can be provisioned within a single subscription or separated into one subscription per tenant.
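The choice between per-tenant queues and shared queues usually reduces to a naming convention resolved at runtime. A minimal sketch of such a resolver follows; the naming scheme is an assumption for illustration, not a Service Bus API.

```python
def queue_name_for(tenant_id, shard=None):
    """Map a tenant to its queue: dedicated by default, or a shared shard."""
    if shard is not None:
        return f"shared-{shard}"          # tenants co-located on a shared queue
    return f"tenant-{tenant_id}-orders"   # fully isolated per-tenant queue

print(queue_name_for("contoso"))
print(queue_name_for("fabrikam", shard=3))
```

Keeping this mapping in one place makes it possible to move a tenant from a shared shard to a dedicated queue without touching the rest of the application.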

Varying levels and scopes of sharing of messaging resources demand simplicity from the architecture of the multitenant application, so that data can be stored and accessed with little expertise. A particular concern for a multitenant solution is the level of customization to be supported.

Patterns such as the deployment stamp pattern, the messaging resource consolidation pattern, and the dedicated messaging resources per tenant pattern help optimize operational cost and management with little or no impact on usage.

 


Wednesday, June 29, 2022

This is a continuation of a series of articles on hosting solutions and services on the Azure public cloud, with the most recent discussion on Multitenancy here. This article discusses the architectural approaches for storage in multitenant solutions.

Storage services involve a variety of storage resources, such as commodity disks, local storage, remote network shares, blobs, tables, queues, database resources, and specialized resources like cold tiers and archival storage. Multitenant solutions introduce sharing, which brings a higher density of tenants to the infrastructure and reduces operational cost and management. Unlike compute, data can leak, egress, or remain vulnerable both in transit and at rest, so the isolation model is even more important.

When these storage resources are shared, the isolation model, the impact on scaling performance, state management, and the security of the storage resources become complex. These key decisions for planning a multitenant storage solution are discussed below.

Scaling of resources helps meet the changing demand from a growing number of tenants and an increase in the amount of traffic. We might need to increase the capacity of the resources to maintain an acceptable performance rate. For example, if a single storage account is provisioned for all tenants and the traffic exceeds the allowed number of storage operations per second, Azure Storage will reject the application's requests and all tenants will be impacted. When additional storage is provisioned or a rate limit is adjusted, the multitenant solution can perform retries to overcome transient failures. When the number of active users or the volume of traffic decreases, the storage resources can be released to reduce costs.

When resources are dedicated to a tenant, they can be independently scaled to meet that tenant's demands. This is the simplest approach, but it requires a minimum number of resources per tenant. Shared scaling of resources on the platform implies that all tenants are affected, and they all suffer when the scale is insufficient to handle the overall load. If a single tenant uses a disproportionate amount of the resources available in the system, it leads to a well-known problem called the noisy neighbor antipattern. When resource usage rises above the total capacity because of the combined peak load of the tenants, failures occur that are not specific to any one tenant and impact the performance of all of them. The total capacity can also be exceeded when individual usages are small but the number of tenants increases dramatically. Performance problems often remain undetected until an application is under load; a load testing service (in preview) can help analyze the behavior of the application under stress. Scaling horizontally or vertically helps correct this correlated application behavior.

Data isolation depends on the data storage provider. When Cosmos DB is used, for instance, separate containers are deployed for each tenant while databases and accounts can be shared between tenants. When Azure Storage is used, either the container or the account can be separated per tenant. When a shared storage management system such as a relational store is used, separate tables or even separate databases can be used for each tenant. Finally, storage resources can be provisioned within a single subscription or separated into one subscription per tenant.
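These isolation choices reduce to a mapping from tenant to storage scope, typically recorded in a tenant catalog. The resolver below is a sketch under assumed names and isolation tiers; none of it is an Azure Storage or Cosmos DB API.

```python
# Resolve where a tenant's data lives, by isolation level.
ISOLATION = {                      # hypothetical tenant catalog
    "contoso": "dedicated-account",
    "fabrikam": "dedicated-container",
}

def storage_scope(tenant_id):
    """Return the account/container a tenant's data should be written to."""
    level = ISOLATION.get(tenant_id, "shared")
    if level == "dedicated-account":
        return {"account": f"st{tenant_id}", "container": "data"}
    if level == "dedicated-container":
        return {"account": "stshared", "container": f"tenant-{tenant_id}"}
    # Shared container: rows are separated only by a tenant partition key.
    return {"account": "stshared", "container": "data", "partition_key": tenant_id}

print(storage_scope("contoso"))
print(storage_scope("adventureworks"))
```

The same resolver gives a natural upgrade path: a tenant outgrowing the shared tier is moved by changing one catalog entry and migrating its data.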

Varying levels and scopes of sharing of storage resources demand simplicity from the architecture of the multitenant application, so that data can be stored and accessed with little expertise. A particular concern for a multitenant solution is the level of customization to be supported.

Patterns such as the deployment stamp pattern, the storage resource consolidation pattern, and the dedicated storage resources per tenant pattern help optimize operational cost and management with little or no impact on usage.


Tuesday, June 28, 2022

This is a continuation of a series of articles on hosting solutions and services on the Azure public cloud, with the most recent discussion on Multitenancy here. This article discusses the architectural approaches for compute in multitenant solutions.

Compute services involve a variety of compute resources, such as commodity virtual machines, containers, queue processors, PaaS resources, and specialized resources like GPUs and high-performance compute. Multitenant solutions introduce sharing, which brings a higher density of tenants to the infrastructure and reduces operational cost and management.

When these compute resources are shared, the isolation model, the impact on scaling performance, state management, and the security of the compute resources must be considered. These key decisions for planning a multitenant compute solution are discussed below.

Scaling of resources helps meet the changing demand from a growing number of tenants and an increase in the amount of traffic. We might need to increase the capacity of the resources to maintain an acceptable performance rate. When the number of active users or the volume of traffic decreases, the compute resources can be released to reduce costs.

When resources are dedicated to a tenant, they can be independently scaled to meet that tenant's demands. This is the simplest approach, but it requires a minimum number of resources per tenant. Shared scaling of resources on the platform implies that all tenants are affected, and they all suffer when the scale is insufficient to handle the overall load. If a single tenant uses a disproportionate amount of the resources available in the system, it leads to a well-known problem called the noisy neighbor antipattern. When resource usage rises above the total capacity because of the combined peak load of the tenants, failures occur that are not specific to any one tenant and impact the performance of all of them. The total capacity can also be exceeded when individual usages are small but the number of tenants increases dramatically. Performance problems often remain undetected until an application is under load; a load testing service (in preview) can help analyze the behavior of the application under stress. Scaling horizontally or vertically helps correct this correlated application behavior.

The triggers that cause the components to scale must be carefully planned. When scaling depends on the number of tenants, it is best to trigger the next scale-out only after a batch of tenants has been adequately served by the available resources. Many compute services provide autoscaling and require us to set minimum and maximum levels of scale. Azure App Service, Azure Functions, Azure Container Apps, Azure Kubernetes Service, and Virtual Machine Scale Sets can automatically increase or decrease the number of instances that run the application.
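The tenant-batch trigger described above can be sketched as a simple instance-count calculation, clamped to autoscale bounds. The batch size and the minimum/maximum values are hypothetical; a real autoscale rule would also weigh CPU, memory, or queue-depth signals.

```python
import math

def instances_needed(tenant_count, tenants_per_instance=50, minimum=2, maximum=20):
    """Scale out in batches of tenants, clamped to configured autoscale bounds."""
    raw = math.ceil(tenant_count / tenants_per_instance)
    return max(minimum, min(maximum, raw))

# Small fleets stay at the floor; growth adds instances in tenant-sized batches;
# very large fleets are capped at the configured ceiling.
print(instances_needed(10), instances_needed(275), instances_needed(5000))
```

The ceiling matters as much as the floor: it bounds cost and forces a deliberate decision (a new deployment stamp, for instance) once a single pool can no longer absorb more tenants.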

State-based considerations arise when data must be persisted between requests. From a scalability perspective, stateless components are easy to scale out in terms of workers, instances, or nodes, and they can provide a warm start to process requests immediately. Tenants can also be moved between resources when this is permitted. Stateful components depend on persisted state; such state is generally kept out of the compute layer and stored in storage services instead, while transient state can be stored in caches or local temporary files.

The patterns described above, such as the deployment stamp pattern, the compute resource consolidation pattern, and the dedicated compute resources per tenant pattern, help optimize operational cost and management with little or no impact on usage.


Monday, June 27, 2022

This is a continuation of a series of articles on hosting solutions and services on the Azure public cloud, with the most recent discussion on Multitenancy here. This article discusses resource organization for multitenant resources.

Resource organization helps a multitenant solution with tenant isolation and scale. There are specific tradeoffs to consider between multitenant isolation and scale-out across multiple resources. Azure's resource limits and quotas, and scaling the solution beyond those limits, are discussed below.

When there is an automated deployment process and a need to scale across resources, the way to deploy resources and assign tenants must be decided. We must detect when a resource is approaching the maximum number of tenants that can be assigned to it. When we plan to deploy new resources, it must be decided whether they will be ready just in time or ahead of time.
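Detecting that a resource is approaching its tenant threshold can be as simple as comparing assignments against a headroom limit; crossing it is the signal to provision the next resource just in time. The limit and headroom factor below are hypothetical.

```python
RESOURCE_LIMIT = 100        # hypothetical maximum tenants per deployed resource
HEADROOM = 0.8              # start provisioning the next resource at 80% occupancy

def needs_new_resource(assigned_tenants):
    """Return True once a resource crosses the provisioning threshold."""
    return assigned_tenants >= RESOURCE_LIMIT * HEADROOM

print(needs_new_resource(79), needs_new_resource(80))
```

Provisioning at a headroom threshold rather than at the hard limit buys time for deployment to complete before the resource is actually full, which is the "ready ahead of time" option in practice.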

When assumptions are made in code and configuration, they can limit the ability to scale. For example, there might be a need to scale out to multiple storage accounts, while the application tier assumes a single storage account for all tenants.

Azure resources are deployed and managed through a hierarchy. Most resources are deployed into resource groups which are contained in subscriptions. This hierarchy pertains to a tenant. When we deploy the resources, we have the option to isolate them at different levels. Different models can be used in different components of the same solution. 

Shared resources can serve the workloads of all tenants from a single instance. When we run a single instance of a resource, the service limits, subscription limits, and quotas apply; when these limits are reached, the shared resources must be scaled out.

Isolation within a shared resource requires the application code to be fully aware of multitenancy and to restrict the data for a specific tenant. An alternative is to separate resources into resource groups, which help manage the lifecycles of resources: the resources in a resource group can be deleted all at once by deleting the group. A naming convention and strategy, resource tags, or a tenant catalog database is required in this case.

Resource groups can also be separated into subscriptions. Putting resource groups into a shared subscription enables us to easily configure policies and access control. There is a limit to the maximum number of resource groups that can be put in a subscription, so they must spill over into a new subscription when the limit is exceeded. Separate subscriptions help achieve complete isolation of tenant-specific resources, and they can be created programmatically. Azure reservations can also be used across subscriptions. The only difficulty is requesting quota increases when there are a large number of subscriptions: the Quota API helps for some resource types, while other quota increases must be requested by opening a support case. Tenant-specific subscriptions can be put into a management group hierarchy, which enables easy management of access control rules and policies.
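The naming convention and tenant catalog mentioned above can be modeled as a lookup from tenant to subscription and resource group. All names and the resource-ID format below are illustrative assumptions, not an Azure SDK.

```python
# Minimal tenant catalog: maps each tenant to where its resources are deployed.
catalog = {
    "contoso":  {"subscription": "sub-tenants-01", "resource_group": "rg-tenant-contoso"},
    "fabrikam": {"subscription": "sub-tenants-01", "resource_group": "rg-tenant-fabrikam"},
}

def resource_group_for(tenant_id):
    """Resolve a tenant to its resource group path, failing loudly if unknown."""
    entry = catalog.get(tenant_id)
    if entry is None:
        raise KeyError(f"tenant {tenant_id!r} is not onboarded")
    return f"/subscriptions/{entry['subscription']}/resourceGroups/{entry['resource_group']}"

print(resource_group_for("contoso"))
```

Because lifecycle operations (including teardown) key off this catalog, it must be the single source of truth; resource tags can then mirror the same tenant identifier for auditing.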

Reference: Multitenancy: https://1drv.ms/w/s!Ashlm-Nw-wnWhLMfc6pdJbQZ6XiPWA?e=fBoKcN