Cluster computing

Saturday, July 23, 2022

This is a continuation of a series of articles on hosting solutions and services on the Azure public cloud. The most recent article discussed multitenancy (see the reference below), and this one picks up its checklist for architecting and building multitenant solutions. Administrators will find the list familiar.

The previous article structured the checklist around business and technical considerations and illustrated it with Microsoft technologies. This article applies the same checklist to open-source scenarios on Azure.

Each open-source product used in a multitenant solution must be reviewed carefully for the features it offers to support multitenancy. The checklist covered the general requirements for shared resources and tenant isolation. Some open-source products can express isolation as simply as giving each tenant's container (for example, a database, keyspace, or table) a different name. Even so, protection against noisy neighbors and the ability to scale out the infrastructure must still be designed in, to the extent that each product allows.
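
As an illustration of isolation by naming, the sketch below maps a tenant id to its own Cassandra keyspace and emits the matching CREATE KEYSPACE statement. It is a minimal sketch under stated assumptions: the `tenant_` prefix, the helper names, and the choice of SimpleStrategy are hypothetical, and it assumes unquoted CQL identifiers limited to letters, digits, and underscores with Cassandra's 48-character keyspace name limit.

```python
import re

# Assumptions: unquoted CQL identifiers use letters, digits and underscores,
# and Cassandra limits keyspace names to 48 characters.
_MAX_KEYSPACE_LEN = 48
_PREFIX = "tenant_"  # hypothetical naming convention


def tenant_keyspace(tenant_id: str) -> str:
    """Map a tenant id to a per-tenant keyspace name (hypothetical convention)."""
    # Lowercase, then collapse every run of unsupported characters into "_".
    slug = re.sub(r"[^a-z0-9]+", "_", tenant_id.lower()).strip("_")
    if not slug:
        raise ValueError(f"tenant id {tenant_id!r} has no usable characters")
    name = _PREFIX + slug
    if len(name) > _MAX_KEYSPACE_LEN:
        raise ValueError(f"keyspace name too long: {name}")
    return name


def create_keyspace_cql(tenant_id: str, replication_factor: int = 3) -> str:
    """Return a CREATE KEYSPACE statement for the tenant's isolated keyspace."""
    ks = tenant_keyspace(tenant_id)
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {ks} "
        f"WITH replication = {{'class': 'SimpleStrategy', "
        f"'replication_factor': {replication_factor}}};"
    )


print(create_keyspace_cql("Contoso-Ltd"))
```

Because the mapping is not one-to-one ("Contoso Ltd" and "Contoso-Ltd" produce the same name), a real system would likely record the assigned keyspace per tenant in a lookup table rather than recompute it.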

Consider a few examples from the Apache stack. The data partitioning guidance for Apache Cassandra, for instance, describes how to separate data into partitions that can be managed and accessed independently; horizontal, vertical, and functional partitioning strategies must be applied as the workload requires. In another example, a solution running on Azure public multi-access edge compute (MEC) must provide high availability to its tenants, and Cassandra can support this through geo-replication.
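
Both ideas can be sketched in CQL, generated here from Python. The table places a tenant id at the front of a composite partition key, which is one way to partition horizontally by tenant, and the keyspace uses NetworkTopologyStrategy with a replication factor per data center for geo-replication. The table, column, and data center names and the CRC32-based bucketing are hypothetical choices for illustration, not taken from the guidance itself.

```python
import zlib
from typing import Mapping

# Hypothetical table: tenant_id leads the composite partition key so each
# tenant's rows live in their own partitions (horizontal partitioning), and
# 'bucket' spreads a large tenant across several partitions.
ORDERS_TABLE_CQL = """
CREATE TABLE IF NOT EXISTS orders (
    tenant_id text,
    bucket int,
    order_id text,
    amount decimal,
    PRIMARY KEY ((tenant_id, bucket), order_id)
);
"""


def bucket_for(order_id: str, buckets: int = 16) -> int:
    """Stable bucket number for an order (CRC32 is an illustrative choice)."""
    return zlib.crc32(order_id.encode("utf-8")) % buckets


def geo_replicated_keyspace_cql(keyspace: str, dc_replicas: Mapping[str, int]) -> str:
    """Build a CREATE KEYSPACE statement replicated to several data centers.

    NetworkTopologyStrategy takes one replication factor per data center;
    the data center names are deployment-specific.
    """
    if not dc_replicas:
        raise ValueError("at least one data center is required")
    if any(rf < 1 for rf in dc_replicas.values()):
        raise ValueError("replication factors must be positive")
    options = ", ".join(f"'{dc}': {rf}" for dc, rf in dc_replicas.items())
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {options}}};"
    )


print(geo_replicated_keyspace_cql("orders_ks", {"eastus": 3, "westus2": 3}))
```

The number of buckets trades partition size against the number of partitions a query for one tenant must read.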

In the analytics space, a typical scenario integrates data from many IoT devices into a comprehensive data analysis architecture to improve and automate decision making. In this scenario, a Cassandra cluster stores the data.
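
For IoT telemetry, a common Cassandra modeling choice is to bucket each device's readings by day so that no single partition grows without bound. The sketch below assumes that one-partition-per-device-per-day layout; the table, columns, and helper are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

# Hypothetical telemetry table: one partition per device per UTC day,
# with rows clustered newest-first within the partition.
TELEMETRY_CQL = """
CREATE TABLE IF NOT EXISTS telemetry (
    device_id text,
    day date,
    ts timestamp,
    temperature double,
    PRIMARY KEY ((device_id, day), ts)
) WITH CLUSTERING ORDER BY (ts DESC);
"""


def partition_key(device_id: str, ts: datetime) -> Tuple[str, str]:
    """Return the (device_id, day) partition key for a reading, using the UTC date."""
    if ts.tzinfo is None:
        raise ValueError("timestamps must be timezone-aware")
    return device_id, ts.astimezone(timezone.utc).date().isoformat()


# A reading taken late in the evening at UTC-5 falls on the next UTC day.
reading_time = datetime(2022, 7, 23, 23, 30, tzinfo=timezone(timedelta(hours=-5)))
print(partition_key("sensor-1", reading_time))
```

Normalizing to UTC before taking the date keeps devices in different time zones consistent.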

If the architecture is an N-tier application that uses Apache Cassandra, then Linux virtual machines and a virtual network configured for N-tier applications must be deployed alongside the Cassandra cluster. If the data is non-relational (NoSQL), that is, stored as key-value pairs, graphs, time-series objects, or other non-relational models, the Azure Cosmos DB Cassandra API can serve as the data access service.

Stream processing with fully managed open-source engines and components such as Kafka, Kubernetes, Cassandra, PostgreSQL, and Redis is another typical scenario. Here, events are streamed through fully managed Azure data services.
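
In a multitenant stream, keying events by tenant keeps each tenant's events together and in order within one partition. The sketch below shows that routing idea in plain Python; the function names are hypothetical, and CRC32 is a stand-in hash (Kafka's default partitioner uses murmur2, so real partition numbers would differ).

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List


def partition_for(tenant_id: str, num_partitions: int) -> int:
    """Pick a partition for an event from its tenant id (CRC32 as a stand-in hash)."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be positive")
    return zlib.crc32(tenant_id.encode("utf-8")) % num_partitions


def route(events: Iterable[dict], num_partitions: int) -> Dict[int, List[dict]]:
    """Group events by partition, preserving arrival order within each partition."""
    partitions: Dict[int, List[dict]] = defaultdict(list)
    for event in events:
        partitions[partition_for(event["tenant_id"], num_partitions)].append(event)
    return partitions


events = [
    {"tenant_id": "contoso", "seq": 1},
    {"tenant_id": "fabrikam", "seq": 1},
    {"tenant_id": "contoso", "seq": 2},
]
for partition, batch in sorted(route(events, 4).items()):
    print(partition, batch)
```

A single very busy tenant still lands on one partition, which is itself a noisy neighbor risk; splitting such a tenant across sub-keys trades ordering for throughput.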

The performance considerations for running Apache Cassandra on Azure virtual machines must be examined; their recommendations can then serve as a baseline to test the workload against.
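
Testing against a baseline can be as simple as comparing measured latency percentiles with target values. The sketch below uses the nearest-rank percentile and flags any percentile that exceeds its baseline by more than a tolerance; the baseline numbers, the `p95`/`p99` naming, and the 10% tolerance are hypothetical.

```python
import math
from typing import Dict, Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile of a non-empty sample, for p in (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def regressions(
    samples_ms: Sequence[float], baseline_ms: Dict[str, float], tolerance: float = 0.10
) -> Dict[str, float]:
    """Return percentiles (keyed like 'p95') whose latency exceeds baseline * (1 + tolerance)."""
    failed = {}
    for name, limit in baseline_ms.items():
        measured = percentile(samples_ms, float(name.lstrip("p")))
        if measured > limit * (1 + tolerance):
            failed[name] = measured
    return failed


# Hypothetical latencies (ms) from a load test and a hypothetical baseline.
latencies = [12.0, 15.5, 11.2, 40.1, 13.3, 14.8, 12.9, 16.0, 13.7, 90.4]
print(regressions(latencies, {"p95": 45.0, "p99": 95.0}))
```

An empty result means the run stayed within tolerance of the baseline.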

Some workloads need explicit safeguards against the noisy neighbor antipattern, in which one tenant's heavy usage degrades performance for the others. Service level objectives, and even service level agreements, can be defined based on the tenants' requirements and on the composite SLAs of the underlying Azure resources. Reliability is easily affected by scale, and performance problems can cause SLAs to be missed, so testing that the application performs well under load is an important consideration. Finally, chaos engineering can be applied to test the reliability of the solution.
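
Two calculations behind this paragraph can be sketched briefly. A composite SLA for services that must all be available is, assuming independent failures, the product of their individual SLAs; for example, 99.95% and 99.99% combine to about 99.94%. A per-tenant token bucket is one common way to keep a single tenant from consuming shared capacity. The SLA figures, class name, and rate limits below are hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


def composite_availability(*slas: float) -> float:
    """Availability of services that must all be up, assuming independent failures."""
    result = 1.0
    for sla in slas:
        if not 0.0 <= sla <= 1.0:
            raise ValueError("SLA must be a fraction between 0 and 1")
        result *= sla
    return result


class TenantThrottle:
    """Per-tenant token bucket: `rate` requests/second, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self._buckets: Dict[str, Tuple[float, float]] = {}  # tenant -> (tokens, last time)

    def allow(self, tenant_id: str) -> bool:
        now = self.clock()
        tokens, last = self._buckets.get(tenant_id, (self.capacity, now))
        # Refill in proportion to elapsed time, never beyond capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self._buckets[tenant_id] = (tokens - 1, now)
            return True
        self._buckets[tenant_id] = (tokens, now)
        return False


# Hypothetical SLAs for a web tier and a database on the critical path.
print(f"{composite_availability(0.9995, 0.9999):.4%}")
```

Because each tenant has its own bucket, one tenant exhausting its tokens is throttled without affecting the others.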

The security checklist applies from design time onward. A multitenant application must isolate its tenants, but realizing that isolation consistently requires the right enforcement and hardening. The isolation must also be tested: there must be no cross-tenant access or data leakage, which sometimes calls for static and runtime code analysis. Such tools help safeguard security throughout development.
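
A tenant isolation test can be expressed as an automated check that one tenant cannot read another tenant's data. The sketch below uses a hypothetical in-memory store whose every lookup is scoped by tenant id; in practice the same check would run against the real data access layer.

```python
from typing import Any, Dict, Tuple


class TenantScopedStore:
    """Hypothetical in-memory store where every read and write is scoped to a tenant."""

    def __init__(self) -> None:
        self._data: Dict[Tuple[str, str], Any] = {}

    def put(self, tenant_id: str, key: str, value: Any) -> None:
        self._data[(tenant_id, key)] = value

    def get(self, tenant_id: str, key: str) -> Any:
        # The tenant id is part of the lookup, so a tenant cannot address
        # another tenant's rows even if it knows their keys.
        try:
            return self._data[(tenant_id, key)]
        except KeyError:
            raise KeyError(f"{key!r} not found for tenant {tenant_id!r}") from None


def check_no_cross_tenant_access(store: TenantScopedStore) -> bool:
    """Return True if tenant_b cannot read tenant_a's record; raise otherwise."""
    store.put("tenant_a", "invoice-1", "A's data")
    store.put("tenant_b", "invoice-2", "B's data")
    assert store.get("tenant_a", "invoice-1") == "A's data"
    try:
        store.get("tenant_b", "invoice-1")
    except KeyError:
        return True
    raise AssertionError("tenant_b could read tenant_a's data")


print(check_no_cross_tenant_access(TenantScopedStore()))
```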

Reference: Multitenancy: https://1drv.ms/w/s!Ashlm-Nw-wnWhLMfc6pdJbQZ6XiPWA?e=fBoKcN
