Tuesday, July 26, 2022

 

This is a continuation of a series of articles on hosting solutions and services on the Azure public cloud. The most recent article discussed multitenancy, and this one picks up the discussion of the checklist for architecting and building multitenant solutions. Administrators will likely find the list familiar.

While the previous article structured the checklist around business and technical considerations, its specific examples came from Microsoft technologies. This article turns to open-source scenarios on Azure, specifically the Apache stack. Open-source products such as Cassandra and Storm have already been studied for hosting on the Azure public cloud; here the focus is stream processing with fully managed open-source data engines.

Azure offers generally available (GA) data services that run open-source engines. These include:

- Azure Event Hubs, which offers a Kafka endpoint for stream ingestion
- Azure Cosmos DB, which stores events through its Cassandra API
- Azure Kubernetes Service (AKS), which hosts containerized microservices for stream processing
- Azure Database for PostgreSQL, which manages relational tables
- Azure Cache for Redis, which provides an in-memory data store
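The Kafka endpoint of Event Hubs is what lets existing Kafka producers send events with only configuration changes. The sketch below shows the client settings such a producer would typically use, written with librdkafka/confluent-kafka key names. The helper `event_hubs_kafka_config`, the namespace `contoso-streams`, and the connection string are hypothetical placeholders.

```python
"""Build Kafka client settings for the Event Hubs Kafka endpoint (a sketch)."""


def event_hubs_kafka_config(namespace: str, connection_string: str) -> dict:
    """Return librdkafka-style settings for an Event Hubs namespace.

    Hypothetical helper: the namespace and connection string are supplied
    by the caller and are placeholders in this example.
    """
    return {
        # Event Hubs exposes its Kafka endpoint on port 9093 of the namespace host.
        "bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        # TLS transport with SASL PLAIN authentication.
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        # The literal username "$ConnectionString" tells Event Hubs that the
        # password field carries a namespace connection string.
        "sasl.username": "$ConnectionString",
        "sasl.password": connection_string,
    }


if __name__ == "__main__":
    cfg = event_hubs_kafka_config("contoso-streams", "Endpoint=sb://...")
    print(cfg["bootstrap.servers"])  # contoso-streams.servicebus.windows.net:9093
```

The returned dictionary could be handed to a Kafka client such as `confluent_kafka.Producer(cfg)`; in production the connection string would come from a secret store rather than source code.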

This combines the benefits of open source with those of Azure managed services. Open source is preferred for its ability to migrate existing workloads, tap into the broader open-source community, and limit vendor lock-in, while the public cloud services make these technologies accessible with high availability, high performance, improved scalability, and elasticity. Stream-based solutions of this form suit both new workloads and migrated existing ones.

A solution composed of the services above and their open-source engines would be laid out as follows. Streaming sources send events to the Kafka endpoint of Azure Event Hubs through one or more Kafka producers. AKS provides a managed environment for Apache Spark, which consumes the events. Microservices hosted on AKS write the events to Azure Cosmos DB using the Cassandra API. The change feed feature of Azure Cosmos DB then makes these events available for processing in real time. Applications batch-process the events and emit enriched information into PostgreSQL, and this relational data store relays the information downstream, for example for reporting. When an in-memory layer is required in addition to persistence, Azure Cache for Redis can provide it, and websites and other applications use the cached data to improve response times.
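To make the flow concrete, here is a minimal in-memory sketch. Every class and function in it (`EventStore`, `consume`, `enrich`, `CacheAside`) is hypothetical and stands in for a managed service: a list plays the Event Hubs topic, a dictionary with an append-only log plays the Cosmos DB table and its change feed, a second dictionary plays the PostgreSQL table, and a cache-aside wrapper plays Azure Cache for Redis. It illustrates the ordering of the steps and checkpointed reading of a change feed, not any Azure SDK.

```python
"""In-memory walk-through of the streaming layout (a sketch).

Every store here is a plain Python structure standing in for a managed
service; none of these classes are Azure SDK types.
"""
from collections import defaultdict


class EventStore:
    """Stands in for the Cosmos DB (Cassandra API) table plus its change feed."""

    def __init__(self):
        self.rows = {}
        self.change_feed = []  # ordered log of writes

    def upsert(self, event):
        self.rows[event["id"]] = event
        self.change_feed.append(event)

    def read_changes(self, checkpoint):
        """Return the changes after `checkpoint` and the new checkpoint."""
        return self.change_feed[checkpoint:], len(self.change_feed)


def consume(topic, store):
    """Stands in for the AKS-hosted consumers: copy each event into the store."""
    for event in topic:
        store.upsert(event)


def enrich(changes, relational):
    """Batch step: aggregate amounts per device into 'relational' rows."""
    for event in changes:
        relational[event["device"]] += event["amount"]


class CacheAside:
    """Stands in for Azure Cache for Redis used in a cache-aside pattern."""

    def __init__(self, relational):
        self.relational = relational
        self.cache = {}
        self.misses = 0

    def get(self, device):
        if device not in self.cache:  # miss: load from the relational store
            self.misses += 1
            self.cache[device] = self.relational[device]
        return self.cache[device]


if __name__ == "__main__":
    topic = [  # what the Kafka producers would have published
        {"id": "e1", "device": "d1", "amount": 5},
        {"id": "e2", "device": "d2", "amount": 3},
        {"id": "e3", "device": "d1", "amount": 2},
    ]
    store = EventStore()
    consume(topic, store)
    changes, checkpoint = store.read_changes(0)
    relational = defaultdict(int)
    enrich(changes, relational)
    cache = CacheAside(relational)
    print(cache.get("d1"), cache.get("d1"), cache.misses)  # 7 7 1
```

A real cache would also set an expiry (TTL) on entries, because this sketch never refreshes a cached value after later batches update the relational store.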

This kind of solution can gain performance, scalability, security, resiliency, and cost efficiency with just a few considerations. For performance, the Cosmos DB data store can apply a partitioning strategy that spreads requests evenly across partitions, and the PostgreSQL server can be set up with connection pooling to avoid repeatedly setting up and tearing down connections. For scalability, Event Hubs offers a premium tier, and if ingress exceeds a few gigabytes per second, the dedicated tier provides single-tenant clusters with guaranteed capacity that can be set up and torn down as needed. Azure Cosmos DB supports autoscaling of provisioned throughput for workloads that are unpredictable and spiky.

Security can be improved with Azure Private Link so that traffic between the services flows over the Azure backbone instead of the public internet, and keys can be managed and rotated with Azure Key Vault. For resiliency, availability zones protect business-critical applications from datacenter failures. Cost can be optimized by regulating throughput and scaling up only when demand increases; choosing the proper tier and pricing model keeps usage, and therefore cost, within limits.
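Connection pooling, as suggested for PostgreSQL, amounts to opening a fixed number of connections once and lending them out instead of reconnecting per request. The sketch below uses a hypothetical `fake_connect` factory in place of a real PostgreSQL driver; in practice a pooler such as PgBouncer or a driver-side pool such as `psycopg2.pool` would play this role.

```python
"""A minimal fixed-size connection pool (a sketch of the idea, not a driver)."""
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Keeps `size` connections open and lends them out instead of reconnecting."""

    def __init__(self, factory, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):  # pay the connection setup cost once, up front
            self._idle.put(factory())

    @contextmanager
    def connection(self, timeout=None):
        conn = self._idle.get(timeout=timeout)  # blocks while all are in use
        try:
            yield conn
        finally:
            self._idle.put(conn)  # return it for reuse rather than closing it


if __name__ == "__main__":
    opened = []

    def fake_connect():
        """Hypothetical factory; a real one would open a PostgreSQL session."""
        conn = {"session": len(opened)}
        opened.append(conn)
        return conn

    pool = ConnectionPool(fake_connect, size=2)
    for _ in range(10):
        with pool.connection() as conn:
            pass  # run a query with `conn` here
    print(len(opened))  # 2: ten borrows, only two connections ever opened
```

Returning the connection in a `finally` clause keeps the pool from leaking connections when a query raises an exception.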
