This is a continuation of a series of articles on hosting solutions and
services on the Azure public cloud, the most recent of which discussed multitenancy here. It
picks up the discussion of the checklist for architecting and building
multitenant solutions. Administrators will likely find the list familiar.
While the previous article structured the checklist around business and
technical considerations, its examples were specific to Microsoft
technologies. This article focuses on open-source scenarios on Azure,
specifically the Apache stack. Some open-source products, such as Cassandra
and Storm, have already been studied for hosting on the Azure public cloud.
This article focuses on stream processing with fully managed open-source
data engines.
Azure offers generally available (GA) data services that run open-source
engines. These include:
- Azure Event Hubs, which offers a Kafka-compatible endpoint for stream ingestion
- Azure Cosmos DB, which stores events through its Cassandra API
- Azure Kubernetes Service (AKS), which hosts microservices for stream processing
- Azure Database for PostgreSQL, which manages relational tables
- Azure Cache for Redis, which provides an in-memory data store
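Because Event Hubs speaks the Kafka protocol, existing Kafka producers can usually be pointed at it by changing only client configuration. The sketch below builds that configuration in the librdkafka/confluent-kafka key style, following the documented Event Hubs pattern (port 9093, SASL_SSL with the PLAIN mechanism, the literal username `$ConnectionString`). The helper names `event_hubs_kafka_config` and `encode_event`, and the choice of JSON payloads, are assumptions made for illustration.

```python
import json
from typing import Any, Dict


def event_hubs_kafka_config(namespace: str, connection_string: str) -> Dict[str, str]:
    """Build a Kafka client config for the Event Hubs Kafka endpoint (hypothetical helper).

    `namespace` is the Event Hubs namespace name and `connection_string`
    is a namespace-level connection string.
    """
    if not namespace or not connection_string:
        raise ValueError("namespace and connection_string are required")
    return {
        # Event Hubs exposes its Kafka-compatible endpoint on port 9093.
        "bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        # Kafka clients authenticate over TLS using SASL PLAIN.
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        # The literal username "$ConnectionString" signals that the
        # password field carries an Event Hubs connection string.
        "sasl.username": "$ConnectionString",
        "sasl.password": connection_string,
    }


def encode_event(event: Dict[str, Any]) -> bytes:
    """Serialize an event as UTF-8 JSON (the payload format is an assumption)."""
    return json.dumps(event, sort_keys=True).encode("utf-8")
```

With the `confluent-kafka` package installed, the dictionary can be passed to `Producer(...)`, followed by `producer.produce("telemetry", value=encode_event({...}))` and `producer.flush()`. Here "telemetry" is a placeholder: the Kafka topic name corresponds to an event hub inside the namespace.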
This brings the benefits of both open-source engines and Azure managed
services. Open-source engines are preferred for their ability to migrate
existing workloads, tap into the broader open-source community, and limit
vendor lock-in. Public cloud services make these technologies accessible
while adding high availability, high performance, improved scalability, and
elasticity. This kind of stream-based solution can serve both new workloads
and migrations of existing ones.
A solution built from the above-mentioned services and their open-source
engines would be laid out this way:
Streaming sources send events to the Kafka endpoint of Azure Event Hubs
using one or more Kafka producers. AKS provides a managed environment for
Apache Spark, which consumes the events. Microservices hosted on AKS write
the events to Azure Cosmos DB using the Cassandra API. The Cosmos DB change
feed makes these events available for processing in real time. Applications
then batch-process the events and emit enriched information into
PostgreSQL. This relational data store relays the information to downstream
consumers, for example for reporting. Besides persistence, if an in-memory
solution is required, the solution can leverage Azure Cache for Redis.
Websites and other applications use the cached data to improve response
times.
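The last step, where websites read cached data in front of PostgreSQL, is commonly implemented with the cache-aside pattern. The article does not name a caching pattern, so cache-aside is an assumption here. The sketch only relies on the `get` and `setex` methods that the redis-py client provides, so a real `redis.Redis` client could be passed in. `InMemoryCache` is a hypothetical stand-in for local testing, and `get_report`, `load_from_db`, and the `report:` key scheme are illustrative names, not part of the described architecture.

```python
import json
import time
from typing import Any, Callable, Dict, Optional, Protocol, Tuple


class CacheClient(Protocol):
    """The subset of the redis-py client used here: get() and setex()."""

    def get(self, name: str) -> Optional[bytes]: ...

    def setex(self, name: str, time: int, value: bytes) -> Any: ...


class InMemoryCache:
    """Hypothetical stand-in for Azure Cache for Redis, for local testing only."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[bytes, float]] = {}

    def get(self, name: str) -> Optional[bytes]:
        entry = self._data.get(name)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Expired entries behave like missing keys, mirroring Redis TTLs.
            del self._data[name]
            return None
        return value

    def setex(self, name: str, time: int, value: bytes) -> bool:
        # Store the value together with its absolute expiry time.
        self._data[name] = (value, self._clock() + time)
        return True


def get_report(
    cache: CacheClient,
    report_id: str,
    load_from_db: Callable[[str], Dict[str, Any]],
    ttl_seconds: int = 60,
) -> Dict[str, Any]:
    """Cache-aside read: try the cache first, fall back to the relational store."""
    key = f"report:{report_id}"  # hypothetical key naming scheme
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    # Cache miss: query PostgreSQL (stubbed by load_from_db), then populate the cache.
    row = load_from_db(report_id)
    cache.setex(key, ttl_seconds, json.dumps(row).encode("utf-8"))
    return row
```

The TTL bounds how stale a cached report can get relative to the enriched data in PostgreSQL. A shorter TTL trades more database reads for fresher responses.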