This article continues a series on hosting solutions and services on the Azure public cloud, picking up from the most recent discussion on multitenancy and the checklist for architecting and building multitenant solutions. Administrators will find the list familiar.
While the previous article introduced the checklist as structured
around business and technical considerations and provided specific examples in
terms of Microsoft technologies, this article focuses on open-source
scenarios on Azure, specifically the Apache stack.
Each open-source product used in a multitenant solution must
be carefully reviewed for the features it offers to support multitenancy. While
the checklist alluded to some general requirements around shared
resources and tenant isolation, some open-source products can express
isolation as simply as naming containers differently per tenant. The
considerations for overcoming noisy-neighbor problems and scaling out
infrastructure must still be made to the degree that these products permit.
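As a hedged illustration of isolation by naming convention, the sketch below derives a per-tenant container or keyspace name from a tenant identifier. The function name, naming pattern, and character restrictions are illustrative assumptions, not any product's API; the point is only that a deterministic convention keeps each tenant's data in its own named container.

```python
# Hypothetical sketch: per-tenant container/keyspace names so each tenant's
# data is isolated by naming convention alone. The pattern below is an
# assumption for illustration, not a Cassandra or Azure API.
import re

def tenant_scoped_name(resource: str, tenant_id: str) -> str:
    """Build an isolated container name like 'orders_tenant_contoso_01'.
    Names are lowercased and restricted to [a-z0-9_] so they remain valid
    identifiers for most stores (e.g. Cassandra keyspaces)."""
    safe = re.sub(r"[^a-z0-9]", "_", tenant_id.lower())
    return f"{resource}_tenant_{safe}"

print(tenant_scoped_name("orders", "Contoso-01"))  # orders_tenant_contoso_01
```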
Let us take a few examples from the Apache stack. The data
partitioning guidance for Apache Cassandra, for instance, describes how to
separate data into partitions that can be managed and accessed independently. Horizontal,
vertical, and functional partitioning strategies must be applied as appropriate.
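The horizontal strategy can be sketched as routing rows to one of N partitions by a stable hash of the tenant id. The shard count and helper names below are illustrative assumptions, not Cassandra's own API; Cassandra performs equivalent routing internally via its partition key.

```python
# Minimal sketch of horizontal partitioning: a stable hash of the tenant id
# selects one of NUM_SHARDS partitions. Shard count is an assumed value.
import hashlib

NUM_SHARDS = 4  # illustrative assumption

def shard_for(tenant_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Stable shard assignment: the same tenant id always maps to the
    same shard, keeping one tenant's rows co-located."""
    digest = hashlib.sha256(tenant_id.encode()).hexdigest()
    return int(digest, 16) % num_shards

# Every call for a given tenant returns the same shard.
shards = {t: shard_for(t) for t in ["alpha", "beta", "gamma"]}
print(shards)
```

Hashing (rather than, say, alphabetical ranges) spreads tenants evenly and avoids hot partitions, which is one mitigation for the noisy-neighbor problem mentioned above.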
Another example is where Azure public multi-access edge compute must provide high
availability to tenants; Cassandra's geo-replication can be used to support this.
Apache Storm is used in edge computing and features true stream
processing with low-level APIs. Trained AI models can be brought to the edge with Azure Stack Hub while Storm
handles the event data. The advantage of hosting the AI models close to the edge is
that predictions on events incur minimal latency. The models can always
be trained on high-performance processors, including GPUs, but do not need heavy-duty
compute to host and run them for making predictions. Storm can serve as the central
point that receives all the events from the edge as well as their predictions.
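The edge-hosted prediction idea can be sketched as below: a tiny pre-trained model (here just a hard-coded linear scorer) scores events as they arrive, so no round trip to heavy compute is needed. The weights, feature names, and threshold are illustrative assumptions, not a real trained model.

```python
# Hedged sketch: a lightweight "model" hosted at the edge scoring events
# locally. Weights and threshold are made-up values for illustration.
WEIGHTS = {"temperature": 0.8, "vibration": 1.5}
THRESHOLD = 100.0

def predict(event: dict) -> str:
    """Score an event with a fixed linear model; no GPU needed at the edge."""
    score = sum(WEIGHTS[k] * event.get(k, 0.0) for k in WEIGHTS)
    return "alert" if score > THRESHOLD else "normal"

stream = [{"temperature": 40.0, "vibration": 10.0},
          {"temperature": 90.0, "vibration": 30.0}]
for event in stream:
    # In a real deployment the event and its prediction would be forwarded
    # to the central Storm topology; here we just print them.
    print(event, "->", predict(event))
```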
Since neither Big Data nor relational stores are well suited to the
ingestion, processing, and analysis of events, and since those stores can grow large
enough to overwhelm the continuous processing that events require, it is better
to use Storm for this role and let the edge generate the events. Storm is taken
here as an example of a stream processing system, but it is not the only one. Readers
are encouraged to review Apache Kafka, Apache Flink, and Apache Pulsar if they would
like to weigh the nuances between their capabilities. There are also managed
stream processing options on the public cloud, such as HDInsight
with Storm, which makes it easy to process and query data from Storm.
These interactive SQL queries can execute quickly and at scale over both
structured and unstructured data. Stores like Cosmos DB can accommodate diverse
and unpredictable IoT workloads without sacrificing ingestion or query
performance. If real-time processing
is required, then Storm and similar systems can help with capturing events,
analyzing them, and generating reports or automated responses with minimal latency.
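The kind of low-latency aggregation such a stream processor performs can be sketched as a tumbling window that counts events per fixed time bucket. The window size and timestamps below are assumptions for illustration; Storm expresses the same idea through its windowing support over bolts.

```python
# Illustrative sketch of a tumbling-window aggregation, the kind of
# continuous computation a stream processor runs over incoming events.
from collections import Counter

WINDOW_SECONDS = 10  # assumed window size

def tumbling_window_counts(events):
    """events: iterable of (timestamp_seconds, payload) tuples.
    Returns a Counter mapping window start time -> number of events."""
    counts = Counter()
    for ts, _payload in events:
        window_start = (int(ts) // WINDOW_SECONDS) * WINDOW_SECONDS
        counts[window_start] += 1
    return counts

events = [(1, "a"), (4, "b"), (12, "c"), (19, "d"), (21, "e")]
print(tumbling_window_counts(events))  # Counter({0: 2, 10: 2, 20: 1})
```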
If batch processing is required, Apache Sqoop can help with
automation over Big Data. For example, Sqoop jobs can be used to copy
data. Data transfer options such as
Azure Import/Export, Data Box, and Sqoop can work with databases with little or no
impact on performance. Oozie and Sqoop can be used together to manage batch workflows
for captured real-time data.
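A Sqoop copy job can be driven from a script, as in the hedged sketch below. The JDBC URL, table, and target directory are placeholder assumptions; in practice the assembled command would be launched with subprocess.run on a cluster node where Sqoop is installed.

```python
# Hedged sketch: assembling a basic `sqoop import` command to copy a
# relational table into HDFS. Connection details are placeholders.
def build_sqoop_import(jdbc_url: str, table: str, target_dir: str) -> list:
    """Assemble the argument list for a basic Sqoop import job."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,       # e.g. jdbc:mysql://host/db (placeholder)
        "--table", table,            # source table to copy
        "--target-dir", target_dir,  # HDFS destination directory
    ]

cmd = build_sqoop_import("jdbc:mysql://dbhost/sales", "orders", "/data/orders")
print(" ".join(cmd))
```

A workflow scheduler such as Oozie would then run this command on a schedule as one action in a larger batch pipeline.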