(Continued from previous article)
When these IoT resources are shared, the isolation model, the impact of scaling
on performance, state management, and the security of the IoT resources all become more complex.
Scaling resources helps meet changing demand as the number of consumers and the
volume of traffic grow, and we might need to increase the capacity of the
resources to maintain an acceptable level of performance. Scaling depends on the
number of producers and consumers, the payload size, the partition count, the egress
request rate, and the usage of IoT Hub capture, the schema registry, and other advanced
features. When additional IoT capacity is provisioned or a rate limit is adjusted, the
multitenant solution can perform retries to overcome transient request
failures. When the number of active users or the volume of traffic decreases,
IoT resources can be released to reduce costs. Data
isolation depends on the scope of isolation; when the storage for IoT data is a
relational database server, for example, tenants can be isolated at the level of
the server, the database, or the schema. Varying
levels and scopes of sharing of IoT resources demand simplicity from the
architecture. Patterns such as the
deployment stamp pattern, the IoT resource consolidation pattern, and
the dedicated IoT resources pattern help optimize operational cost and
management with little or no impact on usage.
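The retries mentioned above for transient failures can be sketched with exponential backoff and jitter; `TransientError` and `retry_with_backoff` are illustrative names, not part of any specific SDK:

```python
import random
import time

class TransientError(Exception):
    """Raised when a request fails for a recoverable reason, e.g. throttling
    while additional capacity is provisioned or a rate limit is adjusted."""

def retry_with_backoff(operation, max_attempts=5, base_delay=0.5):
    """Retry a callable that may fail transiently, using exponential
    backoff with jitter so retries from many tenants do not synchronize."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # the failure persisted; surface the error
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```

A caller wraps the rate-limited request in a function and passes it to `retry_with_backoff`; only the classified transient failures are retried, while other exceptions propagate immediately.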
Edge computing relies heavily on asynchronous backend
processing. Some form of message broker becomes necessary to maintain ordering
between events and to support retries and dead-letter queues. The storage for the data must
follow the data partitioning guidance where the
partitions can be managed and accessed separately. Horizontal, vertical, and
functional partitioning strategies must be suitably applied. In the analytics space, a typical scenario is to build
solutions that integrate data from many IoT devices into a comprehensive data
analysis architecture to improve and automate decision making.
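Horizontal partitioning, one of the strategies above, can be sketched as routing each device's records to a partition derived from a stable hash of the device id, so that a device's data can be managed and accessed separately (the function name here is illustrative):

```python
import hashlib

def partition_for(device_id: str, partition_count: int) -> int:
    """Map a device id to a partition using a stable hash, so that a
    device's records always land in (and are read from) the same partition."""
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as a deterministic integer key.
    return int.from_bytes(digest[:8], "big") % partition_count
```

A cryptographic hash is used here only for its stable, evenly distributed output; any consistent hash would serve, as long as every writer and reader agrees on it.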
Event
Hubs, blob storage, and IoT hubs can collect data on the ingestion side, while
the results of analysis are distributed via alerts and notifications, dynamic
dashboards, data warehousing, and storage or archival. The fan-out of data to
different services is itself a value addition, but the ability to transform raw
events into processed events also opens up more possibilities for downstream
usage, including reporting and visualizations.
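The fan-out just described, where one ingested event is transformed and then dispatched to several downstream consumers, can be sketched as follows (the sink callables stand in for alerting, dashboarding, warehousing, and archival targets):

```python
def transform(event: dict) -> dict:
    """Turn a raw ingested event into a processed event (illustrative
    enrichment: here we only mark the event as processed)."""
    return {**event, "processed": True}

def fan_out(event: dict, sinks: list) -> None:
    """Deliver one processed event to every downstream sink: alerting,
    dashboards, warehousing, archival, and so on."""
    processed = transform(event)
    for sink in sinks:
        sink(processed)
```

Each sink receives the same processed event, so adding a new downstream usage (say, a visualization feed) is a matter of appending another sink rather than changing the ingestion path.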
One of the main considerations for data pipelines that ingest data at IoT scale
is business continuity and disaster recovery, which is achieved with
replication. A broker stores messages in a topic, which is
a logical group of one or more partitions. The broker guarantees message
ordering within a partition and provides a persistent, log-based storage layer
whose append-only logs inherently preserve message ordering. By deploying
brokers over more than one cluster, geo-replication can be introduced to support
disaster recovery strategies.
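The topic-and-partition structure above, with ordering guaranteed only within each partition, can be sketched as a minimal in-memory model (a simplification; a real broker persists its logs and distributes partitions across machines):

```python
class Topic:
    """A topic is a logical group of append-only partition logs; ordering is
    guaranteed only within a single partition, not across the whole topic."""
    def __init__(self, partition_count: int):
        self.partitions = [[] for _ in range(partition_count)]

    def append(self, key: str, message: str) -> int:
        """Append to the partition selected by the key and return the
        message's offset, i.e. its position within that partition's log."""
        log = self.partitions[sum(key.encode("utf-8")) % len(self.partitions)]
        log.append(message)
        return len(log) - 1
```

Because all messages with the same key land in the same append-only log, their offsets are strictly increasing, which is exactly the per-partition ordering guarantee the broker makes.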
Each partition is associated with an append-only log, so
messages appended to the log are ordered by arrival time and are tracked with
important offsets: the first available offset in the log (the log start
offset); the high watermark, which is the offset of the last message that was
successfully written and committed to the log by the brokers; and the log end
offset, where the next message will be written, which is at or beyond the high
watermark. When a broker goes down, durability and availability must be
addressed with replicas. Each partition has multiple replicas that are evenly
distributed across brokers, but one replica is elected as the leader and the
rest are followers. The leader is
where all the produce and consume requests go, and followers replicate the
writes from the leader.
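The offsets above can be illustrated with a minimal append-only log that tracks a start offset, a high watermark, and an end offset (a sketch of the bookkeeping, not a real broker's storage layer):

```python
class PartitionLog:
    """Minimal append-only log. Offsets tracked:
      - start_offset: first available offset still in the log
      - high_watermark: offset just past the last committed message
      - end_offset: where the next message will be written (>= high watermark)
    """
    def __init__(self):
        self.entries = []
        self.start_offset = 0
        self.high_watermark = 0

    @property
    def end_offset(self) -> int:
        return self.start_offset + len(self.entries)

    def append(self, message: str) -> int:
        """Write a message at the end offset; it is not yet committed."""
        self.entries.append(message)
        return self.end_offset - 1

    def commit(self, offset: int) -> None:
        """Advance the high watermark once replicas have the message."""
        self.high_watermark = max(self.high_watermark, offset + 1)
```

Consumers typically read only up to the high watermark, so a message appended at the end offset becomes visible to them only after it is committed.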
A pull-based replication model is the norm for brokers:
dedicated fetcher threads periodically pull data between broker pairs. Each
replica is a byte-for-byte copy of the others, which makes this replication
offset preserving. The number of replicas is determined by the replication
factor. The leader maintains a list called the in-sync replica (ISR) set; a
message is committed by the leader only after all replicas in the ISR set
have replicated it. Global availability demands that brokers be deployed in
suitable deployment modes. Two popular deployment modes are 1) a single cluster
that stretches over multiple data centers and 2) a federation of connected
clusters.
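The ISR-based commit rule described above can be sketched as follows: the leader counts a message as committed only once every follower in the in-sync replica set has replicated it. This sketch omits fetch requests, lagging-replica eviction, and leader election:

```python
class Leader:
    """Leader replica tracking the replicated offsets of its ISR followers."""
    def __init__(self, isr: list):
        self.isr = isr                           # follower ids in the in-sync set
        self.replicated = {f: -1 for f in isr}   # highest offset each follower has
        self.log = []
        self.high_watermark = 0                  # offsets below this are committed

    def produce(self, message: str) -> int:
        """Accept a produce request; the message is appended but uncommitted."""
        self.log.append(message)
        return len(self.log) - 1

    def follower_fetched(self, follower: str, offset: int) -> None:
        """Record a follower's pull-based fetch, then advance the commit point
        to just past the minimum offset replicated across the whole ISR."""
        self.replicated[follower] = max(self.replicated[follower], offset)
        self.high_watermark = min(self.replicated.values()) + 1
```

The high watermark advances only as fast as the slowest in-sync follower, which is what makes a committed message durable across the whole ISR.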