Public
cloud computing must deal with events at an unprecedented scale. The right
choice of architectural style plays a significant role in the total cost of
ownership of an event-based solution. IoT traffic, for instance, can be
channeled through the event-driven stack available from Azure or through Azure
SQL Edge. The distinction between the two may not be fully recognized or
appreciated by development teams focused on agile, expedient delivery of work
items, but a sound architecture is like a good investment: it multiplies its
returns, whereas a poor one may require frequent scaling, revamping, or even
rewriting. This article explores the differences between the two. It is a
continuation of a series of articles on operational engineering aspects of
Azure public cloud computing, most recently a discussion of Azure SQL Edge,
a full-fledged generally available service that provides Service Level
Agreements similar to those of others in its category.
Event-driven architecture consists of event producers and
consumers. Event producers generate a stream of events, and event consumers
listen for those events. The scale-out can be adjusted to suit the demands of
the workload, and events can be responded to in real time. Producers and
consumers are isolated from one another. In some extreme cases, such as IoT,
events must be ingested at very high volumes. There is scope for a high degree
of parallelism, since consumers run independently and in parallel, although
each consumer remains coupled to the events it processes. Network latency for
message exchanges between producers and consumers is kept to a minimum, and
consumers can be added as necessary without impacting existing ones.
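The decoupling of producers from consumers can be illustrated with a minimal sketch in Python, using an in-process queue to stand in for a broker such as Azure Event Hubs; all names here are illustrative, and a real deployment would use the broker's SDK rather than a local queue.

```python
import queue
import threading

# In-process queue standing in for an event broker (e.g., Azure Event Hubs).
broker = queue.Queue()
results = []
lock = threading.Lock()

def producer(n):
    # The producer only publishes events; it knows nothing about consumers.
    for i in range(n):
        broker.put({"id": i, "payload": f"reading-{i}"})
    broker.put(None)  # sentinel so the consumer knows the stream has ended

def consumer():
    # The consumer reacts to events as they arrive, independently of the producer.
    while True:
        event = broker.get()
        if event is None:
            break
        with lock:
            results.append(event["id"])

t_cons = threading.Thread(target=consumer)
t_prod = threading.Thread(target=producer, args=(5,))
t_cons.start()
t_prod.start()
t_prod.join()
t_cons.join()
print(sorted(results))  # [0, 1, 2, 3, 4]
```

Adding another consumer thread with its own view of the stream would not require any change to the producer, which is the essence of the decoupling described above.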
Some of the benefits of this architecture include the
following: publishers and subscribers are decoupled; there are no
point-to-point integrations; it is easy to add new consumers to the system;
consumers can respond to events immediately as they arrive; the system is
highly scalable and distributed; and subsystems maintain independent views of
the event stream.
Some of the challenges faced with this architecture
include the following: event loss is tolerated, which poses a problem when
guaranteed delivery is required, as some IoT traffic mandates. Processing
events in order, or exactly once, is also difficult: each consumer type
typically runs in multiple instances for resiliency and scalability, which
becomes a challenge if the processing logic is not idempotent or the events
must be processed in order.
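One common mitigation for duplicate deliveries is to make the consumer idempotent, for example by tracking event IDs that have already been applied. The following is a minimal sketch under that assumption; the event shape and field names are hypothetical.

```python
# Idempotent consumer: duplicate deliveries of the same event ID have no effect.
processed_ids = set()
state = {"total": 0}

def handle(event):
    if event["id"] in processed_ids:
        return  # duplicate delivery; safely ignored
    processed_ids.add(event["id"])
    state["total"] += event["value"]

# Event 1 is delivered twice, e.g. after a broker retry.
events = [{"id": 1, "value": 10}, {"id": 2, "value": 5}, {"id": 1, "value": 10}]
for e in events:
    handle(e)
print(state["total"])  # 15, not 25
```

In a distributed system the `processed_ids` set would live in durable shared storage rather than process memory, but the principle is the same: reprocessing an event must not change the outcome.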
Some best practices for this architecture include the
following: events should be lean and not bloated; services should share only
IDs and/or a timestamp, since large data transfers between services are an
antipattern; and loosely coupled event-driven systems are best.
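The lean-event practice can be sketched as follows: the event carries only an ID and a timestamp, and the consumer resolves the ID against its own data store. The store contents and field names below are purely illustrative.

```python
import time

# Hypothetical detail store maintained by the consuming service;
# only the device ID travels inside the event.
device_store = {"dev-42": {"model": "thermostat", "firmware": "1.3"}}

def make_lean_event(device_id):
    # Lean event: an ID and a timestamp, not the full device record.
    return {"device_id": device_id, "ts": time.time()}

def consume(event):
    # The consumer looks up the details it needs from its own store,
    # avoiding a large payload transfer between services.
    details = device_store[event["device_id"]]
    return details["model"]

evt = make_lean_event("dev-42")
print(consume(evt))  # thermostat
```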
Azure SQL Edge is an optimized relational database engine
geared toward edge computing. It provides a high-performance data storage and
processing layer for IoT applications, with capabilities to stream, process,
and analyze data that can vary from relational to document to graph to time
series, which makes it the right choice for a variety of modern IoT
applications. It is built on the same database engine as SQL Server and Azure
SQL, so applications can seamlessly reuse queries written in T-SQL. This makes
applications portable between devices, datacenters, and the cloud.
Azure SQL Edge uses the same streaming capabilities as Azure
Stream Analytics on IoT Edge. This native implementation of data streaming is
called T-SQL streaming. It can handle fast streams from multiple data sources;
the patterns and relationships in the data are extracted from several IoT
input sources, and the extracted information can be used to trigger actions,
alerts, and notifications. A T-SQL streaming job consists of a stream input
that defines the connection to a data source from which to read the data
stream, a stream output that defines the connection to a data source to which
to write the data stream, and a stream query that defines the transformations,
aggregations, filtering, sorting, and joins to be applied to the input stream
before it is written to the stream output.
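The input → query → output shape of such a job can be illustrated with a small Python sketch of a tumbling-window aggregation, a typical streaming transformation; the readings, window size, and function names are illustrative, not part of the T-SQL streaming API.

```python
from collections import defaultdict

# Stream input: simulated sensor readings as (timestamp_seconds, temperature).
stream_input = [(0, 20.0), (1, 22.0), (5, 30.0), (6, 32.0), (11, 25.0)]

def tumbling_avg(readings, window_s):
    # The "stream query": average temperature per non-overlapping
    # (tumbling) window of window_s seconds.
    windows = defaultdict(list)
    for ts, temp in readings:
        windows[ts // window_s].append(temp)
    return {w: sum(v) / len(v) for w, v in sorted(windows.items())}

# Stream output: one aggregate row per window.
stream_output = tumbling_avg(stream_input, window_s=5)
print(stream_output)  # {0: 21.0, 1: 31.0, 2: 25.0}
```

In an actual T-SQL streaming job the query would be expressed in T-SQL and the input and output would be external streams bound to real sources and sinks, but the pipeline shape is the same.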
Both the storage and the message queue handle large volumes
of data, and the execution can be staged as processing and analysis. The
processing can be either batch oriented or stream oriented, while the analysis
and reporting can be offloaded to a variety of technology stacks with
impressive dashboards. The processing handles the requirements for batch and
real-time computation over big data; the analytics supports exploration and
rendering of its output. The architecture utilizes components such as data
sources, data storage, batch processors, stream processors, a real-time
message queue, an analytics data store, analytics and reporting stacks, and
orchestration.
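The split between batch and stream processing paths feeding a shared analytics store can be sketched as follows; the store layout and function names are hypothetical, chosen only to show the two paths converging on the same data.

```python
# The same raw events feed a batch path and a stream path,
# and both write their results to a shared analytics store.
raw_events = [{"sensor": "a", "value": v} for v in (1, 2, 3, 4)]

analytics_store = {}

def stream_process(event):
    # Stream path: incremental update per event, for real-time views.
    analytics_store["stream_total"] = (
        analytics_store.get("stream_total", 0) + event["value"]
    )

def batch_process(events):
    # Batch path: periodic full recomputation over the accumulated data.
    analytics_store["batch_total"] = sum(e["value"] for e in events)

for e in raw_events:
    stream_process(e)
batch_process(raw_events)
print(analytics_store)  # {'stream_total': 10, 'batch_total': 10}
```

The stream path trades completeness for latency, while the batch path periodically corrects the picture; both views are served from the same analytics store.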
Some of the benefits of this architecture include the
following: the ability to offload processing to a database, elastic scale, and
interoperability with existing solutions.
Some of the challenges faced with this architectural style
include the complexity of handling isolation for multiple data sources and the
difficulty of building, deploying, and testing data pipelines over a shared
architecture. Different products demand correspondingly many skillsets and
maintenance efforts, along with a requirement for data and query
virtualization. For example, U-SQL, a combination of SQL and C#, is used with
Azure Data Lake Analytics, while SQL APIs are used with Edge, Hive, HBase,
Flink, and Spark. With event-driven processing over a heterogeneous stack, the
emphasis on data security gets diluted and spread over a very large number of
components.