Saturday, December 25, 2021

Event-driven or Database - the choice is yours.

 

Public cloud computing must deal with events at an unprecedented scale, and the right choice of architectural style plays a big role in the total cost of ownership of a solution involving events. IoT traffic, for instance, can be channeled through the event-driven stack available from Azure or through Azure SQL Edge, also available from Azure. The distinction between the two may not be fully recognized or appreciated by development teams focused on agile and expedient delivery of work items, but a sound architecture is like a good investment: it multiplies its returns, whereas a poor one may require frequent scaling, revamping, or even rewriting. This article explores the differences between the two. It continues a series of articles on operational engineering aspects of Azure public cloud computing, including the most recent discussion on Azure SQL Edge, a full-fledged, generally available service that provides Service Level Agreements similar to those expected from other services in its category.

Event-driven architecture consists of event producers and event consumers. Producers generate a stream of events, and consumers listen for and react to those events.

The scale-out can be adjusted to suit the demands of the workload, and events can be responded to in real time. Producers and consumers are isolated from one another. In some extreme cases, such as IoT, events must be ingested at very high volumes. There is scope for a high degree of parallelism since consumers run independently and in parallel, although each consumer remains coupled to the shape of the events it handles. Network latency for message exchanges between producers and consumers is kept to a minimum, and consumers can be added as necessary without impacting existing ones.
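As a minimal sketch of this pattern, the following Python snippet publishes and consumes events with the azure-eventhub SDK. Event Hubs is just one broker in Azure's event-driven stack, and the connection string, hub name, and payload here are placeholders rather than anything the article prescribes.

```python
# Minimal producer/consumer sketch using the azure-eventhub SDK (v5).
# CONN_STR and EVENTHUB_NAME are placeholders for a real Event Hubs namespace.
from azure.eventhub import EventData, EventHubConsumerClient, EventHubProducerClient

CONN_STR = "Endpoint=sb://<namespace>.servicebus.windows.net/;..."  # placeholder
EVENTHUB_NAME = "iot-telemetry"  # placeholder

def produce():
    # The producer only knows the hub, not any of its consumers.
    producer = EventHubProducerClient.from_connection_string(
        CONN_STR, eventhub_name=EVENTHUB_NAME)
    with producer:
        batch = producer.create_batch()
        batch.add(EventData('{"deviceId": "sensor-42", "temperature": 21.7}'))
        producer.send_batch(batch)

def on_event(partition_context, event):
    # Each consumer instance reacts to events independently and in parallel.
    print(f"partition {partition_context.partition_id}: {event.body_as_str()}")
    partition_context.update_checkpoint(event)

def consume():
    consumer = EventHubConsumerClient.from_connection_string(
        CONN_STR, consumer_group="$Default", eventhub_name=EVENTHUB_NAME)
    with consumer:
        consumer.receive(on_event=on_event, starting_position="-1")

if __name__ == "__main__":
    produce()
```

Because the producer writes only to the hub, additional consumer groups can be attached later without touching it, which is exactly the decoupling the next paragraph lists as a benefit.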

Some of the benefits of this architecture include the following: publishers and subscribers are decoupled; there are no point-to-point integrations; it is easy to add new consumers to the system; consumers can respond to events immediately as they arrive; the system is highly scalable and distributed; and subsystems can maintain independent views of the event stream.

Some of the challenges faced with this architecture include the following: event loss is tolerated, so guaranteed delivery poses a challenge, and some IoT traffic mandates guaranteed delivery. Some scenarios also require events to be processed in exactly the order they arrive. Each consumer type typically runs in multiple instances for resiliency and scalability, which poses a challenge if the processing logic is not idempotent or if the events must be processed in order.
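A common mitigation for redelivery is to make the processing idempotent. The sketch below deduplicates on a unique event ID; the id and timestamp fields are assumptions about the payload, and the in-memory set stands in for the durable store a real consumer would use.

```python
# Idempotent event handling: deduplicate on a unique event ID so that
# redelivered events are not processed twice.
processed_ids = set()  # illustration only; use a durable store in practice

def handle_event(event: dict) -> None:
    event_id = event["id"]  # assumes every event carries a unique ID
    if event_id in processed_ids:
        return  # duplicate delivery; safe to skip
    processed_ids.add(event_id)
    apply_business_logic(event)

def apply_business_logic(event: dict) -> None:
    print(f"processing {event['id']} at {event['timestamp']}")

handle_event({"id": "evt-001", "timestamp": "2021-12-25T09:00:00Z"})
handle_event({"id": "evt-001", "timestamp": "2021-12-25T09:00:00Z"})  # skipped
```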

Some of the best practices for this architectural style include the following: events should be lean and mean, not bloated; services should share only IDs and/or a timestamp; large data transfers between services are an antipattern; and loosely coupled event-driven systems work best.
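To illustrate what lean means here, the hypothetical payloads below contrast an event that shares only identifiers and a timestamp with a bloated one that drags full records along.

```python
# A lean event: only IDs and a timestamp cross the service boundary.
lean_event = {
    "eventId": "evt-7f3a",    # unique ID, also enables idempotent handling
    "deviceId": "sensor-42",  # a reference, not the full device record
    "timestamp": "2021-12-25T09:00:00Z",
}

# Antipattern: embedding large payloads in the event itself.
bloated_event = {
    "eventId": "evt-7f3a",
    "device": {"id": "sensor-42", "firmware": "...", "history": ["..."]},
    "rawReadings": ["..."],  # large data should move out of band
}
```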

Azure SQL Edge is an optimized relational database engine geared towards edge computing. It provides a high-performance data storage and processing layer for IoT applications, with capabilities to stream, process, and analyze data that can vary from relational to document to graph to time-series, which makes it a good fit for a variety of modern IoT applications. It is built on the same database engine as SQL Server and Azure SQL, so applications can seamlessly reuse queries written in T-SQL. This makes applications portable between devices, datacenters, and the cloud.

Azure SQL Edge offers the same streaming capabilities as Azure Stream Analytics on IoT Edge. This native implementation of data streaming is called T-SQL streaming. It can handle fast streaming from multiple data sources, extracting patterns and relationships from several IoT input sources, and the extracted information can be used to trigger actions, alerts, and notifications. A T-SQL streaming job consists of a stream input that defines the connections to a data source to read the data stream from, a stream output that defines the connections to a data source to write the data stream to, and a stream query that defines the transformations, aggregations, filtering, sorting, and joins to be applied to the input stream before it is written to the stream output.
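A minimal sketch of those three parts follows, issuing the documented CREATE EXTERNAL STREAM and sys.sp_create_streaming_job statements over pyodbc. The server, credentials, stream and job names, Edge Hub topics, and the 30-second averaging query are all illustrative assumptions, not anything fixed by the service.

```python
# Sketch: wiring up a T-SQL streaming job on Azure SQL Edge through pyodbc.
# Server, credentials, stream names, and Edge Hub topics are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=<edge-device>,1433;"
    "DATABASE=IoTDb;UID=sa;PWD=<password>",
    autocommit=True)
cur = conn.cursor()

# Shared plumbing: an Edge Hub data source and a JSON file format.
cur.execute("CREATE EXTERNAL DATA SOURCE EdgeHubSource WITH (LOCATION = 'edgehub://')")
cur.execute("CREATE EXTERNAL FILE FORMAT JsonFormat WITH (FORMAT_TYPE = JSON)")

# Stream input: the Edge Hub topic the job reads the data stream from.
cur.execute("""
    CREATE EXTERNAL STREAM TemperatureInput WITH (
        DATA_SOURCE = EdgeHubSource,
        FILE_FORMAT = JsonFormat,
        LOCATION = N'TemperatureSensors')""")

# Stream output: the Edge Hub topic the transformed stream is written to.
cur.execute("""
    CREATE EXTERNAL STREAM AlertOutput WITH (
        DATA_SOURCE = EdgeHubSource,
        FILE_FORMAT = JsonFormat,
        LOCATION = N'TemperatureAlerts')""")

# Stream query: a windowed aggregation applied between input and output.
cur.execute("""
    EXEC sys.sp_create_streaming_job @name = N'TemperatureJob', @statement = N'
        SELECT deviceId, AVG(temperature) AS avgTemp
        INTO AlertOutput
        FROM TemperatureInput
        GROUP BY deviceId, TumblingWindow(second, 30)'""")
cur.execute("EXEC sys.sp_start_streaming_job @name = N'TemperatureJob'")
```

The same statements could be run directly from any T-SQL client; pyodbc is used here only to keep the examples in one language.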

Both the storage and the message queue handle large volumes of data, and the execution can be staged as processing and analysis. The processing can be either batch-oriented or stream-oriented, while the analysis and reporting can be offloaded to a variety of technology stacks with impressive dashboards. The processing handles the requirements for batch and real-time processing of the big data, and the analytics supports exploration and rendering of the output. Such an architecture utilizes components such as data sources, data storage, batch processors, stream processors, a real-time message queue, an analytics data store, analytics and reporting stacks, and orchestration.

Some of the benefits of this approach include the following: the ability to offload processing to a database, elastic scale, and interoperability with existing solutions.

Some of the challenges faced with this architectural style include the complexity of handling isolation for multiple data sources and the difficulty of building, deploying, and testing data pipelines over a shared architecture. Different products demand as many skillsets and maintenance regimes, along with a requirement for data and query virtualization. For example, U-SQL, which is a combination of SQL and C#, is used with Azure Data Lake Analytics, while SQL APIs are used with Edge, Hive, HBase, Flink, and Spark. With event-driven processing on a heterogeneous stack, the emphasis on data security gets diluted and spread over a very large number of components.

 
