Tuesday, August 3, 2021

Azure Stream Analytics

Introduction:

This article is a continuation of a series that began with a description of the SignalR service some time back. In this article, we focus on Stream Analytics from Azure. As the name suggests, this is a service for analyzing events that are ordered and arrive continuously. Like its industry counterparts, this service defines notions of jobs and runs them on clusters. Analysis of data arriving in this form includes identifying patterns and relationships, and it applies to data sources ranging from device sensors to clickstreams, social media feeds, and other applications. Actions can be taken when certain patterns occur, and workflows can be triggered that provide alerts and notifications to users. The data can also be transformed and channeled via pipelines for automation. This service is available on the Azure IoT Edge runtime environment, which enables processing data on those devices.

Data from device traffic usually bear timestamps and are discrete, often independent of one another. They are also characterized as unstructured data arriving in an ordered manner, where it is generally not possible to store all of them at once for subsequent analysis. When the analysis is done in batches, it becomes a batch-processing job that runs on a cluster and scales out batches to different nodes, as many as the cluster will allow. Holding sets of events in batches can introduce latency, so the notion of micro-batching is introduced for lower-latency processing. Streaming takes this even further by processing one event at a time.
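The trade-off between batch, micro-batch, and event-at-a-time processing can be illustrated with a minimal sketch in plain Python. This is a conceptual illustration only; the function and parameter names are hypothetical and not part of any Azure API:

```python
from typing import Callable, Iterable, List

def process_in_micro_batches(events: Iterable[int],
                             batch_size: int,
                             handler: Callable[[List[int]], None]) -> None:
    """Group a continuous event stream into small batches before handling.

    A very large batch_size behaves like classic batch processing (higher
    latency), batch_size == 1 degenerates to event-at-a-time streaming,
    and values in between give micro-batching: a compromise between
    throughput and latency.
    """
    batch: List[int] = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            handler(batch)
            batch = []
    if batch:  # flush the final partial batch
        handler(batch)

# Example: sum each micro-batch of 3 events from a stream of 10.
results = []
process_in_micro_batches(range(10), 3, lambda b: results.append(sum(b)))
print(results)  # [3, 12, 21, 9]
```

Choosing the batch size is exactly the latency-versus-throughput dial that distinguishes the three processing styles described above.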

Some of the use cases for continuous events involve geospatial analytics for fleet management and driverless vehicles, weblog and clickstream analytics, and point-of-sale data for inventory control. In all these cases there is a point of ingestion from data sources, typically via Azure Event Hubs, IoT Hub, or Blob storage. Event ordering options and time windows can be suitably adjusted to perform aggregations. The query language is SQL, and it can be extended with JavaScript or C# user-defined functions. Queries written in SQL are easy to apply to filtering, sorting, and aggregation.

The topology between ingestion and delivery is handled by the Stream Analytics service, while allowing extensions with the help of reference data stores, Azure Functions, and real-time scoring via machine learning services. Event Hubs, Azure Blob storage, and IoT Hub can collect data on the ingestion side, while results are distributed after analysis via alerts and notifications, dynamic dashboards, data warehousing, and storage/archival. The fan-out of data to different services is itself a value addition, but the ability to transform events into processed events also opens up more possibilities for downstream usages, including reporting and visualization.

As with all services in the Azure portfolio, it comes with standard deployment using Azure Resource Manager templates, health monitoring via Azure Monitor, billing usages that can drive down costs, and various programmability options such as SDKs, REST-based APIs, command-line interfaces, and PowerShell automation. It is a fully managed PaaS offering, so the infrastructure and workflow initializers need not be set up by hand. It runs in the cloud and scales to many events with relatively low latency. The service is not only production ready but also reliable in mission-critical deployments, and security and compliance are not sacrificed for the sake of performance.
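A core building block of these queries is the time window; in the service itself this is written in SQL (for example, `GROUP BY DeviceId, TumblingWindow(minute, 5)`). The sketch below mimics the semantics of a tumbling-window average in plain Python to show what such an aggregation computes. The event shape and field names are illustrative assumptions, not the service's API:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def tumbling_window_avg(events: List[Tuple[float, str, float]],
                        window_seconds: float) -> Dict[Tuple[float, str], float]:
    """Average a value per device over fixed, non-overlapping time windows.

    Each event is (timestamp, device_id, value). A tumbling window assigns
    every event to exactly one window [0, w), [w, 2w), ... based on its
    timestamp, much like GROUP BY over a window in a streaming SQL query.
    """
    buckets: Dict[Tuple[float, str], List[float]] = defaultdict(list)
    for ts, device, value in events:
        window_start = (ts // window_seconds) * window_seconds
        buckets[(window_start, device)].append(value)
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}

# Example: temperature readings from one device, 60-second windows.
events = [(5.0, "dev1", 20.0), (30.0, "dev1", 22.0), (65.0, "dev1", 30.0)]
averages = tumbling_window_avg(events, 60.0)
print(averages)  # {(0.0, 'dev1'): 21.0, (60.0, 'dev1'): 30.0}
```

Other window types the service supports, such as hopping or sliding windows, differ only in how events are assigned to windows; the aggregation step is the same.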
Finally, it integrates with Visual Studio to bring comprehensive testing, debugging, publishing, and authoring convenience.

