This is a continuation of a series of articles on the operational engineering aspects of Azure public cloud computing. The most recent discussion covered Azure SQL Edge, a full-fledged, generally available service that provides Service Level Agreements comparable to others in its category.
SQL Edge is an optimized relational database engine geared towards edge computing. It provides a high-performance data storage and processing layer for IoT applications, with capabilities to stream, process, and analyze data that can range from relational to document, graph, and time-series, which makes it a good choice for a variety of modern IoT applications. It is built on the same database engine as SQL Server and Azure SQL, so applications can seamlessly reuse queries written in T-SQL. This makes applications portable between devices, datacenters, and the cloud.
Azure SQL Edge uses the same streaming capabilities as Azure Stream Analytics on IoT Edge. This native implementation of data streaming is called T-SQL streaming, and it can handle fast streams from multiple data sources. A T-SQL streaming job consists of a stream input that defines the connection to the data source the stream is read from, a stream output that defines the connection to the data source the stream is written to, and a stream query that defines the transformations, aggregations, filtering, sorting, and joins applied to the input stream before it is written to the stream output.
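As a concrete illustration, the following is a minimal sketch of a T-SQL streaming job, assuming JSON sensor messages arriving from the Edge Hub module and a local table as the destination. The object names (EdgeHubInput, SqlOutput, SensorJob, SensorDb.dbo.SensorAggregates) and the sensor columns are illustrative assumptions; the statements follow the CREATE EXTERNAL STREAM and sys.sp_create_streaming_job pattern used by SQL Edge.

-- Stream input: JSON messages read from the Edge Hub.
CREATE EXTERNAL FILE FORMAT JsonFormat
    WITH (FORMAT_TYPE = JSON);

CREATE EXTERNAL DATA SOURCE EdgeHubSource
    WITH (LOCATION = N'edgehub://');

CREATE EXTERNAL STREAM EdgeHubInput
    WITH (DATA_SOURCE = EdgeHubSource,
          FILE_FORMAT = JsonFormat,
          LOCATION = N'sensor-readings');

-- Stream output: a table in the local SQL Edge instance
-- (assumes a database master key already exists for the credential).
CREATE DATABASE SCOPED CREDENTIAL SqlCredential
    WITH IDENTITY = '<sql-login>', SECRET = '<sql-password>';

CREATE EXTERNAL DATA SOURCE LocalSqlOutput
    WITH (LOCATION = N'sqlserver://tcp:.,1433',
          CREDENTIAL = SqlCredential);

CREATE EXTERNAL STREAM SqlOutput
    WITH (DATA_SOURCE = LocalSqlOutput,
          LOCATION = N'SensorDb.dbo.SensorAggregates');

-- Stream query: average temperature per device over 30-second tumbling windows.
EXEC sys.sp_create_streaming_job
    @name = N'SensorJob',
    @statement = N'
        SELECT deviceId, AVG(temperature) AS avgTemperature
        INTO SqlOutput
        FROM EdgeHubInput TIMESTAMP BY eventTime
        GROUP BY deviceId, TumblingWindow(second, 30)';

EXEC sys.sp_start_streaming_job @name = N'SensorJob';

The job can later be paused or removed with sys.sp_stop_streaming_job and sys.sp_drop_streaming_job.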
Azure SQL Edge is also noteworthy for bringing machine learning directly to the edge by running ML models for edge devices. SQL Edge supports the Open Neural Network Exchange (ONNX) format, and models can be deployed with T-SQL. The model can be pre-trained or custom-trained outside SQL Edge with the framework of choice; it just needs to be converted to ONNX format. The ONNX model is simply inserted into a models table in the database, a connection string is sufficient for applications to send their data into SQL Edge, and the PREDICT function can then be run on the data using the model.
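The sketch below shows the general pattern, with assumed table names (dbo.models, dbo.sensor_data), an assumed model file path, and an assumed single "score" output column: the ONNX file is loaded into a varbinary(max) column and then passed to PREDICT with the ONNX runtime.

-- Store the pre-trained ONNX model as a binary blob.
CREATE TABLE dbo.models (
    model_id   INT IDENTITY(1,1) PRIMARY KEY,
    model_name NVARCHAR(100) NOT NULL,
    model      VARBINARY(MAX) NOT NULL
);

INSERT INTO dbo.models (model_name, model)
SELECT N'anomaly-detector', BulkColumn
FROM OPENROWSET(BULK N'/var/opt/mssql/models/model.onnx', SINGLE_BLOB) AS m;

-- Score incoming rows with PREDICT using the ONNX runtime.
DECLARE @model VARBINARY(MAX) =
    (SELECT model FROM dbo.models WHERE model_name = N'anomaly-detector');

SELECT d.deviceId, d.temperature, p.score
FROM PREDICT(MODEL = @model,
             DATA = dbo.sensor_data AS d,
             RUNTIME = ONNX)
WITH (score FLOAT) AS p;

The output columns declared in the WITH clause must match what the ONNX model actually produces, so the single float score here is only a placeholder.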
The ML pipeline is a newer technology compared to traditional software development stacks, and such pipelines have generally stayed on-premises, simply because of the latitude they allow in choosing frameworks and development styles. Experimentation can also outgrow the limits of the free tiers in the public cloud. In some cases, event-processing systems such as Apache Spark and Kafka find it easier to replace the Extract-Transform-Load solutions that proliferated with data warehouses. Using SQL Edge avoids the need to perform ETL, and the machine learning models are the end products. They can be hosted in a variety of environments, not just the cloud or SQL Edge, and some ML users prefer to load the model on mobile or edge devices. Many IoT experts agree that the streaming data from edge devices can be quite heavy in traffic, and that a database system will outperform other edge device-based computing: Internet TCP relays are on the order of 250-300 milliseconds, whereas the ingestion rate for database processing can be upwards of thousands of events per second. These are some of the benefits of using machine learning within the database.