This is a continuation of a series of articles on the operational engineering aspects of the Azure public cloud, the most recent of which discussed Azure Maps, a generally available service that offers Service Level Agreements comparable to others in its category. In this article, we explore Azure SQL Edge.
Azure SQL Edge is an optimized relational database engine geared toward edge computing. It provides a high-performance data storage and processing layer for IoT applications, with capabilities to stream, process, and analyze data that can vary from relational to document, graph, and time-series, which makes it a good fit for a variety of modern IoT applications. It is built on the same database engine as SQL Server and Azure SQL, so applications can seamlessly reuse queries written in T-SQL. This makes applications portable between devices, datacenters, and the cloud.
Azure SQL Edge provides the same streaming capabilities as Azure Stream Analytics on IoT Edge. This native implementation of data streaming is called T-SQL streaming, and it can handle fast streams from multiple data sources. A T-SQL streaming job consists of a stream input that defines the connection to a data source to read the data stream from, a stream output that defines the connection to a data source to write the data stream to, and a stream query that defines the transformations, aggregations, filtering, sorting, and joins to be applied to the input stream before it is written to the stream output.
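As a rough sketch, a T-SQL streaming job reading from and writing to the Edge Hub could be declared as follows; the data source, stream, topic, and column names (EdgeHubSource, TemperatureInput, EventTime, and so on) are all hypothetical:

```sql
-- All names below are illustrative.
CREATE EXTERNAL FILE FORMAT JsonFormat WITH (FORMAT_TYPE = JSON);

CREATE EXTERNAL DATA SOURCE EdgeHubSource WITH (LOCATION = N'edgehub://');

-- Stream input: reads sensor events from an Edge Hub topic
CREATE EXTERNAL STREAM TemperatureInput WITH (
    DATA_SOURCE = EdgeHubSource,
    FILE_FORMAT = JsonFormat,
    LOCATION = N'TemperatureSensors'
);

-- Stream output: writes aggregated results to another Edge Hub topic
CREATE EXTERNAL STREAM TemperatureOutput WITH (
    DATA_SOURCE = EdgeHubSource,
    FILE_FORMAT = JsonFormat,
    LOCATION = N'TemperatureAggregates'
);

-- Stream query: a 30-second tumbling-window average, then start the job
EXEC sys.sp_create_streaming_job
    @name = N'TemperatureJob',
    @statement = N'
        SELECT AVG(Temperature) AS AvgTemperature, System.Timestamp AS WindowEnd
        INTO TemperatureOutput
        FROM TemperatureInput TIMESTAMP BY EventTime
        GROUP BY TumblingWindow(second, 30)';

EXEC sys.sp_start_streaming_job @name = N'TemperatureJob';
```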
Data can be transferred in and out of SQL Edge. For example, data can be synchronized from SQL Edge to Azure Blob storage by using Azure Data Factory. As with all SQL instances, the client tools help create the database and the tables, and SqlPackage.exe is used to create and apply a DAC package (DACPAC) file to the SQL Edge container.
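As a rough sketch, publishing a DACPAC to a local SQL Edge container might look like the following; the file path, server address, and credentials are placeholders:

```
SqlPackage.exe /Action:Publish ^
    /SourceFile:"MyEdgeDatabase.dacpac" ^
    /TargetServerName:"localhost,1433" ^
    /TargetDatabaseName:"MyEdgeDatabase" ^
    /TargetUser:"sa" /TargetPassword:"<password>"
```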
A watermark table is used to store the last timestamp up to which data has already been synchronized with Azure Storage, and a stored procedure or trigger updates the watermark for a table after every synchronization.
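A minimal sketch of such a watermark table and the stored procedure that advances it, with illustrative names (WatermarkTable, usp_write_watermark) borrowed from the common incremental-copy pattern:

```sql
-- Illustrative schema: one watermark row per synchronized table
CREATE TABLE [dbo].[WatermarkTable] (
    TableName      NVARCHAR(255) NOT NULL PRIMARY KEY,
    WatermarkValue DATETIME2     NOT NULL
);
GO

-- Run after every synchronization to record the new high-water mark
CREATE PROCEDURE [dbo].[usp_write_watermark]
    @LastModifiedTime DATETIME2,
    @TableName        NVARCHAR(255)
AS
BEGIN
    UPDATE [dbo].[WatermarkTable]
    SET WatermarkValue = @LastModifiedTime
    WHERE TableName = @TableName;
END;
GO
```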
A Data Factory pipeline is used to synchronize data to Azure Blob storage from a table in Azure SQL Edge. It is created by using the Data Factory user interface, and the pipeline name, for example PeriodicSync, is set in its properties at the time of creation. A lookup activity is used to get the old watermark value, with a dataset created to represent the data in the watermark table; this table contains the old watermark that was used in the previous copy operation. A new linked service is created to source the data from the SQL Edge server using connection credentials. Once the connection is tested, it can be used to preview the data and eliminate surprises during synchronization.
The pipeline editor is a designer tool in which the WatermarkDataset is selected as the source dataset for that lookup. A second lookup activity gets the new watermark value from the table that contains the source data, so that new rows can be copied to the destination; a query is added in the pipeline editor that selects the maximum value of the timestamp column from the source table, and only the first row of the result is used as the new watermark.
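That lookup query can be as simple as the following, assuming a hypothetical source table [dbo].[DataTable] with a timestamp column:

```sql
-- The single row returned becomes the new watermark
SELECT MAX(timestamp) AS NewWatermarkValue
FROM [dbo].[DataTable];
```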
Incremental progress is maintained by continually advancing the watermark in this way. Not only the source but also the sink must be specified in the editor; the sink uses a new linked service to the Blob storage. The success output of a Copy activity, whose source query bounds the rows between the two watermarks (see the sketch below), is connected to a stored procedure activity that then writes the new watermark. Finally, the pipeline is scheduled to be triggered periodically.
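Putting the pieces together, the Copy activity's source query selects only the rows between the old and new watermarks. The lookup activity names (OldWatermark, NewWatermark) below are illustrative, and the @{...} placeholders use Data Factory's expression syntax:

```sql
-- Copy only the rows that arrived since the last synchronization
SELECT *
FROM [dbo].[DataTable]
WHERE timestamp > '@{activity('OldWatermark').output.firstRow.WatermarkValue}'
  AND timestamp <= '@{activity('NewWatermark').output.firstRow.NewWatermarkValue}';
```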