Monday, December 20, 2021

 

This is a continuation of a series of articles on the operational engineering aspects of Azure public cloud computing, which included a recent discussion on Azure Maps, a full-fledged general-availability service that provides Service Level Agreements comparable to others in its category. In this article, we explore Azure SQL Edge.

SQL Edge is an optimized relational database engine geared towards edge computing. It provides a high-performance data storage and processing layer for IoT applications. It provides capabilities to stream, process, and analyze data that can vary from relational to document to graph to time-series, which makes it a good fit for a variety of modern IoT applications. It is built on the same database engine as SQL Server and Azure SQL, so applications can seamlessly reuse queries written in T-SQL. This makes applications portable between devices, datacenters, and the cloud.

Azure SQL Edge uses the same streaming capabilities as Azure Stream Analytics on IoT Edge. This native implementation of data streaming is called T-SQL streaming. It can handle fast streams from multiple data sources. A T-SQL streaming job consists of a stream input that defines the connection to the data source the stream is read from, a stream output that defines the connection to the data source the stream is written to, and a stream query that defines the transformations, aggregations, filtering, sorting, and joins applied to the input stream before it is written to the stream output.
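As a rough illustration of those three parts, the sketch below assembles hedged T-SQL statements as Python strings. The stream names, data sources, file format, and window clause are assumptions for illustration, not verified syntax; consult the T-SQL streaming documentation for the exact DDL.

```python
# Sketch of the three parts of a T-SQL streaming job, held as strings.
# All object names below (SensorInput, AlertOutput, etc.) are illustrative.

# 1. Stream input: connection to the data source the stream is read from.
input_stream = """
CREATE EXTERNAL STREAM SensorInput WITH (
    DATA_SOURCE = EdgeHubDataSource,
    FILE_FORMAT = JsonFormat,
    LOCATION = N'sensor-readings')
"""

# 2. Stream output: connection to the data source the stream is written to.
output_stream = """
CREATE EXTERNAL STREAM AlertOutput WITH (
    DATA_SOURCE = SqlDataSource,
    LOCATION = N'dbo.TemperatureAlerts')
"""

# 3. Stream query: the transformation applied between input and output.
create_job = """
EXEC sys.sp_create_streaming_job
    @name = N'TemperatureAlertJob',
    @statement = N'SELECT deviceId, AVG(temperature) AS avgTemp
                   INTO AlertOutput
                   FROM SensorInput
                   GROUP BY deviceId, TumblingWindow(Seconds, 30)'
"""

start_job = "EXEC sys.sp_start_streaming_job @name = N'TemperatureAlertJob'"
```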

Data can be transferred in and out of SQL Edge. For example, data can be synchronized from SQL Edge to Azure Blob storage by using Azure Data Factory. As with all SQL instances, the client tools help create the database and the tables, and SqlPackage.exe is used to create and apply a DAC package file to the SQL Edge container. A watermark table stores the last timestamp up to which data has already been synchronized with Azure Storage, and a stored procedure or trigger updates the watermark after every synchronization.

A Data Factory pipeline, created through its user interface, synchronizes data to Azure Blob storage from a table in Azure SQL Edge. The PeriodicSync property must be set at the time of creation. A lookup activity gets the old watermark value from a dataset that represents the watermark table, which holds the watermark used in the previous copy operation. A new linked service is created to source the data from the SQL Edge server using connection credentials; once the connection is tested, it can be used to preview the data and eliminate surprises during synchronization.

In the pipeline editor, the WatermarkDataset is selected as the source dataset. Another lookup activity gets the new watermark value from the table that contains the source data, so that the delta can be copied to the destination; a query selecting the maximum timestamp from the source table can be added, with only the first row taken as the new watermark. Incremental progress is maintained by continually advancing the watermark. Not only the source but also the sink must be specified in the editor; the sink uses a new linked service to the Blob storage. The success output of the copy activity is connected to a stored-procedure activity, which then writes the new watermark.
Finally, the pipeline is scheduled to be triggered periodically.
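The watermark pattern above can be sketched end-to-end with an in-memory SQLite database standing in for both the source table and the sink; the table and column names are illustrative, not the actual Data Factory artifacts.

```python
import sqlite3

# Minimal sketch of watermark-based incremental sync: copy only rows newer
# than the last synchronized timestamp, then advance the watermark.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Source (id INTEGER, payload TEXT, ts INTEGER);
    CREATE TABLE Sink   (id INTEGER, payload TEXT, ts INTEGER);
    CREATE TABLE Watermark (last_ts INTEGER);
    INSERT INTO Watermark VALUES (0);
""")

def sync():
    """Copy rows with ts greater than the stored watermark, then update it."""
    old = conn.execute("SELECT last_ts FROM Watermark").fetchone()[0]
    rows = conn.execute(
        "SELECT id, payload, ts FROM Source WHERE ts > ?", (old,)).fetchall()
    conn.executemany("INSERT INTO Sink VALUES (?, ?, ?)", rows)
    # Plays the role of the stored procedure that writes the new watermark.
    new = conn.execute("SELECT MAX(ts) FROM Source").fetchone()[0]
    if new is not None:
        conn.execute("UPDATE Watermark SET last_ts = ?", (new,))
    return len(rows)

conn.executemany("INSERT INTO Source VALUES (?, ?, ?)",
                 [(1, "a", 10), (2, "b", 20)])
first = sync()   # copies both existing rows
conn.execute("INSERT INTO Source VALUES (3, 'c', 30)")
second = sync()  # copies only the row added after the first sync
```

Scheduling the pipeline then amounts to calling the equivalent of sync() on a periodic trigger.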

Sunday, December 19, 2021

This is a continuation of a series of articles on the operational engineering aspects of Azure public cloud computing, which included a recent discussion on Azure Maps, a full-fledged general-availability service that provides Service Level Agreements comparable to others in its category. In this article, we explore Azure SQL Edge.

Edge computing has developed differently from mainstream desktop, enterprise, and cloud computing. The focus has always been on speed rather than heavy data processing, which is delegated to the core or the cloud. Edge servers work well for machine data collection and the Internet of Things. Edge computing is typically associated with the event-driven architecture style and relies heavily on asynchronous backend processing. Some form of message broker becomes necessary to maintain order between events and to handle retries and dead-letter queues.

SQL Edge is an optimized relational database engine geared towards edge computing. It provides a high-performance data storage and processing layer for IoT applications. It provides capabilities to stream, process, and analyze data that can vary from relational to document to graph to time-series, which makes it a good fit for a variety of modern IoT applications. It is built on the same database engine as SQL Server and Azure SQL, so applications can seamlessly reuse queries written in T-SQL. This makes applications portable between devices, datacenters, and the cloud.

Azure SQL Edge supports two deployment modes: connected deployment through Azure IoT Edge, and disconnected deployment. The connected deployment requires Azure SQL Edge to be deployed as a module for Azure IoT Edge. In the disconnected mode, it can be deployed as a standalone Docker container or on a Kubernetes cluster.

There are two editions of Azure SQL Edge – a developer edition and a production SKU – with the specification ranging from 4 cores/32 GB to 8 cores/64 GB. Azure SQL Edge uses the same streaming capabilities as Azure Stream Analytics on IoT Edge. This native implementation of data streaming is called T-SQL streaming. It can handle fast streams from multiple data sources. The patterns and relationships in data are extracted from several IoT input sources, and the extracted information can be used to trigger actions, alerts, and notifications. A T-SQL streaming job consists of a stream input that defines the connection to the data source the stream is read from, a stream output that defines the connection to the data source the stream is written to, and a stream query that defines the transformations, aggregations, filtering, sorting, and joins applied to the input stream before it is written to the stream output.

SQL Edge also supports machine learning models by integrating with the Open Neural Network Exchange (ONNX) runtime. The models are developed independently of the edge but can be run on the edge.
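ONNX inference is surfaced in T-SQL through the PREDICT function. The sketch below (held as a Python string for illustration) shows roughly what such a query might look like; the table names, model-storage column, and WITH clause are assumptions, not a verified schema.

```python
# Illustrative only: a T-SQL PREDICT query against an ONNX model stored
# in a table. dbo.Models, dbo.SensorReadings, and the output column are
# hypothetical names for this sketch.
predict_query = """
DECLARE @model VARBINARY(MAX) =
    (SELECT model_data FROM dbo.Models WHERE name = 'anomaly');
SELECT d.deviceId, d.temperature, p.score
FROM PREDICT(MODEL = @model,
             DATA = dbo.SensorReadings AS d,
             RUNTIME = ONNX)
WITH (score FLOAT) AS p;
"""
```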


Saturday, December 18, 2021

Azure Maps and heatmaps

This is a continuation of a series of articles on the operational engineering aspects of Azure public cloud computing. In this article, we continue the discussion on Azure Maps, a full-fledged general-availability service that provides Service Level Agreements comparable to others in its category. We focus on one of the features of Azure Maps that enables the overlay of images and heatmaps.

Azure Maps is a collection of geospatial services and SDKs that fetches the latest geographic data and provides it as context to web and mobile applications. Specifically, it provides REST APIs to render vector and raster maps as overlays, including satellite imagery; creator services to enable indoor map data publication; search services to locate addresses, places, and points of interest given indoor and outdoor data; various routing options such as point-to-point, multipoint, multipoint optimization, isochrone, electric vehicle, commercial vehicle, traffic-influenced, and matrix routing; traffic flow and incident views for applications that require real-time traffic information; time zone and geolocation services; elevation services with a Digital Elevation Model; a geofencing service and map data storage, with location information hosted in Azure; and location intelligence through geospatial analytics.

The Web SDK for Azure Maps exposes several features through its map control. We can create a map, change its style, add controls and layers on top of the map, add HTML markers, show traffic, cluster point data, use data-driven style expressions and image templates, react to events, and make the app accessible.

Heatmaps, also known as point density maps, represent the density of data and the relative weight of each data point using a range of colors. A heatmap can be overlaid on the map as a layer. Heatmaps can be used in scenarios such as temperature data, noise-sensor data, and GPS traces.

Adding a heat map layer is as simple as:

map.layers.add(new atlas.layer.HeatMapLayer(datasource, null, { radius: 10, opacity: 0.8 }), 'labels');

The opacity or transparency is normalized between 0 and 1. The intensity is a multiplier on the weight of each data point. The weight is a measure of how many times the data point should be counted on the map.

Azure Maps provides a consistent, zoomable heat map: as the user zooms out, data points aggregate together, so the heat map can look different than it did at the original zoom level. Scaling the radius also changes the heat map, because the pixel footprint of a fixed geographic radius doubles with each zoom level.
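Since tile resolution doubles at each zoom level, a radius meant to track a fixed ground distance must double as well. A small sketch of that scaling (the function name and base-zoom convention are illustrative, not part of the SDK):

```python
# Models the zoom scaling described above: to keep a heat-map radius tied
# to a fixed ground distance, the pixel radius doubles with each zoom
# level, because map tile resolution doubles per level.
def scaled_radius(base_radius_px, base_zoom, zoom):
    """Pixel radius covering the same ground distance at a different zoom."""
    return base_radius_px * 2 ** (zoom - base_zoom)

r = scaled_radius(10, 12, 14)  # two levels closer -> 4x the pixel radius
```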

All of this processing is on the client side for the rendering of given data points.

Friday, December 17, 2021

 

Location queries
Location is a data type. It can be represented either as a point or as a polygon, and each helps answer questions such as finding the top three stores nearest a geographic point or the stores within a region. Since it is a data type, some standardization is available. SQL Server defines not one but two data types for specifying location: the geography data type and the geometry data type. The geography data type stores ellipsoidal data such as GPS latitude and longitude, while the geometry data type stores Euclidean (flat) coordinates. The point and the polygon are examples of the geography data type.

Both the geography and the geometry data types must reference a spatial reference system, and since there are many such systems, each value must be associated with a specific one. This is done with a parameter called the Spatial Reference Identifier, or SRID for short. SRID 4326 is the well-known GPS coordinate system that expresses positions as latitude/longitude. Translation of an address to a latitude/longitude/SRID tuple is supported by built-in functions that drill down progressively from the overall coordinate span.

A table such as ZipCode could have an identifier, code, state, boundary, and center point with the help of these two data types. The boundary could be the polygon formed by the zip code's perimeter, and the center point the central location within it. Distances between stores and their membership in a zip code can be calculated from this center point. The geography data type also lets us perform clustering analytics, which answers questions such as the number of stores or restaurants satisfying a certain spatial condition and/or matching certain attributes; these are implemented using R-Tree data structures that support such clustering techniques. The geometry data type supports operations such as area and distance because it translates to coordinates. It has its own rectangular coordinate system that we can use to specify the boundaries, or 'bounding box', that the spatial index covers.
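To make the distance operation concrete, the sketch below approximates what the geography type's STDistance returns for two SRID 4326 points by using the spherical haversine formula; the geography type actually uses an ellipsoidal model, so this is an approximation, not the engine's algorithm.

```python
import math

# Great-circle (haversine) distance between two latitude/longitude points,
# using a mean Earth radius. Approximates ellipsoidal STDistance.
def haversine_m(lat1, lon1, lat2, lon2, radius_m=6371000.0):
    """Distance in meters between two GPS (SRID 4326) points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = (math.sin(dp / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2)
    return 2 * radius_m * math.asin(math.sqrt(a))

# Seattle to Portland is roughly 230 km as the crow flies.
d = haversine_m(47.6062, -122.3321, 45.5152, -122.6784)
```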

The operations performed with these data types include the distance between two geography objects, methods to determine a range from a point, such as a buffer or a margin, and the intersection of two geographic locations. The geometry data type supports operations such as area and distance because it translates to coordinates. Other methods supported by these data types include contains, overlaps, touches, and within.

A note about the use of these data types now follows. One approach is to store the coordinates in a separate table where the primary key is the pair of latitude and longitude, declared unique so that a pair does not repeat. Such an approach is questionable because the uniqueness constraint on locations carries a maintenance overhead. For example, two entities could refer to the same point, and unreferenced rows might then need to be cleaned up. Locations also change ownership; store A could own a location that was previously owned by store B, while B never updates its records. Moreover, stores could undergo renames or conversions. Thus, it may be better to keep the spatial data associated, in a repeatable way, with the information about the location.

Also, these data types do not participate in set operations. That is easy to do with collections and enumerables in the programming language of choice, and it usually consists of four steps: answer initialization, returning the answer on termination, an accumulation step called for each row, and a merge step called when combining the processing from parallel workers. These steps are like a map-reduce algorithm.

These data types and operations are improved with the help of a spatial index. Spatial indexes are like indexes on other data types and are stored using B-Trees. Since a B-Tree is an ordinary one-dimensional index, the two-dimensional spatial data is reduced in dimension by means of tessellation, which divides the area into small subareas and records the subareas that intersect each spatial instance. For example, with the geography data type, the entire globe is divided into hemispheres and each hemisphere is projected onto a plane. When a given geography instance covers one or more subsections, or tiles, the spatial index has an entry for each tile that is covered.
The geometry data type has its own rectangular coordinate system that we define, and we can use it to specify the boundaries, or 'bounding box', that the spatial index covers. Visualizers support overlays with spatial data, which is popular with mapping applications that superimpose information over the map with the help of transparent layers. An example is Azure Maps with geofencing, as described in the Azure Maps and GeoFence article.
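The four aggregation steps above can be sketched as plain functions, here counting the points that fall inside a bounding box across two parallel workers; the names init/accumulate/merge/terminate are illustrative, not a SQL Server API.

```python
# Sketch of the four-step aggregation pattern: initialize, accumulate per
# row, merge parallel workers, and terminate with the final answer.

def init():
    # Answer initialization.
    return {"count": 0}

def accumulate(state, point, box):
    # Called for each row: count the point if it lies inside the box.
    (x, y), (xmin, ymin, xmax, ymax) = point, box
    if xmin <= x <= xmax and ymin <= y <= ymax:
        state["count"] += 1
    return state

def merge(a, b):
    # Called when combining the processing from parallel workers.
    return {"count": a["count"] + b["count"]}

def terminate(state):
    # Return the answer on termination.
    return state["count"]

box = (0, 0, 10, 10)
points = [(1, 1), (5, 5), (20, 20), (9, 2)]
w1, w2 = init(), init()          # two parallel workers
for p in points[:2]:
    accumulate(w1, p, box)
for p in points[2:]:
    accumulate(w2, p, box)
result = terminate(merge(w1, w2))  # three of the four points are inside
```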

Thursday, December 16, 2021

Adding Azure Maps to an Android Application


This is a continuation of a series of articles on the operational engineering aspects of Azure public cloud computing. In this article, we continue the discussion on Azure Maps, a full-fledged general-availability service that provides Service Level Agreements comparable to others in its category, with an emphasis on writing mobile applications. Specifically, we target the Android platform.

We leverage an event-driven architecture style in which the Service Bus delivers the messages that the mobile application processes. As in the geofencing case, different messages can be used for different handling. The mobile application is a consumer of the messages, making occasional API calls that generate messages on the backend of a web-queue worker. The scope of this document is just the mobile application stack. The tracking and producing of messages are done in the backend, and the mobile application uses Azure Maps to display the location. We will need an active Azure Maps account and key for this purpose. The subscription, resource group, name, and pricing tier must be determined beforehand. The mobile application merely adds an Azure Maps control to the application.

An Android application will require a Java-based build. Since the communication is over HTTP, the technology stack can be independent between the backend and the mobile application. The Azure Maps Android SDK will be leveraged for this purpose. The top-level build.gradle file will reference the repository URL https://atlas.microsoft.com/sdk/android, and Java 8 can be chosen as the appropriate version to use. The SDK can be imported into build.gradle with the artifact description "com.azure.android:azure-maps-control:1.0.0". The application will introduce the map control as <com.azure.android.maps.control.MapControl android:id="@+id/mapcontrol" android:layout_width="match_parent" android:layout_height="match_parent" /> in the main activity XML file. The corresponding Java file will add imports for the Azure Maps SDK, set the Azure Maps authentication information, and get the map control instance in the onCreate method. setSubscriptionKey and setAadProperties can be used to add the authentication information on every view. The control will display the map even on the emulator. A sample Android application can be seen here.

As with all applications, the activity control loops must be tight and guide the user through specific workflows. The views, their lifetime, and their activity must be controlled, and the user should never see the application hang or spin. Interactivity for the control is assured if the application recycles and cleans up the associated resources as the user moves from one page to another.

It is highly recommended to get the activity framework and navigations worked out and planned independently of the content. The views corresponding to the content are restricted to the one that displays the control, so the application focuses mostly on user navigation and activities.

Wednesday, December 15, 2021

Azure Maps and GeoFence

This is a continuation of a series of articles on the operational engineering aspects of Azure public cloud computing. In this article, we continue the discussion on Azure Maps, a full-fledged general-availability service that provides Service Level Agreements comparable to others in its category.

Azure Maps is a collection of geospatial services and SDKs that fetches the latest geographic data and provides it as context to web and mobile applications. Specifically, it provides REST APIs to render vector and raster maps as overlays, including satellite imagery; creator services to enable indoor map data publication; search services to locate addresses, places, and points of interest given indoor and outdoor data; various routing options such as point-to-point, multipoint, multipoint optimization, isochrone, electric vehicle, commercial vehicle, traffic-influenced, and matrix routing; traffic flow and incident views for applications that require real-time traffic information; time zone and geolocation services; elevation services with a Digital Elevation Model; a geofencing service and map data storage, with location information hosted in Azure; and location intelligence through geospatial analytics.

Azure Maps can be helpful for tracking entry into and exit from a geographical location, such as the perimeter of a construction area. Such tracking can be used to generate email notifications. Geofencing GeoJSON data is uploaded to define the construction area we want to monitor; the Data Upload API is used to upload geofences as polygon coordinates to the Azure Maps account. Two logic apps can be written to send email notifications to the construction site operations when, say, a piece of equipment enters or exits the construction site. Azure Event Grid will be subscribed to the enter and exit events for the Azure Maps geofence, and two webhook event subscriptions will call the HTTP endpoints defined in the two logic applications. The Spatial Geofence Get API is used to receive notifications when a piece of equipment enters or exits the geofence areas.

The geofencing GeoJSON data contains a FeatureCollection, which consists of two geofences that pertain to distinct polygonal areas within the construction site. The first has no time expirations or restrictions; the second can only be queried during business hours. This data can be uploaded with a POST call to the mapData endpoint along with the subscription key. Once the data is uploaded, we can retrieve its metadata to ascertain the created timestamp.
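A minimal sketch of such a FeatureCollection, built in Python before serialization for upload; the coordinates and the validity-time property names are illustrative placeholders rather than the exact schema of a real site.

```python
import json

# Two polygon geofences: the first unrestricted, the second limited to
# business hours. geometryId and validityTime are illustrative properties.
geofence = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[-122.13, 47.63], [-122.12, 47.63],
                                 [-122.12, 47.64], [-122.13, 47.64],
                                 [-122.13, 47.63]]],
            },
            # No time restriction on the first geofence.
            "properties": {"geometryId": "1"},
        },
        {
            "type": "Feature",
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[-122.14, 47.63], [-122.13, 47.63],
                                 [-122.13, 47.64], [-122.14, 47.64],
                                 [-122.14, 47.63]]],
            },
            # The second geofence is only valid during business hours.
            "properties": {
                "geometryId": "2",
                "validityTime": {
                    "expiredTime": "2022-01-01T00:00:00",
                    "validityPeriod": [{"startTime": "2021-12-15T09:00:00",
                                        "endTime": "2021-12-15T17:00:00"}],
                },
            },
        },
    ],
}
payload = json.dumps(geofence)  # body for the POST to the upload endpoint
```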

The logic app will require a resource group and subscription to be deployed. A common trigger function that responds when an HTTP request is received is sufficient for this purpose. Then the Azure Maps event subscriptions are created. Each requires a name, event schema, system topic name, a filter on the event types, an endpoint type, and an endpoint. The Spatial Geofence Get API will send out the notifications on entry to and exit from the geofence. Each piece of equipment has a unique device ID, so both entry and exit can be noted. The GET method also returns the distance of the device from the geofence border; a negative distance implies that the device lies directly within the polygon.
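The inside/outside decision behind that negative distance can be reproduced for a simple polygon with a ray-casting test; this standalone sketch is an illustration of the geometry involved, not the service's actual algorithm.

```python
# Ray-casting point-in-polygon test: cast a horizontal ray to the right
# and count edge crossings; an odd count means the point is inside.
def inside(point, polygon):
    """True if point (x, y) lies inside the polygon (list of vertices)."""
    x, y = point
    hit = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # The edge straddles the ray's height; find the crossing x.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                hit = not hit
    return hit

square = [(0, 0), (10, 0), (10, 10), (0, 10)]  # a toy geofence
```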


Tuesday, December 14, 2021

 

Azure Maps

This is a continuation of a series of articles on the operational engineering aspects of Azure public cloud computing. In this article, we take a break to discuss a location service named Azure Maps. This is a full-fledged general-availability service that provides Service Level Agreements comparable to others in its category.

Azure Maps is a collection of geospatial services and SDKs that fetches the latest geographic data and provides it as context to web and mobile applications. Specifically, it provides REST APIs to render vector and raster maps as overlays, including satellite imagery; creator services to enable indoor map data publication; search services to locate addresses, places, and points of interest given indoor and outdoor data; various routing options such as point-to-point, multipoint, multipoint optimization, isochrone, electric vehicle, commercial vehicle, traffic-influenced, and matrix routing; traffic flow and incident views for applications that require real-time traffic information; time zone and geolocation services; elevation services with a Digital Elevation Model; a geofencing service and map data storage, with location information hosted in Azure; and location intelligence through geospatial analytics.

SDKs are also available in flavors suited for desktop and mobile applications. Both SDKs are quite powerful and enhance programmability. They allow customization of interactive maps that can render content and imagery specific to the publisher. The interactive map uses a WebGL map control known for rendering large datasets with high performance. The SDKs can be used with JavaScript and TypeScript.

Location is a data type. It can be represented either as a point or as a polygon, and each helps answer questions such as finding the top three stores nearest a geographic point or the stores within a region. Since it is a data type, some standardization is available. SQL Server defines not one but two data types for specifying location: the geography data type and the geometry data type. The geography data type stores ellipsoidal data such as GPS latitude and longitude, while the geometry data type stores Euclidean (flat) coordinates. The point and the polygon are examples of the geography data type.

Both the geography and the geometry data types must reference a spatial reference system, and since there are many such systems, each value must be associated with a specific one. This is done with a parameter called the Spatial Reference Identifier, or SRID for short. SRID 4326 is the well-known GPS coordinate system that expresses positions as latitude/longitude. Translation of an address to a latitude/longitude/SRID tuple is supported by built-in functions that drill down progressively from the overall coordinate span.

A table such as ZipCode could have an identifier, code, state, boundary, and center point with the help of these two data types. The boundary could be the polygon formed by the zip code's perimeter, and the center point the central location within it. Distances between stores and their membership in a zip code can be calculated from this center point. The geography data type also lets us perform clustering analytics, which answers questions such as the number of stores or restaurants satisfying a certain spatial condition and/or matching certain attributes; these are implemented using R-Tree data structures that support such clustering techniques. The geometry data type supports operations such as area and distance because it translates to coordinates. It has its own rectangular coordinate system that we can use to specify the boundaries, or 'bounding box', that the spatial index covers.

Mapping spatial data involves rendering the data as a layer on top of images. These overlays enhance the display and give end users visual geographic context. The Azure Maps Power BI visual provides this functionality to visualize spatial data on top of a map. An Azure Maps account is required to create this resource via the Azure Portal.

Thanks