Friday, December 17, 2021

 

Location queries
Location is a data type. It can be represented either as a point or a polygon, and each helps answer questions such as finding the top three stores nearest to a geographic point or the stores within a region. Since it is a data type, some standardization is available. SQL Server defines not one but two data types for specifying location: the Geography data type and the Geometry data type. The Geography data type stores ellipsoidal data such as GPS latitude and longitude, while the Geometry data type stores data in a Euclidean (flat) coordinate system. Points and polygons are instance types supported by both. Both the geography and the geometry data types must reference a spatial reference system, and since there are many such systems, each value must be associated with exactly one. This association is made with a parameter called the Spatial Reference Identifier, or SRID for short. SRID 4326 identifies the well-known GPS coordinate system (WGS 84) that expresses positions as latitude and longitude. Translating an address to a latitude/longitude/SRID tuple is supported with the help of built-in functions that drill down progressively from the overall coordinate span.

A table such as ZipCode could have an identifier, code, state, boundary, and center point with the help of these two data types. The boundary is the polygon formed by the zip code's perimeter, and the center point is the central location within that zip code. Distances between stores and their membership in a zip code can be calculated from this center point. The Geography data type also lets us perform clustering analytics, which answer questions such as how many stores or restaurants satisfy a certain spatial condition and/or match certain attributes; R-Tree data structures are commonly used to support such clustering techniques.
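
As a minimal sketch of how these types are queried, the following JDBC snippet finds the three stores nearest to a GPS coordinate. The Stores table, its Location geography column, and the connection details are hypothetical placeholders; geography::Point, STDistance, and SRID 4326 are the SQL Server constructs described above.

import java.sql.*;

// Minimal sketch: find the three stores nearest to a point.
// Assumes a hypothetical Stores(Name, Location geography) table.
public class NearestStores {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:sqlserver://localhost;databaseName=Retail;encrypt=false";
        String sql =
            "DECLARE @p geography = geography::Point(?, ?, 4326);" + // lat, lon in SRID 4326
            "SELECT TOP 3 Name, Location.STDistance(@p) AS Meters " +
            "FROM Stores ORDER BY Location.STDistance(@p);";
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setDouble(1, 47.6062);   // latitude of the query point
            ps.setDouble(2, -122.3321); // longitude of the query point
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.printf("%s is %.0f meters away%n",
                            rs.getString("Name"), rs.getDouble("Meters"));
                }
            }
        }
    }
}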

The operations performed with these data types include computing the distance between two geography objects, determining a range around a point such as a buffer or a margin, and finding the intersection of two geographic locations. The geometry data type supports operations such as area and distance because it translates to coordinates. Other methods supported by these data types include contains, overlaps, touches, and within.
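
A companion sketch of those range and membership operations follows, reusing the hypothetical schema above; STBuffer, STContains, and STIntersects are the standard SQL Server methods, and the 1000-meter radius is an arbitrary example.

import java.sql.*;

// Sketch: which zip boundaries contain a store, and which lie within 1 km of it.
public class ZipMembership {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:sqlserver://localhost;databaseName=Retail;encrypt=false";
        String sql =
            "DECLARE @store geography = geography::Point(?, ?, 4326);" +
            "SELECT z.Code, " +
            "       z.Boundary.STContains(@store) AS ContainsStore, " +               // 1 if the point lies inside
            "       z.Boundary.STIntersects(@store.STBuffer(1000)) AS WithinOneKm " + // buffer of 1000 meters
            "FROM ZipCode z;";
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setDouble(1, 47.6062);
            ps.setDouble(2, -122.3321);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.printf("%s contains=%b within1km=%b%n",
                            rs.getString("Code"),
                            rs.getBoolean("ContainsStore"),
                            rs.getBoolean("WithinOneKm"));
                }
            }
        }
    }
}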

A note about the use of these data types now follows. One approach is to store the coordinates in a separate table whose primary key is the latitude/longitude pair, declared unique so that a pair of latitude and longitude does not repeat. Such an approach is questionable because the uniqueness constraint for locations has a maintenance overhead. For example, two records could refer to the same point, and unreferenced rows might then need to be cleaned up. Locations also change ownership: store A could take over a location previously owned by store B while B never updates its record, and stores can undergo renames or conversions. It may therefore be better to keep the spatial data stored repeatably alongside the rest of the information about the location.

These data types also do not participate in set operations. That is easy to work around with collections and enumerables in the programming language of choice, usually in four steps: initialization of the answer, accumulation called for each row, a merge called when combining results from parallel workers, and returning the answer on termination. These steps resemble a map-reduce algorithm.

These data types and operations are sped up with the help of a spatial index. Spatial indexes resemble the indexes of other data types and are stored using B-Trees. Since a B-Tree is an ordinary one-dimensional index, the two-dimensional spatial data must first be reduced in dimension. This is done by tessellation, which divides the area into small subareas and records the subareas that intersect each spatial instance. For the geography data type, for example, the entire globe is divided into hemispheres and each hemisphere is projected onto a plane. When a geography instance covers one or more subsections, or tiles, the spatial index has an entry for each such tile that is covered. The geometry data type has its own rectangular coordinate system that you define, which you can use to specify the boundaries, or 'bounding box', that the spatial index covers.

Visualizers support overlays with spatial data, which is popular with mapping applications that superimpose information over the map with the help of transparent layers. An example is Azure Maps with GeoFence, as described here.
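
The four-step pattern can be sketched as a plain Java aggregator; the class and method names here are illustrative only and not part of any SQL Server or Azure API.

import java.util.List;

// Hypothetical aggregate that averages distances, following the four steps:
// initialize, accumulate per row, merge parallel partials, terminate.
public class AverageDistance {
    private double sum;  // running total of distances
    private long count;  // rows seen so far

    public void init() { sum = 0; count = 0; }          // 1. answer initialization

    public void accumulate(double meters) {             // 2. called for each row
        sum += meters;
        count++;
    }

    public void merge(AverageDistance other) {          // 3. combine parallel workers
        sum += other.sum;
        count += other.count;
    }

    public double terminate() {                         // 4. return the answer
        return count == 0 ? 0 : sum / count;
    }

    public static void main(String[] args) {
        AverageDistance agg = new AverageDistance();
        agg.init();
        for (double d : List.of(120.0, 450.0, 300.0)) agg.accumulate(d);
        System.out.println("average distance: " + agg.terminate());
    }
}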

Thursday, December 16, 2021

Adding Azure Maps to an Android Application

 


This is a continuation of a series of articles on operational engineering aspects of Azure public cloud computing. In this article, we continue the discussion on Azure Maps which is a full-fledged general availability service that provides similar Service Level Agreements as expected from others in the category but with an emphasis on writing mobile applications. Specifically, we target the Android platform.

We leverage an event-driven architecture style where the Service Bus delivers the messages that the mobile application processes. As with the case of geofencing, different messages can be used for different handling. The mobile application is a consumer of those messages, making occasional API calls that generate messages on the backend of a web-queue-worker service. The scope of this document is just the mobile application stack. The tracking and producing of messages are done in the backend, and the mobile application uses Azure Maps to display the location. We will need an active Azure Maps account and key for this purpose. The subscription, resource group, name, and pricing tier must be determined beforehand. The mobile application merely adds an Azure Maps control to the application.

An Android application will require a Java-based deployment. Since the communication is over HTTP, the technology stack can be independent between the backend and the mobile application. The Azure Maps Android SDK will be leveraged for this purpose. The top-level build.gradle file will register the repository URL https://atlas.microsoft.com/sdk/android. Java 8 can be chosen as the appropriate version to use. The SDK can be imported into build.gradle with the artifact description "com.azure.android:azure-maps-control:1.0.0". The application will introduce the map control as <com.azure.android.maps.control.MapControl android:id="@+id/mapcontrol" android:layout_width="match_parent" android:layout_height="match_parent" /> in the main activity XML file. The corresponding Java file will add imports for the Azure Maps SDK, set the Azure Maps authentication information, and get the map control instance in the onCreate method. setSubscriptionKey and setAadProperties can be used to add the authentication information on every view. The control will display the map even on the emulator. A sample Android application can be seen here.
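
A minimal sketch of that main activity follows, assuming the layout above and a placeholder subscription key; the lifecycle forwarding mirrors what the Azure Maps Android SDK documentation prescribes so that the control can manage its own resources.

import android.os.Bundle;
import androidx.appcompat.app.AppCompatActivity;
import com.azure.android.maps.control.AzureMaps;
import com.azure.android.maps.control.MapControl;

public class MainActivity extends AppCompatActivity {
    static {
        // Placeholder key; in production, load it from secure configuration.
        AzureMaps.setSubscriptionKey("<your-azure-maps-subscription-key>");
    }

    private MapControl mapControl;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
        mapControl = findViewById(R.id.mapcontrol);
        mapControl.onCreate(savedInstanceState); // hand the saved state to the control
    }

    // Forward the remaining lifecycle events so the control can create,
    // pause, and release its rendering resources at the right times.
    @Override public void onStart() { super.onStart(); mapControl.onStart(); }
    @Override public void onResume() { super.onResume(); mapControl.onResume(); }
    @Override public void onPause() { super.onPause(); mapControl.onPause(); }
    @Override public void onStop() { super.onStop(); mapControl.onStop(); }
    @Override public void onLowMemory() { super.onLowMemory(); mapControl.onLowMemory(); }
    @Override protected void onDestroy() { super.onDestroy(); mapControl.onDestroy(); }
    @Override protected void onSaveInstanceState(Bundle outState) {
        super.onSaveInstanceState(outState);
        mapControl.onSaveInstanceState(outState);
    }
}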

As with all applications, the activity control loops must be tightened to guide the user through specific workflows. The views, their lifetime, and their activity must be controlled, and the user should never see the application as hung or spinning. Interactivity for the control is assured if the application recycles and cleans up the associated resources as the user moves from one page to another.

It is highly recommended to get the activity framework and navigation worked out and planned independently of the content. The views corresponding to the content are restricted to the one that displays the control, so the application focuses mostly on user navigation and activities.

Wednesday, December 15, 2021

Azure Maps and GeoFence

This is a continuation of a series of articles on operational engineering aspects of Azure public cloud computing. In this article, we continue the discussion on Azure Maps, which is a full-fledged general availability service that provides similar Service Level Agreements as expected from others in the category.

Azure Maps can be helpful for tracking entry into and exit from a geographical location such as the perimeter of a construction area. Such tracking can be used to generate notifications by email. Geofencing GeoJSON data is uploaded to define the construction area we want to monitor. The Data Upload API will be used to upload geofences as polygon coordinates to the Azure Maps account. Two logic apps can be written to send email notifications to the construction site operations team when, say, a piece of equipment enters or exits the construction site. An Azure Event Grid subscription will listen for enter and exit events for the Azure Maps geofence. Two webhook event subscriptions will call the HTTP endpoints defined in the two logic applications. The Spatial Geofence Get API is used to receive notifications when a piece of equipment enters or exits the geofence areas.

The Geofencing GeoJSON data contains a FeatureCollection consisting of two geofences that pertain to distinct polygonal areas within the construction site. The first has no time expirations or restrictions; the second can only be queried during business hours. This data can be uploaded with a POST method call to the mapData endpoint along with the subscription key. Once the data is uploaded, we can retrieve its metadata to ascertain the created timestamp.
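
A hedged sketch of that upload call using Java's built-in HTTP client; the endpoint shape follows the Azure Maps Data Upload API as of this writing (api-version 1.0), and the key and file name are placeholders.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

public class GeofenceUpload {
    public static void main(String[] args) throws Exception {
        String subscriptionKey = "<your-azure-maps-subscription-key>"; // placeholder
        // dataFormat=geojson tells the service how to interpret the body.
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("https://atlas.microsoft.com/mapData/upload"
                + "?api-version=1.0&dataFormat=geojson"
                + "&subscription-key=" + subscriptionKey))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofFile(Path.of("geofence.json")))
            .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        // A 202 response carries a header with the status URL of the
        // long-running operation, whose result yields the udid of the data.
        System.out.println(response.statusCode());
        response.headers().firstValue("Operation-Location")
            .or(() -> response.headers().firstValue("Location"))
            .ifPresent(System.out::println);
    }
}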

The logic app will require a resource group and subscription to be deployed. A common trigger function that responds when an HTTP request is received is sufficient for this purpose. Then the Azure Maps event subscription is created. It will require a name, event schema, system topic name, a filter on event types, an endpoint type, and an endpoint. Calls to the Spatial Geofence Get API will trigger the notifications on entry to and exit from the geofence. Each piece of equipment has a unique device id, so both the entry and the exit can be attributed. The GET method also returns the distance from the closest border of the geofence; a negative distance implies that the equipment lies directly within the polygon.
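
The query itself can be sketched as follows, again with Java's built-in HTTP client; the udid comes from the earlier upload, while the deviceId, coordinates, and key are placeholders. The searchBuffer and mode parameters follow the Spatial Geofence Get API.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class GeofenceCheck {
    public static void main(String[] args) throws Exception {
        String subscriptionKey = "<your-azure-maps-subscription-key>"; // placeholder
        String udid = "<udid-from-the-upload>";                        // placeholder
        // Ask whether device "equipment-01" at the given lat/lon is inside
        // the uploaded geofence; mode=EnterAndExit publishes both event types.
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("https://atlas.microsoft.com/spatial/geofence/json"
                + "?api-version=1.0"
                + "&deviceId=equipment-01"
                + "&udid=" + udid
                + "&lat=47.638237&lon=-122.132483"
                + "&searchBuffer=5&isAsync=True&mode=EnterAndExit"
                + "&subscription-key=" + subscriptionKey))
            .GET()
            .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        // The JSON body lists each geofence with a signed distance:
        // negative means the device is inside the polygon.
        System.out.println(response.body());
    }
}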


Tuesday, December 14, 2021

 

Azure Maps:

This is a continuation of a series of articles on operational engineering aspects of Azure public cloud computing. In this article, we take a break to discuss a location service named Azure Maps. This is a full-fledged general availability service that provides similar Service Level Agreements as expected from others in the category.

Azure Maps is a collection of geospatial services and SDKs that fetches the latest geographic data and provides it as context to web and mobile applications. Specifically, it provides:

1) REST APIs to render vector and raster maps as overlays, including satellite imagery.

2) Creator services to enable indoor map data publication.

3) Search services to locate addresses, places, and points of interest given indoor and outdoor data.

4) Routing options such as point-to-point, multipoint, multipoint optimization, isochrone, electric vehicle, commercial vehicle, traffic influenced, and matrix routing.

5) Traffic flow and incident views for applications that require real-time traffic information.

6) Time zone and geolocation services.

7) Elevation services with a Digital Elevation Model.

8) A geofencing service and mapping data storage, with location information hosted in Azure.

9) Location intelligence through geospatial analytics.

SDKs are also available, with flavors suited for desktop and mobile applications. Both SDKs are quite powerful and enhance programmability. They allow customization of interactive maps that can render content and imagery specific to the publisher. The interactive map uses a WebGL map control known for rendering large datasets with high performance. The SDKs can be used with JavaScript and TypeScript.

Mapping spatial data involves rendering the data as a layer on top of images. These overlays enhance the display and provide a visual aid to end users with geographical context. The Azure Maps Power BI visual provides this functionality to visualize spatial data on top of a map. An Azure Maps account is required to create this resource via the Azure portal.

Thanks

 

Monday, December 13, 2021

 

Azure Object Anchors:

This is a continuation of a series of articles on operational engineering aspects of Azure public cloud computing. In this article, we take a break to discuss a preview feature named Azure Object Anchors. An Azure preview feature is available for customers to use, but it does not necessarily have the same service level as the services that have been released with general availability.

Azure Object Anchors is a service under the mixed-reality category. It detects an object in the physical world using a 3D model. A model may not align with its physical counterpart without some translation and rotation; this alignment is referred to as the pose of the model and is described with six degrees of freedom (6DoF). The service accepts a 3D object model and outputs an Azure Object Anchors model. The generated model can be used alongside a runtime SDK to enable a HoloLens application to load an object model and to detect and track instances of that model in the physical world.

Some example use cases enabled by this model include:

1) Training, which creates a mixed-reality training experience for workers without the need to place markers or adjust hologram alignments. It can also augment mixed-reality training experiences with automated detection and tracking.

2) Task guidance, where a set of tasks can be simplified when using mixed reality.

This is different from object embedding, which finds salient objects in a vector space. Object Anchors is an overlay, or superimposition, of a model on top of the existing physical-world video, and it requires a specific object that has already been converted to the vector space. A service that performs the automated embedding and the overlay together is not available yet.

A conversion service is involved in transforming a 3D asset into a vector-space model. The asset can come from a Computer-Aided Design diagram or a scan. It must be in one of the supported file formats: fbx, ply, obj, glb, or gltf. The unit of measurement for the 3D model must be one of the values of the Azure.MixedReality.ObjectAnchors.Conversion.AssetLengthUnit enumeration. A gravity vector is provided as an axis. A console application is available in the samples to convert this 3D asset into an Azure Object Anchors model. An upload will require the account ID GUID, the account domain that is the named qualifier for the resource to be uploaded, and an account key.

The converted model can also be downloaded and visualized using its mesh. Instead of building a scene to visualize the converted model, we can simply open the "VisualizeScene" and add it to the scene build list. Only that VisualizeScene must be included in the Build Settings; all other scenes should be excluded. Next, from the Hierarchy panel, we select the Visualize GameObject and then select the Play button at the top of the Unity editor. Ensure that the Scene view is selected. With the Scene view's navigational controls, we can then inspect the Object Anchors model.

After the model has been viewed, it can be copied to a HoloLens device that has the runtime SDK for Unity which can then assist with detecting physical objects that match the original model.

Thanks

Sunday, December 12, 2021

 

This is a continuation of an article that describes operational considerations for hosting solutions on the Azure public cloud. 

There are several references to best practices throughout this series of articles, drawn from the documentation for the Azure public cloud. The previous article focused on the antipatterns to avoid, specifically the noisy neighbor antipattern. This article focuses on the performance tuning of Cosmos DB usage.

An example of an application using Cosmos DB is a drone delivery application that runs on Azure Kubernetes Service. When a fleet of drones sends position data in real time to Azure IoT Hub, a functions app receives the events, transforms the data into GeoJSON format, and writes it to Cosmos DB. The geospatial data in Cosmos DB can be indexed for efficient spatial queries, which enables a client application to query all drones within a finite distance of a given location or to find all drones within a certain polygon. Azure Functions is used to write data to Cosmos DB because the writer can be lightweight, there is no requirement for a full-fledged stream-processing engine that joins streams, aggregates data, or processes across time windows, and Cosmos DB can support high write throughput.
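
A brief sketch of such a spatial query with the Azure Cosmos DB Java SDK (azure-cosmos v4), assuming a hypothetical drones container whose documents carry a GeoJSON location property; ST_DISTANCE is the Cosmos DB SQL built-in for geospatial distance.

import com.azure.cosmos.CosmosClient;
import com.azure.cosmos.CosmosClientBuilder;
import com.azure.cosmos.CosmosContainer;
import com.azure.cosmos.models.CosmosQueryRequestOptions;
import com.fasterxml.jackson.databind.JsonNode;

public class NearbyDrones {
    public static void main(String[] args) {
        CosmosClient client = new CosmosClientBuilder()
            .endpoint("https://<your-account>.documents.azure.com:443/") // placeholder
            .key("<your-account-key>")                                   // placeholder
            .buildClient();
        CosmosContainer container = client.getDatabase("Fleet").getContainer("drones");

        // Drones within 1000 meters of the given GeoJSON point
        // (GeoJSON coordinate order is [longitude, latitude]).
        String query =
            "SELECT d.id FROM drones d " +
            "WHERE ST_DISTANCE(d.location, " +
            "  {'type': 'Point', 'coordinates': [-122.3321, 47.6062]}) < 1000";

        container.queryItems(query, new CosmosQueryRequestOptions(), JsonNode.class)
                 .forEach(item -> System.out.println(item.get("id")));
    }
}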

Monitoring data for Cosmos DB can show 429 error codes in responses. Cosmos DB returns this error when it is temporarily throttling requests, usually because the caller is consuming more request units (RUs) than provisioned. It should not be confused with the 409 error code, which is returned when an item being created already exists in the store.

When the 429 error code is accompanied by a wait of about 600 ms before the operation is retried, it points to waits without any corresponding activity. Another chart, of request unit consumption per partition versus provisioned request units per partition, will help find the original cause of the 429 error preceding the wait. It may show that the request unit consumption has exceeded the provisioned request units.
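
A small sketch of honoring that server-suggested wait with the Java SDK; CosmosException surfaces the status code and a retry-after hint, while the item type and container here are hypothetical. In practice the SDK also retries throttled requests internally before surfacing the exception.

import com.azure.cosmos.CosmosContainer;
import com.azure.cosmos.CosmosException;
import com.azure.cosmos.models.CosmosItemRequestOptions;
import com.azure.cosmos.models.PartitionKey;
import java.time.Duration;

public class ThrottledWrite {
    // Try the write, and on 429 wait the server-suggested interval once
    // before retrying; any other error is rethrown unchanged.
    static <T> void writeWithBackoff(CosmosContainer container, T item, String pk)
            throws InterruptedException {
        try {
            container.createItem(item, new PartitionKey(pk), new CosmosItemRequestOptions());
        } catch (CosmosException e) {
            if (e.getStatusCode() == 429) {
                Duration wait = e.getRetryAfterDuration(); // server-suggested backoff
                Thread.sleep(wait.toMillis());
                container.createItem(item, new PartitionKey(pk), new CosmosItemRequestOptions());
            } else {
                throw e;
            }
        }
    }
}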

Another likely cause of Cosmos DB errors is the incorrect usage of partition keys. Cross-partition queries result when queries do not include a partition key, and these are quite inefficient. They may even lead to high latency when multiple database partitions are queried serially. On the write side, hot write partitions result when the documents being written concentrate on a few partition key values or the partition key is missing. A partition heat map can assist in this regard because it shows the headroom between allocated and consumed request units.

Cosmos DB provides snapshot isolation, so it is important to include a version string with operations. There is a system-defined _etag property that is automatically generated and updated by the server every time an item is updated. The _etag can be used with the client-supplied if-match request header to allow the server to decide whether an item can be conditionally updated. The property value changes every time the item is updated, and this can be relied upon as a signal to the application to re-read the item, reapply its updates, and retry the original client request.
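
A sketch of that optimistic concurrency handshake with the Java SDK; setIfMatchETag attaches the if-match header, and a 412 (precondition failed) response signals that the item changed underneath us. The item shape and names are hypothetical.

import com.azure.cosmos.CosmosContainer;
import com.azure.cosmos.CosmosException;
import com.azure.cosmos.models.CosmosItemRequestOptions;
import com.azure.cosmos.models.CosmosItemResponse;
import com.azure.cosmos.models.PartitionKey;
import com.fasterxml.jackson.databind.node.ObjectNode;

public class ConditionalUpdate {
    static void updateStatus(CosmosContainer container, String id, String pk) {
        while (true) {
            // Read the current document; the response carries its _etag.
            CosmosItemResponse<ObjectNode> read =
                container.readItem(id, new PartitionKey(pk), ObjectNode.class);
            ObjectNode doc = read.getItem();
            doc.put("status", "delivered");

            // Replace only if the server-side _etag still matches what we read.
            CosmosItemRequestOptions options = new CosmosItemRequestOptions()
                .setIfMatchETag(read.getETag());
            try {
                container.replaceItem(doc, id, new PartitionKey(pk), options);
                return; // success
            } catch (CosmosException e) {
                if (e.getStatusCode() != 412) throw e; // 412: someone else updated it
                // Precondition failed: loop to re-read and reapply the update.
            }
        }
    }
}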

Saturday, December 11, 2021

Creating Git Pull Requests (PR):

 


Introduction: This article focuses on some of the advanced techniques used with git pull requests, which are required for reviewing code changes made to a team's source code. The purpose of a pull request is that it allows reviewers to see the differences between the current and the proposed code on a file-by-file and line-by-line basis. Pull requests are so named because they are opened between two branches, where one branch is pulled from another, usually the master branch, and when the request is completed, it is merged back into the master. When a feature is written, it is checked into the feature branch as the source and merged into the master branch as the target. Throughout this discussion, we will refer to the master and the feature branch.

Technique #1: Merge options

Code is checked into a branch in the form of commits. Each commit represents a point in time, and a sequence of commits forms the linear history of that branch. Code changes overlay one on top of the other. When there is a conflict between the current and the proposed snippets, the two sides are marked as HEAD and the incoming change, and only one of them is accepted. 'Rebase' and 'Merge' are two techniques by which changes made in master can be pulled into the feature branch. With rebase, the new master changes become the base onto which the feature commits are replayed; with merge, they arrive in a new merge commit. Rebase keeps the feature branch's history linear, while merge creates an additional commit.

There are four ways to merge the code changes from the feature to the master branch. These include:

Merge (no fast-forward) – a non-linear history that preserves all commits from both branches.

Squash commit – a linear history with only a single commit on the target.

Rebase and fast-forward – rebases the source commits onto the target and fast-forwards.

Semi-linear merge – rebases the source commits onto the target and creates a two-parent merge commit.

Prefer the squash commit as the merge to master because the entire feature can be rolled back if the need arises.
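
For reference, here is a rough local-git approximation of the four strategies; pull-request tooling such as Azure DevOps performs these server-side, so the commands below are only a sketch.

git checkout master
git merge --no-ff feature_branch     # 1. merge commit; history from both branches is kept

git checkout master
git merge --squash feature_branch    # 2. stage all feature changes as one change set...
git commit -m "Feature X"            #    ...then record the single squash commit

git rebase master feature_branch     # 3. replay feature commits onto master...
git checkout master
git merge --ff-only feature_branch   #    ...and fast-forward master to them

git rebase master feature_branch     # 4. semi-linear: rebase first...
git checkout master
git merge --no-ff feature_branch     #    ...then merge with a two-parent commit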

Technique #2 Interactive rebase

This allows us to manipulate multiple commits so that the history is modified to reflect only certain commits. When commits are rebased interactively, we can pick the ones we want to keep and squash the ones we want to fold, so that the timeline shows only the history required. A clean history is readable: it reflects the order of the commits, and it helps with narrowing down the root cause of bugs, creating a change log, and automatically extracting release notes.
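
As an illustration, squashing the last three commits into one might look like this; the commit hashes and subjects are made up:

git rebase -i HEAD~3

# The editor then shows one line per commit; changing 'pick' to 'squash'
# folds a commit into the one above it:
pick   1a2b3c4 Add geofence upload
squash 5d6e7f8 Fix typo in upload path
squash 9a0b1c2 Address review comments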

Technique #3: No history

Creating a pull request without history, by creating another branch, enables easier review. If a feature branch has a lot of commits that are hard to rebase, then there is an option to create a PR without history. This is done in two stages:

First, a new branch, say feature_branch_no_history, is selected as the target of a merge from feature_branch, and all the code changes are merged with the "squash commit" option.

Second, a new PR is created that targets merging feature_branch_no_history into the master.

The steps to completely clear history would be:

# Remove the existing history

rm -rf .git

 

# Recreate the repository from the current content only

git init

git add .

git commit -m "Initial commit"

 

# Push to the GitHub remote repository, ensuring you overwrite history

git remote add origin git@github.com:<YOUR ACCOUNT>/<YOUR REPOS>.git

git push -u --force origin master

 

A safer approach might be:

git init

git add .

git commit -m 'Initial commit'

git remote add origin [repo_address]

git push --mirror --force

 

Conclusion: Exercising caution with git pull requests and history helps with a cleaner, more readable, and actionable code review and merge practice.