Monday, December 13, 2021

 

Azure Object Anchors:

This is a continuation of a series of articles on operational engineering aspects of Azure public cloud computing. In this article, we take a break to discuss a preview feature named Azure Object Anchors. An Azure preview feature is available for customers to use, but it does not necessarily come with the same service level as services that have been released to general availability.

Azure Object Anchors is a service in the mixed-reality category. It detects an object in the physical world using a 3D model. The model cannot be rendered directly onto its physical counterpart without some translation and rotation; this transformation is referred to as the pose of the model and is described by six degrees of freedom (6DoF). The service accepts a 3D object model and outputs an Azure Object Anchors model. The generated model can be used together with a runtime SDK to enable a HoloLens application to load the object model and detect and track instances of that model in the physical world.
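As a rough illustration of what a 6DoF pose means, the detected pose can be thought of as a rotation plus a translation that maps points from the model's coordinate frame into the physical world's frame. This is a generic sketch using numpy, not part of the Object Anchors SDK, and the numbers are made up:

import numpy as np

# A 6DoF pose = 3 rotational degrees of freedom + 3 translational degrees of freedom.
# Here the rotation is a 90-degree turn about the vertical (y) axis and the
# translation moves the model 2 meters forward and 1 meter up.
theta = np.pi / 2
rotation = np.array([[np.cos(theta), 0.0, np.sin(theta)],
                     [0.0,           1.0, 0.0          ],
                     [-np.sin(theta), 0.0, np.cos(theta)]])
translation = np.array([0.0, 1.0, 2.0])

# A point expressed in the model's local coordinate frame.
model_point = np.array([0.5, 0.0, 0.0])

# Applying the pose places the model point into the physical world's frame.
world_point = rotation @ model_point + translation
print(world_point)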

Some example use cases enabled by this model include:

1) Training, which creates a mixed-reality training experience for workers without the need to place markers or adjust hologram alignments. It can also be used to augment mixed-reality training experiences with automated detection and tracking.

2) Task guidance, where a set of tasks can be simplified for workers when using mixed reality.

This is different from object embedding, which finds salient objects in a vector space. Object anchoring is an overlay or superimposition of a model on top of the video of the existing physical world, and it requires a specific object that has already been converted into a model. A service that does the automated embedding and the overlay together is not available yet.

A conversion service is involved in transforming a 3D asset into an Azure Object Anchors model. The asset can come from a computer-aided design (CAD) drawing or from a scan. It must be in one of the supported file formats: fbx, ply, obj, glb, or gltf. The unit of measurement for the 3D model must be one of the values of the Azure.MixedReality.ObjectAnchors.Conversion.AssetLengthUnit enumeration. A gravity vector is provided as an axis. A console application is available in the samples to convert the 3D asset into an Azure Object Anchors model. An upload requires the account ID (a GUID), the account domain that is the named qualifier for the resource to be uploaded, and an account key.
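A minimal sketch of the inputs the conversion step needs, written here as a plain Python structure rather than an actual SDK call; the field values are hypothetical placeholders:

from dataclasses import dataclass

@dataclass
class ConversionRequest:
    """Inputs required to convert a 3D asset into an Object Anchors model."""
    asset_path: str          # fbx, ply, obj, glb, or gltf file
    length_unit: str         # one of the AssetLengthUnit values, e.g. "Meters"
    gravity_vector: tuple    # gravity direction axis, e.g. (0.0, -1.0, 0.0)
    account_id: str          # account ID GUID
    account_domain: str      # named qualifier for the resource to be uploaded
    account_key: str         # account key used to authenticate the upload

# Hypothetical values for illustration only.
request = ConversionRequest(
    asset_path="chair.gltf",
    length_unit="Meters",
    gravity_vector=(0.0, -1.0, 0.0),
    account_id="<account-id-guid>",
    account_domain="<account-domain>",
    account_key="<account-key>",
)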

The converted model can also be downloaded. It can be visualized using its mesh. Instead of building a scene to visualize the converted model, we can simply open the “VisualizeScene” and add it to the scene build list. Only the VisualizeScene must be included in the Build Settings; all other scenes should be excluded. Next, from the Hierarchy panel, we can select the Visualize GameObject and then select the Play button at the top of the Unity editor. Ensure that the Scene view is selected. With the Scene view’s navigational controls, we can then inspect the Object Anchors model.

After the model has been viewed, it can be copied to a HoloLens device that has the runtime SDK for Unity, which can then assist with detecting physical objects that match the original model.

Thanks

Sunday, December 12, 2021

 

This is a continuation of an article that describes operational considerations for hosting solutions on the Azure public cloud. 

There are several references to best practices throughout the series of articles we wrote from the documentation for the Azure public cloud. The previous article focused on antipatterns to avoid, specifically the noisy neighbor antipattern. This article focuses on performance tuning of Cosmos DB usage.

An example of an application using Cosmos DB is a drone delivery application that runs on Azure Kubernetes Service. When a fleet of drones sends position data in real time to Azure IoT Hub, a function app receives the events, transforms the data into GeoJSON format, and writes it to Cosmos DB. The geospatial data in Cosmos DB can be indexed for efficient spatial queries, which enables a client application to query all drones within a finite distance of a given location or to find all drones within a certain polygon. Azure Functions is used to write the data to Cosmos DB because the workload is lightweight and there is no requirement for a full-fledged stream processing engine that joins streams, aggregates data, or processes across time windows, and because Cosmos DB can support high write throughput.
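A sketch of the kind of spatial query the client application could issue with the azure-cosmos Python SDK, assuming a container named "drone-positions" whose documents carry a GeoJSON Point in a "location" property; the account, database, and property names here are hypothetical:

from azure.cosmos import CosmosClient

client = CosmosClient(url="https://<account>.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("drone-telemetry").get_container_client("drone-positions")

# Find all drones within 1 km of a given location, relying on the spatial index.
query = """
SELECT c.droneId, c.location
FROM c
WHERE ST_DISTANCE(c.location, {'type': 'Point', 'coordinates': [-122.33, 47.61]}) < 1000
"""
for doc in container.query_items(query=query, enable_cross_partition_query=True):
    print(doc["droneId"], doc["location"])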

Monitoring data for Cosmos DB can show 429 error codes in responses. Cosmos DB returns this error when it is temporarily throttling requests, usually because the caller is consuming more request units (RUs) than provisioned. (A 409 error, by contrast, is returned when an item being created already exists in the store.)

When the 429 error is accompanied by a wait of about 600 ms before the operation is retried, it points to waits without any corresponding activity. A chart of request unit consumption per partition versus provisioned request units per partition helps uncover the original cause of the 429 error preceding the wait. It may show that the request unit consumption exceeded the provisioned request units.
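The SDKs already retry throttled requests internally, but a sketch of honoring the back-off manually with the azure-cosmos Python SDK might look like the following; the account, container, and item names are placeholders, and the fixed 600 ms back-off stands in for the retry-after hint that the throttled response carries:

import time
from azure.cosmos import CosmosClient, exceptions

client = CosmosClient(url="https://<account>.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("drone-telemetry").get_container_client("drone-positions")

def upsert_with_backoff(item, max_attempts=5):
    """Retry on 429 (request rate too large); the SDK also retries internally."""
    for attempt in range(max_attempts):
        try:
            return container.upsert_item(item)
        except exceptions.CosmosHttpResponseError as exc:
            if exc.status_code != 429 or attempt == max_attempts - 1:
                raise
            # The throttled response suggests how long to wait before retrying;
            # a fixed back-off of roughly 600 ms is used here for brevity.
            time.sleep(0.6)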

Another likely cause of Cosmos DB errors is incorrect usage of partition keys. Cross-partition queries result when queries do not include a partition key, and they are quite inefficient. They can even lead to high latency when multiple database partitions are queried serially. On the write side, hot partitions can result when documents are written without the partition key property, since they all land in the same logical partition. A partition heat map can assist in this regard because it shows the headroom between the allocated and consumed request units.
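A sketch contrasting a partition-scoped query with a cross-partition one, assuming the container is partitioned by a hypothetical /droneId key:

from azure.cosmos import CosmosClient

client = CosmosClient(url="https://<account>.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("drone-telemetry").get_container_client("drone-positions")

# Efficient: the query is scoped to a single logical partition by supplying the key.
in_partition = container.query_items(
    query="SELECT * FROM c WHERE c.droneId = @id",
    parameters=[{"name": "@id", "value": "drone-42"}],
    partition_key="drone-42",
)

# Inefficient: no partition key, so every partition must be consulted.
cross_partition = container.query_items(
    query="SELECT * FROM c WHERE c.status = 'delivering'",
    enable_cross_partition_query=True,
)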

Cosmos DB provides optimistic concurrency control, so it is important to include a version string with update operations. There is a system-defined _etag property that is automatically generated and updated by the server every time an item is updated. The _etag can be sent with the client-supplied if-match request header to let the server decide whether an item can be conditionally updated. Because this property value changes on every update, the application can rely on it as a signal to re-read the item, reapply its updates, and retry the original client request.
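A sketch of that conditional update flow with the azure-cosmos Python SDK, using the item's _etag and an if-match condition; the account, container, and property names are placeholders and the retry loop is deliberately simplified:

from azure.core import MatchConditions
from azure.cosmos import CosmosClient, exceptions

client = CosmosClient(url="https://<account>.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("drone-telemetry").get_container_client("drone-positions")

def update_status(item_id, partition_key, new_status):
    while True:
        doc = container.read_item(item=item_id, partition_key=partition_key)
        doc["status"] = new_status
        try:
            # Succeeds only if the server-side _etag still matches, i.e. nobody
            # else has updated the item since it was read.
            return container.replace_item(
                item=doc["id"],
                body=doc,
                etag=doc["_etag"],
                match_condition=MatchConditions.IfNotModified,
            )
        except exceptions.CosmosHttpResponseError as exc:
            if exc.status_code != 412:  # precondition failed: re-read and retry
                raise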

Saturday, December 11, 2021

Creating Git Pull Requests (PR):

 


Introduction: This article focuses on some of the advanced techniques used with git pull requests (PRs) that are required for reviewing code changes made to a team's source code. The purpose of the pull request is that it allows reviewers to see the differences between the current and the proposed code on a file-by-file and line-by-line basis. Pull requests are so named because they are opened between two different branches: one branch is pulled from another, usually the master branch, and when the request is completed the changes are merged back into master. When a feature is written, it is checked into the feature branch as the source and merged into the master branch as the target. Throughout this discussion, we will refer to the master and the feature branch.

Technique #1: Merge options

Code is checked into a branch in the form of commits. Each commit represents a point in time, and a sequence of commits forms the linear history of that branch. Code changes overlay one on top of the other. When the current and the proposed code snippets conflict, the conflicting regions are marked as ours (the current branch, labeled HEAD) and theirs (the incoming change), and one of them, or a manual combination, is accepted. ‘Rebase’ and ‘merge’ are two techniques by which changes made in master can be pulled into the feature branch. Rebase replays the feature branch’s commits on top of the new changes from master and preserves a linear history of commits, while merge brings the changes in with a new merge commit.

There are four ways to merge the code changes from the feature to the master branch. These include:

Merge (no fast forward) – which is a non-linear history preserving all commits from both branches.

Squash commit - which is a linear history with only a single commit on the target.

Rebase and fast-forward – which rebases the source commits onto the target and fast-forwards.

Semi-linear merge – which rebases the source commits onto the target and creates a two-parent merge.

Prefer the squash commit when merging to master because the entire feature can be rolled back if the need arises.

Technique #2 Interactive rebase

This allows us to manipulate multiple commits so that the history is modified to reflect only certain commits. When commits are rebased interactively, we can pick or squash those that we want to keep or fold, respectively, so that the timeline shows only the history required. A clean history is readable: it reflects the order of the commits and is useful for narrowing down the root cause of bugs, creating a change log, and automatically extracting release notes.

Technique #3: No history

Creating a pull request without history, by creating another branch, enables easier review. If a feature branch has a lot of commits that are hard to rebase, there is an option to create a PR without history. This is done in two stages:

First, a new branch, say feature_branch_no_history, is selected as the target for the feature_branch merge, and all the code changes are merged with the “squash commit” option.

Second, a new PR is created that targets the merging of the feature_branch_no_history into the master.

The steps to completely clear history would be:

-- Remove the history from the local repository

rm -rf .git

 

-- recreate the repository from the current content only

git init

git add .

git commit -m "Initial commit"

 

-- push to the GitHub remote repository, ensuring you overwrite history

git remote add origin git@github.com:<YOUR ACCOUNT>/<YOUR REPOS>.git

git push -u --force origin master

 

A safer approach might be:

git init

git add .

git commit -m 'Initial commit'

git remote add origin [repo_address]

git push --mirror --force

 

Conclusion: Exercising caution with git pull requests and history helps with a cleaner, more readable, and actionable code review and merge practice.

 

 

Friday, December 10, 2021

 

Azure Blueprint usages 

As a public cloud, Azure provides uniform templates to manage resource provisioning across several services. Azure offers a control plane for all resources that can be deployed to the cloud, and services take advantage of it both for themselves and for their customers. While Azure Functions allow extensions via new resources, Azure resource providers and ARM APIs provide extensions via existing resources. This eliminates the need to introduce new processes around new resources and is a significant win for reusability and user convenience. New and existing resources are not the only ways to write extensions; there are other options, such as publishing to the Azure Marketplace or using other control planes such as container orchestration frameworks and third-party platforms. This article focuses on Azure Blueprints.

Azure Blueprints can be leveraged to let an engineer or architect sketch a project’s design parameters and define a repeatable set of resources that implements and adheres to an organization’s standards, patterns, and requirements. It is a declarative way to orchestrate the deployment of various resource templates and other artifacts such as role assignments, policy assignments, ARM templates, and resource groups. Blueprint objects are stored in Cosmos DB and replicated to multiple Azure regions. Since it is designed to set up the environment, it is different from resource provisioning. This package fits nicely into a CI/CD pipeline and handles both what should be deployed and the assignment of what was deployed.

Azure Blueprints differ from ARM templates in that the former helps with environment setup while the latter helps with resource provisioning. A blueprint is a package that comprises artifacts declaring resource groups, policies, role assignments, and ARM template deployments. It can be composed, versioned, and included in continuous integration and continuous delivery pipelines. The components of the package can be assigned to a subscription in a single operation, audited, and tracked. Although the components can be individually registered, the blueprint preserves the relationship to the templates and keeps an active connection to what was deployed.

There are two categories within a blueprint – definitions for deployment that explain what should be deployed, and definitions for assignments that explain what was deployed. A previous effort to author ARM templates becomes reusable in Azure Blueprints. In this way, a blueprint becomes bigger than just the templates and allows reusing an existing process to manage new resources.

A blueprint focuses on standards, patterns, and requirements. The design can be reused to maintain consistency and compliance. It differs from an Azure Policy in that it supports parameters with policies and initiatives. A policy is a self-contained manifest that governs resource properties during deployment and for already existing resources, so that resources within a subscription adhere to the requirements and standards. When a blueprint comprises resource templates and Azure Policy along with parameters, it becomes holistic for cloud governance.
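As a conceptual illustration only, the following plain Python structure shows what a blueprint packages together and what an assignment records; this is not the actual Blueprint JSON schema, and all names are hypothetical:

# Conceptual sketch only; real blueprints are authored as definition and
# artifact documents (or through the Blueprints API), not as Python dicts.
blueprint_definition = {
    "name": "corp-baseline",                     # hypothetical blueprint name
    "targetScope": "subscription",
    "artifacts": [
        {"kind": "resourceGroup", "name": "rg-networking"},
        {"kind": "policyAssignment", "name": "allowed-locations"},
        {"kind": "roleAssignment", "name": "contributor-for-app-team"},
        {"kind": "armTemplate", "name": "hub-vnet-template"},
    ],
}

blueprint_assignment = {
    "definition": "corp-baseline",
    "subscription": "<subscription-id>",
    "parameters": {"location": "eastus"},        # what was deployed, and where
}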

Thursday, December 9, 2021

Designing a microservices architecture for a service on the public cloud

Microservices are great for allowing the domain to drive the development of a cloud service. They fit right into the approach of doing “one thing” for the company and come with a well-defined boundary for that service. Since a microservice fulfills a business capability, it does not focus on horizontal layers as much as it focuses on end-to-end vertical integration. It is cohesive and loosely coupled with other services. Domain-driven design provides a framework to build the services and comes in two stages – strategic and tactical. The steps to designing with this framework include: 1. analyzing the domain, 2. defining bounded contexts, 3. defining entities, aggregates, and services, and 4. identifying microservices.
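A minimal sketch of what step 3 might produce for a drone delivery domain like the one discussed earlier; the names (a Delivery aggregate in a shipping bounded context) are hypothetical illustrations, not a prescribed model:

from dataclasses import dataclass, field
from enum import Enum
from typing import List

class DeliveryStatus(Enum):
    PENDING = "pending"
    IN_FLIGHT = "in_flight"
    COMPLETED = "completed"

@dataclass
class Package:
    """Entity owned by the Delivery aggregate."""
    package_id: str
    weight_kg: float

@dataclass
class Delivery:
    """Aggregate root for the shipping bounded context; the microservice
    boundary is drawn around this aggregate and its operations."""
    delivery_id: str
    drone_id: str
    packages: List[Package] = field(default_factory=list)
    status: DeliveryStatus = DeliveryStatus.PENDING

    def complete(self) -> None:
        # Domain rule: only an in-flight delivery can be completed.
        if self.status != DeliveryStatus.IN_FLIGHT:
            raise ValueError("delivery is not in flight")
        self.status = DeliveryStatus.COMPLETED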

The benefits of this style include the following: It is a simple architecture that focuses on the end-to-end addition of business capabilities. The services are easy to deploy and manage. There is a clear separation of concerns. The front end is decoupled from the worker using asynchronous messaging. The front end and the worker can be scaled independently.

Challenges faced with this style include the following: Care must be taken to ensure that the front end and the worker do not become large, monolithic components that are difficult to maintain and update. Sharing data schemas or code modules between the front end and the worker creates hidden dependencies.

Some examples of microservices include the following: Microservices are best suited for expanding a backend service portfolio, such as for e-commerce. They work great for transactional processing and a deep separation of data access. They are useful together with an application gateway, load balancer, and ingress.

A few things to consider when deploying these services include the following:

1. Availability – Event sourcing components allow system components to be loosely coupled and deployed independently of one another. Many of the Azure resources are built for availability.

2. Scalability – Cosmos DB and Service Bus provide fast, predictable performance and scale seamlessly as the application grows. An event-sourcing, microservices-based architecture can also make use of Azure Functions and Azure Container Instances to scale horizontally.

3. Security – Security features are available from all Azure resources, but it is also possible to include Azure Monitor and Azure Sentinel.

4. Resiliency – Fault domains and update domains are already handled by the Azure resources, so resiliency comes with the use of these resources, and the architecture can improve the overall order-processing system.

5. Cost – Azure Advisor provides effective cost estimates and improvements.

These are only a few of the considerations. Some others follow from the choice of technologies and their support in Azure.

Wednesday, December 8, 2021

Event driven vs big data

 

Let’s compare our description in yesterday's post with the Big Data architectural style of building services. This style can involve a vectorized execution environment and typically involves data sizes not seen with traditional database systems. Both the storage and the message queue handle large volumes of data, and the execution can be staged as processing and analysis. The processing can be either batch oriented or stream oriented. The analysis and reporting can be offloaded to a variety of technology stacks with impressive dashboards. While the processing handles the requirements for batch and real-time processing of the big data, the analytics supports exploration and rendering of output from the big data. The style utilizes components such as data sources, data storage, batch processors, stream processors, a real-time message queue, an analytics data store, analytics and reporting stacks, and orchestration.

Some of the benefits of this style include the following: the ability to mix technology choices, achieving performance through parallelism, elastic scale, and interoperability with existing solutions.

Some of the challenges faced with this architectural style include the following: the complexity, where numerous components are required to handle the multiple data sources, and the difficulty of building, deploying, and testing big data processes. Different products require as many different skillsets to build and maintain, along with a need for data and query virtualization. For example, U-SQL, which is a combination of SQL and C#, is used with Azure Data Lake Analytics, while SQL-like APIs are used with Hive, HBase, Flink, and Spark. With this kind of landscape, the emphasis on data security gets diluted and spread over a very large number of components.

Some of the best practices with this architectural style are to leverage parallelism, partition data, apply schema-on-read semantics, process data in place, balance utilization and time costs, separate cluster resources, orchestrate data ingestion, and scrub sensitive data.
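A small sketch of two of those practices, schema-on-read and data partitioning, using PySpark; the storage paths and the ingest_date column are hypothetical:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("telemetry-batch").getOrCreate()

# Schema-on-read: the schema is inferred from the raw JSON at read time
# instead of being enforced when the data lands in storage.
raw = spark.read.json("abfss://telemetry@<account>.dfs.core.windows.net/raw/")

# Partition the output by ingestion date so downstream queries can prune
# partitions instead of scanning the whole data set.
raw.write.mode("overwrite").partitionBy("ingest_date").parquet(
    "abfss://telemetry@<account>.dfs.core.windows.net/curated/"
)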

Some examples include applications that leverage IoT architecture and edge computing.

Conclusion: Both these styles serve their purpose of a cloud service very well.

 

Tuesday, December 7, 2021

Event driven architectural style for cloud computing

 

The choice of architecture for a web service contributes significantly to how well it serves its purpose. We review the choices between the Event-Driven and the Big Data architectural styles.

Event-driven architecture consists of event producers and consumers. Event producers are those that generate a stream of events, and event consumers are the ones that listen for events.

The scale-out can be adjusted to suit the demands of the workload, and the events can be responded to in real time. Producers and consumers are isolated from one another. In some extreme cases, such as IoT, the events must be ingested at very high volumes. There is scope for a high degree of parallelism since the consumers run independently and in parallel, but they are tightly coupled to the events. Network latency for message exchanges between producers and consumers is kept to a minimum. Consumers can be added as necessary without impacting existing ones.

Some of the benefits of this architecture include the following: The publishers and subscribers are decoupled. There are no point-to-point integrations. It's easy to add new consumers to the system. Consumers can respond to events immediately as they arrive. They are highly scalable and distributed. There are subsystems that have independent views of the event stream.

Some of the challenges faced with this architecture include the following: Event loss is tolerated, so if guaranteed delivery is needed, this poses a challenge; some IoT traffic mandates guaranteed delivery. Another challenge is ensuring that events are processed in exactly the order they arrive. Each consumer type typically runs in multiple instances for resiliency and scalability, which can pose a challenge if the processing logic is not idempotent or the events must be processed in order.

Some of the best practices demonstrated by this style include the following: Events should be lean and mean, not bloated. Services should share only IDs and/or a timestamp. Large data transfers between services are an antipattern in this case. Loosely coupled event-driven systems are best.
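A sketch of a lean event published with the azure-eventhub Python SDK: the payload carries only an identifier and a timestamp, and consumers look up the full record themselves. The connection string, hub name, and drone ID are placeholders:

import json
from datetime import datetime, timezone
from azure.eventhub import EventHubProducerClient, EventData

producer = EventHubProducerClient.from_connection_string(
    "<event-hub-connection-string>", eventhub_name="drone-position-events"
)

# Lean event: only an identifier and a timestamp, not the whole document.
event = {"droneId": "drone-42", "timestamp": datetime.now(timezone.utc).isoformat()}

with producer:
    batch = producer.create_batch()
    batch.add(EventData(json.dumps(event)))
    producer.send_batch(batch)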

Some examples of this architectural style include edge computing and IoT traffic. It works great for automations that rely heavily on asynchronous backend processing, and it is useful for maintaining ordering, retries, and dead-letter queues.