Friday, December 31, 2021

This is a continuation of a series of articles on operational engineering aspects of Azure public cloud computing that included the most recent discussion on Azure Data Lake which is a full-fledged general availability service that provides similar Service Level Agreements as expected from others in the category. This article focuses on Azure Data Lake which is suited to store and handle Big Data. This is built over Azure Blob Storage, so it provides native support for web-accessible documents. It is not a massive virtual data warehouse, but it powers a lot of analytics and is the centerpiece of most solutions that conform to the Big Data architectural style. In this section, we continue our focus on the programmability aspects of Azure Data Lake.

We discussed that there are two forms of programming with Azure Data Lake – one that leverages U-SQL and another that leverages open-source programs such as Apache Spark. U-SQL unifies the benefits of SQL with the expressive power of your own code. SQLCLR improves the programmability of U-SQL by allowing users to write user-defined operators such as functions, aggregations and data types that can be used within conventional SQL expressions or require only an invocation from a SQL statement.
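As a minimal sketch (the file paths and the code-behind method MyCompany.Udfs.StripPunctuation are hypothetical), a U-SQL script that mixes conventional SQL expressions with a C# user-defined function might look like this:

@tweets =
    EXTRACT author string, tweet string
    FROM "/input/tweets.csv"
    USING Extractors.Csv();

@cleaned =
    SELECT author,
           MyCompany.Udfs.StripPunctuation(tweet) AS cleanTweet   // C# code-behind UDF
    FROM @tweets;

OUTPUT @cleaned
TO "/output/cleaned.csv"
USING Outputters.Csv();

The rowset flows through the C# method just as it would through any built-in expression, which is the unification described above.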

The other form of programming is largely applied to HDInsight, as opposed to U-SQL for Azure Data Lake Analytics (ADLA), and targets data with batch processing that often involves map-reduce algorithms on Hadoop clusters. Also, Hadoop is inherently batch-oriented, while the Microsoft stack allows streaming as well. Some open-source technologies such as Flink, Kafka, Pulsar and StreamNative support stream operators. While Kafka approaches stream processing as a special case of batch processing, Flink does just the reverse. Apache Flink also provides a SQL abstraction over its Table API. A sample Flink program using the DataStream API might look like this:

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;

final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// tuples is a previously prepared List<Tuple2<String, Integer>>;
// ExtractHashTags, FilterHashTags and GetTopHashTag are user-defined functions
env.fromCollection(tuples)
    .flatMap(new ExtractHashTags())       // emit (hashtag, count) pairs
    .keyBy(0)                             // partition the stream by hashtag
    .timeWindow(Time.seconds(30))         // 30-second tumbling windows
    .sum(1)                               // total the counts per hashtag
    .filter(new FilterHashTags())         // drop hashtags that do not qualify
    .timeWindowAll(Time.seconds(30))
    .apply(new GetTopHashTag())           // pick the top hashtag in each window
    .print();

env.execute("Top hashtags");

Notice the use of pipelined execution and the writing of functions that process the input on a per-element basis. A sample function (here, an Apache Pulsar function) looks like this:

package org.apache.pulsar.functions.api.examples;

import java.util.function.Function;

public class ExclamationFunction implements Function<String, String> {
    @Override
    public String apply(String input) {
        // append an exclamation mark to every input message
        return String.format("%s!", input);
    }
}

Both forms have their purpose and the choice depends on the stack used for the analytics.

 

Thursday, December 30, 2021

This is a continuation of a series of articles on operational engineering aspects of Azure public cloud computing that included the most recent discussion on Azure Data Lake which is a full-fledged general availability service that provides similar Service Level Agreements as expected from others in the category. This article focuses on Azure Data Lake which is suited to store and handle Big Data. This is built over Azure Blob Storage, so it provides native support for web-accessible documents. It is not a massive virtual data warehouse, but it powers a lot of analytics and is the centerpiece of most solutions that conform to the Big Data architectural style. In this section, we continue our focus on the programmability aspects of Azure Data Lake.

The power of the Azure Data Lake is better demonstrated by the U-SQL queries that can be written without any special consideration for the fact that they are being applied at the scale of Big Data. U-SQL unifies the benefits of SQL with the expressive power of your own code. SQLCLR improves the programmability of U-SQL. Conventional SQL expressions like SELECT, EXTRACT, WHERE, HAVING, GROUP BY and DECLARE can be used as usual, while C# expressions add user-defined types (UDTs), user-defined functions (UDFs) and user-defined aggregates (UDAs). These types, functions and aggregates can be used directly in a U-SQL script. For example, SELECT Convert.ToDateTime(Convert.ToDateTime(@dt).ToString("yyyy-MM-dd")) AS dt, dt AS olddt FROM @rs0; where @dt is a datetime variable, makes the best of both C# and SQL. The power of SQL expressions can never be overstated; for many business use-cases they suffice by themselves, but having this programmability means that we can even take all the processing into C# and have the SQL script be just an invocation.

The trouble with analytics pipelines is that developers prefer open-source solutions to build them. When we start accruing digital assets in the form of U-SQL scripts, the transition to working with something like Apache Spark might not be straightforward or easy. The Azure analytics layer consists of both HDInsight and Azure Data Lake Analytics (ADLA), which target data differently. HDInsight works on managed Hadoop clusters and allows developers to write map-reduce jobs with open-source tooling. ADLA is native to Azure and enables C# and SQL over job services. We will also recall that Hadoop is inherently batch-oriented while the Microsoft stack allows streaming as well. The steps to transform U-SQL scripts to Apache Spark include the following:

-          Transform the job orchestration pipeline to include the new Spark programs

-          Find the differences between how U-SQL and Spark manage your data.

-          Transform the U-SQL scripts to Spark. Choose from one of Azure Data Factory Data Flow, Azure HDInsight Hive, Azure HDInsight Spark or Azure DataBricks services.

With these steps, it is possible to have the best of both worlds while leveraging the benefits of each. A rough sketch of what the Spark side of such a transformation might look like follows.
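For illustration only, here is a Java sketch of what the extract-transform-output shape of a U-SQL script could become on Spark; the storage paths and column names are placeholders, and a real migration would follow the mapping steps listed above:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.regexp_replace;

public class TweetCleanup {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("TweetCleanup").getOrCreate();

        // EXTRACT ... USING Extractors.Csv() roughly maps to a CSV read
        Dataset<Row> tweets = spark.read()
            .option("header", "false")
            .csv("abfss://container@account.dfs.core.windows.net/input/tweets.csv")
            .toDF("author", "tweet");

        // A C# code-behind UDF becomes a Spark expression or UDF
        Dataset<Row> cleaned = tweets.withColumn("cleanTweet",
            regexp_replace(col("tweet"), "[^\\w\\s]", ""));

        // OUTPUT ... USING Outputters.Csv() roughly maps to a CSV write
        cleaned.write().csv("abfss://container@account.dfs.core.windows.net/output/cleaned");

        spark.stop();
    }
}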

Wednesday, December 29, 2021

This is a continuation of a series of articles on operational engineering aspects of Azure public cloud computing that included the most recent discussion on Azure Data Lake which is a full-fledged general availability service that provides similar Service Level Agreements as expected from others in the category. This article focuses on Azure Data Lake which is suited to store and handle Big Data. This is built over Azure Blob Storage, so it provides native support for web-accessible documents. It is not a massive virtual data warehouse, but it powers a lot of analytics and is the centerpiece of most solutions that conform to the Big Data architectural style. In this section, we continue our focus on Data Lake monitoring and usages.

The monitoring for the Azure Data Lake leverages the monitoring for the storage account. Azure Storage Analytics performs logging and provides metric data for a storage account. This data can be used to trace requests, analyze usage trends, and diagnose issues with the storage account.

The power of the Azure Data Lake is better demonstrated by the U-SQL queries that can be written without any special consideration for the fact that they are being applied at the scale of Big Data. U-SQL unifies the benefits of SQL with the expressive power of your own code. This is said to work very well with all kinds of data stores – file, object and relational. U-SQL works on the Azure ecosystem, which involves Azure Data Lake Storage as the foundation and the analytics layer over it. The Azure analytics layer consists of both HDInsight and Azure Data Lake Analytics (ADLA), which target data differently. HDInsight works on managed Hadoop clusters and allows developers to write map-reduce jobs with open-source tooling. ADLA is native to Azure and enables C# and SQL over job services. We will also recall that Hadoop is inherently batch-oriented while the Microsoft stack allows streaming as well. The benefit of Azure storage is that it spans several kinds of data formats and stores. ADLA has several other advantages over managed Hadoop clusters in addition to working with a single store for all the data. It enables limitless scale and enterprise-grade capabilities with easy data preparation. ADLA is built on Apache YARN, scales dynamically and supports a pay-per-query model. It supports Azure AD for access control, and U-SQL allows programmability with C#.

U-SQL supports big data analytics workloads, which generally share these characteristics: they require processing of any kind of data, allow the use of custom algorithms, and must scale to any size while remaining efficient.
This lets queries be written for a variety of big data analytics. In addition, it supports SQL for Big Data, which allows querying over structured data, and it enables scaling and parallelization. While Hive supported HiveQL, the Microsoft Sqoop connector enabled SQL over big data, and Apache Calcite became a SQL adapter, U-SQL seems to improve the query language itself. It can unify querying over structured and unstructured data. It has declarative SQL and can execute local and remote queries. It increases productivity and agility. It brings in features from T-SQL, Hive SQL, and SCOPE, which has been Microsoft's internal Big Data language. U-SQL is extensible, and it can be extended with C# and .NET.
If we look at the pattern of separating query from data source, we quickly see it's no longer just a consolidation of data sources. It is also pushing down the query to the data sources and thus can act as a translator. Projections, filters and joins can now take place where the data resides. This was a design decision that came from the need to support heterogeneous data sources. Moreover, it gives a consistent unified view of the data to the user.

SQLCLR improves the programmability of U-SQL. Conventional SQL expressions like SELECT, EXTRACT, WHERE, HAVING, GROUP BY and DECLARE can be used as usual, while C# expressions add user-defined types (UDTs), user-defined functions (UDFs) and user-defined aggregates (UDAs). These types, functions and aggregates can be used directly in a U-SQL script. For example, SELECT Convert.ToDateTime(Convert.ToDateTime(@dt).ToString("yyyy-MM-dd")) AS dt, dt AS olddt FROM @rs0; where @dt is a datetime variable, makes the best of both C# and SQL. The power of SQL expressions can never be overstated; for many business use-cases they suffice by themselves, but having this programmability means that we can even take all the processing into C# and have the SQL script be just an invocation. This requires the assembly to be registered and versioned. U-SQL runs code in x64 format. An uploaded assembly DLL or resource file, such as a different runtime, a native assembly or a configuration file, can be at most 400 MB. The total size of all registered resources cannot be greater than 3 GB. There can only be one version of any given assembly. This is sufficient for many business cases, which can often be written in the form of a UDF that takes simple parameters and outputs a simple datatype. These functions can even keep state between invocations. U-SQL comes with a test SDK, and together with the local-run SDK, script-level tests can be authored. Azure Data Lake Tools for Visual Studio enables us to create U-SQL script test cases. A test data source can also be specified for these tests.
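A minimal sketch of the registration flow follows; the database, assembly and namespace names are hypothetical, and the DLL is assumed to have been built and uploaded to the store beforehand:

// One-time registration of the code-behind assembly (subject to the size limits above)
USE DATABASE MyAnalyticsDb;
CREATE ASSEMBLY IF NOT EXISTS MyHelpers FROM "/assemblies/MyHelpers.dll";

// In the U-SQL script that consumes it
REFERENCE ASSEMBLY MyAnalyticsDb.MyHelpers;

@result =
    SELECT MyCompany.Udfs.NormalizeDate(dt) AS dt,
           dt AS olddt
    FROM @rs0;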

Tuesday, December 28, 2021

This is a continuation of a series of articles on operational engineering aspects of Azure public cloud computing that included the most recent discussion on Azure Data Lake which is a full-fledged general availability service that provides similar Service Level Agreements as expected from others in the category. This article focuses on Azure Data Lake which is suited to store and handle Big Data. This is built over Azure Blob Storage, so it provides native support for web-accessible documents. It is not a massive virtual data warehouse, but it powers a lot of analytics and is the centerpiece of most solutions that conform to the Big Data architectural style. In this section, we focus on Data Lake monitoring.

As we might expect from the use of Azure storage account, the monitoring for the Azure Data Lake leverages the monitoring for the storage account. Azure Storage Analytics performs logging and provides metric data for a storage account. This data can be used to trace requests, analyze usage trends, and diagnose issues with the storage account. 

Storage Analytics must be enabled individually for each service that needs to be monitored. Blobs, queues, tables and file services are all subject to monitoring. The aggregated data is stored in a well-known blob designated for logging and in well-known tables, which may be accessed using the Blob service and Table service APIs. There is a 20 TB limit for the metrics data, and this is separate from whatever size is provisioned for data, so resizing is not a concern. When we monitor a storage service, the service health, capacity, availability and performance of the service are studied. The service health can be observed from the portal and notifications can be subscribed to. The $MetricsCapacityBlob table enables capacity monitoring for the blob service. Storage Metrics records this data once per day. The capacity is measured in bytes, and both the ContainerCount and ObjectCount are available per daily entry. Availability is monitored in the hourly and minute metrics tables that record primary transactions against blobs, tables and queues. The availability data is a column in these tables. Performance is measured in the AverageE2ELatency and AverageServerLatency columns. The E2ELatency is recorded only for successful requests and includes the time that the client takes to send the data and receive acknowledgements from the storage service, whereas the server latency covers only the processing time within the service. A high value for the first and a low value for the second implies that the client is slow, or the network connectivity is poor. Nagle's algorithm is a TCP optimization on the sender, and it is designed to reduce network congestion by coalescing small send requests into larger TCP segments. So small segments are held back until a larger segment is available to send the data. But it does not work well with delayed acknowledgements, which are an optimization on the receiver side. When the receiver delays the ack and the sender waits for the ack to send a small segment, the data transfer gets stalled. Turning off these optimizations can improve table, blob and queue performance.
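For a .NET client on the legacy System.Net stack, these optimizations are commonly turned off at application startup, before any storage clients are created. This is a hedged sketch of the widely cited settings rather than a prescription:

using System.Net;

// Disable Nagle's algorithm and the Expect: 100-continue handshake for small,
// chatty table and queue requests; raise the default connection limit as well.
ServicePointManager.UseNagleAlgorithm = false;
ServicePointManager.Expect100Continue = false;
ServicePointManager.DefaultConnectionLimit = 100;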

Requests to create blobs for logging and the requests to create table entities for metrics are billable. Older logging and metrics data can be archived or truncated. As with all data using Azure storage, this is set via the retention policy on the containers. Metrics are stored for both the service and the API operations of that service which includes the percentages and the count of certain status messages. These features can help analyze the cost aspect of the usages.

 

 

Monday, December 27, 2021

 

This is a continuation of a series of articles on operational engineering aspects of Azure public cloud computing that included the most recent discussion on Azure Data Lake which is a full-fledged general availability service that provides similar Service Level Agreements as expected from others in the category. This article focuses on Azure Data Lake which is suited to store and handle Big Data. This is built over Azure Blob Storage, so it provides native support for web-accessible documents. It is not a massive virtual data warehouse, but it powers a lot of analytics and is the centerpiece of most solutions that conform to the Big Data architectural style.

Gen 2 is the current standard for building Enterprise Data Lakes on Azure. A data lake must store petabytes of data while handling bandwidths of up to gigabytes of data transfer per second. The hierarchical namespace of the object storage helps organize objects and files into a deep hierarchy of folders for efficient data access. The naming convention recognizes these folder paths by including the folder separator character in the name itself. With this organization and folder access directly against the object store, the performance of the overall usage of the data lake is improved. The Azure Blob File System driver for Hadoop is a thin shim over the Azure Data Lake Storage interface that supports file system semantics over blob storage. Fine-grained access control lists and Active Directory integration round out the data security considerations. Data management and analytics form the core scenarios supported by Data Lake. For multi-region deployments, it is recommended to have the data landing in one region and then replicated globally using AzCopy, Azure Data Factory or third-party products which assist with migrating data from one place to another. The best practices for Azure Data Lake involve evaluating feature support and known issues, optimizing for data ingestion, considering data structures, performing ingestion, processing and analysis from several data sources, and leveraging monitoring telemetry.

Azure Data Lake supports query acceleration and an analytics framework. It significantly improves data processing by retrieving only the data that is relevant to an operation. This cascades to reduced time and processing power for the end-to-end scenarios that are necessary to gain critical insights into stored data. Both 'filtering predicates' and 'column projections' are enabled, and SQL can be used to describe them. Only the data that meets these conditions is transmitted. A request processes only one file, so joins, aggregates and other query operators are not supported, but the data can be in CSV or JSON format. The query acceleration feature isn't limited to Data Lake Storage. It is supported even on blobs in storage accounts that form the persistence layer below the containers of the data lake. Even those without a hierarchical namespace are supported by the Azure Data Lake query acceleration feature. Query acceleration is part of the data lake, so applications can be switched with one another and the data selectivity and improved latency continue across the switch. Since the processing is on the side of the Data Lake, the pricing model for query acceleration differs from that of the normal transactional model.
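The predicates and projections are expressed in a small SQL dialect that is sent with the request. For a CSV blob with a header row, a filtering query might look like the following (the column names are, of course, specific to the data):

SELECT LastName, Sales
FROM BlobStorage
WHERE Sales > 10000

Only the rows and columns that satisfy this statement leave the service, which is where the latency and cost savings come from.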

Gen2 also supports Premium block blob storage accounts that are ideal for big data analytics applications and workloads. These require low latency and a high number of transactions. Workloads can be interactive, IoT, streaming analytics, artificial intelligence and machine learning.

 

Sunday, December 26, 2021

This is a continuation of a series of articles on operational engineering aspects of Azure public cloud computing that included the most recent discussion on Azure SQL Edge which is a full-fledged general availability service that provides similar Service Level Agreements as expected from others in the category. This article focuses on Azure Data Lake which is suited to store and handle Big Data. This is built over Azure Blob Storage, so it provides native support for web-accessible documents. It is not a massive virtual data warehouse, but it powers a lot of analytics and is the centerpiece of most solutions that conform to the Big Data architectural style.

The Gen 1 Data Lake was not integrated with Blob Storage, but Gen 2 is. There is support for file-system semantics in Gen 2, along with file security. Since these features are provided by Blob Storage, they come with the best practices in storage engineering that include replication groups, high availability, tiered data storage and storage class, aging and retention policies.

Gen 2 is the current standard for building Enterprise Data Lakes on Azure.  A data lake must store petabytes of data while handling bandwidths up to Gigabytes of data transfer per second. The hierarchical namespace of the object storage helps organize objects and files into a deep hierarchy of folders for efficient data access. The naming convention recognizes these folder paths by including the folder separator character in the name itself. With this organization and folder access directly to the object store, the performance of the overall usage of data lake is improved.

Both the object store containers and the containers exposed by Data Lake are transparently available to applications and services. The Blob storage features such as diagnostic logging, access tiers and lifecycle management policies are available to the account. The integration with Blob Storage is only one aspect of the integration from Azure Data Lake. Many other services are also integrated with Azure Data Lake to support data ingestion, data analytics and reporting with visual representations. Data management and analytics form the core scenarios supported by Data Lake. Fine-grained access control lists and Active Directory integration round out the data security considerations. Even if the data lake comprises only a few data asset types, some planning phase is required to avoid the dreaded data swamp analogy. Governance and organization are key to avoiding this situation. When the data systems are large and numerous, a robust data catalog system is required. Since Data Lake is a PaaS service, it can support multiple accounts at no overhead. A minimum of three lakes is recommended during the discovery and design phase due to the following factors:

1.       Isolation of data environments and predictability

2.       Features and functionality at the storage account level or regional versus global data lakes

3.       The use of a data catalog, data governance and project tracking tools

For multi-region deployments, it is recommended to have the data landing in one region and then replicated globally using AzCopy, Azure Data Factory or third-party products which assist with migrating data from one place to another.

The best practices for Azure Data Lake involve evaluating feature support and known issues, optimizing for data ingestion, considering data structures, performing ingestion, processing and analysis from several data sources, and leveraging monitoring telemetry.

 

 

Saturday, December 25, 2021

Event-driven or Database - the choice is yours.

 

Public cloud computing must deal with events at an unprecedented scale. The right choice of architectural style plays a big role in the total cost of ownership for a solution involving events. IoT traffic, for instance, can be channeled via the event-driven stack available from Azure or via SQL Edge, also available from Azure. The distinction between these may not be fully recognized or appreciated by development teams focused on agile and expedient delivery of work items, but a sound architecture is like a good investment that increases the return multiple times, as opposed to one that might require frequent scaling, revamping or even rewriting. This article explores the differences between the two. It is a continuation of a series of articles on operational engineering aspects of Azure public cloud computing that included the most recent discussion on Azure SQL Edge which is a full-fledged general availability service that provides similar Service Level Agreements as expected from others in the category.

Event-driven architecture consists of event producers and consumers. Event producers are those that generate a stream of events, and event consumers are the ones that listen for events.

The scale out can be adjusted to suit the demands of the workload and the events can be responded to in real time. Producers and consumers are isolated from one another. In some extreme cases such as IoT, the events must be ingested at very high volumes. There is scope for a high degree of parallelism since the consumers are run independently and in parallel, but they are tightly coupled to the events. Network latency for message exchanges between producers and consumers is kept to a minimum. Consumers can be added as necessary without impacting existing ones.

Some of the benefits of this architecture include the following: The publishers and subscribers are decoupled. There are no point-to-point integrations. It's easy to add new consumers to the system. Consumers can respond to events immediately as they arrive. They are highly scalable and distributed. There are subsystems that have independent views of the event stream.

Some of the challenges faced with this architecture include the following: Event loss is tolerated, so if guaranteed delivery is needed, this poses a challenge; some IoT traffic mandates guaranteed delivery. Processing events in exactly the order they arrive is another challenge. Each consumer type typically runs in multiple instances, for resiliency and scalability, which can pose a problem if the processing logic is not idempotent or the events must be processed in order.

Some of the best practices demonstrated by this architecture include the following: Events should be lean and mean and not bloated. Services should share only IDs and/or a timestamp. Large data transfer between services in this case is an antipattern. Loosely coupled event-driven systems are best.

Azure SQL Edge is an optimized relational database engine that is geared towards edge computing. It provides a high-performance data storage and processing layer for IoT applications. It provides capabilities to stream, process and analyze data, where the data can vary from relational to document to graph to time-series, which makes it the right choice for a variety of modern IoT applications. It is built on the same database engine as SQL Server and Azure SQL, so applications will find it convenient to seamlessly use queries that are written in T-SQL. This makes applications portable between devices, datacenters and the cloud.

Azure SQL Edge uses the same stream capabilities as Azure Stream Analytics on IoT Edge. This native implementation of data streaming is called T-SQL streaming. It can handle fast streaming from multiple data sources. The patterns and relationships in data are extracted from several IoT input sources. The extracted information can be used to trigger actions, alerts and notifications. A T-SQL streaming job consists of a stream input that defines the connections to a data source to read the data stream from, a stream output that defines the connections to a data source to write the data stream to, and a stream query that defines the data transformation, aggregations, filtering, sorting and joins to be applied to the input stream before it is written to the stream output.
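A hedged T-SQL sketch of such a job follows; the data source, file format and object names are hypothetical and are assumed to have been created beforehand with CREATE EXTERNAL DATA SOURCE and CREATE EXTERNAL FILE FORMAT:

-- Stream input read from the Edge Hub
CREATE EXTERNAL STREAM TemperatureInput
WITH (DATA_SOURCE = EdgeHubDataSource, FILE_FORMAT = JsonFormat, LOCATION = N'TemperatureSensors');

-- Stream output written to a local table
CREATE EXTERNAL STREAM TemperatureOutput
WITH (DATA_SOURCE = LocalSqlDataSource, LOCATION = N'dbo.AverageTemperatures');

-- Stream query: a 30-second tumbling-window average
EXEC sys.sp_create_streaming_job
    @name = N'TemperatureJob',
    @statement = N'
        SELECT AVG(Temperature) AS AvgTemperature, System.Timestamp() AS WindowEnd
        INTO TemperatureOutput
        FROM TemperatureInput TIMESTAMP BY EventTime
        GROUP BY TumblingWindow(second, 30)';

EXEC sys.sp_start_streaming_job @name = N'TemperatureJob';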

Both the storage and the message queue handle large volumes of data, and the execution can be staged as processing and analysis. The processing can be either batch-oriented or stream-oriented. The analysis and reporting can be offloaded to a variety of technology stacks with impressive dashboards. While the processing handles the requirements for batch and real-time processing on the big data, the analytics supports exploration and rendering of output from big data. It utilizes components such as data sources, data storage, batch processors, stream processors, a real-time message queue, an analytics data store, analytics and reporting stacks, and orchestration.

Some of the benefits of this approach include the following: the ability to offload processing to a database, elastic scale, and interoperability with existing solutions.

Some of the challenges faced with this architectural style include: the complexity of handling isolation for multiple data sources, and the challenge of building, deploying and testing data pipelines over a shared architecture. Different products require as many skillsets and as much maintenance, along with a requirement for data and query virtualization. For example, U-SQL, which is a combination of SQL and C#, is used with Azure Data Lake Analytics, while SQL APIs are used with Edge, Hive, HBase, Flink and Spark. With event-driven processing on a heterogeneous stack, the emphasis on data security gets diluted and spread over a very large number of components.

 

Friday, December 24, 2021

 

This is a continuation of a series of articles on operational engineering aspects of Azure public cloud computing that included the most recent discussion on Azure SQL Edge which is a full-fledged general availability service that provides similar Service Level Agreements as expected from others in the category.

SQL Edge is an optimized relational database engine that is geared towards edge computing. It provides a high-performance data storage and processing layer for IoT applications. It provides capabilities to stream, process and analyze data, where the data can vary from relational to document to graph to time-series, which makes it the right choice for a variety of modern IoT applications. It is built on the same database engine as SQL Server and Azure SQL, so applications will find it convenient to seamlessly use queries that are written in T-SQL. This makes applications portable between devices, datacenters and the cloud.

Azure SQL Edge uses the same stream capabilities as Azure Stream Analytics on IoT edge. This native implementation of data streaming is called T-SQL streaming. It can handle fast streaming from multiple data sources. A T-SQL Streaming job consists of a Stream Input that defines the connections to a data source to read the data stream from, a stream output job that defines the connections to a data source to write the data stream to, and a stream query job that defines the data transformation, aggregations, filtering, sorting and joins to be applied to the input stream before it is written to the stream output.

Azure SQL Edge is also noteworthy for bringing the machine learning technique directly to the edge by running ML models for edge devices. SQL Edge supports Open Neural Network Exchange (ONNX) and the model can be deployed with T-SQL. The model can be pre-trained or custom-trained outside the SQL Edge with a choice of frameworks. The model just needs to be in ONNX format. The ONNX model is simply inserted into the models table in the ONNX database and the connection string is sufficient to send the data into SQL. Then the PREDICT method can be run on the data using the model.
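A hedged sketch of the scoring step follows; the model table, model name and input table are hypothetical, and the output column in the WITH clause must match what the ONNX model actually emits:

DECLARE @model VARBINARY(MAX) = (
    SELECT [data] FROM dbo.models WHERE [name] = 'sensor-anomaly-model');

SELECT d.*, p.score
FROM PREDICT(MODEL = @model, DATA = dbo.sensor_readings AS d, RUNTIME = ONNX)
WITH (score FLOAT) AS p;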

ML pipelines are a newer technology compared to traditional software development stacks, and such pipelines have generally been on-premises, simply due to the latitude in using different frameworks and development styles. Also, experimentation can get out of control relative to the limits allowed for the free tier in the public cloud. In some cases, event processing systems such as Apache Spark and Kafka find it easier to replace the Extract-Transform-Load solutions that proliferated with data warehouses. The use of SQL Edge avoids the requirement to perform ETL, and machine learning models are end-products. They can be hosted in a variety of environments, not just the cloud or SQL Edge. Some ML users would like to load the model on mobile or edge devices. Many IoT practitioners and experts agree that the streaming data from edge devices can be quite heavy in traffic, where a database system will out-perform any edge device-based computing. Internet TCP relays are on the order of 250-300 milliseconds, whereas the ingestion rate for database processing can be upwards of thousands of events per second. These are some of the benefits of using machine learning within the database.

 

Thursday, December 23, 2021

A summary of the book “The Burnout Fix” written by Jacinta M. Jimenez

This is a book that talks about how to overcome overwhelm, beat busy and sustain success in the new world of work. One would think that burnout is a problem where a solution is screaming at you, but there is no rapid-relief ointment without systemic changes. The book recognizes burnout as a pervasive social problem in the United States. Jacinta M. Jimenez attributes it to factors that must be addressed by both the individual and his/her organization. The organization must treat the symptoms of burnout and work to prevent it altogether. The individual must foster resilience with her science-backed PULSE practices. These practices help to lead a more purpose-driven life and support team members' well-being.

We see right away that there is a shift of focus from grit to resilience. A hyperconnected world demands more from an individual, beyond their own beliefs, to take on unsustainable volumes of work and remain available to tackle tasks. When this goes on for too long, burnout results. Even if we work harder or smarter, we neglect to nurture a steady personal pulse, which makes even our successes short-lived.

There are five capabilities suggested to avoid burnout and lead to improvements in the following areas:

1. Behavioral – where we boost our professional and personal growth by developing a healthy performance pace.

2. Cognitive – where we rid ourselves of unhealthy thought patterns

3. Physical – where we embrace the power of leisure as a strategy to protect and restore the reserves of energy

4. Social – where we build a diverse network of social support to make ourselves more adaptable and improve our thinking.

5. Emotional – where we don’t control our priorities or time, evaluate the effort we exert and take control of ourselves.

Most people tackle their goals by breaking them into smaller concrete steps to help avoid cognitive and emotional exhaustion. There are three P's involved: 1. Plan, where we assess our skills and progress towards bigger goals using progress indicators. 2. Practice, where we commit to continuous learning by experimenting and receiving feedback while journaling our progress. 3. Ponder, where we reflect on what worked and what did not work.

We must reduce distracting thoughts and work towards mental clarity using three C's: 1. Curiosity, where we identify recurring thoughts and check if they are grounded in reality, 2. Compassion for ourselves, where we treat ourselves with kindness, and 3. Calibration, where we switch between showing ourselves more compassion and seeking more information. We can cultivate this by stacking habits including new habits, scheduling reminders for mind space, breathing, writing down thoughts and learning from binary thinking, sticking with self-compassion and being consistent.

We must prioritize leisure time. The ability to enjoy stress-free leisure time is essential to staying calm and centered.

We focus on how fast others respond rather than the quality of responses. To give more space to ourselves, we must practice three S's: 1. Silence, to eliminate duress from devices or to go on a meditation retreat, 2. Sanctuary, to go and spend time in nature or to improve our mood, and 3. Solitude, to spend time by ourselves to slow down sensory input.

Social wellness is equally important and is rather limited by symptoms before burnout. When we feel we belong and can securely access support from a community, we reduce the stress on our brain and create conditions for improved productivity. This can be done with three B's: 1. Belonging, where we strengthen our sense of belonging by actively working to be more compassionate, 2. Breadth, where we create a visual map of the circles of support, and 3. Boundaries, where we reflect on our personal values.

Energy is finite so we must manage it carefully. We do this with three E’s: 1. Enduring principles where we determine what guides us in our current stage, 2. Energy expenditure where we assess how we spend our energy and 3. Emotional acuity where we resist the tendency to ignore our emotions. 

Similarly, we lead healthy teams by embracing 1. Agency, 2. Benevolence, and 3. Community. When leaders demonstrate and implement techniques to increase resilience, it percolates through the rank and file.


Wednesday, December 22, 2021

 

Azure Machine Learning provides an environment to create and manage the end-to-end life cycle of Machine Learning models. Unlike general purpose software, Azure machine learning has significantly different requirements such as the use of a wide variety of technologies, libraries and frameworks, separation of training and testing phases before deploying and use of a model and iterations for model tuning independent of the model creation and training etc.  Azure Machine Learning’s compatibility with open-source frameworks and platforms like PyTorch and TensorFlow makes it an effective all-in-one platform for integrating and handling data and models which tremendously relieves the onus on the business to develop new capabilities. Azure Machine Learning is designed for all skill levels, with advanced MLOps features and simple no-code model creation and deployment.

We will compare this environment with TensorFlow, but for those unfamiliar with the latter, here is a use case with TensorFlow. A JavaScript application performs image processing with a machine learning algorithm. When enough training data images have been processed, the model learns the characteristics of the drawings that correspond to their labels. Then, as it runs through the test data set, it can predict the label of a drawing using the model. TensorFlow has a library called Keras which can help author the model and deploy it to an environment such as Colab, where the model can be trained on a GPU. Once the training is done, the model can be loaded and run anywhere else, including a browser. The power of TensorFlow is in its ability to load the model and make predictions in the browser itself.

The labeling of drawings starts with a sample of, say, a hundred classes. The data for each class is available on Google Cloud as numpy arrays, with several images, numbering say N, for that class. The dataset is pre-processed for training, where it is converted to batches, and the model outputs class probabilities.

As with any ML learning example, the data is split into 70% training set and 30% test set. There is no order to the data and the split is taken over a random set.  

TensorFlow makes it easy to construct this model using the TensorFlow Lite Model Maker. It can only present the output after the model is trained. In this case, the model must be run after the training data has labels assigned, which might be done by hand. The model works better with fewer parameters. It might contain 3 convolutional layers and 2 dense layers. The pooling size is specified for each of the convolutional layers, and they are stacked up on the model. The model is trained using tf.train.AdamOptimizer() and compiled with a loss function, the optimizer just created, and a metric such as top-k categorical accuracy. The summary of the model can be printed for viewing. With a set of epochs and batches, the model can be trained. Annotations help the TensorFlow Lite converter fuse the TF.Text API. This fusion leads to a significant speedup over conventional models. The architecture for the model is also tweaked to include a projection layer along with the usual convolutional layer and attention encoder mechanism, which achieves similar accuracy but with a much smaller model size. There is native support for HashTables for NLP models.

With the model and training/test sets defined, it is now as easy to evaluate the model and run the inference. The model can also be saved and restored. It executes faster when a GPU is added to the computing.

When the model is trained, it can be done in batches of a predefined size. The number of passes over the entire training dataset, called epochs, can also be set up front. A batch size of 256 and the number of steps set to 5 could be used. These are called model tuning parameters. Every model has a speed, a Mean Average Precision and an output. The higher the precision, the lower the speed. It is helpful to visualize the training with the help of a live chart that updates with the loss after each epoch. Usually there will be a downward trend in the loss, which is referred to as the model converging.
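As a sketch of the architecture and tuning parameters described above (three convolutional layers, two dense layers, the Adam optimizer, a top-k categorical accuracy metric, a batch size of 256 and five training passes), written with the Keras API; the input shape, the number of classes and the train_x/train_y and test_x/test_y arrays are assumptions standing in for the prepared 70/30 split:

import tensorflow as tf

num_classes = 100  # assumed number of drawing classes

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, (3, 3), padding='same', activation='relu', input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(32, (3, 3), padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(num_classes, activation='softmax'),
])

model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss='categorical_crossentropy',
              metrics=[tf.keras.metrics.TopKCategoricalAccuracy(k=5)])

model.summary()

# train_x/train_y and test_x/test_y are the previously prepared 70/30 split
model.fit(train_x, train_y, batch_size=256, epochs=5,
          validation_data=(test_x, test_y))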

When the model is trained, it might take a lot of time, say about 4 hours. When the test data has been evaluated, the model's effectiveness can be assessed using precision and recall: precision refers to the fraction of the model's positive inferences that were indeed positive, and recall refers to the fraction of the actual positives that the model found.

Azure Machine Learning has a drag and drop interface that can be used to train and deploy models.  It uses a machine learning workspace to organize shared resources such as pipelines, datasets, compute resources, registered models, published pipelines, and real-time endpoints. A visual canvas helps build end to end machine learning workflow. It trains, tests and deploys models all in the designer.  The datasets and components can be dragged and dropped onto the canvas. A pipeline draft connects the components. A pipeline run can be submitted using the resources in the workspace. The training pipelines can be converted to inference pipelines and the pipelines can be published to submit a new pipeline that can be run with different parameters and datasets. A training pipeline can be reused for different models and a batch inference pipeline can be used to make predictions on new data.

Tuesday, December 21, 2021

This is a continuation of a series of articles on operational engineering aspects of Azure public cloud computing that included the most recent discussion on Azure Maps which is a full-fledged general availability service that provides similar Service Level Agreements as expected from others in the category. In this article, we explore Azure Logic applications.

Each logic app is a workflow that implements some process. This might be a system-to-system process, such as connecting two or more applications. Alternatively, it might be a user-to-system process, one that connects people with software and potentially has long delays. Logic Apps is designed to support either of these scenarios.

Azure Logic Applications is a member of Azure Integration Services. It simplifies the way legacy, modern and niche systems are connected across cloud, on-premises and hybrid environments. The integrated solutions are very valuable for B2B scenarios. Integration services distinguish themselves with four common components in their design – namely, APIs, Events, Messaging, and Orchestration. APIs are a prerequisite for interactions between services. They facilitate functional programmatic access as well as automation. For example, a workflow orchestration might implement a complete business process by invoking different APIs in different applications, each of which carries out some part of that process. Integrating applications commonly requires implementing all or part of a business process. It can involve connecting software-as-a-service implementation such as Salesforce CRM, update on-premises data stored in SQL Server and Oracle database and invoke operations in an external application. These translate to specific business purposes and custom logic for orchestration. Many backend operations are asynchronous by nature requiring background operations. Even APIs are written with asynchronous processing but long running APIs are not easily tolerated. Some form of background processing is required. Situations like this call for a message queue. Events facilitate the notion of publisher-subscriber so that the polling on messages from a queue can be avoided. For example, Event Grid supports subscribers to avoid polling. Rather than requiring a receiver to poll for new messages, the receiver instead registers an event handler for the event source it’s interested in. Event Grid then invokes that event handler when the specified event occurs. Azure Logic applications are workflows. A workflow can easily span all four of these components for its execution.

Azure Logic Applications can be multi-tenant. It is easier to write the application as multi-tenant when we create a workflow from the templates gallery. These range from simple connectivity for Software-as-a-Service applications to advanced B2B solutions. Multi-tenancy means there is a shared, common infrastructure across numerous customers simultaneously, leading to economies of scale.

Monday, December 20, 2021

 

This is a continuation of a series of articles on operational engineering aspects of Azure public cloud computing that included the most recent discussion on Azure Maps which is a full-fledged general availability service that provides similar Service Level Agreements as expected from others in the category. In this article, we explore Azure SQL Edge.

SQL Edge is an optimized relational database engine that is geared towards edge computing. It provides a high-performance data storage and processing layer for IoT applications. It provides capabilities to stream, process and analyze data, where the data can vary from relational to document to graph to time-series, which makes it the right choice for a variety of modern IoT applications. It is built on the same database engine as SQL Server and Azure SQL, so applications will find it convenient to seamlessly use queries that are written in T-SQL. This makes applications portable between devices, datacenters and the cloud.

Azure SQL Edge uses the same stream capabilities as Azure Stream Analytics on IoT edge. This native implementation of data streaming is called T-SQL streaming. It can handle fast streaming from multiple data sources. A T-SQL Streaming job consists of a Stream Input that defines the connections to a data source to read the data stream from, a stream output job that defines the connections to a data source to write the data stream to, and a stream query job that defines the data transformation, aggregations, filtering, sorting and joins to be applied to the input stream before it is written to the stream output.

Data can be transferred in and out of SQL Edge. For example, data can be synchronized from SQL Edge to Azure Blob storage by using Azure Data Factory. As with all SQL instances, the client tools help create the database and the tables. SQLPackage.exe is used to create and apply a DAC package file to the SQL Edge container. A stored procedure or trigger is used to update the watermark levels for a table. A watermark table is used to store the last timestamp up to which data has already been synchronized with Azure Storage. The stored procedure is run after every synchronization. A Data Factory pipeline is used to synchronize data to Azure Blob storage from a table in Azure SQL Edge. This is created by using its user interface. The PeriodicSync property must be set at the time of creation. A lookup activity is used to get the old watermark value. A dataset is created to represent the data in the watermark table. This table contains the old watermark that was used in the previous copy operation. A new linked service is created to source the data from the SQL Edge server using connection credentials. When the connection is tested, it can be used to preview the data to eliminate surprises during synchronization. The pipeline editor is a designer tool where the WatermarkDataset is selected as the source dataset. The lookup activity gets the new watermark value from the table that contains the source data so it can be copied to the destination. A query can be added in the pipeline editor for selecting the maximum value of the timestamp from the source table. Only the first row is selected as the new watermark. Incremental progress is maintained by continually advancing the watermark. Not only the source but also the sink must be specified in the editor. The sink will use a new linked service to the blob storage. The success output of a Copy activity is connected to a stored procedure activity which then writes a new watermark. Finally, the pipeline is scheduled to be triggered periodically.
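The watermark pieces on the SQL Edge side reduce to a small amount of T-SQL. This is a sketch with hypothetical table and procedure names, following the incremental-copy pattern described above:

CREATE TABLE dbo.watermarktable (
    TableName NVARCHAR(255) NOT NULL,
    WatermarkValue DATETIME2 NOT NULL
);
GO

CREATE PROCEDURE dbo.usp_write_watermark
    @lastModifiedTime DATETIME2,
    @tableName NVARCHAR(255)
AS
BEGIN
    -- Called by the pipeline after a successful copy to advance the watermark
    UPDATE dbo.watermarktable
    SET WatermarkValue = @lastModifiedTime
    WHERE TableName = @tableName;
END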

Sunday, December 19, 2021

This is a continuation of a series of articles on operational engineering aspects of Azure public cloud computing that included the most recent discussion on Azure Maps which is a full-fledged general availability service that provides similar Service Level Agreements as expected from others in the category. In this article, we explore Azure SQL Edge.

Edge Computing has developed differently from mainstream desktop, enterprise and cloud computing. The focus has always been on speed rather than data processing which is delegated to the core or cloud computing. Edge Servers work well for machine data collection and Internet of Things. Edge computing is typically associated with Event-Driven Architecture style. It relies heavily on asynchronous backend processing. Some form of message broker becomes necessary to maintain order between events, retries and dead-letter queues.

SQL Edge is an optimized relational database engine that is geared towards edge computing. It provides a high-performance data storage and processing layer for IoT applications. It provides capabilities to stream, process and analyze data, where the data can vary from relational to document to graph to time-series, which makes it the right choice for a variety of modern IoT applications. It is built on the same database engine as SQL Server and Azure SQL, so applications will find it convenient to seamlessly use queries that are written in T-SQL. This makes applications portable between devices, datacenters and the cloud.

Azure SQL edge supports two deployment modes – those that are connected through Azure IoT edge and those that have disconnected deployment.  The connected deployment requires Azure SQL Edge to be deployed as a module for Azure IoT Edge. In the disconnected deployment mode, it can be deployed as a standalone docker container or a Kubernetes cluster.

There are two editions of Azure SQL Edge – a developer edition and a production SKU – and the spec changes from 4 cores/32 GB to 8 cores/64 GB. Azure SQL Edge uses the same stream capabilities as Azure Stream Analytics on IoT Edge. This native implementation of data streaming is called T-SQL streaming. It can handle fast streaming from multiple data sources. The patterns and relationships in data are extracted from several IoT input sources. The extracted information can be used to trigger actions, alerts and notifications. A T-SQL streaming job consists of a stream input that defines the connections to a data source to read the data stream from, a stream output that defines the connections to a data source to write the data stream to, and a stream query that defines the data transformation, aggregations, filtering, sorting and joins to be applied to the input stream before it is written to the stream output.

SQL Edge also supports machine learning models by integrating with Open Neural Network Exchange (ONNX) runtimes. The models are developed independently of the edge but can be run on the edge.

 

 

Saturday, December 18, 2021

Azure Maps and heatmaps

This is a continuation of a series of articles on operational engineering aspects of Azure public cloud computing. In this article, we continue the discussion on Azure Maps which is a full-fledged general availability service that provides similar Service Level Agreements as expected from others in the category.  We focus on one of the features of Azure Maps that enables overlay of images and heatmaps.

Azure Maps is a collection of geospatial services and SDKs that fetches the latest geographic data and provides it as a context to web and mobile applications.  Specifically, it provides REST APIs to render vector and raster maps as overlays including satellite imagery, provides creator services to enable indoor map data publication, provides search services to locate addresses, places, and points of interest given indoor and outdoor data, provides various routing options such as point-to-point, multipoint, multipoint optimization, isochrone, electric vehicle, commercial vehicle, traffic influenced, and matrix routing, provides traffic flow view and incidents view, for applications that require real-time traffic information, provides Time zone and Geolocation services, provides elevation services with Digital Elevation Model, provides Geofencing service and mapping data storage, with location information hosted in Azure and provides Location intelligence through geospatial analytics.

The Web SDK for Azure Maps allows several features with the use of its map control.  We can create a map, change the style of the map, add controls to the map, add layers on top of the map, add html markers, show traffic, cluster point data, and use data-driven style expressions, use image templates, react to events and make app accessible.
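Instantiating the map control in the Web SDK is a single call against a page element. A minimal sketch (the element id and the key are placeholders) looks like this, and the resulting map object is what the heat map layer below is added to:

var map = new atlas.Map('myMap', {
    center: [-122.33, 47.6],   // [longitude, latitude]
    zoom: 12,
    authOptions: {
        authType: 'subscriptionKey',
        subscriptionKey: '<your-azure-maps-key>'
    }
});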

Heatmaps are also known as point density maps because they represent the density of data and the relative density of each data point using a range of colors. This can be overlaid on the maps as a layer. Heat maps can be used in different scenarios including temperature data, data for noise sensors, and GPS trace.

The addition of a heat map layer is as simple as:

map.layers.add(new atlas.layer.HeatMapLayer(datasource, null, { radius: 10, opacity: 0.8 }), 'labels');

The opacity or transparency is normalized between 0 and 1. The intensity is a multiplier to the weight of each data point. The weight is a measure of the number of times the data point applies to the map.

Azure Maps provides a consistent, zoomable heat map; as the zoom level changes, the data aggregates together, so the heat map can look different from the way it looked at the original focus. Scaling the radius also changes the heat map, because the radius doubles with each zoom level.

All of this processing is on the client side for the rendering of given data points.

Friday, December 17, 2021

 

Location queries
Location is a datatype. It can be represented either as a point or a polygon and each helps with answering questions such as getting top 3 stores near to a geographic point or stores within a region. Since it is a data type, there is some standardization available. SQL Server defines not one but two data types for the purpose of specifying location: the Geography data type and the Geometry data type.  The Geography data type stores ellipsoidal data such as GPS Latitude and Longitude and the geometry data type stores Euclidean (flat) coordinate system. The point and the polygon are examples of the Geography data type. Both the geography and the geometry data type must have reference to a spatial system and since there are many of them, it must be used specifically in association with one. This is done with the help of a parameter called the Spatial Reference Identifier or SRID for short. The SRID 4326 is the well-known GPS coordinates that give information in the form of latitude/Longitude. Translation of an address to a Latitude/Longitude/SRID tuple is supported with the help of built-in functions that simply drill down progressively from the overall coordinate span.  A table such as ZipCode could have an identifier, code, state, boundary, and center point with the help of these two data types. The boundary could be considered the polygon formed by the zip and the Center point as the central location in this zip. Distances between stores and their membership to zip can be calculated based on this center point. Geography data type also lets us perform clustering analytics which answers questions such as the number of stores or restaurants satisfying a certain spatial condition and/or matching certain attributes. These are implemented using R-Tree data structures that support such clustering techniques. The geometry data type supports operations such as area and distance because it translates to coordinates.   It has its own rectangular coordinate system that we can use to specify the boundaries or the ‘bounding box’ that the spatial index covers.

The operations performed with these data types include the distance between two geography objects, the method to determine a range from a point such as a buffer or a margin, and the intersection of two geographic locations. The geometry data type supports operations such as area and distance because it translates to coordinates. Some other methods supported with these data types include contains, overlaps, touches, and within. 
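A short T-SQL sketch of these operations follows; the Stores and ZipCode tables are the hypothetical ones described above:

DECLARE @origin GEOGRAPHY = geography::Point(47.6062, -122.3321, 4326);  -- Lat, Long, SRID

-- Top 3 stores nearest to a geographic point
SELECT TOP (3) s.StoreId, s.Location.STDistance(@origin) AS DistanceInMeters
FROM dbo.Stores AS s
ORDER BY s.Location.STDistance(@origin);

-- Stores whose location falls within a zip code boundary
SELECT s.StoreId
FROM dbo.Stores AS s
JOIN dbo.ZipCode AS z ON z.Code = '98101'
WHERE z.Boundary.STIntersects(s.Location) = 1;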

A note about the use of these data types now follows. One approach is to store the coordinates in a separate table where the primary keys are saved as the pair of latitude and longitude and then to describe them as unique such that a pair of latitude and longitude does not repeat. Such an approach is questionable because the uniqueness constraint for locations has a maintenance overhead. For example, two locations could refer to the same point, and then unreferenced rows might need to be cleaned up. Locations also change ownership; for example, store A could own a location that was previously owned by store B, but B never updates its location. Moreover, stores could undergo renames or conversions. Thus, it may be better to keep the spatial data associated in a repeatable way along with the information about the location. Also, these data types do not participate in set operations. That is easy to do with collections and enumerables in the programming language of choice and usually consists of the following four steps: answer initialization, returning an answer on termination, accumulation called for each row, and merge called when merging the processing from parallel workers. These steps are like a map-reduce algorithm. These data types and operations are improved with the help of a spatial index. These indexes continue to be like indexes of other data types and are stored using B-Trees. Since this is an ordinary one-dimensional index, the reduction of the dimensions of the two-dimensional spatial data is performed by means of tessellation, which divides the area into small subareas and records the subareas that intersect each spatial instance. For example, with a given geography data type, the entire globe is divided into hemispheres and each hemisphere is projected onto a plane. When a given geography instance covers one or more subsections or tiles, the spatial index has an entry for each such tile that is covered. The geometry data type has its own rectangular coordinate system that you define, which you can use to specify the boundaries or the 'bounding box' that the spatial index covers. Visualizers support overlays with spatial data, which is popular with mapping applications that super-impose information over the map with the help of transparent layers. An example is Azure Maps with GeoFence as described here.
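Creating such an index is a single statement; a sketch against the hypothetical ZipCode table, letting SQL Server choose the grid densities, might look like this:

CREATE SPATIAL INDEX SIndx_ZipCode_Boundary
ON dbo.ZipCode (Boundary)
USING GEOGRAPHY_AUTO_GRID;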

Thursday, December 16, 2021

Adding Azure Maps to an Android Application

 


This is a continuation of a series of articles on operational engineering aspects of Azure public cloud computing. In this article, we continue the discussion on Azure Maps which is a full-fledged general availability service that provides similar Service Level Agreements as expected from others in the category but with an emphasis on writing mobile applications. Specifically, we target the Android platform.

We leverage an event-driven architecture style where the Service Bus delivers the messages that the mobile application processes. As in the geofencing case, different messages can be used for different handling. The mobile application is a consumer of these messages and makes occasional API calls that generate messages on the backend of a web-queue-worker. The scope of this document is to focus on just the mobile application stack. The tracking and production of messages are done in the backend, and the mobile application uses Azure Maps to display the location. We will need an active Azure Maps account and key for this purpose. The subscription, resource group, name, and pricing tier must be determined beforehand. The mobile application merely adds an Azure Maps control to the application.

An Android application will require a Java-based deployment. Since the communication is over HTTP, the technology stack can be independent between the backend and the mobile application. The Azure Maps Android SDK will be leveraged for this purpose. The top-level build.gradle file will reference the repository URL https://atlas.microsoft.com/sdk/android. Java 8 can be chosen as the appropriate version to use. The SDK can be imported into build.gradle with the artifact description "com.azure.android:azure-maps-control:1.0.0". The application will introduce the map control as <com.azure.android.maps.control.MapControl android:id="@+id/mapcontrol" android:layout_width="match_parent" android:layout_height="match_parent" /> in the main activity xml file. The corresponding Java file will add imports for the Azure Maps SDK, set the Azure Maps authentication information, and get the map control instance in the onCreate method. The setSubscriptionKey and setAadProperties methods can be used to add the authentication information on every view. The control will display the map even on the emulator. A sample Android application can be seen here.
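A minimal activity, following the quickstart pattern for the azure-maps-control artifact referenced above, might look like the sketch below; the package name, layout file, and subscription key placeholder are assumptions.

package com.example.azuremapsdemo;

import android.os.Bundle;
import androidx.appcompat.app.AppCompatActivity;
import com.azure.android.maps.control.AzureMaps;
import com.azure.android.maps.control.MapControl;

public class MainActivity extends AppCompatActivity {

    static {
        // Authentication information applied to every map view in the application.
        AzureMaps.setSubscriptionKey("<your-azure-maps-subscription-key>");
    }

    private MapControl mapControl;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);

        // The MapControl declared in the main activity xml file.
        mapControl = findViewById(R.id.mapcontrol);
        mapControl.onCreate(savedInstanceState);
    }
}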

As with all applications, the activity control flow must be tight and guide the user through specific workflows. The views, their lifetime, and their activity must be controlled, and the user should not see the application as hung or spinning. The interactivity of the control is assured only if the application recycles and cleans up the associated resources as the user moves from one page to another.
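One way to honor that lifecycle, continuing the activity sketch above, is to forward the Android lifecycle callbacks to the map control so its resources are released and restored as the user navigates; the forwarding pattern follows the SDK quickstart and is shown here only as a sketch.

    @Override
    public void onResume() {
        super.onResume();
        mapControl.onResume();   // restore the map when the page becomes visible again
    }

    @Override
    public void onPause() {
        super.onPause();
        mapControl.onPause();    // release rendering resources while the page is hidden
    }

    @Override
    public void onDestroy() {
        super.onDestroy();
        mapControl.onDestroy();  // clean up the control when the activity goes away
    }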

It is highly recommended to get the activity framework and navigation worked out and planned independently of the content. The views corresponding to the content are restricted to the one that displays the map control, so the application focuses mostly on user navigation and activities.

Wednesday, December 15, 2021

Azure Maps and GeoFence

This is a continuation of a series of articles on operational engineering aspects of Azure public cloud computing. In this article, we continue the discussion on Azure Maps, which is a full-fledged general availability service that provides similar Service Level Agreements as expected from others in the category.

Azure Maps is a collection of geospatial services and SDKs that fetches the latest geographic data and provides it as context to web and mobile applications. Specifically, it provides REST APIs to render vector and raster maps as overlays, including satellite imagery; Creator services to enable indoor map data publication; search services to locate addresses, places, and points of interest from indoor and outdoor data; routing options such as point-to-point, multipoint, multipoint optimization, isochrone, electric vehicle, commercial vehicle, traffic-influenced, and matrix routing; traffic flow and incident views for applications that require real-time traffic information; time zone and geolocation services; elevation services with a Digital Elevation Model; a geofencing service and mapping data storage, with location information hosted in Azure; and location intelligence through geospatial analytics.

Azure Maps can be helpful for tracking entry into and exit from a geographical location such as the perimeter of a construction area. Such tracking can be used to generate notifications by email. Geofencing GeoJSON data is uploaded to define the construction area we want to monitor. The Data Upload API will be used to upload the geofences as polygon coordinates to the Azure Maps account. Two logic apps can be written to send email notifications to the construction site operations when, say, a piece of equipment enters or exits the construction site. An Azure Event Grid subscription will be set up for the enter and exit events of the Azure Maps geofence. Two webhook event subscriptions will call the HTTP endpoints defined in the two logic applications. The Spatial Geofence Get API is used to determine when a piece of equipment enters or exits the geofence areas.

The geofencing GeoJSON data contains a FeatureCollection that consists of two geofences pertaining to distinct polygonal areas within the construction site. The first has no time expirations or restrictions, and the second can only be queried during business hours. This data can be uploaded with a POST method call to the mapData endpoint along with the subscription key. Once the data is uploaded, we can retrieve its metadata to ascertain the creation timestamp.
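As a hedged sketch of that upload step, the following Java program posts the GeoJSON file with the JDK's HttpClient; the host, api-version, query parameters, and file name are illustrative assumptions and should be checked against the current Data Upload API reference.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

public class GeofenceUpload {
    public static void main(String[] args) throws Exception {
        String subscriptionKey = "<your-azure-maps-subscription-key>"; // assumption: shared-key auth
        String uploadUrl = "https://us.atlas.microsoft.com/mapData/upload"
                + "?api-version=1.0&dataFormat=geojson&subscription-key=" + subscriptionKey;

        // POST the FeatureCollection containing the two construction-site geofences.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(uploadUrl))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofFile(Path.of("geofence.json")))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // The upload is asynchronous; the response headers carry a status URL
        // that can be polled to retrieve the unique data id (udid) and metadata.
        System.out.println(response.statusCode());
        System.out.println(response.headers().map());
    }
}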

The logic app will require a resource group and subscription to be deployed. A common trigger function that responds when an HTTP request is received is sufficient for this purpose. Then the Azure Maps event subscription is created. It will require a name, event schema, system topic name, a filter for the event types, an endpoint type, and an endpoint. The Spatial Geofence Get API will send out the notifications on entry to and exit from the geofence. Each piece of equipment has a unique device id, so both entry and exit can be noted. The get method also returns the location and the distance from the geofence border; a negative distance implies that the position lies directly within the polygon.
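A corresponding sketch of the geofence check itself might look like the following; the query parameters (deviceId, udid, lat, lon, searchBuffer, mode) follow the geofencing tutorial but are assumptions here, as are the device id and coordinates.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class GeofenceCheck {
    public static void main(String[] args) throws Exception {
        String subscriptionKey = "<your-azure-maps-subscription-key>";
        String udid = "<udid-returned-by-the-data-upload>";

        // Ask whether the reported position lies inside the uploaded geofence polygons.
        String url = "https://us.atlas.microsoft.com/spatial/geofence/json"
                + "?api-version=1.0&deviceId=equipment-01&udid=" + udid
                + "&lat=47.638&lon=-122.132&searchBuffer=5&isAsync=True&mode=EnterAndExit"
                + "&subscription-key=" + subscriptionKey;

        HttpResponse<String> response = HttpClient.newHttpClient().send(
                HttpRequest.newBuilder().uri(URI.create(url)).GET().build(),
                HttpResponse.BodyHandlers.ofString());

        // The JSON body reports a distance for each geofence; a negative distance
        // means the position lies inside that polygon.
        System.out.println(response.body());
    }
}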


Tuesday, December 14, 2021

 

Azure Maps:

This is a continuation of a series of articles on operational engineering aspects of Azure public cloud computing. In this article, we take a break to discuss a location service named Azure Maps. This is a full-fledged general availability service that provides similar Service Level Agreements as expected from others in the category.

Azure Maps is a collection of geospatial services and SDKs that fetches the latest geographic data and provides it as context to web and mobile applications. Specifically, it provides REST APIs to render vector and raster maps as overlays, including satellite imagery; Creator services to enable indoor map data publication; search services to locate addresses, places, and points of interest from indoor and outdoor data; routing options such as point-to-point, multipoint, multipoint optimization, isochrone, electric vehicle, commercial vehicle, traffic-influenced, and matrix routing; traffic flow and incident views for applications that require real-time traffic information; time zone and geolocation services; elevation services with a Digital Elevation Model; a geofencing service and mapping data storage, with location information hosted in Azure; and location intelligence through geospatial analytics.

SDKs are also available in flavors suited for web and mobile applications. Both SDKs are quite powerful and enhance programmability. They allow customization of interactive maps that can render content and imagery specific to the publisher. The interactive map uses a WebGL-based map control that is known for rendering large datasets with high performance. The SDKs can be used with JavaScript and TypeScript.

Location is a data type. It can be represented either as a point or as a polygon, and each helps answer questions such as finding the top three stores nearest to a geographic point or the stores within a region. Since it is a data type, some standardization is available. SQL Server defines not one but two data types for specifying location: the Geography data type and the Geometry data type. The Geography data type stores ellipsoidal data such as GPS latitude and longitude, while the Geometry data type stores data in a Euclidean (flat) coordinate system. The point and the polygon are examples of the Geography data type. Both the geography and the geometry data types must reference a spatial system, and since there are many of them, each value must be associated with a specific one. This is done with a parameter called the Spatial Reference Identifier, or SRID for short. SRID 4326 is the well-known GPS coordinate system that expresses positions as latitude and longitude. Translating an address to a latitude/longitude/SRID tuple is supported by built-in functions that drill down progressively from the overall coordinate span.

A table such as ZipCode could have an identifier, code, state, boundary, and center point with the help of these two data types. The boundary would be the polygon formed by the zip code's outline and the center point the central location within it. Distances between stores and their membership in a zip code can then be calculated from this center point. The Geography data type also lets us perform clustering analytics, which answers questions such as the number of stores or restaurants satisfying a certain spatial condition and/or matching certain attributes. These are implemented using R-Tree data structures that support such clustering techniques. The Geometry data type supports operations such as area and distance because it translates to coordinates. It has its own rectangular coordinate system that we can use to specify the boundaries, or the 'bounding box', that the spatial index covers.

Mapping the spatial data involves rendering the data as a layer on top of images. These overlays enhance the display and provide a visual aid to end-users with geographical context. The Azure Maps visual for Power BI provides this functionality to visualize spatial data on top of a map. An Azure Maps account is required to create this resource via the Azure portal.

Thanks