Friday, December 24, 2021

 

This is a continuation of a series of articles on the operational engineering aspects of the Azure public cloud, including the most recent discussion on Azure SQL Edge, a full-fledged, generally available service that provides Service Level Agreements similar to others in its category.

SQL Edge is an optimized relational database engine geared towards edge computing. It provides a high-performance data storage and processing layer for IoT applications. It can stream, process and analyze data that ranges from relational to document, graph and time-series, which makes it a good choice for a variety of modern IoT applications. It is built on the same database engine as SQL Server and Azure SQL, so applications can seamlessly reuse queries written in T-SQL. This makes applications portable between devices, datacenters and the cloud.

Azure SQL Edge uses the same streaming capabilities as Azure Stream Analytics on IoT Edge. This native implementation of data streaming is called T-SQL streaming, and it can handle fast streaming from multiple data sources. A T-SQL streaming job consists of a stream input that defines the connection to the data source to read the stream from, a stream output that defines the connection to the data source to write the stream to, and a stream query that defines the transformations, aggregations, filtering, sorting and joins to be applied to the input stream before it is written to the stream output.
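
A minimal sketch of setting up such a job from Python follows. The connection string, the stream and table names, the pre-created output data source and the JSON message format are all assumptions; the T-SQL statements mirror the CREATE EXTERNAL STREAM and sys.sp_create_streaming_job primitives that SQL Edge exposes, but the exact options should be checked against the product documentation.

import pyodbc

# Assumed connection to a local Azure SQL Edge instance; replace the credentials.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost,1433;"
    "DATABASE=EdgeDb;UID=sa;PWD=<password>", autocommit=True)
cur = conn.cursor()

# Stream input: where the job reads the data stream from (the IoT Edge hub here).
cur.execute("CREATE EXTERNAL FILE FORMAT InputFileFormat WITH (FORMAT_TYPE = JSON)")
cur.execute("CREATE EXTERNAL DATA SOURCE EdgeHubInput WITH (LOCATION = 'edgehub://')")
cur.execute("""
    CREATE EXTERNAL STREAM TemperatureInput WITH (
        DATA_SOURCE = EdgeHubInput,
        FILE_FORMAT = InputFileFormat,
        LOCATION = N'TemperatureSensors')""")

# Stream output: where the transformed stream is written (a local table reached
# through an assumed, pre-created data source named LocalSQLOutput).
cur.execute("""
    CREATE EXTERNAL STREAM TemperatureOutput WITH (
        DATA_SOURCE = LocalSQLOutput,
        LOCATION = N'EdgeDb.dbo.TemperatureSummary')""")

# Stream query: the transformation applied between the input and the output.
cur.execute("""
    EXEC sys.sp_create_streaming_job @name = N'TemperatureJob',
         @statement = N'SELECT DeviceId, AVG(Temperature) AS AvgTemperature
                        INTO TemperatureOutput
                        FROM TemperatureInput
                        GROUP BY DeviceId, TumblingWindow(second, 30)'""")
cur.execute("EXEC sys.sp_start_streaming_job @name = N'TemperatureJob'")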

Azure SQL Edge is also noteworthy for bringing machine learning directly to the edge by running ML models on edge devices. SQL Edge supports the Open Neural Network Exchange (ONNX) format, and a model can be invoked with T-SQL. The model can be pre-trained or custom-trained outside SQL Edge with a choice of frameworks; it just needs to be converted to the ONNX format. The ONNX model binary is inserted into a models table in the database, and a connection string is sufficient to send data into SQL. The PREDICT function can then be run on the data using the model.
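
As a rough illustration, here is how a pre-trained ONNX model might be uploaded and scored from Python. The onnx database name, the dbo.models and dbo.sensor_readings tables, their columns and the single FLOAT output named score are hypothetical; the INSERT and the PREDICT call follow the general pattern documented for ONNX scoring in SQL Edge.

import pyodbc

connection_string = ("DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost,1433;"
                     "DATABASE=onnx;UID=sa;PWD=<password>")   # assumed
conn = pyodbc.connect(connection_string)
cur = conn.cursor()

# Store the pre-trained ONNX model as binary data in a models table.
with open("regression_model.onnx", "rb") as f:
    cur.execute("INSERT INTO dbo.models (id, data) VALUES (?, ?)", 1, f.read())
conn.commit()

# Score the rows of an input table with the stored model using PREDICT.
rows = cur.execute("""
    DECLARE @model VARBINARY(MAX) = (SELECT data FROM dbo.models WHERE id = 1);
    SELECT d.*, p.score
    FROM PREDICT(MODEL = @model, DATA = dbo.sensor_readings AS d, RUNTIME = ONNX)
    WITH (score FLOAT) AS p;""").fetchall()
for row in rows:
    print(row)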

The ML pipeline is a newer technology compared to traditional software development stacks, and such pipelines have generally stayed on-premises simply because of the latitude they allow in frameworks and development styles. Experimentation can also exceed the limits allowed for the free tier in the public cloud. In some cases, event processing systems such as Apache Spark and Kafka find it easier to replace the Extract-Transform-Load solutions that proliferated with data warehouses. Using SQL Edge avoids the need to perform ETL, and the machine learning models are the end products. They can be hosted in a variety of environments, not just the cloud or SQL Edge; some ML users prefer to load the model on mobile or edge devices. Experts agree that streaming data from edge devices can be heavy enough in traffic that a database system will outperform computing on the edge device itself: Internet TCP round trips are on the order of 250-300 milliseconds, whereas the ingestion rate for database processing can be upwards of thousands of events per second. These are some of the benefits of running machine learning within the database.

 

Thursday, December 23, 2021

A summary of the book “The Burnout Fix” written by Jacinta M. Jimenez

This is a book about how to overcome overwhelm, beat busy, and sustain success in the new world of work. One would think burnout is a problem whose solution is obvious, but there is no quick relief without systemic changes. The book recognizes burnout as a pervasive social problem in the United States. Jacinta M. Jimenez attributes it to factors that must be addressed by both the individual and the organization. The organization must treat the symptoms of burnout and work to prevent it altogether. The individual must foster resilience with her science-backed PULSE practices. These practices help one lead a more purpose-driven life and support team members' well-being.

We see right away that there is a shift of focus from grit to resilience. A hyperconnected world, along with an individual's own beliefs, pushes people to take on unsustainable volumes of work and remain constantly available to tackle tasks. When this goes on for too long, burnout results. Even if we work harder or smarter, neglecting to nurture a steady personal pulse makes even our successes short-lived.

There are five capabilities suggested to avoid burnout and lead to improvements in the following areas:

1. Behavioral – where we boost our professional and personal growth by developing a healthy performance pace.

2. Cognitive – where we rid ourselves of unhealthy thought patterns

3. Physical – where we embrace the power of leisure as a strategy to protect and restore the reserves of energy

4. Social – where we build a diverse network of social support to make ourselves more adaptable and improve our thinking.

5. Emotional – where, even when we cannot control our priorities or our time, we evaluate the effort we exert and take control of our energy.

Most people tackle their goals by breaking them into smaller concrete steps to help avoid cognitive and emotional exhaustion. There are three P's involved: 1. Plan, where we assess our skills and progress towards bigger goals using progress indicators; 2. Practice, where we commit to continuous learning by experimenting and receiving feedback while journaling our progress; and 3. Ponder, where we reflect on what worked and what did not.

We must reduce distracting thoughts and work towards mental clarity using three C's: 1. Curiosity, where we identify recurring thoughts and check whether they are grounded in reality; 2. Compassion, where we treat ourselves with kindness; and 3. Calibration, where we gauge when we need more self-compassion and when we need more information. We can cultivate this by stacking new habits onto existing ones, scheduling reminders to make mental space, breathing, writing down thoughts, moving away from binary thinking, sticking with self-compassion and being consistent.

We must prioritize leisure time. The ability to enjoy stress-free leisure time is essential to staying calm and centered.

We tend to focus on how fast others respond rather than on the quality of their responses. To give ourselves more space, we must practice three S's: 1. Silence, to escape the duress of devices, for instance by going on a meditation retreat; 2. Sanctuary, to spend time in nature and improve our mood; and 3. Solitude, to spend time by ourselves and slow down sensory input.

Social wellness is equally important, and it is often the first thing curtailed by the symptoms that precede burnout. When we feel we belong and can securely access support from a community, we reduce the stress on our brain and create conditions for improved productivity. This can be done with three B's: 1. Belonging, where we strengthen our sense of belonging by actively working to be more compassionate; 2. Breadth, where we create a visual map of our circles of support; and 3. Boundaries, where we reflect on our personal values.

Energy is finite so we must manage it carefully. We do this with three E’s: 1. Enduring principles where we determine what guides us in our current stage, 2. Energy expenditure where we assess how we spend our energy and 3. Emotional acuity where we resist the tendency to ignore our emotions. 

Similarly, we lead healthy teams by embracing 1. Agency, 2. Benevolence, and 3. Community. When leaders demonstrate and implement techniques to increase resilience, it percolates through the rank and file.


Wednesday, December 22, 2021

 

Azure Machine Learning provides an environment to create and manage the end-to-end life cycle of machine learning models. Unlike general-purpose software development, machine learning has significantly different requirements: a wide variety of technologies, libraries and frameworks; separate training and testing phases before a model is deployed and used; and iterations of model tuning independent of model creation and training. Azure Machine Learning's compatibility with open-source frameworks and platforms like PyTorch and TensorFlow makes it an effective all-in-one platform for integrating and handling data and models, which greatly relieves the onus on the business to develop new capabilities. Azure Machine Learning is designed for all skill levels, with advanced MLOps features as well as simple no-code model creation and deployment.

We will compare this environment with TensorFlow, but for those unfamiliar with the latter, here is a use case. A JavaScript application performs image recognition of hand-drawn sketches with a machine learning model. When enough training images have been processed, the model learns the characteristics of the drawings that correspond to their labels. Then, as it runs through the test data set, it can predict the label of a drawing using the model. TensorFlow includes the Keras API, which helps author the model, and the model can be trained in an environment such as Colab on a GPU. Once training is done, the model can be loaded and run anywhere else, including a browser. The power of TensorFlow is its ability to load the model and make predictions in the browser itself.

The labeling of drawings starts with a sample of, say, a hundred classes. The data for each class is available on Google Cloud as numpy arrays, with some number N of images per class. The dataset is pre-processed for training, where it is normalized and converted to batches, and the model outputs class probabilities.
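
A minimal loading sketch, assuming the per-class .npy files (28x28 grayscale bitmaps flattened to 784 values, as in the publicly available drawing datasets of this kind) have already been downloaded locally; the class names and the per-class cap are placeholders:

import numpy as np

class_names = ["cat", "dog", "apple"]              # placeholder subset of the ~100 classes
data, labels = [], []
for label, name in enumerate(class_names):
    drawings = np.load(f"data/{name}.npy")         # shape (N, 784): one flattened 28x28 image per row
    take = min(5000, len(drawings))                # cap N per class to keep the classes balanced
    data.append(drawings[:take])
    labels.append(np.full(take, label))

X = np.concatenate(data).reshape(-1, 28, 28, 1).astype("float32") / 255.0   # normalize pixel values
y = np.concatenate(labels)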

As with any machine learning example, the data is split into a 70% training set and a 30% test set. There is no order to the data, so the split is taken over a random shuffle.
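
A minimal 70/30 random split, continuing from the X and y arrays of the loading sketch above:

import numpy as np

rng = np.random.default_rng(seed=42)
indices = rng.permutation(len(X))      # there is no order to the data, so shuffle first
cut = int(0.7 * len(X))
X_train, y_train = X[indices[:cut]], y[indices[:cut]]
X_test, y_test = X[indices[cut:]], y[indices[cut:]]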

TensorFlow makes it easy to construct this model, for example with the TensorFlow Lite Model Maker, which can only present output after the model is trained. In this case, the model must be run after the training data has labels assigned, which might be done by hand. The model works better with fewer parameters. It might contain 3 convolutional layers and 2 dense layers. The pooling size is specified for each of the convolutional layers, and the layers are stacked onto the model. The model is compiled with a loss function, an optimizer such as tf.train.AdamOptimizer(), and a metric such as top-k categorical accuracy. The summary of the model can be printed for review. With a set number of epochs and batches, the model can be trained. Annotations help the TensorFlow Lite converter fuse the TF.Text API, and this fusion leads to a significant speedup over conventional models. The architecture can also be tweaked to include a projection layer along with the usual convolutional layer and attention encoder mechanism, which achieves similar accuracy with a much smaller model size. There is also native support for HashTables for NLP models.
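
A sketch of the kind of small convolutional model described above, written against the Keras API in TensorFlow 2.x (tf.train.AdamOptimizer is the TensorFlow 1.x name; tf.keras.optimizers.Adam is its modern equivalent). The filter counts, the dense layer width and the 100-class output are assumptions rather than the exact architecture:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, (3, 3), padding="same", activation="relu",
                           input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    tf.keras.layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(100, activation="softmax"),   # one unit per drawing class
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss="sparse_categorical_crossentropy",              # integer labels from the split above
    metrics=[tf.keras.metrics.SparseTopKCategoricalAccuracy(k=5)])

model.summary()                                          # print the model summary for review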

With the model and the training/test sets defined, it is now just as easy to evaluate the model and run inference. The model can also be saved and restored. It executes faster when a GPU is added to the computing environment.

The model is trained in batches of a predefined size, and the number of passes over the entire training dataset, called epochs, can also be set up front. A batch size of 256 and 5 epochs could be used. These are called model tuning parameters. Every model has a speed, a Mean Average Precision and an output; the higher the precision, the lower the speed. It is helpful to visualize training with a chart that is updated with the loss after each epoch. Usually there will be a downward trend in the loss, which is described as the model converging.
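
A training sketch with those tuning parameters, continuing from the model and the split above; the validation split and the use of matplotlib for the loss curve are assumptions (the article describes updating a chart with the loss after each epoch, and matplotlib stands in for that here):

import matplotlib.pyplot as plt

history = model.fit(X_train, y_train,
                    batch_size=256,        # batch size mentioned above
                    epochs=5,              # passes over the training set
                    validation_split=0.1)  # hold out a slice to watch for overfitting

plt.plot(history.history["loss"], label="training loss")
plt.plot(history.history["val_loss"], label="validation loss")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.show()                                 # a downward trend means the model is converging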

Training the model might take a long time, say about 4 hours. Once the test data has been evaluated, the model's effectiveness can be measured using precision and recall: precision is the fraction of the model's positive inferences that were indeed positive, and recall is the fraction of the actual positives that the model found.
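
Continuing from the sketches above, precision and recall over the test set could be computed as follows; scikit-learn and macro averaging over the classes are assumptions:

import numpy as np
from sklearn.metrics import precision_score, recall_score

y_pred = np.argmax(model.predict(X_test), axis=1)                # predicted class per drawing
precision = precision_score(y_test, y_pred, average="macro")     # of the predicted labels, how many were right
recall = recall_score(y_test, y_pred, average="macro")           # of the true labels, how many were found
print(f"precision={precision:.3f} recall={recall:.3f}")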

Azure Machine Learning has a drag-and-drop interface that can be used to train and deploy models. It uses a machine learning workspace to organize shared resources such as pipelines, datasets, compute resources, registered models, published pipelines, and real-time endpoints. A visual canvas helps build an end-to-end machine learning workflow: models are trained, tested and deployed all in the designer. Datasets and components can be dragged and dropped onto the canvas, a pipeline draft connects the components, and a pipeline run can be submitted using the resources in the workspace. Training pipelines can be converted to inference pipelines, and pipelines can be published so that new runs can be submitted with different parameters and datasets. A training pipeline can be reused for different models, and a batch inference pipeline can be used to make predictions on new data.
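
The designer is a visual tool, but the same workspace, pipeline and experiment concepts can also be driven from code. A minimal sketch with the Azure ML Python SDK (v1) follows; the workspace config file, the compute target name, the training script and the experiment name are all assumptions:

from azureml.core import Experiment, Workspace
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config()                     # reads config.json for an existing workspace

train_step = PythonScriptStep(
    name="train",
    script_name="train.py",                      # assumed training script
    source_directory="./src",
    compute_target="cpu-cluster")                # assumed compute cluster in the workspace

pipeline = Pipeline(workspace=ws, steps=[train_step])
run = Experiment(ws, "designer-equivalent").submit(pipeline)   # submit a pipeline run
published = pipeline.publish(name="training-pipeline")         # publish for reuse with new parameters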

Tuesday, December 21, 2021

This is a continuation of a series of articles on the operational engineering aspects of the Azure public cloud, including the most recent discussion on Azure Maps, a full-fledged, generally available service that provides Service Level Agreements similar to others in its category. In this article, we explore Azure Logic Applications.

Each logic app is a workflow that implements some process. This might be a system-to-system process, such as connecting two or more applications. Alternatively, it might be a user-to-system process, one that connects people with software and potentially has long delays. Logic Apps is designed to support either of these scenarios.

Azure Logic Applications is a member of Azure Integration Services. It simplifies the way legacy, modern and niche systems are connected across cloud, on-premises and hybrid environments, and the integrated solutions are very valuable for B2B scenarios. Integration services distinguish themselves with four common components in their design – APIs, Events, Messaging, and Orchestration. APIs are a prerequisite for interactions between services; they facilitate functional programmatic access as well as automation. For example, a workflow orchestration might implement a complete business process by invoking different APIs in different applications, each of which carries out some part of that process. Integrating applications commonly requires implementing all or part of a business process. It can involve connecting a software-as-a-service implementation such as Salesforce CRM, updating on-premises data stored in SQL Server and Oracle databases, and invoking operations in an external application. These translate to specific business purposes and custom logic for orchestration. Many backend operations are asynchronous by nature and require background processing. Even when APIs are written for asynchronous processing, long-running API calls are not easily tolerated, so some form of background processing is required; situations like this call for a message queue. Events facilitate the publisher-subscriber pattern so that polling a queue for messages can be avoided. For example, Event Grid supports subscribers: rather than requiring a receiver to poll for new messages, the receiver registers an event handler for the event source it is interested in, and Event Grid invokes that handler when the specified event occurs. Azure Logic applications are workflows, and a workflow can easily span all four of these components during its execution.

Azure Logic Applications can be multi-tenant. It is easier to write the application as multi-tenant when we create a workflow from the templates gallery. These templates range from simple connectivity for software-as-a-service applications to advanced B2B solutions. Multi-tenancy means there is a shared, common infrastructure serving numerous customers simultaneously, leading to economies of scale.

Monday, December 20, 2021

 

This is a continuation of a series of articles on the operational engineering aspects of the Azure public cloud, including the most recent discussion on Azure Maps, a full-fledged, generally available service that provides Service Level Agreements similar to others in its category. In this article, we explore Azure SQL Edge.

SQL Edge is an optimized relational database engine geared towards edge computing. It provides a high-performance data storage and processing layer for IoT applications. It can stream, process and analyze data that ranges from relational to document, graph and time-series, which makes it a good choice for a variety of modern IoT applications. It is built on the same database engine as SQL Server and Azure SQL, so applications can seamlessly reuse queries written in T-SQL. This makes applications portable between devices, datacenters and the cloud.

Azure SQL Edge uses the same streaming capabilities as Azure Stream Analytics on IoT Edge. This native implementation of data streaming is called T-SQL streaming, and it can handle fast streaming from multiple data sources. A T-SQL streaming job consists of a stream input that defines the connection to the data source to read the stream from, a stream output that defines the connection to the data source to write the stream to, and a stream query that defines the transformations, aggregations, filtering, sorting and joins to be applied to the input stream before it is written to the stream output.

Data can be transferred in and out of SQL Edge. For example, data can be synchronized from SQL Edge to Azure Blob storage by using Azure Data Factory. As with all SQL instances, the client tools help create the database and the tables, and SqlPackage.exe is used to create and apply a DAC package file to the SQL Edge container. A stored procedure or trigger is used to update the watermark levels for a table. A watermark table stores the last timestamp up to which data has already been synchronized with Azure Storage, and the stored procedure is run after every synchronization. A Data Factory pipeline is used to synchronize data to Azure Blob storage from a table in Azure SQL Edge; it is created using the Data Factory user interface, and the PeriodicSync property must be set at the time of creation. A lookup activity is used to get the old watermark value. A dataset is created to represent the data in the watermark table; this table contains the old watermark that was used in the previous copy operation. A new linked service is created to source the data from the SQL Edge server using connection credentials, and once the connection is tested it can be used to preview the data to eliminate surprises during synchronization. The pipeline editor is a designer tool where the WatermarkDataset is selected as the source dataset. A second lookup activity gets the new watermark value from the table that contains the source data, so that only the delta is copied to the destination. A query can be added in the pipeline editor to select the maximum value of the timestamp from the source table; only the first row is taken as the new watermark. Incremental progress is maintained by continually advancing the watermark. Not only the source but also the sink must be specified in the editor; the sink uses a new linked service to the blob storage. The success output of the Copy activity is connected to a Stored Procedure activity, which then writes the new watermark. Finally, the pipeline is scheduled to be triggered periodically.
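
A sketch of the watermark bookkeeping on the SQL Edge side is shown below. The table and procedure names, the telemetry source table and its EventTime column are hypothetical; the Data Factory lookup, copy and stored-procedure activities themselves are configured in the pipeline designer rather than in code.

import pyodbc

connection_string = ("DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost,1433;"
                     "DATABASE=EdgeDb;UID=sa;PWD=<password>")   # assumed
conn = pyodbc.connect(connection_string)
cur = conn.cursor()

# Watermark table: stores the last timestamp already synchronized to Azure Storage.
cur.execute("""
    IF OBJECT_ID('dbo.watermarktable') IS NULL
        CREATE TABLE dbo.watermarktable (TableName SYSNAME, WatermarkValue DATETIME2);""")

# Stored procedure called by the pipeline's stored-procedure activity after each copy.
cur.execute("""
    CREATE OR ALTER PROCEDURE dbo.usp_update_watermark
        @LastModifiedTime DATETIME2, @TableName SYSNAME
    AS
    BEGIN
        UPDATE dbo.watermarktable
        SET WatermarkValue = @LastModifiedTime
        WHERE TableName = @TableName;
    END""")
conn.commit()

# Query the lookup activity would use to compute the new watermark from the source table.
new_watermark_query = "SELECT MAX(EventTime) AS NewWatermarkValue FROM dbo.telemetry"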

Sunday, December 19, 2021

This is a continuation of a series of articles on the operational engineering aspects of the Azure public cloud, including the most recent discussion on Azure Maps, a full-fledged, generally available service that provides Service Level Agreements similar to others in its category. In this article, we explore Azure SQL Edge.

Edge computing has developed differently from mainstream desktop, enterprise and cloud computing. The focus has always been on speed rather than heavy data processing, which is delegated to the core or the cloud. Edge servers work well for machine data collection and the Internet of Things. Edge computing is typically associated with the event-driven architecture style and relies heavily on asynchronous backend processing. Some form of message broker becomes necessary to maintain ordering between events, retries and dead-letter queues.

SQL Edge is an optimized relational database engine geared towards edge computing. It provides a high-performance data storage and processing layer for IoT applications. It can stream, process and analyze data that ranges from relational to document, graph and time-series, which makes it a good choice for a variety of modern IoT applications. It is built on the same database engine as SQL Server and Azure SQL, so applications can seamlessly reuse queries written in T-SQL. This makes applications portable between devices, datacenters and the cloud.

Azure SQL Edge supports two deployment modes: connected, through Azure IoT Edge, and disconnected. The connected deployment requires Azure SQL Edge to be deployed as a module for Azure IoT Edge. In the disconnected deployment mode, it can be deployed as a standalone Docker container or on a Kubernetes cluster.

There are two editions of Azure SQL Edge – a developer edition and a production SKU – and the specification goes from 4 cores and 32 GB of memory to 8 cores and 64 GB. Azure SQL Edge uses the same streaming capabilities as Azure Stream Analytics on IoT Edge. This native implementation of data streaming is called T-SQL streaming, and it can handle fast streaming from multiple data sources. The patterns and relationships extracted from several IoT input sources can be used to trigger actions, alerts and notifications. A T-SQL streaming job consists of a stream input that defines the connection to the data source to read the stream from, a stream output that defines the connection to the data source to write the stream to, and a stream query that defines the transformations, aggregations, filtering, sorting and joins to be applied to the input stream before it is written to the stream output.

SQL Edge also supports machine learning models by integrating with the Open Neural Network Exchange (ONNX) runtime. The models are developed independently of the edge but can be run on the edge.

 

 

Saturday, December 18, 2021

Azure Maps and heatmaps

This is a continuation of a series of articles on the operational engineering aspects of the Azure public cloud. In this article, we continue the discussion on Azure Maps, a full-fledged, generally available service that provides Service Level Agreements similar to others in its category. We focus on one of the features of Azure Maps: the ability to overlay images and heat maps.

Azure Maps is a collection of geospatial services and SDKs that fetches the latest geographic data and provides it as context to web and mobile applications. Specifically, it provides:

1. REST APIs to render vector and raster maps as overlays, including satellite imagery.

2. Creator services to enable indoor map data publication.

3. Search services to locate addresses, places, and points of interest given indoor and outdoor data.

4. Various routing options such as point-to-point, multipoint, multipoint optimization, isochrone, electric vehicle, commercial vehicle, traffic-influenced, and matrix routing.

5. Traffic flow view and incidents view for applications that require real-time traffic information.

6. Time zone and geolocation services.

7. Elevation services with a Digital Elevation Model.

8. A geofencing service and mapping data storage, with location information hosted in Azure.

9. Location intelligence through geospatial analytics.

The Web SDK for Azure Maps provides several features through its map control. We can create a map, change its style, add controls and layers, add HTML markers, show traffic, cluster point data, use data-driven style expressions and image templates, react to events, and make the app accessible.

Heatmaps are also known as point density maps because they represent the density of data, and the relative density of each data point, using a range of colors. This can be overlaid on the map as a layer. Heat maps can be used in different scenarios, including temperature data, data from noise sensors, and GPS traces.

Adding a heat map layer is as simple as:

map.layers.add(new atlas.layer.HeatMapLayer(datasource, null, { radius: 10, opacity: 0.8 }), 'labels');

The opacity or transparency is normalized between 0 and 1. The intensity is a multiplier on the weight of each data point, and the weight is a measure of how many times the data point should be counted on the map.

Azure Maps supports a consistent, zoomable heat map: as the map is zoomed out, the data points aggregate together, so the heat map can look different from the way it did at the original zoom level. Scaling the radius so that it doubles with each zoom level produces a heat map that renders consistently across zoom levels.

All of this processing happens on the client side when rendering the given data points.