Tuesday, March 14, 2023

Shrinking budgets pose a tremendous challenge to organizations' digital transformation initiatives and cloud adoption roadmaps. Technology decision makers must decide what to do with the legacy applications that proliferated before the pandemic. There are three main choices: maintain the status quo and do nothing, migrate and modernize the applications to a cloud-based environment, or rewrite and replace them. The last option might be tempting given the capabilities introduced by both AWS and Azure and a refreshed knowledge base about the application to be transformed, but both clouds have also driven down the cost of lift-and-shift migrations.

As a specific example, significant cost savings can be achieved just by migrating legacy ASP.NET applications from on-premises to the cloud. Traditional .NET applications are well poised for migration by virtue of the .NET runtime on which they run. Azure claims savings of up to 54% over running applications on-premises and 35% over running them on AWS, according to its published reports. Streamlined operations, simplified administration and proximity are additional benefits. Built-in tooling in Visual Studio and SQL Server eases the migration of applications and databases respectively.

One of the key differences between migrating to either public cloud is Azure's Hybrid Benefit offering. Hybrid Benefit is a licensing offer that eases migration to Azure by applying existing Windows Server and SQL Server licenses, as well as Linux subscriptions, toward Azure pricing, which can realize substantial cost savings. Additionally, services like Azure Arc help bring Azure Kubernetes Service to Azure Stack HCI, a hyperconverged clustering solution for running virtualized workloads on-premises, which makes it easy to consolidate aging infrastructure and connect to Azure for cloud services.

Another difference between the two public clouds is Azure's Total Cost of Ownership (TCO) calculator. The TCO calculator helps to understand the cost areas that affect current applications today, such as server hardware, software licenses, electricity and labor. It recommends a set of equivalent Azure services that will support the applications. The analysis shows each cost area with an estimate of on-premises spending versus spending in Azure. Several cost categories either decrease or go away completely when workloads move to the cloud. Finally, it helps to create a customized business case to justify migration to Azure. All it takes is three steps: enter a few details about the current infrastructure, review the assumptions and receive a summary with supporting analysis.
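
The comparison the calculator performs can be approximated in a few lines of code. The sketch below is a simplified illustration, not the calculator itself; the cost categories and dollar amounts are hypothetical placeholders.

```python
# Simplified TCO comparison: annual on-premises spend vs. estimated cloud spend.
# All figures are hypothetical; a real estimate would come from the TCO calculator.
on_prem = {
    "server_hardware": 42000,
    "software_licenses": 18000,
    "electricity": 6000,
    "datacenter_space": 9000,
    "it_labor": 55000,
}

azure = {
    "compute": 36000,
    "storage": 4000,
    "networking": 2500,
    "it_labor": 30000,   # administration that remains after migration
    # electricity, space and hardware refresh drop out entirely
}

def summarize(label: str, costs: dict) -> int:
    total = sum(costs.values())
    print(f"{label:12} total: ${total:,}")
    for category, amount in sorted(costs.items(), key=lambda kv: -kv[1]):
        print(f"  {category:18} ${amount:,}")
    return total

savings = summarize("On-premises", on_prem) - summarize("Azure", azure)
print(f"Estimated annual savings: ${savings:,}")
```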

The only limitation that an organization faces is one that is self-imposed. Organizations and large departments may be averse to letting their employees increase the cloud budget beyond, say, a thousand dollars a month. This is not the only gap. Business owners cite that existing channels of supply and demand are becoming savvy in their competition with the cloud, while architects do not truly enforce the practices needed to keep overall cloud computing expenses under a limit. Employees and resource users are secured by role-based access control, but the privilege to manage subscriptions is granted to those same users, which allows them to escalate costs disproportionately.

When this is overcome, the benefits outweigh the costs and apprehension.

Monday, March 13, 2023

Continuous event analysis for Fleet Management software:

Use case: Continuous events from fleet management operations involve data such as geospatial telemetry from driverless vehicles, weblogs for clickstream analytics and point-of-sale records for inventory control. The real-time fleet management of a station involves routing incoming vehicles through the station and scheduling their departures, with the objective of optimizing the punctuality and regularity of transit service. The purpose is to develop an automated vehicle traffic control system. The scheduling problem is modeled as a bicriteria job-shop scheduling problem with additional constraints. There are two objective functions in lexicographic order: first, the minimization of tardiness/earliness, and second, headway optimization. The problem is solved in two steps: a heuristic builds a feasible solution by considering the first objective function, and then the regularity is optimized, as sketched below. This approach also works well for simulated data at a station. This article investigates the use of a data pipeline and cloud native resources for the management of a fleet.
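
A minimal sketch of the two-step, lexicographic idea follows. It is not the published heuristic; the vehicle data, the greedy tardiness pass and the headway-smoothing pass are hypothetical simplifications that only illustrate optimizing the first objective before the second.

```python
# Two-step lexicographic scheduling sketch (hypothetical data and rules):
# step 1 greedily minimizes tardiness/earliness against target departure times,
# step 2 nudges departures apart to even out headways without reordering step 1.
from dataclasses import dataclass

@dataclass
class Vehicle:
    vid: str
    arrival: int            # minutes past the hour
    target_departure: int

MIN_DWELL = 2                # minimum minutes a vehicle spends in the station
MIN_HEADWAY = 4              # desired minutes between consecutive departures

def step1_minimize_tardiness(vehicles):
    """Schedule each vehicle as close to its target departure as feasibility allows."""
    schedule = []
    last_departure = -10**9
    for v in sorted(vehicles, key=lambda v: v.target_departure):
        earliest = max(v.arrival + MIN_DWELL, last_departure + 1)
        departure = max(v.target_departure, earliest)
        schedule.append((v.vid, departure))
        last_departure = departure
    return schedule

def step2_optimize_headway(schedule):
    """Keep the step-1 order, but push departures apart to the desired headway."""
    smoothed, last = [], -10**9
    for vid, departure in schedule:
        departure = max(departure, last + MIN_HEADWAY)
        smoothed.append((vid, departure))
        last = departure
    return smoothed

vehicles = [Vehicle("V1", 0, 5), Vehicle("V2", 1, 6), Vehicle("V3", 3, 6)]
print(step2_optimize_headway(step1_minimize_tardiness(vehicles)))
```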

Implementing a data pipeline:

The example taken here refers to the Azure public cloud in order to point to specific products and principles, but any equivalent public cloud resources can be used. There is a point of ingestion from data sources, typically via Azure Event Hubs, IoT Hub, or Blob storage. Event ordering options and time windows can be suitably adjusted to perform aggregations. The query language is SQL, and it can be extended with JavaScript or C# user-defined functions. Queries written in SQL are easy to apply to filtering, sorting and aggregation. Open-source stream analytics software such as Apache Flink also provides SQL-like querying in addition to the structured query operations familiar from collections and per-event processing methods.

The topology between ingestion and delivery is handled by this stream analytics service, while extensions are allowed with the help of reference data stores, Azure Functions, and real-time scoring via machine learning services. Event Hubs, Azure Blob storage and IoT hubs can collect data on the ingestion side, while the results are distributed after analysis via alerts and notifications, dynamic dashboarding, data warehousing and storage/archival. The fan-out of data to different services is itself a value addition, but the ability to transform events into processed events also generates more possibilities for downstream usage, including reporting and visualization.

As with all the services in the Azure portfolio, a data pipeline comes with standard deployment using Azure Resource Manager templates, health monitoring via Azure Monitor, billing and usage data that can help drive down costs, and various programmability options such as SDKs, REST-based APIs, command-line interfaces and PowerShell automation. It can be offered as a fully managed PaaS offering, so the infrastructure and workflow initializers need not be set up by hand for most deployments. It can also run directly in the cloud, rather than on infrastructure such as Kubernetes hosted in the cloud, and scale to many events with relatively low latency. Such a cloud native continuous-event fleet management service can be not only production ready but also reliable in mission-critical deployments. Security and compliance are not sacrificed for the sake of performance, in keeping with the best practices of cloud resources.
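
As a concrete illustration of the kind of windowed aggregation such a pipeline performs, the sketch below groups simulated fleet telemetry into fixed (tumbling) time windows in plain Python. It is only an approximation of what the stream analytics SQL would express; the event fields and the 60-second window size are hypothetical.

```python
# Tumbling-window aggregation over simulated fleet telemetry (hypothetical schema).
# A stream analytics job would express the same idea declaratively in SQL,
# e.g. grouping by vehicle_id over a 60-second tumbling window.
from collections import defaultdict

events = [
    # (epoch_seconds, vehicle_id, speed_kmh)
    (0, "V1", 42), (15, "V1", 47), (30, "V2", 35),
    (65, "V1", 50), (70, "V2", 38), (125, "V2", 40),
]

WINDOW_SECONDS = 60

def tumbling_average_speed(events):
    buckets = defaultdict(list)
    for ts, vehicle_id, speed in events:
        window_start = (ts // WINDOW_SECONDS) * WINDOW_SECONDS
        buckets[(window_start, vehicle_id)].append(speed)
    return {key: sum(vals) / len(vals) for key, vals in sorted(buckets.items())}

for (window_start, vehicle_id), avg in tumbling_average_speed(events).items():
    print(f"window starting {window_start:>3}s  {vehicle_id}: avg speed {avg:.1f} km/h")
```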

Sunday, March 12, 2023

MySQL managed instance in the cloud:

Organizations planning to switch to the cloud often find a suite of small-scale monitoring applications that need to be migrated. These are small applications that typically persist state in a MySQL backend store. The choices available to them include re-host, re-platform or re-architect.

Monitoring applications are usually written with the intent to monitor resources continuously, catch issues before they become a bottleneck for the operations, understand what is going on and why and prepare contingency plans beforehand.

A simple sample monitoring application, when deployed to a public cloud, usually has the following topology. It has entry points via a load balancer for the frontend, which is accessible over the internet for customers, and a CLI/Cloud Shell for administrators. These entry points reach resources that are deployed within a VNet spanning the web tier and the data tier. There can be load balancers in front of those tiers because they help spread out the traffic for high availability and low latency. The data tier might consist of a flexible MySQL server that uses a read replica.
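
One implication of that topology is that application code can direct writes to the primary MySQL server and reads to the replica. The sketch below, assuming the PyMySQL client and hypothetical hostnames, credentials and table, shows one simple way to do that split.

```python
# Read/write splitting against a MySQL flexible server with a read replica.
# Hostnames, credentials and the table are hypothetical placeholders.
import pymysql

def connect(host):
    return pymysql.connect(host=host, user="monitor_app",
                           password="example-password", database="monitoring")

primary = connect("mysql-primary.example.internal")   # handles all writes
replica = connect("mysql-replica.example.internal")   # serves read-only queries

with primary.cursor() as cur:
    cur.execute("INSERT INTO heartbeats (resource_id, status) VALUES (%s, %s)",
                ("web-01", "healthy"))
primary.commit()

with replica.cursor() as cur:
    cur.execute("SELECT resource_id, status FROM heartbeats ORDER BY id DESC LIMIT 5")
    for row in cur.fetchall():
        print(row)
```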

When we modernize an existing application, we can ease the move to the cloud while gaining the full promise of cloud technology. With a cloud native microservice approach, the scalability and flexibility inherent to the cloud can be taken advantage of. Modernizing applications to be cloud native enables them to run concurrently and connect seamlessly with existing investments. Barriers that prohibit productivity and integration are removed.

One of the tenets of modernizing is "build once and deploy on any cloud". The process begins with assessing the existing application, then building it quickly, automating deployments for productivity, and running and managing the modernized application consistently.

Identifying applications that can be readily moved into the cloud platform and those that require refactoring is the first step because the treatments of lift-and-shift and refactoring are quite different. Leveraging containers as the foundation for applications and services is another aspect.

Automating deployments with a DevOps pipeline makes delivery quick and reliable. A common management approach that consolidates operations for all applications ensures faster problem resolution.

When application readiness is assessed, there are four tracks of investigation: cloud migration, cost reduction, agile delivery and innovation. These result in VMs in the cloud for migration purposes, or containers for repackaging, re-platforming and refactoring respectively, all in the build phase of build-deploy-run. While VMs are handled by migration accelerators in the deploy phase, containers are handled by modern DevOps pipelines. In the run phase, the modern application runtimes for containers also differ from the common operations on VMs, distinguishing the migration and modernization paths. Finally, migration results in a complex, relocated traditional application, whereas modernization results in a traditional application via repackaging, a cloud-ready application via re-platforming, or a cloud native application via refactoring.


Saturday, March 11, 2023

Drone Fleet Management communications

One of the major concerns to address in autonomous fleet management is maintaining connectivity between drone networks. If all the drones in a single swarm can reach a base station or a central satellite system in a single hop, then communication is optimal in terms of the size, count and relaying of messages. This might not scale, however, because of the requirement for a single base station. A multi-hop ad hoc network infrastructure provides an alternative that can scale and reduce costs. A large area can be covered with drones when they have a reliable and continuous connection.

One of the methods of strengthening connectivity in a wireless network is the deployment of k-connected networks. The network must be at least 1-connected for nodes to communicate with each other. In a 1-connected network, the malfunction of a single node can break the network. In a 2-connected network there is some redundancy: it takes the malfunction of two nodes to split it into parts that cannot communicate with each other. Following this pattern, at least k nodes must fail for connectivity to be lost in a k-connected network. Any node can have an arbitrary number of connections with other nodes, but it takes the failure of only k nodes to split the network. A break in connectivity can be fixed by adding new nodes or moving existing nodes so that the current k-value of the network is maintained as much as possible.


Border Gateway Protocol sets a precedent for full-mesh network connectivity. It is an inter-autonomous-system routing protocol designed for TCP/IP internets. All BGP speakers within a single autonomous system must be fully meshed so that any external routing information is redistributed to all other routers within that autonomous system. Azure VNet peering provides another precedent for a full-mesh network.

A graph can be used to represent the drone network. The graph is considered connected if each vertex has a path to every other vertex. The minimum number of vertices whose removal disconnects the graph is the vertex connectivity of the graph. A k-connected graph has a vertex connectivity of at least k.
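
For small networks, vertex connectivity can be checked by brute force: remove every subset of fewer than k vertices and verify that the remaining graph stays connected. The sketch below uses only the Python standard library; the example edge list is hypothetical.

```python
# Brute-force check that a small drone network graph is k-connected:
# no removal of fewer than k vertices may disconnect the remaining graph.
from itertools import combinations

def is_connected(vertices, edges):
    if not vertices:
        return True
    adjacency = {v: set() for v in vertices}
    for u, w in edges:
        if u in adjacency and w in adjacency:
            adjacency[u].add(w)
            adjacency[w].add(u)
    seen, stack = set(), [next(iter(vertices))]
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(adjacency[v] - seen)
    return seen == set(vertices)

def is_k_connected(vertices, edges, k):
    if len(vertices) <= k:
        return False
    for size in range(k):                      # remove every subset of fewer than k vertices
        for removed in combinations(vertices, size):
            remaining = set(vertices) - set(removed)
            kept = [(u, w) for u, w in edges if u in remaining and w in remaining]
            if not is_connected(remaining, kept):
                return False
    return True

# Hypothetical 5-drone mesh: a cycle with one chord.
vertices = {"d1", "d2", "d3", "d4", "d5"}
edges = [("d1", "d2"), ("d2", "d3"), ("d3", "d4"), ("d4", "d5"), ("d5", "d1"), ("d1", "d3")]
print(is_k_connected(vertices, edges, 2))   # True: no single drone failure splits the network
```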


Drone data is significant for analysis because it provides an unparalleled vantage point. It follows that decision science using this data can be further improved with data mining. Some of these improvements are captured in a comparison table of well-known data mining algorithms.


Friday, March 10, 2023


Some more notes about contemporary fleet management software follow the previous post.

One problem space in fleet management deserves special mention: a fleet composed of vehicles with varying purposes that require a great deal of maintenance. Agricultural fleet management is often considered a concern for farmers or machine contractors. It involves resource allocation, scheduling, routing, and real-time monitoring of vehicles and materials. To optimize this management task, fleet management tools are used for decision support to improve scheduling, routing, and other operational measures for a fleet of agricultural machines. Additionally, this fleet management involves the process of supervising the use and maintenance of machines. The scheduling and routing problems are also heterogeneous. Since the fleet is deployed for agricultural productivity, operational efficiency is a suitable metric; it measures the ratio between the actual in-field productivity and the maximum theoretical productivity defined by the maximum operating speed and the maximum working width. It is important to maintain a high efficiency because non-productive time elements account for a greater proportion of the loss in potential machine production. Tractors, combine harvesters and other machinery radically changed the nature of field operations towards more automation, both in terms of technology and management measures. A combination of factors, such as the shift from larger to smaller machines and more intelligent robotics, has introduced new capabilities such as establishing and nurturing plants at an individual level. This opportunity to modernize large and small machines in this domain comes with a new requirement for scheduling, monitoring and on-line coordination of multiple vehicles.
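
The operational efficiency metric mentioned above is simple to compute once the theoretical capacity is known. A small sketch follows with hypothetical field and machine figures; the formula is the ratio of actual in-field productivity to the theoretical maximum defined by maximum operating speed and working width.

```python
# Operational efficiency of a field operation (hypothetical numbers).
# Theoretical capacity (area/hour) = max operating speed * max working width.
max_speed_kmh = 8.0          # maximum operating speed
max_width_m = 6.0            # maximum working width
field_area_ha = 12.0         # area actually covered
elapsed_hours = 3.5          # total time including turns, stops and refills

theoretical_capacity_ha_per_h = (max_speed_kmh * 1000 * max_width_m) / 10_000
actual_capacity_ha_per_h = field_area_ha / elapsed_hours
operational_efficiency = actual_capacity_ha_per_h / theoretical_capacity_ha_per_h

print(f"theoretical capacity:   {theoretical_capacity_ha_per_h:.2f} ha/h")
print(f"actual capacity:        {actual_capacity_ha_per_h:.2f} ha/h")
print(f"operational efficiency: {operational_efficiency:.0%}")
```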

The fleet management problem differs from drones in a warehouse by virtue of the interactivity required from the farmer or machine operators. This requires real-time asset management that focuses on current fleet locations and the prediction of planned tasks. One of the challenges facing this problem space is low general user acceptance, which has even inhibited the adoption of fleet management software in agriculture.

Agricultural fleets often involve operations by teams of identical machines, co-operative machines, or machines co-operating with laborers. Farmers require optimized decision making regarding resource allocation, scheduling, routing, real-time monitoring of vehicles and materials, and timely field operations or customer orders. Transport control, route guidance in connection with visiting customers, invoicing, data acquisition and other such operations are in focus. Farmers also voice requests for more on-farm functionality, such as on-line monitoring and routing.

Thursday, March 9, 2023


This article surveys the contemporary automation and software available from the industry for fleet management.

Most usages of fleet management software, including the open-source options mentioned previously, are in food delivery, emergency services, utility companies, construction, landscaping, public transportation, and courier and package delivery services. Users rely on this software to reduce labor and fuel costs, remain in compliance with state and federal regulations, locate and track fleet vehicles, manage vehicle maintenance, improve fleet and driver safety and optimize cost savings.

Commercial fleet management software like Autosist is primarily a fleet inventory management system, with a platform that charges per asset per month when paid annually. They offer flexible pricing options regardless of fleet size, which entices small businesses. With a focus on process automation, commercial software provides proprietary software resources and policies for fleet management.

These software applications make it easy to keep track of drivers in the field, plan routes as efficiently as possible to save on fuel costs and stay one step ahead of maintenance tasks. They support GPS vehicle tracking, timesheet tracking, shift and route assignments and group messaging with drivers.

Some are built directly on the cloud. For example, Samsara is a cloud-based fleet management solution that offers features such as GPS tracking, trailer tracking, dashboard cameras, routing and dispatch, and reefer monitoring. It helps companies track the physical location of their fleets and monitor their drivers’ behavior to stay compliant with electronic logging device (ELD) and Federal Motor Carrier Safety Administration (FMCSA) regulations, which encourage a safer work environment for commercial motor vehicle drivers and truck drivers.

FleetIO, for instance, automates multiple complex management operations, including asset life cycles, fuel efficiency, safety reports, and documents associated with the vehicle such as fuel and service receipts, titles and insurance cards. Cross-platform access and programmability are key to an intermodal network. Webhooks and APIs help synchronize data between disparate networks. There is also a commenting, photo and notification add-on, which allows for instant feedback.

Companies like Azuga even provide hardware for fleet managers to install on their vehicles, which tracks equipment and driver behavior. Azuga believes in creating healthy competition among drivers and applies gamification in its driver rewards program, rewarding drivers often to prevent churn.

Tracking and alerts are another feature of such commercial software. Some provide real-time tracking while others provide slower refreshes but with detailed alerts.

Wednesday, March 8, 2023

 Data in motion – IoT solution and data replication

The transition of data from edge sensors to the cloud is a data engineering pattern that does not always get a proper resolution from the boilerplate event-driven architectural design proposed by the public clouds, because much of the fine-tuning is left to the choice of resources, event hubs and infrastructure involved in streaming the events. This article explores the design and data-in-motion considerations for an IoT solution, beginning with an introduction to the public cloud's proposed design, the choices between products, and the considerations for handling and tuning distributed, real-time data streaming systems, with particular emphasis on data replication for business continuity and disaster recovery. A sample use case is the continuous events for geospatial analytics in fleet management, whose data can include driverless vehicle weblogs.

Event Driven architecture consists of event producers and consumers. Event producers are those that generate a stream of events and event consumers are ones that listen for events. The right choice of architectural style plays a big role in the total cost of ownership for a solution involving events.

The scale out can be adjusted to suit the demands of the workload and the events can be responded to in real time. Producers and consumers are isolated from one another. IoT requires events to be ingested at very high volumes. The producer-consumer design has scope for a high degree of parallelism since the consumers are run independently and in parallel, but they are tightly coupled to the events. Network latency for message exchanges between producers and consumers is kept to a minimum. Consumers can be added as necessary without impacting existing ones.

Some of the benefits of this architecture include the following: The publishers and subscribers are decoupled. There are no point-to-point integrations. It's easy to add new consumers to the system. Consumers can respond to events immediately as they arrive. They are highly scalable and distributed. There are subsystems that have independent views of the event stream.

Some of the challenges faced with this architecture include the following: event loss is tolerated, so guaranteed delivery poses a challenge, and IoT traffic mandates guaranteed delivery. Some scenarios also require events to be processed in exactly the order they arrive. Each consumer type typically runs in multiple instances for resiliency and scalability, which can pose a challenge if the processing logic is not idempotent or the events must be processed in order.

The benefits and the challenges suggest some best practices. Events should be lean and mean, not bloated; services should share only IDs and/or a timestamp. Large data transfers between services are an antipattern. Loosely coupled, event-driven systems are best.
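
A minimal sketch of these practices in plain Python follows: a producer publishes lean events carrying only an ID and a timestamp onto a queue, and a decoupled consumer fetches the full payload itself. The event shape, queue and lookup are hypothetical.

```python
# Lean events: publish only an ID and timestamp; consumers fetch details on demand.
import queue
import time

event_bus = queue.Queue()            # stands in for an event hub / broker topic

def publish_vehicle_event(vehicle_id: str) -> None:
    event_bus.put({"vehicle_id": vehicle_id, "ts": time.time()})   # no bulky payload

def consume(lookup) -> None:
    while not event_bus.empty():
        event = event_bus.get()
        details = lookup(event["vehicle_id"])    # consumer pulls the large record itself
        print(f"{event['vehicle_id']} @ {event['ts']:.0f}: {details}")

vehicle_db = {"V1": {"status": "en route", "speed_kmh": 44}}
publish_vehicle_event("V1")
consume(vehicle_db.get)
```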

IoT solutions can be proposed either with an event-driven stack involving open-source technologies or via a dedicated and optimized storage product, such as a relational engine geared towards edge computing. Either way, modern IoT applications expect capabilities to stream, process and analyze data. IoT systems vary in flavor and size; not all IoT systems have the same certifications or capabilities.

When these IoT resources are shared, the isolation model, the impact of scaling on performance, state management and security of the IoT resources become complex. Scaling resources helps meet the changing demand from a growing number of consumers and an increase in the amount of traffic, and we might need to increase the capacity of the resources to maintain an acceptable performance rate. Scaling depends on the number of producers and consumers, payload size, partition count, egress request rate, and usage of IoT hub capture, schema registry and other advanced features. When additional IoT capacity is provisioned or a rate limit is adjusted, the multitenant solution can perform retries to overcome transient failures from requests. When the number of active users reduces or there is a decrease in traffic, the IoT resources can be released to reduce costs. Data isolation depends on the scope of isolation. When the storage for IoT is a relational database server, the IoT solution can make use of IoT Hub. Varying levels and scopes of sharing of IoT resources demand simplicity from the architecture. Patterns such as the deployment stamp pattern, the IoT resource consolidation pattern and the dedicated IoT resources pattern help to optimize operational cost and management with little or no impact on usage.
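
The retries mentioned above are usually implemented with exponential backoff so that transient failures do not overwhelm newly provisioned resources. A generic sketch follows; the operation being retried and the backoff parameters are hypothetical.

```python
# Retry with exponential backoff and jitter for transient failures (hypothetical operation).
import random
import time

class TransientError(Exception):
    """Stands in for throttling or temporary unavailability."""

def send_with_retries(send, payload, max_attempts=5, base_delay=0.5):
    for attempt in range(1, max_attempts + 1):
        try:
            return send(payload)
        except TransientError:
            if attempt == max_attempts:
                raise
            delay = base_delay * (2 ** (attempt - 1)) + random.uniform(0, 0.1)
            time.sleep(delay)     # back off before the next attempt

def flaky_send(payload):
    if random.random() < 0.5:
        raise TransientError("throttled")
    return f"accepted: {payload}"

print(send_with_retries(flaky_send, {"device": "sensor-42", "temp_c": 21.5}))
```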

Edge computing relies heavily on asynchronous backend processing. Some form of message broker becomes necessary to maintain ordering between events, retries and dead-letter queues. The storage for the data must follow data partitioning guidance, where the partitions can be managed and accessed separately. Horizontal, vertical and functional partitioning strategies must be suitably applied. In the analytics space, a typical scenario is to build solutions that integrate data from many IoT devices into a comprehensive data analysis architecture to improve and automate decision making.
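
As a small illustration of horizontal partitioning, the sketch below routes device records to one of several partitions by hashing the device ID, so each partition can be stored and scaled independently. The partition count and record layout are hypothetical.

```python
# Horizontal partitioning (sharding) of device telemetry by device ID (hypothetical layout).
import hashlib

PARTITION_COUNT = 4
partitions = {i: [] for i in range(PARTITION_COUNT)}

def partition_for(device_id: str) -> int:
    digest = hashlib.sha256(device_id.encode()).hexdigest()
    return int(digest, 16) % PARTITION_COUNT     # stable assignment across restarts

def write(record: dict) -> None:
    partitions[partition_for(record["device_id"])].append(record)

for device in ("truck-01", "truck-02", "drone-07", "sensor-99"):
    write({"device_id": device, "reading": 1.0})

for pid, rows in partitions.items():
    print(pid, [r["device_id"] for r in rows])
```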

Event Hubs, blob storage, and IoT hubs can collect data on the ingestion side, while they are distributed after analysis via alerts and notifications, dynamic dashboarding, data warehousing, and storage/archival. The fan-out of data to different services is itself a value addition but the ability to transform events into processed events also generates more possibilities for downstream usages including reporting and visualizations.

One of the main considerations for data pipelines with ingestion capabilities for IoT-scale data is the business continuity and disaster recovery scenario. This is achieved with replication. A broker stores messages in a topic, which is a logical group of one or more partitions. The broker guarantees message ordering within a partition and provides a persistent, log-based storage layer in which the append-only logs inherently guarantee message ordering. By deploying brokers over more than one cluster, geo-replication is introduced to address disaster recovery strategies.

Each partition is associated with an append-only log, so messages appended to the log are ordered by time and carry important offsets: the first available offset in the log; the high watermark, which is the offset of the last message successfully written and committed to the log by the brokers; and the log end offset, where the last message was written, which can exceed the high watermark. When a broker goes down, durability and availability must be addressed with replicas. Each partition has several replicas that are evenly distributed, but one replica is elected as the leader and the rest are followers. The leader is where all the produce and consume requests go, and followers replicate the writes from the leader.
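
The sketch below models such a partition log in a few lines, tracking the log start offset, the high watermark and the log end offset as messages are appended and then acknowledged by the in-sync replicas. It is a toy model of the concepts, not any broker's implementation.

```python
# Toy model of a partition's append-only log and its offsets (illustrative only).
class PartitionLog:
    def __init__(self):
        self.messages = []          # append-only storage
        self.log_start_offset = 0   # first available offset
        self.high_watermark = 0     # last offset committed by all in-sync replicas

    @property
    def log_end_offset(self):
        return self.log_start_offset + len(self.messages)

    def append(self, message):
        self.messages.append(message)            # written, but not yet committed
        return self.log_end_offset - 1

    def commit(self, offset):
        # Called once the in-sync replicas have replicated up to this offset.
        self.high_watermark = max(self.high_watermark, offset + 1)

    def committed(self):
        # Consumers only see messages below the high watermark.
        return self.messages[: self.high_watermark - self.log_start_offset]

log = PartitionLog()
o1 = log.append({"vehicle": "V1", "event": "departed"})
o2 = log.append({"vehicle": "V2", "event": "arrived"})
log.commit(o1)
print(log.log_start_offset, log.high_watermark, log.log_end_offset)  # 0 1 2
print(log.committed())                                               # only the first event
```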

A pull-based replication model is the norm for brokers, where dedicated fetcher threads periodically pull data between broker pairs. Each replica is a byte-for-byte copy of the others, which makes this replication offset-preserving. The number of replicas is determined by the replication factor. The leader maintains a list called the in-sync replica (ISR) set, and messages are committed by the leader only after all replicas in the ISR set have replicated the message. Global availability demands that brokers be deployed in different deployment modes. Two popular deployment modes are 1) a single cluster that stretches over multiple data centers and 2) a federation of connected clusters.

Some replicas are asynchronous by nature and are called observers. They do not participate in the in-sync replica set or become a partition leader, but they can restore availability to the partition and allow producers to produce data again. Connected clusters might involve clusters in distinct geographic regions and usually involve linking between the clusters. Linking is an extension of the replica-fetching protocol that is inherent to a single cluster. A link contains all the connection information necessary for the destination cluster to connect to the source cluster. A topic on the destination cluster that fetches data over the cluster link is called a mirror topic. A mirror topic may keep the same or a prefixed name and carries synced configurations, a byte-for-byte copy of the data, consumer offsets, and access control lists.

Managed services built over brokers complete the value delivered to the business beyond standalone broker deployments, because cluster sizing, over-provisioning, failover design and infrastructure management are automated. They are known to raise availability to a 99.99% uptime service-level agreement. Often they involve a replicator, a worker that executes a connector and its tasks to coordinate data streaming between source and destination broker clusters. A replicator has a source consumer that consumes the records from the source cluster and then passes these records to the Connect framework. The Connect framework has a built-in producer that then produces these records to the destination cluster. It might also have dedicated clients to propagate overall metadata updates to the destination cluster.
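
Stripped of the Connect framework, the core of a replicator is a consume-then-produce loop between two clusters. A minimal sketch follows, assuming the kafka-python client and hypothetical bootstrap addresses and topic names; a production replicator would also carry configurations, metadata and delivery guarantees that this simplification omits.

```python
# Minimal consume-then-produce replication loop between two broker clusters.
# Assumes the kafka-python client; addresses and topic names are hypothetical.
from kafka import KafkaConsumer, KafkaProducer

source = KafkaConsumer(
    "fleet-events",
    bootstrap_servers="source-cluster.example.internal:9092",
    group_id="replicator",
    enable_auto_commit=False,
)
destination = KafkaProducer(bootstrap_servers="dest-cluster.example.internal:9092")

for record in source:
    # Preserve the key so the destination partitions the data the same way.
    destination.send("fleet-events", key=record.key, value=record.value)
    source.commit()   # simplified: advance the source offset after handing off the record
```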

In a geographically distributed replication for business continuity and disaster recovery, the primary region has the active cluster that producers and consumers write to and read from, and the secondary region has read-only clusters with replicated topics for read-only consumers. It is also possible to configure two clusters to replicate to each other so that both have their own sets of producers and consumers, but even in these cases the replicated topic on either side will only have read-only consumers. Fan-in and fan-out are other possible arrangements for such replication.

Disaster recovery almost always occurs with a failover of the primary active cluster to a secondary cluster. When disaster strikes, the maximum amount of data, usually measured in terms of time, that can be lost after a recovery is minimized by virtue of this replication; this is referred to as the Recovery Point Objective. The targeted duration within which the service level is restored to the expectations of the business process is referred to as the Recovery Time Objective. The recovery brings the system back to operational mode. Cost, business requirements, use cases, and regulatory and compliance requirements mandate this replication, and the considerations made for data in motion during replication often stand out as best practice for the overall solution.