Wednesday, March 8, 2023

 Data in motion – IoT solution and data replication

The transition of data from edge sensors to the cloud is a data engineering pattern that is not always well served by the boilerplate event-driven architectural design proposed by the public clouds, because much of the fine-tuning is left to the choice of resources, event hubs, and infrastructure involved in streaming the events. This article explores the design and data-in-motion considerations for an IoT solution, beginning with an introduction to the public cloud's proposed design, the choices between products, and the considerations for handling and tuning distributed, real-time data streaming systems, with particular emphasis on data replication for business continuity and disaster recovery. A sample use case is the continuous stream of events for geospatial analytics in fleet management, whose data can include weblogs from driverless vehicles.

Event-driven architecture consists of event producers and consumers. Event producers generate a stream of events, and event consumers listen for those events. The right choice of architectural style plays a big role in the total cost of ownership of a solution involving events.

The scale-out can be adjusted to suit the demands of the workload, and events can be responded to in real time. Producers and consumers are isolated from one another. IoT requires events to be ingested at very high volumes. The producer-consumer design offers a high degree of parallelism since the consumers run independently and in parallel, although each remains coupled to the events it processes. Network latency for message exchanges between producers and consumers is kept to a minimum. Consumers can be added as necessary without impacting existing ones.

Some of the benefits of this architecture include the following: publishers and subscribers are decoupled; there are no point-to-point integrations; it is easy to add new consumers to the system; consumers can respond to events immediately as they arrive; such systems are highly scalable and distributed; and subsystems maintain independent views of the event stream.
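The decoupling described above can be sketched with a minimal in-process event bus. This is an illustrative Python sketch, not any particular cloud service's API; the topic and handler names are invented for the example.

```python
from collections import defaultdict

class EventBus:
    """Minimal in-process event bus: producers and consumers never
    reference each other directly, only a shared topic name."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        # New consumers can be added without touching existing ones.
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        # Every subscriber sees the event independently.
        for handler in self._subscribers[topic]:
            handler(event)

bus = EventBus()
received = []
bus.subscribe("telemetry", lambda e: received.append(e))
bus.subscribe("telemetry", lambda e: received.append(("audit", e)))
bus.publish("telemetry", {"device": "sensor-1", "temp": 21.5})
```

A real deployment would replace the in-memory list of handlers with a durable broker, but the contract is the same: the producer publishes to a topic and never learns who consumes it.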

Some of the challenges faced with this architecture include the following: event loss is tolerated by default, so guaranteed delivery poses a challenge, and IoT traffic mandates guaranteed delivery. Events may also need to be processed in exactly the order they arrive. Each consumer type typically runs in multiple instances for resiliency and scalability, which poses a challenge if the processing logic is not idempotent or the events must be processed in order.

The benefits and the challenges suggest some best practices. Events should be lean, not bloated: services should share only IDs and/or a timestamp, because large data transfer between services is an antipattern. Loosely coupled event-driven systems are best.
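As an illustration of the lean-event guidance, here is a sketch of an event that carries only identifiers, a reference to the payload in storage, and a timestamp. The field names and the blob path are invented for illustration.

```python
from dataclasses import dataclass, asdict
import json
import time
import uuid

@dataclass(frozen=True)
class LeanEvent:
    """A lean event carries only IDs and a timestamp; consumers fetch
    the large payload from storage by reference instead of receiving
    it inline."""
    event_id: str
    device_id: str
    blob_ref: str  # pointer to the large payload, not the payload itself
    ts: float

event = LeanEvent(
    event_id=str(uuid.uuid4()),
    device_id="vehicle-42",
    blob_ref="blobs/vehicle-42/track.json",  # illustrative storage path
    ts=time.time(),
)
wire = json.dumps(asdict(event))  # small and cheap to move between services
```

The event stays small on the wire regardless of how large the referenced payload grows, which keeps the broker's throughput predictable.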

IoT solutions can be proposed either with an event-driven stack involving open-source technologies or via a dedicated and optimized storage product, such as a relational engine geared towards edge computing. Either way, modern IoT applications expect capabilities to stream, process, and analyze data. IoT systems vary in flavor and size, and not all have the same certifications or capabilities.

When these IoT resources are shared, the isolation model, the impact of scaling on performance, state management, and the security of the IoT resources become complex. Scaling resources helps meet the changing demand from a growing number of consumers and an increase in traffic; we might need to increase the capacity of the resources to maintain an acceptable performance rate. Scaling depends on the number of producers and consumers, payload size, partition count, egress request rate, and the usage of IoT hub capture, schema registry, and other advanced features. When additional IoT capacity is provisioned or a rate limit is adjusted, the multitenant solution can perform retries to overcome transient request failures. When the number of active users drops or traffic decreases, IoT resources can be released to reduce costs. Data isolation depends on the scope of isolation; when the storage for IoT data is a relational database server, the IoT solution can make use of an IoT hub in front of it. Varying levels and scope of sharing of IoT resources demand simplicity from the architecture. Patterns such as the deployment stamp pattern, the IoT resource consolidation pattern, and the dedicated IoT resources pattern help to optimize operational cost and management with little or no impact on usage.

Edge computing relies heavily on asynchronous backend processing. Some form of message broker becomes necessary to maintain order between events, retries and dead-letter queues. The storage for the data must follow the data partitioning guidance where the partitions can be managed and accessed separately. Horizontal, vertical, and functional partitioning strategies must be suitably applied. In the analytics space, a typical scenario is to build solutions that integrate data from many IoT devices into a comprehensive data analysis architecture to improve and automate decision making.
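The retry-and-dead-letter pattern described above can be sketched minimally. This is illustrative Python in which in-process queues stand in for a real message broker; the retry bound and message names are invented for the example.

```python
from queue import Queue, Empty

def drain(main: Queue, handler, dead_letter: Queue, max_attempts: int = 3):
    """Deliver each message to `handler`; after max_attempts failures a
    message is parked on the dead-letter queue instead of being lost."""
    while True:
        try:
            attempts, msg = main.get_nowait()
        except Empty:
            return
        try:
            handler(msg)
        except Exception:
            if attempts + 1 >= max_attempts:
                dead_letter.put(msg)           # give up; keep for inspection
            else:
                main.put((attempts + 1, msg))  # re-queue for another attempt

main, dlq = Queue(), Queue()
for msg in ("ok-1", "bad", "ok-2"):
    main.put((0, msg))

def handler(msg):
    # simulate a poison message that always fails processing
    if msg == "bad":
        raise ValueError("poison message")

drain(main, handler, dlq)
```

After the drain, the two good messages have been processed and the poison message sits on the dead-letter queue for later inspection rather than being silently dropped.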

Event hubs, blob storage, and IoT hubs can collect data on the ingestion side, while the data is distributed after analysis via alerts and notifications, dynamic dashboarding, data warehousing, and storage/archival. The fan-out of data to different services is itself a value addition, but the ability to transform events into processed events generates even more possibilities for downstream usage, including reporting and visualizations.

One of the main considerations for data pipelines involving ingestion capabilities for IoT scale data is the business continuity and disaster recovery scenario. This is achieved with replication.  A broker stores messages in a topic which is a logical group of one or more partitions. The broker guarantees message ordering within a partition and provides a persistent log-based storage layer where the append-only logs inherently guarantee message ordering. By deploying brokers over more than one cluster, geo-replication is introduced to address disaster recovery strategies.

Each partition is associated with an append-only log, so messages appended to the log are ordered by time and carry important offsets: the first available offset in the log (the log-start offset), the high watermark (the offset of the last message successfully written and committed to the log by the brokers), and the log-end offset (where the next message will be written), which can exceed the high watermark. When a broker goes down, durability and availability must be addressed with replicas. Each partition has many replicas that are evenly distributed, but one replica is elected as the leader and the rest are followers. The leader is where all the produce and consume requests go, and followers replicate the writes from the leader.
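The three offsets can be made concrete with a toy partition log. This is a sketch of the bookkeeping only, not any broker's actual implementation.

```python
class PartitionLog:
    """Toy append-only log tracking the three offsets discussed above:
    log-start offset, high watermark, and log-end offset."""
    def __init__(self):
        self._messages = []
        self.log_start_offset = 0
        self.high_watermark = 0  # boundary of committed (replicated) data

    @property
    def log_end_offset(self):
        # Offset where the NEXT message will be written; always >= high watermark.
        return self.log_start_offset + len(self._messages)

    def append(self, msg):
        self._messages.append(msg)
        return self.log_end_offset - 1  # offset assigned to this message

    def commit(self, offset):
        # The broker advances the high watermark once replicas have the data.
        self.high_watermark = max(self.high_watermark, offset + 1)

log = PartitionLog()
log.append("m0")
log.append("m1")
log.commit(0)  # only m0 is replicated so far; m1 is written but uncommitted
```

Consumers typically read only up to the high watermark, so the gap between it and the log-end offset is exactly the data that is written but not yet safe.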

A pull-based replication model is the norm for brokers, where dedicated fetcher threads periodically pull data between broker pairs. Each replica is a byte-for-byte copy of the others, which makes this replication offset-preserving. The number of replicas is determined by the replication factor. The leader maintains a list called the in-sync replica (ISR) set; messages are committed by the leader only after all replicas in the ISR set have replicated them. Global availability demands that brokers are deployed in suitable deployment modes. Two popular modes are 1) a single cluster that stretches over multiple regions and 2) a federation of connected clusters.
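The ISR commit rule can be sketched as follows: the high watermark advances to the minimum log-end offset (LEO) across the in-sync replica set, so a message is committed only once every ISR member has replicated it. The broker IDs and offsets below are illustrative.

```python
class PartitionLeader:
    """Sketch of leader-side commit logic driven by follower fetches."""
    def __init__(self, leader_id, follower_ids):
        self.leader_id = leader_id
        self.leo = {r: 0 for r in [leader_id, *follower_ids]}  # log-end offsets
        self.isr = set([leader_id, *follower_ids])
        self.high_watermark = 0

    def append(self, n_messages):
        # The leader appends new messages to its own log.
        self.leo[self.leader_id] += n_messages

    def ack_fetch(self, replica, fetched_up_to):
        # A follower's fetch request doubles as an ack of how far it has read.
        self.leo[replica] = fetched_up_to
        self.high_watermark = min(self.leo[r] for r in self.isr)

leader = PartitionLeader("b1", ["b2", "b3"])
leader.append(10)
leader.ack_fetch("b2", 10)
leader.ack_fetch("b3", 7)
# The slowest in-sync replica gates the commit point.
```

This is why a lagging ISR member directly throttles end-to-end commit latency: the high watermark cannot move past it.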

Some replicas are asynchronous by nature and are called observers. They do not participate in the in-sync replica set or become a partition leader, but they can restore availability to the partition and allow producers to produce data again. Connected clusters might span distinct geographic regions and usually involve linking between the clusters. Linking is an extension of the replica-fetching protocol that is inherent to a single cluster; a link contains all the connection information necessary for the destination cluster to connect to the source cluster. A topic on the destination cluster that fetches data over the cluster link is called a mirror topic. A mirror may have the same or a prefixed name, synced configurations, a byte-for-byte copy of the data, and the consumer offsets as well as access control lists.

Managed services over brokers complete the delivery of value to the business beyond standalone broker deployments, in that cluster sizing, over-provisioning, failover design, and infrastructure management are automated. They are known to raise availability to a 99.99% uptime service-level agreement. Often, they involve a replicator, which is a worker that executes a connector and its tasks to coordinate data streaming between source and destination broker clusters. A replicator has a source consumer that consumes the records from the source cluster and then passes those records to the Connect framework. The Connect framework has a built-in producer that then produces these records to the destination cluster. It might also have dedicated clients to propagate overall metadata updates to the destination cluster.
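The replicator's consume-then-produce loop can be sketched as below. This is a simplification of the worker described above, with a list standing in for the destination producer and a dict for the offset checkpoint store; the names are invented for the example.

```python
def replicate(source_records, produce, checkpoint):
    """Sketch of a replicator worker loop: a source consumer yields
    (offset, record) pairs, a destination producer writes each record,
    and the committed source offset is checkpointed after each write."""
    for offset, record in source_records:
        produce(record)                   # destination-cluster producer
        checkpoint["committed"] = offset  # resume point after a restart

destination, checkpoint = [], {}
replicate(enumerate(["e1", "e2", "e3"]), destination.append, checkpoint)
```

Checkpointing the source offset after each produce is what lets a restarted worker resume where it left off; produce-then-checkpoint yields at-least-once delivery, so downstream consumers should be idempotent.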

In a geographically distributed replication for business continuity and disaster recovery, the primary region has the active cluster that the producers and consumers write to and read from, and the secondary region has read-only clusters with replicated topics for read only consumers. It is also possible to configure two clusters to replicate to each other so that both of them have their own sets of producers and consumers but even in these cases, the replicated topic on either side will only have read-only consumers. Fan-in and Fan-out are other possible arrangements for such replication.

Disaster recovery almost always involves a failover from the primary active cluster to a secondary cluster. When disaster strikes, the maximum amount of data, usually measured in terms of time, that can be lost after a recovery is minimized by virtue of this replication; this is referred to as the Recovery Point Objective (RPO). The targeted duration until the service level is restored to the expectations of the business process is referred to as the Recovery Time Objective (RTO). The recovery brings the system back to operational mode. Cost, business requirements, use cases, and regulatory and compliance requirements mandate this replication, and the considerations made for the data in motion during replication often stand out as best practice for the overall solution.
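Since RPO is measured in time, a simple operational proxy for it is the timestamp gap between the newest message on the active cluster and the newest message already replicated to the secondary. A sketch, with illustrative timestamps:

```python
def estimated_rpo_seconds(primary_latest_ts, replica_latest_ts):
    """RPO proxy: seconds of data on the primary not yet replicated to
    the secondary. On failover, roughly this much data is at risk."""
    return max(0.0, primary_latest_ts - replica_latest_ts)

# Primary last wrote at t=1000.0; secondary has data through t=998.5,
# so about 1.5 seconds of events would be lost on an immediate failover.
lag = estimated_rpo_seconds(1000.0, 998.5)
```

Alerting when this gap exceeds the agreed RPO is a common way to verify that the replication is actually honoring the business continuity target rather than assuming it.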

 

Tuesday, March 7, 2023

 

Some more comparisons with contemporary fleet management software follow the previous post.

This article surveys the contemporary automations and software available from the industry.

Most usages of fleet management software, such as the open-source options mentioned in the previous post, are in the areas of food delivery, emergency services, utility companies, construction, landscaping, public transportation, and courier and package delivery services. Users rely on this software to reduce labor and gas costs, remain in compliance with state and federal regulations, locate and track fleet vehicles, manage vehicle maintenance, improve fleet and driver safety, and optimize cost savings.

Commercial fleet management software like Autosist is primarily fleet inventory management, offered on a platform that charges per month per asset when paid annually. Such vendors offer flexible pricing options no matter the size of the fleet, which entices small businesses. With a focus on process automation, commercial software provides proprietary software resources and policies for fleet management.

These software applications make it easy to keep track of drivers in the field, plan routes as efficiently as possible to save on fuel costs and stay one step ahead of maintenance tasks. They support GPS vehicle tracking, timesheet tracking, shift and route assignments and group messaging with drivers.

Some are built directly on the cloud. For example, Samsara is a cloud-based fleet management solution that offers features such as GPS tracking, trailer tracking, dashboard cameras, routing and dispatch, and reefer monitoring. It helps companies track the physical location of their fleets and monitor their drivers' behavior to stay compliant with Electronic Logging Device (ELD) and Federal Motor Carrier Safety Administration (FMCSA) regulations, which encourage a safer work environment for commercial motor vehicle drivers and truck drivers.

Fleetio, for instance, automates multiple complex management operations, including asset life cycles, fuel efficiency, safety reports, and documents associated with the vehicle such as fuel and service receipts, titles, and insurance cards. Cross-platform access and programmability are key to an intermodal network, and webhooks and APIs continue to help synchronize data between disparate networks. There is also a commenting, photo, and notification add-on, which allows for instant feedback.

Companies like Azuga even provide hardware for fleet managers to install on their vehicles to track equipment and driver behavior. Azuga believes in creating healthy competition among drivers and applies gamification in its driver rewards program, rewarding drivers often to prevent churn.

Tracking and alerts are another feature of such commercial software. Some products provide real-time tracking, while others refresh more slowly but provide detailed alerts.

 

Monday, March 6, 2023

An earlier article introduced us to some of the algorithms and models in fleet management as applied to different problem spaces, which included vehicle routing and scheduling, dynamic fleet management, city logistics, urban public transport, dial-a-ride transport, air transport, maritime transport, and rail and intermodal transport.



Sunday, March 5, 2023


This article surveys the contemporary automations and software available from the industry. We start with the open-source options, which appear to address and automate complex software processes that include dispatch management, GPS-based vehicle tracking, route optimization, vehicle maintenance, and fuel management. Open source is particularly relevant to low-cost deployments. Everlance, Kuebix, and Odoo are our picks for comparison.

Everlance is a mileage tracking and expense management software solution for businesses with small and large fleets. It uses GPS tools to keep a record of the trips taken by drivers to ensure location accuracy. Drivers can use Everlance to track and report mileage rates in the field, and fleet managers can track trip frequencies and suggest optimized routes to drivers. Everlance offers cloud-based deployments as well.

Kuebix is a transportation management solution that enables fleet managers to keep track of dispatches. Its features include dispatch management, fleet management, routing, shipping, and carrier management. Both cloud-based and on-premises deployments are available.

The Odoo app suite is a customizable open-source software suite that provides business solutions to a variety of industries, including fleet management. Odoo allows users to save service records, contracts, vehicle tags, and the make and model of the vehicles in their fleets. The reporting module has graphs and charts for visualization. Cloud-based deployments as well as mobile applications are available.



Saturday, March 4, 2023

 

Several interesting algorithms have been proposed for the Rail and Intermodal problem space and this article continues the enumeration of those that were discussed earlier.

The real-time rail management of a terminus involves routing incoming trains through the station and scheduling their departures with the objective of optimizing punctuality and regularity of train service. The purpose is to develop an automated train traffic control system. The scheduling problem is modeled as a bicriteria job shop scheduling problem with additional constraints. The two objective functions, in lexicographical order, are the minimization of tardiness/earliness and the headway optimization. The problem is solved in two steps: a heuristic builds a feasible solution by considering the first objective function, and then the regularity is optimized. This works well in simulations of a terminus.
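The two-step lexicographic solve described above can be illustrated with a toy sketch. This is illustrative Python only: the candidate schedules and the two cost functions are stand-ins for the heuristic's real inputs, not the published algorithm.

```python
def lexicographic_pick(candidates, tardiness, headway_irregularity):
    """Sketch of the two-step solve: first keep schedules minimizing
    total tardiness/earliness, then pick the most regular headways
    among the survivors (lexicographic order of the two objectives)."""
    best_tardiness = min(tardiness(s) for s in candidates)
    survivors = [s for s in candidates if tardiness(s) == best_tardiness]
    return min(survivors, key=headway_irregularity)

# Toy schedules as (name, total tardiness, headway variance).
schedules = [("A", 5, 2.0), ("B", 3, 4.0), ("C", 3, 1.0)]
pick = lexicographic_pick(schedules, lambda s: s[1], lambda s: s[2])
```

Here B and C tie on the primary objective, so the secondary objective (regularity) breaks the tie: C wins despite A having the best headway variance overall, because punctuality is optimized first.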

A simulation-based approach is also used for tactical locomotive fleet sizing. The study shows that throughput increases with the number of locomotives up to a certain level; after that, congestion is caused by the movements of many locomotives in a capacity-constrained rail network.

One correlation that seems to hold true is that decisions on sizing a rail car fleet have a tremendous influence on utilizing that fleet. The optimum use of empty rail cars for demand response is one of the advantages of building a formulation and solving it to optimize the fleet size and freight car allocation under uncertain demand.

This correlation has given rise to a model that formulates and solves for the optimum fleet size and freight car allocation. The model also provides rail network information such as yard capacity, unmet demand, and the number of loaded and empty railcars at any given time and location, which is helpful to managers or decision makers of any train company for planning and management activities. A two-stage solution procedure is used for solving the rail-car fleet sizing problem.

Drayage operations are an integral part of rail intermodal transport, and intermodal transportation improves when these operations are considered. In cities and urban areas, drayage suffers from random transit times, which makes fleet scheduling difficult. A dynamic optimization model can use real-time knowledge of the fleet's position, permanently enabling the planner to reallocate tasks as the problem conditions change. Tasks can be flexible or well-defined. One application of this model was tried out on test data and then applied to a set of random drayage problems of varying sizes and characteristics.

Tactical design of scheduled service networks for transportation systems is one where different networks must coordinate, and their coordination is critical to the success of the operations. For a given demand, a new model was proposed to determine departure times of the service such that the throughput time is minimized; this is the time spent in processing, inspection, moves, and queues while meeting demand. It can be considered a hybrid of some models discussed earlier, such as the service network design that involves asset management and multiple-fleet coordination to emphasize the explicit modeling of different vehicle fleets. Synchronization of collaborative networks and removal of border-crossing operations have a significant impact on the throughput time for the freight.

Friday, March 3, 2023

Some more interesting algorithms for the Rail and Intermodal problem space now follow.


Thursday, March 2, 2023

 

Several interesting algorithms have been proposed for the Rail and Intermodal problem space. This is a complex system composed of different transport networks, infrastructures, transport means, and operators, such as drayage operators, terminal operators, network operators, and others. Intermodal means there are many decision makers who must work in a coordinated manner for the system to run smoothly. If intermodal transport is to be developed, it will require more decision-support tools to assist the decision-makers and stakeholders.

When train operations are perturbed, a new conflict-free timetable must be recomputed such that the deviation from the original is minimized. This scheduling problem is modeled with an alternative-graph formulation, and a branch-and-bound algorithm is developed to solve it. Some approaches use an integrated framework that deals with signal layout optimization, train scheduling optimization at the microscopic level, and more.

Heuristic approaches include a look-ahead greedy heuristic and a global neighborhood search algorithm, evaluated in terms of total train delay on the railway. Scheduling additional train services to be integrated into the current timetables is a problem modeled with hybrid job shop scheduling techniques that operate on a disjunctive graph model of trains.

One approach develops a train slot selection model based on multicommodity network flow concepts for determining freight train timetables. This helps to schedule rail services along multiple interconnected routes. The model seeks to minimize the operating costs incurred by the carriers and the delays incurred by the shippers, while ensuring that schedules and demand levels are mutually consistent. When the model is embedded in a simulation, it can be used iteratively and together with the output of the scheduling solution.

Another approach solves freight transportation on hybrid rail networks used to transport both passengers and freight. It uses a preferred timetable as input for each freight train; some overrides are permitted, such as specifying a path different from the one in the ideal timetable. Its objective is to introduce as many new freight trains as possible by assigning them timetables that are as close as possible to the ideal ones. An integer linear programming method is used in the model.

A third approach specifically considers double-track train scheduling. It focuses on a high-speed passenger rail line in an existing network and minimizes both the expected wait times for high-speed trains and the total travel times of trains of both speeds. Using priority by speed, the problem is translated into a multi-mode resource-constrained project scheduling problem. It is then solved with a branch-and-bound algorithm and a beam search algorithm.