Saturday, April 17, 2021

 Writing an automation to synchronize devices using a queue:

Introduction:  When devices access ingress APIs directly, a queue is not strictly necessary. Yet queues provide the ability to hold requests until background workers complete the task. Direct access is synchronous; queued access is asynchronous. APIs can also expose an asynchronous interface, and devices that call them directly can do so in parallel. Devices can also register callbacks. This is a popular programming interface, but the onus is on the devices to use it well enough to synchronize their state, and the server has no knowledge of client-side processing beyond honoring the requests. Queues formalize the asynchronous behavior of the system. Each request is encapsulated in a message that can be journaled and audited. A queue supports retries and requeuing, which gives multi-stage workflows an opportunity to complete. This article describes the use of queues wherever they are appropriate.
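As a rough illustration of a request encapsulated in a message that is journaled, retried, and requeued, here is a minimal Python sketch. It uses an in-process `queue.Queue` as a stand-in for a real message broker, and the names `SyncRequest`, `process_all`, and `MAX_ATTEMPTS` are assumptions made for this example rather than part of any particular product.

```python
import json
import queue
from dataclasses import dataclass, asdict

MAX_ATTEMPTS = 3  # assumed retry limit, chosen only for illustration


@dataclass
class SyncRequest:
    """One synchronization request, encapsulated as a message."""
    device_id: str
    payload: dict
    attempts: int = 0


def process_all(requests, handler):
    """Drain the requests, requeuing failures up to MAX_ATTEMPTS.

    Returns (journal, dead_letters). The journal holds one JSON line per
    attempt so that every request can be audited afterwards.
    """
    pending = queue.Queue()
    for request in requests:
        pending.put(request)

    journal, dead_letters = [], []
    while not pending.empty():
        request = pending.get()
        request.attempts += 1
        try:
            handler(request)
            outcome = "done"
        except Exception:
            if request.attempts < MAX_ATTEMPTS:
                pending.put(request)  # requeue for a later attempt
                outcome = "requeued"
            else:
                dead_letters.append(request)  # give up after the limit
                outcome = "failed"
        journal.append(json.dumps({**asdict(request), "outcome": outcome}))
    return journal, dead_letters
```

A handler that raises on a busy device is simply retried later, and a request that keeps failing ends up in `dead_letters` for an administrator to inspect.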

Description. We will need a database, a message broker, and a parallel-task library to automate the synchronization requests. The message broker is deployed to a cluster so that it can be highly available by virtue of the elasticity of the nodes that serve the cluster. The queue can support millions of requests of a few hundred bytes each.

The state of the devices to be reconciled is kept in a database, and the state can be changed either by processing the requests in the queue or by administrative actions on the database. The database is not exposed to the devices directly; they reach it only through the queue. This enables the database to be the source of truth for the device state. The message broker can serve both internal and external devices, and updates to the state are bidirectional.

When the devices wake up, they can request their state to be refreshed. This is preferred over a write update because the data does not need to be sent out. If the queue sends messages back to the devices, it is a fan-out process. The devices can choose to check in at selective times, and the server can be selective about which devices to update. Both methods work well in certain situations. Fan-out happens on both writing and loading, and it can be made selective and limited during both pull and push. Disabling the writes to all devices can significantly reduce the cost; the other devices load these updates only when they read. It is also helpful to keep track of which devices are active over a period so that only those devices get preference.
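The pull and push paths described above can be sketched as follows. This is a minimal in-memory stand-in for the database, not a real schema; the class name `DeviceStateStore`, its methods, and the seven-day `ACTIVE_WINDOW` are all assumptions made for the example.

```python
from datetime import datetime, timedelta

ACTIVE_WINDOW = timedelta(days=7)  # assumed definition of an "active" device


class DeviceStateStore:
    """In-memory stand-in for the database that is the source of truth."""

    def __init__(self):
        self.state = {}      # device_id -> latest desired state
        self.last_seen = {}  # device_id -> time of the last check-in

    def set_state(self, device_id, state):
        """Administrative or queue-driven change to a device's state."""
        self.state[device_id] = state

    def check_in(self, device_id, now):
        """Pull path: a device wakes up and asks for its state."""
        self.last_seen[device_id] = now
        return self.state.get(device_id)

    def push_targets(self, now):
        """Push path: fan out only to devices seen within the window."""
        return sorted(
            device_id
            for device_id, seen in self.last_seen.items()
            if now - seen <= ACTIVE_WINDOW
        )
```

Devices outside the window are not written to at all; they pick up their state the next time they call `check_in`, which is what keeps the fan-out cost bounded.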

The library that automates the translation of states to messages and back supports parallelization so that each worker can take one message or device state at a time and perform the conversion. The translation between state and message is a one-to-one mapping, and each worker is assigned ownership of its translations so that there is no overlap between the tasks executed by the workers. The conversion can happen multiple times, so the workers can support multi-stage workflows independent of the devices simply by constructing internal messages for other workers to pick up. All the activities of the workers are logged with the timestamp of the message, the identity of the device whose state is being synchronized, and the identity of the worker. These logs are stored so that they can be indexed and searched by these identifiers for troubleshooting.
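One way to picture the parallel translation with non-overlapping ownership is the sketch below. It uses Python's standard `concurrent.futures` as the parallel-task library and a CRC32 hash of the device identifier to assign ownership; both choices, along with every function name here, are assumptions for illustration rather than the specific library the article has in mind.

```python
import json
import zlib
from concurrent.futures import ThreadPoolExecutor


def owner_of(device_id, worker_count):
    """Stable assignment of a device to exactly one worker."""
    return zlib.crc32(device_id.encode("utf-8")) % worker_count


def state_to_message(device_id, state, timestamp):
    """Translate one device state into one message."""
    body = {"device_id": device_id, "state": state, "timestamp": timestamp}
    return json.dumps(body, sort_keys=True)


def message_to_state(message):
    """Inverse translation: recover the device and its state."""
    body = json.loads(message)
    return body["device_id"], body["state"]


def run_worker(worker_id, items, timestamp):
    """Convert every state this worker owns and log each activity."""
    messages, log = [], []
    for device_id, state in items:
        messages.append(state_to_message(device_id, state, timestamp))
        log.append(
            {"timestamp": timestamp, "device_id": device_id, "worker": worker_id}
        )
    return messages, log


def translate_all(states, worker_count, timestamp):
    """Partition the states by owner and translate the partitions in parallel."""
    partitions = [[] for _ in range(worker_count)]
    for device_id, state in sorted(states.items()):
        partitions[owner_of(device_id, worker_count)].append((device_id, state))
    with ThreadPoolExecutor(max_workers=worker_count) as pool:
        futures = [
            pool.submit(run_worker, worker_id, part, timestamp)
            for worker_id, part in enumerate(partitions)
        ]
        results = [future.result() for future in futures]
    messages = [m for worker_messages, _ in results for m in worker_messages]
    log = [entry for _, entries in results for entry in entries]
    return messages, log
```

Because ownership depends only on the device identifier, the log entries for a given device always carry the same worker identity, which makes searching by device or by worker straightforward.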

The workers can also execute web requests that target the devices directly. They have access to the message broker, the database, and the devices. The background jobs that create these workers can be scheduled, periodic, or, in some cases, driven by polling the queue so that a message can be associated with a worker on arrival.
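The polling variant might look like the following sketch, in which worker threads block on the queue and handle each message as it arrives. The `send_to_device` callable is a hypothetical placeholder for the web request to a device, and `start_workers` and `stop_workers` are names invented for this example; a real deployment would poll the message broker rather than an in-process `queue.Queue`.

```python
import queue
import threading


def start_workers(inbox, send_to_device, worker_count):
    """Start threads that block on the queue and handle messages on arrival."""
    results = []
    lock = threading.Lock()

    def loop(worker_id):
        while True:
            message = inbox.get()
            if message is None:  # sentinel: shut this worker down
                inbox.task_done()
                return
            try:
                reply = send_to_device(message)  # stand-in for a web request
            except Exception as error:
                reply = f"error: {error}"
            with lock:
                results.append((worker_id, message["device_id"], reply))
            inbox.task_done()

    threads = [
        threading.Thread(target=loop, args=(worker_id,), daemon=True)
        for worker_id in range(worker_count)
    ]
    for thread in threads:
        thread.start()
    return threads, results


def stop_workers(inbox, threads):
    """Send one sentinel per worker and wait for all of them to exit."""
    for _ in threads:
        inbox.put(None)
    for thread in threads:
        thread.join()
```

Each message is taken by exactly one worker, and `inbox.join()` can be used by the caller to wait until everything queued so far has been handled.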

This completes the system that uses background workers to automate device synchronization. With a one-to-one mapping between messages and workers, and with several workers running, it becomes easy to scale the system to handle a large number of devices.

