Cluster computing

This is a continuation of the article that introduces a crowdsourcing application. The original problem statement is included again for context.

Social engineering applications provide a wealth of information to the end-user, but the questions and answers received on them are always limited to just that – the social circle. Advice solicited for personal circumstances is never appropriate for forums that remain in public view. It is also difficult to find the right forum or audience from which responses can be obtained in a short time. When we want more opinions in a discreet manner, without the knowledge of those who surround us, the options become fewer and fewer. In addition, crowdsourcing opinions on a personal topic is not easily available via applications. This document tries to envision an application to meet this requirement.

The previous approach leveraged public cloud services for provisioning a queue and a document store. It talked a bit about the messaging platform required to support this social-engineering application. The problems encountered with social engineering are well-defined and have precedent in various commercial applications. They are primarily about the feed for each user and the propagation of solicitations to the crowd.

In this section, we turn to the compute requirements for these posts and their responses. The choice of products or cloud services, their mode of deployment, and their SKU are left out of this discussion. The queue can support millions of requests of a few hundred bytes each. The state of the document whose responses are to be collected is kept in a document store, and that state can be changed either by processing the requests in the queue or by administrative actions on the document. The database is not exposed to the clients directly; they reach it only through the queue. This enables the database to be the source of truth for the client state. The queue can carry questions or crowdsourced answers, and updates to a document are bidirectional.

When the clients wake up, they can request their state to be refreshed. This keeps the write path inexpensive because the data does not need to be sent out at the time of the update. If the queue sends messages back to the clients instead, it is a fan-out process. The devices can choose to check in at selective times, and the server can be selective about which clients to update. Both methods work well in certain situations. Fan-out can therefore happen either on write or on load, and it can be limited in both the push and the pull case. Disabling the writes to all devices can significantly reduce the cost; the other devices load these updates only when reading. It is also helpful to keep track of which devices are active over a period so that only those devices get preference.
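The difference between pushing on write and pulling on load can be made concrete with a small sketch. Everything here is an assumption for illustration: `FeedServer`, `Device`, and `active_window` are hypothetical names, and an in-memory list stands in for the queue and the document store. Updates are pushed only to devices that checked in recently; every other device catches up when it next checks in.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    device_id: str
    last_seen: float                      # time of the last check-in, in seconds
    inbox: list = field(default_factory=list)  # updates pushed by the server
    cursor: int = 0                       # index of the next update not yet sent


class FeedServer:
    """Hypothetical in-memory stand-in for the queue plus document store."""

    def __init__(self, active_window: float):
        self.active_window = active_window
        self.updates: list[str] = []      # source of truth for all updates
        self.devices: dict[str, Device] = {}

    def check_in(self, device_id: str, now: float) -> list[str]:
        """Pull path (fan-out on load): return what the device has missed."""
        device = self.devices.setdefault(device_id, Device(device_id, now))
        device.last_seen = now
        missed = self.updates[device.cursor:]
        device.cursor = len(self.updates)
        return missed

    def publish(self, update: str, now: float) -> int:
        """Push path (fan-out on write), limited to recently active devices."""
        self.updates.append(update)
        pushed = 0
        for device in self.devices.values():
            if now - device.last_seen <= self.active_window:
                device.inbox.extend(self.updates[device.cursor:])
                device.cursor = len(self.updates)
                pushed += 1
        return pushed


server = FeedServer(active_window=60)
server.check_in("a", now=0)
server.check_in("b", now=0)
server.publish("q1", now=10)      # both devices are active, both get a push
server.check_in("a", now=100)     # "a" stays active, "b" goes quiet
server.publish("q2", now=120)     # only "a" is pushed to
catch_up = server.check_in("b", now=130)  # "b" pulls what it missed
```

With a shorter `active_window` more devices fall back to the pull path, which trades delivery latency for a cheaper write.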

The library that automates the translation of states to messages and back supports parallelization, so that each worker can take one message or client state at a time and perform the conversion. The translation between state and message is a one-to-one mapping, and each worker is assigned ownership of its translations so that there is no overlap between the tasks executed by the workers. The conversion can happen multiple times, so the workers can support multi-stage workflows independent of the clients simply by constructing internal messages for other workers to pick up. All the activities of the workers are logged with the timestamp of the message, the identity of the client for which the state is being synchronized, and the identity of the worker. These logs are stored in a way that they can be indexed and searched based on these identifiers for troubleshooting purposes.
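A minimal sketch of such a translation layer follows. It is not the library itself: the JSON encoding, the `owner_of` hashing scheme, and the field names `client_id` and `timestamp` are all assumptions made for illustration. Each client is assigned to exactly one worker, so no two workers translate the same state, and every conversion is logged with the three identifiers mentioned above.

```python
import json
import logging
import zlib
from concurrent.futures import ThreadPoolExecutor

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("workers")


def owner_of(client_id: str, worker_count: int) -> int:
    """Deterministically assign a client to exactly one worker (assumed scheme)."""
    return zlib.crc32(client_id.encode("utf-8")) % worker_count


def state_to_message(state: dict) -> str:
    """One-to-one translation from a client state to a message."""
    return json.dumps(state, sort_keys=True)


def message_to_state(message: str) -> dict:
    """Inverse translation from a message back to a client state."""
    return json.loads(message)


def run_worker(worker_id: int, states: list) -> list:
    """Translate only the states this worker owns, logging each conversion."""
    messages = []
    for state in states:
        messages.append(state_to_message(state))
        log.info("timestamp=%s client=%s worker=%s",
                 state["timestamp"], state["client_id"], worker_id)
    return messages


def translate_all(states: list, worker_count: int = 4) -> list:
    """Partition states by owner, then let the workers run in parallel."""
    shards = [[] for _ in range(worker_count)]
    for state in states:
        shards[owner_of(state["client_id"], worker_count)].append(state)
    with ThreadPoolExecutor(max_workers=worker_count) as pool:
        batches = pool.map(run_worker, range(worker_count), shards)
    return [message for batch in batches for message in batch]


states = [{"client_id": f"c{i}", "timestamp": i, "doc": "open"} for i in range(10)]
messages = translate_all(states, worker_count=3)
```

Because the log lines carry the timestamp, client, and worker as key-value pairs, they can be indexed on those fields by whatever log store is chosen.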

The workers can also execute web requests to target the clients directly. They have access to the queue, the database, and the clients. The background jobs that create these workers can be scheduled or periodic, or in some cases triggered by polling the queue so that a message can be associated with a worker on arrival. This completes the system of using background workers to automate the posting of feeds to clients. With a one-to-one mapping between messages and workers, and with several workers available, it becomes easy to scale the system to handle many clients. A client is identified by its installation on a phone, another mobile handheld device, or a web browser.
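The queue-polling variant, combined with multi-stage workflows, might look like the sketch below. It uses Python's standard `queue` and `threading` modules as a stand-in for the real queue service; the stage names `translate` and `deliver` are hypothetical, and the web request to the client is simulated by appending to a list.

```python
import queue
import threading

STOP = object()  # sentinel that tells a worker to exit


def worker_loop(worker_id, inbox, delivered, lock):
    """Poll the queue; each arriving message is handled by exactly one worker."""
    while True:
        message = inbox.get()
        if message is STOP:
            inbox.task_done()
            break
        if message["stage"] == "translate":
            # First stage: construct an internal message for any worker to pick up.
            inbox.put({**message, "stage": "deliver"})
        else:
            # Final stage: a real worker would issue a web request to the client here.
            with lock:
                delivered.append((message["client_id"], worker_id))
        inbox.task_done()


def run(messages, worker_count=3):
    inbox = queue.Queue()
    delivered = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker_loop, args=(i, inbox, delivered, lock))
        for i in range(worker_count)
    ]
    for thread in threads:
        thread.start()
    for message in messages:
        inbox.put(message)
    inbox.join()                 # waits for internal messages as well
    for _ in threads:
        inbox.put(STOP)
    for thread in threads:
        thread.join()
    return delivered


delivered = run([{"client_id": f"c{i}", "stage": "translate"} for i in range(5)])
```

Scaling to more clients in this sketch is a matter of raising `worker_count`; which worker handles a given message is decided only by who polls it first.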

Friday, April 29, 2022
