This is a continuation of the article
that introduces a crowdsourcing application. The original problem statement is
included again for context.
Social networking applications provide a wealth of
information to the end-user, but the questions asked and answers received on
them are limited to the user's own social circle. Advice sought for personal
circumstances is rarely appropriate for forums that remain in public view.
It is also difficult to find the right forum or audience where responses
can be obtained in a short time. When we want more opinions in a discreet
manner, without the knowledge of those around us, the options become fewer
and fewer. In addition, crowdsourcing opinions on a personal topic is not
readily supported by existing applications. This document tries to envision an
application to meet this requirement.
The previous approach leveraged public cloud services for
provisioning the queue and the document store. It also discussed the messaging
platform required to support this social networking application. The problems
encountered in social networking are well-defined and have precedent in various
commercial applications. They are primarily about the feed for each user and
the propagation of solicitations to the crowd.
In this section, we turn to the compute requirements for these
posts and their responses. The choice of products or cloud services,
their mode of deployment, and their SKU are left out of this discussion. The queue
can support millions of requests of a few hundred bytes each. The state of the
document whose responses are to be collected is kept in a document store, and
that state can be changed either by processing the requests in the
queue or by administrative actions on the document. The database is not
exposed to the clients directly; they reach it only through the queue. This
enables the database to be the source of truth for the client state. The queue can have
questions or crowdsourced answers, so updates flow in both directions between
the clients and the document. When the clients wake up, they can request their
state to be refreshed. This pull model keeps writes cheap because the data does
not need to be sent out to every client on each update. If the queue instead
sends messages back to the clients, it is a fan-out process. The devices can
choose to check in at selective times, and the server can be selective about
which clients to update. Both methods work well in certain situations. The
fan-out can happen at write time as well as at load time, and it can be limited
in both the push and the pull case. Disabling the writes to all devices can
significantly reduce the cost; the other devices then load these updates only
when reading. It is also helpful to keep track of which devices have been
active over a period so that only those devices get preference.
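The push and pull variants of the fan-out can be sketched in a few lines. This is only an illustration under stated assumptions: the class FeedServer, its methods, and the active_window threshold are hypothetical names, and an in-memory list stands in for the document store.

```python
from dataclasses import dataclass, field


@dataclass
class FeedServer:
    """Toy in-memory model of selective fan-out (all names are hypothetical)."""

    active_window: float = 3600.0  # seconds since last check-in that count as "active"
    updates: list = field(default_factory=list)    # stand-in for the document store
    last_seen: dict = field(default_factory=dict)  # device_id -> last check-in time
    cursor: dict = field(default_factory=dict)     # device_id -> index of next unseen update

    def check_in(self, device_id, now):
        """Record that a device is awake so it gets preference for pushes."""
        self.last_seen[device_id] = now
        self.cursor.setdefault(device_id, 0)

    def is_active(self, device_id, now):
        return now - self.last_seen[device_id] <= self.active_window

    def publish(self, update, now):
        """Fan-out on write: push pending updates to active devices only."""
        self.updates.append(update)
        pushed = {}
        for device_id in self.last_seen:
            if self.is_active(device_id, now):
                pushed[device_id] = self.updates[self.cursor[device_id]:]
                self.cursor[device_id] = len(self.updates)
        return pushed

    def refresh(self, device_id, now):
        """Fan-out on load: a waking device pulls whatever it has not seen."""
        self.check_in(device_id, now)
        pending = self.updates[self.cursor[device_id]:]
        self.cursor[device_id] = len(self.updates)
        return pending
```

In this sketch a device that has not checked in within the window receives nothing on publish and catches up through refresh when it next wakes, which is the cost saving the paragraph above describes. A real deployment would persist the cursors and check-in times rather than hold them in memory.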
The library that automates the translation of states to
messages and back supports parallelization, so each worker can take one
message or client state at a time and perform the conversion. The translation
between state and message is a one-to-one mapping, and each worker is
assigned ownership of its translations so that there is no overlap between
the tasks executed by the workers. The
conversion can happen multiple times, so the workers can support multi-stage
workflows independently of the clients simply by constructing internal messages
for other workers to pick up. All the activities of the workers are logged with
the timestamp of the message, the identity of the client whose state is
being synchronized, and the identity of the worker. These logs are stored so
that they can be indexed and searched by these identifiers for
troubleshooting purposes.
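A minimal sketch of such a translation layer follows. It is not the library itself: Message, Worker, owner_of, and the "sync" and "notify" stage names are assumptions made for illustration, JSON is an assumed wire format, and ownership is decided here by a stable hash of client and stage, which is one possible way to avoid overlap.

```python
import json
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    client_id: str
    timestamp: float
    stage: str
    body: str  # JSON-encoded client state (assumed format)


def state_to_message(client_id, state, timestamp, stage="sync"):
    """One state maps to exactly one message."""
    return Message(client_id, timestamp, stage, json.dumps(state, sort_keys=True))


def message_to_state(message):
    """Inverse of state_to_message."""
    return json.loads(message.body)


def owner_of(message, worker_count):
    """Stable hash so that exactly one worker owns each (client, stage) task."""
    key = f"{message.client_id}:{message.stage}".encode("utf-8")
    return zlib.crc32(key) % worker_count


class Worker:
    def __init__(self, worker_id, worker_count, log):
        self.worker_id = worker_id
        self.worker_count = worker_count
        self.log = log  # shared list standing in for an indexed log store

    def owns(self, message):
        return owner_of(message, self.worker_count) == self.worker_id

    def handle(self, message):
        """Translate, log, and return any internal follow-up messages."""
        state = message_to_state(message)
        self.log.append({
            "timestamp": message.timestamp,
            "client_id": message.client_id,
            "worker_id": self.worker_id,
            "stage": message.stage,
        })
        if message.stage == "sync":
            # Second stage of the workflow, picked up by whichever worker owns it.
            return [state_to_message(message.client_id, state,
                                     message.timestamp, stage="notify")]
        return []


def run(workers, first_message):
    """Drive a message and its follow-ups through the owning workers."""
    pending = [first_message]
    while pending:
        message = pending.pop()
        for worker in workers:
            if worker.owns(message):
                pending.extend(worker.handle(message))
```

Each log entry carries the three identifiers named above, so filtering the list by client_id or worker_id approximates the indexed search used for troubleshooting.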
The workers can also execute web requests to target the clients
directly. They have access to the queue, the database, and the clients. The
background jobs that create these workers can be scheduled, periodic, or in
some cases triggered by polling the queue, so that a message on arrival can be
associated with a worker. This completes the system of using background workers
to automate the posting of feeds to clients. With a one-to-one mapping between
messages and workers, and with several workers running, it becomes easy to scale
the system to handle many clients. A client is identified uniquely by its
installation on a phone, a mobile handheld device, or a web browser.
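The queue-polling arrangement can be illustrated with Python's standard queue and threading modules. This is a sketch under assumptions rather than the system's actual job scheduler: run_workers and record_delivery are hypothetical names, the installation identifiers are made up, and the deliver callback stands in for the web request a worker would send to a client.

```python
import queue
import threading


def run_workers(messages, worker_count, deliver):
    """Start worker_count workers; each message is taken by exactly one of them."""
    pending = queue.Queue()
    for message in messages:
        pending.put(message)

    def worker_loop(worker_id):
        while True:
            try:
                message = pending.get_nowait()  # poll the queue
            except queue.Empty:
                return  # nothing left, so this worker exits
            deliver(worker_id, message)

    threads = [threading.Thread(target=worker_loop, args=(i,))
               for i in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


delivered = []
delivered_lock = threading.Lock()


def record_delivery(worker_id, message):
    """Placeholder for the web request that posts a feed item to a client."""
    installation_id, payload = message
    with delivered_lock:
        delivered.append((installation_id, payload, worker_id))


# Clients are keyed by installation, as in the text; these ids are illustrative.
messages = [(f"install-{n}", f"feed item {n}") for n in range(20)]
run_workers(messages, worker_count=4, deliver=record_delivery)
```

Because the queue hands each message to a single worker, adding workers increases throughput without duplicating deliveries, which is the scaling property described above.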