This is a continuation of a series of articles on a
crowdsourcing application and follows the most recent article. The
original problem statement is included again for context.
Social engineering applications provide a wealth of
information to the end-user, but the questions and answers received on them are
always limited to just that: the social circle. Advice solicited for personal
circumstances is never appropriate for forums which can remain in public view.
It is also difficult to find the right forums or audience where the responses
can be obtained in a short time. When we want more opinions in a discreet
manner without the knowledge of those who surround us, the options become fewer
and fewer. In addition, crowdsourcing opinions on a personal topic is not
readily supported by existing applications. This document tries to envision an
application to meet this requirement.
The previous article continued the elaboration on the usage
of public cloud services for provisioning the queue, document store and
compute. It talked a bit about the messaging platform required to support this
social-engineering application. The problems encountered with social
engineering are well-defined and have precedent in various commercial
applications. They are primarily about the feed for each user and the
propagation of solicitations to the crowd. The previous article described
selective fan-out. When the clients wake up, they can request their state to be
refreshed. This avoids a write update because the data does not need to be
pushed out. If the queue sends messages back to the clients, it is a fan-out
process. The devices can choose to check in at selective times and the server
can be selective about which clients to update. Both methods work well in
certain situations. The fan-out happens on both writing and loading, and it
can be made selective in either case, that is, limited during both pull and
push. Disabling the writes to all devices can significantly reduce the cost.
Other devices can load these updates only when reading. It is also helpful to
keep track of which clients are active over a period so that only those devices
get preference.
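
The selective fan-out described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class `SelectiveFanOut`, its methods, and the `active_window_seconds` parameter are hypothetical names, the shared log stands in for the document store, and timestamps are passed in explicitly and assumed to be non-decreasing. Updates are pushed only to clients that checked in recently; every other client pulls what it missed when it next checks in.

```python
from dataclasses import dataclass, field


@dataclass
class Client:
    client_id: str
    last_seen: float                            # time of the last check-in
    inbox: list = field(default_factory=list)   # updates pushed by the server
    cursor: int = 0                             # position in the shared log already covered


class SelectiveFanOut:
    """Hypothetical sketch: push to recently active clients, let the rest pull."""

    def __init__(self, active_window_seconds: float):
        self.active_window = active_window_seconds
        self.clients = {}
        self.log = []  # append-only list of every published update

    def register(self, client_id: str, now: float) -> None:
        # A new client starts caught up with the current end of the log.
        self.clients[client_id] = Client(client_id, last_seen=now, cursor=len(self.log))

    def publish(self, update, now: float) -> list:
        """Write once to the log, then fan out only to active clients."""
        self.log.append(update)
        pushed = []
        for client in self.clients.values():
            if now - client.last_seen <= self.active_window:
                client.inbox.append(update)
                client.cursor = len(self.log)
                pushed.append(client.client_id)
        return pushed

    def check_in(self, client_id: str, now: float) -> list:
        """Client wakes up: drain pushed updates and pull anything missed."""
        client = self.clients[client_id]
        client.last_seen = now
        missed = self.log[client.cursor:]
        client.cursor = len(self.log)
        delivered = client.inbox + missed
        client.inbox = []
        return delivered
```

In this sketch the cost of a publish grows with the number of active clients rather than with the total number of registered devices, which is the saving that selective fan-out is meant to capture.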
For crowdsourcing applications where
the number of users spans a segment of the population on the planet, the
storage requirements become like those of the companies offering social
engineering applications. For example, high-volume data can be kept in NoSQL
stores while Presto bridges SQL queries over that data. Presto, from Facebook,
is a distributed SQL query engine that can operate on data from various data
sources and supports ad hoc queries in near real time. It does not rely on
MapReduce; instead, it executes the query with a custom SQL execution engine
written in Java. It has a pipelined data model
that can run multiple stages at once while pipelining the data between stages
as it becomes available. This reduces end-to-end time while maximizing
parallelization via stages on large data sets.
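
To make the idea of pipelining between stages concrete, here is a small sketch using Python generators. It is not Presto's implementation and runs on a single thread; the stage functions (`scan`, `keep_campaign`, `count_answers`) and the row fields are hypothetical. It only shows that each row flows to the next stage as soon as it is produced, instead of one stage finishing and materializing its output before the next begins.

```python
from collections import Counter


def scan(rows, trace):
    """First stage: emit rows one at a time from the source."""
    for row in rows:
        trace.append(("scan", row["id"]))
        yield row


def keep_campaign(rows, campaign_id, trace):
    """Second stage: pass along only rows for one campaign."""
    for row in rows:
        if row["campaign_id"] == campaign_id:
            trace.append(("filter", row["id"]))
            yield row


def count_answers(rows):
    """Final stage: aggregate answers as rows arrive."""
    counts = Counter()
    for row in rows:
        counts[row["answer"]] += 1
    return dict(counts)


def run_pipeline(rows, campaign_id):
    # The trace records the order in which stages touch each row.
    trace = []
    result = count_answers(keep_campaign(scan(rows, trace), campaign_id, trace))
    return result, trace
```

The returned trace interleaves scan and filter events, so the filter sees the first row before the scan has read the second. A distributed engine applies the same principle across workers, where the stages genuinely run at the same time.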
Given that users are interested mostly
in the accumulated responses, it might be helpful to view the store as a data
warehouse, one that can be hosted in the cloud in virtual data centers and,
preferably, one that can ingest data in the form of JSON from data
pipelines. Queries over this warehouse follow the conventional Online
Analytical Processing (OLAP) model, which serves the campaigns and responses
very well. While the choice of an external data store is not ruled out, it must
scale. There are cost-benefit trade-offs to consider when deploying custom
stores versus something offered by public clouds.
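
As a rough illustration of JSON ingestion followed by an OLAP-style rollup, the sketch below uses Python's built-in `sqlite3` module as a stand-in for a cloud data warehouse. The table name `responses`, its columns (`campaign_id`, `answer`, `region`), and the helper functions are assumptions made for the example, not part of any design described in this series.

```python
import json
import sqlite3


def ingest(conn, json_lines):
    """Load newline-delimited JSON responses into a flat table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS responses "
        "(campaign_id TEXT, answer TEXT, region TEXT)"
    )
    rows = [
        (record["campaign_id"], record["answer"], record["region"])
        for record in map(json.loads, json_lines)
    ]
    conn.executemany("INSERT INTO responses VALUES (?, ?, ?)", rows)
    conn.commit()


def summarize(conn, campaign_id):
    """Roll up the accumulated responses for one campaign by region and answer."""
    cursor = conn.execute(
        "SELECT region, answer, COUNT(*) FROM responses "
        "WHERE campaign_id = ? "
        "GROUP BY region, answer ORDER BY region, answer",
        (campaign_id,),
    )
    return cursor.fetchall()
```

A caller would open a connection with `sqlite3.connect(":memory:")`, pass the JSON lines from the pipeline to `ingest`, and then call `summarize` with the campaign of interest. A managed warehouse would replace the SQLite connection, but the shape of the aggregate query would stay much the same.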