This is a continuation of a series of articles on a crowdsourcing application,
including the most recent article. The original problem statement is included
again for context.

Social engineering applications provide a wealth of information to the
end-user, but the questions and answers received on them are always limited to
just that: the social circle. Advice solicited for personal circumstances is
never appropriate for forums that remain in public view. It is also difficult
to find the right forum or audience from which responses can be obtained in a
short time. When we want more opinions in a discreet manner, without the
knowledge of those who surround us, the options become fewer and fewer. In
addition, crowdsourcing opinions on a personal topic is not readily available
via applications. This document tries to envision an application to meet this
requirement.

The previous article continued the elaboration on the use of public cloud
services for provisioning the queue, document store and compute. It talked a
bit about the messaging platform required to support this social-engineering
application. The problems encountered with social engineering are well-defined
and have precedents in various commercial applications. They are primarily
about the feed for each user and the propagation of solicitations to the crowd.
The previous article described selective fan-out. When the clients wake up,
they can request their state to be refreshed. This pull model spares the
write-time update because the data does not need to be sent out. If the queue
sends messages back to the clients, it is a fan-out process. The devices can
choose to check in at selective times, and the server can be selective about
which clients to update. Both methods work well in certain situations. Fan-out
can happen on write as well as on load, and it can be made selective in both
cases, that is, limited during both push and pull. Disabling the writes to all
devices can significantly reduce the cost; the other devices then load these
updates only when reading. It is also helpful to keep track of which clients
are active over a period so that only those clients get preference.

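
The selective fan-out described above can be sketched roughly as follows. This
is only an illustration under stated assumptions, not the messaging platform
itself: the class names `Client` and `SelectiveFanOut`, the `active_window`
parameter and the in-memory feed are all hypothetical stand-ins for a real
queue and device registry. Clients seen recently receive a push on write;
everyone else pulls what they missed when they wake up.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Client:
    client_id: str
    last_seen: float                      # time of the last check-in, in seconds
    inbox: List[str] = field(default_factory=list)
    cursor: int = 0                       # next position in the feed to deliver


class SelectiveFanOut:
    """Hypothetical hub: push to active clients, let idle ones pull later."""

    def __init__(self, active_window: float) -> None:
        self.active_window = active_window
        self.feed: List[str] = []         # append-only log of solicitations
        self.clients: Dict[str, Client] = {}

    def register(self, client_id: str, now: float) -> None:
        # A new client starts at the end of the feed and counts as active.
        self.clients[client_id] = Client(client_id, last_seen=now,
                                         cursor=len(self.feed))

    def publish(self, message: str, now: float) -> List[str]:
        """Fan-out on write, limited to clients seen within the window."""
        self.feed.append(message)
        pushed = []
        for client in self.clients.values():
            if now - client.last_seen <= self.active_window:
                client.inbox.append(message)
                client.cursor = len(self.feed)
                pushed.append(client.client_id)
        return pushed

    def refresh(self, client_id: str, now: float) -> List[str]:
        """Fan-out on read: a waking client pulls whatever it missed."""
        client = self.clients[client_id]
        missed = self.feed[client.cursor:]
        client.inbox.extend(missed)
        client.cursor = len(self.feed)
        client.last_seen = now
        return missed
```

For instance, with `active_window=60.0`, a device that checked in ten seconds
before a `publish` call is pushed the message, while a device idle for several
minutes is skipped and receives it from its next `refresh`. The sketch assumes
timestamps never go backwards; a production design would also need durable
storage and delivery acknowledgements.
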
This article turns to the databases that meet the transactional aspects of
processing on both the campaign generation side and the response accumulation
side. The relational data from both sides will need a warehouse where
analytical queries can be run for the reporting stack. Separating the read-only
store from the read-write store helps with both performance and security.

The choice of relational/cloud databases is left outside this discussion.
Instead, we focus on the choice of this warehouse. There are five major
players: Azure, BigQuery, Presto, Redshift and Snowflake.

Response accumulation is inherently tied to users, and the warehouse can expect
a large number of users, differentiated by their campaigns and responses. The
queries run on this data are relevant only to the accumulated responses, not to
the stream of responses. One response is just like another, and queries gain
little or nothing from processing responses in a stream-like manner as opposed
to processing them after accumulation, from both the individual’s and the
administrator’s point of view. The warehouse can also reconcile campaign and
response activities, so it remains a source of truth and keeps the tally
accurate. It allows queries to be written in plain SQL and, when hosted in the
cloud, is largely free of maintenance regardless of the size of the data
accrued. Picking any one of these warehouses will separate the reporting stack
and foster other microservices that may be envisioned for future offerings. For
example, a campaign based on response accumulation could be forked as its own
campaign management microservice using only the database and a message broker.
The microservice model is also well suited to separation of concerns when
promoting offerings from this one-stop shop for responses while the data layer
remains the same. All the microservices are expected to be slim because they
only facilitate a connection between producers and consumers of responses.
A virtual elastic warehouse is the right choice to make this connection because
it facilitates all kinds of workflows associated with the data, most of which
are independent of the transactional processing. Even message brokers work well
with warehouses when the warehouse accepts JSON. The archiving of response
accumulation mentioned earlier can now be automated and redirected to the
virtual data warehouse using an automated ingestion capability.
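
As a rough illustration of JSON ingestion followed by a tally over accumulated
responses, the sketch below uses Python's built-in `sqlite3` module as a
stand-in for the cloud warehouse. Everything specific here is an assumption
made for the example: the table layout, the JSON field names (`response_id`,
`campaign_id`, `choice`) and the helper names are hypothetical, and a real
warehouse would use its own bulk-ingestion service rather than row inserts.

```python
import json
import sqlite3
from typing import Iterable, List, Tuple


def create_warehouse() -> sqlite3.Connection:
    """In-memory stand-in for the warehouse (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE campaigns (
            campaign_id TEXT PRIMARY KEY,
            question    TEXT NOT NULL
        );
        CREATE TABLE responses (
            response_id TEXT PRIMARY KEY,
            campaign_id TEXT NOT NULL,
            choice      TEXT NOT NULL
        );
        """
    )
    return conn


def ingest(conn: sqlite3.Connection, messages: Iterable[str]) -> int:
    """Load JSON messages from a broker; return how many new rows landed."""
    rows = []
    for raw in messages:
        doc = json.loads(raw)
        rows.append((doc["response_id"], doc["campaign_id"], doc["choice"]))
    before = conn.execute("SELECT COUNT(*) FROM responses").fetchone()[0]
    with conn:  # commit on success, roll back on error
        # OR IGNORE makes redelivered messages harmless (assumes unique ids).
        conn.executemany(
            "INSERT OR IGNORE INTO responses VALUES (?, ?, ?)", rows
        )
    after = conn.execute("SELECT COUNT(*) FROM responses").fetchone()[0]
    return after - before


def tally(conn: sqlite3.Connection, campaign_id: str) -> List[Tuple[str, int]]:
    """Count responses per choice for one known campaign, most popular first."""
    return conn.execute(
        """
        SELECT r.choice, COUNT(*) AS votes
        FROM responses AS r
        JOIN campaigns AS c ON c.campaign_id = r.campaign_id
        WHERE c.campaign_id = ?
        GROUP BY r.choice
        ORDER BY votes DESC, r.choice
        """,
        (campaign_id,),
    ).fetchall()
```

The tally runs only over accumulated rows, which matches the point above that
queries gain little from stream-style processing. Joining against `campaigns`
is one simple way to reconcile the two sides: responses that refer to an
unknown campaign never reach the count.
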