This article continues a series on a crowdsourcing application and builds on the most recent article. The original problem statement is included again for context.
Social engineering applications provide a wealth of information to the end user, but the questions and answers they host are limited to just that – the user's social circle. Advice solicited for personal circumstances is rarely appropriate for forums that remain in public view. It is also difficult to find the right forum or audience where responses can be obtained quickly. When we want more opinions in a discreet manner, without the knowledge of those around us, the options become fewer and fewer. In addition, crowdsourcing opinions on a personal topic is not readily available via applications. This document tries to envision an application that meets this requirement.
The previous article continued the elaboration on the use of public cloud services for provisioning the queue, the document store, and the compute, and touched on the messaging platform required to support this social engineering application. The problems encountered with social engineering applications are well defined and have precedents in various commercial applications. They are primarily about the feed for each user and the propagation of solicitations to the crowd. The previous article described selective fan-out. When clients wake up, they can request that their state be refreshed; this spares the write-time update, because the data does not need to be pushed out. If the queue instead sends messages to the clients, it is a fan-out process. Devices can choose to check in at selective times, and the server can be selective about which clients to update. Both methods work well in certain situations. The fan-out happens on both writes and loads, and it can be made selective in either case; it can be limited during both pull and push. Disabling the writes to all devices can significantly reduce the cost, because the other devices can load the updates only when they read. It also helps to keep track of which clients have been active over a period so that only those clients get preference.
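A minimal sketch of this selective fan-out in Python, assuming a hypothetical in-memory registry of client check-ins (the ClientRegistry class, the push callback, and the ACTIVE_WINDOW threshold are all illustrative, not part of any particular cloud SDK):

    import time

    ACTIVE_WINDOW = 24 * 3600  # assumed threshold: only clients seen in the last day get pushes

    class ClientRegistry:
        def __init__(self):
            self.last_seen = {}   # client_id -> last check-in timestamp
            self.pending = {}     # client_id -> updates queued for the next pull

        def check_in(self, client_id):
            """Pull model: a waking client records its check-in and drains its pending updates."""
            self.last_seen[client_id] = time.time()
            return self.pending.pop(client_id, [])

        def fan_out(self, client_ids, update, push):
            """Push model, made selective: send immediately only to recently active
            clients; queue the update for the rest to load on their next read."""
            now = time.time()
            for cid in client_ids:
                if now - self.last_seen.get(cid, 0) < ACTIVE_WINDOW:
                    push(cid, update)                                # active: push now
                else:
                    self.pending.setdefault(cid, []).append(update)  # inactive: defer to pull

This keeps the expensive push path limited to clients that are likely to be awake, while everyone else pays the cost only when they next check in.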
In this section, we talk about the chatty I/O antipattern. When I/O requests are frequent and numerous, they can have a significant impact on performance and responsiveness. Network calls and other I/O operations are much slower than compute tasks. Each I/O request carries significant overhead as it travels up and down the networking stack on the local and remote hosts, and each includes a round-trip time; the cumulative effect of numerous I/O operations can slow the system down. Some common causes of chatty I/O include the following:
Reading and writing individual records to a database as distinct requests – when records are fetched one at a time, a series of queries is run one after the other to gather the information. This is exacerbated when an Object-Relational Mapping layer hides the behavior beneath the business logic and each entity is retrieved over several queries, the classic N+1 query problem. The same can happen on writes for an entity.
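A minimal sketch of this cause in Python, using the standard sqlite3 module and a hypothetical orders/order_items schema; the order, the item ids, and then every single item are each fetched with their own query:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
        CREATE TABLE order_items (id INTEGER PRIMARY KEY, order_id INTEGER, sku TEXT);
    """)

    def load_order_chatty(order_id):
        # One query for the order...
        order = conn.execute(
            "SELECT id, customer FROM orders WHERE id = ?", (order_id,)).fetchone()
        item_ids = [row[0] for row in conn.execute(
            "SELECT id FROM order_items WHERE order_id = ?", (order_id,))]
        # ...then one more query per item: N+1 round trips in total.
        items = [conn.execute(
            "SELECT sku FROM order_items WHERE id = ?", (i,)).fetchone()
                 for i in item_ids]
        return order, items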
Implementing a single logical operation as a series of HTTP requests – this occurs when objects residing on a remote server are represented as proxies in the memory of the local system. The code reads as if an object is being modified locally when, in fact, every modification costs at least one round trip. When there are many network round trips, the cost is cumulative and can become prohibitive. This is easily observed when a proxy object has many properties and each property get/set requires a relay to the remote object. In such cases, there may also be a requirement to perform validation after every access.
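A sketch of the same problem over HTTP in Python, assuming a hypothetical remote users API at example.com; each property access on the proxy becomes its own request:

    import urllib.request

    BASE = "https://example.com/api/users"  # hypothetical endpoint

    class UserProxy:
        """Looks like a local object, but every attribute read is a network round trip."""
        def __init__(self, user_id):
            self.user_id = user_id

        def _fetch(self, prop):
            with urllib.request.urlopen(f"{BASE}/{self.user_id}/{prop}") as resp:
                return resp.read().decode()

        @property
        def name(self):
            return self._fetch("name")   # one round trip

        @property
        def email(self):
            return self._fetch("email")  # another round trip

    # Reading two properties costs two round trips instead of one:
    # user = UserProxy(42); print(user.name, user.email)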
Reading and writing to a file on disk – file I/O also hides the distributed nature of interconnected file systems. Every byte written to a file on a mount must be relayed to the original file on the remote server. When the writes are numerous, the cost accumulates quickly; it is even more noticeable when the writes are frequent and only a few bytes each.
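A sketch of chatty file I/O; each record is appended with its own open/write/close cycle, so every few bytes pay the full relay cost on a network mount:

    def append_record_chatty(path, record):
        # Reopening the file for every record turns each small write
        # into a separate I/O request against the remote server.
        with open(path, "a") as f:
            f.write(record + "\n")

    # for r in records: append_record_chatty("/mnt/share/log.txt", r)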
There are several ways to fix the problem, and they are about detection and remedy. When the I/O requests are many, they can be batched into coarser requests. The database can be read with one query that substitutes for many, which also gives the database an opportunity to execute it better and faster. Web APIs can be designed according to REST best practices: instead of a separate GET method for each property, there can be a single GET method for the resource representing the object. Even if the response body is larger, it will likely remain a single request. File I/O can be improved with buffering and caching, and files need not be opened and closed repeatedly; this also helps reduce fragmentation of the file on disk.
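Minimal sketches of these remedies, continuing the hypothetical sqlite3 and file examples above (conn is the connection and schema assumed earlier):

    def load_order_batched(order_id):
        # One query returns the order and all of its items in a single round trip,
        # and lets the database plan the join itself.
        return conn.execute("""
            SELECT o.id, o.customer, i.sku
            FROM orders o JOIN order_items i ON i.order_id = o.id
            WHERE o.id = ?""", (order_id,)).fetchall()

    def append_records_buffered(path, records):
        # Open once, write many: the runtime's buffer coalesces the small writes.
        with open(path, "a") as f:
            for r in records:
                f.write(r + "\n")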
When more information is retrieved via fewer I/O calls, there is a risk of falling into the extraneous fetching antipattern; the right tradeoff depends on the usage. It is also important to read only as much as necessary, to limit both the size and the frequency of calls. Sometimes data can be partitioned into two sets: frequently accessed data that accounts for most requests, and less frequently accessed data that is rarely used. When data is written, resources should not be locked at too large a scope or for too long a duration.
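A brief sketch of that partitioning, assuming a hypothetical store with separate get_hot and get_cold lookups; the cold record is fetched lazily and at most once:

    class Profile:
        """Hot fields load up front; cold fields are fetched only on demand."""
        def __init__(self, user_id, store):
            self.user_id = user_id
            self._store = store
            self.hot = store.get_hot(user_id)   # e.g. name, avatar: needed by most requests
            self._cold = None

        @property
        def cold(self):
            if self._cold is None:              # e.g. history, preferences: rarely needed
                self._cold = self._store.get_cold(self.user_id)
            return self._cold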