This is a continuation of a series of articles on a crowdsourcing application, following up on the most recent article. The original problem statement is included again for context.
Social engineering applications provide a wealth of information to the end-user, but the questions and answers received on them are always limited to just that – the user's social circle. Advice solicited for personal circumstances is never appropriate for forums that remain in public view. It is also difficult to find the right forum or audience from which responses can be obtained in a short time. When we want more opinions in a discreet manner, without the knowledge of those around us, the options become fewer and fewer. In addition, crowdsourcing opinions on a personal topic is not something existing applications readily support. This document tries to envision an application to meet this requirement.
The previous article continued the elaboration on the usage of public cloud services for provisioning the queue, document store and compute. It also discussed the messaging platform required to support this social-engineering application. The problems encountered with social engineering are well-defined and have precedents in various commercial applications. They are primarily about the feed for each user and the propagation of solicitations to the crowd. The previous article described selective fan-out. When the clients wake up, they can request their state to be refreshed. This avoids the write-time update because the data does not need to be pushed out. If the queue sends messages back to the clients, it is a fan-out process. The devices can choose to check in at selective times, and the server can be selective about which clients to update. Both methods work well in certain situations. The fan-out happens in both writing and loading, and it can be made selective in either case. Limiting the fan-out during both pull and push, for instance by disabling writes to all devices, can significantly reduce the cost. Other devices can then load these updates only when reading. It is also helpful to keep track of which clients are active over a period so that only those clients get preference, as the sketch below illustrates.
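A minimal sketch of selective fan-out follows. The IDeviceRegistry and IQueueClient abstractions are assumptions for illustration, not part of the original design; real implementations would sit on top of the cloud queue service described earlier.

```csharp
using System;
using System.Collections.Generic;

public record Update(string CampaignId, string Payload);

public interface IDeviceRegistry
{
    // Devices that have checked in within the given window.
    IEnumerable<string> GetActiveDevices(TimeSpan window);
}

public interface IQueueClient
{
    void Push(string deviceId, Update update);          // immediate fan-out
    void StageForPull(string deviceId, Update update);  // loaded on next read
}

public class SelectiveFanOut
{
    private readonly IDeviceRegistry _registry;
    private readonly IQueueClient _queue;

    public SelectiveFanOut(IDeviceRegistry registry, IQueueClient queue)
    {
        _registry = registry;
        _queue = queue;
    }

    // Push only to recently active devices; everyone else picks the
    // update up when they next check in, which limits write fan-out.
    public void Publish(Update update, IEnumerable<string> subscribers)
    {
        var active = new HashSet<string>(
            _registry.GetActiveDevices(TimeSpan.FromMinutes(15)));

        foreach (var deviceId in subscribers)
        {
            if (active.Contains(deviceId))
                _queue.Push(deviceId, update);
            else
                _queue.StageForPull(deviceId, update);
        }
    }
}
```

The fifteen-minute activity window is an arbitrary illustrative choice; tuning it trades push latency for write cost.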
In this section, we talk about extraneous fetching. When services call datastores, they retrieve data for a business operation, but these calls often incur unnecessary I/O overhead and reduce responsiveness. This antipattern can occur when the application tries to save on the number of requests by fetching more than is required. This is a form of overcompensation and is commonly seen with catalog operations because the filtering is delegated to the middle tier. For example, a user may need to see only a subset of the details and probably does not need to see all the responses at once, yet a large dataset from the campaign is retrieved. Even if the user is browsing the entire campaign, paginating the results avoids this antipattern, as sketched below.
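A sketch of paginated reads with Entity Framework Core follows. The CampaignContext and Response types are hypothetical stand-ins for the application's own model, not names from the original design.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

public class Response
{
    public int Id { get; set; }
    public string CampaignId { get; set; } = "";
    public DateTime CreatedAt { get; set; }
    public string Text { get; set; } = "";
}

public class CampaignContext : DbContext
{
    public DbSet<Response> Responses => Set<Response>();
}

public static class ResponseQueries
{
    // Page through a campaign's responses instead of materializing all
    // of them; Skip/Take translate to OFFSET/FETCH in the generated SQL.
    public static Task<List<Response>> GetPageAsync(
        CampaignContext db, string campaignId, int page, int pageSize = 50)
    {
        return db.Responses
            .Where(r => r.CampaignId == campaignId)
            .OrderBy(r => r.CreatedAt)   // a stable order is required for paging
            .Skip(page * pageSize)
            .Take(pageSize)
            .ToListAsync();
    }
}
```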
Another example of this problem is an inappropriate choice in design or code where, for example, a service gets all the response details via Entity Framework and then keeps only a subset of the fields while discarding the rest. Yet another example is when the application retrieves data to perform an aggregation, such as a count of responses, that could be done by the database instead. The application calculates total sales by getting every record for all orders sold instead of executing a query where the predicates are pushed down to the store. Similar manifestations can come about when Entity Framework is used with LINQ to Entities. In this case, the filtering is done in memory after retrieving the results from the table, because a certain method in the predicate could not be translated to a query. A call to AsEnumerable is a hint that there is a problem, because filtering based on IEnumerable is done on the client side rather than in the database. The default for LINQ to Entities is IQueryable, which pushes the filters to the data source. The contrast is sketched below.
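The contrast between the antipattern and the fix can be shown on the hypothetical CampaignContext and Response types from the pagination sketch above.

```csharp
using System.Linq;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

public static class CountExamples
{
    // Antipattern: AsEnumerable switches to LINQ to Objects, so every
    // response row is pulled from the database and counted in memory.
    public static int CountInMemory(CampaignContext db, string campaignId)
    {
        return db.Responses
            .AsEnumerable()                        // filtering now runs client-side
            .Where(r => r.CampaignId == campaignId)
            .Count();
    }

    // Fix: stay on IQueryable so the predicate and the aggregation are
    // translated into a single SELECT COUNT(*) executed by the database.
    public static Task<int> CountInDatabase(CampaignContext db, string campaignId)
    {
        return db.Responses.CountAsync(r => r.CampaignId == campaignId);
    }
}
```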
Fetching all the columns from a table instead of only the relevant ones is another classic example of this antipattern. Even though this might have worked when the table was only a few columns wide, it changes the game when the table adds several more columns. Similarly, performing aggregation in the database instead of in memory on the application side overcomes this antipattern.
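A short sketch of narrowing the projection with Select follows, again on the hypothetical CampaignContext; the ResponseSummary type is illustrative. The generated SQL selects just the two columns instead of the whole row.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

public record ResponseSummary(int Id, DateTime CreatedAt);

public static class ProjectionExamples
{
    // Project only the columns the caller needs; adding columns to the
    // underlying table later does not widen this query's result set.
    public static Task<List<ResponseSummary>> GetSummariesAsync(
        CampaignContext db, string campaignId)
    {
        return db.Responses
            .Where(r => r.CampaignId == campaignId)
            .Select(r => new ResponseSummary(r.Id, r.CreatedAt))
            .ToListAsync();
    }
}
```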
As with data access best practices, some considerations for performance hold true here as well. Partitioning data horizontally may reduce contention. Operations that support unbounded queries can implement pagination. Features that are built right into the data store can be leveraged. Some calculations, especially summation forms, need not be repeated. Queries that return a lot of results can be further filtered. Not all operations can be offloaded to the database, but those for which the database is highly optimized should be.
A few ways to detect this antipattern include identifying slow workloads or transactions, observing the behavioral patterns exhibited by the system due to its limits, correlating instances of slow workloads with those patterns, identifying the data stores being used, identifying any slow-running queries that reference those data stores, and performing a resource-specific analysis of how the data is used and consumed.
These steps, combined with the practices above, are some of the ways to detect and mitigate this antipattern.
Some of the metrics that help with detecting and mitigating the extraneous fetching antipattern include total bytes per minute, average bytes per transaction, and requests per minute; a small sketch of computing them from request logs follows.
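This is a minimal sketch only, assuming a hypothetical RequestLogEntry shape; the actual telemetry source would depend on the platform's monitoring pipeline.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed log record: one entry per request with its response payload size.
public record RequestLogEntry(DateTime Timestamp, long ResponseBytes);

public static class FetchMetrics
{
    public static void Report(IReadOnlyCollection<RequestLogEntry> entries)
    {
        if (entries.Count == 0) return;

        // Span of the sample, clamped to at least one minute.
        var minutes = Math.Max(1.0,
            (entries.Max(e => e.Timestamp) - entries.Min(e => e.Timestamp)).TotalMinutes);

        Console.WriteLine($"Total bytes/minute:    {entries.Sum(e => e.ResponseBytes) / minutes:F0}");
        Console.WriteLine($"Avg bytes/transaction: {entries.Average(e => e.ResponseBytes):F0}");
        Console.WriteLine($"Requests/minute:       {entries.Count / minutes:F1}");
    }
}
```

A sustained rise in bytes per transaction without a matching rise in requests per minute is the signature to look for: each operation is fetching more than it returns to the user.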