This article continues a series that describes operational considerations for hosting solutions on the Azure public cloud. The series draws on best practices from the Azure Public Cloud documentation. The previous article focused on antipatterns to avoid, specifically the cloud readiness antipatterns. This article focuses on the no-caching antipattern.
The no-caching antipattern occurs when a cloud application that handles many concurrent requests fetches the same data over and over. The resulting contention for data access reduces performance and scalability. When the data is not cached, the problem manifests in several ways.
First, fetching the data can traverse several layers and go deep into the stack, consuming significant resources and adding cost in the form of I/O overhead and latency, while the application repeatedly constructs the same objects or data structures.
Second, the application makes excessive calls to a remote service that enforces a service quota and throttles clients beyond a certain limit.
Both problems degrade response times, increase contention, and hurt scalability.
Examples of the no-caching antipattern are easy to spot. Entity Framework queries that are repeatedly issued for the same read-only data fit this antipattern. The use of a cache might simply have been overlooked, but more often the cache could not be included in the design because of unknowns: the benefits and drawbacks of using a cache were not clear at the time, or there were concerns about the accuracy and freshness of the cached data.
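As a minimal illustration of the antipattern, not code from any real system, consider the following Python sketch: a hypothetical get_product handler that runs the same read-only query on every request, so identical concurrent requests all reach the database. The database file and table names are stand-ins.

```python
# Hypothetical handler exhibiting the no-caching antipattern: every
# request opens a connection and runs the same read-only query, so
# concurrent requests for the same product all hit the database.
import sqlite3

def get_product(product_id: int):
    conn = sqlite3.connect("catalog.db")  # "catalog.db" is a stand-in
    try:
        return conn.execute(
            "SELECT id, name, price FROM products WHERE id = ?",
            (product_id,),
        ).fetchone()
    finally:
        conn.close()
```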
Other times, the cache was left out because the application was migrated from an on-premises environment where network latency and response times were tightly controlled, and the system might have been running on expensive high-performance hardware rather than on commodity cloud virtual machine scale sets.
Rarely, caching might have been deliberately left out of the architecture design with the expectation, never clearly communicated, that operations would add it later through standalone, independent products. In other cases, introducing a cache can increase latency, add maintenance and ownership burden, and decrease overall availability. It might also interfere with the existing caching strategies and expiration policies of the underlying systems. Some teams prefer not to add an external cache in front of a database and use one only as a sidecar for the web services. It is true that databases can cache query results, even materialized views, for a connection, but a cache lookup is cheap in all the cases where the compute in the deeper systems is costly and can be avoided.
There are two strategies to fix the problem. The first is the on-demand, or cache-aside, strategy. The application tries to read the data from the cache; if it isn't there, the application retrieves the data from the source and adds it to the cache. When the application writes a change, it writes directly to the data source and removes the corresponding old value from the cache, which is refilled the next time the data is required.
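Below is a minimal cache-aside sketch in Python using the redis-py client. The key names, the five-minute TTL, and the in-memory stand-in for the data source are assumptions for illustration, not a definitive implementation.

```python
# Minimal cache-aside sketch: read through the cache, write to the
# source and invalidate. Assumes a Redis server on localhost.
import json
import redis

cache = redis.Redis(host="localhost", port=6379)
TTL_SECONDS = 300  # assumed expiration; tune per data volatility

_SOURCE = {1: {"name": "widget", "price": 9.99}}  # stand-in data source

def query_database(product_id: int) -> dict:
    return _SOURCE[product_id]

def update_database(product_id: int, product: dict) -> None:
    _SOURCE[product_id] = product

def read_product(product_id: int) -> dict:
    key = f"product:{product_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)           # cache hit
    product = query_database(product_id)    # cache miss: go to the source
    cache.set(key, json.dumps(product), ex=TTL_SECONDS)
    return product

def write_product(product_id: int, product: dict) -> None:
    update_database(product_id, product)    # write to the source first
    cache.delete(f"product:{product_id}")   # invalidate; refilled on next read
```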
The second strategy is to keep static resources in the cache permanently, with no expiration date. This is comparable to serving them from a CDN, although CDNs are aimed at geographic distribution. Applications that cache dynamic data should be designed to support eventual consistency.
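As a sketch of this second strategy, again with redis-py and a hypothetical key name, a static resource can be written without an expiration so it stays cached until explicitly removed:

```python
import json
import redis

cache = redis.Redis(host="localhost", port=6379)

# No ex= argument: the entry has no expiration and remains in the cache
# until it is explicitly deleted or evicted by the server's memory policy.
cache.set("static:country-codes", json.dumps(["US", "GB", "IN"]))
```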
No matter how the cache is implemented, the application must support falling back to the underlying data access when the data is not available in the cache, and that fallback must not overwhelm the data source. The Circuit Breaker pattern helps here: when failures against the data source accumulate, the circuit opens and further calls are rejected until the source has had time to recover.
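The sketch below pairs the cache fallback with a simple circuit breaker. The thresholds, timings, and the cache_get/fetch_from_source stubs are assumptions for illustration, not a production-ready implementation.

```python
# Simple circuit breaker guarding the fallback to the data source.
import time

class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_seconds: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_seconds = reset_seconds
        self.failures = 0
        self.opened_at = 0.0

    def allow(self) -> bool:
        # Closed while failures stay under the threshold; after the reset
        # window elapses, let a probe call through (half-open).
        if self.failures < self.failure_threshold:
            return True
        return time.monotonic() - self.opened_at >= self.reset_seconds

    def record_success(self) -> None:
        self.failures = 0  # close the circuit again

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()  # open the circuit

def cache_get(key: str):
    # Stub cache lookup that always misses, for illustration only.
    return None

def fetch_from_source(key: str) -> dict:
    # Stub for the deep data access the cache would normally shield.
    return {"key": key}

breaker = CircuitBreaker()

def read_with_fallback(key: str) -> dict:
    cached = cache_get(key)
    if cached is not None:
        return cached
    if not breaker.allow():
        # Circuit open: fail fast instead of piling onto the source.
        raise RuntimeError("data source temporarily unavailable")
    try:
        value = fetch_from_source(key)
    except Exception:
        breaker.record_failure()
        raise
    breaker.record_success()
    return value
```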