Cluster computing

Friday, November 19, 2021

This is a continuation of an article that describes operational considerations for hosting solutions on the Azure public cloud.

There are several references to best practices throughout this series of articles, which draws on the documentation for the Azure public cloud. The previous article focused on antipatterns to avoid, specifically the Chatty I/O antipattern. This one focuses on the Improper Instantiation antipattern.

When new instances of a class are created repeatedly instead of once, performance and responsiveness can suffer significantly. Connections and clients are costly resources to set up, so they should be created once and reused. Each connection or client instantiation requires a server handshake, which incurs network delay and consumes memory, and the cumulative effect of many setup requests can slow the system down. Common causes of improper instantiation include:

Creating connections and clients for individual data access requests – When a connection or client is scoped to a single request for the sake of cleanup, every request forces the server to respond to a fresh setup.

Reading and writing individual records to a database as distinct requests – When records are fetched one at a time, a series of queries runs one after the other to gather the information. The problem is exacerbated when a shared library hides this behavior and each access request recreates a connection or client. The same can happen on write requests.

Implementing a single logical operation as a series of data access requests – This occurs when objects wrap connections and clients and the wrappers are scoped to the methods that invoke them, so connections and clients are disposed of often. The code reads as if a wrapper is merely used locally, when in fact every instantiation of the wrapper costs at least one round-trip time (RTT). With many network round trips, the cost is cumulative and can become prohibitive. It is easy to observe when a wrapper is instantiated many times and creates a connection or client each time. In such cases, validation must also be performed after every access.

Reading and writing to a file on disk – File I/O can also hide the distributed nature of interconnected file systems. Every byte written to a file on a mount must be relayed to the original on the remote server. When there are many writes, the cost accumulates quickly, and it is even more noticeable when the writes are frequent and only a few bytes each. If each access requires its own connection or client, the application might not even be aware of how many connections it is making.
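The causes above share one shape: something that looks like a cheap local object quietly pays for a handshake each time it is built. A minimal sketch of that shape follows. `ExpensiveClient` and `RecordStore` are hypothetical names invented for illustration, and the class-level counter merely stands in for real server handshakes; nothing here is an Azure API.

```python
class ExpensiveClient:
    """Hypothetical client; each construction stands in for one server handshake."""

    handshakes = 0  # counts simulated handshakes across all instances

    def __init__(self):
        ExpensiveClient.handshakes += 1

    def fetch(self, key):
        # Placeholder for a real remote read.
        return f"value-for-{key}"


class RecordStore:
    """Hypothetical wrapper around a client."""

    def __init__(self, client=None):
        # Building a client here when none is supplied is the antipattern:
        # every wrapper instance then pays for its own handshake.
        self._client = client if client is not None else ExpensiveClient()

    def read(self, key):
        return self._client.fetch(key)


def read_all_improper(keys):
    # A new wrapper, and therefore a new client, for every record.
    return [RecordStore().read(key) for key in keys]


def read_all_shared(keys, client):
    # One client created by the caller and reused for every record.
    store = RecordStore(client)
    return [store.read(key) for key in keys]
```

Both functions return the same values; the difference is only in how many times the client is constructed, which is exactly what tends to stay hidden behind a wrapper.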

There are several ways to fix the problem, and they involve both detection and remedy. When there are many server handshakes, the requests can be consolidated onto reused connections via shared clients or connection pooling. The database can be accessed through a shared, reusable connection pool rather than through a separate connection for each request. Pooling also gives the database an opportunity to free the memory associated with client connections. Web APIs can be designed following REST best practices: instead of separate GET methods for different properties, there can be a single GET method for the resource representing the object.
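As a rough illustration of connection pooling, here is a minimal sketch built on the standard library only. The `ConnectionPool` class is a hypothetical helper written for this post, not a library API, and in-memory SQLite is an assumption chosen so the example runs anywhere; a real application would more likely rely on the pooling that its database driver or framework already provides.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Hypothetical fixed-size pool: connections are opened once and reused."""

    def __init__(self, factory, size):
        self._idle = queue.Queue(maxsize=size)
        self.created = 0  # how many connections were ever opened
        for _ in range(size):
            self._idle.put(factory())
            self.created += 1

    @contextmanager
    def connection(self, timeout=5.0):
        # Blocks until a connection is free; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Always hand the connection back, even if the caller raised.
            self._idle.put(conn)


def run_queries(pool, count):
    """Run `count` trivial queries, borrowing a pooled connection for each."""
    results = []
    for _ in range(count):
        with pool.connection() as conn:
            results.append(conn.execute("SELECT 1").fetchone()[0])
    return results
```

With a pool created as `ConnectionPool(lambda: sqlite3.connect(":memory:"), size=2)`, any number of calls to `run_queries` reuse the same two connections. Note that each in-memory SQLite connection has its own private database, so this sketch only shows the reuse mechanics, not shared data.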

When more information is retrieved through fewer connection and client instantiations, there is a risk of falling into the Extraneous Fetching antipattern by prefetching more than is necessary. The right tradeoff depends on usage. It is also important to read only as much as necessary, to limit both the size and the frequency of requests. Sometimes connections and clients can be used in a mixed mode: shared for the accounts with the most requests and dedicated for everything else. When a connection is reused from a shared pool, it should not be locked at too large a scope or held for longer than necessary.
