Cluster computing

Saturday, November 20, 2021

This is a continuation of an article that describes operational considerations for hosting solutions on the Azure public cloud.

Throughout this series of articles, we have drawn on best practices from the documentation for the Azure public cloud. The previous article focused on antipatterns to avoid, specifically the cloud readiness antipatterns. This article focuses on the extraneous fetching antipattern.

Services call data stores to retrieve the data a business operation needs, but when they retrieve more than that, the result is unnecessary I/O overhead and reduced responsiveness. This antipattern can occur when the application tries to save on the number of requests by fetching more than it requires. It is a form of overcompensation and is commonly seen in catalog operations because the filtering is delegated to the middle tier. For example, a user may need to see only a subset of the details and probably does not need to see all the products at once, yet a large dataset is retrieved from the catalog. Even if the user is browsing the entire catalog, paginating the results avoids this antipattern.
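The difference between fetching the whole catalog and paginating it can be sketched with Python's built-in sqlite3 module. This is a minimal illustration under stated assumptions: the products table, its columns, and the function names are hypothetical, and a service on Azure would issue the equivalent query against its own data store.

```python
import sqlite3

def make_catalog(n):
    """Build a hypothetical in-memory product catalog with n rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
    conn.executemany(
        "INSERT INTO products (id, name, price) VALUES (?, ?, ?)",
        [(i, f"product-{i}", float(i)) for i in range(1, n + 1)],
    )
    return conn

def fetch_all_products(conn):
    # Antipattern: pulls the entire catalog even if the caller shows one screen of it.
    return conn.execute("SELECT id, name, price FROM products ORDER BY id").fetchall()

def fetch_page(conn, page, page_size):
    # Pagination: the store returns only the rows for the requested page (1-based).
    offset = (page - 1) * page_size
    return conn.execute(
        "SELECT id, name, price FROM products ORDER BY id LIMIT ? OFFSET ?",
        (page_size, offset),
    ).fetchall()

conn = make_catalog(1000)
print(len(fetch_all_products(conn)))  # 1000
print(fetch_page(conn, 2, 20)[0])     # (21, 'product-21', 21.0)
```

LIMIT/OFFSET is the simplest form of pagination. For very deep pages, keyset pagination (filtering on the last id already seen) avoids making the store walk past all the skipped rows.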

Another example of this problem comes from inappropriate choices in design or code. For example, a service gets all the product details through Entity Framework and then keeps only a subset of the fields while discarding the rest. Yet another example is when the application retrieves data to perform an aggregation that the database could do instead: the application calculates total sales by getting every record for all orders sold instead of executing a query whose predicates are pushed down to the store. Other manifestations can arise when Entity Framework uses LINQ to Entities. In this case, the results are retrieved from the table and filtered in memory because a method used in the predicate could not be translated into a query. A call to AsEnumerable is a hint that there is a problem, because filtering on an IEnumerable is usually done on the client side rather than in the database. The default for LINQ to Entities is IQueryable, which pushes the filters down to the data source.
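Entity Framework and LINQ are .NET technologies, so the sketch below is only a Python analogy using the built-in sqlite3 module. It contrasts computing total sales and filtering orders in application memory with letting the database do the same work. The orders table, the region column, and all function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("east", 10.0), ("west", 25.5), ("east", 4.5), ("north", 60.0)],
)

def total_sales_in_app(conn):
    # Antipattern: every order row is fetched just to be summed in memory.
    rows = conn.execute("SELECT amount FROM orders").fetchall()
    return sum(amount for (amount,) in rows)

def total_sales_in_db(conn):
    # Pushdown: the database computes the aggregate and returns a single value.
    (total,) = conn.execute("SELECT COALESCE(SUM(amount), 0) FROM orders").fetchone()
    return total

def east_orders_in_app(conn):
    # Analogous to filtering after AsEnumerable(): the predicate runs client-side.
    rows = conn.execute("SELECT id, region, amount FROM orders ORDER BY id").fetchall()
    return [row for row in rows if row[1] == "east"]

def east_orders_in_db(conn):
    # Analogous to staying on IQueryable: the predicate becomes a WHERE clause.
    return conn.execute(
        "SELECT id, region, amount FROM orders WHERE region = ? ORDER BY id", ("east",)
    ).fetchall()

print(total_sales_in_db(conn))  # 100.0
```

Both pairs of functions return the same answers; the difference is how many rows travel from the store to the application to produce them.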

Fetching all the columns from a table when only a few of them are relevant is another classic example of this antipattern. Even if this worked when the table was only a few columns wide, the cost grows as the table adds more columns. Similarly, performing aggregation in the database instead of in memory on the application side overcomes this antipattern.
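To make the cost of fetching every column concrete, the sketch below compares a SELECT * against a projection of only the needed columns on a hypothetical wide table. The table layout, the column sizes, and the payload_bytes helper are assumptions chosen for illustration; payload_bytes is only a rough measure that adds up the lengths of text and blob values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical wide table: 'description' and 'image' are large and rarely needed.
conn.execute(
    "CREATE TABLE products ("
    "id INTEGER PRIMARY KEY, name TEXT, price REAL, description TEXT, image BLOB)"
)
conn.executemany(
    "INSERT INTO products (name, price, description, image) VALUES (?, ?, ?, ?)",
    [(f"product-{i}", float(i), "x" * 2000, bytes(10_000)) for i in range(1, 101)],
)

def payload_bytes(rows):
    # Rough payload size: sums the lengths of str and bytes values, ignores numbers.
    total = 0
    for row in rows:
        for value in row:
            if isinstance(value, (str, bytes)):
                total += len(value)
    return total

all_columns = conn.execute("SELECT * FROM products").fetchall()
needed_columns = conn.execute("SELECT id, name, price FROM products").fetchall()

print(payload_bytes(all_columns), payload_bytes(needed_columns))  # 1200992 992
```

The projected query returns the same rows with a small fraction of the data, and the gap widens with every large column added to the table later.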

As with data access best practices in general, several performance considerations hold true here as well. Partitioning data horizontally may reduce contention. Operations that support unbounded queries can implement pagination. Features that are built into the data store can be leveraged. Some calculations, especially summations, need not be repeated on every request. Queries that return many results can be filtered further. Not all operations can be offloaded to the database, but those for which the database is highly optimized can be.
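Two of these considerations, leveraging features built into the data store and not repeating summations, can be combined. The sketch below assumes SQLite via Python's sqlite3 module and uses a trigger so that the store keeps a running sales total current as orders arrive; reading the total then costs one row instead of a scan. The sales_summary table and trigger name are hypothetical, and this sketch only handles inserts; updates and deletes would need triggers of their own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL NOT NULL);
    -- Hypothetical summary table holding a single precomputed running total.
    CREATE TABLE sales_summary (id INTEGER PRIMARY KEY CHECK (id = 1), total REAL NOT NULL);
    INSERT INTO sales_summary (id, total) VALUES (1, 0);
    -- The store itself updates the total on every insert into orders.
    CREATE TRIGGER orders_after_insert AFTER INSERT ON orders
    BEGIN
        UPDATE sales_summary SET total = total + NEW.amount WHERE id = 1;
    END;
    """
)

conn.executemany("INSERT INTO orders (amount) VALUES (?)", [(10.0,), (25.5,), (4.5,)])

def total_sales(conn):
    # Reads one precomputed row instead of re-summing the orders table per request.
    (total,) = conn.execute("SELECT total FROM sales_summary WHERE id = 1").fetchone()
    return total

print(total_sales(conn))  # 40.0
```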

A few ways to detect this antipattern include identifying slow workloads or transactions, identifying the behavioral patterns the system exhibits due to limits, correlating instances of slow workloads with those patterns, identifying the data stores being used, identifying any slow-running queries that reference these data sources, and performing a resource-specific analysis of how the data is used and consumed.
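As a starting point for identifying slow-running queries and the data they pull, a service can record the duration and row count of each query it issues. The sketch below is a minimal, hypothetical monitor built on Python's sqlite3, time, and dataclasses modules; QueryMonitor, QueryStat, and both thresholds are illustrative assumptions, and a production service would more likely rely on its data store's own query statistics.

```python
import sqlite3
import time
from dataclasses import dataclass, field

@dataclass
class QueryStat:
    sql: str
    seconds: float
    rows: int

@dataclass
class QueryMonitor:
    # Hypothetical thresholds; real values depend on the workload's latency targets.
    slow_seconds: float = 0.5
    many_rows: int = 1000
    stats: list = field(default_factory=list)

    def run(self, conn, sql, params=()):
        # Time the query and record how many rows it returned.
        start = time.perf_counter()
        rows = conn.execute(sql, params).fetchall()
        self.stats.append(QueryStat(sql, time.perf_counter() - start, len(rows)))
        return rows

    def suspects(self):
        # Slow queries, or queries returning large result sets, are candidates for review.
        return [s for s in self.stats
                if s.seconds >= self.slow_seconds or s.rows >= self.many_rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO products (name) VALUES (?)", [(f"p{i}",) for i in range(5000)])

monitor = QueryMonitor()
monitor.run(conn, "SELECT id, name FROM products")                      # unbounded
monitor.run(conn, "SELECT id, name FROM products ORDER BY id LIMIT 20")  # paginated
for stat in monitor.suspects():
    print(stat.sql, stat.rows)
```

Correlating the flagged queries with the slow workloads identified earlier narrows down which data store and which access pattern to analyze.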

These are some of the ways to detect and mitigate this antipattern.

Metrics that help with detecting and mitigating the extraneous fetching antipattern include total bytes per minute, average bytes per transaction, and requests per minute.
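These three metrics are simple ratios over an observation window. The sketch below computes them from a hypothetical request log; the Request type, the sample values, and the choice to count each request as one transaction are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Request:
    timestamp: float      # seconds since the start of the observation window
    bytes_returned: int   # payload size returned by the data store for this request

def fetch_metrics(requests, window_seconds):
    """Compute the three metrics over one window, treating each request as one transaction."""
    in_window = [r for r in requests if 0 <= r.timestamp < window_seconds]
    minutes = window_seconds / 60
    total_bytes = sum(r.bytes_returned for r in in_window)
    return {
        "total_bytes_per_minute": total_bytes / minutes,
        "avg_bytes_per_transaction": total_bytes / len(in_window) if in_window else 0.0,
        "requests_per_minute": len(in_window) / minutes,
    }

log = [Request(5, 120_000), Request(40, 80_000), Request(95, 400_000), Request(110, 200_000)]
print(fetch_metrics(log, window_seconds=120))
# {'total_bytes_per_minute': 400000.0, 'avg_bytes_per_transaction': 200000.0, 'requests_per_minute': 2.0}
```

Watching average bytes per transaction alongside requests per minute helps separate a service that is busy from one that fetches more data than each transaction needs.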
