Cluster computing: Predicate push-down for OData clients:

Abstract:

Data access is an important operational consideration for application performance, but it often receives too little attention on architecture diagrams. The trouble is that data access is usually depicted as a straight arrow between a source and a destination on the data-path diagram. Yet the size of the data and the queries run over it can make the bytes transferred vary widely in both time and volume, and can incur varying processing delays. When this is overlooked, it can lead to additional architectural components such as background processors, polling mechanisms, and redundant technology stacks for a fast path. This article discusses some of these challenges and their remediations as they pertain to OData, which exposes data to the web.

Description:

Improving query performance mostly involves pushing predicates down into the database, and further still into its query execution and optimization layer, so that the optimizer has a chance to determine the best query plan.
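A minimal sketch of push-down, assuming a SQL database (SQLite from Python's standard library stands in for it here) and a hypothetical `orders` table. OData clients express predicates in the `$filter` query option; the sketch translates a small subset of that syntax (`eq`, `ne`, `gt`, `ge`, `lt`, `le`, combined with `and`/`or` and parentheses) into a parameterized SQL `WHERE` clause, so the predicate is evaluated by the database rather than by the service. The `odata_filter_to_sql` function, the `FilterParser` class and the column whitelist are illustrative assumptions, not part of any OData library; a real service would rely on its framework's `$filter` parser, which covers far more of the grammar.

```python
import re
import sqlite3

# Subset of OData comparison operators mapped to SQL (illustrative only).
OPS = {"eq": "=", "ne": "<>", "gt": ">", "ge": ">=", "lt": "<", "le": "<="}
# Hypothetical whitelist of columns a client may filter on.
ALLOWED_COLUMNS = {"id", "customer", "amount"}
# Tokens: 'quoted string' ('' escapes a quote), integer, word, parenthesis.
TOKEN = re.compile(r"\s*(?:('(?:[^']|'')*')|(-?\d+)|([A-Za-z_]\w*)|([()]))")


def tokenize(text):
    text, pos, tokens = text.strip(), 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError(f"unexpected input: {text[pos:]!r}")
        string, number, word, paren = m.groups()
        if string is not None:
            tokens.append(("lit", string[1:-1].replace("''", "'")))
        elif number is not None:
            tokens.append(("lit", int(number)))
        elif word is not None:
            tokens.append(("word", word))
        else:
            tokens.append(("paren", paren))
        pos = m.end()
    return tokens


class FilterParser:
    # Recursive descent: 'or' binds more loosely than 'and', as in OData.
    def __init__(self, tokens):
        self.tokens, self.i, self.params = tokens, 0, []

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else (None, None)

    def take(self):
        token = self.peek()
        self.i += 1
        return token

    def parse_or(self):
        sql = self.parse_and()
        while self.peek() == ("word", "or"):
            self.take()
            sql = f"({sql} OR {self.parse_and()})"
        return sql

    def parse_and(self):
        sql = self.parse_comparison()
        while self.peek() == ("word", "and"):
            self.take()
            sql = f"({sql} AND {self.parse_comparison()})"
        return sql

    def parse_comparison(self):
        if self.peek() == ("paren", "("):
            self.take()
            inner = self.parse_or()
            if self.take() != ("paren", ")"):
                raise ValueError("expected ')'")
            return inner
        (fkind, field), (okind, op), (vkind, value) = self.take(), self.take(), self.take()
        if fkind != "word" or field not in ALLOWED_COLUMNS:
            raise ValueError(f"unknown field: {field!r}")
        if okind != "word" or op not in OPS:
            raise ValueError(f"unsupported operator: {op!r}")
        if vkind != "lit":
            raise ValueError(f"expected a literal, got {value!r}")
        self.params.append(value)  # bound as a parameter, never spliced into SQL
        return f"{field} {OPS[op]} ?"


def odata_filter_to_sql(filter_text):
    parser = FilterParser(tokenize(filter_text))
    where = parser.parse_or()
    if parser.i != len(parser.tokens):
        raise ValueError("unexpected trailing tokens in $filter")
    return where, parser.params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "alice", 50), (2, "bob", 250), (3, "alice", 300)],
)

where, params = odata_filter_to_sql("customer eq 'alice' and amount gt 100")
# The predicate travels to the database; only matching rows come back.
rows = conn.execute(f"SELECT id FROM orders WHERE {where} ORDER BY id", params).fetchall()
print(where, params, rows)
```

Binding values as parameters instead of splicing them into the SQL text, and whitelisting column names, keeps client-supplied filters from injecting SQL, while the optimizer still sees the whole predicate and can plan the query around it.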

In the absence of a database, the web-service layer has to emulate the work that query execution would do inside the database. This emulation is unlikely to be efficient, or consistent in all cases, because the only data structure available to it in the web-service layer is an enumeration over the records.
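For contrast, a sketch of what emulating the query in the web-service layer amounts to, assuming the records have already been fetched into memory as a plain list. The `Order` type and the `filter_in_service` helper are hypothetical. With only an enumeration to work with, every request examines every record, however selective the predicate is.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class Order:
    # Hypothetical record shape used for illustration.
    id: int
    customer: str
    amount: int


def filter_in_service(
    records: Iterable[Order],
    predicate: Callable[[Order], bool],
    stats: dict,
) -> Iterator[Order]:
    # Emulates query execution with a linear scan: no index, no choice of plan.
    for record in records:
        stats["examined"] += 1  # every record is visited, matching or not
        if predicate(record):
            yield record


orders = [Order(i, f"c{i % 100}", i % 1000) for i in range(10_000)]
stats = {"examined": 0}
matches = list(
    filter_in_service(orders, lambda o: o.customer == "c42" and o.amount > 500, stats)
)
print(len(matches), stats["examined"])
```

Here 50 matches cost a scan of all 10,000 records; the cost grows linearly with the data, and every request pays it again.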

The database, on the other hand, is closest to the storage: it indexes and organizes the records so that they can be looked up efficiently, and its optimizer can compare candidate query plans and choose the most efficient one. An in-memory, iteration-only data structure limits us and will not scale with the size of the data when query processing is handled at the service layer rather than at the data layer.
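The database's advantage can be observed with SQLite's `EXPLAIN QUERY PLAN`, again assuming a hypothetical `orders` table and an index named `idx_orders_customer` created only for illustration. Before the index exists, the planner has to scan the table; afterwards, it can search the index for the same predicate. The exact wording of the plan text differs between SQLite versions, so treat the printed plans as illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [(f"c{i % 100}", i % 1000) for i in range(10_000)],
)

QUERY = "SELECT id, amount FROM orders WHERE customer = ?"


def plan(sql, params):
    # Each EXPLAIN QUERY PLAN row has four columns; the last is the readable detail.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


before = plan(QUERY, ("c42",))  # no index yet: the planner must scan the table
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
after = plan(QUERY, ("c42",))   # same predicate, now answered through the index
rows = conn.execute(QUERY, ("c42",)).fetchall()
print(before)
print(after)
print(len(rows))
```

The query text and its predicate never change; only the access path chosen by the database does, which is exactly the decision a service-layer enumeration cannot make.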

Predicates are expected to evaluate the same way regardless of the layer in which they are implemented. If a set of predicates is joined by OR rather than AND, each predicate produces its own result set, and the same record may appear in more than one of them. When we filter on one predicate and also allow matches on another, the two result sets are merged into one before the result is returned to the caller. Because the result sets may overlap, the merge must return only the distinct records, which is easily done by comparing the unique identifier of each record.
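A minimal sketch of the OR merge described above, assuming each record is a dictionary carrying a unique `id` field. The sample records and the `evaluate_or` helper are hypothetical. Each predicate yields its own result set; the sets are concatenated and de-duplicated by comparing identifiers, which produces the same records as evaluating the disjunction in a single pass.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Predicate = Callable[[Record], bool]


def evaluate_or(records: List[Record], predicates: List[Predicate], key: str = "id") -> List[Record]:
    # One result set per predicate; a record may appear in several of them.
    result_sets = [[r for r in records if p(r)] for p in predicates]
    merged: List[Record] = []
    seen = set()
    for result_set in result_sets:
        for record in result_set:
            # Keep only distinct records, compared by their unique identifier.
            if record[key] not in seen:
                seen.add(record[key])
                merged.append(record)
    return merged


orders = [
    {"id": 1, "customer": "alice", "amount": 50},
    {"id": 2, "customer": "bob", "amount": 250},
    {"id": 3, "customer": "alice", "amount": 300},
    {"id": 4, "customer": "carol", "amount": 20},
]
predicates = [lambda r: r["customer"] == "alice", lambda r: r["amount"] > 200]

merged = evaluate_or(orders, predicates)
# The same disjunction evaluated in one pass, as a database would for "... OR ...".
single_pass = [r for r in orders if any(p(r) for p in predicates)]
print([r["id"] for r in merged])       # [1, 3, 2]: record 3 matches both, appears once
print([r["id"] for r in single_pass])  # [1, 2, 3]
```

Concatenating and de-duplicating keeps the order in which matches were found, whereas a single-pass evaluation returns them in storage order, so the two approaches agree as sets rather than as sequences.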

Sunday, January 23, 2022
