Cluster computing

Sunday, April 22, 2018

I came across an interesting discussion on how to efficiently search for multiple predicates and support paging.
The resolution is that we almost always have to push predicates down into the database and more so even to the query so that the optimizer has a chance to determine the best query plan for it.
In the absence of a database, we will be emulating the work of the query execution inside the database and are still not likely to be efficient and consistent in all cases simply because we have an enumeration based data structure only in the web-service layer.
On the other hand, the database is closest to the storage, indexes and organizes the records so that they are looked up more efficiently. The query plans can be compared and the most efficient can be chosen. Having an in-memory iteration only data structure will only limit us and will not scale to size of data.
That said, the predicates are expected to evaluate the same way regardless of which layer they are implemented in. If we have a set of predicates and they are separated by or clause as opposed to and clause, then we will likely have a result set from each predicate and they may involve the same records in the results of each predicate. If we filter based on one predicate and we also allow matches based on another predicate, the two result sets may then be merged into one so that the result can then be returned to the caller. The result sets may have duplicates so the merge may have to return only the distinct elements. This can easily be done by comparing the unique identifiers of each record in the result set.
The selection of the result is required prior to determining the section that needs to be returned to the user. This section is determined by the start,offset pair in the enumeration of the results. If the queries remain the same over time, and the request only varies in the paging parameters, then we can even cache the result and return only the paged section. The API will likely persist the predicate, resultsets in cache so that subsequent calls for paging only results the same responses. This can even be done as part of predicate evaluation by simply passing the well known limit and offset parameter directly in the SQL query. In the enumerator we do this with Skip and Take.

#codingexercise : https://1drv.ms/u/s!Ashlm-Nw-wnWti-TEUQKcdNRtPgJ
#sqlexercise : https://goo.gl/Zz1yoV

Cluster computing

Sunday, April 22, 2018

No comments:

Post a Comment