Cluster computing: finding answers quickly

Wednesday, February 20, 2013

finding answers quickly

Many times when we search, we really are interested in the first few results rather than all of the results we get. There is an urgency for the user to get only the first few or the best few answers quickly. Rarely does the user go past the second or third page of the results. If the user does not find what they want, they usually refine the query and resubmit it. This is true not just for searches but also for decision support. Another pattern is when the user may want to avoid writing a complex query and instead write one that gives an approximate answer quickly, then refine the query continually until an exact answer is found.
Top N Queries refers to the first of these kinds of querying. This is very helpful for the users to be able to explicitly indicate how many answers they want, making it possible for the DBMS to optimize execution. As an example, one could request the products that are top 10 by sales.
So here the optimizer would add a predicate that checks for sales to be greater than a cut off value. Histograms can be used to find to reflect the top values. This cut-off value can be refined if the the tuples are less or more than the number specified by the user but the approach is based on pre-computation.
Another approach is to use online aggregation where the answer quality is continually refined during computation progress. First, the grouping column value is determined to be within a range with a confidence level, then the user input priority is used to determine the order of the tuples to work with and finally use non-blocking algorithms for relational operators. For example, the sort-merge join algorithm blocks because sorting requires all input tuples. Instead nested loops join and hash join are therefore preferable to sort-merge join for online aggregation

Cluster computing

Wednesday, February 20, 2013

finding answers quickly

No comments:

Post a Comment