Monday, October 26, 2015

This paper introduces a technique for evaluating the transformation rules that drive query optimization in a relational database server such as Microsoft SQL Server.  It studies the usage of transformation rules by the query optimizer with the help of profiling tools. Profiling in this case is explained as finding the set of rules that are needed for a particular query to obtain an optimal plan.  This is referred to as an essential set of rules for a particular query.  There may be more than one essential set for a query if there are different ways to generate an optimal plan for the query. Consequently there may be some overlap of rules within the query. A rule that is included in every essential set for a query is deemed a relevant transformation rule for that query. The paper empirically studies the metrics for a variety of workloads. Plan and cost for an execution plan depends on what is chosen by the optimizer and are considered functions of the input query. 
While this paper succeeds in articulating metrics that yield fewer transformation rules for the purpose of studying different workloads by the same query optimizer, the paper could have considered an alternate metric such as precision and recall.  Precision in this case is the ratio that explains number of selected items that are relevant. It is the ration of the true positives to all that were selected by the query optimizer for this query. A true positive is one that improves the query execution plan A false positive doesn’t but shows up in the query plan.  Recall on the other hand is a metric that determines how many relevant items are selected. It is the ratio of the true positives to all those that would have improved the query execution plan from the global set of transformation rules including such that the optimizer did not select. Together precision and recall yields a different metric called F-score which gives the effectiveness of retrieval with respect to a given query. 
The advantages of the metrics from the paper are not lost with the advantages of the proposed metric. On the other hand, it comes closer to the evaluation the paper sought out earlier. The paper suggests that there are less than twenty transformation rules that fall under its “relevant” definition The inclusion of such rule in every execution set merely strengthens the criteria that it is “relevant” This equates to identifying the common rules in the “true positives” of every essential set but their inclusion alone is not a guarantee that they are indeed improving the plan by bringing the cost down At the same time the demarcation of true versus false positives is much more cleaner in what the optimizer has selected and is indeed-relevant to the optimization of the plan. The essential set and the combination of true and false positives are similar in nature although it can be argued that without any comparision across optimization plans for the same query, the true and false positives give a better indicator within the same chosen set for an execution plan. Furthermore, while it can be argued that most if not all the transformation rules may be true positives for a highly effective query optimizer, the differentiation with another choice of query plan cannot really be as well quantified by the “relevant” definition of the paper alone as with the precision and recall. Even the differentiation in terms of the false positives is brought to better light with the use of precision and recall. 
 #coding exercise
Check whether a given linked list represents a palindrome.
The challenge here is sequential acess of elements for comparison rather than the indexed base access.
Solution 1
Determine length of linked list in first pass.
Traverse exactly half and then check the latter half to be the reverse of the preceding half.
Linear time complexity.


No comments:

Post a Comment