Wednesday, October 21, 2020

Network engineering continued ...

 This is a continuation of the earlier posts starting with this one: http://ravinote.blogspot.com/2020/09/best-practice-from-networking.html

Sampling: Sometimes it is difficult to estimate cost without actually visiting each and every value via estimation. If it were feasible to analyze and summarize the distribution of values with the help of histograms, then it is easier to make a call. Instead we could use sampling techniques to get an estimation without having to exhaust the scan.

Full Iterations are sometimes the only way to exhaust the search space. In top-down approach, at least an early use of cartesian product for instance can be helpful. This has been acknowledged even in the determination of plan space where the base tables are nested as right- hand inputs only after the cartesian product has been estimated.

Strategies never remain the same if the data and the business change. Consequently, even the longest running strategy is constantly re-evaluated to see if it can still perform as well. This has in fact been demonstrated in commercial database systems with the use of query compilation and recompilation and holds equally true for classifiers and other kinds of analysis.

Since strategy is best described by logic, it is very helpful to export is as a module so that it can run anywhere after being written once. This has been demonstrated by my machine learning packages and data mining algorithms regardless of the domain in which the data exists.  At a low-level, the same applies to strategies within individual components because even if they are not immediately re-used, it will be helpful to have version control on them.

Optimization of an execution does not merely depend on the data and the strategy. It involves hints from the users, environmental factors and parameters.  All of this play a role in driving down the costs and some are easier to tweak than others.

No comments:

Post a Comment