Sunday, December 30, 2018

Today we continue discussing best practices from storage engineering:

250) The load-balancing algorithm can even be adaptive, choosing appropriate metrics to detect well-known traffic patterns. We start with a single number to quantify the load on each partition and each server, and use the product of request latency and request rate to represent that load.
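
A minimal sketch of this idea, assuming a hypothetical PartitionStats record that the partition server already maintains; the load score is simply latency times request rate, and the busiest partition becomes a candidate for splitting or reassignment:

import java.util.*;

class LoadMetric {
    // Hypothetical per-partition statistics collected by the partition server.
    record PartitionStats(String partition, double avgLatencyMs, double requestsPerSec) {
        // Single number quantifying load: request latency x request rate.
        double load() { return avgLatencyMs * requestsPerSec; }
    }

    // Pick the most heavily loaded partition as a candidate for rebalancing.
    static PartitionStats busiest(List<PartitionStats> stats) {
        return stats.stream()
                .max(Comparator.comparingDouble(PartitionStats::load))
                .orElseThrow();
    }
}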

251) Bitmap indexes are useful for columns with a small number of distinct values because they take up less space than a B+ tree, which requires a (value, record-pointer) tuple for each record. Bitmaps are also helpful for conjunctive filters, since the filters can be combined with a bitwise AND.
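
As a small illustration (not any particular product's implementation), a per-value bitmap over row positions lets a conjunctive filter be evaluated with bitwise AND:

import java.util.BitSet;

class BitmapFilterDemo {
    public static void main(String[] args) {
        int rowCount = 8;
        // Bitmap for color = 'red': bit i is set if row i has that value.
        BitSet colorRed = new BitSet(rowCount);
        colorRed.set(0); colorRed.set(3); colorRed.set(5);
        // Bitmap for size = 'L'.
        BitSet sizeLarge = new BitSet(rowCount);
        sizeLarge.set(3); sizeLarge.set(4); sizeLarge.set(5);

        // Conjunctive filter: color = 'red' AND size = 'L' is a bitwise AND.
        BitSet matches = (BitSet) colorRed.clone();
        matches.and(sizeLarge);
        System.out.println("Matching rows: " + matches); // prints {3, 5}
    }
}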

252) B+ trees are helpful for fast insertion, deletion, and update of records. They are generally not as helpful to warehouses as bitmap indexes.

253) Bulk load is a very common case in many storage products, including data warehouses. Bulk loads have to be an order of magnitude faster than individual insertions. Typically they do not incur the same overhead for every record; instead they pay the overhead up front, before the batch or stream enters the storage.
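
A sketch of the batching idea, with a hypothetical sink interface; the point is that per-batch overhead (locks, index maintenance, fsync) is paid once per flush rather than once per record:

import java.util.*;

class BulkLoader {
    // Hypothetical sink that charges a fixed overhead per flush, not per record.
    interface BatchSink { void writeBatch(List<String> records); }

    private final List<String> buffer = new ArrayList<>();
    private final BatchSink sink;
    private final int batchSize;

    BulkLoader(BatchSink sink, int batchSize) { this.sink = sink; this.batchSize = batchSize; }

    void add(String record) {
        buffer.add(record);
        if (buffer.size() >= batchSize) flush();
    }

    void flush() {
        if (!buffer.isEmpty()) {
            sink.writeBatch(new ArrayList<>(buffer)); // overhead paid once per batch
            buffer.clear();
        }
    }
}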

254) Bulk loads may not be as prevalent when the storage product is already real-time. The trouble with real-time products is that read-write traffic is not separated from read-only traffic, so they may contend for mutual exclusion. Moreover, sets of queries may not see compatible answers.

255) Update-in-place and historical queries pose real-time challenges. If the updated values are maintained in chronological order, then queries may simply respond with values from the recent past. A collection of queries whose answers come from the same point in time is said to be compatible.
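
One way to picture compatible answers, as a minimal sketch assuming a simple versioned key-value map: every query in a set reads as of the same timestamp, so all of them see values from the same point in time.

import java.util.*;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;

class VersionedStore {
    // For each key, keep values ordered by the time they were written.
    private final Map<String, NavigableMap<Long, String>> data = new ConcurrentHashMap<>();

    void put(String key, long timestamp, String value) {
        data.computeIfAbsent(key, k -> new ConcurrentSkipListMap<>()).put(timestamp, value);
    }

    // Read the latest value at or before asOf; queries sharing the same asOf are compatible.
    String getAsOf(String key, long asOf) {
        NavigableMap<Long, String> versions = data.get(key);
        if (versions == null) return null;
        Map.Entry<Long, String> e = versions.floorEntry(asOf);
        return e == null ? null : e.getValue();
    }
}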

A use case for visibility of storage products: https://1drv.ms/w/s!Ashlm-Nw-wnWuDSAzBSGbG3Wy6aG 

Saturday, December 29, 2018


Today we continue discussing best practices from storage engineering:

245) The process-pool-per-disk-worker model alleviates the need to fork and tear down processes, and every process in the pool is capable of executing any of the read-writes from any of the clients. The pool size is generally finite if not fixed. This model has all of the advantages of the process-per-disk-worker model above, with the added possibility of differentiated processes in the pool and quotas for them.
246) When compute and storage are consolidated, they have to be treated as commodities and scalability is achieved only through scale-out. On the other hand, compute and storage are inherently different. Therefore, nodes dedicated to computation may be separated from nodes dedicated to storage. This lets both scale and load-balance independently.
247) Range-based partitioning/indexing is much more beneficial for sequential access, such as with streams, because the locality of a set of ranges makes enumeration easier and faster. This helps with performance. Hash-based indexing is better when we have to fan out processing to separate partitions, since keys that hash to the same value fall in the same bucket. This helps with load balancing. A sketch of both schemes appears after item 250 below.
248) Throttling or isolation is very useful when accounts are not well behaved. Statistics are collected by the partition server, which keeps track of request rates per account and per partition. The same request rates may also be used for load balancing. A throttling sketch also appears after item 250 below.
249) Automatic load balancing can now be built on the range-based partitioning approach and the account-based throttling. This improves multi-tenancy in the environment as well as the handling of peaks in traffic patterns.
250) The load-balancing algorithm can even be adaptive, choosing appropriate metrics to detect well-known traffic patterns. We start with a single number to quantify the load on each partition and each server, and use the product of request latency and request rate to represent that load.
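
As promised in item 247, here is a minimal sketch contrasting range-based and hash-based assignment of keys to partitions; the split points and partition count are illustrative assumptions only:

import java.util.*;

class Partitioners {
    // Range-based: keys are assigned by comparing against sorted split points,
    // so neighbouring keys land together and range scans stay local.
    static int rangePartition(String key, List<String> upperBounds) {
        for (int i = 0; i < upperBounds.size(); i++) {
            if (key.compareTo(upperBounds.get(i)) <= 0) return i;
        }
        return upperBounds.size(); // last partition holds everything above the highest bound
    }

    // Hash-based: keys are spread uniformly, and keys that hash to the same bucket
    // always land on the same partition, which helps balance load.
    static int hashPartition(String key, int partitionCount) {
        return Math.floorMod(key.hashCode(), partitionCount);
    }
}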
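
And the throttling sketch referenced in item 248, assuming a hypothetical per-account counter kept by the partition server; requests beyond the configured rate in the current one-second window are rejected:

import java.util.HashMap;
import java.util.Map;

class AccountThrottler {
    private final long maxRequestsPerSecond;
    private final Map<String, Long> counts = new HashMap<>();
    private long windowStart = System.currentTimeMillis();

    AccountThrottler(long maxRequestsPerSecond) { this.maxRequestsPerSecond = maxRequestsPerSecond; }

    // Returns true if the account is within its quota for the current one-second window.
    synchronized boolean tryAdmit(String account) {
        long now = System.currentTimeMillis();
        if (now - windowStart >= 1000) {   // start a new counting window
            counts.clear();
            windowStart = now;
        }
        long seen = counts.merge(account, 1L, Long::sum);
        return seen <= maxRequestsPerSecond;
    }
}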

Friday, December 28, 2018

Today we continue discussing best practices from storage engineering:

240) Sparse data storage involves setting a large number of attributes to null. For a table, this means many of the columns will be empty. The solution to the resulting waste of disk space is to organize the data by columns rather than by rows. This column-oriented storage is very popular for massive data, as with Google’s BigTable, the TaggedColumns used by Microsoft Active Directory, and the Resource Description Framework for the Semantic Web.
241) Flash memory is viable and supported in a broad market. It provides a notable cost/performance trade-off relative to disk and RAM. Yet disks are not going away anytime soon. They may even show significant catch-up by becoming more intelligent about power management and time-based scheduling of I/O.
242) Clusters treat nodes and disks as commodities, making no differentiation as capacity is improved or nodes are added. They are tolerant of nodes going down and view the disk array as network-attached storage. If they could improve resource management with storage classes, where groups of disks are treated differently with respect to power management and I/O scheduling, it would provide tremendous quality-of-service levels to workloads.
243) While there can be coordination between the controller nodes and data nodes in a cluster, an individual disk or a group of disks in a node does not have a dedicated disk worker to schedule its I/O, since storage has always progressed toward higher and higher disk capacity. When disks become so much cheaper that expanding them through numerous additions and earmarking them for specific purposes is possible, the dispatcher-and-execution-worker model can even be re-evaluated.
244) The process-per-disk-worker model is still in use today. It was used by early DBMS implementations. I/O scheduling manages the time-sharing of the disk workers, and the operating system offers protection. This model has been helpful to debuggers and memory checkers.
245) The process-pool-per-disk-worker model alleviates the need to fork and tear down processes, and every process in the pool is capable of executing any of the read-writes from any of the clients. The pool size is generally finite if not fixed. This model has all of the advantages of the process-per-disk-worker model above, with the added possibility of differentiated processes in the pool and quotas for them.
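
A rough sketch of the pool idea, using threads in place of processes; the request shape and handler are assumptions made up for illustration. A fixed pool where any worker can serve any client's read or write:

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class DiskWorkerPool {
    // Finite pool: any worker can execute any client's read or write request.
    private final ExecutorService pool;

    DiskWorkerPool(int poolSize) { this.pool = Executors.newFixedThreadPool(poolSize); }

    // Hypothetical request: a client id and the I/O work to perform.
    // The clientId could later drive differentiated quotas per item 245.
    Future<byte[]> submit(String clientId, Callable<byte[]> ioWork) {
        return pool.submit(ioWork);   // no fork/teardown per request
    }

    void shutdown() { pool.shutdown(); }
}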

Thursday, December 27, 2018

Today we continue discussing best practices from storage engineering:

233) Strategies never remain the same if the data and the business change. Consequently, even the longest-running strategy is constantly re-evaluated to see whether it can still perform as well. This has in fact been demonstrated in commercial database systems with query compilation and recompilation, and it holds equally true for classifiers and other kinds of analysis.

234) Since a strategy is best described by logic, it is very helpful to export it as a module so that it can run anywhere after being written once. This has been demonstrated by many machine learning packages and data mining algorithms, regardless of the domain in which the data exists. At a low level, the same applies to strategies within individual components, because even if they are not immediately re-used, it is helpful to have version control over them.

235) Optimization of an execution does not depend merely on the data and the strategy. It also involves hints from users, environmental factors, and parameters. All of these play a role in driving down costs, and some are easier to tweak than others.

236) Builtins from storage products are helpful to customers because they are already available and tested. However, the use of builtins may not always be deterministic and free of side effects. Care must be taken to document how the behavior might change.

237) Customers might not see how a speed-up may occur in complex systems. While auto-tuning may cover some scenarios, the bulk of scenarios at the user level may be covered with automation.

238) Parametrization helps queries because the plan remains the same. Even if the storage product merely implements an API for listing resources, it could parametrize the filters to apply, and the same may be done for all other similar APIs.
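
A small sketch of what parametrizing a listing API's filters might look like; the resource shape and predicate-based filter are assumptions, not any particular product's interface:

import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class ResourceListing {
    record Resource(String name, String owner, long sizeBytes) {}

    // The listing logic stays the same; only the filter parameters change,
    // much like a prepared statement re-using one plan with different values.
    static List<Resource> list(List<Resource> all, Predicate<Resource> filter) {
        return all.stream().filter(filter).collect(Collectors.toList());
    }
}

// Usage: list(resources, r -> r.owner().equals("alice") && r.sizeBytes() > 1024)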

#codingexercise
int getSortedTail(String a) {
    // Returns the length of the longest non-decreasing suffix of a.
    int len = a.length();
    for (int i = len - 1; i >= 0; i--) {
        for (int j = i + 1; j < len; j++) {
            if (a.charAt(i) > a.charAt(j)) {
                // a[i] breaks the order, so the sorted tail starts at i + 1.
                return len - i - 1;
            }
        }
    }
    return len; // the whole string is already sorted
}

Wednesday, December 26, 2018

Today we continue discussing best practices from storage engineering:

230) The results of query execution and materialized views are equally helpful when cached and persisted separately from the actual data. This reduces the load on the product and makes results available to queries sooner.
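
A minimal result-cache sketch, keyed by the query text; the eviction policy is a simple LRU, the capacity is an arbitrary assumption, and invalidation when the underlying data changes is left out for brevity:

import java.util.LinkedHashMap;
import java.util.Map;

class QueryResultCache<V> {
    private final int capacity;
    private final Map<String, V> cache;

    QueryResultCache(int capacity) {
        this.capacity = capacity;
        // Access-ordered LinkedHashMap gives simple LRU eviction.
        this.cache = new LinkedHashMap<String, V>(16, 0.75f, true) {
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                return size() > QueryResultCache.this.capacity;
            }
        };
    }

    synchronized V get(String queryText) { return cache.get(queryText); }
    synchronized void put(String queryText, V result) { cache.put(queryText, result); }
}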

231) Sampling: Sometimes it is difficult to estimate cost without actually visiting each and every value. If it were feasible to analyze and summarize the distribution of values with the help of histograms, it would be easier to make a call. Instead, we can use sampling techniques to get an estimate without having to exhaust the scan.
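
A standard reservoir-sampling sketch: it keeps a fixed-size uniform sample of the values seen so far, so distribution statistics can be estimated without exhausting the scan.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class ReservoirSampler<T> {
    private final int sampleSize;
    private final List<T> reservoir = new ArrayList<>();
    private final Random random = new Random();
    private long seen = 0;

    ReservoirSampler(int sampleSize) { this.sampleSize = sampleSize; }

    // Each value seen so far stays in the reservoir with probability sampleSize / seen.
    void offer(T value) {
        seen++;
        if (reservoir.size() < sampleSize) {
            reservoir.add(value);
        } else {
            long slot = (long) (random.nextDouble() * seen);
            if (slot < sampleSize) reservoir.set((int) slot, value);
        }
    }

    List<T> sample() { return reservoir; }
}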

232) Full iterations are sometimes the only way to exhaust the search space. In a top-down approach, an early use of the Cartesian product, for instance, can be helpful. This has been acknowledged even in the determination of plan space, where the base tables are nested as right-hand inputs only after the Cartesian product has been estimated.

233) Strategies never remain the same if the data and the business change. Consequently, even the longest-running strategy is constantly re-evaluated to see whether it can still perform as well. This has in fact been demonstrated in commercial database systems with query compilation and recompilation, and it holds equally true for classifiers and other kinds of analysis.

234) Since a strategy is best described by logic, it is very helpful to export it as a module so that it can run anywhere after being written once. This has been demonstrated by many machine learning packages and data mining algorithms, regardless of the domain in which the data exists. At a low level, the same applies to strategies within individual components, because even if they are not immediately re-used, it is helpful to have version control over them.

235) Optimization of an execution does not depend merely on the data and the strategy. It also involves hints from users, environmental factors, and parameters. All of these play a role in driving down costs, and some are easier to tweak than others.

Tuesday, December 25, 2018

Today we continue discussing best practices from storage engineering:


225) A shared-nothing system must mitigate partial failures, the term for the condition when one or more of the participating nodes goes down. In such cases the mitigation may be one of the following: 1) bring down all of the nodes when any one fails, which is equivalent to a shared-memory system, 2) use “data skipping”, where queries are allowed to execute on any node that is up and the data on the failed node is skipped, and 3) use as much redundancy as necessary to allow queries access to all the data regardless of any unavailability.

226) Search algorithms over data in storage tend to be top-down. Top-down search implies lower cost because it can prune the query plan to just what is relevant. However, top-down search can exhaust memory. It is often helpful if additional hints can be taken from the user and the storage system is capable of using those hints.

227) A single query is usually run synchronously and serially. However, if the user does not see a difference and there are ways to gain improvement by parallelizing the workers, then it is always better to do so. The caveat is that this takes two stages: first the work estimation and then the workload distribution.

228) Any storage system can be made to perform better with the help of “auto-tuning”. In this method, the same workload is studied against different “what-if” plans, and the most beneficial outcome is chosen.
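
A toy sketch of the what-if idea: candidate plans are costed against the same workload and the cheapest is chosen. The plan and cost-model types here are assumptions made up for illustration.

import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleFunction;

class AutoTuner {
    // Hypothetical candidate plan, e.g. "add index on column X" or "repartition by key Y".
    record WhatIfPlan(String description) {}

    // costModel estimates the workload's cost under a plan without actually applying it.
    static WhatIfPlan choose(List<WhatIfPlan> candidates, ToDoubleFunction<WhatIfPlan> costModel) {
        return candidates.stream()
                .min(Comparator.comparingDouble(costModel))
                .orElseThrow();
    }
}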

229) Storage queries that are repeated often are useful to cache, because chances are the data has not changed enough to alter the plan that best suits the execution of the query. While the technique has been very popular with relational databases, it actually holds true for many forms of queries and storage products.

230) The results of query execution and materialized views are equally helpful when cached and persisted separately from the actual data. This reduces the load on the product and makes results available to queries sooner.


Monday, December 24, 2018

Today we continue discussing best practices from storage engineering:


221) Sometimes it is helpful to stage decisions across multiple tiers. For example, with admission control, the tier that handles connections and dispatches processes may choose to keep the number of client connections below a threshold. At the same time, the inner system layer might determine whether execution is postponed, begins with fewer resources, or begins without restraint.
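
A rough two-tier admission-control sketch matching this description; the connection limit and execution-slot count are made-up parameters, and threads stand in for whatever execution units the product uses:

import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

class AdmissionControl {
    private final int maxConnections;
    private final AtomicInteger connections = new AtomicInteger();
    private final Semaphore executionSlots;   // inner tier: limits concurrent executions

    AdmissionControl(int maxConnections, int executionSlots) {
        this.maxConnections = maxConnections;
        this.executionSlots = new Semaphore(executionSlots);
    }

    // Outer tier: keep the number of client connections below a threshold.
    boolean admitConnection() {
        if (connections.incrementAndGet() > maxConnections) {
            connections.decrementAndGet();
            return false;
        }
        return true;
    }

    // Inner tier: postpone execution (block) until an execution slot is free.
    void beginExecution() throws InterruptedException { executionSlots.acquire(); }
    void endExecution() { executionSlots.release(); }
    void closeConnection() { connections.decrementAndGet(); }
}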

222) The decision on resources can come from the cost involved in the query plan. These costs might include the disk devices that the query will access, the number of random and sequential I/Os per device, estimates of the query’s CPU load, the number of key-values to process, and the memory footprint of the query’s data structures.
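
A simple sketch of such a cost estimate; the weights below are arbitrary placeholders, not calibrated constants from any real optimizer:

class QueryCostEstimate {
    int devicesAccessed;
    long randomIOsPerDevice;
    long sequentialIOsPerDevice;
    double cpuSeconds;
    long keyValuesProcessed;
    long memoryFootprintBytes;

    // Combine the factors into one comparable number; the weights are illustrative only.
    double totalCost() {
        double ioCost = devicesAccessed * (randomIOsPerDevice * 10.0 + sequentialIOsPerDevice * 1.0);
        double cpuCost = cpuSeconds * 100.0 + keyValuesProcessed * 0.001;
        double memoryCost = memoryFootprintBytes / (1024.0 * 1024.0);
        return ioCost + cpuCost + memoryCost;
    }
}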

223) With a shared-nothing architecture, there is no sharing at the hardware resource level. In such cases, multiple instances of the storage product may be installed, or a cluster-mode deployment may be used. Each system in the cluster stores only a portion of the data, and requests for other members’ data are sent to them. This facilitates horizontal partitioning of the data.

224) When the data is partitioned by placing different collections on the participating nodes, rather than different ranges of the same collection, it is referred to as vertical partitioning. There are use cases for this where the data falls into groups and a group might not require partitioning.

225) A shared-nothing system must mitigate partial failures, the term for the condition when one or more of the participating nodes goes down. In such cases the mitigation may be one of the following: 1) bring down all of the nodes when any one fails, which is equivalent to a shared-memory system, 2) use “data skipping”, where queries are allowed to execute on any node that is up and the data on the failed node is skipped, and 3) use as much redundancy as necessary to allow queries access to all the data regardless of any unavailability.