Sunday, December 30, 2018

Today we continue discussing best practices from storage engineering:

250) The load-balancing algorithm can even be adaptive, choosing appropriate metrics to identify well-known traffic patterns. We start with a single number that quantifies the load on each partition and each server, and we use the product of request latency and request rate to represent that load.
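As a minimal sketch of the metric in 250, the snippet below computes load as latency times request rate per partition. The names `PartitionStats`, `partition_load`, `server_loads` and `hottest_server` are hypothetical, and summing partition loads to get a server's load is an assumption made for illustration, not something the point above prescribes.

```python
from dataclasses import dataclass


@dataclass
class PartitionStats:
    name: str
    server: str
    avg_latency_ms: float    # mean request latency over the sampling window
    requests_per_sec: float  # request rate over the same window


def partition_load(p: PartitionStats) -> float:
    """Single-number load: request latency multiplied by request rate."""
    return p.avg_latency_ms * p.requests_per_sec


def server_loads(partitions):
    """Assumption: a server's load is the sum of its partitions' loads."""
    loads = {}
    for p in partitions:
        loads[p.server] = loads.get(p.server, 0.0) + partition_load(p)
    return loads


def hottest_server(partitions):
    """Return the server with the highest load, a candidate for rebalancing."""
    loads = server_loads(partitions)
    return max(loads, key=loads.get)


# Made-up sample numbers, only to exercise the functions.
stats = [
    PartitionStats("p1", "s1", 20.0, 100.0),
    PartitionStats("p2", "s1", 5.0, 50.0),
    PartitionStats("p3", "s2", 10.0, 30.0),
]
```

An adaptive balancer could recompute these numbers for each sampling window and move partitions away from whatever `hottest_server(stats)` reports.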

251) Bitmap indexes are useful for columns with a small number of distinct values because they take up less space than a B+ tree, which requires a value and record-pointer tuple for each record. Bitmaps are also helpful for conjunctive filters, since the bitmaps of the individual predicates can be combined with a bitwise AND.
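The conjunctive-filter case in 251 can be sketched with plain Python integers acting as bitmaps. This is an illustration under simple assumptions (rows are identified by their position, bitmaps are uncompressed); `build_bitmap_index` and `matching_row_ids` are hypothetical helper names, not part of any product.

```python
def build_bitmap_index(rows, column):
    """Map each distinct value of `column` to an int whose bit i is set
    when row i holds that value."""
    index = {}
    for row_id, row in enumerate(rows):
        value = row[column]
        index[value] = index.get(value, 0) | (1 << row_id)
    return index


def matching_row_ids(bitmap):
    """Expand a bitmap back into the list of row positions it marks."""
    ids = []
    row_id = 0
    while bitmap:
        if bitmap & 1:
            ids.append(row_id)
        bitmap >>= 1
        row_id += 1
    return ids


rows = [
    {"region": "east", "status": "open"},
    {"region": "west", "status": "open"},
    {"region": "east", "status": "closed"},
    {"region": "east", "status": "open"},
]

region_idx = build_bitmap_index(rows, "region")
status_idx = build_bitmap_index(rows, "status")

# WHERE region = 'east' AND status = 'open' becomes one bitwise AND.
both = region_idx["east"] & status_idx["open"]
result = matching_row_ids(both)  # rows 0 and 3
```

Each distinct value costs one bit per row here, which is why the approach pays off only when the column has few distinct values.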

252) B+ trees are helpful for fast insertion, deletion and update of records. They are generally not as helpful to warehouses as bitmaps.

253) Bulk load is a very common case in many storage products, including data warehouses. It has to be an order of magnitude faster than individual insertions. Typically, a bulk load does not incur the same overhead for every record; it pays that overhead once, upfront, before the batch or stream goes into storage.
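To make the "overhead once, upfront" idea in 253 concrete, here is a small sketch that uses SQLite (through Python's standard `sqlite3` module) as a stand-in for a storage product. The `events` table and the function names are hypothetical. One path commits after every record; the other wraps the whole batch in a single transaction. Any actual speedup depends on the engine and the storage medium, so no timing is claimed here.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical events table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
    return conn


def insert_one_by_one(conn, records):
    """Individual insertions: transaction overhead is paid for every record."""
    for rec in records:
        conn.execute("INSERT INTO events (id, payload) VALUES (?, ?)", rec)
        conn.commit()


def bulk_load(conn, records):
    """Bulk load: one transaction for the whole batch, committed once."""
    with conn:  # commits on success, rolls back on an exception
        conn.executemany(
            "INSERT INTO events (id, payload) VALUES (?, ?)", records
        )


records = [(i, f"payload-{i}") for i in range(1000)]

slow = make_db()
insert_one_by_one(slow, records)

fast = make_db()
bulk_load(fast, records)

# Both paths end with the same data; only the per-record overhead differs.
slow_count = slow.execute("SELECT COUNT(*) FROM events").fetchone()[0]
fast_count = fast.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

On a disk-backed database the per-commit cost is usually much larger than in memory, which is where the difference between the two paths tends to show.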

254) Bulk loads may not be as prevalent when the storage product is already real-time. The only trouble with real-time products is that read-write traffic is not separated from read-only traffic, so the two may contend for mutual exclusion. Moreover, sets of queries may not see compatible answers.

255) Update-in-place and historical queries pose real-time challenges. If the updated values are maintained in chronological order, then queries may simply respond with values from the recent past. A collection of queries whose answers all come from the same point in time is compatible.
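One way to picture 255 is a store that keeps every update with its timestamp and answers reads "as of" a chosen time. The sketch below is a simplified illustration, not any particular product's design; `VersionedStore`, `put` and `get_as_of` are hypothetical names, and timestamps are plain integers.

```python
import bisect


class VersionedStore:
    """Keeps each key's values in chronological order instead of
    overwriting them in place."""

    def __init__(self):
        # key -> (sorted list of timestamps, list of values in the same order)
        self._versions = {}

    def put(self, key, timestamp, value):
        times, values = self._versions.setdefault(key, ([], []))
        pos = bisect.bisect_right(times, timestamp)
        times.insert(pos, timestamp)
        values.insert(pos, value)

    def get_as_of(self, key, timestamp):
        """Return the latest value written at or before `timestamp`,
        or None if the key had no value yet."""
        times, values = self._versions.get(key, ([], []))
        pos = bisect.bisect_right(times, timestamp)
        return values[pos - 1] if pos else None


store = VersionedStore()
store.put("balance_a", 1, 100)
store.put("balance_b", 1, 50)
store.put("balance_a", 5, 70)  # 30 leaves account a at time 5 ...
store.put("balance_b", 6, 80)  # ... and reaches account b at time 6

# Two queries answered from the same point in time give compatible answers.
total_at_4 = store.get_as_of("balance_a", 4) + store.get_as_of("balance_b", 4)

# Reading both at time 5 catches the transfer half-applied,
# which is why the chosen point should be in the settled recent past.
total_at_5 = store.get_as_of("balance_a", 5) + store.get_as_of("balance_b", 5)
```

Reading both keys at time 4 yields a total of 150, the same total the completed transfer preserves, while reading at time 5 yields 120 because only one side of the update is visible.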

A use case for visibility of storage products: https://1drv.ms/w/s!Ashlm-Nw-wnWuDSAzBSGbG3Wy6aG 
