Friday, December 28, 2018

Today we continue discussing best practices from storage engineering:

240) Sparse data storage involves a large number of attributes being null. For a table, this means many of the columns are empty in most rows. The solution to overcoming the tremendous waste of disk space is to re-organize the data in terms of columns rather than rows. This column-oriented storage is very popular for massive data, such as Google’s BigTable, the TaggedColumns used by Microsoft Active Directory, and the Resource Description Framework for the Semantic Web.
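The difference is easiest to see in a small sketch. The following is a minimal, illustrative example (not BigTable's actual layout) contrasting row-oriented and column-oriented storage of a sparse table; the names ROWS and column_store are assumptions for the example only.

```python
# Illustrative sketch: row store keeps a slot per column even when null;
# column store keeps only (row id, value) pairs that are actually populated.

ROWS = [
    {"id": 1, "name": "alice", "phone": None,       "fax": None, "pager": None},
    {"id": 2, "name": "bob",   "phone": "555-0100", "fax": None, "pager": None},
    {"id": 3, "name": "carol", "phone": None,       "fax": None, "pager": "555-0199"},
]

# Row-oriented: every row carries every column, nulls included.
row_store = ROWS

# Column-oriented: empty attributes cost nothing.
column_store = {}
for row in ROWS:
    for col, value in row.items():
        if col != "id" and value is not None:
            column_store.setdefault(col, {})[row["id"]] = value

print(column_store)
# {'name': {1: 'alice', 2: 'bob', 3: 'carol'},
#  'phone': {2: '555-0100'},
#  'pager': {3: '555-0199'}}
```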
241) Flash memory is viable and supported in a broad market. It offers a notable cost/performance trade-off relative to disk and RAM. Yet disks are not going away anytime soon; they may even catch up significantly in terms of becoming intelligent about power management and time-based scheduling of I/O.
242) Clusters treat nodes and disks as commodity, making no differentiation as capacity is improved or nodes are added. They are tolerant of nodes going down and view the disk array as network-attached storage. If they could improve resource management with storage classes, where groups of disks are treated differently based on power management and I/O scheduling, they would provide far better quality-of-service levels to workloads.
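One way to picture such storage classes is below: a hypothetical sketch in which groups of disks carry their own power-management and I/O-scheduling policy, and workloads are routed to a class by their quality-of-service needs. The class names, policy fields, and place() function are assumptions, not an existing API.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class StorageClass:
    name: str
    io_scheduler: str          # e.g. latency-oriented vs throughput-oriented
    spin_down_idle_secs: int   # power management: idle time before spin-down
    disks: List[str] = field(default_factory=list)

CLASSES = {
    "hot":  StorageClass("hot",  io_scheduler="deadline",   spin_down_idle_secs=0,
                         disks=["sda", "sdb"]),
    "cold": StorageClass("cold", io_scheduler="throughput", spin_down_idle_secs=600,
                         disks=["sdc", "sdd", "sde"]),
}

def place(latency_sensitive: bool) -> StorageClass:
    """Route latency-sensitive workloads to 'hot' disks, bulk work to 'cold' disks."""
    return CLASSES["hot"] if latency_sensitive else CLASSES["cold"]

print(place(True).name, place(False).name)   # hot cold
```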
243) While there can be co-ordination between the controller nodes and data nodes in a cluster, an individual disk or a group of disks within a node does not have a dedicated disk worker to schedule its I/O, since storage has always progressed toward higher and higher disk capacity rather than finer-grained management. When disks become so much cheaper that they can be added in large numbers and earmarked for specific purposes, the dispatcher and execution worker model can be re-evaluated, as sketched below.
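The following is a minimal sketch of that dispatcher/worker idea: one dispatcher routes I/O requests to a per-disk queue, and a dedicated worker drains each queue, so scheduling decisions can be made per disk. The names (DISKS, handle_io, dispatch) are illustrative assumptions.

```python
import queue
import threading

DISKS = ["disk0", "disk1"]
per_disk_queues = {d: queue.Queue() for d in DISKS}

def handle_io(disk, request):
    # Placeholder for the actual read/write against the device.
    print(f"{disk}: servicing {request}")

def disk_worker(disk):
    while True:
        request = per_disk_queues[disk].get()
        if request is None:          # sentinel: shut this worker down
            break
        handle_io(disk, request)

def dispatch(request, target_disk):
    per_disk_queues[target_disk].put(request)

workers = [threading.Thread(target=disk_worker, args=(d,)) for d in DISKS]
for w in workers:
    w.start()

dispatch("read block 42", "disk0")
dispatch("write block 7", "disk1")

for d in DISKS:                      # stop the workers
    per_disk_queues[d].put(None)
for w in workers:
    w.join()
```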
244) The process-per-disk-worker model is still in use today; it was used by early DBMS implementations. I/O scheduling manages the time-sharing of the disk workers, and the operating system offers protection between them. This model has been helpful to debuggers and memory checkers.
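A bare-bones sketch of that model follows: each request gets its own operating-system process, which the OS isolates and time-shares. The serve() function is an assumption standing in for real request handling.

```python
import os
from multiprocessing import Process

def serve(request_id):
    # Each request runs in its own address space (visible via the pid),
    # which is what makes this model friendly to debuggers and memory checkers.
    print(f"request {request_id} handled by pid {os.getpid()}")

if __name__ == "__main__":
    procs = [Process(target=serve, args=(i,)) for i in range(4)]
    for p in procs:
        p.start()        # spawn a fresh process per request
    for p in procs:
        p.join()         # tear it down when the request completes
```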
245) The process-pool-per-disk-worker model removes the need to fork and tear down a process per request, and every process in the pool is capable of executing any of the reads and writes from any of the clients. The pool size is generally finite, if not fixed. This retains all of the advantages of the process-per-disk-worker model above, with the added possibility of differentiated processes in the pool and quotas for them.
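A corresponding sketch of the pool variant: a fixed-size pool is created up front, any worker can serve any client's request, and no per-request fork and teardown is needed. The pool size and the serve() body are illustrative assumptions.

```python
import os
from multiprocessing import Pool

def serve(request_id):
    return f"request {request_id} handled by pid {os.getpid()}"

if __name__ == "__main__":
    with Pool(processes=3) as pool:               # finite, reusable pool
        for result in pool.map(serve, range(9)):  # pids repeat across requests
            print(result)
```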
