Thursday, October 22, 2020

Network engineering continued ...

  This is a continuation of the earlier posts starting with this one: http://ravinote.blogspot.com/2020/09/best-practice-from-networking.html

  1. The process-per-disk-worker model, used by early DBMS implementations, is still in use today. I/O scheduling manages the time-sharing of the disk workers, and the operating system provides protection between the worker processes. This model also works well with debuggers and memory checkers.
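
  A minimal sketch of this model in Python, assuming a request is just an (operation, block) tuple; names such as disk_worker and NUM_DISKS are illustrative rather than taken from any particular DBMS:

# One OS process per disk worker; the OS isolates each worker's address space.
from multiprocessing import Process, Queue

NUM_DISKS = 4

def disk_worker(disk_id, requests):
    # Each worker owns one disk and serves requests from its own queue.
    while True:
        req = requests.get()
        if req is None:          # sentinel: shut this worker down
            break
        # A real worker would issue the read/write against its disk here.
        print("disk", disk_id, "handling", req)

if __name__ == "__main__":
    queues = [Queue() for _ in range(NUM_DISKS)]
    workers = [Process(target=disk_worker, args=(i, q))
               for i, q in enumerate(queues)]
    for w in workers:
        w.start()                # fork one process per disk
    queues[0].put(("read", "block-42"))
    for q in queues:
        q.put(None)
    for w in workers:
        w.join()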


  2. The process-pool-per-disk-worker model removes the need to fork and tear down a process for every request; every process in the pool can execute any of the reads and writes from any of the clients. The pool size is generally finite, if not fixed. This model keeps all of the advantages of the process-per-disk-worker model above, with the added possibility of differentiating the processes in the pool and assigning them quotas.
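
  A minimal sketch of the pool variant, using a fixed-size process pool in which any worker serves any client's read or write; handle_io and the request tuples are illustrative:

# A finite pool of reusable workers; no fork/teardown per request.
from multiprocessing import Pool

POOL_SIZE = 4

def handle_io(request):
    op, key = request
    # A real worker would perform the read or write here.
    return op + " " + key + " done"

if __name__ == "__main__":
    requests = [("read", "k1"), ("write", "k2"), ("read", "k3")]
    with Pool(processes=POOL_SIZE) as pool:
        for result in pool.map(handle_io, requests):
            print(result)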


  3. When compute and storage are consolidated, they have to be treated as commodities, and scalability is achieved only by scaling out. Yet the two are inherently different, so nodes dedicated to computation may be separated from nodes dedicated to storage. This lets each tier scale and load balance independently.
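
  A small sketch of the independent scale-out this separation allows; the per-node capacities and the loads are made-up illustrative numbers:

# Compute and storage tiers scale out on their own metrics.
import math

COMPUTE_QPS_PER_NODE = 5000      # assumed per-node query capacity
STORAGE_TB_PER_NODE = 10         # assumed per-node storage capacity

def nodes_needed(load, per_node_capacity):
    return math.ceil(load / per_node_capacity)

if __name__ == "__main__":
    query_load_qps = 42000
    data_size_tb = 250
    print("compute nodes:", nodes_needed(query_load_qps, COMPUTE_QPS_PER_NODE))
    print("storage nodes:", nodes_needed(data_size_tb, STORAGE_TB_PER_NODE))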


  4. Range-based partitioning/indexing is much more beneficial for sequential access, such as with a stream, because the locality of a set of ranges makes enumeration easier and faster. This helps with performance. Hash-based partitioning is better when we have to fan the processing out across partitions, because the hash spreads keys evenly over the buckets. This helps with load balancing.
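
  A small sketch contrasting the two schemes; the range boundaries and the partition count are illustrative assumptions:

# Range partitioning keeps neighboring keys together (good for scans);
# hash partitioning spreads keys evenly (good for load balancing).
from bisect import bisect_right
from hashlib import blake2b

RANGE_BOUNDS = ["g", "n", "t"]        # partitions: [..g), [g..n), [n..t), [t..]
NUM_HASH_PARTITIONS = 4

def range_partition(key):
    return bisect_right(RANGE_BOUNDS, key)

def hash_partition(key):
    digest = blake2b(key.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % NUM_HASH_PARTITIONS

if __name__ == "__main__":
    for key in ["apple", "apricot", "zebra"]:
        print(key, "range:", range_partition(key), "hash:", hash_partition(key))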


  5. Throttling, or isolation, is very useful when accounts are not well behaved. The partition server collects the statistics, keeping track of request rates per account and per partition. The same request rates can also be used for load balancing.
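
  A minimal sketch of per-account throttling driven by the observed request rate, in the spirit of the partition-server statistics above; the window size and limit are illustrative assumptions:

# Sliding-window request counting per account; the same counts can feed
# load-balancing decisions.
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 1.0
MAX_REQUESTS_PER_WINDOW = 100

class AccountThrottler:
    def __init__(self):
        self._events = defaultdict(deque)   # account -> recent request times

    def allow(self, account):
        now = time.monotonic()
        events = self._events[account]
        while events and now - events[0] > WINDOW_SECONDS:
            events.popleft()                # drop timestamps outside the window
        if len(events) >= MAX_REQUESTS_PER_WINDOW:
            return False                    # misbehaving account is throttled
        events.append(now)
        return True

    def request_rate(self, account):
        return len(self._events[account])   # same statistic usable for load balancing

if __name__ == "__main__":
    throttler = AccountThrottler()
    for _ in range(105):
        throttler.allow("tenant-a")
    print("tenant-a requests in window:", throttler.request_rate("tenant-a"))
    print("next request allowed:", throttler.allow("tenant-a"))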
