Friday, March 3, 2017

We continue our detailed study of the Microsoft Azure stack, as inferred from Microsoft's introduction to Azure. We reviewed some more features of Azure storage. We saw how erasure coding works in Windows Azure Storage (WAS) and how it differs from Pelican. We saw what steps WAS takes to improve durability. We also saw the benefits of journaling. We started reviewing the partition layer, which translates objects such as blobs, tables and queues to storage. It maintains a massively scalable namespace for the objects and performs load balancing across the available partition servers using a data structure called the object table. The partition layer consists of the partition manager (PM), the partition servers and the lock service. The partition manager assigns RangePartitions from the object table to individual partition servers and stores this assignment in the Partition Map Table. The partition servers handle concurrent transactions to objects for a RangePartition using the lock service.
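To make the assignment step concrete, here is a minimal sketch of a lookup structure that maps key ranges to partition servers. It is only an illustration: the class name PartitionMap, the string keys and the server names are assumptions made for this example and are not the actual WAS Partition Map Table format.

```python
import bisect


class PartitionMap:
    """Toy stand-in for the Partition Map Table (hypothetical layout)."""

    def __init__(self):
        self._low_keys = []  # sorted inclusive low key of each RangePartition
        self._servers = []   # server that owns the range starting at that key

    def assign(self, low_key, server):
        """Assign (or reassign) the RangePartition starting at low_key."""
        i = bisect.bisect_left(self._low_keys, low_key)
        if i < len(self._low_keys) and self._low_keys[i] == low_key:
            self._servers[i] = server  # range already known: reassign it
        else:
            self._low_keys.insert(i, low_key)
            self._servers.insert(i, server)

    def lookup(self, key):
        """Return the server whose range contains key."""
        i = bisect.bisect_right(self._low_keys, key) - 1
        if i < 0:
            raise KeyError(key)
        return self._servers[i]


pmap = PartitionMap()
pmap.assign("", "ps1")   # keys from "" up to (not including) "h"
pmap.assign("h", "ps2")  # keys from "h" up to (not including) "q"
pmap.assign("q", "ps3")  # keys from "q" onwards
```

Because the low keys are kept sorted, a lookup is a binary search, and moving a RangePartition to another server only changes one entry.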
The data structure representing a RangePartition includes the Metadata Stream, the Commit Log Stream, the Row Data Stream and the Blob Data Stream. The Metadata Stream acts as the root of the RangePartition: it holds the information needed to locate everything else, including the commit log and the data streams. It also carries runtime information such as the state of outstanding split and merge operations. The commit log stores the recent insert, update and delete operations applied since the last checkpoint. The two data streams store the checkpoint row data and the blob data bits respectively. At the stream layer, each of the above is a stream and is persisted as such. We now look at the in-memory data structures.
The in-memory version of the commit log for a RangePartition, holding changes that have not yet been checkpointed, is cached in a memory table. The index cache stores the checkpoint indexes of the row data stream. This differs from the row data cache, which stores the checkpoint row data pages. A bloom filter is kept for each checkpoint to indicate whether a row may be present in it, which avoids blindly examining every checkpoint in the data stream.
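The read path described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the BloomFilter, Checkpoint and read_row names are hypothetical, the filter size and hash count are arbitrary, and the index cache and row data cache are left out for brevity.

```python
import hashlib


class BloomFilter:
    """Small bloom filter; sizes chosen arbitrarily for illustration."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class Checkpoint:
    """A checkpoint of rows together with its own bloom filter."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.bloom = BloomFilter()
        for key in self.rows:
            self.bloom.add(key)


def read_row(key, memory_table, checkpoints):
    """Return (value, number_of_checkpoints_actually_examined)."""
    if key in memory_table:  # most recent changes live here
        return memory_table[key], 0
    examined = 0
    for cp in reversed(checkpoints):  # newest checkpoint first
        if not cp.bloom.might_contain(key):
            continue  # skip without touching the checkpoint data
        examined += 1
        if key in cp.rows:
            return cp.rows[key], examined
    return None, examined
```

A bloom filter never gives a false negative, so skipping a checkpoint is always safe; a false positive only costs one unnecessary examination.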
The partition server records each change in the commit log and the changed row in the memory table. When either of them grows to a certain size, a checkpoint is written with the contents of the memory table stored on the row data stream. The corresponding portion of the commit log can then be freed. To keep the number of checkpoints bounded, they are periodically combined into a larger checkpoint. Blobs are treated somewhat differently. The blob data bits are written directly to the commit log stream, and the Blob type property of the row tracks their location.
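As a rough sketch of this write path, the toy store below appends to a commit log, updates a memory table, checkpoints when the log reaches a threshold, and merges checkpoints when there are too many. The class name RangePartitionStore and both thresholds are assumptions for illustration; only inserts and updates are modeled, and real checkpoints live on the row data stream rather than in Python dictionaries.

```python
class RangePartitionStore:
    """Toy model of commit log + memory table + checkpoints (hypothetical)."""

    def __init__(self, checkpoint_threshold=3, max_checkpoints=2):
        self.checkpoint_threshold = checkpoint_threshold
        self.max_checkpoints = max_checkpoints
        self.commit_log = []    # operations since the last checkpoint
        self.memory_table = {}  # rows changed since the last checkpoint
        self.checkpoints = []   # oldest first; each one is a dict of rows

    def write(self, key, value):
        self.commit_log.append((key, value))  # log the operation first
        self.memory_table[key] = value        # then apply it in memory
        if len(self.commit_log) >= self.checkpoint_threshold:
            self._checkpoint()

    def _checkpoint(self):
        # Persist the memory table as a new checkpoint and free the log.
        self.checkpoints.append(dict(self.memory_table))
        self.memory_table.clear()
        self.commit_log.clear()
        if len(self.checkpoints) > self.max_checkpoints:
            self._merge_checkpoints()

    def _merge_checkpoints(self):
        # Combine all checkpoints into one; newer rows overwrite older ones.
        merged = {}
        for cp in self.checkpoints:
            merged.update(cp)
        self.checkpoints = [merged]

    def read(self, key):
        if key in self.memory_table:
            return self.memory_table[key]
        for cp in reversed(self.checkpoints):  # newest checkpoint first
            if key in cp:
                return cp[key]
        return None
```

Writing to the log before the memory table means that, after a crash, replaying the log since the last checkpoint is enough to rebuild the memory table.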
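At checkpoint time, the commit log extents that hold blob data are concatenated onto the Blob Data Stream. Extents are stitched together easily because concatenation involves only pointer operations; no data is copied.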
Having looked at the data flow and operations, let us now look at the operations performed by the partition manager on the object table. These are load balancing, split and merge. A split is done when a single RangePartition has too much load: it is divided into two or more smaller RangePartitions, which are reassigned to two or more partition servers. The merge operation combines lightly used RangePartitions. WAS keeps the total number of partitions between a low watermark and a high watermark.
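A minimal sketch of split and merge under watermarks might look like the following. Everything here is an assumption made for illustration: integer key ranges, a single numeric load value, splitting at the midpoint, and the hot and cold thresholds. WAS chooses split keys and measures load in its own way.

```python
from dataclasses import dataclass


@dataclass
class RangePartition:
    low: int     # inclusive low key (integers assumed for simplicity)
    high: int    # exclusive high key
    load: float  # hypothetical load metric


def split(p):
    """Split a partition at its midpoint, dividing the load evenly."""
    mid = (p.low + p.high) // 2
    half = p.load / 2
    return [RangePartition(p.low, mid, half), RangePartition(mid, p.high, half)]


def merge(a, b):
    """Merge two adjacent partitions into one."""
    assert a.high == b.low, "only adjacent ranges can be merged"
    return RangePartition(a.low, b.high, a.load + b.load)


def rebalance(parts, hot, cold, low_wm, high_wm):
    """One pass: split hot partitions, then merge adjacent cold ones."""
    count = len(parts)
    after_split = []
    for p in parts:
        # Split only while the partition count is below the high watermark.
        if p.load > hot and count < high_wm and p.high - p.low > 1:
            after_split.extend(split(p))
            count += 1
        else:
            after_split.append(p)
    result = []
    for p in after_split:
        # Merge only while the partition count is above the low watermark.
        if result and count > low_wm and result[-1].load + p.load < cold:
            result[-1] = merge(result[-1], p)
            count -= 1
        else:
            result.append(p)
    return result
```

The two watermarks act as guards: splitting stops once the count reaches the high watermark, and merging stops once it drops to the low watermark.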
#codingexercise
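// Returns the smallest value in a binary search tree,
// or int.MinValue as a sentinel if the tree is empty.
int GetMin(Node root)
{
    if (root == null) return int.MinValue;
    // The minimum of a binary search tree is its leftmost node.
    while (root.left != null)
        root = root.left;
    return root.data;
}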
