Cluster computing

Wednesday, March 1, 2017

We continue with a detailed study of the Microsoft Azure stack as inferred from Microsoft's introduction to Azure. We reviewed some more features of Azure storage. We were discussing the replication flow in the Windows Azure Storage service, and we looked at the sealing operation among extent nodes. We also saw how erasure coding works in Windows Azure Storage (WAS) and how it differs from Pelican.
WAS also had to come up with its own IO scheduling. This was due to the design of many hard disk drives, which favor throughput over fairness. As a result, a long stream could make a disk lock out non-sequential IO for as long as 2300 milliseconds while it serviced large pipelined reads or writes. To avoid this problem, WAS avoids scheduling new IO to a spindle when there is over 100ms of expected pending IO already scheduled, or when any pending IO request has been scheduled but not serviced for over 200ms. This improved fairness but came at the cost of some latency on sequential requests.
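A minimal sketch of a fairness check of this kind, written in Python for illustration only. The two thresholds come from the text above; the `Spindle` and `PendingIO` names, the per-request cost estimate, and the choice to treat the two thresholds as independent conditions are assumptions made for the example, not the actual WAS implementation.

```python
from dataclasses import dataclass, field

# Thresholds quoted in the text; everything else here is hypothetical.
MAX_PENDING_MS = 100      # expected pending IO already scheduled on the spindle
MAX_UNSERVICED_MS = 200   # longest wait tolerated for any scheduled request


@dataclass
class PendingIO:
    scheduled_at_ms: float   # when the request was handed to the spindle
    expected_cost_ms: float  # estimated service time (assumed to be known)


@dataclass
class Spindle:
    pending: list = field(default_factory=list)

    def can_schedule(self, now_ms: float) -> bool:
        """Return False when the spindle should not be given new IO."""
        # Condition 1: too much expected work is already queued.
        backlog_ms = sum(io.expected_cost_ms for io in self.pending)
        if backlog_ms > MAX_PENDING_MS:
            return False
        # Condition 2: some queued request has waited too long.
        oldest_wait_ms = max(
            (now_ms - io.scheduled_at_ms for io in self.pending),
            default=0.0,
        )
        return oldest_wait_ms <= MAX_UNSERVICED_MS

    def schedule(self, io: PendingIO) -> None:
        self.pending.append(io)
```

A caller would check `can_schedule` before `schedule` and pick another spindle (or wait) when it returns False, which is where the extra latency on sequential requests would come from.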
WAS also improves durability by making sure the replicas are stored on power-safe storage, which safeguards them against a cluster-wide power failure.
Moreover, on each extent node (EN), a whole hard disk or SSD is reserved as a journal drive for all writes into that extent node. Since the journal data is time-ordered and entirely sequential, it lets WAS take advantage of the full throughput of the disk. When a write happens, all of it is first written to the journal drive and then queued up for the data disk where the extent files live on that EN. When the journal write succeeds, the data is buffered in memory, which serves reads until the data is flushed to the data disks. By virtue of this buffering, writes can now be consolidated. The tradeoff is good latency at the cost of an extra write off the critical path. It is interesting to note that all the stream data is append-only, and yet journaling still has benefits because the appends do not contend with the traffic going to the data disks. The journal allows appends from the partition layer to have lower and more consistent latencies.
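The write path described above can be sketched roughly as follows, in Python for illustration. This is a toy in-memory model: the `ExtentNodeSketch` class, its field names, and the way a flush merges all queued appends into a single data-disk write are assumptions made for the example, not details of WAS itself.

```python
from collections import deque


class ExtentNodeSketch:
    """Toy model of an extent node with a dedicated journal drive."""

    def __init__(self):
        self.journal = []            # sequential, append-only journal drive
        self.memory = {}             # offset -> bytes; serves reads until flushed
        self.flush_queue = deque()   # offsets waiting to reach the data disk
        self.data_disk = {}          # offset -> bytes; where extent files live
        self.data_disk_writes = 0    # number of physical data-disk writes issued
        self.next_offset = 0

    def append(self, data: bytes) -> int:
        """Journal the append, buffer it, and queue it for the data disk."""
        offset = self.next_offset
        self.next_offset += len(data)
        self.journal.append((offset, data))   # sequential write, no contention
        self.memory[offset] = data            # reads are served from here
        self.flush_queue.append(offset)       # data disk write is deferred
        return offset                         # acknowledged without the data disk

    def read(self, offset: int) -> bytes:
        if offset in self.memory:
            return self.memory[offset]
        return self.data_disk[offset]

    def flush(self) -> None:
        """Consolidate all queued appends into one data-disk write."""
        if not self.flush_queue:
            return
        batch = {}
        while self.flush_queue:
            offset = self.flush_queue.popleft()
            batch[offset] = self.memory.pop(offset)
        self.data_disk.update(batch)
        self.data_disk_writes += 1
```

In this model two appends followed by one `flush` cost two journal writes but only one data-disk write, which is the consolidation benefit, and the extra journal write is the cost.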

#codingexercise
Print all root to leaf paths
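// Requires: using System; using System.Collections.Generic; using System.Linq;
void PrintAllRootToLeaf(Node root, List<Node> path)
{
    if (root == null) return;
    path.Add(root);
    // A leaf completes one root-to-leaf path, so print it.
    if (root.left == null && root.right == null)
        Console.WriteLine(string.Join(" ", path.Select(x => x.data)));
    PrintAllRootToLeaf(root.left, path);
    PrintAllRootToLeaf(root.right, path);
    // Backtrack before returning to the parent.
    path.RemoveAt(path.Count - 1);
}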
Given a binary tree and a value, find the root-to-leaf paths whose sum equals the given value.
void getPathSum(Node root, int sum, List<Node> path, List<List<Node>> paths)
{
    if (root == null) return;
    path.Add(root);
    getPathSum(root.left, sum, path, paths);
    getPathSum(root.right, sum, path, paths);
    // At a leaf, store a copy of the path, since path is mutated while backtracking.
    if (root.left == null && root.right == null && path.Sum(x => x.data) == sum)
        paths.Add(new List<Node>(path));
    path.RemoveAt(path.Count - 1);
}
We can compare the above to a version that only checks whether such a path exists:
bool hasPathSum(Node root, int sum)
{
    // An empty subtree contributes no root-to-leaf path.
    if (root == null) return false;
    int newsum = sum - root.data;
    if (newsum == 0 && root.left == null && root.right == null) return true;
    return hasPathSum(root.left, newsum) || hasPathSum(root.right, newsum);
}
