Saturday, July 23, 2016

#codingexercise
Merge two binary search trees
Node Merge(Node tree1, Node tree2)
{
var in1 = new List<int>();
InOrder(Node tree1, ref in1);
var in2 = newList<int>();
InOrder(Node tree2, ref in2);
var result = in1.Merge(in2);
return ToBST(result, 0, result.count-1);
}

void InOrder (Node root, List<int> tree)
{
if (root == null) return;
InOrder(root.left);
tree.add(root.data);
InOrder(root.right);
}

Node ToBST(List<int> nums, int start, int end)
{
if (start > end) return null;
int mid = (start + end) / 2;
var root == new Node();
root.data = nums[mid];
root.left = ToBST(nums,start, mid-1);
root.right = ToBST(nums, mid+1, end);
return root;
}
Today we continue our discussion of the  paper titled "Pelican: a  building block for exascale cold data storage". Pelican treats a group of disks as a single schedulable unit. Resource restrictions such as power consumption, vibrations and failure domains are expressed as constraints over these units.
With the help of resource constraints and scheduling units, Pelican aims to be better than its overprovisioned counterpart racks using computations in software stacks
Pelican uses resources from a set of resource domains which is a subset of disks. Pelican proposes a data layout and IO scheduling algorithms by expressing these resource domains as constraints over the disks. 
Pelican reduces the disk layout problem from 1152 Choose 18 disks to one involving l ^ 2 pairs of logical groups.Furthermore, if one of the disks in one group collides with another group, all the disks of that group will collide with those in the other. This reduces the collision probability from g ^ 2 to being proportional to g. This is close to the lower bound on the domain collision probability because g is the number of cooling domains used by a group ( only one disk can be active per cooling domain and disks within a group are domain disjoint)
l and g can be determined in the following manner. we want to maximize the number l of groups of size g given l x g = 1152 and with g >= k + r where r is the groups large enough to store a stripe. In the current implementation the authors use g = 24 rather than g = 18 so that blob can entirely reside within a single group even after some disks have failed. Stripe stacks stored on failed drives are regenerated and stored on other drives in the group. This gives l = 48 which is then divided into 4 classes of 12 groups. Each class is independent of the other. 
Notice how the disk layout algorithm tries to form groups to improve the concurrency. In the paper on local reconstruction codes, there was another attempt to form groups to find reconstruction co-efficients. In this case each group holds within itself all the constraints because disks in  a group are all domain-disjointed which means they don't share any common resource domain.

Data day seattle talks:

https://1drv.ms/w/s!Ashlm-Nw-wnWk3Uo1l-ol2OcM4Nl

No comments:

Post a Comment