Today we continue our discussion of the  paper titled "Pelican: a  building block for exascale cold data storage". Pelican treats a group of disks as a single schedulable unit. Resource restrictions such as power consumption, vibrations and failure domains are expressed as constraints over these units.
With the help of resource constraints and scheduling units, Pelican aims to be better than its over provisioned counterpart racks using computations in software stacks
Pelican uses resources from a set of resource domains which is a subset of disks. Pelican proposes a data layout and IO scheduling algorithms by expressing these resource domains as constraints over the disks. 
Groups are used instead of the underlying hardware constraints. The IO Scheduler in Pelican simply considers which class the group is in rather than the constraints on all the drives. If we increase the number of groups which totally collide, we also increase the number of independent groups leading to better throughput and lower latency.
Let us review next how this IO scheduler works. Pelican reorders requests in order to minimize the impact of spin latency. A set of groups form a class. Within a class only one group can be spinning at a time because of the domain conflicts. The remaining domains are not conflicted and form other classes of groups. In this case, the 48 groups divide into four classes of 12 groups and each class is independent from the others. An independent instance of the scheduler is run for each class and it services requests only for its class. Reordering is therefore class-level.
Traditional IO re-ordering attempts to order the request based on a continuous cost function. In contrast in Pelican there is a fixed constant cost of spinning up a group which is independent of the current set of disks spinning. The cost function is binary and is zero if the currently spinning group and the group to perform IO are the same and one otherwise.
The cost of a proposed schedule of IO request as the sum of the costs for each request. Only one group can be spun up at any time and if there are q requests queued then the cost is 0.92 q because the probability of two consecutive operations on different groups is 0.92 and the worst case is one where every request causes a group to spin. The goal of the IO scheduler is to try to minimize c. In the worst case each spin up incurs a latency of 8 seconds and in the best case only 8 requests are serviced per minute.
There's also cost from failures in addition to the success run.  A disk failure triggers a set of rebuild operations to regenerate the lost stripe-stacks. This rebuild requires activity on the group for a length of time equal to the data of the failed disk divided by the disk throughput. Rebuild traffic cannot be simply prioritized over the incoming requests. The IO scheduler addresses these two challenges using two mechanisms - request reordering and rate limiting.
#codingexercise
Given a Binary Search Tree (BST) and a range [min, max], remove all keys which are outside the given range. The modified tree should also be BST.
void Prune(ref node root, int min, int max)
{
if (root == null) return;
Prune(ref root.left, min, max);
Prune(ref root.right, min, max);
if (root.data < min)
{
node temp = root.right;
delete root;
root = temp
}
if (root.data > max)
{
node temp = root.left;
delete root;
root = temp
}
}
 
No comments:
Post a Comment