Cluster computing

Saturday, August 6, 2016

Today we continue our discussion of the paper titled "Pelican: a building block for exascale cold data storage". Pelican treats a group of disks as a single schedulable unit. Resource restrictions such as power consumption, vibrations and failure domains are expressed as constraints over these units.

With the help of resource constraints and scheduling units, Pelican aims to do better than its overprovisioned counterpart racks by managing these constraints through computation in the software stack.

Pelican draws resources from a set of resource domains, each of which covers a subset of the disks. Pelican proposes a data layout and IO scheduling algorithms by expressing these resource domains as constraints over the disks. We were discussing the evaluation of Pelican, which used a Poisson process to generate a range of workloads. Most of these workloads focused on read requests. Because write requests are offloaded to other storage tiers, we assume that servicing read requests is the key performance requirement for Pelican. The read requests are randomly distributed across all the blobs stored in the rack. Requests operate on a blob size of 1 GB. The evaluation used the following metrics:

Completion time - this is the end-to-end time between the request initiation by the client and the last data received. It captures the queuing delay, the spin-up latency, and the time to read and transfer data. The completion time increased linearly with the number of requests for both the rack and the simulator. This was after the simulator and the rack were cross-validated: the two were configured so that the prototype Pelican rack hardware and the simulator matched. It goes to show that both the sequential file read and the disk spin-up take a toll on the completion time. Further, the disks are spun up only one at a time, with only one disk spinning per tray rather than two. The completion time increased by the order of 3 for a 64-fold increase in the number of requests. This covered the most practical range of workloads that Pelican would typically be used for. It was also clear that the simulator could be run for extrapolation with just as much reliability as the Pelican rack hardware. Consequently, the completion time graph gave satisfying results in terms of predictability. The distribution of the workload and the effectiveness of Pelican were clear from this linear plot.
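The Poisson workload described above can be sketched in a few lines. This is only a minimal illustration, not the harness used in the paper: the class name PoissonWorkload, the method Generate and all of its parameters (arrival rate, duration, blob count, seed) are hypothetical. It relies on the standard fact that a Poisson process has exponentially distributed gaps between arrivals, and it picks the blob for each read uniformly at random, as the post describes.

```csharp
using System;
using System.Collections.Generic;

public static class PoissonWorkload
{
    // Hypothetical helper: names and parameters are illustrative assumptions,
    // not taken from the Pelican paper.
    public static List<(double ArrivalSeconds, int BlobId)> Generate(
        double ratePerSecond, double durationSeconds, int blobCount, int seed)
    {
        if (ratePerSecond <= 0) throw new ArgumentOutOfRangeException(nameof(ratePerSecond));
        if (blobCount <= 0) throw new ArgumentOutOfRangeException(nameof(blobCount));

        var rng = new Random(seed);
        var requests = new List<(double ArrivalSeconds, int BlobId)>();
        double t = 0;
        while (true)
        {
            // Inverse-transform sampling of an exponential gap with mean 1/rate.
            // 1 - NextDouble() lies in (0, 1], so the logarithm is always finite.
            t += -Math.Log(1.0 - rng.NextDouble()) / ratePerSecond;
            if (t > durationSeconds) break;

            // Reads are spread uniformly over all blobs stored in the rack.
            requests.Add((t, rng.Next(blobCount)));
        }
        return requests;
    }
}
```

A call such as PoissonWorkload.Generate(0.5, 3600, 1000, 42) would produce roughly an hour of read requests at an average of one request every two seconds. Sweeping the rate parameter is one way to obtain the range of workloads against which completion time is plotted.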

#codingexercise

You are given partial information of a binary tree: for each node, its id, and the sum of the ids of its children. This sum is 0 if the node is a leaf, and if the node has just one child, then it is equal to the child's id. Given this, you are asked to output ids of all possible roots of such a tree, given that such a tree does exist.
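using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Every node except the root appears exactly once in some parent's sum, so
// (sum of all ids) - (sum of all child sums) leaves only the root's id.
// The root is therefore unique whenever a valid tree exists.
int GetRootId(List<int> id, List<int> sum)
{
    Debug.Assert(id.Count == sum.Count);
    Debug.Assert(id.All(x => x > 0));
    Debug.Assert(sum.All(x => x >= 0)); // a leaf reports a sum of 0
    int root = 0;
    for (int i = 0; i < id.Count; i++)
    {
        root += id[i] - sum[i];
    }
    return root;
}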
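To see why the subtraction works, it may help to run it on a small tree. The sketch below is an illustration only: the class name RootIdExample and the four-node tree are made up for this example, and the function is written with LINQ sums, which is equivalent to the loop above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RootIdExample
{
    // Same identity as the loop version: sum of ids minus sum of child sums.
    public static int GetRootId(IReadOnlyList<int> id, IReadOnlyList<int> sum)
        => id.Sum() - sum.Sum();

    public static void Main()
    {
        // Hypothetical tree:      5
        //                        / \
        //                       3   8
        //                      /
        //                     1
        var id  = new List<int> { 1, 3, 5, 8 };
        var sum = new List<int> { 0, 1, 11, 0 }; // child sums for nodes 1, 3, 5, 8

        // ids add up to 17, child sums add up to 12, and 17 - 12 = 5.
        Console.WriteLine(GetRootId(id, sum)); // prints 5
    }
}
```

Each of the ids 1, 3 and 8 is counted once as a node and once inside a parent's sum, so they cancel, and only the root's id 5 survives.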

Return the square root of a number

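using System;

double SqRt(int n)
{
    if (n < 0) throw new ArgumentOutOfRangeException(nameof(n));
    if (n == 0) return 0; // avoids 0/0 below
    // Babylonian (Newton's) method: g and n/g lie on either side of the
    // root, so keep averaging them until they agree to within 0.001.
    double g = n / 2.0; // 2.0 avoids integer division, which made g zero for n = 1
    double y = n / g;
    while (Math.Abs(y - g) > 0.001)
    {
        g = (g + n / g) / 2;
        y = n / g;
    }
    return y;
}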
