Cluster computing

Sunday, July 24, 2016

Today we continue our discussion of the paper titled "Pelican: a building block for exascale cold data storage". Pelican treats a group of disks as a single schedulable unit. Resource restrictions such as power consumption, vibrations and failure domains are expressed as constraints over these units.

With the help of resource constraints and scheduling units, Pelican aims to do better than its over-provisioned counterpart racks by handling these restrictions through computation in the software stack.

Pelican draws its resources from a set of resource domains, each of which is a subset of the disks. It proposes a data layout and IO scheduling algorithms by expressing these resource domains as constraints over the disks.

The data layout algorithm of Pelican divides the disks into l groups, each of size g. The value of l is chosen to be the maximum for which l x g >= 1152 and g >= k + r. Here k is the number of data fragments and r is the number of additional fragments containing redundancy information, generated with a Cauchy Reed-Solomon erasure code. k + r is the total number of fragments to store, and these fragments together are referred to as a stripe. It follows that each group must be at least as large as k + r, so that every fragment of a stripe can be placed on a different disk of the group.
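The group-size rule can be made concrete with a small sketch. It assumes the groups partition the disks evenly (l x g equals the disk count exactly) and looks only at the stripe-size constraint g >= k + r; the function name and the sample values of k and r are illustrative, not taken from the post, and the actual layout may settle on a different group size once the other resource constraints are taken into account.

```python
def choose_group_layout(total_disks, k, r):
    """Return (l, g): the largest number of equal-sized groups l such that
    l * g == total_disks and each group can hold a full stripe (g >= k + r).

    Assumption: groups split the disks evenly, so g must divide total_disks.
    """
    stripe = k + r
    # The smallest admissible group size gives the largest number of groups.
    for g in range(stripe, total_disks + 1):
        if total_disks % g == 0:
            return total_disks // g, g
    raise ValueError("a stripe does not fit in the available disks")


if __name__ == "__main__":
    # Illustrative values only: 1152 disks, k = 15 data and r = 3 redundancy fragments.
    l, g = choose_group_layout(1152, 15, 3)
    print(f"{l} groups of {g} disks")
```

Scanning group sizes upward from k + r keeps the search trivial: the first size that divides the disk count is the smallest valid g and therefore yields the maximum l.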

One of the advantages of using groups is that a group spans multiple failure domains. Disks belonging to a group are distributed across the trays and all the backplanes. In addition, groups reduce the time required to recover from a failed disk because all the required data is contained within the same group.

Data is stored in unstructured immutable chunks called blobs. Blobs vary in size from 200MB to 1TB and each blob is uniquely identified by a 20 byte key. Pelican is designed to store blobs which are infrequently accessed. A blob is written to k+r disks in a single randomly selected group.
The disks that store the blob are selected as follows. The disks of the group are first split into six sets, each containing disks from the same backplane failure domain. Each set is then ordered on spare capacity, and the three disks with the highest spare capacity in each set are selected.
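The placement steps above can be sketched as follows. This is a minimal illustration, not Pelican's implementation: the Disk record, its field names, and the functions select_disks and place_blob are hypothetical, and the sketch assumes three disks are taken from each backplane set.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Disk:
    disk_id: int
    backplane: int    # hypothetical index of the backplane failure domain
    spare_bytes: int  # hypothetical spare capacity of the disk


def select_disks(group, per_set=3):
    """Split a group's disks by backplane, then take the per_set disks
    with the most spare capacity from each backplane set."""
    sets = defaultdict(list)
    for disk in group:
        sets[disk.backplane].append(disk)

    chosen = []
    for backplane in sorted(sets):
        ranked = sorted(sets[backplane], key=lambda d: d.spare_bytes, reverse=True)
        chosen.extend(ranked[:per_set])
    return chosen


def place_blob(groups, rng=random):
    """Pick one group at random and select the disks for the blob's fragments."""
    group = rng.choice(groups)
    return select_disks(group)
```

With a group whose disks fall into six backplane sets, select_disks returns 18 disks, three per backplane, which is the k + r disks the blob's stripe would be written to under these assumptions.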
#codingquestion
int Max(int[] nums)
{
    // Start from the smallest int so that any element can replace it.
    int max = int.MinValue;
    for (int i = 0; i < nums.Length; i++)
    {
        if (nums[i] > max)
            max = nums[i];
    }
    return max;
}

#DataDaySeattleContinued:

https://1drv.ms/w/s!Ashlm-Nw-wnWk3r3LxoLq77q9TSY
