Cluster computing

Wednesday, April 6, 2016

Today we start reading the paper Pelican: a building block for exascale cold data storage by Balakrishnan et al. Cold data is the data that is rarely accessed. Pelican is a rack-scale hard-disk based storage unit designed as the basic building block for Exabyte scale storage for cold data. A rack-scale set of computers is a class of hardware with pooled storage, compute and network such that thousands of cores are interconnected at high speed. A rack is increasingly replacing the individual server as the basic building block of modern data centers. At Microsoft, the rack-scale computing, the Rack-scale computing project aims to redesign hardware, storage, networking and OS stacks in a cross layer co-design manner so as to improve efficiencies. By choosing a building block of this size, Pelican aims to make a building block of this size for cold data workloads. The provisioning is considered just right when the total cost of ownership reduces as compared to traditional disk-based storage. For example, only 8% of the drives can be concurrently spinning in this building block. This introduces complex resource management to be handled by Pelican.
This is met in the following way. Resource restrictions are expressed as constraints over the harddrive. The data layer and IO scheduling ensure that these constraints are not violated. Pelican and conventional storage are compared side by side using a cross validated simulator. Pelican demonstrates high throughput with acceptable latency for cold workloads.
The cost of storing data appears as a sliding scale. The storage in Amazon S3 for example is near the high end. But cold data is also growing phenomenally. Amazon Glacier and Facebook cold data storage are just a few examples where S3 doesn't suffice and cloud scale tiers need to be provisioned. Moreover these systems have to minimize up front capital costs of buying the storage as well as the costs of running the storage. By co-designing hardware and software stacks, Pelican aims to allow the entire rack's resources to be carefully balanced to support cold data workloads.
#coding exercise
Given a sorted positive integer array nums and an integer n, add/patch elements to the array such that any number in range [1, n] inclusive can be formed by the sum of some elements in the array. Return the minimum number of patches required.
This has a dynamic programming approach
If the range of sums that can be generated using the first k numbers in nums is from 0 to r, and the next number has value x, then either:
• x <= r+1 and it is possible to generate any sum from 0 to r+x using the first k+1 numbers, or
• x > r + 1 and it is not possible to generate the sum r + 1, since x and all the following numbers have greater values.
We keep adding x=r+1 and counting such additions until r meets or exceeds n.

int GetMinPatches(List<int> nums, int n)
{
int count = 0;
Int index = 0;
int r = MaxSumImpossible(nums, 0, index);
while ( index < nums.count && r < n)
{
Index++;
int x = MaxSumImpossible(nums, 0, index+1)
If ( nums.contains(x) == false){
count ++;
Nums.add (x, index);
Index++
}
r = r +x;
}
return count;
}

int MaxSumImpossible(Numbers, I, j)
if len(Numbers) == 0:
return 0;
If len(Numbers) == 1:
Return Numbers[0]
m[I,j] = 0
For k = 0 to j-I-1
if (jth coin <= MaxSumImpossible(Numbers Choose K, I,j-1)
q = MaxSumImpossible(Numbers Choose K, I,j-1) + MaxSumImpossible(Numbers with jth Number only, j,j)
If q > m[I,j]
M[I,j] = q
return m[I,j]

Cluster computing

Wednesday, April 6, 2016

No comments:

Post a Comment