Sunday, March 13, 2016

We continue reading the paper "Erasure coding in Windows Azure storage" by Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin Li, and Sergey Yekhanin.
Erasure coding is a data protection method where data is broken into fragments, expanded and encoded with redundant pieces, and stored across a set of different locations or media. This paper describes erasure coding in Windows Azure Storage (WAS). We were discussing the design decisions, such as why the Reed-Solomon code was not sufficient and why the paper introduces a new one. Today we continue with Local Reconstruction Codes (LRC).
The Reed-Solomon code illustrates the notion of reconstruction cost. In this method, each parity is computed from all of the data fragments, so with k data fragments, reconstructing any single lost fragment requires reading k fragments. This number of reads is the reconstruction cost. For the (6,3) Reed-Solomon code, for example, rebuilding one fragment requires reading six others.
In LRC, this cost is reduced by computing some of the parities from only a subset of the data fragments. For a (6,3) code, LRC generates 4 parities instead of 3. The first two are global parities computed from all the data fragments. For the other two, LRC divides the data fragments into two equal-size groups and computes one local parity for each group. Instead of reading a global parity and all of the remaining data fragments, a missing data fragment can now be reconstructed by reading the local parity of its group together with the other two data fragments in that group. The reconstruction of any single data fragment therefore requires only three fragments, half the number required by the Reed-Solomon code, and this holds for every data fragment. The extra parity fragment does add storage overhead, but it places the code at an altogether different tradeoff point between overhead and reconstruction cost, as the small sketch below illustrates.
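Here is a minimal C# sketch of the local-group repair, assuming plain XOR for the local parity. This is only an illustration of the read cost; the actual LRC parities in the paper are computed with carefully chosen Galois-field coefficients so that the global and local parities together cover more failure patterns.

using System;

class LocalRepairSketch
{
    // XOR a set of equal-length fragments together.
    static byte[] Xor(params byte[][] fragments)
    {
        var result = new byte[fragments[0].Length];
        foreach (var f in fragments)
            for (int i = 0; i < result.Length; i++)
                result[i] ^= f[i];
        return result;
    }

    static void Main()
    {
        // Six data fragments x0..x5 in two local groups: {x0,x1,x2} and {x3,x4,x5}.
        var x = new byte[6][];
        var rnd = new Random(42);
        for (int i = 0; i < 6; i++) { x[i] = new byte[4]; rnd.NextBytes(x[i]); }

        var px = Xor(x[0], x[1], x[2]);   // local parity for the first group

        // Suppose x1 is lost. A local repair reads only x0, x2 and px (3 reads)
        // instead of the 6 fragments a (6,3) Reed-Solomon repair would read.
        var recovered = Xor(px, x[0], x[2]);
        Console.WriteLine(BitConverter.ToString(recovered) ==
                          BitConverter.ToString(x[1]));   // prints True
    }
}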
To generalize the subgroups and the extra parities needed, a (k,l,r) LRC divides the k data fragments into l groups, with k/l data fragments in each group, and computes one local parity for each group. In addition, it computes r global parities from all the data fragments. The reconstruction cost of a single data fragment drops from k to k/l, with a storage overhead of (k+l+r)/k, as the small calculation below shows.
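Plugging numbers into these formulas makes the tradeoff concrete. The (6,2,2) instance below is the example from the previous paragraphs; (12,2,2) is shown for comparison, since the paper's headline 1.33x overhead comes from coding over a larger group of data fragments.

using System;

class LrcParameters
{
    // Reconstruction cost and storage overhead for a (k, l, r) LRC,
    // using the formulas from the paragraph above.
    static void Describe(int k, int l, int r)
    {
        int reconstructionCost = k / l;                        // reads for a single-fragment repair
        double storageOverhead = (k + l + r) / (double)k;      // total fragments stored per data fragment
        Console.WriteLine($"LRC({k},{l},{r}): {reconstructionCost} reads, {storageOverhead:F2}x overhead");
    }

    static void Main()
    {
        Describe(6, 2, 2);    // 3 reads, 1.67x overhead (the example above)
        Describe(12, 2, 2);   // 6 reads, 1.33x overhead
    }
}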
#codingexercise
We discussed the Code Jam problem yesterday and mentioned a solution as follows:
// Recovers candidate original elements from the sorted input list a.
// Combinations() and Sums() are assumed to be the helper extensions from the
// earlier post, enumerating the subsets of ret and their sums.
List<int> GetOriginalSet(List<int> a)
{
    var ret = new List<int>();
    if (a.Count == 0) return ret;

    // Walk the candidates in increasing order up to the largest value in a,
    // keeping i only when it appears in a and is not already explained as a
    // sum of previously chosen elements.
    for (int i = 0; i < a[a.Count - 1]; i++)
    {
        if (!ret.Combinations().Sums().Contains(i) && a.Contains(i))
        {
            ret.Add(i);
        }
    }
    return ret;
}

