Wednesday, March 30, 2016

We continue reading the paper "Erasure Coding in Windows Azure Storage" by Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin Li, and Sergey Yekhanin.
Erasure coding is a data protection method in which data is broken into fragments, expanded and encoded with redundant pieces, and stored across a set of different locations or media. This paper describes erasure coding in Windows Azure Storage (WAS) and introduces a new family of codes called Local Reconstruction Codes (LRC). A (k, l, r) LRC divides k data fragments into l local groups, then encodes l local parities, one for each local group, and r global parities.

Having reviewed the performance of LRC in WAS, we now turn to related work. Erasure coding has been applied in many large-scale distributed storage systems. It is better than simple replication because it provides higher reliability while requiring much lower storage for the same reliability. Even where distributed systems apply erasure coding, they typically use only Reed-Solomon codes; LRC has proved better in both efficiency and the performance trade-off.

Reed-Solomon codes are maximum distance separable (MDS) codes, which require the minimum storage overhead for a given fault tolerance. LRC is not MDS and requires additional storage overhead for the same fault tolerance; this extra overhead is exploited for efficient reconstruction. The trade-off between storage overhead and reconstruction efficiency is also attempted by other schemes such as Weaver codes, HoVer codes and Stepped Combination codes, but LRC achieves better trade-offs than all of the above and meets the WAS erasure coding design goals. One alternative to local reconstruction is to read from more fragments rather than fewer, while reducing the amount read from each fragment; however, this does not yield the same savings in I/O and bandwidth as LRC does.
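As a rough sketch of why the local groups matter, the C snippet below counts the fragments that must be read to rebuild a single lost data fragment under a (k, l, r) LRC versus a Reed-Solomon code. The (12, 2, 2) parameters match the configuration discussed in the paper, but the functions are just illustrative arithmetic derived from the description above, not WAS code.

#include <stdio.h>

/* Fragments read to rebuild one lost data fragment.
   Reed-Solomon with k data fragments: read any k surviving fragments.
   (k, l, r) LRC: read only the other members of the lost fragment's
   local group plus that group's local parity, i.e. k / l fragments. */
int rs_reconstruction_reads(int k)         { return k; }
int lrc_reconstruction_reads(int k, int l) { return k / l; }

int main(void)
{
    int k = 12, l = 2;  /* the (12, 2, 2) LRC discussed in the paper */
    printf("Reed-Solomon (12, 4): read %d fragments\n", rs_reconstruction_reads(k));
    printf("LRC (12, 2, 2):       read %d fragments\n", lrc_reconstruction_reads(k, l));
    return 0;
}

With these parameters a single-fragment reconstruction touches 6 fragments under LRC instead of 12 under Reed-Solomon, which is the halving of reconstruction reads that the local parities buy.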
Thus we have seen that erasure coding reduces the cost of cloud storage, where the target storage overhead is 1.33x of the original data. The challenge in erasure coding is fast reconstruction of offline data fragments, whether they are offline due to disk, node, rack or switch failures, or during upgrades. LRC reduces the number of fragments that must be read to perform the reconstruction. It carries slightly more storage overhead, but offers comparable latency for small I/Os and better latency for large I/Os, and at the same time provides better durability than the traditional approach of keeping three copies. Lastly, we reviewed some of the implementation and design considerations.
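As a quick check of the overhead arithmetic, here is a minimal sketch assuming the same (12, 2, 2) configuration: storing k data fragments plus l local and r global parities costs (k + l + r) / k of the original data.

#include <stdio.h>

/* Storage overhead of a (k, l, r) LRC: (k + l + r) fragments stored per k data fragments. */
double lrc_overhead(int k, int l, int r)
{
    return (double)(k + l + r) / k;
}

int main(void)
{
    /* 12 data fragments, 2 local parities, 2 global parities -> 16/12 = 1.33x,
       versus 3.0x for keeping three full copies. */
    printf("LRC (12, 2, 2) overhead: %.2fx\n", lrc_overhead(12, 2, 2));
    printf("3-way replication:       %.2fx\n", 3.0);
    return 0;
}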
#codingexercise
The original Reed-Solomon decoder corrected errors by finding the most popular message polynomial. The candidate polynomials are produced by repeatedly interpolating over size-k subsets of the n received symbols. How many such subsets are there?
Solution: The number of subsets is the binomial coefficient n choose k.
int getSubsets(int n, int k)
{
    long long c = 1;  // n choose k: the number of size-k subsets of n symbols
    for (int i = 1; i <= k; i++) c = c * (n - k + i) / i;  // exact at each step, avoids factorial overflow
    return (int)c;
}
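As an illustrative check with hypothetical values, getSubsets(16, 12) evaluates to 1820, the number of size-12 subsets of 16 received symbols; multiplying and dividing incrementally keeps the intermediate values exact, whereas computing the factorials outright would overflow an int for even modest n.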
