Cluster computing

Thursday, March 17, 2016

We continue reading the paper "Erasure coding in Windows Azure storage" by Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin Li, and Sergey Yekhanin.
Erasure coding is a data protection method where data is broken into fragments, expanded and encoded with redundant data pieces and stored across a different set of locations or media. This paper talks about Erasure coding in windows Azure storage WAS. We were discussing how to construct the coding equations. The coding coefficients for each group are assigned elements from a finite field and the elements are represented by four bits. The coding coefficient for one group are chosen with their lower order two bits as zero and that of the other is chosen with their highest order two bits as zero. This way they will never be equal and they satisfy the conditions that they are not individually zero and the groupwise sum of the coding coefficients are not equal.
This way all the three failure patterns can be decoded and about 86% of all the four failures. In short, the LRC achieves the maximally recoverable property.
A (k,l,r) LRC divides k data fragments into l local groups. It encodes l local parities and r global parities. The key properties of a (k,l,r) LRC are as follows: i) single data fragment failure can be decoded from k/l fragments ii) arbitrary failures up to r+1 can be decoded. The paper proves that the number of parities of LRC meets the lower bound exactly, LRC achieves its properties with the minimum number of parities.
#coding exercise
Given a set of Paths between nodes in a graph as a string of nodes with overlaps, normalize them by splitting. Overlaps have to include one of the ends of the string and cannot be size less than or equal to two.
We compare strings pairwise to detect overlap. This is an O(N^2) comparisions
However each detection need not cost to the order of the square of the number of elements.
For each pair, we detect the overlap by applying Knuth Morris Pratt which yields O(N) running time. KMP informs the maximum amount of one string matched at the end of other.

Cluster computing

Thursday, March 17, 2016

No comments:

Post a Comment