Cluster computing

Monday, March 14, 2016

We continue reading the paper "Erasure coding in Windows Azure storage" by Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin Li, and Sergey Yekhanin.
Erasure coding is a data protection method where data is broken into fragments, expanded and encoded with redundant data pieces and stored across a different set of locations or media. This paper talks about Erasure coding in windows Azure storage WAS. We were discussing the design decisions such as why Reed-Solomon code was not sufficient and why the paper introduced a new one. We also saw how the Local Reconstruction code introduces group fragment parity and improves reconstruction cost. This is with some storage overhead but provides more options with a new trade-off point.
We now look at Fault Tolerance. We saw how data fragments are used to compute each parity in LRC. We now see how the parities are computed from the data fragments. These are called coding equations.They have maximally recoverable property. By this property, LRC can decode any failure pattern which is theoretically decodeable. For example a 6,2,2 LRC contains four parity fragments and can tolerate upto four failures but these failures cannot be arbitrary. If the failures are on x1, x2, x3, and px, then this failure pattern is non-decodeable. There are only two parities which are global and they can only help to decode three missing data fragments and py is not available. It is impossible to decode three data fragments from merely two parity fragments. Such failure patterns are called information-theoretically non-decodeable

#codejam problem
We were discussing sums generated from number sequences.
This has a dynamic programming approach as well
The range of sums that can be generated
In general, if the range of sums that can be generated using the first k numbers in the result is from 0 to r, and the next number has value x, then either:
• x <= r+1 and it is possible to generate any sum from 0 to r+x using the first k+1 numbers, or
• x > r + 1 and it is not possible to generate the sum r + 1, since x and all the following numbers have greater values.

MaxSumImpossible(Numbers, I, j)
If len(Numbers) == 1
Return Numbers[0]
m[I,j] = 0
For k = 0 to j-I-1
if (jth coin <= MaxSumImpossible(Numbers Choose K, I,j-1)
q = MaxSumImpossible(Numbers Choose K, I,j-1) + MaxSumImpossible(Numbers with jth Number only, j,j)
If q > m[I,j]
M[I,j] = q

return m[I,j]

Cluster computing

Monday, March 14, 2016

No comments:

Post a Comment