We continue reading the paper "Erasure coding in Windows Azure storage" by Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin Li, and Sergey Yekhanin.
Erasure coding is a data protection method in which data is broken into fragments, expanded and encoded with redundant pieces, and stored across a set of different locations or media. This paper describes erasure coding in Windows Azure Storage (WAS) and introduces a new family of codes called Local Reconstruction Codes (LRC). A (k, l, r) LRC divides k data fragments into l local groups and encodes l local parities, one for each local group, plus r global parities. We were discussing the components of the WAS architecture and noted that erasure coding is done in the stream layer. We now review LRC in WAS. The placement of data and parity fragments is based on two factors: load, which favors less occupied locations and a sparse distribution, and reliability, which favors keeping fragments out of the same correlated failure domain. There are two such correlated domains: the rack, which can fail as a whole, and the upgrade domain, in which groups of fragments are taken offline together.
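To make the (k, l, r) layout concrete, here is a minimal Python sketch. It assumes that l divides k evenly and that each local parity is the plain XOR of the fragments in its group; the global parities, which need Galois-field coefficients, are left out. The function names (`local_groups`, `local_parities`, `rebuild`, `storage_overhead`) are made up for illustration and do not come from the paper or from WAS.

```python
from functools import reduce


def xor_bytes(blocks):
    """XOR a list of equal-length byte strings, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def local_groups(fragments, l):
    """Split k data fragments into l equal-sized local groups."""
    k = len(fragments)
    if l <= 0 or k % l != 0:
        raise ValueError("this sketch assumes l divides k evenly")
    size = k // l
    return [fragments[i:i + size] for i in range(0, k, size)]


def local_parities(fragments, l):
    """One parity per local group (assumed here to be a plain XOR)."""
    return [xor_bytes(group) for group in local_groups(fragments, l)]


def rebuild(group, parity, lost_index):
    """Recover one lost fragment using only its own group and local parity."""
    survivors = [f for i, f in enumerate(group) if i != lost_index]
    return xor_bytes(survivors + [parity])


def storage_overhead(k, l, r):
    """Total fragments stored divided by data fragments."""
    return (k + l + r) / k


# Illustrative parameters: k = 6 data fragments, l = 2 local groups.
data = [b"ab", b"cd", b"ef", b"gh", b"ij", b"kl"]
groups = local_groups(data, 2)
parities = local_parities(data, 2)
recovered = rebuild(groups[0], parities[0], lost_index=1)
```

The point of the local groups shows up in `rebuild`: a single lost fragment is recovered by reading only the other members of its group and one parity, rather than all k data fragments.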
#coding exercise
Determine the overlaps between two sequences in linear time.
We need the offset and length of each match.
If we find a match, we remove it from the original string.
We repeat until there are no more matches left.
A match has the same length and the same sequence of elements in both strings.
We resume iterating from the end of the last match in one string.
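The notes leave the exact problem open, so the following is only one possible reading: find the longest suffix of the first sequence that is also a prefix of the second, and report it as an offset and a length. The sketch uses the Knuth-Morris-Pratt prefix function, which runs in linear time. It does not implement the repeated remove-and-rescan step, and the function names are hypothetical.

```python
def prefix_function(s):
    """KMP prefix function: pi[i] is the length of the longest proper
    prefix of s[:i + 1] that is also a suffix of it."""
    pi = [0] * len(s)
    for i in range(1, len(s)):
        j = pi[i - 1]
        while j > 0 and s[i] != s[j]:
            j = pi[j - 1]
        if s[i] == s[j]:
            j += 1
        pi[i] = j
    return pi


def suffix_prefix_overlap(a, b):
    """Return (offset_in_a, length) of the longest suffix of a that is
    also a prefix of b. Length 0 means the sequences do not overlap."""
    if not a or not b:
        return len(a), 0
    n = min(len(a), len(b))
    sentinel = object()  # compares unequal to every real element
    combined = list(b[:n]) + [sentinel] + list(a[-n:])
    length = prefix_function(combined)[-1]
    return len(a) - length, length


offset, length = suffix_prefix_overlap("abcdef", "defghi")
```

Here the overlap `"def"` starts at offset 3 in the first string and has length 3. The sentinel keeps a match from running past the end of the second sequence's prefix.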