Tuesday, March 29, 2016

We continue reading the paper "Erasure coding in Windows Azure storage"  by Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin Li, and Sergey Yekhanin.
Erasure coding is a data protection method where data is broken into fragments, expanded and encoded with redundant pieces, and stored across a different set of locations or media. This paper talks about erasure coding in Windows Azure Storage (WAS). It introduces new codes called Local Reconstruction Codes. A (k, l, r) LRC divides k data fragments into l local groups. It encodes l local parities, one for each local group, and r global parities. We were reviewing the performance of LRC in WAS. We now review related work.
Erasure coding has been applied in many large-scale distributed storage systems. It is better than simple replication because it provides higher reliability while requiring much lower storage for the same reliability. Even when distributed systems apply erasure coding, they typically apply only Reed-Solomon codes. LRC has proved better than Reed-Solomon in both its storage efficiency and its performance tradeoff. Moreover, LRC reduces costs. For example, node failures trigger reconstruction, which is a significant cost. Transient errors, where no data is lost, account for nearly 90% of data center failures. During these periods, as well as during upgrades, there is no background data rebuilding, but reads accessing the unavailable nodes are still served through reconstruction. System techniques like load balancing and prioritization help, but LRC explores whether the erasure coding scheme itself can be optimized to reduce repair traffic and improve user I/Os. In addition to these benefits, LRC exploits non-uniform parity degrees, where some parities connect to fewer data nodes than others. This facilitates efficient reconstruction and follows a new trend in the storage industry. Previous work applied enumeration and heuristic methods to search for parity-check erasure codes of small length. Due to the exponential search space, that exploration was limited to 3, 4 and 5 parities, and the resulting codes could not tolerate arbitrary three-fragment failures, whereas LRC can.
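To make the local group structure concrete, here is a minimal sketch of local-parity encoding and single-fragment reconstruction, assuming XOR-based parities for illustration (WAS computes parities with Galois field coefficients, but the group structure is the point here): a lost fragment is rebuilt from only the k/l fragments of its local group instead of all k.

// XOR-based local parity for one local group: parity = XOR of all fragments.
byte[] LocalParity(byte[][] group)
{
    var parity = new byte[group[0].Length];
    foreach (var fragment in group)
        for (int i = 0; i < parity.Length; i++)
            parity[i] ^= fragment[i];
    return parity;
}

// Rebuild a lost fragment by XOR-ing the local parity with the surviving
// fragments of the same local group: k/l reads instead of k.
byte[] Reconstruct(byte[] localParity, byte[][] survivors)
{
    var rebuilt = (byte[])localParity.Clone();
    foreach (var fragment in survivors)
        for (int i = 0; i < rebuilt.Length; i++)
            rebuilt[i] ^= fragment[i];
    return rebuilt;
}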
#codingexercise
List<int> GetReedSolomon(List<int> message, int n)
{
    // Simplified Reed-Solomon encoding: treat the message symbols as the
    // coefficients of a polynomial and evaluate it at n distinct points.
    // (Production codes do this arithmetic over a Galois field.)
    var cx = new List<int>();
    for (int x = 0; x < n; x++)
    {
        cx.Add(GetPolynomial(message, x));
    }
    return cx;
}

int GetPolynomial(List<int> message, int x)
{
    // Evaluates message[0] + message[1]*x + ... + message[k-1]*x^(k-1)
    // using Horner's rule.
    int sum = 0;
    for (int i = message.Count - 1; i >= 0; i--)
        sum = sum * x + message[i];
    return sum;
}

Monday, March 28, 2016

We continue reading the paper "Erasure coding in Windows Azure storage"  by Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin Li, and Sergey Yekhanin.
Erasure coding is a data protection method where data is broken into fragments, expanded and encoded with redundant pieces, and stored across a different set of locations or media. This paper talks about erasure coding in Windows Azure Storage (WAS). It introduces new codes called Local Reconstruction Codes. A (k, l, r) LRC divides k data fragments into l local groups. It encodes l local parities, one for each local group, and r global parities. We were reviewing the performance of LRC in WAS with small I/Os and large I/Os. Finally, we read up on the latency comparison for decoding fragments between LRC and Reed-Solomon in this performance study. The decoding latency is expected to improve with LRC due to the reduced number of fragments. However, decoding latency is orders of magnitude smaller than the latency to transfer the fragments needed for reconstruction, so the overall latency is comparable between LRC and Reed-Solomon.
#codingexercise
Implement Galois field polynomial subtraction.
A polynomial is represented as coefficient and power pairs in ascending order of power.
Terms of equal power are grouped and operated on together.
List<Term> SubtractPolynomial(List<Term> first, List<Term> second)
{
    // Merge the two sorted term lists; terms of equal power are combined.
    var ret = new List<Term>();
    first.Sort((a, b) => a.power.CompareTo(b.power));
    second.Sort((a, b) => a.power.CompareTo(b.power));
    int i = 0;
    int j = 0;
    while (i < first.Count && j < second.Count)
    {
        if (first[i].power == second[j].power)
        {
            ret.Add(new Term(first[i].coefficient - second[j].coefficient, first[i].power));
            i++;
            j++;
        }
        else if (first[i].power < second[j].power)
        {
            ret.Add(new Term(first[i].coefficient, first[i].power));
            i++;
        }
        else
        {
            // terms present only in the subtrahend are negated
            ret.Add(new Term(-second[j].coefficient, second[j].power));
            j++;
        }
    }
    while (i < first.Count)
    {
        ret.Add(new Term(first[i].coefficient, first[i].power));
        i++;
    }
    while (j < second.Count)
    {
        ret.Add(new Term(-second[j].coefficient, second[j].power));
        j++;
    }
    return ret;
}
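Worth noting: in the binary Galois fields GF(2^m) that erasure codes actually use, every coefficient is its own additive inverse, so coefficient subtraction and addition are both the XOR operation and SubtractPolynomial coincides with polynomial addition there; the negation above matters only for ordinary integer coefficients.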

Sunday, March 27, 2016

We continue reading the paper "Erasure coding in Windows Azure storage"  by Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin Li, and Sergey Yekhanin.
Erasure coding is a data protection method where data is broken into fragments, expanded and encoded with redundant pieces, and stored across a different set of locations or media. This paper talks about erasure coding in Windows Azure Storage (WAS). It introduces new codes called Local Reconstruction Codes. A (k, l, r) LRC divides k data fragments into l local groups. It encodes l local parities, one for each local group, and r global parities. We were discussing data consistency checks and the use of CRC. We now look at performance considerations in WAS. Storage is in the form of blobs, tables, queues and drives. Applications have different workload styles, with smaller I/Os for tables and queues and larger I/Os for blobs and drives, typically greater than 4 MB. A production cluster with significant load variations over time was used, and LRC was compared with Reed-Solomon codes. The key metrics for small I/Os were latency and the number of I/Os made by the requests. The key metrics for large I/Os were latency and bandwidth. In the small I/O case, the experiments were run with a mixture of direct reads of a single fragment and reconstruction reads. When the cluster load was light, the reconstruction cost was very low. Under heavy load, reconstruction reads performed poorly due to the effect of the slowest fragment, and LRC performed better than Reed-Solomon due to the lower number of fragments it reads.
The results from the large I/O case were very different from the small I/O case. Even when the cluster load was light, reconstruction with erasure coding was much slower than a direct read due to the bandwidth consumed. When the cluster load was heavy, the latency came from the network card. Even here, LRC performed better than Reed-Solomon due to the reduced number of fragments.
#codingexercise
Implement Galois field polynomial addition.
A polynomial is represented as coefficient and power pairs in ascending order of power.
Terms of equal power are grouped and operated on together.
List<Term> AddPolynomial(List<Term> first, List<Term> second)
{
    // Terms are sorted in ascending order of power, so the last term of
    // each list carries the highest power. Requires System.Linq.
    var ret = new List<Term>();
    int max = Math.Max(first.Last().power, second.Last().power);
    for (int i = 0; i <= max; i++)
    {
        var firstTerm = first.FirstOrDefault(t => t.power == i);    // null if absent
        var secondTerm = second.FirstOrDefault(t => t.power == i);
        int coefficient = (firstTerm?.coefficient ?? 0) + (secondTerm?.coefficient ?? 0);
        if (coefficient != 0)
            ret.Add(new Term(coefficient, i));
    }
    return ret;
}

Saturday, March 26, 2016

We continue reading the paper "Erasure coding in Windows Azure storage"  by Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin Li, and Sergey Yekhanin.
Erasure coding is a data protection method where data is broken into fragments, expanded and encoded with redundant pieces, and stored across a different set of locations or media. This paper talks about erasure coding in Windows Azure Storage (WAS). It introduces new codes called Local Reconstruction Codes. A (k, l, r) LRC divides k data fragments into l local groups. It encodes l local parities, one for each local group, and r global parities. We saw how to distribute the fragments in LRC for WAS. We were also reviewing the design for erasure coding, where we saw how the various I/O types need to be scheduled, along with reconstruction read-ahead and caching. Now we read about the consistency of coded data. Consistency checking is largely done to mitigate data corruption, which can occur when data is at rest, being read or written, or in transit. Therefore it is essential to check the consistency of data in every step of the storage system and to periodically scrub the data at rest. This is done with checksums and parities, the two primary mechanisms to protect against data corruption. In Windows Azure Storage, each append block contains a CRC of the data block, which can help detect both data and metadata corruptions. If the CRC check fails, the data is reconstructed using other combinations of erasure coded fragments on the next available EN.
To ensure that the erasure coding operation does not itself introduce data inconsistency, each operation is validated by a few decoding operations. For LRC, these validations include: randomly choosing one data fragment in each local group and reconstructing it using its local group; doing the same with one global parity and the remaining data fragments; doing it with the other global parity and the remaining data fragments; randomly choosing two data fragments and reconstructing them; randomly choosing three data fragments and reconstructing them; and randomly choosing four data fragments and reconstructing them. Finally, the coordinator EN computes a CRC of all the final data fragments and checks this CRC against that of the original extent. Together, all these checks ensure data consistency.
Erasure coding involves Galois field arithmetic, but using it directly is quite expensive. Instead, addition and multiplication tables are used. Reed-Solomon codes are better implemented using XOR operations, and more so by ordering the XOR operations based on the patterns in the coding and decoding matrices. This way some redundancies and repetitions are avoided.
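As a concrete illustration of the table approach, here is a minimal sketch of table-driven multiplication in GF(2^8), assuming the commonly used field polynomial 0x11D; the polynomial and table layout are illustrative choices, not taken from the paper. Two log lookups and one exp lookup replace the expensive carry-less multiply and modular reduction.

static readonly byte[] Exp = new byte[512];
static readonly byte[] Log = new byte[256];

static void BuildTables()
{
    int x = 1;
    for (int i = 0; i < 255; i++)
    {
        Exp[i] = (byte)x;
        Log[x] = (byte)i;
        x <<= 1;                              // multiply by the generator element 2
        if ((x & 0x100) != 0) x ^= 0x11D;     // reduce modulo the field polynomial
    }
    for (int i = 255; i < 512; i++)           // wrap so Multiply can skip a mod 255
        Exp[i] = Exp[i - 255];
}

static byte Multiply(byte a, byte b)
{
    if (a == 0 || b == 0) return 0;           // zero has no logarithm
    return Exp[Log[a] + Log[b]];              // a * b = exp(log a + log b)
}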
#codingexercise
Implement a 3-bit CRC on a 14-bit message.
List<int> GetCRC(List<int> message, List<int> polynomial, int n)
{
    // Long division over GF(2). Bits are most significant first and the
    // generator polynomial has n + 1 bits (e.g. n = 3, 1011 for x^3 + x + 1).
    var buffer = new List<int>(message);
    for (int i = 0; i < n; i++) buffer.Add(0);       // pad right with n zero bits
    int last = buffer.Count - polynomial.Count;
    for (int start = 0; start <= last; start++)
        if (buffer[start] != 0)                      // divisor aligns on a leading 1
            for (int i = 0; i < polynomial.Count; i++)
                buffer[start + i] ^= polynomial[i];  // subtract (XOR) the divisor
    return buffer.GetRange(buffer.Count - n, n);     // the remainder is the CRC
}
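Illustrative usage for the stated 3-bit CRC on a 14-bit message, with the generator x^3 + x + 1 chosen purely as an example:

var message = new List<int> { 1,0,1,1,0,0,1,0,1,1,0,1,0,1 };   // 14 bits
var polynomial = new List<int> { 1,0,1,1 };                    // x^3 + x + 1
var crc = GetCRC(message, polynomial, 3);                      // 3 remainder bits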

Friday, March 25, 2016

We continue reading the paper "Erasure coding in Windows Azure storage"  by Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin Li, and Sergey Yekhanin.
Erasure coding is a data protection method where data is broken into fragments, expanded and encoded with redundant pieces, and stored across a different set of locations or media. This paper talks about erasure coding in Windows Azure Storage (WAS). It introduces new codes called Local Reconstruction Codes. A (k, l, r) LRC divides k data fragments into l local groups. It encodes l local parities, one for each local group, and r global parities. We saw how to distribute the fragments in LRC for WAS. We now review design aspects such as the scheduling of various I/O types. There is a variety of I/O types from clients, including open/close, read/append, create or delete, replicate, scrub and move operations. If all these I/Os were to happen at their own pace, the system would tank. To keep the system fair and responsive, operations are subject to throttling and scheduling at all levels of the storage system. I/O requests can be accepted, rejected or delayed on every extent node (EN) and stream manager (SM). Similarly, the SM keeps track of the data replication load on individual ENs and on the system as a whole to decide when to initiate replication, erasure coding, deletion and various other system maintenance operations. Since coding and decoding both involve multiple ENs, scheduling and throttling of these operations become important. It is also important to make sure erasure coding keeps up with the incoming data rate from customers as well as with internal system functions such as garbage collection. Given that the data involved can easily reach petabytes every few days, scheduling such operations becomes all the more critical.
Reconstruction read-ahead and caching is another design point for LRC in WAS. Reconstruction of unavailable fragments is done in unit sizes that are larger than the individual append blocks, in order to reduce the number of disk and network I/Os. Append blocks are 5 MB in size. Read-ahead data of up to 250 MB is cached in the memory (up to 256 MB) of the extent node that performed the reconstruction. When reads are sequential, there is no need to go to disk because the cache serves them directly.
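A minimal sketch of how such a read-ahead cache might behave, assuming a hypothetical reconstructUnit callback and the unit-sized reconstruction described above; the actual EN cache policy is certainly more involved.

// Hypothetical sketch: serve sequential reads of an unavailable fragment
// from an in-memory read-ahead cache, reconstructing a whole unit on a miss.
class ReadAheadCache
{
    const int UnitSize = 5 * 1024 * 1024;   // one append block, per the text
    readonly Dictionary<long, byte[]> cache = new Dictionary<long, byte[]>();

    public byte[] Read(long offset, int count, Func<long, byte[]> reconstructUnit)
    {
        long unitStart = offset - (offset % UnitSize);   // align to unit boundary
        if (!cache.TryGetValue(unitStart, out var unit))
        {
            unit = reconstructUnit(unitStart);   // expensive: contacts other ENs
            cache[unitStart] = unit;             // eviction policy omitted
        }
        var result = new byte[count];
        Array.Copy(unit, (int)(offset - unitStart), result, 0, count);
        return result;
    }
}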
#codingexercise
print all permutations of a string in ascending order
void Permute(string a, StringBuilder b, bool[] used, ref List<string> permutations)
{
    if (b.Length == a.Length) { permutations.Add(b.ToString()); return; }
    for (int i = 0; i < a.Length; i++)
    {
        if (used[i]) continue;
        used[i] = true;
        b.Append(a[i]);
        Permute(a, b, used, ref permutations);
        b.Remove(b.Length - 1, 1);   // backtrack: drop the last character
        used[i] = false;
    }
}
After the recursion completes, sort the results: permutations.Sort();
Another way could be to sort the input string beforehand, which eliminates the final sort above.
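Illustrative usage:

var permutations = new List<string>();
Permute("abc", new StringBuilder(), new bool[3], ref permutations);
permutations.Sort();   // abc, acb, bac, bca, cab, cba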

Thursday, March 24, 2016

We continue reading the paper "Erasure coding in Windows Azure storage"  by Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin Li, and Sergey Yekhanin.
Erasure coding is a data protection method where data is broken into fragments, expanded and encoded with redundant pieces, and stored across a different set of locations or media. This paper talks about erasure coding in Windows Azure Storage (WAS). It introduces new codes called Local Reconstruction Codes. A (k, l, r) LRC divides k data fragments into l local groups. It encodes l local parities, one for each local group, and r global parities. We were discussing Local Reconstruction Codes in Windows Azure Storage. The placement of the fragments is based on two factors: load, which favors the less occupied extent nodes, and reliability, which avoids placing fragments into the same correlated domain. Correlated domains arise from faults, such as a rack failure that brings down a few nodes together, and from upgrades, where a group of nodes is taken offline together. Let us now see how fragments are placed in WAS. A WAS stamp consists of twenty racks. There are sixteen fragments for an extent, and each is placed on a different rack for maximum reliability. If each fragment were similarly placed in a different upgrade domain, there would be too many upgrade domains and upgrades would slow down; up to 10 upgrade domains are used instead. To place the fragments across the upgrade domains, the local group property of LRC is exploited: fragments belonging to different local groups are placed into the same upgrade domains. Corresponding data fragments from each local group share an upgrade domain, the local parities are placed in the same upgrade domain, and the global parities are placed in separate upgrade domains.
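To illustrate one consistent reading of this layout, here is a minimal sketch of an upgrade domain assignment for the sixteen fragments of a (12, 2, 2) LRC; the indices and domain numbering are illustrative assumptions, not taken from the paper.

// Hypothetical assignment: data fragments x_i and y_i share domain i (0..5),
// the two local parities share domain 6, and the two global parities take
// domains 7 and 8, for nine upgrade domains within the limit of ten.
enum FragmentKind { Data, LocalParity, GlobalParity }

int UpgradeDomain(FragmentKind kind, int indexWithinKind)
{
    switch (kind)
    {
        case FragmentKind.Data: return indexWithinKind % 6;   // x_i, y_i -> domain i
        case FragmentKind.LocalParity: return 6;              // p_x, p_y -> domain 6
        default: return 7 + indexWithinKind;                  // p_0 -> 7, p_1 -> 8
    }
}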
#codingexercise
After detecting the overlaps from two strings, determine whether a new given string is a concatenation of two or more of the overlaps.
void Combine(List<string> overlaps, List<string> b, string target, int start)
{
    // Tries combinations of the detected overlaps (in their given order),
    // printing True when concatenating two or more of them yields the target.
    for (int i = start; i < overlaps.Count; i++)
    {
        b.Add(overlaps[i]);
        if (b.Count >= 2 && string.Concat(b) == target)
            Console.WriteLine("True");
        Combine(overlaps, b, target, i + 1);
        b.RemoveAt(b.Count - 1);   // backtrack
    }
}

Wednesday, March 23, 2016

We continue reading the paper "Erasure coding in Windows Azure storage"  by Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin Li, and Sergey Yekhanin.
Erasure coding is a data protection method where data is broken into fragments, expanded and encoded with redundant pieces, and stored across a different set of locations or media. This paper talks about erasure coding in Windows Azure Storage (WAS). It introduces new codes called Local Reconstruction Codes. A (k, l, r) LRC divides k data fragments into l local groups. It encodes l local parities, one for each local group, and r global parities. We were discussing the components of the WAS architecture and saw that erasure coding lives in the stream layer. We now review LRC in Windows Azure Storage. The placement of data and parity fragments is based on two factors: load, which favors the less occupied nodes and a sparse distribution, and reliability, which favors separating fragments across correlated failure domains. There are two such correlated domains: the rack, which can fail altogether, and the upgrade domain, in which groups of nodes are taken offline together.
#codingexercise
Determine the overlaps between two sequences in linear time.
We need offsets and lengths.
If we find a match, we remove it from the original string.
We repeat until there are no matches left.
Matches have the same length and sequence.
We iterate from the end of the last match in one string. A sketch of the core linear-time step follows.
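As one standard linear-time building block for this, here is a minimal sketch that finds the longest suffix of one string that is a prefix of the other via the KMP failure function; the '#' separator is assumed absent from both strings, and the remove-and-repeat loop above would call this repeatedly.

// Length of the longest suffix of first that is a prefix of second, in
// O(|first| + |second|) time using the failure function of second + "#" + first.
int Overlap(string first, string second)
{
    string s = second + "#" + first;
    var fail = new int[s.Length];
    for (int i = 1; i < s.Length; i++)
    {
        int k = fail[i - 1];
        while (k > 0 && s[i] != s[k]) k = fail[k - 1];
        if (s[i] == s[k]) k++;
        fail[i] = k;
    }
    // the overlap starts at offset first.Length - fail[s.Length - 1] in first
    return fail[s.Length - 1];
}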