Friday, March 25, 2016

We continue reading the paper "Erasure Coding in Windows Azure Storage" by Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin Li, and Sergey Yekhanin.
Erasure coding is a data protection method in which data is broken into fragments, expanded and encoded with redundant pieces, and stored across a set of different locations or media. This paper describes erasure coding in Windows Azure Storage (WAS) and introduces new codes called Local Reconstruction Codes (LRC). A (k, l, r) LRC divides k data fragments into l local groups and encodes l local parities, one for each local group, plus r global parities. We saw earlier how the fragments of an LRC are distributed in WAS.

We now review design aspects such as the scheduling of various I/O types. There is a variety of I/O from clients, including open/close, read/append, create/delete, replicate, scrub and move operations. If all of these I/Os proceeded at their own pace, the system would grind to a halt. To keep the system fair and responsive, operations are subject to throttling and scheduling at all levels of the storage system: I/O requests can be accepted, rejected or delayed on every extent node (EN) and stream manager (SM). Similarly, the SM keeps track of the data replication load on individual ENs and on the system as a whole to decide when to initiate replication, erasure coding, deletion and other system maintenance. Because both encoding and decoding involve multiple ENs, scheduling and throttling these operations becomes important. It is also important that erasure coding keeps up with the incoming data rate from customers as well as with internal system functions such as garbage collection; at a scale where the data can easily grow by petabytes every few days, careful scheduling of these operations is essential.
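Returning to the (k, l, r) structure above, here is a minimal sketch in C# to make it concrete. It assumes XOR-based local parities and omits the r global parities, whose coefficients the paper derives over a Galois field to guarantee recoverability; all names and values here are illustrative, not the paper's actual implementation.

using System;
using System.Linq;

class LrcSketch
{
    // XOR all fragments in a local group to form its local parity.
    static byte[] LocalParity(byte[][] group)
    {
        var parity = new byte[group[0].Length];
        foreach (var fragment in group)
            for (int i = 0; i < parity.Length; i++)
                parity[i] ^= fragment[i];
        return parity;
    }

    // Rebuild a single lost fragment from the survivors in its group plus
    // the local parity -- only the local group is read, not the whole stripe.
    static byte[] Reconstruct(byte[][] survivors, byte[] localParity)
    {
        var lost = (byte[])localParity.Clone();
        foreach (var fragment in survivors)
            for (int i = 0; i < lost.Length; i++)
                lost[i] ^= fragment[i];
        return lost;
    }

    static void Main()
    {
        // One local group of a (6, 2, 2) LRC: three data fragments.
        var x = new[] { new byte[] { 1, 2 }, new byte[] { 3, 4 }, new byte[] { 5, 6 } };
        var px = LocalParity(x);
        // Lose x[1]; rebuild it from x[0], x[2] and px alone.
        var rebuilt = Reconstruct(new[] { x[0], x[2] }, px);
        Console.WriteLine(rebuilt.SequenceEqual(x[1])); // True
    }
}

This is exactly the benefit LRC targets: a single failed data fragment is reconstructed by reading only the k/l fragments of its local group rather than k fragments across the whole stripe.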
Reconstruction read-ahead and caching is another design point for LRC in WAS. Reconstruction of unavailable fragments is done in unit sizes that are greater than individual append blocks, to reduce the number of disk and network I/Os. Append blocks are 5 MB in size. Read-ahead data of up to 250 MB is cached in the memory (up to 256 MB) of the extent node that performed the reconstruction. When reads are sequential, there is then no need to go to disk because they are served from this cache.
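A toy sketch of that read-ahead idea, again in C#: rebuild one large unit, cache it in memory, and serve subsequent sequential reads from the cache. The class and method names are hypothetical (the EN internals are not public), and the single-entry eviction is a stand-in for whatever policy WAS actually uses.

using System;
using System.Collections.Generic;

class ReconstructionCache
{
    const int ReadAheadBytes = 250 * 1024 * 1024;   // read-ahead unit, per the paper
    // The per-EN cache budget is up to 256 MB, so we hold only one unit here.

    readonly Dictionary<long, byte[]> cache = new Dictionary<long, byte[]>();

    // reconstruct(offset, length) stands in for the decode path that reads
    // the surviving fragments from other ENs and rebuilds the missing data.
    public byte[] Read(long offset, int count, Func<long, int, byte[]> reconstruct)
    {
        long unitStart = offset - offset % ReadAheadBytes;
        if (!cache.TryGetValue(unitStart, out var unit))
        {
            // One big reconstruction instead of many small ones: far fewer
            // disk and network I/Os against the fragments' extent nodes.
            unit = reconstruct(unitStart, ReadAheadBytes);
            cache.Clear();          // toy eviction to stay within the budget
            cache[unitStart] = unit;
        }
        // Assumes the requested range lies within one read-ahead unit.
        var result = new byte[count];
        Array.Copy(unit, (int)(offset - unitStart), result, 0, count);
        return result;
    }
}

Sequential readers keep hitting the same cached unit, so after the first reconstruction they are served entirely from memory.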
#codingexercise
Print all permutations of a string in ascending order:
void Permute(string a, StringBuilder b, bool[] used, ref List<string> permutations)
{
    // A buffer as long as the input is one complete permutation.
    if (b.Length == a.Length) { permutations.Add(b.ToString()); return; }
    for (int i = 0; i < a.Length; i++)
    {
        if (used[i]) continue;        // skip characters already placed
        used[i] = true;
        b.Append(a[i]);               // choose a[i] for this position
        Permute(a, b, used, ref permutations);
        b.Length -= 1;                // backtrack: drop a[i]
        used[i] = false;
    }
}
// After collecting every permutation, sort the list into ascending order.
permutations.Sort();
Another way is to sort the input string beforehand, so that permutations are generated in ascending order and the final sort above becomes unnecessary.
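A brief usage sketch of that variant (needs using System.Collections.Generic, System.Linq and System.Text):

var input = "cab";
var sorted = new string(input.OrderBy(c => c).ToArray());   // "abc"
var permutations = new List<string>();
Permute(sorted, new StringBuilder(), new bool[sorted.Length], ref permutations);
// permutations already comes out ascending: abc, acb, bac, bca, cab, cba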
