Saturday, March 12, 2016

Today we continue reading the paper "Erasure coding in Windows Azure storage" by Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin Li, and Sergey Yekhanin.
Erasure coding is a data protection method in which data is broken into fragments, expanded and encoded with redundant parity pieces, and stored across a different set of locations or media. This paper talks about erasure coding in Windows Azure Storage (WAS). We saw that there are large cost savings with this method because it replaces three full copies of the data with a much smaller storage overhead. The tradeoff for using erasure coding instead of keeping three full copies is performance. The performance hit comes when dealing with a lost or offline data fragment and with hot storage nodes.

In erasure coding, data is broken into k data fragments and a set of parity fragments. In WAS, a data fragment may be lost due to a disk, node, or rack failure. In addition, cloud services are frequently upgraded, which may take data fragments offline for seconds to a few minutes. An online read then needs to be served by dynamically reconstructing the requested data to return it to the client. This reconstruction needs to be as fast as possible, use as little network bandwidth and as few I/Os as possible, and still meet the required SLAs.

If a read is for a fragment that lives on a specific storage node, that node can become hot, which leads to latency. One way to reduce this would be to age the fragments appropriately and tier the data between memory and external storage. Another way would be to load balance. But until the fragment is recognized as hot, read performance can suffer. Instead, if the reconstruction is done in parallel while the fragment is being fetched from the storage node, whichever result returns first can be served to the client.
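As a rough illustration of that racing idea (this is not code from the paper; read_fragment and reconstruct_fragment are hypothetical placeholders), a minimal Python sketch could look like this:

import concurrent.futures

def read_fragment(node):
    # hypothetical placeholder: fetch the fragment from its (possibly hot or offline) storage node
    ...

def reconstruct_fragment(surviving_nodes):
    # hypothetical placeholder: read enough other fragments and decode the missing one
    ...

def read_with_race(node, surviving_nodes):
    # Issue the direct read and the reconstruction in parallel and serve the
    # client with whichever completes first.
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(read_fragment, node),
                   pool.submit(reconstruct_fragment, surviving_nodes)]
        done, _ = concurrent.futures.wait(
            futures, return_when=concurrent.futures.FIRST_COMPLETED)
        return next(iter(done)).result()

A real implementation would also cancel or ignore the slower path; the sketch only shows the race itself.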
The problem is that the reconstruction is only as fast as the slowest storage node that has to respond to the reads. We also have to keep storage costs low. We could do this with a Reed-Solomon code that keeps 12 data fragments and 4 parity fragments. However, reconstruction would then require reading 12 of those fragments, which incurs higher network cost and more I/Os, as well as a greater chance of hitting a hot node.
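To put rough numbers on this (a back-of-the-envelope sketch using the fragment counts quoted above): triple replication costs 3x the raw data size, while a 12+4 Reed-Solomon layout costs about 1.33x, but rebuilding a single fragment means reading from 12 others.

# Back-of-the-envelope comparison, assuming the 12 data + 4 parity layout above.
def storage_overhead(data_fragments, parity_fragments):
    return (data_fragments + parity_fragments) / data_fragments

print("3-way replication overhead:", 3.0)              # three full copies
print("RS(12, 4) overhead:", storage_overhead(12, 4))  # 16/12, roughly 1.33
print("RS(12, 4) fragments read to rebuild one:", 12)  # needs any 12 of the 16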
Therefore a new family of codes was required, with the following design requirements: first, reduce the minimum number of fragments that need to be read in order to reconstruct a data fragment.
Second, provide a significant reduction in storage overhead while maintaining higher durability than a system that keeps three copies of the data.
The authors call these codes Local Reconstruction Codes (LRC).
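To get a feel for the local-reconstruction idea, here is a toy sketch (my simplification, not the paper's actual coding construction): split the data fragments into groups and keep a simple XOR parity per group, so a single lost fragment can be rebuilt from its few surviving group members plus the local parity instead of reading all 12 data fragments.

from functools import reduce

def xor_parity(fragments):
    # XOR all fragments together, byte by byte
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), fragments)

group = [bytes([i] * 4) for i in range(6)]    # one local group of 6 data fragments
local_parity = xor_parity(group)

lost = 2                                      # suppose fragment 2 is offline
survivors = [f for i, f in enumerate(group) if i != lost]
rebuilt = xor_parity(survivors + [local_parity])
assert rebuilt == group[lost]                 # 6 reads were enough to rebuild it

The paper's codes additionally keep global parities so that failures spanning more than one fragment remain recoverable; the point of the sketch is only the cheaper single-fragment rebuild.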
#code jam problem
Given a set of integers, the powerset is the collection of all subsets of those elements, including the empty set and the set itself.
To generate the powerset we can use the combine method, taking combinations of every possible number of elements; we just have to include the empty set and the full set as well.
Now we are given the sums of the individual subsets of a powerset, in sorted order, together with the frequency of each sum. Can we determine the original elements?
For example, a powerset with sums
0, 1, 2, 3, 4, 5, 6, 7 and frequencies
1, 1, 1, 1, 1, 1, 1, 1
comes from the original set {1, 2, 4}, as the small check below confirms.
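A quick way to verify the example (just a small sketch) is to enumerate every subset of {1, 2, 4} and count its sums:

from itertools import combinations
from collections import Counter

def subset_sums(elements):
    # every subset, including the empty set and the full set
    sums = []
    for r in range(len(elements) + 1):
        for combo in combinations(elements, r):
            sums.append(sum(combo))
    return Counter(sums)

print(sorted(subset_sums([1, 2, 4]).items()))
# [(0, 1), (1, 1), (2, 1), (3, 1), (4, 1), (5, 1), (6, 1), (7, 1)]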
Solution:
The recovered elements, like the given powerset sums, can be listed in non-decreasing order.
The given sums may start with several zeroes: one from the null set and one for every subset made up only of zero elements, so the number of zero elements in the original set can be inferred from the frequency of the first sum.
For each remaining sum, we decide whether it must be introduced as a new element by comparing its given frequency with the number of ways it can already be formed from the elements found so far.
To do this, we keep a running multiset of sums achievable with the elements discovered so far, initialized with a single 0 for the empty set. Whenever a given sum occurs more often than the running multiset can explain, that sum is the next element: we add it to the result and extend the running multiset by shifting every achievable sum by the new element. Repeating this until every given sum is accounted for recovers the original elements.
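Here is a minimal sketch of that procedure, assuming non-negative elements so that the empty set contributes the sum 0 (the function and variable names are mine, not from any particular Code Jam solution):

from collections import Counter

def recover_elements(sums, freqs):
    # 'sums' are the distinct subset sums in increasing order, 'freqs' their counts.
    target = Counter()
    for s, f in zip(sums, freqs):
        target[s] = f

    elements = []
    achievable = Counter({0: 1})   # subset sums formed by the elements found so far

    # Keep adding elements until every target sum (with multiplicity) is explained.
    while sum(achievable.values()) < sum(target.values()):
        # The smallest sum not yet explained must be the next element.
        # (Zero elements are handled naturally: each one doubles the count of sum 0.)
        next_elem = min(s for s in target if target[s] > achievable[s])
        elements.append(next_elem)
        # Every previously achievable sum can now also include the new element.
        shifted = Counter({s + next_elem: c for s, c in achievable.items()})
        achievable += shifted

    return elements

print(recover_elements([0, 1, 2, 3, 4, 5, 6, 7], [1, 1, 1, 1, 1, 1, 1, 1]))
# prints [1, 2, 4]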
