Monday, March 14, 2016

We continue reading the paper "Erasure coding in Windows Azure storage" by Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin Li, and Sergey Yekhanin.
Erasure coding is a data protection method in which data is broken into fragments, expanded and encoded with redundant pieces, and stored across a set of different locations or media. This paper discusses erasure coding in Windows Azure Storage (WAS). We were discussing the design decisions, such as why the Reed-Solomon code was not sufficient and why the paper introduced a new one. We also saw how the Local Reconstruction Code (LRC) introduces group fragment parity and improves reconstruction cost. This comes with some storage overhead but provides more options with a new trade-off point.
We now look at fault tolerance. We saw how data fragments are used to compute each parity in LRC. We now see how the parities are computed from the data fragments. The equations that do this are called coding equations, and they have the maximally recoverable property: LRC can decode any failure pattern which is theoretically decodable. For example, a (6,2,2) LRC contains four parity fragments and can tolerate up to four failures, but these failures cannot be arbitrary. If the failures are on x1, x2, x3 and px, then this failure pattern is non-decodable: py is the local parity of the other group and carries no information about the x fragments, so only the two global parities can help decode the three missing data fragments. It is impossible to decode three data fragments from merely two parity fragments. Such failure patterns are called information-theoretically non-decodable.

#codejam problem
We were discussing sums generated from number sequences. This also admits a dynamic programming approach, based on the range of sums that can be generated.
In general, if the range of sums that can be generated using the first k numbers in the result is from 0 to r, and the next number has value x, then either:
• x <= r+1 and it is possible to generate any sum from 0 to r+x using the first k+1 numbers, or
• x > r + 1 and it is not possible to generate the sum r + 1, since x and all the following numbers have greater values.
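The two cases above can be sketched as a short routine (a hypothetical helper, not from the original post) that scans a sorted sequence and reports the smallest sum that cannot be generated:

```python
def smallest_unreachable_sum(numbers):
    """Given positive integers in non-decreasing order, return the
    smallest sum no subset of them can generate."""
    r = 0  # every sum 0..r is generatable from the numbers seen so far
    for x in numbers:
        if x > r + 1:
            break  # r + 1 can never be formed: x and all later values are too big
        r += x     # now every sum 0..r+x is generatable
    return r + 1
```

For example, `smallest_unreachable_sum([1, 2, 4])` returns 8, since 1, 2, 4 cover every sum from 0 to 7.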

MaxSumImpossible(Numbers, i, j)
if len(Numbers) == 1
   return Numbers[0]
m[i, j] = 0
for k = 0 to j - i - 1
     if (jth number <= MaxSumImpossible(Numbers choose k, i, j - 1))
           q = MaxSumImpossible(Numbers choose k, i, j - 1) + MaxSumImpossible(Numbers with jth number only, j, j)
     if q > m[i, j]
           m[i, j] = q
return m[i, j]

Sunday, March 13, 2016

We continue reading the paper "Erasure coding in Windows Azure storage" by Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin Li, and Sergey Yekhanin.
Erasure coding is a data protection method in which data is broken into fragments, expanded and encoded with redundant pieces, and stored across a set of different locations or media. This paper discusses erasure coding in Windows Azure Storage (WAS). We were discussing the design decisions, such as why the Reed-Solomon code was not sufficient and why the paper introduced a new one. Today we continue to read about Local Reconstruction Codes.
The Reed-Solomon code explains the concept of a reconstruction cost. In this method, each parity is computed from all of the data fragments. Therefore a (k, n) code requires all k data fragments to compute a parity, and when any fragment goes down, the number of fragments required to reconstruct it is k. This is the reconstruction cost.
In LRC, this cost is reduced by reducing the number of fragments used to compute a parity: a subset of the data fragments is used to compute some of the parities. In a (6,3) code, LRC generates 4 parities instead of 3. The first two are global parities, computed from all the data fragments. For the other two, LRC divides the data fragments into two equal-size groups and computes one local parity for each group. Instead of reading a global parity and all of the remaining data fragments, it is now more efficient to read the local parity of the group and the other two data fragments in that group to reconstruct the missing fragment. The reconstruction of any single data fragment requires only three fragments, half the number required by the Reed-Solomon code, and this holds for any single data fragment. The additional parity block adds storage overhead, but this is an altogether different trade-off point from earlier.
To generalize the subgroups and the extra parity needed: a (k, l, r) LRC divides k data fragments into l groups, with k/l data fragments in each group. It computes one local parity for each group. In addition, it computes r global parities from all the data fragments. The reconstruction cost reduces from n = k to n = k/l, with a storage overhead of (k + l + r)/k.
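As a rough illustration of why local groups cut the reconstruction cost, here is a sketch using plain XOR parities. The real LRC uses Reed-Solomon-style coefficients over a Galois field; the XOR-only version below, with hypothetical helper names, demonstrates only the local-group path, where a lost fragment is rebuilt from k/l fragments instead of k.

```python
def local_parities(data, l):
    """Split the k data fragments into l equal groups and XOR each group."""
    size = len(data) // l
    parities = []
    for g in range(l):
        p = 0
        for frag in data[g * size:(g + 1) * size]:
            p ^= frag
        parities.append(p)
    return parities

def reconstruct(data, parities, lost, l):
    """Rebuild fragment `lost` from its group's local parity and the
    k/l - 1 surviving fragments in the same group."""
    size = len(data) // l
    g = lost // size
    p = parities[g]
    for i in range(g * size, (g + 1) * size):
        if i != lost:
            p ^= data[i]
    return p
```

With k = 6 fragments and l = 2 groups, reconstructing any single fragment touches only 3 fragments (one parity plus two data fragments) instead of all 6.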
#codingexercise
We discussed the codejam problem yesterday and mentioned a solution:
List<int> GetOriginalSet(List<int> a)
{
    // Combinations() and Sums() are assumed helpers that enumerate the
    // subsets of ret and their sums.
    var ret = new List<int>();
    for (int i = 0; a.Count > 0 && i < a[a.Count - 1]; i++)
    {
        if (ret.Combinations().Sums().Contains(i) == false && a.Contains(i))
        {
            ret.Add(i);
        }
    }
    return ret;
}


Saturday, March 12, 2016

Today we continue reading the paper "Erasure coding in Windows Azure storage" by Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin Li, and Sergey Yekhanin.
Erasure coding is a data protection method in which data is broken into fragments, expanded and encoded with redundant pieces, and stored across a set of different locations or media. This paper discusses erasure coding in Windows Azure Storage (WAS). We saw that there are huge cost savings with this method because it avoids keeping three full copies and uses less storage overhead. The trade-off for using erasure coding instead of keeping three full copies is performance. The performance hit comes when dealing with a lost or offline data fragment and with hot storage nodes. In erasure coding, data is broken into k data fragments and a set of parity fragments. In WAS, a data fragment may be lost due to a disk, node or rack failure. In addition, cloud services may frequently be upgraded, which may cause data fragments to become offline for seconds to a few minutes. An online read would then need to be served by dynamically reconstructing the data being asked for in order to return it to the client. This reconstruction needs to be as fast as possible, use as little network bandwidth and as few I/Os as possible, and meet the required SLAs. If the read is for a fragment on a specific storage node, that node can become hot, which leads to latency. One way to reduce this would be to age the fragments appropriately and tier the data between memory and external storage. Another way would be to load balance. But until such measures take effect, read performance can suffer. Instead, if the reconstruction could be done in parallel while the fragment was being fetched from the storage node, then whichever returns earlier can be served to the client.
The problem is that reconstruction is only as fast as the slowest storage node to respond, and storage costs must be kept low as well. We could use Reed-Solomon codes that keep 12 data fragments and 4 code fragments, but reconstruction would then require all 12 data fragments, incurring higher network costs and I/Os as well as a greater chance of hitting a hot node.
Therefore a new family of codes was required, with the following design requirements: first, reduce the minimal number of fragments that need to be read to reconstruct a data fragment; second, provide a significant reduction in storage overhead while maintaining higher durability than a system that keeps 3 copies of the data.
The authors call this the Local Reconstruction Codes.
#code jam problem
Given a set of integers, the power set is the set of all its subsets, including the empty set and the set itself. To generate the power set of the elements we can use the combine method, combining different numbers of elements; we just have to include the empty set and the set itself. Now we are given, in sorted order, the sums of the individual subsets of a power set, together with the frequency of each sum. Can we determine the original elements?
For example, a power set with sums 0,1,2,3,4,5,6,7 and frequencies 1,1,1,1,1,1,1,1 results in the set 1, 2, 4.
Solution:
The result set is also non-decreasing, like the power set sums. It may have a few zeroes at the beginning, both from the null set and from zero-valued elements, and their number can be found from the frequency of the first sum. Then, for each given sum, we can expand it into element occurrences determined from the frequency. With a running list of the subset sums computed so far, each candidate element is confirmed by checking that the sums it generates tally with the given sums and frequencies, and the final list of elements is built up incrementally.
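The idea above can be made concrete with a sketch (a hypothetical helper, assuming all elements are positive and the full multiset of subset sums is given): the smallest unexplained positive sum must be the next element, and each confirmed element pairs with every previously known subset sum.

```python
from collections import Counter

def recover_set(sums):
    """Recover the original elements from the multiset of all 2^n subset
    sums. Assumes all elements are positive."""
    remaining = Counter(sums)
    remaining[0] -= 1           # account for the empty subset
    elements = []
    known = [0]                 # subset sums of the elements recovered so far
    while len(known) < len(sums):
        # the smallest sum not yet explained must be the next element
        x = min(s for s in remaining if remaining[s] > 0)
        elements.append(x)
        # every known subset sum combines with x into a newly explained sum
        new_sums = []
        for s in known:
            remaining[s + x] -= 1
            new_sums.append(s + x)
        known += new_sums
    return elements
```

On the example from the post, `recover_set([0, 1, 2, 3, 4, 5, 6, 7])` yields `[1, 2, 4]`.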

Friday, March 11, 2016

Today we start reading the paper "Erasure coding in Windows Azure storage" by Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin Li, and Sergey Yekhanin.
Erasure coding is a data protection method in which data is broken into fragments, expanded and encoded with redundant pieces, and stored across a set of different locations or media. This paper discusses erasure coding in Windows Azure Storage (WAS), where customers can store unlimited data for any amount of time and only pay for what they use. Erasure codes help with the durability of the data while keeping costs low. The paper introduces a new set of codes for erasure coding called Local Reconstruction Codes (LRC). LRC reduces the number of fragments needed to reconstruct data fragments that are offline, and it also reduces the storage overhead involved. LRC reduces the bandwidth and I/O required for repair reads over prior codes; this, together with the low overhead, is the main benefit of LRC. LRC also demonstrates consistently low read latencies.
WAS uses LRC. WAS stores all of its data in an append-only distributed file system called the stream layer. Data is appended to the end of active extents, which are replicated three times by the underlying stream layer. These three copies remain until the extent gets filled. Once extents are full, they are sealed and never modified again. A background process lazily generates erasure codes for the sealed extents and deletes the three copies, which are no longer needed.
Erasure codes can reduce the cost of storage by 50%, and given that the size of cloud storage is on the order of exabytes, this is a significant saving.
#code jam question
Given two ominoes, find the minimum number of rearrangements needed to transform one omino into the other.
Ignore rotations and translations of both ominoes. Ominoes are groups of unit squares joined together on one or more edges.

Solution: assume the two ominoes are properly oriented and centered for comparison.
We proceed row-wise from bottom to top. If the number of unit squares in the bottom row falls short of being equal, more are borrowed from the row above, or returned otherwise. When the numbers are equal, the sequences have to be the same.

Thursday, March 10, 2016

Today we continue reading the paper "Horton: Online Query Execution Engine For Large Distributed Graphs" by Sarwat, Elnikety, He and Kliot.
This paper talks about large graphs that don't fit on a single server. The paper introduces Horton a graph query execution engine. We were reviewing its architecture.
It has four components: the graph client library, the graph coordinator, graph partitions, and the graph manager. We saw that the implementation was in C# using Codebook graphs. Codebook is a social network application that represents software engineers, software components, and their interactions in large software projects. It manages information about source code with its revisions, documentation, and the organizational structure of developers. It can answer queries such as "Who wrote this function?", "What libraries depend on this library?", and "Whom should I contact to fix this bug?"
We review the query processing phases. For example,
"Who wrote the specification for the MethodSquare code ?" translates to query such as
C:\Horton -query "(Code MethodSquare) MentionedBy WordDocument AuthoredBy Person"
and "Who is the manager of the person who closed or created work item bug #673 ?" translates to
C:\Horton -query "Person Manages Person (Closed | Created) (WorkItemRevision #673)"
The finite state machine for the latter query makes the following transitions:
Node_Type = Person
Edge_Type = Manages
Node_Type = Person
Edge_Type = Closed | Edge_Type = Created
Node_Type = WorkItemRevision, Node_ID = #673
The above finite state machine has a disjunction for the closed and created cases: after the second Person state it branches on either a Closed edge or a Created edge, and both branches end at the WorkItemRevision node with Node_ID #673.
If the query were to be optimized, a '-query' flag would be added.
The system reports the execution time of each query. The query result is in the form of graph paths ( a sequence of graph nodes).
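To make the execution model concrete, here is a hypothetical single-machine sketch of running such a path query as a state machine over an adjacency-list graph. None of these names come from Horton; the real engine distributes the traversal across partitions and streams results asynchronously.

```python
def run_query(graph, node_preds, edge_types):
    """Execute a path query: node_preds[0] -edge_types[0]-> node_preds[1] ...
    graph maps a node to a list of (edge_type, neighbor) pairs.
    Each edge_types[i] is a set, so a disjunction like (Closed | Created)
    becomes {"Closed", "Created"}. Returns matching paths as node lists."""
    paths = [[n] for n in graph if node_preds[0](n)]
    for etypes, pred in zip(edge_types, node_preds[1:]):
        paths = [p + [nbr]
                 for p in paths
                 for et, nbr in graph.get(p[-1], [])
                 if et in etypes and pred(nbr)]
    return paths
```

On a toy Codebook-like graph, the query "Person Manages Person (Closed | Created) (WorkItemRevision #673)" would return the path of nodes from the manager to the work item, mirroring the graph-path results the paper describes.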
#codejam
Given two orientations of the same omino as in the earlier posts, determine whether the clockwise or the anticlockwise rotation is the shorter transition from one to the other.
// Count quarter-turn rotations in each direction until matrix1 matches matrix2.
int countl = 0;
for (int i = 0; i < 360; i += 90)
{
    var temp = rotate_matrix(matrix1, i);
    if (compare(matrix2, temp) == 0) break;
    countl += 1;
}
int countr = 0;
for (int i = 0; i > -360; i -= 90)
{
    var temp = rotate_matrix(matrix1, i);
    if (compare(matrix2, temp) == 0) break;
    countr += 1;
}
if (countl < countr) return -1; // clockwise is shorter
if (countl > countr) return +1; // anticlockwise is shorter
return 0;                       // both directions take the same number of turns



Wednesday, March 9, 2016

Today we continue reading the paper "Horton: Online Query Execution Engine For Large Distributed Graphs" by Sarwat, Elnikety, He and Kliot.
This paper talks about large graphs that don't fit on a single server. The paper introduces Horton a graph query execution engine. We were reviewing its architecture.
It has four components: the graph client library, the graph coordinator, graph partitions, and the graph manager. We discussed the first three components. We now look at the graph manager. It provides an administrative interface to manage the graph, such as loading a partitioned graph or adding and removing servers. It also assigns partitions to servers by taking the output of a partitioning algorithm and placing nodes in the right partitions. Partitioning algorithms are external because they are not online. Any algorithm can be used, such as simple hashing based on node and edge attributes. If the partitions don't suffice in keeping graph access local, the partitioning algorithm can be changed.
Horton is written in C# using the .NET Framework, and the asynchronous communication is implemented using sockets and the Task Parallel Library. Horton is built on top of Orleans, a distributed runtime for cloud applications.
Some sample queries on the graph for Codebook using Horton include the following:
Who wrote this function ?
What libraries depend on this library ?
Whom should I contact to fix this bug ?
Ad hoc queries can also be executed, showing the flexibility of the query language and the effectiveness of the execution engine.
#codejam problem
We discussed compareTo between ominoes in the previous post, which tries different orientations and then compares. Translations and different positions in the matrix can also make the ominoes look different. Implement compare for ominoes with the same orientation:

int compare(int[,] matrix1, int[,] matrix2, int rows, int columns)
{
    // Bounding-box corners of the first omino.
    int k = find_top_left_corner(matrix1);
    int tr1 = k / columns;
    int tc1 = k % columns;
    int h = find_bottom_right_corner(matrix1);
    int br1 = h / columns;
    int bc1 = h % columns;

    // Bounding-box corners of the second omino.
    k = find_top_left_corner(matrix2);
    int tr2 = k / columns;
    int tc2 = k % columns;
    h = find_bottom_right_corner(matrix2);
    int br2 = h / columns;
    int bc2 = h % columns;

    // Compare cell by cell, relative to each bounding box's top-left corner.
    for (int i = 0; i <= br1 - tr1; i++)
        for (int j = 0; j <= bc1 - tc1; j++)
            if (matrix1[tr1 + i, tc1 + j] != matrix2[tr2 + i, tc2 + j])
                return matrix1[tr1 + i, tc1 + j] - matrix2[tr2 + i, tc2 + j];

    return 0;
}

In comparing the sequences of 1s and 0s for the portion containing the arrangement in the two matrices, we pick the diagonal corners of the bounding rectangle. To determine the top-left corner we pick the minimum pair (r, c) over all cells having value 1 (the maximum pair gives the bottom-right corner):

int r = k;
int c = k;
for (int i = 0; i < k; i++)
    for (int j = 0; j < k; j++)
        if (matrix1[i, j] == 1) {
            if (i < r) r = i;
            if (j < c) c = j;
        }

Tuesday, March 8, 2016

Today we continue reading the paper "Horton: Online Query Execution Engine For Large Distributed Graphs" by Sarwat, Elnikety, He and Kliot.
This paper talks about large graphs that don't fit on a single server. The paper introduces Horton a graph query execution engine. We were reviewing its architecture.
It has four components: the graph client library, the graph coordinator, graph partitions, and the graph manager. The graph client library sends queries to the graph coordinator, which prepares an execution plan for the query. Each graph partition manages a set of graph nodes and edges; Horton is able to scale out mainly because of graph partitions. The graph manager provides an administrative interface to manage the graph, with chores like loading a graph and adding and removing servers.
We review these in detail.
The graph coordinator receives the query from the client library, validates it, parses it, optimizes the query plan, and translates it into a finite state machine which is sent to the partitions for parallel execution. Validation involves checks against the node and edge predicates. Optimization takes the query in parsed regular-expression form and uses a dynamic programming framework to enumerate the different execution plans and choose the one with the lowest cost. One example of optimization is ordering the predicates: a good choice of predicate order can save costs that are orders of magnitude apart. If n is the number of predicates in the query, this method finds a suitable order in O(n^3). After the query is optimized, the plan is translated into a finite state machine. The state machine not only expresses the execution efficiently but also packages it in a way that can be easily sent to local graph partitions for execution by their local engines. The communication to send the state machine and collect results is asynchronous, which allows results to be streamed directly from a graph partition machine to the client without involving the graph coordinator.
#codingquestion
In the previous examples we used NChooseK as a recursive function to find K from N. If we were to partition the data, we could rewrite it as:

int NChooseKPartitioned(int N, int K, ref List<int> array, int start, int count)
{
    int m = N / 2;
    int n = K / 2;
    return NChooseK(m, n, ref array, start, count) +
           NChooseK(N - m, K - n, ref array, start, count);
}
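Note that splitting both N and K in half covers only one of the ways a K-subset can straddle the two halves; by the Vandermonde identity, a K-subset takes j items from the first half and K - j from the second, for every valid j. A sketch of the full partitioned enumeration (the helper name is my own, not from the post):

```python
from itertools import combinations

def n_choose_k_partitioned(items, K):
    """Enumerate all K-subsets of items by partitioning the input in half:
    choose j elements from the first half and K - j from the second,
    for every feasible j."""
    m = len(items) // 2
    first, second = items[:m], items[m:]
    result = []
    for j in range(max(0, K - len(second)), min(K, len(first)) + 1):
        for a in combinations(first, j):
            for b in combinations(second, K - j):
                result.append(a + b)
    return result
```

For 6 items and K = 3 this produces all C(6,3) = 20 subsets, the same set a direct enumeration gives.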

To compare whether two ominoes are the same after a change of orientation or position:
int compareTo(int[,] matrix1, int[,] matrix2)
{
    for (int i = 0; i < 360; i += 90)
    {
        var rotated = rotate(matrix1, i);
        if (compare(matrix2, rotated) == 0)
            return 0;
    }
    // No rotation matches; order by the number of occupied unit squares.
    return CountOfOnes(matrix1) - CountOfOnes(matrix2);
}