Monday, September 19, 2016

#algorithm discussion continued
Today we continue reading about flow networks. We were looking at maximum flows.
Maximum flows can also be calculated using "push-relabel" method.
We reviewed the relabel-to-front algorithm next. This algorithm improves the running time of the maximum flow problem by choosing the order of basic operations carefully and managing the network data structure efficiently. We saw what admissible edges are. If a vertex u is overflowing and (u,v) is an admissible edge, then push(u,v) applies; it does not create any new admissible edge, but it may cause the edge (u,v) to become inadmissible. The notion of admissible edges also applies to the relabel operation.
The discharge operation proceeds like this:
While u is overflowing, examine its current neighbor v. If v is nil, the end of the neighbor list has been reached, so apply the relabel operation to u and reset current to the head of the neighbor list. If v is not nil, the residual capacity from u to v is greater than zero, and the height of u is one greater than that of v, apply the push operation on (u,v). If neither relabel nor push applies, advance current to the next neighbor.
The relabel-to-front algorithm proceeds on a flow network G with source s and sink t this way. We initialize the preflow in the network. We build a list L of all vertices in G excluding the source and sink, in arbitrary order, and set the current pointer of each vertex to the head of its neighbor list. We start with u as the head of the list L. While u is not nil, we perform the following operations: we remember the old height of u, then we apply the discharge method to u; if the height of u is now greater than the old height, we move u to the front of the list L. We then choose the next u.
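The loop above can be sketched in runnable Python. This is a compact illustration under stated assumptions, not the textbook pseudocode: capacities are a dict of dicts, and each vertex's neighbor list is simply every other vertex.

```python
def relabel_to_front(capacity, source, sink, vertices):
    """Maximum flow by relabel-to-front; capacity is a dict of dicts."""
    def cap(u, v):
        return capacity.get(u, {}).get(v, 0)

    n = len(vertices)
    flow = {u: {v: 0 for v in vertices} for u in vertices}
    height = {v: 0 for v in vertices}
    excess = {v: 0 for v in vertices}
    height[source] = n

    def residual(u, v):
        return cap(u, v) - flow[u][v]

    def push(u, v):
        delta = min(excess[u], residual(u, v))
        flow[u][v] += delta
        flow[v][u] -= delta
        excess[u] -= delta
        excess[v] += delta

    def relabel(u):
        # one more than the lowest neighbor reachable by a residual edge
        height[u] = 1 + min(height[v] for v in vertices if residual(u, v) > 0)

    # initialize the preflow: saturate every edge leaving the source
    for v in vertices:
        if cap(source, v) > 0:
            flow[source][v] = cap(source, v)
            flow[v][source] = -cap(source, v)
            excess[v] = cap(source, v)
            excess[source] -= cap(source, v)

    def discharge(u):
        neighbors = [v for v in vertices if v != u]
        i = 0  # the "current" pointer into u's neighbor list
        while excess[u] > 0:
            if i == len(neighbors):
                relabel(u)  # the whole list was scanned: nothing admissible
                i = 0
            else:
                v = neighbors[i]
                if residual(u, v) > 0 and height[u] == height[v] + 1:
                    push(u, v)
                else:
                    i += 1

    L = [v for v in vertices if v not in (source, sink)]
    i = 0
    while i < len(L):
        u = L[i]
        old_height = height[u]
        discharge(u)
        if height[u] > old_height:
            L.insert(0, L.pop(i))  # move u to the front and rescan
            i = 0
        i += 1

    return sum(flow[source][v] for v in vertices)
```

On a small network the returned value matches what any augmenting-path method would compute.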

#codingexercise
Given a binary tree, define the “complete path sum” as the sum of the values of the nodes on a path from the root to a leaf. Given a value ‘k’, find the k-light paths and prune the binary tree, i.e. delete the nodes for which the complete path sum is greater than k.

Node PruneKLight(Node root, int k, ref int sum)
{
    if (root == null) return null;
    sum += root.data;
    root.left = PruneKLight(root.left, k, ref sum);
    root.right = PruneKLight(root.right, k, ref sum);
    if (root.left == null && root.right == null && sum > k)
    {
        // this leaf lies only on paths heavier than k, so prune it
        sum -= root.data;
        return null;
    }
    sum -= root.data;
    return root;
}
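The same pruning can be sketched in Python for quick experimentation (the Node class and field names are illustrative):

```python
class Node:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right

def prune_k_light(root, k, path_sum=0):
    """Delete every leaf (cascading upward) whose root-to-leaf sum exceeds k."""
    if root is None:
        return None
    path_sum += root.data
    root.left = prune_k_light(root.left, k, path_sum)
    root.right = prune_k_light(root.right, k, path_sum)
    if root.left is None and root.right is None and path_sum > k:
        return None  # leaf on a path heavier than k: prune it
    return root
```

A node survives as long as it still has a child on some path whose sum stays within k.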

Given a BST, transform it into a greater sum tree where each node contains the sum of all nodes greater than that node.
void ToGreater(Node root, ref int sum)
{
    if (root == null) return;
    // reverse in-order traversal
    ToGreater(root.right, ref sum);
    sum += root.data;
    root.data = sum - root.data;
    ToGreater(root.left, ref sum);
}

Given a BST, transform it into a lesser sum tree where each node contains the sum of all nodes lesser than that node.
void ToLesser(Node root, ref int sum)
{
    if (root == null) return;
    ToLesser(root.left, ref sum);
    var val = root.data;
    root.data = sum;
    sum += val;
    ToLesser(root.right, ref sum);
}
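Both transformations can be sketched together in Python; the accumulator passed by ref above becomes a returned running sum here (the Node class is illustrative):

```python
class Node:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right

def to_greater(root, acc=0):
    """Reverse in-order: each node becomes the sum of strictly greater values."""
    if root is None:
        return acc
    acc = to_greater(root.right, acc)
    original = root.data
    root.data = acc            # everything visited so far is greater
    acc += original
    return to_greater(root.left, acc)

def to_lesser(root, acc=0):
    """In-order: each node becomes the sum of strictly smaller values."""
    if root is None:
        return acc
    acc = to_lesser(root.left, acc)
    original = root.data
    root.data = acc            # everything visited so far is smaller
    acc += original
    return to_lesser(root.right, acc)
```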

Sunday, September 18, 2016

#algorithm discussion continued
Today we continue reading about flow networks. We were looking at maximum flows.
Maximum flows can also be calculated using "push-relabel" method.
We reviewed the relabel-to-front algorithm next. This algorithm improves the running time of the maximum flow problem by choosing the order of basic operations carefully and managing the network data structure efficiently. We saw what admissible edges are. If a vertex u is overflowing and (u,v) is an admissible edge, then push(u,v) applies; it does not create any new admissible edge, but it may cause the edge (u,v) to become inadmissible. The notion of admissible edges also applies to the relabel operation.
Now we see neighbor lists. Edges in this algorithm are organized into "neighbor lists". Given a flow network G = (V,E), the neighbor list of a vertex u is simply a linked list of the vertices v for which there may be a residual edge (u,v). The relabel-to-front algorithm cycles through each neighbor list in an arbitrary order that is fixed throughout the execution of the algorithm.
The neighboring vertices are the ones to which the excess flow at u is discharged through admissible edges. The discharge operation includes relabeling u to cause edges leaving u to become admissible. As we know, a relabel operation is performed when no push is possible, and the iteration happens over each vertex in the neighbor list. If a relabel happens, the current pointer is reset to the head of the neighbor list. If a push happens, the current pointer is not advanced, because the push may be repeated. If neither push nor relabel applies, the current pointer is advanced to the next neighbor.
In fact, one of the properties of the discharge operation is that when it calls push, the push applies to (u,v), and when it calls relabel, the relabel applies to u. The relabel applies only when we can show that all the edges leaving u are inadmissible. A single call may not traverse the list completely, since the current pointer is not always advanced.
When the list is completely traversed, the procedure relabels u and a new pass begins. To pass over a neighbor v without pushing or relabeling, the edge (u,v) must have been found inadmissible. Push operations performed during the pass do not create admissible edges leaving u, and u itself is not relabeled during the pass, so no edge leaving u can become admissible in the meantime; moreover, relabeling some other vertex leaves that vertex with no entering admissible edges. Therefore, at the end of the pass, all edges leaving u are inadmissible, and the relabel of u correctly applies.

#codingexercise
Given a BST, find the maximum N elements in the tree.
List<Node> GetMaxN(Node root, int N)
{
    var inorder = ToInOrder(root);
    assert(inorder.Count >= N);
    return inorder.GetRange(inorder.Count - N, N).ToList();
}

Another way to do this would be to augment the tree data structure with the rank of each node. The rank is the number of elements in the right subtree of the current node, i.e. the number of nodes with values greater than the current node's. If the rank is N-1, then the current node together with its right subtree supplies exactly the N nodes required.
void GetMaxN(Node root, int N, ref List<Node> nodes)
{
    if (root == null || N == 0) return;
    if (root.rank >= N) {
        // all N maxima lie in the right subtree
        GetMaxN(root.right, N, ref nodes);
    } else {
        // take the right subtree, the current node, and the rest from the left
        GetMaxN(root.right, root.rank, ref nodes);
        nodes.Add(root);
        GetMaxN(root.left, N - root.rank - 1, ref nodes);
    }
}

The same holds for finding the minimum N elements in the tree.
List<Node> GetMinN(Node root, int N)
{
    var inorder = ToInOrder(root);
    assert(inorder.Count >= N);
    return inorder.GetRange(0, N).ToList();
}
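Both selections can be sketched in Python via an in-order traversal (the Node class is illustrative):

```python
class Node:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right

def in_order(root, out):
    """Append the BST values to out in ascending order."""
    if root is None:
        return
    in_order(root.left, out)
    out.append(root.data)
    in_order(root.right, out)

def get_max_n(root, n):
    values = []
    in_order(root, values)
    assert len(values) >= n
    return values[-n:]      # the n largest, in ascending order

def get_min_n(root, n):
    values = []
    in_order(root, values)
    assert len(values) >= n
    return values[:n]       # the n smallest, in ascending order
```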

Given a binary tree, define the “complete path sum” as the sum of the values of the nodes on a path from the root to a leaf. Given a value ‘k’, find the k-heavy paths and prune the binary tree, i.e. delete the nodes for which the complete path sum is smaller than k.

Node PruneKHeavy(Node root, int k, ref int sum)
{
    if (root == null) return null;
    sum += root.data;
    root.left = PruneKHeavy(root.left, k, ref sum);
    root.right = PruneKHeavy(root.right, k, ref sum);
    if (root.left == null && root.right == null && sum < k)
    {
        // this leaf lies only on paths lighter than k, so prune it
        sum -= root.data;
        return null;
    }
    sum -= root.data;
    return root;
}

Saturday, September 17, 2016

#algorithm discussion continued
Today we continue reading about flow networks. We were looking at maximum flows.
Maximum flows can also be calculated using "push-relabel" method.
We reviewed the relabel-to-front algorithm next.
This algorithm improves the running time of the maximum flow problem by choosing the order of basic operations carefully and managing the network data structure efficiently.
The relabel to front algorithm maintains a list of the vertices in the network. It starts from the front and scans the list repeatedly selecting an overflowing vertex u and then removing the excess at that vertex. The push and relabel operations happen as usual in the push-relabel-algorithm until u has no positive excess. On relabeling a vertex, it is moved to the front and the algorithm begins its scan again.

This algorithm works on admissible edges, which are edges in the residual network through which flow can be pushed. In such a case, cf(u,v) > 0 and h(u) = h(v) + 1; otherwise (u,v) is inadmissible.

Another property can be stated this way: if a vertex u is overflowing and (u,v) is an admissible edge, then push(u,v) applies; it does not create any new admissible edge, but it may cause the edge (u,v) to become inadmissible.

The notion of admissible edges also applies to the relabel operation. If a vertex u is overflowing and there are no admissible edges leaving u, then relabel(u) applies. After this operation, there is at least one admissible edge leaving u, but there are no admissible edges entering u. Recall that an admissible edge is one through which we can push flow because it is not yet saturated. The relabel operation applies to an overflowing vertex whose outgoing residual edges all point upwards or sideways; by definition there must be at least one residual edge leaving u, and the relabel increases the height of u. Relabel gives u the greatest height allowed by the constraints on the height function: one more than the minimum height among its residual neighbors, so that at least one outbound edge becomes admissible. Initially there were no admissible edges leaving u, and consequently no push was possible, which is why the relabel occurred. After the relabel there are no admissible edges entering u, because such an edge (v,u) would require v to be more than one level above the old height of u, and we know there are no residual edges between vertices whose heights differ by more than one. Also, relabel does not change the residual network. Therefore the property holds: there is at least one admissible edge leaving u, and there are no admissible edges entering u.

#codingexercise

Given two rectangles, find whether they overlap.
static bool intersects(Rectangle r1, Rectangle r2)
{
bool isdisjoint = false;
// are  they disjoint along x axis ?
if (r1.tl.x < r2.tl.x)
   isdisjoint = r1.br.x < r2.tl.x;
else
  isdisjoint = r2.br.x < r1.tl.x;
if (isdisjoint == true) return false;

// are they disjoint along y-axis ?
if (r1.br.y < r2.br.y)
   isdisjoint = r1.tl.y < r2.br.y;
else
  isdisjoint = r2.tl.y < r1.br.y;
return isdisjoint == false;
}
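A Python sketch of the same separating-axis check; the Point/Rect tuples and the y-grows-upward convention are assumptions matching the code above:

```python
from collections import namedtuple

Point = namedtuple('Point', 'x y')
Rect = namedtuple('Rect', 'tl br')   # top-left and bottom-right corners

def intersects(r1, r2):
    """y grows upward, so tl.y > br.y for a non-degenerate rectangle."""
    disjoint_x = r1.br.x < r2.tl.x or r2.br.x < r1.tl.x
    disjoint_y = r1.tl.y < r2.br.y or r2.tl.y < r1.br.y
    return not (disjoint_x or disjoint_y)
```

Two rectangles overlap exactly when they are disjoint along neither axis.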

 

Friday, September 16, 2016

#algorithm discussion continued
Today we continue reading about flow networks. We were looking at maximum flows.
Maximum flows can also be calculated using "push-relabel" method.
We reviewed the relabel-to-front algorithm next.
This algorithm improves the running time of the maximum flow problem by choosing the order of basic operations carefully and managing the network data structure efficiently.
The relabel to front algorithm maintains a list of the vertices in the network. It starts from the front and scans the list repeatedly selecting an overflowing vertex u and then removing the excess at that vertex. The push and relabel operations happen as usual in the push-relabel-algorithm until u has no positive excess. On relabeling a vertex, it is moved to the front and the algorithm begins its scan again.
This algorithm works on admissible edges, which are edges in the residual network through which flow can be pushed. In such a case, cf(u,v) > 0 and h(u) = h(v) + 1; otherwise (u,v) is inadmissible. The admissible network consists of those edges through which we can push flow. It turns out that such a network is a directed acyclic graph: if it contained a cycle, then adding up h(u) = h(v) + 1 along the cycle's edges would give a total height equal to itself plus the cycle length, which is a contradiction.

Another property can be stated this way: if a vertex u is overflowing and (u,v) is an admissible edge, then push(u,v) applies; it does not create any new admissible edge, but it may cause the edge (u,v) to become inadmissible. This follows from the fact that when this operation is applied, the only new residual edge can be the reverse edge (v,u), but v is one level lower than u, so the reverse edge is inadmissible. If the operation is a saturating push, the residual capacity from u to v becomes zero and the forward edge becomes inadmissible.


#codingquestion
Count the number of possible decodings of a given digit sequence, where leading, trailing, and consecutive zeros are excluded.
For example, 121 => ABA, AU, LA
where A = 1, B = 2, ..., Z = 26.

int countDecoding(List<int> digits, int n)
{
    if (n == 0 || n == 1)
        return 1;
    int count = 0;
    if (digits[n-1] > 0)
        count = countDecoding(digits, n-1);
    if (digits[n-2] == 1 || (digits[n-2] == 2 && digits[n-1] < 7))
        count += countDecoding(digits, n-2);
    return count;
}
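A Python sketch of the same count (the digit-list convention follows the code above):

```python
def count_decodings(digits):
    """Number of ways to decode a digit list with A=1 ... Z=26."""
    def helper(n):
        if n <= 1:
            return 1
        count = 0
        if digits[n-1] > 0:                      # a single digit 1..9
            count += helper(n-1)
        if digits[n-2] == 1 or (digits[n-2] == 2 and digits[n-1] < 7):
            count += helper(n-2)                 # a two-digit code 10..26
        return count
    return helper(len(digits))
```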

void decode(List<int> digits, int n, ref StringBuilder s, ref List<string> results)
{
    if (n == 0) { results.Add(s.reverse()); return; }
    if (digits[n-1] > 0) {
        s.Append(ToLetter(digits, n-1, 1));
        decode(digits, n-1, ref s, ref results);
        s.RemoveLast();
    }
    if (n > 1 && (digits[n-2] == 1 || (digits[n-2] == 2 && digits[n-1] < 7))) {
        s.Append(ToLetter(digits, n-2, 2));
        decode(digits, n-2, ref s, ref results);
        s.RemoveLast();
    }
}
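The enumeration can be sketched in Python; building each string from the back mirrors the reverse-and-append approach above (helper names are illustrative):

```python
def decode_all(digits):
    """Enumerate all decodings of a digit list, e.g. [1,2,1] -> ABA, AU, LA."""
    def to_letter(value):
        return chr(ord('A') + value - 1)

    results = []

    def helper(n, suffix):
        if n == 0:
            results.append(suffix)
            return
        if digits[n-1] > 0:                      # consume one digit
            helper(n-1, to_letter(digits[n-1]) + suffix)
        if n > 1 and (digits[n-2] == 1 or (digits[n-2] == 2 and digits[n-1] < 7)):
            helper(n-2, to_letter(digits[n-2] * 10 + digits[n-1]) + suffix)

    helper(len(digits), "")
    return results
```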

Yesterday we discussed the stars and bars theorems. If we apply this to distribute, say, seven indistinguishable one-dollar coins among three people, the distributions are essentially equivalent to tuples of three positive integers whose sum is 7. Therefore there are n-1 choose k-1 ways, or 6 choose 2 = 6!/(2!4!) = 15 ways.

Reverse every k nodes of a linked list.
node reverseK(node head, int k)
{
    int n = 0;
    for (node c = head; c != null; c = c.next) n++;
    node cur = head;
    node prev = null;
    node root = null;
    for (int i = 0; i <= n / k && cur != null; i++)
    {
        // find the last node of this group and detach the rest
        node next = cur;
        for (int j = 0; next != null && j < k - 1; next = next.next, j++);
        node temp = null;
        if (next != null) {
            temp = next.next;
            next.next = null;
        }
        Reverse(ref cur);              // cur is now the head of the reversed group
        node groupHead = cur;
        if (root == null) root = groupHead;
        while (cur.next != null) cur = cur.next;   // walk to the tail of the group
        if (prev == null) {
            prev = cur;
        } else {
            prev.next = groupHead;
            prev = cur;
        }
        cur = temp;
    }
    return root;
}
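A Python sketch of the same group-reversal idea; it collects each group into a list before relinking, which is an implementation choice rather than the in-place pointer surgery above:

```python
class ListNode:
    def __init__(self, data, next=None):
        self.data = data
        self.next = next

def reverse_k(head, k):
    """Reverse every group of k nodes; a final short group is reversed too."""
    prev_tail = None
    new_head = None
    cur = head
    while cur is not None:
        # collect up to k nodes of this group
        group = []
        while cur is not None and len(group) < k:
            group.append(cur)
            cur = cur.next
        group_head, group_tail = group[-1], group[0]
        # reverse the links within the group
        for i in range(len(group) - 1, 0, -1):
            group[i].next = group[i-1]
        group_tail.next = None
        if new_head is None:
            new_head = group_head
        else:
            prev_tail.next = group_head
        prev_tail = group_tail
    return new_head

def to_list(head):
    out = []
    while head is not None:
        out.append(head.data)
        head = head.next
    return out
```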

 

Thursday, September 15, 2016

#algorithm discussion continued
Today we continue reading about flow networks. We were looking at maximum flows.
Maximum flows can also be calculated using "push-relabel" method.
This method performs two basic operations: pushing flow excess from a vertex to one of its neighbors and relabeling a vertex.
We know that the push relabel method indeed terminates because there is a bound on the number of operations it performs. We bound separately each of the three types of operations - relabels, saturating pushes, and non-saturating pushes. With these bounds, the algorithm runs in O(V^2E) time
We now look at the relabel-to-front algorithm.
This algorithm improves the running time of the maximum flow problem by choosing the order of basic operations carefully and managing the network data structure efficiently.
The relabel to front algorithm maintains a list of the vertices in the network. It starts from the front and scans the list repeatedly selecting an overflowing vertex u and then removing the excess at that vertex. The push and relabel operations happen as usual in the push-relabel-algorithm until u has no positive excess. On relabeling a vertex, it is moved to the front and the algorithm begins its scan again.
#codingexercise
Consider a row of n coins of values v1 . . . vn, where n is even. We play a game against an opponent by alternating turns. In each turn, a player selects either the first or last coin from the row, removes it from the row permanently, and receives the value of the coin. Determine the maximum possible amount of money we can definitely win if we move first.
If we take the example of
8, 15, 3, 7, we know that the maximum we can collect is 15 + 7 = 22.
But the strategy of the players cannot be too greedy. For example, if player one chooses 8, the opponent chooses 15; player one then chooses 7 and the opponent chooses 3, and player one has failed to collect the maximum. Instead, each player must use a strategy that minimizes the profit of the other.
Given coins laid out from i to j, with both sides playing optimally, the first player can collect
F(i, j) = max(Vi + min(F(i+2, j), F(i+1, j-1)),
              Vj + min(F(i+1, j-1), F(i, j-2)))
The inner min accounts for the opponent, who responds so as to leave us the smaller remaining amount.
int GetCoins(List<int> coins, int i, int j)
{
    if (i > j) return 0;
    if (i == j) return coins[i];
    if (j == i + 1) return max(coins[i], coins[j]);
    return max(coins[i] + min(GetCoins(coins, i+2, j), GetCoins(coins, i+1, j-1)),
               coins[j] + min(GetCoins(coins, i+1, j-1), GetCoins(coins, i, j-2)));
}
int GetBest(List<int> coins)
{
    int n = coins.Count;
    var table = new int[n, n];
    for (int k = 0; k < n; k++)
    {
        for (int i = 0, j = k; j < n; i++, j++)
        {
            int x = (i + 2 <= j) ? table[i+2, j] : 0;
            int y = (i + 1 <= j - 1) ? table[i+1, j-1] : 0;
            int z = (i <= j - 2) ? table[i, j-2] : 0;
            table[i, j] = max(coins[i] + min(x, y), coins[j] + min(y, z));
        }
    }
    return table[0, n-1];
}
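The bottom-up table can be sketched in Python, filling diagonals of increasing length:

```python
def best_coin_value(coins):
    """Maximum amount the first player is guaranteed with optimal play."""
    n = len(coins)
    table = [[0] * n for _ in range(n)]
    for length in range(n):
        for i in range(n - length):
            j = i + length
            # sub-results after the opponent's reply
            x = table[i+2][j] if i + 2 <= j else 0
            y = table[i+1][j-1] if i + 1 <= j - 1 else 0
            z = table[i][j-2] if i <= j - 2 else 0
            table[i][j] = max(coins[i] + min(x, y), coins[j] + min(y, z))
    return table[0][n-1]
```

On the example row 8, 15, 3, 7 this yields the 22 noted above.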
#puzzle
How many ways can n objects be put in k bins?
The answer is the binomial coefficient n-1 choose k-1.
We said we could view this as arranging stars and bars, where the bars are analogous to bin boundaries and every bin must contain at least one object. With the n stars lined up, there are n-1 gaps between them, and we place the k-1 bars into distinct gaps, which is equivalent to choosing k-1 from n-1.
This is also the number of k-tuples of positive integers that sum to n.
If the bins are allowed to be empty, the number changes. Now there can be more than one bar between two stars, or an empty bin at either end; we simply arrange the n stars and k-1 bars in a row. This is the same as choosing n positions for the stars from n+k-1 positions.


This is also the same as choosing k-1 positions for the bars from n+k-1 positions.
This is also equal to the number of multisets of cardinality k-1 taken from a set of size n+1.
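These counts are easy to sanity-check by brute force in Python for small n and k:

```python
from itertools import product
from math import comb

def count_sums(n, k, minimum):
    """Brute-force count of k-tuples of integers >= minimum that sum to n."""
    return sum(1 for t in product(range(minimum, n + 1), repeat=k)
               if sum(t) == n)

# bins that must each hold at least one object: C(n-1, k-1)
assert count_sums(7, 3, 1) == comb(6, 2) == 15
# bins allowed to be empty: C(n+k-1, k-1)
assert count_sums(7, 3, 0) == comb(9, 2) == 36
```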

Wednesday, September 14, 2016

#algorithm discussion continued
Today we continue reading about flow networks. We were looking at maximum flows.
Maximum flows can also be calculated using "push-relabel" method.
This method performs two basic operations: pushing flow excess from a vertex to one of its neighbors and relabeling a vertex.
The generic-push-relabel method is as follows:
    Initialize-Preflow(G, s)
    while there exists an applicable push or relabel operation
              select an applicable push or relabel operation and perform it.
The generic push-relabel algorithm guarantees a maximum flow because:
The initialization step creates a preflow.
The progress of the algorithm maintains the preflow.
At termination, there are no overflowing vertices, so the preflow is a valid flow, and there is no augmenting path in the residual network. Consequently, the flow is maximum.
We know that the push relabel method indeed terminates because there is a bound on the number of operations it performs. We bound separately each of the three types of operations - relabels, saturating pushes, and non-saturating pushes. With these bounds, the algorithm runs in O(V^2E) time
We use the following lemmas to calculate the bound:
For any overflowing vertex x, there is a simple path from x to the source s in the residual network Gf. We prove this by contradiction. Let one partition be the set of vertices reachable from x in Gf and the other be the rest; if there is no such path, the source lies in the second partition. The excess at a vertex is the sum of the flow inbound minus the sum of the flow outbound, and since every excess is non-negative and the excess at x is positive, the net flow into the first partition is positive. So some edge carries flow from the second partition into the first, and that flow gives a residual edge from the first partition back into the second, making its endpoint reachable from x, which contradicts how the partition was made.
The next lemma is also used to calculate the bound.
At any time during the execution of the generic-push-relabel method, the height of any vertex is at most two times the number of vertices minus one, i.e. 2|V| - 1.
This is easy to prove. The source and sink heights are set to |V| and 0 and do not change, and both are less than 2|V| - 1. When any other vertex u is relabeled, it is overflowing, so there is a simple path from u to the source in the residual network. This path has at most |V| - 1 edges, and the height can drop by at most one along each residual edge, so the height of u is at most the height of the source plus |V| - 1, which gives 2|V| - 1.
The lemma bounds the height of the vertices. Its corollary bounds the number of relabel operations performed in total: at most |V| - 2 vertices can be relabeled, each at most 2|V| - 1 times, because each relabel increases the height of a vertex and the height is bounded for every vertex. Hence the number of relabels is less than 2|V|^2.
The other operation is the saturating push. There is a lemma bounding the number of saturating pushes by two times the product |V||E|. This is easy to see: for any saturating push between vertices u and v, at least one direction is an edge in E, and once a saturating push from u to v has occurred, another cannot happen until the height of u increases by two, since flow is pushed one level down. The heights are bounded by 2|V| - 1, so the height is less than 2|V|, and considering this for all edges we get the quoted product.
Another lemma bounds the number of non-saturating pushes by four times |V|^2 times the sum of the vertex and edge counts taken together. While the relabels and saturating pushes can increase a suitable potential function (defined next), each non-saturating push decreases it by at least one. We use their respective bounds to bound the number of non-saturating pushes.
Let us assume a potential function pi  which is the sum of the heights of all vertices that have an excess flow.
During the relabel operation of a vertex u, its height cannot increase by more than the maximum height, which is two times |V| minus one. During a saturating push the heights do not change, and only the vertex v, with height at most 2|V| - 1, can newly become overflowing.
In both cases, the increase in potential is less than two times V.
The relabel happens at most 2 V^2 times and the saturating push happens at most 2 (VE) times. The total increase in potential is less than 4V^2(V+E).
u was overflowing before the non-saturating push, and v may or may not have been overflowing. After the push, u is no longer overflowing, and unless v is the source, it may now be. The potential has decreased by u.h and increased by at most v.h. Since u.h = v.h + 1, the potential decreases by at least one for each non-saturating push.
Since the potential is never negative, the total amount of decrease, and hence the total number of non-saturating pushes, must be less than 4V^2(V+E).
The bounds of each operation taken individually bounds the number of iterations of the generic-push-relabel algorithm.

#codingexercise
bool ismorethan(node tree1, node tree2)
{
    if (tree1 == null && tree2 == null) return true;
    if (tree1 == null || tree2 == null) return false;
    if (tree1.data <= tree2.data) return false;
    return ismorethan(tree1.left, tree2.left) && ismorethan(tree1.right, tree2.right);
}
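A Python sketch of the same structural comparison (the Node class is illustrative):

```python
class Node:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right

def is_more_than(t1, t2):
    """True when the trees have the same shape and every node of t1
    is strictly greater than the corresponding node of t2."""
    if t1 is None and t2 is None:
        return True
    if t1 is None or t2 is None:
        return False
    if t1.data <= t2.data:
        return False
    return is_more_than(t1.left, t2.left) and is_more_than(t1.right, t2.right)
```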

#puzzle
How many k-tuples of non-negative integers are there that sum to n, if we are given n and k as positive integers?

The answer is the binomial coefficient n+k-1 choose k-1. This is the same as the other binomial coefficient n+k-1 choose n

Tuesday, September 13, 2016

Today we cover eigenvector graph centrality. This is a measure related to closeness centrality, but it is not based on the sum of the distances from a node to all others, because that can be misleading. For example, a node A that is close to a small and fairly closed group within a large graph may be distant from the rest of the graph, while another node B may be at a moderate distance from the entire graph. The farness measures for A and B may be quite similar in magnitude, yet B is really more central than A.
This method finds the more central nodes in terms of the overall structure of the graph and pays less attention to the influences of local sub-graphs. This approach identifies "dimensions" of the distances between the nodes. The first dimension captures the global aspects of distances among nodes and the subsequent dimensions capture more specific and local sub-structures.
In a way the eigenvector centrality is a measure of the centrality of the vertex based on how important its neighbors are. We will try to get a firm understanding of eigenvectors.
An eigenvector v of a matrix B is a non-zero vector that does not rotate when B is applied to it except perhaps to point it in precisely the opposite direction. v may change length or reverse its direction but it won't turn sideways. In other words there is some scalar constant lambda such that Bv = lambda v and this lambda is the eigen value.  If you scale an eigenvector, it is still an eigenvector.
Iterative methods apply B to a vector over and over again. In such cases, if the eigenvalue is less than one, repeated application of B makes v smaller and smaller until the product vanishes. If the eigenvalue is greater than one, the product grows to infinity.
The eigenvalues are found by solving det(B - lambda I) = 0.
As an example, we have matrix B as
B = |  2  -4 |
    | -1  -1 |

det(B - lambda I) = | 2 - lambda      -4     |
                    |    -1      -1 - lambda |
                  = (2 - lambda)(-1 - lambda) - (-4)(-1)
                  = lambda^2 - lambda - 6
                  = (lambda - 3)(lambda + 2)
This gives eigenvalues 3 and -2, and we find an eigenvector v by solving (B - lambda I)v = 0.
For lambda = 3, we get -v1 - 4v2 = 0.
Thus (-4, 1) is an eigenvector.
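The example can be verified in pure Python by solving the characteristic quadratic and checking Bv = 3v:

```python
# B = [[2, -4], [-1, -1]]; characteristic polynomial:
# lambda^2 - trace*lambda + det = 0
a, b, c, d = 2, -4, -1, -1
trace, det = a + d, a * d - b * c        # 1 and -6
disc = (trace ** 2 - 4 * det) ** 0.5     # discriminant of the quadratic
eigs = sorted([(trace + disc) / 2, (trace - disc) / 2])

# confirm v = (-4, 1) is an eigenvector for lambda = 3: B v = 3 v
v = (-4, 1)
Bv = (a * v[0] + b * v[1], c * v[0] + d * v[1])
```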
If B is symmetric, then there exists  a set of n linearly independent eigenvectors of B, denoted by v1, v2, ... vn. This set is not unique because each eigenvector can be scaled. Each eigenvector has a corresponding eigenvalue that is uniquely defined for matrix B.
Eigenvectors are useful because any vector can be considered a sum of other vectors whose behavior is understood. Any n-dimensional vector can be expressed as a linear combination of these vectors while retaining the effect of B on each eigenvector separately.
Having understood eigenvectors, it is now straightforward to find the eigenvector centrality, because we know that the graph is represented as an adjacency matrix B and Bx = lambda x.
There are many eigenvectors possible, but we find the one with the largest eigenvalue, based on the Perron-Frobenius theorem, which states that there is a unique, positive solution when lambda is the largest eigenvalue of the adjacency matrix B.
NetworkX implements a method to find this:
>>> G = nx.path_graph(4)
>>> centrality = nx.eigenvector_centrality_numpy(G)
>>> print(['%s %0.2f'%(node,centrality[node]) for node in centrality])
['0 0.37', '1 0.60', '2 0.60', '3 0.37']

It uses the SciPy sparse eigenvalue solver to find the largest eigenvalue/eigenvector pair. The eigenvector centrality is the normalization of this largest eigenvector.
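The same result can be approximated in pure Python with power iteration, which is what the eigensolver effectively computes (the adjacency-list representation is an assumption):

```python
def eigenvector_centrality(adj, iterations=200):
    """Power iteration on the adjacency matrix (given as neighbor lists),
    normalized to unit Euclidean length as NetworkX does."""
    nodes = sorted(adj)
    x = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        # multiply the adjacency matrix by the current vector
        nxt = {v: sum(x[w] for w in adj[v]) for v in nodes}
        norm = sum(val * val for val in nxt.values()) ** 0.5
        x = {v: val / norm for v, val in nxt.items()}
    return x

# path graph 0-1-2-3, as in the NetworkX example
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
centrality = eigenvector_centrality(adj)
```

The values converge to roughly 0.37, 0.60, 0.60, 0.37, matching the NetworkX output above.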
Courtesy : introduction to social network methods by Hanneman and Riddle
Introduction to conjugate gradients by Shewchuk

#puzzle
Yesterday the question was how many ways n elements can be divided into k groups, and the answer was (n+k-1)!/(n!(k-1)!).
Today we want to find the number of k tuples of positive integers whose sum is n. The answer is the binomial coefficient (n-1) Choose (k-1)

This is the same as the number of (k-1)-element subsets of a set with n-1 elements.