Friday, September 16, 2016

#algorithm discussion continued
Today we continue reading about flow networks. We were looking at maximum flows.
Maximum flows can also be calculated using the "push-relabel" method.
We reviewed the relabel-to-front algorithm next.
This algorithm improves the running time of the maximum flow problem by choosing the order of basic operations carefully and managing the network data structure efficiently.
The relabel-to-front algorithm maintains a list of the vertices in the network. It scans the list from the front, repeatedly selecting an overflowing vertex u and discharging it: the push and relabel operations are applied as in the generic push-relabel algorithm until u has no positive excess. Whenever a vertex is relabeled, it is moved to the front of the list and the algorithm begins its scan again.
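A minimal sketch of that scan, in the loose C# style used elsewhere in these posts, assuming hypothetical Graph and Vertex types and InitializePreflow and Discharge helpers (push/relabel u until its excess is gone):

void RelabelToFront(Graph g, Vertex s, Vertex t)
{
    InitializePreflow(g, s);   // assumed helper: saturate edges out of s and set s.Height = |V|
    var list = new LinkedList<Vertex>(g.Vertices.Where(v => v != s && v != t));
    var node = list.First;
    while (node != null)
    {
        Vertex u = node.Value;
        int oldHeight = u.Height;
        Discharge(g, u);       // assumed helper: push and relabel until u has no positive excess
        if (u.Height > oldHeight)
        {
            // u was relabeled, so move it to the front and resume the scan right after it
            list.Remove(node);
            list.AddFirst(node);
        }
        node = node.Next;
    }
}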
This algorithm works on admissible edges, which are the edges in the residual network through which flow can be pushed: cf(u,v) > 0 and h(u) = h(v) + 1; otherwise (u,v) is inadmissible. The admissible network consists of exactly these edges. It turns out that this network is a directed acyclic graph, because along every admissible edge the height drops by exactly one, so around a cycle the height would have to return to its starting value while strictly decreasing, which is a contradiction.

Another property can be stated this way: if a vertex u is overflowing and (u,v) is an admissible edge, then a push on (u,v) applies, and while it does not create any new admissible edge, it may cause the edge (u,v) itself to become inadmissible. This follows from the fact that the only new residual edge the operation can create is the reverse edge (v,u), and since v is one level lower than u, that reverse edge is inadmissible. If the operation is a saturating push, the residual capacity cf(u,v) drops to zero and the forward edge (u,v) becomes inadmissible as well.
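For reference, a sketch of the push step that this property describes, assuming a single forward edge (u,v) with hypothetical Capacity, Flow, Excess and Height fields:

void Push(Vertex u, Vertex v, Edge e)
{
    // applies only when u.Excess > 0, cf(u,v) = e.Capacity - e.Flow > 0 and u.Height == v.Height + 1
    int delta = Math.Min(u.Excess, e.Capacity - e.Flow);
    e.Flow += delta;       // this raises cf(v,u); the reverse edge points one level up, so it is inadmissible
    u.Excess -= delta;
    v.Excess += delta;
    // if delta equals the residual capacity, this was a saturating push and (u,v) leaves the admissible network
}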


#codingquestion
Count the number of possible decodings of a given digit sequence if leading, trailing and consecutive zeros are excluded.
For example, 121 => ABA, AU, LA
where A = 1, B = 2, ... Z = 26

int countDecoding(List<int> digits, int n)
{
    if (n == 0 || n == 1)
        return 1;
    int count = 0;
    if (digits[n-1] > 0)
        count = countDecoding(digits, n-1);
    if (digits[n-2] == 1 || (digits[n-2] == 2 && digits[n-1] < 7))
        count += countDecoding(digits, n-2);
    return count;
}
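The recursion above recomputes overlapping subproblems, so a bottom-up sketch of the same count runs in linear time (countDecodingDP is a hypothetical name, with the same assumptions about the input as above):

int countDecodingDP(List<int> digits)
{
    int n = digits.Count;
    if (n == 0) return 1;
    var table = new int[n + 1];
    table[0] = 1;                      // the empty prefix decodes one way
    table[1] = digits[0] > 0 ? 1 : 0;  // a leading zero cannot be decoded
    for (int i = 2; i <= n; i++)
    {
        if (digits[i-1] > 0)
            table[i] += table[i-1];    // take the last digit as a single letter
        if (digits[i-2] == 1 || (digits[i-2] == 2 && digits[i-1] < 7))
            table[i] += table[i-2];    // take the last two digits as one letter
    }
    return table[n];
}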

void decode(List<int> digits, int n, ref StringBuilder s, ref List<string> results)
{
    if (n == 0) { results.Add(new string(s.ToString().Reverse().ToArray())); return; }
    if (digits[n-1] > 0)
    {
        s.Append(ToLetter(digits, n-1, 1));   // take the last digit as a single letter
        decode(digits, n-1, ref s, ref results);
        s.Remove(s.Length - 1, 1);
    }
    if (n > 1 && (digits[n-2] == 1 || (digits[n-2] == 2 && digits[n-1] < 7)))
    {
        s.Append(ToLetter(digits, n-2, 2));   // take the last two digits as one letter
        decode(digits, n-2, ref s, ref results);
        s.Remove(s.Length - 1, 1);
    }
}
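The ToLetter helper is not defined in the snippet above; a plausible version, together with a usage example for 121, might look like this (the helper and the driver code are assumptions, not from the original post):

char ToLetter(List<int> digits, int start, int len)
{
    // read the len digits beginning at index start as a number 1..26 and map it to 'A'..'Z'
    int value = 0;
    for (int i = 0; i < len; i++)
        value = value * 10 + digits[start + i];
    return (char)('A' + value - 1);
}

// usage: prints ABA, AU and LA
var digits = new List<int> { 1, 2, 1 };
var results = new List<string>();
var sb = new StringBuilder();
decode(digits, digits.Count, ref sb, ref results);
results.ForEach(Console.WriteLine);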

Yesterday we discussed the stars and bars theorem. If we were to apply it to distribute, say, seven indistinguishable one-dollar coins among three people so that everyone gets at least one, the distributions are essentially tuples of three positive integers whose sum is 7. Therefore there are binomial coefficient n-1 choose k-1 ways, or 6 choose 2 = 6!/(2!4!) = 15 ways.

Reverse every k nodes of a linked list
Node reverseK(Node head, int k)
{
    Node cur = head;
    Node prevTail = null;
    Node root = null;
    while (cur != null)
    {
        // find the last node of this group of up to k nodes
        Node last = cur;
        for (int j = 0; last.next != null && j < k - 1; j++)
            last = last.next;
        Node temp = last.next;   // start of the next group
        last.next = null;        // detach the group before reversing it
        Reverse(ref cur);        // cur now points to the head of the reversed group
        Node groupHead = cur;
        if (root == null) root = groupHead;
        Node groupTail = groupHead;
        while (groupTail.next != null) groupTail = groupTail.next;
        if (prevTail != null) prevTail.next = groupHead;
        prevTail = groupTail;
        cur = temp;
    }
    return root;
}
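The Reverse helper used above is assumed; an in-place version over the detached group could look like this:

void Reverse(ref Node head)
{
    Node prev = null;
    Node cur = head;
    while (cur != null)
    {
        Node next = cur.next;
        cur.next = prev;   // point the current node back at the previous one
        prev = cur;
        cur = next;
    }
    head = prev;           // prev is now the head of the reversed list
}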

 

Thursday, September 15, 2016

#algorithm discussion continued
Today we continue reading about flow networks. We were looking at maximum flows.
Maximum flows can also be calculated using the "push-relabel" method.
This method performs two basic operations: pushing flow excess from a vertex to one of its neighbors and relabeling a vertex.
We know that the push-relabel method indeed terminates because there is a bound on the number of operations it performs. We bound separately each of the three types of operations - relabels, saturating pushes, and non-saturating pushes. With these bounds, the algorithm runs in O(V^2 E) time.
We now look at the relabel-to-front algorithm.
This algorithm improves the running time of the maximum flow problem by choosing the order of basic operations carefully and managing the network data structure efficiently.
The relabel-to-front algorithm maintains a list of the vertices in the network. It scans the list from the front, repeatedly selecting an overflowing vertex u and discharging it: the push and relabel operations are applied as in the generic push-relabel algorithm until u has no positive excess. Whenever a vertex is relabeled, it is moved to the front of the list and the algorithm begins its scan again.
#coding exercise
Consider a row of n coins of values v1 . . . vn, where n is even. We play a game against an opponent by alternating turns. In each turn, a player selects either the first or last coin from the row, removes it from the row permanently, and receives the value of the coin. Determine the maximum possible amount of money we can definitely win if we move first.
If we take the example of 
8, 15, 3, 7 - we know that the maximum value we can collect is 15 + 7 = 22,
but the players cannot simply be greedy. For example, if player one greedily chooses 8, the opponent chooses 15, player one chooses 7 and the opponent chooses 3; player one ends with only 15 rather than the 22 that was achievable. Instead each player must choose so as to minimize what the opponent can collect.
Given the coins laid out from i to j, the most a player can collect is
F(i, j) = max( Vi + min(F(i+2, j), F(i+1, j-1)),
               Vj + min(F(i+1, j-1), F(i, j-2)) )
The inner min reflects the opponent's turn: after we take a coin, the opponent moves optimally and leaves us the worse of the two remaining sub-ranges.
int GetCoins(List<int> coins, int i, int j)
{
    if (i > j) return 0;
    if (i == j) return coins[i];
    if (j == i + 1) return Math.Max(coins[i], coins[j]);
    return Math.Max(coins[i] + Math.Min(GetCoins(coins, i+2, j), GetCoins(coins, i+1, j-1)),
                    coins[j] + Math.Min(GetCoins(coins, i+1, j-1), GetCoins(coins, i, j-2)));
}
int GetBest(List<int> coins)
{
    int n = coins.Count;
    var table = new int[n, n];
    for (int k = 0; k < n; k++)
    {
        for (int i = 0, j = k; j < n; i++, j++)
        {
            int x = (i + 2 <= j) ? table[i+2, j] : 0;       // F(i+2, j): we took coins[i], the opponent took coins[i+1]
            int y = (i + 1 <= j - 1) ? table[i+1, j-1] : 0; // F(i+1, j-1): one coin taken from each end
            int z = (i <= j - 2) ? table[i, j-2] : 0;       // F(i, j-2): we took coins[j], the opponent took coins[j-1]
            table[i, j] = Math.Max(coins[i] + Math.Min(x, y), coins[j] + Math.Min(y, z));
        }
    }
    return table[0, n-1];
}
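For the example above, both versions return 22 (the 15 and the 7):

var coins = new List<int> { 8, 15, 3, 7 };
Console.WriteLine(GetBest(coins));                       // 22
Console.WriteLine(GetCoins(coins, 0, coins.Count - 1));  // 22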
#puzzle
How many ways can n objects be put into k bins?
If every bin must contain at least one object, the answer is the binomial coefficient n-1 choose k-1.
We can view this as arranging stars and bars: the n objects are stars lined up in a row, and the k bins correspond to placing k-1 bars in the n-1 gaps between the stars, with at most one bar per gap. Choosing which gaps hold a bar is choosing k-1 out of n-1.
This is also the number of k-tuples of positive integers that sum to n.
If the bins are allowed to be empty, the number changes: more than one bar may now sit between consecutive stars, or a bar may sit at either end, so we simply arrange n stars and k-1 bars in a row. This is the same as choosing the n positions for the stars out of n+k-1 positions.


This is also the same as choosing the k-1 positions for the bars from the n+k-1 positions.
This is also equal to the number of multisets of cardinality k-1 taken from a set of size n+1.
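A small helper to evaluate these counts numerically; Binomial is a hypothetical name, not something from the post:

long Binomial(int n, int r)
{
    // multiplicative formula, dividing as we go so intermediate values stay small
    if (r < 0 || r > n) return 0;
    r = Math.Min(r, n - r);
    long result = 1;
    for (int i = 1; i <= r; i++)
        result = result * (n - r + i) / i;
    return result;
}

// n objects into k non-empty bins: Binomial(n-1, k-1); with empty bins allowed: Binomial(n+k-1, k-1)
Console.WriteLine(Binomial(6, 2));   // 15, the seven-coin example from the previous post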

Wednesday, September 14, 2016

#algorithm discussion continued
Today we continue reading about flow networks. We were looking at maximum flows.
Maximum flows can also be calculated using the "push-relabel" method.
This method performs two basic operations: pushing flow excess from a vertex to one of its neighbors and relabeling a vertex.
The generic-push-relabel method is as follows:
    Initialize-Preflow(G, s)
    while there exists an applicable push or relabel operation
              select an applicable push or relabel operation and perform it.
the generic push-relabel algorithm computes a maximum flow because
The initialization step creates a preflow.
Each step of the algorithm maintains the preflow.
At termination, there are no overflowing vertices, so the preflow is a valid flow, and there is no path from source to sink in the residual network. Consequently, the flow is maximum.
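A sketch of the Initialize-Preflow step referred to above, using the same hypothetical Graph, Vertex and Edge fields as the earlier snippets:

void InitializePreflow(Graph g, Vertex s)
{
    foreach (var v in g.Vertices) { v.Height = 0; v.Excess = 0; }
    foreach (var e in g.Edges) e.Flow = 0;
    s.Height = g.Vertices.Count;      // the source sits |V| levels up
    foreach (var e in g.EdgesFrom(s)) // assumed accessor for the edges leaving s
    {
        e.Flow = e.Capacity;          // saturate every edge leaving the source
        e.To.Excess += e.Capacity;
        s.Excess -= e.Capacity;
    }
}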
We know that the push-relabel method indeed terminates because there is a bound on the number of operations it performs. We bound separately each of the three types of operations - relabels, saturating pushes, and non-saturating pushes. With these bounds, the algorithm runs in O(V^2 E) time.
We use the following lemmas to calculate the bound:
For any overflowing vertex x, there is a simple path from x to the source s in the residual network Gf. We prove this by contradiction. Suppose there is no such path, and let U be the set of vertices reachable from x in Gf, so s is not in U. No flow can enter U from outside: if f(w, v) > 0 for some w outside U and v inside U, then cf(v, w) > 0 and w would be reachable from x, contradicting the choice of U. The total excess of U is the flow into U minus the flow out of U, which is therefore at most zero. But x is overflowing and every vertex other than s has nonnegative excess, so the total excess of U is positive - a contradiction.
The next lemma is also used to calculate the bound.
At any time during the execution of the generic push-relabel method, the height of every vertex is at most 2V - 1, where V is the number of vertices in the network.
This is easy to prove. The heights of the source and the sink are fixed at V and 0 and never change, and both are at most 2V - 1. When any other vertex u is relabeled, it is overflowing, so by the previous lemma there is a simple path from u to s in the residual network. That path has at most V - 1 edges, and along each residual edge the height can drop by at most one, so h(u) <= h(s) + V - 1 = 2V - 1.
The lemma bounds the height of the vertices. Its corollary bounds the total number of relabel operations: at most V - 2 vertices can ever be relabeled, and each of them is relabeled fewer than 2V times, since every relabel increases that vertex's height by at least one and the height is bounded by 2V - 1. The total number of relabels is therefore less than 2V^2.
The other operation is the saturating push. A lemma bounds the number of saturating pushes by 2VE. To see this, consider any pair of vertices u and v with an edge between them in at least one direction. After a saturating push from u to v, another one cannot happen until the height of u has increased by at least two, because pushes only go one level down and in between there must have been a push back from v to u. Since heights are bounded by 2V - 1, fewer than 2V saturating pushes can occur between any such pair, and summing over all edges gives the 2VE in the lemma.
Another lemma bounds the number of non-saturating pushes by 4V^2(V + E). While relabels and saturating pushes can only increase a suitably chosen potential function, each non-saturating push decreases it by at least one, so the bounds on the first two operations give a bound on the third.
Let us define a potential function pi as the sum of the heights of all vertices that have excess flow.
During a relabel of a vertex u, pi increases by less than 2V, since the height of u cannot exceed 2V - 1. During a saturating push from u to v, the heights do not change and the only vertex that can newly become overflowing is v, whose height is at most 2V - 1, so again the increase in pi is less than 2V.
There are fewer than 2V^2 relabels and fewer than 2VE saturating pushes, so the total increase in pi over the whole run is less than 4V^2(V + E).
Now consider a non-saturating push from u to v. Before the push u was overflowing, and v may or may not have been; after the push u is no longer overflowing, and unless v is the source or the sink it now is. So pi decreases by h(u) and increases by at most h(v), and since h(u) = h(v) + 1, each non-saturating push decreases pi by at least one.
Since pi is never negative, the total decrease, and hence the total number of non-saturating pushes, is less than 4V^2(V + E).
Taken together, the bounds on each type of operation bound the number of iterations of the generic push-relabel algorithm.

#codingexercise
bool ismorethan(Node tree1, Node tree2)
{
    if (tree1 == null && tree2 == null) return true;
    if (tree1 == null || tree2 == null) return false;
    if (tree1.data <= tree2.data) return false;
    return ismorethan(tree1.left, tree2.left) && ismorethan(tree1.right, tree2.right);
}

#puzzle
How many k-tuples of non-negative integers are there that sum to n, if we are given n and k as positive integers?

The answer is the binomial coefficient n+k-1 choose k-1. This is the same as the binomial coefficient n+k-1 choose n.

Tuesday, September 13, 2016

Today we cover eigenvector graph centrality. This is an approach to closeness centrality, but it is not based on the sum of the distances from a node to all others, because that sum can be misleading. For example, a node A that is close to a small, fairly closed group within a large graph may be distant from the rest of the graph, while another node B may be at a moderate distance from everything. The farness measures for A and B may be quite similar in magnitude, yet B is really more central than A.
This method finds the more central nodes in terms of the overall structure of the graph and pays less attention to the influences of local sub-graphs. This approach identifies "dimensions" of the distances between the nodes. The first dimension captures the global aspects of distances among nodes and the subsequent dimensions capture more specific and local sub-structures.
In a way the eigenvector centrality is a measure of the centrality of the vertex based on how important its neighbors are. We will try to get a firm understanding of eigenvectors.
An eigenvector v of a matrix B is a non-zero vector that does not rotate when B is applied to it, except perhaps to point in precisely the opposite direction. v may change length or reverse its direction but it won't turn sideways. In other words, there is some scalar constant lambda such that Bv = lambda v, and this lambda is the eigenvalue. If you scale an eigenvector, it is still an eigenvector.
Iterative methods apply B to a vector over and over again. If the eigenvalue is less than one in magnitude, repeated application of B shrinks the component along v until it vanishes; if it is greater than one, that component grows without bound.
The eigenvalues are found from the characteristic equation det(B - lambda I) = 0.
As an example, take the matrix

B =   2  -4
     -1  -1

det(B - lambda I) = (2 - lambda)(-1 - lambda) - (-4)(-1)
                  = lambda^2 - lambda - 6
                  = (lambda - 3)(lambda + 2)

This gives eigenvalues 3 and -2, and we find an eigenvector v by solving (B - lambda I)v = 0.
For lambda = 3 this reduces to -v1 - 4v2 = 0, so (-4, 1) is an eigenvector. (Check: B(-4, 1) = (-12, 3) = 3(-4, 1).)
If B is symmetric, then there exists  a set of n linearly independent eigenvectors of B, denoted by v1, v2, ... vn. This set is not unique because each eigenvector can be scaled. Each eigenvector has a corresponding eigenvalue that is uniquely defined for matrix B.
Eigenvectors are useful because any n-dimensional vector can be expressed as a linear combination of them, and the effect of B on the combination is just the sum of its effects on each eigenvector separately.
Having understood eigenvectors, it is now straightforward to define eigenvector centrality, because the graph is represented by an adjacency matrix B and the centrality vector x satisfies Bx = lambda x.
There are many eigenvectors possible, but we take the one with the largest eigenvalue, based on the Perron-Frobenius theorem, which guarantees a unique and positive solution when lambda is the largest eigenvalue of the adjacency matrix B.
NetworkX implements a method to find this:
>>> G = nx.path_graph(4)
>>> centrality = nx.eigenvector_centrality_numpy(G)
>>> print(['%s %0.2f'%(node,centrality[node]) for node in centrality])
['0 0.37', '1 0.60', '2 0.60', '3 0.37']

It uses the SciPy sparse eigenvalue solver to find the largest eigenvalue/eigenvector pair. The eigenvector centrality is the normalized largest eigenvector.
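The same largest eigenpair can be approximated by hand with power iteration; a minimal sketch over a dense adjacency matrix (this is not the SciPy solver that NetworkX actually uses):

double[] EigenvectorCentrality(double[,] adj, int iterations)
{
    int n = adj.GetLength(0);
    var x = new double[n];
    for (int i = 0; i < n; i++) x[i] = 1.0 / n;   // start from a uniform positive vector
    for (int iter = 0; iter < iterations; iter++)
    {
        var next = new double[n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                next[i] += adj[i, j] * x[j];      // multiply by the adjacency matrix
        double norm = 0;
        for (int i = 0; i < n; i++) norm += next[i] * next[i];
        norm = Math.Sqrt(norm);
        for (int i = 0; i < n; i++) x[i] = next[i] / norm;   // normalize to keep the iteration stable
    }
    return x;   // approaches the eigenvector of the largest eigenvalue for a connected graph
}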
Courtesy: Introduction to Social Network Methods by Hanneman and Riddle,
and An Introduction to the Conjugate Gradient Method by Shewchuk

#puzzle
Yesterday the question was how many ways n identical elements can be divided into k groups, and the answer was (n+k-1)!/(n!(k-1)!).
Today we want to find the number of k-tuples of positive integers whose sum is n. The answer is the binomial coefficient (n-1) choose (k-1).

This is the same as k-1 element subsets of a set with n-1 elements.

Monday, September 12, 2016

#algorithm discussion continued
Today we continue reading about flow networks. We were looking at maximum flows.
Maximum flows can also be calculated using the "push-relabel" method.
This method performs two basic operations: pushing flow excess from a vertex to one of its neighbors and relabeling a vertex.
The Push operation applies only if u is an overflowing vertex, cf(u,v) > 0 and h(u) = h(v) + 1. The push operation updates the preflow f and the excess flows for u and v. It assumes that we can compute residual capacity in constant time given c and f.
All the flow is pushed downhill in the network, from a vertex to a neighbor exactly one level below it, and every residual edge either stays level, goes upward, or drops by at most one level. We discussed the following lemmas:
1) An overflowing vertex can be either pushed or relabeled.

2) Vertex heights never decrease


3) The execution of the generic-push-relabel method on G maintains the attribute h as a height function.

4) Another lemma, which gives an important property of the height function, states that if f is a preflow in the flow network G, then there is no path from the source to the sink in the residual network Gf.


With the above lemmas we proved the generic push-relabel algorithm correct.


#codingexercise
Void getcounthigher(node head, int k,  ref int count)
{
If (head == null) return;
getcounthigher(head.next, k,  ref count );
If (head.data > k) Count += 1;
}

bool ismorethanEqualTo(Node tree1, Node tree2)
{
    if (tree1 == null && tree2 == null) return true;
    if (tree1 == null || tree2 == null) return false;
    if (tree1.data < tree2.data) return false;
    return ismorethanEqualTo(tree1.left, tree2.left) && ismorethanEqualTo(tree1.right, tree2.right);
}

#puzzle
How many binary search trees are possible from a sequence of n numbers ?

The number of binary search trees is the nth Catalan number:
1/(n+1) * (2n choose n)
= (2n)! / ((n+1)! n!)
Suppose node i is chosen to be the root. Then there are i-1 nodes smaller than i and n-i nodes bigger than i. If t(n) is the total number of BSTs, then t(n) = sum of t(i-1) * t(n-i) for i = 1 to n, with t(0) = 1.
t(2) = t(0)t(1) + t(1)t(0) = 2
t(3) = t(0)t(2)+t(1)t(1) + t(2)t(0) = 5

By computing the Catalan numbers for small n we can build up the values for larger n.
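A sketch of that bottom-up computation, directly from the recurrence above with t(0) = 1:

long CountBSTs(int n)
{
    var t = new long[n + 1];
    t[0] = 1;                              // the empty tree
    for (int m = 1; m <= n; m++)
        for (int i = 1; i <= m; i++)
            t[m] += t[i - 1] * t[m - i];   // i-1 keys go to the left subtree, m-i to the right
    return t[n];
}

// CountBSTs(2) == 2 and CountBSTs(3) == 5, matching the values worked out above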

Find the number of ways of distributing n identical items among k groups, where groups may be empty.
The answer is (n+k-1)!/(n!(k-1)!)

The reasoning can be explained intuitively as follows. Lay out the n items (stars) and k-1 group separators (bars) along a line; each arrangement corresponds to one distribution. Once we fix which of the n+k-1 positions hold the items, the positions of the separators fall into place, so the count is simply n+k-1 choose n.

Sunday, September 11, 2016

#algorithm discussion continued
Today we continue reading about flow networks. We were looking at maximum flows.
Maximum flows can also be calculated using the "push-relabel" method.
This method performs two basic operations: pushing flow excess from a vertex to one of its neighbors and relabeling a vertex.
The Push operation applies only if u is an overflowing vertex, cf(u,v) > 0 and h(u) = h(v) + 1. The push operation updates the preflow f and the excess flows for u and v. It assumes that we can compute residual capacity in constant time given c and f.
All the flow is pushed downhill in the network, from a vertex to a neighbor exactly one level below it, and every residual edge either stays level, goes upward, or drops by at most one level. We discussed the following lemmas:
1) An overflowing vertex can be either pushed or relabeled.

2) Vertex heights never decrease


3) The execution of the generic-push-relabel method on G maintains the attribute h as a height function.

Another lemma, which gives an important property of the height function, states that if f is a preflow in the flow network G, then there is no path from the source to the sink in the residual network Gf.

Suppose to the contrary that there is a simple path s = v0, v1, ..., vk = t from the source to the sink in Gf; since the path is simple, k is less than the total number of vertices n. For every residual edge (vi, vi+1) the height function gives h(vi) <= h(vi+1) + 1, and combining these along the path gives h(s) <= h(t) + k. Since h(t) = 0 and k < n, we get h(s) < n, which contradicts the invariant that h(s) = n.


With the above lemmas we prove the generic push-relabel algorithm correct as follows:
The initialization step creates a preflow.
Each step of the algorithm maintains the preflow.
At termination, there are no overflowing vertices, so the preflow is a valid flow, and there is no path from source to sink in the residual network. Consequently, the flow is maximum.
#codingexercise
void getcount(Node head, ref int count)
{
    if (head == null) return;
    getcount(head.next, ref count);
    count += 1;
}

bool islessthanEqualTo(Node tree1, Node tree2)
{
    if (tree1 == null && tree2 == null) return true;
    if (tree1 == null || tree2 == null) return false;
    if (tree1.data > tree2.data) return false;
    return islessthanEqualTo(tree1.left, tree2.left) && islessthanEqualTo(tree1.right, tree2.right);
}

How many binary search trees are possible from a sequence of n numbers ?

The number of binary search trees is the nth Catalan number:
1/(n+1) * (2n choose n)
= (2n)! / ((n+1)! n!)