Sunday, April 6, 2014

The basic Ford-Fulkerson algorithm is given by the following:
Ford-Fulkerson(G, s, t)
for each edge (u,v) belonging to E[G]
   do f[u,v] = 0
      f[v,u] = 0
while there exists a path p from s to t in the residual network Gf
   do cf(p) = min{cf(u,v) : (u,v) is in p}
      for each edge (u,v) in p
         do f[u,v] = f[u,v] + cf(p)
            f[v,u] = -f[u,v]
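
For concreteness, here is one way this pseudocode might look in Python. This is a minimal sketch, assuming an n x n adjacency matrix of integral capacities and a depth first search for the augmenting path; the names are ours, not from any library:

def max_flow(capacity, s, t):
    # capacity: n x n list of integral capacities; s, t: source and sink indices
    n = len(capacity)
    flow = [[0] * n for _ in range(n)]

    def find_path():
        # depth first search for an s-t path in the residual network
        stack = [(s, [])]
        visited = {s}
        while stack:
            u, path = stack.pop()
            if u == t:
                return path
            for v in range(n):
                if v not in visited and capacity[u][v] - flow[u][v] > 0:
                    visited.add(v)
                    stack.append((v, path + [(u, v)]))
        return None

    path = find_path()
    while path is not None:
        # residual capacity of the path: minimum over its edges
        cf = min(capacity[u][v] - flow[u][v] for (u, v) in path)
        for (u, v) in path:
            flow[u][v] += cf
            flow[v][u] = -flow[u][v]
        path = find_path()
    return sum(flow[s][v] for v in range(n))

Note that storing f[v,u] = -f[u,v] makes the residual capacity of the reverse edge equal to f[u,v], which is exactly what lets flow be cancelled along a reverse edge.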


The Ford-Fulkerson algorithm expands on the Ford-Fulkerson method mentioned in the previous post. The while loop repeatedly finds an augmenting path p in Gf and augments the flow f along p by the residual capacity cf(p). When no augmenting path exists, the flow f is a maximum flow.
How the augmenting path is chosen determines how well the algorithm performs.
If a breadth first traversal is chosen, the algorithm runs in polynomial time (this variant is known as the Edmonds-Karp algorithm).
However, let's first take the case where the augmenting path is chosen arbitrarily and the capacities are integral.
The complexity of the initialization steps is O(E).

The complexity of the while loop is O(E |f*|), where f* is a maximum flow of the network. With integral capacities, each augmentation increases the flow value by at least one unit, so the while loop executes at most |f*| times, and each iteration finds a path in O(E). The efficiency of the while loop can be increased with a suitable choice of augmenting path and supporting data structures.
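
To see why the bound involves |f*|, consider the classic small example, here built for the max_flow sketch above (the vertex numbering is our own): two wide paths share a unit capacity cross edge, and an adversarial path choice alternates across that edge, gaining only one unit per iteration.

# s = 0, t = 3; two paths of capacity 1000 joined by a unit cross edge (1,2)
capacity = [
    [0, 1000, 1000,    0],
    [0,    0,    1, 1000],
    [0,    0,    0, 1000],
    [0,    0,    0,    0],
]
print(max_flow(capacity, 0, 3))   # prints 2000

A bad sequence of augmenting paths can take 2000 iterations on this graph, while a breadth first choice finishes in two.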



Saturday, April 5, 2014

Today we discuss the Ford-Fulkerson method for solving the maximum flow problem. A maximum flow problem is one where we interpret a directed graph as a flow network: each directed edge has a stated capacity. In the maximum flow problem, we wish to find the greatest rate at which material can flow through the graph from source to sink without violating any capacity constraints.
The Ford-Fulkerson method is defined as follows:
Ford-Fulkerson-method(G,s,t)
initialize flow f to 0
while there exists an augmenting path p
   do augment flow f along p
return f
As we can see, this method is iterative. At each iteration, we increase the flow by finding an augmenting path, which is simply a path along which we can send more flow, and then augmenting the flow along that path. We repeat this process until no augmenting path can be found. The max-flow min-cut theorem shows that this yields a maximum flow. Since the Ford-Fulkerson method admits different implementations with different running times, it is called a method rather than an algorithm.
In a graph with a flow network and a flow, some edges can admit more flow. The additional flow that we can push through an edge is called its residual capacity. The Ford-Fulkerson method repeatedly augments the flow along augmenting paths until no more can be found. The max-flow min-cut theorem tells us that a flow is maximum if and only if its residual network contains no augmenting path. To prove this theorem, we use the notion of a cut of a flow network.
A cut (S, T) of a flow network is a partition of V into S and T = V - S such that s belongs to S and t belongs to T. If f is a flow, then the net flow across the cut is f(S,T), and the capacity of the cut is c(S,T). A minimum cut of a network is a cut whose capacity is minimum over all cuts of the network.
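
These two quantities translate directly into code. A small sketch, assuming an adjacency matrix of capacities and a skew symmetric flow matrix (both conventions are ours):

def net_flow(flow, S, T):
    # f(S,T); with skew symmetry, flow entering S appears as a negative term
    return sum(flow[u][v] for u in S for v in T)

def cut_capacity(capacity, S, T):
    # c(S,T); only capacities of edges leaving S count
    return sum(capacity[u][v] for u in S for v in T)

For any flow f and any cut (S, T), net_flow(flow, S, T) equals |f| and is at most cut_capacity(capacity, S, T).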
The max-flow min-cut theorem states that if f is a flow in a flow network G = (V, E) with source s and sink t, then the following conditions are equivalent:
1. f is a maximum flow in G
2. The residual network Gf contains no augmenting paths
3. |f| = c(S,T) for some cut (S,T) of G
The first condition implies the second. We can show this by contradiction: if f is a maximum flow and there is still an augmenting path p, then the flow sum f + fp is a flow in G with value strictly greater than |f|, contradicting the maximality of f.
The second condition implies the third: if Gf contains no augmenting path, let S be the set of vertices reachable from s in Gf and T = V - S; then t belongs to T, every edge from S to T is saturated, and so |f| = f(S,T) = c(S,T).
The third condition implies the first because the value of any flow in the network is bounded from above by the capacity of any cut of G. The condition |f| = c(S,T) thus implies that f is a maximum flow.
In the basic Ford-Fulkerson algorithm, in each iteration we find some augmenting path p and increase the flow f on each edge of p by the residual capacity cf(p). Edges carry both a capacity and a flow, and the absence of an edge implies zero capacity and zero flow; the residual capacity of an edge is its capacity minus its flow.
The residual capacity cf(p) of the path is the minimum of the residual capacities of its edges. We compute it in a temporary variable, augment the flow f[u,v] of each edge on the path by that amount (together with the reverse value f[v,u]), and repeat until no augmenting path remains; the identity f(S,T) = |f| for the resulting cut then certifies that the flow found is maximum.
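
Condition 2 of the theorem also gives a simple way to check that a flow is maximum: search the residual network for an s-t path. A sketch under the same matrix conventions as above:

from collections import deque

def is_maximum(capacity, flow, s, t):
    # breadth first search in the residual network; f is maximum iff t is unreachable
    n = len(capacity)
    visited = {s}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in range(n):
            if v not in visited and capacity[u][v] - flow[u][v] > 0:
                visited.add(v)
                queue.append(v)
    return t not in visited

If is_maximum returns True, the visited set itself is the S side of a minimum cut (S, T).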

Friday, April 4, 2014

We will continue our discussion of Prim's algorithm. We discussed that Prim's algorithm is a greedy method: we look for the minimum cost edge that we can add to our tree. We determined that for a vertex v with a min cost edge e at v, there is a minimum cost spanning tree containing e. We also said that for any vertex v, some minimum cost edge at v is included in every minimum cost spanning tree. We then generalized this with partitions and said that some min cost edge across the partition must be included in every minimum cost spanning tree.
With the lemma above, we choose the minimum weight edge between the set of vertices already in the tree and those we have not explored yet, and keep growing the tree. Thus Prim's algorithm finds a minimum cost spanning tree.

We continue our discussion of the proof of correctness for Prim's algorithm.
To prove Lemma 1, suppose the min cost edge e at v is not included in some min cost spanning tree. Adding e to the tree creates a cycle, which returns to v via some other edge f at v. We add e and remove f. Since e is a min cost edge at v, cost of e <= cost of f, so the new tree costs no more than the original; because the original was already minimum, the two costs must be equal. Thus we can swap in a min cost edge at a vertex without increasing the cost.
We can generalize Lemma 2. We take the set of vertices and split it into two parts, S and V - S. Some min cost edge from S to V - S must be included in every min cost spanning tree.
Taking S = {v} and V - S = V - {v} recovers Lemma 2.
Any spanning tree has an edge with ends in both partitions. By a proof similar to the earlier one, where we take a returning edge and swap it with our edge, we can show that the cost is unchanged, so this generalization works.
Consider Lemma 3 in the special case where all edge weights are distinct. Among the edges that cross from S to V - S, the unique min cost edge must be included in every min cost spanning tree.
At any point in Prim's algorithm, we have the tree vertices S and the remaining vertices V - S, and the algorithm picks, for some i in V - S, a minimum cost edge from i to near(i) in S. By Lemma 3, Prim's algorithm always picks a valid edge to add, which gives us the proof of correctness.
We now adapt Lemma 3 and the proof of correctness for Prim's algorithm to the case where edge weights can repeat, which is the real case, to show that it works there as well.
Here we argue that every copy of a repeated edge weight also crosses the partition between S and V - S, and by picking a minimum cost edge we still add a valid edge. The order in which tied edges are added is irrelevant because we know that we can swap one valid edge for another. In other words, having proved that we pick a valid edge when the values are distinct, we pick a valid edge again for an equal valued pair (i, near(i)). This gives us the proof of correctness for the real case, where the graph can have arbitrary weights.
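
For reference, the generalized statement used above is the standard cut property of minimum spanning trees. In LaTeX (our phrasing, not verbatim from any lecture):

Let $G = (V, E)$ be a connected weighted graph and let $S$ be a nonempty proper subset of $V$. If $e \in E$ has one end in $S$ and the other in $V \setminus S$, and
\[
  w(e) = \min \{\, w(e') : e' \in E \text{ crosses the partition } (S, V \setminus S) \,\},
\]
then some minimum cost spanning tree of $G$ contains $e$; when this minimum is achieved uniquely, every minimum cost spanning tree contains $e$.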


Wednesday, April 2, 2014

We will now look at the proof of correctness for Prim's algorithm. Prim's algorithm can produce different minimum spanning trees depending upon the edges chosen when equal weight edges tie. In this algorithm, we repeatedly compute the smallest edge extending the tree.
We begin by picking a minimum cost edge and its two endpoints i and j. We maintain a two dimensional data structure that records the endpoints of the tree edges, and we also keep track of the mincost. For each of the remaining vertices, if there is no edge to either i or j, we set its near value to -1; otherwise we record the nearer of the two endpoints.
We then proceed with the rest of the tree. We now look at a lemma. Let e be some minimum cost edge attached to vertex v. The lemma states that there is some min cost spanning tree that contains e. We use this lemma in this way: we need not start the tree at the vertex of the overall smallest edge. If edges are not connected to the starting vertex, then we set those distances to a maximum value. So the only variation from the original algorithm is the initialization step.

As for complexity, the outer loop runs O(n) times. Inside each iteration, the dominant cost is to scan the pairs (i, near(i)) to find the minimum. To improve the efficiency, we need a data structure to keep track of near(i). The red-black tree is suitable for this need: it allows insert, finding the minimum, deleting an arbitrary value, and searching an arbitrary value, each in O(log n). We insert once per vertex and extract the minimum n times, which is O(n log n), and if there are m edges we do O(m log n) operations to update near(), so the total complexity is O((n + m) log n). If m is of the order of n^2, that is, there are a lot of edges, then the red-black tree is not suitable, since a plain O(n^2) scan already does better. In summary, we have one greedy implementation of the min cost spanning tree.
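
Python has no built-in red-black tree, but a binary heap with lazy deletion supports the operations we need in the same O(log n), so a heap based sketch of this idea (our own names, a dict-of-dicts adjacency list, and a connected graph assumed) looks like this:

import heapq

def prim_heap(adj, root):
    # adj: {u: {v: weight, ...}}; returns (total cost, list of tree edges)
    best = {root: 0}              # cheapest known edge weight into each fringe vertex
    parent = {root: None}
    heap = [(0, root)]
    in_tree = set()
    mincost, tree = 0, []
    while heap:
        w, u = heapq.heappop(heap)
        if u in in_tree:
            continue              # stale entry left behind by lazy deletion
        in_tree.add(u)
        mincost += w
        if parent[u] is not None:
            tree.append((parent[u], u))
        for v, wv in adj[u].items():
            if v not in in_tree and wv < best.get(v, float('inf')):
                best[v] = wv      # near(v) improved; push a fresh heap entry
                parent[v] = u
                heapq.heappush(heap, (wv, v))
    return mincost, tree

Each edge pushes at most one entry onto the heap, which gives the O((n + m) log n) bound discussed above.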
We will now look at the proof of correctness for this algorithm. We list two lemmas. We've listed the first one already: when we pick a vertex v with a min cost edge e at v, there is a min cost spanning tree containing e. The second lemma is that for a vertex v with a min cost edge e, some such min cost edge is included in every min cost spanning tree. To prove the second lemma, suppose there is a min cost tree t that contains no min cost edge at some vertex v. We add a min cost edge e at v to create a cycle; let f be the other edge of that cycle at v, so cost of edge e < cost of edge f. If we delete f, we restore a tree; everything is still connected, it includes e, and its cost is strictly smaller, which contradicts the assumption that the t we started out with is a min cost spanning tree.
Today we look at Prim's algorithm. This is also a greedy method. Here we grow the tree by adding the minimum cost edge that connects to it. That is, we look at all the edges that connect to the existing tree and choose the lowest cost one; we grow the tree one edge at a time. The algorithm is described below:
Algorithm Prim(E, cost, n, t)
{
  Let (k,l) be an edge of minimum cost in E;
  t[1,1] = k; t[1,2] = l;
  mincost = cost[k,l];
  for i = 1 to n do
    if (cost[i,l] < cost[i,k]) then near[i] = l;
    else if (cost[i,k] is not large) then near[i] = k;
    else near[i] = -1;
  near[k] = near[l] = 0;

  for i = 2 to n-1 do
  {
    Choose j such that near[j] > 0 and cost[j, near[j]] is minimum;
    t[i,1] = j; t[i,2] = near[j];
    mincost = mincost + cost[j, near[j]];
    near[j] = 0;
    for k = 1 to n do
      if (((near[k] > 0) and (cost[k, near[k]] > cost[k,j])) or
          ((near[k] < 0) and (cost[k,j] is not large)))
      then near[k] = j;
  }
  return mincost;
}
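
A direct Python translation may make the near[] bookkeeping easier to follow. This is a sketch assuming a 0-indexed n x n cost matrix with float('inf') for missing edges and a connected graph; since 0 is a valid vertex index here, we use -2 instead of 0 to mark vertices already in the tree:

def prim(cost, n):
    INF = float('inf')
    # pick a minimum cost edge (k, l) to start the tree
    k, l = min(((i, j) for i in range(n) for j in range(n) if i != j),
               key=lambda e: cost[e[0]][e[1]])
    mincost = cost[k][l]
    tree = [(k, l)]
    near = [0] * n
    for i in range(n):
        if cost[i][l] < cost[i][k]:
            near[i] = l
        elif cost[i][k] < INF:
            near[i] = k
        else:
            near[i] = -1        # i is not yet adjacent to the tree
    near[k] = near[l] = -2      # -2 marks vertices already in the tree
    for _ in range(n - 2):
        # choose the vertex j nearest to the tree
        j = min((v for v in range(n) if near[v] >= 0),
                key=lambda v: cost[v][near[v]])
        tree.append((j, near[j]))
        mincost += cost[j][near[j]]
        near[j] = -2
        for v in range(n):
            if near[v] >= 0 and cost[v][near[v]] > cost[v][j]:
                near[v] = j     # j is now the nearer tree endpoint for v
            elif near[v] == -1 and cost[v][j] < INF:
                near[v] = j     # v becomes adjacent to the tree through j
    return mincost, tree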

Tuesday, April 1, 2014

Today we will look at Kruskal's algorithm. This algorithm is a greedy method. It enumerates the edges in order of ascending weight. The edges chosen so far need not be connected; at any given time we have disjoint trees that form a forest. The algorithm proceeds as follows.
We have two helper methods:
Find(u), which finds the component containing u, and
Union(u, v), which combines two components.
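
One standard realization of these helpers is a disjoint set forest with union by rank and path compression. A minimal Python sketch (names are ours); the Kruskal translation at the end of this post uses these:

parent = {}
rank = {}

def make_set(u):
    parent[u] = u
    rank[u] = 0

def find(u):
    # walk to the root, halving the path as we go (path compression)
    while parent[u] != u:
        parent[u] = parent[parent[u]]
        u = parent[u]
    return u

def union(u, v):
    # union by rank: attach the shallower root under the deeper one
    ru, rv = find(u), find(v)
    if ru == rv:
        return
    if rank[ru] < rank[rv]:
        ru, rv = rv, ru
    parent[rv] = ru
    if rank[ru] == rank[rv]:
        rank[ru] += 1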
Algorithm Kruskal(E, cost, n, t)
{
  Organize the edges of the graph in a min heap H, based on cost;
  for i = 1 to n do component[i] = i;
  i = 0; mincost = 0;
  while (i < n - 1) and (H is not empty) do
  {
    (u,v) = delete-min(H);
    j = Find(u); k = Find(v);
    if (j != k) then
    {
      i = i + 1;
      t[i,1] = u; t[i,2] = v;
      mincost = mincost + cost[u,v];
      Union(j, k);
    }
  }
  return mincost;
}
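
Translated into Python, with the heapq module standing in for the min heap and the union-find helpers sketched above (an edge list of (cost, u, v) triples is assumed; the names are ours):

import heapq

def kruskal(edges, n):
    # edges: list of (cost, u, v) with vertices numbered 0 .. n-1
    for v in range(n):
        make_set(v)
    heap = list(edges)
    heapq.heapify(heap)               # min heap ordered by cost
    mincost, tree = 0, []
    while heap and len(tree) < n - 1:
        c, u, v = heapq.heappop(heap)
        if find(u) != find(v):        # reject edges that would close a cycle
            tree.append((u, v))
            mincost += c
            union(u, v)
    return mincost, tree

For example, kruskal([(1, 0, 1), (2, 1, 2), (3, 0, 2)], 3) returns (3, [(0, 1), (1, 2)]): the two cheapest edges are taken and the edge (0, 2) is rejected because it would close a cycle.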