Cluster computing

Thursday, June 19, 2014

In the previous post, we described a Chinese Whispers clustering algorithm. We now study the algorithm with two different approaches. The first approach is what we had discussed earlier as using small world edge weights.The outcome of this is similar to the min-cut algorithm where dense regions are considered clusters. But unlike min-cut there is no hierarchical clustering and instead gives a flat partition. It does not require any threshold as input parameter and is more efficient. The second approach is based on Matrix operations. This algorithm can be considered a special case of Markov Chain Clustering. MCL is the parallel simulation of all possible random walks up to a finite length on a graph G. Walks will terminate usually in the same cluster that they started from. This is an observation that comes from MCL. MCL simulates flow in the graph by repeatedly updating transition probabilities between all nodes eventually converging to a transition matrix after k steps. The updates to the transition probabilities between all nodes is done in two steps and then these two steps are iterated. The first step is the expansion step. This step involves the matrix multiplication of the graph matrix with the current transition matrix. The second step is the inflation step where the contrast between small and large transition probabilities is increased and normalized to unity thus providing a way to segment the matrix. The inflation step occurs in column wise operations.
The k-matrix multiplications of the expansion step has a significant cost.
In the early stages of the iterations, the matrices operated on are dense. But as the iterations proceed and with the help of a strong inflation operator, matrices in the later steps tend to be sparse. As an optimization the inflation step cost can be lowered by using only a few entries per column and can even be reduced to a single largest value.
The Chinese Whispers algorithm can now be described in matrix operations as follows:
Lets start with the identity matrix of dimension n
for each of the t iterations
we run a maxrow operator on all entries of a row to zero except the largest entry which is set to 1
then we multiply this resulting matrix with the adjacency matrix of the graph and use the result as the input to the next iteration.
Since the time complexity is dependent on the n non-zero entries in the result of the max-row operation, in the worst case this could equal the time complexity of MCL which is a fully connected graph.

Cluster computing

Thursday, June 19, 2014

No comments:

Post a Comment