Thursday, June 19, 2014

We continue our discussion of the Chinese Whispers graph clustering algorithm. It is a bottom-up clustering algorithm in which every node starts out with its own class label, and nodes then coalesce to the strongest class label in their neighborhood. The strongest label is the one whose neighbors contribute the maximal sum of edge weights. The clustering usually proceeds for a small number of iterations. Clustering a graph this way differs from clustering vectors because there is no distance metric and consequently no centroid or average; similarity is measured purely in terms of edges. Chinese Whispers is often used with small world graphs. A small world graph is one in which most nodes are not neighbors of each other, yet every node can be reached from every other in a small number of hops. As an example of a small world graph, consider a set of nodes on the circumference of a circle.
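The update rule described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the adjacency representation, iteration count, seeding, and tie-breaking by dictionary order are all assumptions made for the example.

```python
import random
from collections import defaultdict

def chinese_whispers(nodes, edges, iterations=10, seed=0):
    """Sketch of Chinese Whispers clustering.

    nodes: iterable of hashable node ids.
    edges: dict mapping node -> {neighbor: edge weight}.
    Returns a dict mapping node -> class label.
    """
    rng = random.Random(seed)
    labels = {n: n for n in nodes}   # every node starts in its own class
    order = list(labels)
    for _ in range(iterations):
        rng.shuffle(order)           # visit nodes in random order
        for n in order:
            if not edges.get(n):
                continue             # isolated node: keep its own label
            # Sum edge weights per neighboring class label.
            weight_per_label = defaultdict(float)
            for neighbor, w in edges[n].items():
                weight_per_label[labels[neighbor]] += w
            # Adopt the strongest neighborhood label.
            labels[n] = max(weight_per_label, key=weight_per_label.get)
    return labels

# Toy example: two triangles joined by one weak edge. The strong
# intra-triangle weights pull each triangle into a single class,
# while the weak bridge is never strong enough to merge them.
graph = defaultdict(dict)
for a, b, w in [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
                (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
                (2, 3, 0.1)]:
    graph[a][b] = w
    graph[b][a] = w
labels = chinese_whispers(range(6), graph, iterations=10, seed=42)
```

After a few iterations each triangle carries a single class label, and the two triangles keep distinct labels because the bridge weight of 0.1 never outweighs the intra-triangle sums.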
When clustering small world graphs, a random mutation rate is sometimes introduced that assigns nodes brand-new classes, with a probability that decreases over the iterations. This greatly improves convergence in the early iterations. If we take a graph of eleven nodes arranged as two sub-graphs with a few edges connecting them to form a small world graph, this technique converges the class labeling in just two iterations, because the mutation acts as a perturbation that shakes out premature labelings and facilitates the proper one.
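One way to fold such a mutation into a single node update is sketched below. The decay schedule `base_rate / (iteration + 1)` and the default rate are illustrative assumptions, not values given in the text.

```python
import random
from collections import defaultdict

def update_label(node, edges, labels, iteration, fresh_label,
                 base_rate=0.3, rng=random):
    """One label update for `node`, with a decaying mutation step.

    With probability base_rate / (iteration + 1) the node is assigned
    the brand-new class `fresh_label` instead of its strongest
    neighborhood class; otherwise the usual Chinese Whispers rule
    applies. `edges` maps node -> {neighbor: weight}, `labels` maps
    node -> current class.
    """
    if rng.random() < base_rate / (iteration + 1):
        return fresh_label           # mutation: start a new class
    weight_per_label = defaultdict(float)
    for neighbor, w in edges.get(node, {}).items():
        weight_per_label[labels[neighbor]] += w
    if not weight_per_label:
        return labels[node]          # isolated node keeps its label
    return max(weight_per_label, key=weight_per_label.get)
```

In a full run one would call this per node per iteration, minting a fresh label (e.g. from a counter) each time the mutation fires.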
The algorithm does not cross component boundaries, because there are no edges between nodes belonging to different components. In particular, nodes that are not connected by any edges are effectively ignored by the algorithm, which can leave a portion of the nodes unclustered.
We also see that the algorithm does not always converge, as in the case of a tie: a node whose neighboring classes have equal edge-weight sums could be assigned one class label or the other, and may keep flipping between them. Such ties are much less likely when the edges carry distinct weights. In practice, the class labels change little after the first few iterations.
Note that this simple technique takes no parameters and produces hard class labels. If soft labels are required, we can assign a class distribution instead; this can be done as a final step by taking a weighted distribution of the hard labels over each node's neighborhood.
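That final soft-labeling step might look like the following sketch, which normalizes the edge weights pointing at each neighboring class into a distribution; treating an isolated node as fully committed to its own hard label is an assumption of this example.

```python
from collections import defaultdict

def soft_labels(node, edges, hard_labels):
    """Class distribution for `node` from its neighbors' hard labels.

    edges: dict mapping node -> {neighbor: edge weight}.
    hard_labels: dict mapping node -> hard class label.
    Returns a dict mapping class label -> probability.
    """
    totals = defaultdict(float)
    for neighbor, w in edges.get(node, {}).items():
        totals[hard_labels[neighbor]] += w
    z = sum(totals.values())
    if z == 0:
        # Isolated node: all mass on its own hard label (assumption).
        return {hard_labels[node]: 1.0}
    return {label: w / z for label, w in totals.items()}
```

For a node with a weight of 3.0 toward class 'a' and 1.0 toward class 'b', this yields the distribution {'a': 0.75, 'b': 0.25}.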
