Friday, November 29, 2013

Since we talked about the DBSCAN clustering method in the previous post, let's quickly revisit agglomerative hierarchical clustering as well. Hierarchical clustering algorithms are an important category of clustering methods, and they come in two flavors: agglomerative and divisive. Both are often visualized with a tree-like diagram called a dendrogram, which shows the cluster and sub-cluster relationships and the order in which the clusters were merged or split. For sets of two-dimensional points, a hierarchical clustering can also be illustrated using a nested cluster diagram.
The agglomerative algorithm proceeds like this (a code sketch follows the steps):
1. Compute the proximity matrix, if necessary.
2. Repeat:
3. Merge the two closest clusters.
4. Update the proximity matrix to reflect the proximity between the new cluster and the original clusters.
5. Until only one cluster remains.
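Here is a minimal sketch of this loop in Python, assuming NumPy is available and using single link (defined below) as the merge criterion. The function name agglomerative and the deliberately naive nested scan are illustrative, not an optimized implementation.

import numpy as np

def agglomerative(points):
    """Merge the closest pair of clusters until one cluster remains.
    Returns the merge order as a list of (cluster_a, cluster_b) tuples."""
    points = np.asarray(points, dtype=float)
    # Step 1: compute the proximity (distance) matrix.
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    clusters = {i: [i] for i in range(len(points))}
    merges = []
    while len(clusters) > 1:
        # Step 3: find the closest two clusters (single link: minimum
        # pairwise distance between members of the two clusters).
        best = None
        for a in clusters:
            for b in clusters:
                if a < b:
                    d = min(dist[i, j] for i in clusters[a] for j in clusters[b])
                    if best is None or d < best[0]:
                        best = (d, a, b)
        _, a, b = best
        # Step 4: merging the member lists implicitly updates proximities,
        # since cluster distances are recomputed from members on each pass.
        clusters[a].extend(clusters.pop(b))
        merges.append((a, b))
    return merges

merges = agglomerative([[0, 0], [0, 1], [5, 5], [5, 6], [10, 0]])
print(merges)  # e.g. [(0, 1), (2, 3), ...] depending on the data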
Step 3 can be done using the single link, complete link, or group average methods.
Single link defines the proximity of two clusters as the minimum distance between two points, one from each cluster (the nearest cross-cluster pair). Complete link uses the maximum distance between two points, one from each cluster (the farthest cross-cluster pair). Group average takes the average of the pairwise distances over all pairs of points drawn from the two different clusters.
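As a rough sketch, the three criteria can be written as standalone functions over two clusters given as arrays of points. The helper name pairwise and the use of Euclidean distance are assumptions for illustration.

import numpy as np

def pairwise(c1, c2):
    """All pairwise distances between points of cluster c1 and cluster c2."""
    c1, c2 = np.asarray(c1, float), np.asarray(c2, float)
    return np.linalg.norm(c1[:, None, :] - c2[None, :, :], axis=-1)

def single_link(c1, c2):    # minimum distance over all cross-cluster pairs
    return pairwise(c1, c2).min()

def complete_link(c1, c2):  # maximum distance over all cross-cluster pairs
    return pairwise(c1, c2).max()

def group_average(c1, c2):  # mean of all cross-cluster pairwise distances
    return pairwise(c1, c2).mean()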
Proximity can also be measured in terms of the distance between the centroids of the clusters. Ward's method also uses centroids, but it measures the proximity between two clusters in terms of the increase in the sum of squared errors (SSE) that results from merging them. SSE gives an indication of cluster quality, and Ward's method attempts to minimize the SSE of points from their cluster centroids.
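A small illustration of Ward's criterion, assuming Euclidean distance: the increase in SSE caused by a merge can be computed directly from the points of the two clusters. The function names sse and ward_increase are illustrative, not from any particular library.

import numpy as np

def sse(cluster):
    """Sum of squared distances of points from their cluster centroid."""
    cluster = np.asarray(cluster, float)
    return ((cluster - cluster.mean(axis=0)) ** 2).sum()

def ward_increase(c1, c2):
    """Increase in SSE caused by merging clusters c1 and c2."""
    merged = np.vstack([c1, c2])
    return sse(merged) - sse(c1) - sse(c2)

c1 = [[0, 0], [0, 1]]
c2 = [[5, 5], [5, 6]]
print(ward_increase(c1, c2))  # large increase: merging distant clusters hurts SSE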
Because the algorithm stores a proximity matrix over m points, the space complexity is O(m^2). Each of the m-1 merge iterations scans and updates the proximity matrix, so the time complexity of the naive implementation is O(m^3). With data structures such as a heap to organize the proximities efficiently, this can be reduced to O(m^2 log m).
This kind of clustering does not globally optimize an objective function. Instead, it uses the criteria described above to decide which clusters to merge (or split) at each step. Each local decision is cheap to compute, but the overall time and space complexity of the run is high.
