Saturday, November 30, 2013

bisecting k-means

A bisecting k-means algorithm overcomes some of the limitations with the choice of centroids and finding the minimum SSE. It is based on the idea that to obtain k clusters, split the set of all points into two clusters, select one of these clusters to split, and so on until K clusters have been produced.
The steps for the bisecting K-means algorithm include the following:
1. Initialize the list of clusters with one that contains all the points
2. repeat
3. Remove a cluster from the list of clusters
4. Perform several trial bisections of the chosen cluster
5. for i - 1 to number of trials do
6.    Bisect the selected cluster using basic k-means
7 end for
8. Select the two clusters from the bisection with the lowest total SSE
9. Add these two clusters to the list of clusters
10. until the list of clusters contains K clusters.
There are a number of ways to choose which cluster to split. We can choose the largest cluster at each split or the one with the largest SSE or both.
Further, we can refine the resulting clusters by using their centroids as the initial centroids for the basic K-means algorithm.

1 comment: