Friday, May 3, 2013

A clustering method for finding keywords in a text

Given a distance function that measures how similar two terms are, we build a tree of clusters. For each term (record r), we traverse the tree to find the cluster with the nearest center, then recompute that cluster's center as if r were inserted into it. If the updated cluster stays within the cluster threshold, r is absorbed and we proceed to the next record; otherwise r starts a new cluster. Because we want to keep only a few clusters, if the tree grows beyond a maximum number of clusters we increase the threshold, so that clusters can be merged or can accommodate more records.
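A minimal Python sketch of this procedure, under several assumptions that are not in the original: terms are represented as numeric vectors (for example co-occurrence counts), the distance is Euclidean, a cluster's center is the mean of its members, and the threshold bounds the cluster radius (the largest member-to-center distance). It keeps a flat list of clusters rather than a tree so the threshold logic stays visible, and when there are too many clusters it simply multiplies the threshold by a hypothetical `growth` factor and re-clusters all records, a simplification of merging clusters in place. The names `euclidean`, `mean`, `single_pass` and `cluster_terms` are illustrative only.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Stand-in term distance (assumption): Euclidean distance between term vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points: List[Vector]) -> List[float]:
    """Cluster center: the coordinate-wise mean of the member vectors."""
    return [sum(coords) / len(points) for coords in zip(*points)]


def single_pass(records: List[Vector], threshold: float, max_clusters: int,
                distance: Callable[[Vector, Vector], float]) -> Optional[List[List[Vector]]]:
    """One pass over the records; returns None as soon as max_clusters is exceeded."""
    clusters: List[List[Vector]] = []
    centers: List[List[float]] = []
    for r in records:
        if clusters:
            # Find the cluster whose center is nearest to r.
            i = min(range(len(centers)), key=lambda k: distance(r, centers[k]))
            # Recompute the center as if r were inserted into that cluster.
            members = clusters[i] + [r]
            new_center = mean(members)
            radius = max(distance(m, new_center) for m in members)
            if radius <= threshold:
                # Within the threshold: absorb r and move on to the next record.
                clusters[i], centers[i] = members, new_center
                continue
        # Threshold exceeded (or no clusters yet): r starts a new cluster.
        clusters.append([r])
        centers.append(list(r))
        if len(clusters) > max_clusters:
            return None
    return clusters


def cluster_terms(records: Sequence[Vector], threshold: float, max_clusters: int,
                  distance: Callable[[Vector, Vector], float] = euclidean,
                  growth: float = 2.0) -> Tuple[List[List[Vector]], float]:
    """Raise the threshold and re-cluster until at most max_clusters remain."""
    if threshold <= 0 or growth <= 1 or max_clusters < 1:
        raise ValueError("need threshold > 0, growth > 1 and max_clusters >= 1")
    records = list(records)
    while True:
        clusters = single_pass(records, threshold, max_clusters, distance)
        if clusters is not None:
            return clusters, threshold
        threshold *= growth
```

For example, with the one-dimensional vectors `[[0.0], [0.1], [5.0], [5.2], [10.0]]`, `threshold=0.5` and `max_clusters=2`, the threshold doubles three times to 4.0, leaving the clusters `[[0.0], [0.1], [5.0], [5.2]]` and `[[10.0]]`. The loop always terminates: once the threshold reaches the largest distance between any two records, every record fits into a single cluster.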

B+ Tree:
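class Node:
    def __init__(self, data, l=None, r=None, center=None):
        self.data = data        # records stored in this node
        self.l = l              # left child / neighbor
        self.r = r              # right child / neighbor
        self.next = None        # sibling link, as in a B+ tree leaf chain
        if l is not None:
            l.next = r          # chain the left neighbor to the right one
        self.center = center    # cluster center used for the nearest-center search

    def value(self):
        return self.center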
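To show how such a tree might be traversed to place a term, here is a minimal sketch built on a hypothetical `ClusterNode` class modeled on the Node class above. It is a simplified binary tree, not a full B+ tree (no `next` links, no node splitting), and it assumes term vectors, Euclidean distance, non-empty leaves, and that each node's center is the mean of all records beneath it. The insert descends to the child with the nearest center at every level, adds the record to that leaf, and recomputes the centers along the path.

```python
import math
from typing import List, Optional


def euclidean(a: List[float], b: List[float]) -> float:
    """Stand-in term distance (assumption): Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class ClusterNode:
    """Hypothetical binary cluster-tree node; leaves hold the records."""

    def __init__(self, l: "Optional[ClusterNode]" = None,
                 r: "Optional[ClusterNode]" = None,
                 records: Optional[List[List[float]]] = None):
        self.l = l
        self.r = r
        self.records = records if records is not None else []  # used by leaves only
        self.center: List[float] = []
        self.recompute()

    def is_leaf(self) -> bool:
        return self.l is None and self.r is None

    def all_records(self) -> List[List[float]]:
        """Every record stored in the subtree rooted at this node."""
        if self.is_leaf():
            return list(self.records)
        out: List[List[float]] = []
        for child in (self.l, self.r):
            if child is not None:
                out.extend(child.all_records())
        return out

    def recompute(self) -> None:
        """Center = coordinate-wise mean of all records in the subtree."""
        pts = self.all_records()
        self.center = [sum(c) / len(pts) for c in zip(*pts)] if pts else []


def insert(root: ClusterNode, record: List[float]) -> ClusterNode:
    """Descend toward the nearest center, store the record, update centers."""
    path = [root]
    node = root
    while not node.is_leaf():
        children = [c for c in (node.l, node.r) if c is not None]
        node = min(children, key=lambda c: euclidean(record, c.center))
        path.append(node)
    node.records.append(record)
    for n in reversed(path):  # bottom-up, so parents see updated children
        n.recompute()
    return node
```

Recomputing a center from every record below it is O(n) per node and is only meant to keep the sketch short; a practical version would keep running summaries (record count and coordinate sums) in each node so a center can be updated in constant time.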
