Tuesday, May 14, 2013

Here we discuss an implementation, building on previous posts, for finding topics from a set of keywords. Suppose we have a function similar() that, given a word, returns the set of words that co-occur with it in a language corpus. Suppose also that we have selected a set of keyword candidates, W.
For each word in W, we find its similar co-occurring words and put them in a cluster. Each cluster has a root keyword, with its similar words as leaves. When two clusters share common words, they are merged, so clusters are additive. The root of the merged cluster is the combination of the root words of the individual clusters, and likewise its leaves are the combination of their leaves. A merged cluster may in turn overlap with another cluster, so we may have to iterate several times until no pair of clusters shares a word.
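
The merging loop described above can be sketched in Python roughly as follows. This is only a minimal illustration under a few assumptions that the post does not spell out: similar() is passed in as a parameter, the names build_clusters, TOY_COOCCURRENCE and toy_similar are hypothetical, the toy table stands in for a real corpus, and two clusters are taken to "share common words" when any root or leaf of one appears among the roots or leaves of the other.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

# A cluster is a pair: (root keywords, leaf words).
Cluster = Tuple[Set[str], Set[str]]


def build_clusters(keywords: Iterable[str],
                   similar: Callable[[str], Set[str]]) -> List[Cluster]:
    """Start with one cluster per keyword, then merge overlapping clusters."""
    clusters: List[Cluster] = [({w}, set(similar(w))) for w in keywords]

    merged = True
    while merged:  # repeat until a full scan finds no overlapping pair
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                roots_i, leaves_i = clusters[i]
                roots_j, leaves_j = clusters[j]
                # Assumption: overlap is tested on roots and leaves together.
                if (roots_i | leaves_i) & (roots_j | leaves_j):
                    # Merged roots and leaves are the unions of the originals.
                    clusters[i] = (roots_i | roots_j, leaves_i | leaves_j)
                    del clusters[j]
                    merged = True
                    break
            if merged:
                break  # the list changed, so restart the scan
    return clusters


# Hypothetical co-occurrence table standing in for a real corpus lookup.
TOY_COOCCURRENCE: Dict[str, Set[str]] = {
    "cat": {"pet", "fur"},
    "dog": {"pet", "bark"},
    "wolf": {"bark", "pack"},
    "stock": {"market", "price"},
}


def toy_similar(word: str) -> Set[str]:
    """Illustrative stand-in for similar(); returns co-occurring words."""
    return TOY_COOCCURRENCE.get(word, set())


if __name__ == "__main__":
    for roots, leaves in build_clusters(["cat", "dog", "wolf", "stock"], toy_similar):
        print(sorted(roots), sorted(leaves))
```

In this toy run, "cat" and "dog" share "pet", and the merged cluster then shares "bark" with "wolf", which is why a single pass is not enough. The pairwise rescan is quadratic per pass; for a large W, a union-find structure keyed on words would be one alternative, though that is a design choice outside what the post describes.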
