Saturday, September 6, 2014

In the post before last, I mentioned applying centrality to keyword extraction. There we said we could find the edges of the graph based on mutual information. Mutual information is based on co-occurrence, which helps when the words appear in the same form. But writers seldom repeat their words; they convey their meaning through synonyms and elaborate language. In such cases, the function of a keyword is as important as its form. Therefore, we could use alternative metrics for pairwise relationships. One such metric could be based on an ontology: we look up two words to see whether they are similar. With WordNet, for example, we get a distance metric. Such a metric adds more relevance to the edges of the graph. Synonyms and antonyms could be given a constant measure that differs from the default for unknown words. By adding a metric for meaning in addition to co-occurrence, we can improve the ranking we get from centrality.
    // word1 and word2 are the two keywords being compared
    SemanticSimilarity semsim = new SemanticSimilarity();
    float score = semsim.GetScore(word1, word2);

Here is a link describing this approach: http://www.codeproject.com/Articles/11835/WordNet-based-semantic-similarity-measurement
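
To make the idea of blending co-occurrence with a meaning-based score more concrete, here is a minimal Python sketch. It rests on several assumptions that are not part of the post: the synonym and antonym tables are a hypothetical stand-in for a real WordNet lookup, the constant scores and the `alpha` blending factor are arbitrary, co-occurrence is measured per sentence with positive pointwise mutual information, and weighted degree is used as the centrality measure. Treat it as one possible illustration rather than the method itself.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical stand-in for an ontology lookup such as WordNet:
# constant scores for known synonym/antonym pairs, a default for all others.
SYNONYM_SCORE = 1.0
ANTONYM_SCORE = 0.5
DEFAULT_SCORE = 0.0

SYNONYMS = {frozenset(("car", "automobile"))}
ANTONYMS = {frozenset(("fast", "slow"))}


def semantic_score(w1, w2):
    """Return a constant similarity for known pairs, a default otherwise."""
    pair = frozenset((w1, w2))
    if pair in SYNONYMS:
        return SYNONYM_SCORE
    if pair in ANTONYMS:
        return ANTONYM_SCORE
    return DEFAULT_SCORE


def pmi_scores(sentences):
    """Positive pointwise mutual information from sentence-level co-occurrence."""
    n = len(sentences)
    word_count = Counter()
    pair_count = Counter()
    for sentence in sentences:
        words = set(sentence)
        word_count.update(words)
        pair_count.update(frozenset(p) for p in combinations(sorted(words), 2))
    scores = {}
    for pair, joint in pair_count.items():
        a, b = tuple(pair)
        pmi = math.log((joint * n) / (word_count[a] * word_count[b]))
        scores[pair] = max(pmi, 0.0)  # clip negative associations to zero
    return scores


def edge_weights(sentences, alpha=0.5):
    """Blend co-occurrence (PMI) with the semantic score for every word pair."""
    pmi = pmi_scores(sentences)
    vocab = sorted({w for s in sentences for w in s})
    weights = {}
    for a, b in combinations(vocab, 2):
        pair = frozenset((a, b))
        w = alpha * pmi.get(pair, 0.0) + (1 - alpha) * semantic_score(a, b)
        if w > 0:  # keep only pairs with some evidence of a relationship
            weights[pair] = w
    return weights


def weighted_degree_centrality(weights):
    """Sum the weights of the edges incident on each word."""
    centrality = Counter()
    for pair, w in weights.items():
        for word in pair:
            centrality[word] += w
    return dict(centrality)


# Toy input: each sentence is already tokenized into candidate keywords.
sentences = [
    ["car", "fast", "engine"],
    ["automobile", "engine", "repair"],
    ["car", "slow", "traffic"],
]
weights = edge_weights(sentences)
centrality = weighted_degree_centrality(weights)
ranking = sorted(centrality.items(), key=lambda kv: -kv[1])
for word, score in ranking:
    print(word, round(score, 3))
```

In this toy input, "car" and "automobile" never appear in the same sentence, so co-occurrence alone would leave them unconnected; the semantic score is what puts an edge between them.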

One caveat: this is not a substitute for clustering or mining algorithms. For more on mining algorithms, see http://msdn.microsoft.com/en-us/library/ms175595.aspx

They include the following types of algorithms:
Classification algorithms
Regression algorithms
Segmentation algorithms
Association algorithms
Sequence analysis algorithms
