Tuesday, October 22, 2013

We briefly discuss other synonym extraction methods. One is the Distance method and another is the ArcRank method. In the distance method, one possible way of defining  a synonym distance is to see if the words appear in the definitions of many common words or have many common words in their definition. A way of formalizing this is to count the number of words that appear in one of the definitions but not in both in which case the distance can be expressed as |Ai - Aj| + |Aj - Ai| where A is the adjacency matrix.
Unlike other methods presented in this chapter, we can apply this algorithm directly to the entire dictionary graph rather than the neigborhood graph. However, it has been found that better results are obtained when the neighborhood graph is used.
ArcRank is a method for building a thesaurus. The intent with the approach was not to find synonyms but related words. Both the ArcRank method and the method discussed in the previous post had used the 1913 Webster dictionary for source. Here a pagerank is associated with each vertex of the dictionary graph in the following way: All vertices start with identical initial ranking and then iteratively distribute it to the vertices they point to. And they receive the sum of ranks from the vertices they point to. Here too the normalized ranking converges to the principal eigen- vector of the adjacency matrix of the graph so there is a natural order in the ranking. The algorithm is slightly modified so that the edge cases don't have extreme values i.e nodes with no incoming edges or outgoing edges. If a were the number of outgoing edges and pi the page rank of vertex i, the edge relevance between s and t is defined as rs,t = (ps/as)/pt
These edge relevances are then coverted to rankings which are computed only once. Related words for a word w can then be found by selecting the edges with the best ranking and extract the incident vertices.

No comments:

Post a Comment