Friday, September 30, 2016

Yesterday we were discussing the role of a thesaurus in natural language processing.
The essence of using a thesaurus is to build, for every significant word of the text, a list of the heads under which it appears, and to look for intersections between these lists. Candidates that are poor translations are replaced with words or phrases selected from the head that contains the most words from the initial translation. In addition, the thesaurus has been used for word sense disambiguation.
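As a rough sketch of this head-intersection idea (the headsOf table, the head numbers, and the sample words below are all invented for illustration, not taken from any particular thesaurus edition):

using System;
using System.Collections.Generic;
using System.Linq;

class HeadIntersection
{
    static void Main()
    {
        // Hypothetical lookup: each significant word mapped to the thesaurus heads it appears under.
        var headsOf = new Dictionary<string, HashSet<int>>
        {
            { "bank",  new HashSet<int> { 234, 802 } },
            { "money", new HashSet<int> { 800, 802 } },
            { "cash",  new HashSet<int> { 802, 560 } }
        };

        // Count, for every head, how many of the text's words fall under it.
        var words = new[] { "bank", "money", "cash" };
        var headCounts = new Dictionary<int, int>();
        foreach (var w in words)
            foreach (var h in headsOf[w])
            {
                headCounts.TryGetValue(h, out var c);
                headCounts[h] = c + 1;
            }

        // The head shared by the most words is the strongest common context,
        // and is where replacement words or phrases would be drawn from.
        var bestHead = headCounts.OrderByDescending(kv => kv.Value).First().Key;
        Console.WriteLine($"Head with the largest overlap: {bestHead}");   // 802 in this toy data
    }
}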
Using the thesaurus, we can also build lexical chains, which capture strong semantic relations between the words they are made up of. Morris and Hirst used them to calculate lexical cohesion, Stairman automated this process, and Ellman used lexical chains to reconstruct a representation of a text's meaning from its content.
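A minimal sketch of greedy chaining under that view, where a word joins a chain if it shares a thesaurus head with a word already in the chain (the headsOf table below is again hypothetical):

using System;
using System.Collections.Generic;
using System.Linq;

class LexicalChains
{
    static void Main()
    {
        // Hypothetical word-to-heads lookup.
        var headsOf = new Dictionary<string, HashSet<int>>
        {
            { "car",    new HashSet<int> { 272 } },
            { "wheel",  new HashSet<int> { 272, 312 } },
            { "engine", new HashSet<int> { 272 } },
            { "music",  new HashSet<int> { 415 } }
        };

        // Greedy chaining: a word joins the first chain that shares a head with it.
        var chains = new List<List<string>>();
        foreach (var word in new[] { "car", "wheel", "music", "engine" })
        {
            var chain = chains.FirstOrDefault(c => c.Any(w => headsOf[w].Overlaps(headsOf[word])));
            if (chain != null) chain.Add(word);
            else chains.Add(new List<string> { word });
        }

        foreach (var c in chains)
            Console.WriteLine(string.Join(" -> ", c));   // car -> wheel -> engine, and music alone
    }
}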
As an aside, WordNet has helped tremendously with word sense disambiguation, and it has been adapted to different languages. Due to the limitations of the printed version of the thesaurus, many have turned to WordNet. But, Jarmasz argued in his thesis, if the thesaurus were machine tractable and free, it would be better than WordNet. Further, the two can be combined.
It is not straightforward to use the lexical chain words as vector features, because they would lose the cohesion in the chain. Similarly, pairwise word distances cannot be used, because we do not have the relationships between the words to select them as candidates before looking them up in the thesaurus. However, k-means clustering applies directly to the words once they are looked up in the thesaurus: cluster centroids can be found, and then every candidate can be tagged to its nearest cluster.
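A sketch of that last tagging step, with made-up centroids and word vectors standing in for the output of k-means over thesaurus-derived representations:

using System;
using System.Collections.Generic;
using System.Linq;

class NearestCentroid
{
    // Squared Euclidean distance between two vectors of equal length.
    static double Dist2(double[] a, double[] b) => a.Zip(b, (x, y) => (x - y) * (x - y)).Sum();

    static void Main()
    {
        // Made-up centroids, standing in for the output of the clustering step.
        var centroids = new List<double[]> { new[] { 0.0, 0.0 }, new[] { 5.0, 5.0 } };

        // Made-up candidate word vectors.
        var candidates = new Dictionary<string, double[]>
        {
            { "stream", new[] { 0.5, 0.2 } },
            { "cash",   new[] { 4.8, 5.1 } }
        };

        // Tag every candidate with the index of its nearest centroid.
        foreach (var kv in candidates)
        {
            int nearest = Enumerable.Range(0, centroids.Count)
                                    .OrderBy(i => Dist2(kv.Value, centroids[i]))
                                    .First();
            Console.WriteLine($"{kv.Key} -> cluster {nearest}");
        }
    }
}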
The main difference between the thesaurus and the bag-of-words model is that the former selects words based on semantic relationships while the latter selects words based on collocation in contexts.
That said, we can construct vectors from semantically correlated words just as we would from the text.
Levy and Goldberg showed that word embeddings trained with negative sampling are closely related to Pointwise Mutual Information, and PMI is equally applicable to semantically correlated words.
While negative sampling chooses random contexts, we can substitute lexical chains up to a finite number. Further, if we had a technique to use semantic cluster centroids, we could use them directly instead of lexical chains.
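For reference, the PMI of a word-context pair can be estimated from co-occurrence counts as PMI(w, c) = log(count(w, c) * N / (count(w) * count(c))), where N is the total number of pairs. A toy sketch with invented counts:

using System;
using System.Collections.Generic;
using System.Linq;

class Pmi
{
    static void Main()
    {
        // Invented co-occurrence counts of (word, context) pairs.
        var pairCounts = new Dictionary<(string w, string c), double>
        {
            { ("bank", "money"), 30 },
            { ("bank", "river"), 10 },
            { ("tree", "river"), 20 },
            { ("tree", "money"),  1 }
        };

        double total = pairCounts.Values.Sum();
        var wordCounts = pairCounts.GroupBy(p => p.Key.w).ToDictionary(g => g.Key, g => g.Sum(p => p.Value));
        var ctxCounts  = pairCounts.GroupBy(p => p.Key.c).ToDictionary(g => g.Key, g => g.Sum(p => p.Value));

        foreach (var kv in pairCounts)
        {
            // PMI compares the observed joint count with what independence would predict.
            double pmi = Math.Log(kv.Value * total / (wordCounts[kv.Key.w] * ctxCounts[kv.Key.c]));
            Console.WriteLine($"PMI({kv.Key.w}, {kv.Key.c}) = {pmi:F3}");
        }
    }
}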

#codingexercise
For a given array a of size n, we create the set of a[i], a[a[i]], a[a[a[i]]], ... for each i from 0 to n-1. Find the maximum size of such a set.
Let us take an array with two elements, a = [1, 0]. Then we can create the sequences
a[0], a[a[0]], a[a[a[0]]], ...
a[1], a[a[1]], a[a[a[1]]], ...
which are
1, 0, 1, 0, ...
0, 1, 0, 1, ...
As a set, each of these collapses to {1, 0}, so the maximum size here is 2.
// Prints the nested sequence A[i], A[A[i]], A[A[A[i]]], ... for each starting index i.
// A visited set stops each chain once an index repeats, so the recursion terminates.
// Call as: PrintNested(A, 0, 0, new HashSet<int>());
void PrintNested(List<int> A, int i, int index, HashSet<int> visited)
{
    if (i < A.Count)
    {
        if (index < A.Count && !visited.Contains(index))
        {
            visited.Add(index);
            Console.Write(A[index] + " ");
            PrintNested(A, i, A[index], visited);
        }
        else
        {
            Console.WriteLine();
            PrintNested(A, i + 1, i + 1, new HashSet<int>());
        }
    }
}
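To actually answer the exercise, i.e. the maximum size of such a set, one sketch, assuming as in the example that every value is a valid index into the array, is to count how many distinct indices each chain visits before an index repeats (written in the same style as the routine above):

// Returns the largest number of distinct elements in a[i], a[a[i]], ... over all i.
int MaxSetSize(int[] a)
{
    int best = 0;
    for (int i = 0; i < a.Length; i++)
    {
        // Follow i, a[i], a[a[i]], ... until an index repeats; the values seen form the set.
        var seen = new HashSet<int>();
        int index = i;
        while (seen.Add(index))
            index = a[index];
        best = Math.Max(best, seen.Count);
    }
    return best;
}

// MaxSetSize(new[] { 1, 0 })       returns 2
// MaxSetSize(new[] { 2, 0, 1, 3 }) returns 3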

It may be interesting to note that the nested sequence a[i], a[a[i]], ... can go on as an infinite series, since the indices eventually cycle; that is why the routines above stop once an index repeats.
