Today we present a summary of the paper titled "Embedding a Semantic Network in a Word Space" by Johansson and Nieto Piña.
Sense vectors that represent word meanings (taken from a semantic network such as a thesaurus) are embedded in the same space as ordinary word vectors. Words with many meanings are represented by vectors that can be described as combinations of their individual sense vectors, and each sense vector is kept similar to its neighbours in the network. This leads to a constrained optimization problem, in which the distance function is the straight-line (Euclidean) distance, squared.
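In symbols (notation mine, not lifted from the paper), the distance between two vectors x and y is the squared Euclidean distance, \( d(x, y) = \lVert x - y \rVert^2 = \sum_d (x_d - y_d)^2 \).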
To give an example of the senses of a word, the word 'mouse' may refer to a 'rat' or an 'electronic device'. Furthermore, a 'mouse' is a 'rodent', so it has prominent 'teeth'. The purpose of this vectorization is to find a vector that can represent 'mouse' in the rodent sense.
This paper embeds a graph structure in the word space, drawing on two sources of information: corpus statistics and network structure. The resulting embeddings were shown to work well in a classifier that creates lexical units for FrameNet frames. FrameNet is a collection of over 170,000 manually annotated sentences, which provides a unique training dataset for role labeling.
The senses are chosen based on a number of hypernyms and hyponyms from WordNet, although a thesaurus could be used instead.
The algorithm maps each sense sij to a sense embedding, a real-valued vector E(sij) in the same vector space as the lemma embeddings. The lemma and sense embeddings are related through a mix constraint: the lemma embedding is expressed as a combination of its sense embeddings, with the occurrence probabilities of the senses as weights. This captures the fact that when a word has more than one sense, the contexts we count to form its lemma vector are a mixture of the contexts of its individual senses. As a bonus, the mix weights disambiguate the word. The motivation behind the paper is now formalized as minimizing the sum of distances between each sense and its neighbours, while satisfying the mix constraint for each lemma.
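Written out (notation reconstructed from the description above, not copied from the paper: pij is the occurrence probability of sense sij, nijk the k-th neighbour of sij, wijk its weight, and F(li) the lemma embedding), the problem is

\min \sum_{i,j,k} w_{ijk} \, \lVert E(s_{ij}) - E(n_{ijk}) \rVert^2 \quad \text{subject to} \quad F(l_i) = \sum_j p_{ij} \, E(s_{ij}), \qquad \sum_j p_{ij} = 1, \quad p_{ij} \ge 0.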
The mix constraint keeps the lemma true to the text, while the minimization ties the senses to the graph. It also leaves words with a single meaning unchanged.
The algorithm can now be described in the following way. It proceeds in an online fashion by considering one lemma at a time.
1) Pick a lemma and select its senses from the semantic network.
2) Adjust the embeddings of the senses, as well as their mix, in order to minimize the loss function: the weighted squared distance between each sense and its neighbour, summed over all such pairs. The embeddings of the neighbours are kept fixed.
3) Iterate through all the lemmas, either for a fixed number of passes or until the change falls below a threshold.
The improvement in each iteration does not need to be gradient based, because a closed-form solution exists. We already know that the weighted centroid of each sense's neighbours is its likely target. The residual is therefore the difference between the lemma embedding and the weighted linear combination of those centroids, and each sense embedding is its centroid plus a weighted share of the residual. If a sense's mix weight is zero, its embedding is exactly the centroid, which is to say it is determined completely by the neighbours; if the weight is 1, it is exactly the embedding of the lemma.
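Here is a minimal NumPy sketch of that closed-form step (my reconstruction from the description above; the function and variable names are mine, the per-sense neighbour centroids and mix weights are assumed to be computed beforehand, and the accompanying update of the mix weights themselves is omitted):

import numpy as np

def update_sense_embeddings(F_l, centroids, p, v):
    """One online update for a single lemma, with neighbour embeddings fixed.

    F_l       -- lemma embedding F(li), shape (d,)
    centroids -- weighted centroid of each sense's neighbours, shape (n_senses, d)
    p         -- mix weights pij (non-negative, summing to 1), shape (n_senses,)
    v         -- total neighbour weight per sense, vij = sum_k wijk, shape (n_senses,)
    Returns the updated sense embeddings E(sij), shape (n_senses, d).
    """
    # Residual: how far the mix of centroids misses the lemma embedding.
    residual = F_l - p @ centroids
    # Distribute the residual over the senses: each sense moves away from
    # its centroid by an offset proportional to p / v (the Lagrange-multiplier
    # solution of the constrained least-squares problem).
    offsets = np.outer((p / v) / np.sum(p ** 2 / v), residual)
    return centroids + offsets

As a sanity check, a lemma with a single sense has p = [1.0], so the offset cancels the centroid entirely and the sense embedding equals the lemma embedding; a sense with mix weight 0 stays at its neighbours' centroid, matching the two limiting cases just described.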