Today we use the semantic network embedding as a word vector feature in the collocation context model.
Starting from the skip-gram negative sampling (SGNS) approach, we do this by using the semantic network directly as an indicator of good word-context pairs. Recall that in the skip-gram model, the word-context pairs chosen as positive samples are those with a high probability of being found in the corpus, with the contexts drawn from a window around the word. But while a large number of word-context pairs appear in the corpus, the quality of the selection improves when the contexts are themselves content-full, even when they lie linearly far from the focus word. In other words, the similarity has to be topical. We can achieve this directly with the help of a semantic network. The lookup works as follows. We already have a mapping between each sense and the real-valued vector that builds the semantic network. When a word and a context appear as different senses but map to the same node, or to neighboring nodes, in the semantic network, we know they are similar by topic. The semantic network was already built from the relatedness of topics; we just need to map those labels back to the words, which is easy because the word-to-sense mapping is part of the semantic network embedding. The network already guarantees that the vectors representing the lemmas are related to those representing the senses. All we need to check is whether the word and the context are close together or far apart in terms of the semantic network.
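To make the lookup concrete, here is a minimal sketch in Python. The data structures and thresholds (word_to_senses, sense_vectors, the cosine cutoff, the window size) are hypothetical placeholders for whatever the semantic network embedding actually provides; the point is only to show the check for same-node or neighboring-node senses and the filtering of in-window contexts.

```python
# A minimal sketch of the positive-sample check, assuming we already have a
# word-to-sense mapping and sense vectors from the semantic network embedding.
# Both structures below are hypothetical placeholders.
import numpy as np

word_to_senses = {}   # e.g. {"bank": ["bank#finance", "bank#river"]}
sense_vectors = {}    # e.g. {"bank#finance": np.array([...]), ...}

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))

def topically_related(word, context, threshold=0.5):
    """True if any sense of `word` and any sense of `context` map to the same
    node, or to nearby nodes, in the semantic network embedding."""
    for ws in word_to_senses.get(word, []):
        for cs in word_to_senses.get(context, []):
            if ws == cs:                       # same node in the network
                return True
            u, v = sense_vectors.get(ws), sense_vectors.get(cs)
            if u is not None and v is not None and cosine(u, v) >= threshold:
                return True                    # co-located as neighbors
    return False

def positive_pairs(sentence, window=5):
    """Keep only the in-window (word, context) pairs that the semantic
    network marks as topically related."""
    pairs = []
    for i, w in enumerate(sentence):
        for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
            if j != i and topically_related(w, sentence[j]):
                pairs.append((w, sentence[j]))
    return pairs
```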
Why go through the effort of establishing a semantic network and associating words and contexts with their topics? Because either approach alone does not suffice. In fact there is no clear answer to why SGNS produces good word representations; the semantic network articulates one way to explain it. We just need a threshold on the number of semantically correlated word-context pairs that SGNS will consider, out of all the word-context pairs it encounters, and we apply it early on. Complex models built from a variety of factors are not a panacea. At the same time, the effort involved here is optional and can only help. It immediately boosts the quality of positive sampling in the skip-gram model without even introducing negative sampling.
The purpose of negative sampling, on the other hand, is to keep the vectors from all turning out similar. This is usually the case when the scalar product of the word and context vectors is a large number, say about 40, because their vectors are similar. Our approach is also exposed to this threat. However, the semantic network comes in useful for negative sampling as well. We know that word-context pairs that are far apart in the semantic network embedding have very little topical correlation. So instead of picking random word-context pairs, we are more methodical in the selection while keeping the same objective. In negative sampling, the word-context pair does not appear in the corpus. Therefore, we fix one element and pick the other from those associated with far-away nodes, so that there are at least a threshold number of such negative samples. This improves the quality of the negative samples as well. Overall, with articulated picks of both positive and negative samples, we improve the resulting word representations.
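A similarly rough sketch of the negative side, reusing the hypothetical topically_related check from above: for a fixed word we draw candidate contexts at random and keep only those whose senses sit far from every sense of the word in the semantic network, until a minimum number of negatives is collected. The vocabulary list, similarity cutoff, and counts below are assumptions for illustration, not part of the original method.

```python
# A rough sketch of semantic-network-guided negative sampling.
import random

def negative_contexts(word, vocabulary, min_count=5, max_similarity=0.1,
                      max_tries=10000):
    """Pick at least `min_count` contexts whose senses are far from the senses
    of `word` in the semantic network (low cosine similarity)."""
    negatives = []
    for _ in range(max_tries):
        candidate = random.choice(vocabulary)
        if candidate == word:
            continue
        # A candidate counts as a negative only if no sense pair is close.
        if not topically_related(word, candidate, threshold=max_similarity):
            negatives.append(candidate)
            if len(negatives) >= min_count:
                break
    return negatives
```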