Statistical framework is especially relevant in natural language processing.
The purpose of using random variables and a Bayes framework is that it has advantages over the frequentist approach. The focus on the frequentist approach is that the estimated model will be closer to the truth when there is sufficient amount of data available. In the Bayes framework, this is in terms of probabilities from the training sample. For example, if we want to define a probability distribution over a language vocabulary, then one can define a random variable X(w) where w ranges over the words in the vocabulary. The probability of a word in the vocabulary then becomes p(X=w). Random variables can be both discrete and continuous. In the case of discrete variables we attach a weight to each element in the sample space such as in a multinomial distribution and the discrete variable is thus assumed to have an underlying mass function which is the sum of the probabilities ranging over the possible values that it can take. The same is done for a continuous random variable except that we integrate it over a sample space and call it a probability density function. Since multiple random variables are defined on the same sample space, we can describe a joint distribution
Word2vec uses CBOW architecture predicts a word based on the surrounding words and the skip gram predicts the surrounding words based on the current word. This is done in a specific way called the softmax function and it is summarized in a formulation as:
p(wo/wi) = exp(vo dash transpose. vi) / sum of such expectations for i,j ranging over the entire vocabulary where vw and vw' are input and output vector representations of word w in vocabulary W. This is a heavily optimized expression but the idea is that each word tries to predict every word around it. Further research by others tried to predict the word based on surrounding words but based on context where context is not just words but anything that contributes to context such as preposition.
#codingexercise
The purpose of using random variables and a Bayes framework is that it has advantages over the frequentist approach. The focus on the frequentist approach is that the estimated model will be closer to the truth when there is sufficient amount of data available. In the Bayes framework, this is in terms of probabilities from the training sample. For example, if we want to define a probability distribution over a language vocabulary, then one can define a random variable X(w) where w ranges over the words in the vocabulary. The probability of a word in the vocabulary then becomes p(X=w). Random variables can be both discrete and continuous. In the case of discrete variables we attach a weight to each element in the sample space such as in a multinomial distribution and the discrete variable is thus assumed to have an underlying mass function which is the sum of the probabilities ranging over the possible values that it can take. The same is done for a continuous random variable except that we integrate it over a sample space and call it a probability density function. Since multiple random variables are defined on the same sample space, we can describe a joint distribution
Word2vec uses CBOW architecture predicts a word based on the surrounding words and the skip gram predicts the surrounding words based on the current word. This is done in a specific way called the softmax function and it is summarized in a formulation as:
p(wo/wi) = exp(vo dash transpose. vi) / sum of such expectations for i,j ranging over the entire vocabulary where vw and vw' are input and output vector representations of word w in vocabulary W. This is a heavily optimized expression but the idea is that each word tries to predict every word around it. Further research by others tried to predict the word based on surrounding words but based on context where context is not just words but anything that contributes to context such as preposition.
#codingexercise
Given an array of n elements, where each element is at most k away from its target position, sort the array in the reverse order
Void sortKReverse(ref List<int> A, int k)
{
Assert(A.count > k);
Var h = new Heap<int>();
For (int i =0: i < k && i < n; i++)
{
h.Add(A[i]);
}
Int i = k+1;
For(int j =0; j < n; j++)
{
   If (i < n){
       A[j] = h.replacemax(A[i]);
       I ++;
}else{
      A[J]  = h.max()
}
}
}
 
No comments:
Post a Comment