Today we continue our discussion on natural language processing with a particular word ontology called FrameNet. We saw how it differs from other ontologies such as WordNet, PropBank, and AMR. While AMR introduced a graphical model, it did not have the rich semantic grouping that FrameNet has. Now let us look at a specific example of a model that makes use of terms in a graphical data structure, to see what FrameNet may bring to it. To do that, we first look at one such model that does PoS tagging. A similarity graph is constructed over the data and the labels are propagated along its edges. A classifier is trained on prelabeled data and then used to decode the labels on the target domain. We saw earlier how a standard label propagation technique works.
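To make the label propagation step concrete, here is a minimal sketch in the same style as the snippets below, assuming a dense matrix of similarity weights and one tag distribution per graph node. The method name, the fixed iteration count, and the simple neighbor-averaging update are illustrative assumptions rather than the exact algorithm from any particular paper.
double[][] PropagateLabels(double[][] weights, double[][] seedLabels, bool[] isSeed, int iterations)
{
    int n = seedLabels.Length;
    int numLabels = seedLabels[0].Length;
    // Start every node from its seed distribution (uniform or empirical for unlabeled nodes).
    var current = new double[n][];
    for (int u = 0; u < n; u++)
        current[u] = (double[])seedLabels[u].Clone();
    for (int it = 0; it < iterations; it++)
    {
        var next = new double[n][];
        for (int u = 0; u < n; u++)
        {
            // Labeled (seed) nodes keep their original distributions.
            if (isSeed[u])
            {
                next[u] = (double[])seedLabels[u].Clone();
                continue;
            }
            // Unlabeled nodes take a weighted average of their neighbors' distributions.
            next[u] = new double[numLabels];
            double total = 0.0;
            for (int v = 0; v < n; v++)
            {
                if (weights[u][v] == 0.0) continue;
                for (int l = 0; l < numLabels; l++)
                    next[u][l] += weights[u][v] * current[v][l];
                total += weights[u][v];
            }
            if (total > 0.0)
                for (int l = 0; l < numLabels; l++)
                    next[u][l] /= total;   // renormalize to a probability distribution
        }
        current = next;
    }
    return current;
}
After enough iterations the unlabeled nodes settle into distributions that agree with their neighbors, which is exactly the smoothness the graph is meant to enforce.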
Let us now look at one instance of this classifier as a tagger.
We start with a set of labeled source-domain examples that pairs each input with its part-of-speech labels, and a disjoint set of target-domain data that has yet to be labeled. The input is the sequence of words in a sentence and the output is a sequence of labels. The previous label affects the current label, and given that previous label the current label is conditionally independent of all earlier labels. This first-order Markov structure makes the problem a good candidate for a Markov chain or, better, a linear-chain CRF. The goal, then, is to learn a CRF whose distribution is an exponential of weighted feature functions, where the feature functions are indexed from 1 to K and each one looks at some window of the input sequence. Given only the labeled data, the optimal feature weights maximize the sum of the log conditional probabilities of the labeled examples under a specific set of feature weights, adjusted by a regularizer scaled by a trade-off parameter. The trade-off can be decided based on training data, and the regularizer guides the learning process by keeping the weights from overfitting the limited labeled data.
The CRF models only conditional distributions, not the actual probabilities of the inputs, so by itself it cannot exploit the unlabeled data. That is where the graph comes in useful as a smoothness regularizer. Subramanya et al use this technique to train the CRF. Starting from a set of CRF parameters, they first compute the marginals over the unlabeled data, which is simply the conditional distribution under the current iteration's parameters, for every token. The marginals over tokens are then aggregated into marginals over types, which are used to initialize the graph label distributions. The labels are then propagated over the graph. The posteriors from the graph are then used to smooth the state posteriors.
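Written out, the linear-chain CRF and its training objective on the labeled data alone take roughly the following form (a sketch of the standard formulation; the exact feature templates and notation in Subramanya et al may differ):

p(\mathbf{y} \mid \mathbf{x}; \Lambda) = \frac{1}{Z(\mathbf{x}; \Lambda)} \exp\Big( \sum_{j=1}^{|\mathbf{x}|} \sum_{k=1}^{K} \lambda_k \, f_k(y_{j-1}, y_j, \mathbf{x}, j) \Big)

\Lambda^{*} = \arg\max_{\Lambda} \Big[ \sum_{i=1}^{\ell} \log p(\mathbf{y}^{(i)} \mid \mathbf{x}^{(i)}; \Lambda) - \gamma \, \lVert \Lambda \rVert^{2} \Big]

Here \ell is the number of labeled sentences and \gamma is the trade-off parameter that scales the squared-norm regularizer.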
A posterior is a conditional distribution. While the prior represents our uncertainty about the parameter of interest before we have sampled the data, the posterior represents our uncertainty about that parameter after we have sampled the data and attempted to estimate it. The prior is an unconditional probability distribution, while the posterior is a conditional distribution because it conditions on the observed data.
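As a quick made-up illustration (a coin with an unknown bias restricted to two values, not an example from the paper), suppose the prior is uniform over \theta \in \{0.5, 0.9\} and we observe a single head H:

p(\theta = 0.9 \mid H) = \frac{p(H \mid \theta = 0.9)\, p(\theta = 0.9)}{p(H \mid \theta = 0.9)\, p(\theta = 0.9) + p(H \mid \theta = 0.5)\, p(\theta = 0.5)} = \frac{0.9 \times 0.5}{0.9 \times 0.5 + 0.5 \times 0.5} \approx 0.64

The prior belief of 0.5 in the biased coin has shifted to a posterior of about 0.64 once the data is conditioned on.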
Decoding the unlabeled data then produces a new set of automatic annotations that can be combined with the labeled data to retrain the CRF.
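The smoothing step mentioned above, which produces the posteriors used for this decoding, can be sketched as a simple interpolation of the two distributions. This is a minimal sketch: the mixing weight alpha and the method name are assumptions for illustration, and the actual combination used in the paper may differ.
double[] SmoothPosterior(double[] crfPosterior, double[] graphPosterior, double alpha)
{
    var smoothed = new double[crfPosterior.Length];
    double total = 0.0;
    for (int t = 0; t < crfPosterior.Length; t++)
    {
        // Mix the CRF's token-level posterior with the graph's type-level posterior.
        smoothed[t] = alpha * crfPosterior[t] + (1.0 - alpha) * graphPosterior[t];
        total += smoothed[t];
    }
    for (int t = 0; t < smoothed.Length; t++)
        smoothed[t] /= total;  // renormalize so the mixture sums to one
    return smoothed;
}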
#codingexercise
Given an array of integers and a number k, find the maximum of every contiguous subarray of size k.
List<int> GetMaxForWindows(List<int> nums, int windowSize)
{
    var ret = new List<int>();
    // Slide a window of the given size across the array and record each window's maximum.
    for (int i = 0; i <= nums.Count - windowSize; i++)
    {
        int max = nums[i];
        for (int j = 1; j < windowSize; j++)
        {
            if (nums[i + j] > max)
                max = nums[i + j];
        }
        ret.Add(max);
    }
    return ret;
}
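For example, with nums = {1, 2, 3, 1, 4, 5, 2, 3, 6} and windowSize = 3 the method returns {3, 3, 4, 5, 5, 5, 6}. This brute-force version runs in O(n·k) time; keeping a deque of candidate indices would bring it down to O(n), but the nested loop above is easier to follow.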