Today we read a survey on the use of graphs in natural language processing. Graphs have many applications in NLP, ranging from summarization and word embeddings to the representation of syntactic and semantic relationships.
Graphs are directly applicable to syntactic and semantic relations. WordNet is a great example: it is a semantic network that is heavily used for word sense disambiguation, semantic similarity, question answering, and other tasks. There have been efforts to expand or improve this ontology by connecting different domains, such as at linkeddata.org, which uses the web to connect related data that wasn't previously linked, tying data on the Semantic Web together using URIs and RDF. Graphs have also moved beyond the traditional notion of nodes and edges with the introduction of heterogeneous graphs, graphs with multilayered edges, and so on.
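As a small illustration, the sketch below treats WordNet as a graph of synsets and walks its hypernym edges. It assumes Python with nltk installed and the WordNet corpus downloaded via nltk.download('wordnet'); the specific words are arbitrary.
# A minimal sketch of querying WordNet as a semantic network.
# Assumes: pip install nltk, then nltk.download('wordnet').
from nltk.corpus import wordnet as wn
# Each synset is a node; hypernym/hyponym links are its edges.
car = wn.synsets('car')[0]   # e.g. Synset('car.n.01')
bus = wn.synsets('bus')[0]   # e.g. Synset('bus.n.01')
print(car.definition())
print(car.hypernyms())       # edges to more general concepts
# Path similarity walks the hypernym graph; 1.0 means identical senses.
print(car.path_similarity(bus))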
A heterogeneous graph is one where the vertices may correspond to different types of entities and the edges to different types of links between vertices of the same or different types. An example is a graph consisting of articles, their authors, and the bibliographic references. Edges between authors could correspond to coauthorship/collaboration or citation, while edges between authors and their papers represent authorship.
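A minimal sketch of such a bibliographic graph follows, here built with networkx (the library choice and the node names are assumptions for illustration, not from the survey); node and edge types are stored as attributes.
# A sketch of a heterogeneous bibliographic graph using networkx.
import networkx as nx
G = nx.MultiDiGraph()  # multiple typed edges may link the same pair
# Vertices of different types: authors and papers.
G.add_node("alice", kind="author")
G.add_node("bob", kind="author")
G.add_node("paper1", kind="paper")
G.add_node("paper2", kind="paper")
# Edges of different types, between the same or different vertex kinds.
G.add_edge("alice", "paper1", rel="authorship")
G.add_edge("bob", "paper1", rel="authorship")
G.add_edge("alice", "bob", rel="coauthorship")
G.add_edge("paper2", "paper1", rel="citation")
# e.g. list every authorship link
for u, v, d in G.edges(data=True):
    if d["rel"] == "authorship":
        print(u, "wrote", v)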
Graphs have been used in unsupervised learning for language identification, part-of-speech induction, and word sense induction. Beyond the traditional application of graphs for answering queries, what differentiates graphs from other data structures in NLP is that we now have persisted nodes, edges, and features with which to do tasks such as clustering and to improve statistical relational learning. For example, we can represent similarity as edges, with the similarity measures as edge weights. Also, conditional random fields (CRFs) allow us to directly model the conditional probability distribution p(y|x), which is sufficient for classification.
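As a hedged sketch of that last point, the snippet below trains a tiny linear-chain CRF with the sklearn-crfsuite library (a library choice assumed here, not named in the survey, and a toy one-sentence dataset); each token is a feature dictionary and the model fits p(y|x) directly.
# A minimal linear-chain CRF sketch using sklearn-crfsuite.
# Library choice and toy data are assumptions for illustration.
import sklearn_crfsuite
# One training sentence: each token is a feature dict, labels are POS tags.
X_train = [[
    {"word": "the", "is_title": False},
    {"word": "dog", "is_title": False},
    {"word": "barks", "is_title": False},
]]
y_train = [["DET", "NOUN", "VERB"]]
# The CRF models p(y|x) directly, which suffices for classification.
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X_train, y_train)
print(crf.predict(X_train))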
Several different clustering approaches have already been applied directly to NLP. Among the most widely used techniques are min-cut, the spectral graph transducer, random-walk-based approaches, and label propagation. The last is a method by which the labels of a small set of annotated data are allowed to spread in a consistent fashion to unlabeled data.
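A minimal numpy sketch of label propagation follows (the five-node graph and seed labels are invented for illustration): the annotated seeds stay clamped while every other node repeatedly averages the label mass of its neighbors.
# A toy label propagation sketch; graph and seeds are assumptions.
import numpy as np
# Adjacency matrix of a 5-node similarity graph (symmetric weights).
W = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)
# One-hot labels: node 0 is class 0, node 4 is class 1, the rest unlabeled.
Y = np.zeros((5, 2))
Y[0, 0] = 1.0
Y[4, 1] = 1.0
labeled = [0, 4]
# Row-normalize W so each node averages over its neighbors.
P = W / W.sum(axis=1, keepdims=True)
F = Y.copy()
for _ in range(100):
    F = P @ F                 # spread neighbors' label mass
    F[labeled] = Y[labeled]   # clamp the annotated seeds
print(F.argmax(axis=1))       # predicted class for every node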
We start with a word similarity graph. The similarity graph is used during the training of a CRF on the target domain. Local sequence contexts in the form of n-grams are represented as graph vertices, exploiting the fact that the part of speech (POS) of a word occurrence is largely determined by its local context, as can be observed from text. For each n-gram, a set of context features is extracted, whose values are the pointwise mutual information (PMI) between the n-gram and each feature. Each node is then represented by its PMI vector, and the similarity function is the cosine of these vectors. The neighbors of a node are used as features for the CRF, which results in a "word embedding" of the larger contextual information in the model. The graph is then used to discover new features, to propagate adjustments to the weights of known features, and to train the CRF in a semi-supervised manner.
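To make the construction concrete, here is a toy sketch (the corpus, the choice of trigram vertices, and the left/right context features are all illustrative assumptions) that computes PMI vectors for n-gram vertices and scores a graph edge by the cosine of those vectors.
# Sketch of a PMI-weighted similarity graph over n-gram vertices.
import math
from collections import Counter
corpus = "the dog barks . the cat sleeps . a dog sleeps".split()
# Vertices: trigrams with one word of context on each side.
# Features: the word immediately to the left (L) and right (R).
ngrams, feats, ngram_feat = Counter(), Counter(), Counter()
for i in range(1, len(corpus) - 3):
    tri = tuple(corpus[i:i + 3])
    ngrams[tri] += 1
    for f in (("L", corpus[i - 1]), ("R", corpus[i + 3])):
        ngram_feat[(tri, f)] += 1
        feats[f] += 1
total = sum(ngram_feat.values())
def pmi(tri, f):
    joint = ngram_feat[(tri, f)]
    if joint == 0:
        return 0.0
    # each trigram occurrence emits two features, hence the factor of 2
    return math.log((joint / total) /
                    ((2 * ngrams[tri] / total) * (feats[f] / total)))
def cosine(t1, t2):
    v1 = [pmi(t1, f) for f in feats]
    v2 = [pmi(t2, f) for f in feats]
    dot = sum(a * b for a, b in zip(v1, v2))
    n1 = math.sqrt(sum(a * a for a in v1))
    n2 = math.sqrt(sum(b * b for b in v2))
    return dot / (n1 * n2) if n1 and n2 else 0.0
# edge weight between two trigram vertices of the similarity graph
print(cosine((".", "the", "cat"), (".", "a", "dog")))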
#codingexercise
Check if a binary tree is a binary search tree
bool IsBST(Node root)
{
    return IsBstHelper(root, long.MinValue, long.MaxValue);
}
bool IsBstHelper(Node root, long min, long max)
{
    if (root == null) return true;
    // every node must lie strictly within the bounds set by its ancestors;
    // long bounds avoid the overflow of root.data-1 / root.data+1 at int extremes
    if (root.data <= min || root.data >= max) return false;
    return IsBstHelper(root.left, min, root.data) &&
           IsBstHelper(root.right, root.data, max);
}
Find the size of the largest subtree that is a BST
int GetMaxBST(Node root)
{
    if (root == null) return 0;
    // if the whole subtree is a BST, its size is the answer for this node
    if (IsBST(root)) return TreeSize(root);
    return Math.Max(GetMaxBST(root.left), GetMaxBST(root.right));
}
int TreeSize(Node root)
{
    return root == null ? 0 : 1 + TreeSize(root.left) + TreeSize(root.right);
}
Another tree question: print the left view of a binary tree
void PrintLeftView(Node root, int level, ref int maxLevel)
{
    if (root == null) return;
    // the first node visited at each new depth is the leftmost one
    if (maxLevel < level)
    {
        Console.Write(root.data + " ");
        maxLevel = level;
    }
    PrintLeftView(root.left, level + 1, ref maxLevel);
    PrintLeftView(root.right, level + 1, ref maxLevel);
}
// initial call: int maxLevel = 0; PrintLeftView(root, 1, ref maxLevel);