Sunday, October 30, 2016

Today we continue our discussion on natural language processing with a particular word ontology called FrameNet. We see how it is different from other ontologies such as WordNet, PropBank and AMR. While AMR introduced a graphical model, it did not have the rich semantic grouping that FrameNet has. Now let us look at a specific example of a model that makes use of terms in a graphical data structure, to see what FrameNet may bring to it. To do that, we first look at one such model that does PoS tagging. A similarity graph is constructed and the labels are propagated. A classifier is trained on prelabeled data and then used to decode the labels from the target domain. We saw how a standard label propagation technique works.
Let us now look at one instance of this classifier as a tagger.
We start with a set of labeled source-domain examples that pairs input sequences with part-of-speech labels, and a disparate set of test data that has yet to be labeled. The input is the sequence of words in a sentence and the output is a sequence of labels. The previous label affects the current label, and conditioned on its immediate predecessor the current label is independent of all earlier ones. This Markov assumption makes the problem a good candidate for a Markov chain or a linear-chain CRF. The goal, then, is to learn a CRF of the form of an exponential of weighted feature functions, where the feature functions are indexed from 1 to K and each inspects some window of the input sequence. Given only the labeled data, the optimal feature weights maximize the sum of the log conditional likelihoods of the training examples, adjusted by a regularizer that is scaled by a trade-off parameter. The regularizer helps guide the learning process and keeps the weights from overfitting the training data. The CRF models only conditional distributions, not the actual probabilities of the inputs, so by itself it cannot exploit the unlabeled data. That is where the graph comes in useful as a smoothness regularizer. Subramanya et al use this technique to train the CRF. They take a set of CRF parameters and first compute the marginals over the unlabeled data, which are simply the conditional distributions in this iteration over all the data. The marginals over tokens are then aggregated to marginals over types, which are used to initialize the graph label distributions. The labels are then propagated. The posteriors from the graph are then used to smooth the state posteriors.
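As a sketch, with notation assumed here rather than quoted from the paper, the linear-chain CRF and its supervised training objective have roughly the following form:

\[ p(\mathbf{y} \mid \mathbf{x}; \Lambda) = \frac{1}{Z(\mathbf{x}; \Lambda)} \exp\Big( \sum_{j=1}^{|\mathbf{x}|} \sum_{k=1}^{K} \lambda_k\, f_k(y_{j-1}, y_j, \mathbf{x}, j) \Big) \]

\[ \Lambda^{*} = \arg\max_{\Lambda} \Big[ \sum_{i=1}^{l} \log p(\mathbf{y}_i \mid \mathbf{x}_i; \Lambda) - \gamma \lVert \Lambda \rVert^{2} \Big] \]

where the l labeled pairs come from the source domain, the K feature functions f_k each look at a window of the input, gamma is the trade-off parameter and the squared norm is the regularizer.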
A posterior is a conditional distribution. While the prior represents our uncertainty over the parameter of choice before we have sampled the data, the posterior represents our uncertainty over that parameter after we have sampled the data and attempted to estimate it. The prior is an unconditional probability distribution while the posterior is a conditional probability distribution, because it conditions on the observed data.
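Written out, with theta as the parameter and D as the observed data, this is just Bayes' rule:

\[ p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)} \]

The prior p(theta) is reweighted by the likelihood of the data to yield the posterior.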
The unlabeled data then produces a new set of automatic annotations that can be combined with the labeled data to retrain the CRF.
#codingexercise
Given an integer array and a number k, find the maximum of each contiguous subarray of size k.
List<int> GetMaxForWindows(List<int> nums, int windowSize)
{
    var ret = new List<int>();
    for (int i = 0; i <= nums.Count - windowSize; i++)
    {
        int max = nums[i]; // running maximum of the current window
        for (int j = 1; j < windowSize; j++)
        {
            if (nums[i + j] > max)
                max = nums[i + j];
        }
        ret.Add(max);
    }
    return ret;
}
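Note that this brute force runs in O(n*k) time; maintaining a deque of candidate maxima would bring it down to O(n), but the version above suffices for the exercise.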

Saturday, October 29, 2016

Today we continue our discussion on natural language processing with a particular word ontology called FrameNet. We see how it is different from other ontologies such as WordNet, PropBank and AMR. While AMR introduced a graphical model, it did not have the rich semantic grouping that FrameNet has. Now let us look at a specific example of a model that makes use of terms in a graphical data structure, to see what FrameNet may bring to it. To do that, we consider how a classifier could be built that creates lexical units for FrameNet frames and maps terms or senses to their corresponding FrameNet frames.
Let us first look at one such model that does POS tagging with the help of graphs. It relies on the assumption that similar words will have similar part-of-speech tags. A similarity graph is constructed and the labels are propagated. Data, both labeled and unlabeled, is represented by vertices in a graph. Graph edges link vertices that are likely to have the same label. Edge weights determine how strongly the labels agree. Usually the number of labels is small. The number of atomic labels may be small, but there can be a much larger number of ways to combine them into composite labels. One way to combine them is to structure them as a tree.
A classifier is trained on prelabeled data and then used to decode the labels from the target domain. The classifier, which here can also be called a tagger, can be based on any suitable model. For example, it can be based on word matrices. However, computations over large matrices do not scale well, especially if there are inversions involved. Conditional random fields, on the other hand, are better suited for this purpose. They scale well because they use only conditional probability distributions. Standard inference and learning techniques as well as standard graph propagation techniques are also scalable. The building blocks for the CRF inference contribute to this efficiency.
The label propagation works like this (a code sketch follows the steps):
Initialize the labels at all nodes in the network; for a given node, the label at the start is its initialization label.
Set the iteration counter to 1.
Take the nodes in a random order and assign them to a list.
For each node in that list, use a function of the labels of its adjacent nodes, taken from the assignments in the previous iteration, to assign its new label. For example, this could be the most frequently occurring label among the adjacencies.
If every node satisfies the chosen function, stop the algorithm; otherwise repeat with the next iteration.
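A minimal sketch of this procedure with a majority-vote function might look like the following; the adjacency-list representation and the unweighted vote are simplifying assumptions, since real implementations weight a neighbor's vote by edge similarity:

using System;
using System.Collections.Generic;
using System.Linq;

// adj[i] lists the neighbors of node i; labels[i] is node i's initial label
int[] PropagateLabels(List<int>[] adj, int[] labels, int maxIterations)
{
    var rand = new Random();
    var current = (int[])labels.Clone();
    for (int iter = 0; iter < maxIterations; iter++)
    {
        // visit the nodes in a random order
        var order = Enumerable.Range(0, adj.Length).OrderBy(_ => rand.Next());
        var next = (int[])current.Clone();
        bool changed = false;
        foreach (int node in order)
        {
            if (adj[node].Count == 0) continue;
            // most frequent label among the neighbors' previous assignments
            int best = adj[node].GroupBy(n => current[n])
                                .OrderByDescending(g => g.Count())
                                .First().Key;
            if (next[node] != best) { next[node] = best; changed = true; }
        }
        current = next;
        if (!changed) break; // every node satisfies the chosen function
    }
    return current;
}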

#trivia question
What is a keep-alive header?
The keep-alive header provides connection reuse policies. It is a hop-by-hop header that provides information about a persistent connection. It includes a timeout that determines how long a connection can be idle before it is closed.
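For example, a server response might carry:
Connection: Keep-Alive
Keep-Alive: timeout=5, max=1000
where timeout is the idle time in seconds and max caps the number of requests that may be sent over the connection.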

Friday, October 28, 2016

Today we continue our discussion on natural language processing with a particular word ontology called FrameNet. We see how it is different from other ontologies such as WordNet and PropBank. Another such representation is AMR, which has extensive documentation. It has PropBank predicate-argument semantics. It has name and value entities with entity linking. Entities and events participate in coreference.
It includes modality, negation and questions. It has relations between nominals. The content words are canonicalized. It converts adverbs to adjectives to nouns to verbs wherever possible. Moreover, it represents all of this in a single graph. AMR's assets include a snazzy annotation tool and some 15,000 AMRs (roughly 270K words) released, with another 5,000 AMRs (about 150K words) annotated and in the pipeline. AMR differs from FrameNet in that its purpose is cost-effective annotation together with NLP, as opposed to FrameNet's lexicography. FrameNet frames are about rich semantic groupings, while AMR retains PropBank's lexicalized rolesets. FrameNet annotates by labeling spans in sentences. AMR is represented by a graph where sentences and words do not directly translate to nodes. A sentence does not have any explicit relationships across frame annotations in FrameNet, whereas AMR comprises predicates with shared arguments.
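To make the graph representation concrete, the standard example from the AMR literature renders the sentence "The boy wants to go" as:

(w / want-01
   :ARG0 (b / boy)
   :ARG1 (g / go-01
            :ARG0 b))

Note how the variable b is reused: the boy is both the wanter and the goer, which is exactly the predicates-with-shared-arguments property mentioned above.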
We note here the functionality of FrameNet and the advantages of the form of AMR. In general, there are many ways to come up with an ontology or pseudo-ontology for different purposes; however, a graph-based data structure where the nodes are not necessarily terms proves helpful in many cases. Further, a lookup from each term to such graph nodes also helps with representing models directly over the graph nodes instead of over terms, with more static, pre-discovered relationships available for incorporation into the models.

#puzzle
Prove that for every prime p greater than 3, the value p^2 - 1 is a multiple of 24.
p^2 - 1 can be written as (p-1)(p+1).
Since p is an odd prime, both of the factors above are even, so each is divisible by 2.
Further, they are consecutive even numbers, so one of them is divisible by 4 as well. Together they contribute the factor 2 x 2 x 2.
Next, p-1, p and p+1 are three consecutive numbers, hence one of them must be divisible by 3. We know p is not divisible by 3, therefore it must be p-1 or p+1. This gives us 3 as a factor.
Consequently the factors multiply to 2 x 2 x 2 x 3 = 24 for all primes p > 3.
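As a quick check: p = 5 gives 24, p = 7 gives 48 = 2 x 24, and p = 11 gives 120 = 5 x 24.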

#codingexercise
Compute power(x, n) with logarithmic time complexity and limited storage.
int power(int x, uint n)
{
    int result = 1;
    while (n > 0)
    {
        if ((n & 1) == 1)        // if the lowest bit of n is set,
            result = result * x; // multiply in the current power of x
        n = n >> 1;              // reduce n by half
        x = x * x;               // square x for the next bit
    }
    return result;
}



This can be made recursive as well, but the recursive version won't be as storage efficient because each halving of n adds a stack frame.

Wednesday, October 26, 2016

Today we continue our discussion on natural language processing with a particular word ontology called FrameNet. We see how it is different from other ontologies. For example, FrameNet differs from WordNet in the lexical information and is, in fact, complementary in many ways. FrameNet has far fewer nouns, verbs, adjectives and adverbs as compared to WordNet. FrameNet frame hierarchies provide better generalizations than WordNet synset hierarchies.
FrameNet also differs from PropBank (Proposition Bank, discussed by Palmer et al. 2005). PropBank began with labeling verbs and their arguments.
FrameNet assigns names such as communicator, evaluee and reason for corresponding PropBank role labels.
There have been other works to represent semantic information, such as Abstract Meaning Representation (Banarescu et al., LAW 2013). AMR is a graph-based representation of lexical concepts and typed relations between those concepts that are denoted by an English sentence.
However, the lexical semantics in FrameNet are much deeper than in AMR.
What AMR means for FrameNet, and for our use, is that we can represent the frames in a graph where the relationships are persisted. These edges can then be traversed. Previously we used graphs with edges established based on similarity measures. Here we establish edges based on the relationships between frames.
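As a toy sketch of that idea, frames can become nodes and typed frame-to-frame relations the edges. The class shape below is an assumption for illustration, though Inheritance and Using are actual FrameNet relation types:

using System.Collections.Generic;

class FrameNode
{
    public string Name;
    // relation type -> related frames, e.g. "Inheritance" -> parent frames
    public Dictionary<string, List<FrameNode>> Relations =
        new Dictionary<string, List<FrameNode>>();

    public FrameNode(string name) { Name = name; }

    public void AddRelation(string type, FrameNode other)
    {
        if (!Relations.ContainsKey(type))
            Relations[type] = new List<FrameNode>();
        Relations[type].Add(other);
    }
}

A lookup from lexical units to FrameNode objects would then let a model traverse, say, the Inheritance edges of the frame a term evokes, instead of recomputing similarity between the terms themselves.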

#codingexercise
Find the number of islands of 1s in a binary matrix comprising 0s and 1s. In other words, count the number of connected components in an undirected graph.
int GetCountOfIslands(int[,] m, int row, int col)
{
    var visited = new bool[row, col]; // initialized to false
    int count = 0;
    for (int i = 0; i < row; i++)
        for (int j = 0; j < col; j++)
            if (m[i, j] == 1 && visited[i, j] == false)
            {
                DFS(m, i, j, visited); // mark the whole island as visited
                count++;
            }
    return count;
}
void DFS(int[,] m, int row, int col, bool[,] visited)
{
    // offsets of the eight neighboring cells
    var radj = new int[] {-1, -1, -1,  0, 0,  1, 1, 1};
    var cadj = new int[] {-1,  0,  1, -1, 1, -1, 0, 1};
    visited[row, col] = true;
    for (int k = 0; k < 8; k++)
        if (withinMatrixAndNotVisited(m, row + radj[k], col + cadj[k], visited))
            DFS(m, row + radj[k], col + cadj[k], visited);
}
bool withinMatrixAndNotVisited(int[,] m, int row, int col, bool[,] visited)
{
    return row >= 0 && row < m.GetLength(0) &&
           col >= 0 && col < m.GetLength(1) &&
           m[row, col] == 1 && !visited[row, col];
}
We can also measure the size of each island with a minor modification to above.
To say more about the WordNet comparison: FrameNet has very little to say about common nouns, whereas WordNet has noun hierarchies that encompass them. WordNet has little syntagmatic information, whereas FrameNet has a lot.
WordNet maintains separate hierarchies for nouns, verbs, adjectives and adverbs. Each frame, on the other hand, can cover words of any part of speech. WordNet has hypernym and hyponym relations between synsets. Lexical units come close to synsets, but there are no relations per se between lexical units; there are, however, several types of frame relations. There are also content differences between the two. WordNet has annotated examples showing the syntax and semantics of each LU. FrameNet describes frame elements, or roles, for each frame. FrameNet frame hierarchies provide better generalizations than WordNet synset hierarchies.
On the PropBank comparison: PropBank uses Penn POS tags and Penn Treebank parse structures. It added nouns and roles from associated verbs. This includes an efficient semantic role labeling system, but there is no equivalent to frames in FrameNet. There are two levels of role labels: those that are completely general and those that are specific to a lexical unit. The lemmas from text are split into sequences that carry equivalences between PropBank verb-specific role names and FrameNet FE names; FrameNet assigns names such as communicator, evaluee and reason to the corresponding PropBank role labels.
On AMR: it integrates several aspects of lexical and relational meaning, abstracting away from the grammatical details, in a single structure. This structure specifically helps with rapid corpus annotation and data-driven NLP. However, the lexical semantics in AMR are much shallower than in FrameNet.
Courtesy: Collin Baker, ICSI
In general, it seems to be helpful to have a hash from a term to a corresponding ontology identifier.
#codingexercise
Insert into a circular linked list that is sorted, where we are given a random node from the list rather than the head.
// start is a random node in the sorted circular list
void InsertCircular(ref Node start, Node target)
{
    if (start == null) { start = target; target.next = target; return; }
    Node pos = null;      // node with the largest value that is still smaller than target
    Node maxNode = start; // node with the largest value overall
    Node curr = start;
    do
    {
        if (curr.data > maxNode.data) maxNode = curr;
        if (curr.data < target.data && (pos == null || curr.data > pos.data)) pos = curr;
        curr = curr.next;
    } while (curr != start);
    // if no node is smaller than target, target becomes the new minimum and
    // belongs right after the maximum node, at the seam of the sorted order
    if (pos == null) pos = maxNode;
    target.next = pos.next;
    pos.next = target;
}
Another exercise: find the row with the maximum number of 1s in a binary matrix whose rows are sorted.
int rowWithMaximum1sInSortedMatrix(int[,] m)
{
    int max = int.MinValue;
    int index = -1;
    for (int i = 0; i < m.GetLength(0); i++)
    {
        // index of the first 1 in row i; every entry after it is also 1
        int col = findfirst(m, i, 0, m.GetLength(1) - 1, 1);
        if (col != -1 && m.GetLength(1) - col > max)
        {
            max = m.GetLength(1) - col;
            index = i;
        }
    }
    return index;
}
The findfirst method is a binary search method.
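A minimal sketch of that binary search, assuming each row is sorted in non-decreasing order so that all the 1s sit to the right of the first one (the signature simply mirrors the call above):

int findfirst(int[,] m, int i, int lo, int hi, int target)
{
    int result = -1;
    while (lo <= hi)
    {
        int mid = lo + (hi - lo) / 2;
        if (m[i, mid] == target)
        {
            result = mid; // record a match and keep searching to the left
            hi = mid - 1;
        }
        else if (m[i, mid] < target)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return result;
}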

Tuesday, October 25, 2016

Today we continue our discussion on natural language processing with a particular word ontology called FrameNet. The idea behind FrameNet is that the definition of a word is incomplete without relating it to conceptual structures and patterns of beliefs, practices, institutions, images etc., which are referred to as semantic frames (courtesy Fillmore 2003). Meanings are therefore relative to frames. The scope of FrameNet includes events (for example those related to birth or death, as nouns), relations (for example being relevant, as an adjective, or a personal relationship, as a noun), states (for example on or off, as prepositions) and entities (for example a device). The words are thus related to notions that are represented by frames, and their meaning is not complete without these relationships. A semantic frame is the description of an event, relation, entity or state and its participants, thereby capturing the essential knowledge of a given word sense. Roles within the frame are called frame elements. Words that evoke the frame are called lexical units. For example, cooking appears in a frame called 'Apply_heat', where the frame elements are the cook, the food, the container and the heating_instrument, and words that evoke it, such as fry, bake, boil and broil, are its lexical units.
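For instance, an annotated sentence in this frame might look like the following (an illustrative annotation in the FrameNet style, not an example taken from the corpus): [Cook Matilde] fried [Food the catfish] [Heating_instrument in a heavy iron skillet], where 'fried' is the lexical unit that evokes Apply_heat.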
FrameNet comprises over 1,195 semantic frames with approximately 12 frame elements per frame. It includes 12,989 lexical units, each representing the connection between one lemma + POS and one frame. It includes 198,932 manual annotations of corpus examples and about 1,774 frame-to-frame relations. Since the relations involve multiple inheritance, they form a lattice. It is also constructed bottom-up. It comes with semantic type labels such as animate, human and positive_judgement, and assumes construction grammar, which describes the grammatical characteristics and semantic import of each construction (Fillmore, Goldman, Rhodes). It also covers metaphor and metonymy.
FrameNet is not a formal ontology. It includes non-lexical frames that have been made up as needed to connect the nodes in certain cases. It attempts to capture cross-linguistic differences, such as the difference between commercial_transaction and criminal_process, although both describe processes.
Why is it an improvement over semantic ontologies?
1) Frame is a useful level of abstraction
2) Roles are more meaningful than semantic labels
3) Frames represent conceptual gestalts - more than just the sum of their parts.

#codingexercise
Insert into a circular linked list that is sorted, where we are given a random node from the list rather than the head. This variation first locates the head, i.e. the minimum node.
void InsertCircular(ref Node start, Node target)
{
    var head = FindHead(start);
    if (head == null) { start = target; target.next = target; return; }
    Node prev = null;
    Node curr = head;
    while (curr.data < target.data && curr.next != head)
    {
        prev = curr;
        curr = curr.next;
    }
    if (curr.data < target.data)
    {
        InsertAfter(curr, target); // target is the new maximum; insert after the last node
    }
    else if (prev == null)
    {
        // target is the new minimum; insert before head, i.e. after the tail
        Node tail = head;
        while (tail.next != head) tail = tail.next;
        InsertAfter(tail, target);
    }
    else
    {
        InsertAfter(prev, target);
    }
}
void InsertAfter(Node node, Node target)
{
    target.next = node.next;
    node.next = target;
}

Node FindHead(Node start)
{
    if (start == null) return null;
    if (start.next == start) return start;
    Node head = start;   // node with the minimum value seen so far
    int min = start.data;
    Node curr = start.next;
    while (curr != start)
    {
        if (curr.data < min)
        {
            head = curr;
            min = curr.data;
        }
        curr = curr.next;
    }
    return head;
}
Alternatively, we could also insert by iterating the circular list once to find the node whose value is closest to that of the target node.