Wednesday, October 26, 2016

Today we continue our discussion on natural language processing with a particular word ontology called FrameNet. We see how it is different from other ontologies. For example, FrameNet differs from WordNet in the lexical information and is in fact, complimentary in many ways. FrameNet has far fewer nouns, verbs, adjectives and adverbs as compared to WordNet. FrameNet Frame hierarchies provide better generalizations than WordNet synset hierarchies.
FrameNet also differs from PropBank (Proposition Bank discussed by Palmer et al 2005). PropBank began with labeling verbs and their arguments.
FrameNet assigns names such as communicator, evaluee and reason for corresponding PropBank role labels.
There have been other works to represent semantic information such as Abstract Meaning Representation ( Banarescu et al, Law 2013). AMR is a graph based representation of lexical concepts and typed relations between those concepts that are denoted by an English sentence.
However the lexical semantics in FrameNet are much deeper than AMR
What AMR means for Framenet and for our use is that we can represent the Frames in a graph where the relationships are persisted. These edges can then be traversed.  Previously we used graphs with edges established based on similarity measures. Here we establish edges based on the relationships between frames.

#codingexercise
Find the number of islands of 1 in a binary matrix comprising of 0 and 1. In other words, count the number of connected components in an undirected graph.
int GetCountOfIslands(int[,] m, int row, int col)
{
bool visited = new bool[row,col]; // initialized to False
int count = 0;
for (int I = 0; I < row; I ++)
   for (int j = 0; j < col; j++)
       if (m[i.j] == 1 && visited[i,j] == False)
      {
             DFS(m, I, j, visited);
             count++;
      }
 return count;
}
int DFS(int[,] m, int row, int col, bool[,] visited)
{
var radj = new int[] {-1, -1, -1,  0, 0,  1, 1, 1};
var cadj = new int[] {-1,  0,  1, -1, 1, -1, 0, 1};
visited [row,col] = true;
for (int k  = 0; k < 8; k++)
   if (withinMatrixAndNotVisited(m, row + radj[k], col + cadj[k], visited))
      DFS (m, row + radj[k], col + cadj[k], visited);
}
We can also measure the size of each island with a minor modification to above.
Today we continue our discussion on natural language processing with a particular word ontology called FrameNet. We see how it is different from other ontologies. For example, FrameNet differs from WordNet in the lexical information and is in fact, complimentary in many ways. FrameNet has far fewer nouns, verbs, adjectives and adverbs as compared to WordNet. FrameNet Frame hierarchies provide better generalizations than WordNet synset hierarchies.
FrameNet also differs from PropBank (Proposition Bank discussed by Palmer et al 2005). PropBank began with labeling verbs and their arguments.
FrameNet assigns names such as communicator, evaluee and reason for corresponding PropBank role labels.
There have been other works to represent semantic information such as Abstract Meaning Representation ( Banarescu et al, Law 2013). AMR is a graph based representation of lexical concepts and typed relations between those concepts that are denoted by an English sentence.
However the lexical semantics in FrameNet are much deeper than AMR
What AMR means for Framenet and for our use is that we can represent the Frames in a graph where the relationships are persisted. These edges can then be traversed.  Previously we used graphs with edges established based on similarity measures. Here we establish edges based on the relationships between frames.

#codingexercise
Find the number of islands of 1 in a binary matrix comprising of 0 and 1. In other words, count the number of connected components in an undirected graph.
int GetCountOfIslands(int[,] m, int row, int col)
{
bool visited = new bool[row,col]; // initialized to False
int count = 0;
for (int I = 0; I < row; I ++)
   for (int j = 0; j < col; j++)
       if (m[i.j] == 1 && visited[i,j] == False)
      {
             DFS(m, I, j, visited);
             count++;
      }
 return count;
}
int DFS(int[,] m, int row, int col, bool[,] visited)
{
var radj = new int[] {-1, -1, -1,  0, 0,  1, 1, 1};
var cadj = new int[] {-1,  0,  1, -1, 1, -1, 0, 1};
visited [row,col] = true;
for (int k  = 0; k < 8; k++)
   if (withinMatrixAndNotVisited(m, row + radj[k], col + cadj[k], visited))
      DFS (m, row + radj[k], col + cadj[k], visited);
}
Today we continue our discussion on natural language processing with a particular word ontology called FrameNet. We see how it is different from other ontologies. For example, FrameNet differs from WordNet in the lexical information and is in fact, complimentary in many ways. FrameNet has far fewer nouns, verbs, adjectives and adverbs as compared to WordNet. This is because FrameNet has very little to say about common nouns. WordNet has noun hierarchies and encompasses common nouns. WordNet has little syntagmatic information where as FrameNet has a lot.
WordNet separates hiearchy for Nouns, Verbs, Adjectives and Adverbs. Each Frame on the other hand can conver words of any part-of-speech. WordNet has hypernymn or hyponymn relations between synsets. Lexical units come close to synsets but there  are no relations per se between lexical units. However, there are several types of frame relations.  There are also content differences between the two. WordNet has annotated examples showing syntax and semantics of each LU.  FrameNet describes Frame Elements or roles for each frame. FrameNet Frame hierarchies provide better generalizations than WordNet synset hierarchies.
FrameNet also differs from PropBank (Proposition Bank discussed by Palmer et al 2005). PropBank began with labeling verbs and their arguments. PropBank uses Penn POS tags, Penn Tree Bank parse structures. It added nouns and roles from associated verbs. This includes an efficient semantic role labeling system but there is no equivalence to frames in FrameNet. There are two levels of role labels - those that are completely general and those that are specific to lexical unit. The lemmas from text are split into sequences that carry equivalence between PropBank verb specific and FN FE names. FrameNet assigns names such as communicator, evaluee and reason for corresponding PropBank role labels.
There have been other works to represent semantic information such as Abstract Meaning Representation ( Banarescu et al, Law 2013). AMR is a graph based representation of lexical concepts and typed relations between those concepts that are denoted by an English sentence. It integrates several aspects of lexical/relational meaning - abstracting away  from the grammatical details - in a single structure. This structure specifically helps with rapid corpus annotation and data driven NLP. However the lexical semantics in AMR are much shallower than FrameNet
Courtesy : Colin Baker FCSI
In general, it seems to be helpful to have a hash between a term and a corresponding ontology identifier
#codingexercise
Insert into a circular linked list that is sorted but where we are given a random node from the list not the head.
// start is a random node in the sorted circular list
Void insertCircular(ref node start, node target)

{
var head = start;
If (head == null) {head = target; target.next = target; return;}
Node prev = null;
Node curr = head;
Max = int_min;
Var pos = null;
Var prevpos = null;
while(curr.next != head){
If (curr.data > max && cur.data  < target.data ){
Prevpos = prev;
Pos = curr;
Max= curr.data;
}
    Prev = curr;
    Curr = curr.next;
    }

If (curr.data > max && cur.data  < target.data ){
Prevpos = prev;
Pos = curr;
Max= curr.data;

}
 If (pos == null){
    If (head.data < target.data){
         InsertAt(head, target);
    }else{
         InsertAtStart(head, target);
     }
 }else{
   if (pos.data < target.data) {
        InsertAt(pos, target);
   }else{
        InsertAt(prevpos, target);
   }
 }
}
Int rowWithMaximum1sInSortedMatrix(int[,] m)
{
Int max = int_min;
Int index = -1;
For (int i = 0; i < m.rows; i++)
{
int col = findfirst(m, i, 0, m.cols-1, 1);
If (col != -1 && m.cols - col > max)
{
   Max = m.cols -col;
    Index = i;
}
}
Return index;
}
The findfirst method is a binary search method.

Tuesday, October 25, 2016

Today we continue our discussion on natural language processing with a particular word ontology called FrameNet. The idea behind FrameNet is that the definition of a word is incomplete without the relations to conceptual structures and patterns of beliefs, practices, institutions, images etc. which are referred to as semantic frames. - courtesy Filmore 2003. Therefore meanings are relative to Frames. The scope of FrameNet includes Events: for example related to birth or death as noun, Relations: for example being relevant as in adjective or in a personal relationship as in a noun, States: for example on or off as in preposition, and Entities: for example a device. Therefore the words are related to notions that are represented by frames and the meaning is not complete without these relationships. A Semantic Frame is the description of an event, relation, entity or state and its participants thereby capturing the essential knowledge of a given word sense. Roles within the frame are called Frame Elements. Words that evoke the frame are called Lexical units. For example cooking appears in a frame called 'Apply_heat' where the frame elements are the cook, food, container and heating_instrument. and words that evoke it such as fry, bake, boil and broil are lexical units.
FrameNet comprises of over 1195 Semantic frames with approximately 12 frame elements per frame. It includes12,989 lexical units which represent connection between one lemma + POS and one frame. It includes 198,932 manual annotations of corpus examples and about 1774 frame to frame relations. Since the relations involve multiple inheritance, it forms a lattice. It is also constructed bottom-up. It comes with semantic type labels such as animate, human, positive_judgement and assumes construction grammar which describe the grammatical characteristics and semantic import of each construction ( Filmore, Goldman, Rhodes ). It also includes metaphor and metonymy.
FrameNet is not a formal ontology. It includes non-lexical frames that have been made up as needed to connect the nodes in certain cases. It attempts to include cross linguistic differences such as the difference between commercial_transaction versus a criminal_process although both describe methods.
Why is it an improvement over semantic ontologies ?
1) Frame is a useful level of abstraction
2) Roles are more meaningful than semantic labels
3) Frames represent conceptual gestalts - more than just the sum of their parts.

#codingexercise
Insert into a circular linked list that is sorted but where we are given a random node from the list not the head.
Void insertCircular(ref node start, node target)
{
var head = FindHead(ref node start, node target);
If (head == null) {head = target; target.next = target; return;}
Node prev = null;
Node curr = head;
while(curr.data < target.data && curr.next != head){
    Prev = curr;
    Curr = curr.next;
    }
 If (prev == null){
    If (curr.data < target.data){
         InsertAt(cur, target);
    }else{
         InsertAtStart(head, target);
     }
 }else{
   if (curr.data < target.data) {
        InsertAt(cur, target);
   }else{
        InsertAt(prev, target);
   }
 }
}

Node FindHead(ref node start, node target)
{
Node head = null;
if (start == null) return head;
If (start.next == start) return start;
Node curr = start;
min = curr.data;
head = start;
while(curr.next != start){
    if (curr.data < min)
    {
         head = curr;
         min = curr.data;
    }
    curr = curr.next;
}   
 if (curr.data < min)
    {
         head = curr;
         min = curr.data;
    }
return head;
}
Alternatively, we could also insert by iterating the circular list once to find the node that is closest to the given node in its value.

Monday, October 24, 2016

Today we continue our discussion on natural language processing with a particular word ontology called FrameNet.
Framenet is a lexical database maintained by Berkeley.
It has the following features:
It has annotations about how words are  used in actual text.
It can be used like a dictionary.
It has over 170000 annotated sentences describing word usage
It provides a unique training data set
It is both human readable and machine readable.
The data is freely available to download.
The annotations show how words have unique combinations.
FrameNet defines the idea that words have a meaning based on the presence of a semantic frame.  This is a type of event, relation or entity and the participants in it. The example cited is that of cooking. The concept of cooking typically involves a person doing the cooking (cook) and the food that is cooked (food) and something to hold (container) along with a source of heat (heating instrument) This is represented as a frame called 'Apply_heat' where the frame elements are the cook, food, container and heating_instrument. The frame units then naturally fit as superimposed on the words that evoke them.
The frame units help with both semantic and syntactic relationships. As such they are better representations than terms themselves while allowing the models based on those terms to work as before, if not improved with the frame elements.


#codingexercise
Insert a node into a sorted circular linked list
Void insertCircular(ref node head, node target)
{
If (head == null) {head = target; target.next = target; return;}
Node prev = null;
Node curr = head;
while(curr.data < target.data && curr.next != head){
    Prev = curr;
    Curr = curr.next;
    }
 If (prev == null){
    If (curr.data < target.data){
         InsertAt(cur, target);
    }else{
         InsertAtStart(head, target);
     }
 }else{
   if (curr.data < target.data) {
        InsertAt(cur, target);
   }else{
        InsertAt(prev, target);
   }
 }
}


void InsertAt(ref node head, ref target)
{
// Insert After head
var next = head.next;
target.next = next;
head.next = target;

}


Sunday, October 23, 2016

#codingexercise
Print all nodes at a distance k from a given node
void printKDistanceDown(Node root, int k)
{
if (root == null || k < 0) return;
if (k == 0)
   Console.Write(root.data);
printKDistanceDown(root.left, k-1);
printKDistanceDown(root.right, k-1);
}

// returns distance of root from target
int PrintKDistanceNode(Node root, Node target, int k)
{
if (root == null) return -1;
if (root == target){printKDistanceDown(root, k); return 0;}
int dl = printKDistanceNode(root.left, target, k);
if (dl != -1)
{
   if (dl + 1 == k)
       Console.Write(root.data);
   else
       printKDistanceDown(root.right, k -dl-2);
   return 1 + dl;
}
int dr = printKDistanceDown(root.right, target, k);
if (dr != -1)
{
      if (dr +1 ==k)
         Console.Write(root.data);
      else
          printKDistanceDown(root.left, k-dr-2);
      return 1+dr;
}
return -1;
}

2. Given a string, recursively remove adjacent duplicate characters from string. The output string should not have any duplicates.
For eg: abcdaadhhhhhzppzl
becomes abchl
 static String RemoveDuplicate(StringBuilder str, int n)
        {
            for (int m = 0; m < int.MaxValue; m++)
            {
                bool modified = false;
                int k = 0;
                int len = str.ToString().Length;
                if (len == 0) return str.ToString();
                for (int i = 0; i < len; i++)
                {
                    if (k == 0) {
                        str[k] = str[i];
                        k++;
                        continue;
                    }
                    if (str[k] == '\0' || str[i] != str[k-1])
                    {
                        str[k] = str[i];
                        k++;
                    }
                    else
                    {
                        k--;
                        str[k] = '\0';
                        modified = true;
                    }
                }
                if (!modified)
                    break;
                str[k] = '\0';
                k++;
                for (int i = k; i < len; i++)
                    str[i] = '\0';
                Console.WriteLine("val =" + str.ToString());
            }
            return str.ToString();
        }

Saturday, October 22, 2016

#codingexercise
Flatten out a linked list with child and next pointers
void Flatten(Node head)
{
if (head == null) return;
Node temp;
Node tail = head;
while (tail.next)
     tail = tail.next;
Node cur = head;
while ( cur != tail)
{
if (cur.child && cur.child.visited == false)
{
tail.next = cur.child; // this goes to the tail
tmp  = cur.child;  // tail is updated
tmp.visited = true;
while (tmp.next)
  tmp = tmp.next;
  tmp.visited = true;
tail = tmp;
}
cur = cur.next;
}
}
#probability question
A fair die is thrown k times. What is the probability of sum of k throws to be equal to a number n?
void FairThrow(int k, int n, ref int current, ref int sum, ref int count, ref int total)
{
 if (current == k) {
      total += 1;
      if (sum == n) { count +=1;}
      return;
}
 for (int i = 1; i <=6; i++)
 {
         sum+= i;
         current += 1;
         FairThrow( k, n, current, sum, count);
         sum -= i;
         current -= 1;
 }
}
Probability = count/total;