Wednesday, October 5, 2016

We were discussing the use of graphs in natural language processing: We said when we have diverse multiple features for a vector that depend on each other, one way to represent them is using conditional random fields.
This comes useful in clustering with word vectors because now we can include features from thesaurus as well. When we represent organizational structure as graphs, we can use the concept of neighbors to determine equivalence. Two vertices are said to have regular equivalence if they are equally related to equivalent others. They are vertices who have neighbors that are themselves similar. In a graph, a vertex is in a class by itself. Vertices that have no other ties with any vertex are in the same class but they have to have a tie with a vertex in the previous class.

The purpose of these equivalence classes is to show the levels of similarity. In the above example we have three levels of equivalence.


We discussed negative sampling in addition to conditional random fields.The conditional random field helped us keep the dependencies of the observed data implicit when associating a graphical framework only with the conditional probabilities. And the observed data input vector could have any  number of heterogenous features such as prefixes, suffixes, word-bigrams.  This comes particularly useful in label propagation and establishing long distance dependencies between labels over the graphical framework. Consequently the thesaurus semantic similarity information can become a feature of the observed vector while allowing the labels to have dependencies with one another.  Using the graphical network and the equivalences we can not only use the labels but also the associated framework. The negative sampling was used to fill out the PMI matrix but now we can use the graphical framework directly for the same relational data.

We apply label propagation on the same graphical model

#codingexercise:
Given the arrival and departure times of trains that stop at a station, find the minimum number of platforms required so that no train waits

int GetMinPlatforms(List<time> arr, List<time> dep)
{
 assert (arr.Count == dep.Count);
 int n = arr.Count;
if (n == 0) return 0;

arr.Sort();
dep.Sort();

int plat= 1;
int max= 1;
int  I  = 1;
int j = 0;

while (I < n && j < n)
{
if (arr[I] < dep[j])
{
  plat++;
  i++;
  if (plat > max)
       max = plat;
}else{
  plat--;
  j++;
}
}
return max;
}

A chef has recently opened a new restaurant with a unique style. The restaurant is divided into K compartments (numbered from 1 to K) and each compartment can be occupied by at most one customer. Each customer that visits the restaurant has a strongly preferred compartment p (1 ? p ? K), and if that compartment is already occupied, then the customer simply leaves. Now obviously, the chef wants to maximize the total number of customers that dine at his restaurant and so he allows (or disallows) certain customers so as to achieve this task. You are to help him with this. Given a list of N customers with their arrival time, departure time and the preferred compartment, you need to calculate the maximum number of customers that can dine at the restaurant

int GetMaxCustomers(List<time>arr, List<time> dep, List<int> comp)
{
assert (arr.Count == dep.Count);
int n = arr.Count
if (n == 0) return 0;
comp.ForEach(x => x = 0); //occupancy
arr.Sort();
dep.Sort();
int customer = 1;
int max = 1;
while (I <n && j <n)
{
if (arr[I] < dep[j])
{
 if (comp[j]) continue;
if (arr[I] < dep[j])
{
  customer++;
  i++;
  comp[j] = 1;
  if (customer > max)
       max = customer;
}else{
  customer--;
  j++;
  comp[j] = 0;
}
}
}
return max;
}

No comments:

Post a Comment