Monday, July 9, 2018

Many natural language processing algorithms are based on a co-location weight matrix, where a document is treated as a collection of words, each represented by a limited-dimension feature vector of weights drawn from that matrix.
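As a minimal sketch of this idea, the method below builds a window-based co-occurrence matrix from a tokenized document and returns each word's row as its feature vector. The class, method names, and window parameter are illustrative assumptions, and every token is assumed to appear in the vocabulary.

using System;
using System.Collections.Generic;

static class CoOccurrence
{
    // rows[w] is w's feature vector of window-based co-location weights.
    public static Dictionary<string, double[]> Build(string[] tokens, string[] vocab, int window)
    {
        var index = new Dictionary<string, int>();
        for (int i = 0; i < vocab.Length; i++) index[vocab[i]] = i;

        var rows = new Dictionary<string, double[]>();
        foreach (var w in vocab) rows[w] = new double[vocab.Length];

        for (int i = 0; i < tokens.Length; i++)
        {
            int lo = Math.Max(0, i - window);
            int hi = Math.Min(tokens.Length - 1, i + window);
            for (int j = lo; j <= hi; j++)
            {
                if (j != i) rows[tokens[i]][index[tokens[j]]] += 1.0; // raw co-occurrence count
            }
        }
        return rows;
    }
}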


However, we can also consider a semantic matrix that assigns weights to words following a similar pattern.
Since each matrix represents vectors in a different feature space, a word may no longer be treated as just one vector but as a combination of different vectors.

This results in a form of superimposed adjacency matrices.
Correlating feature spaces is like building a regression model. We don't want to do that, but if we assume equal contributions from each of the hidden matrix layers (co-location, semantics, etc.), then we can aggregate the weights not just along one plane but also vertically. The caveat is that we now work over an expanded union of all features across layers.
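A rough sketch of this vertical aggregation, assuming the two layers have already been aligned to the union of their features (zero-filled where a layer has no entry), so the equal-contribution combination is just an element-wise average; the names are illustrative:

static double[,] AggregateEqual(double[,] colocation, double[,] semantic)
{
    int n = colocation.GetLength(0), m = colocation.GetLength(1);
    var combined = new double[n, m];
    for (int i = 0; i < n; i++)
        for (int j = 0; j < m; j++)
            combined[i, j] = 0.5 * (colocation[i, j] + semantic[i, j]); // equal layer weights
    return combined;
}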
The addition of layers does not take away any of the benefits of the co-location layer. Instead, it enables custom layers in the form of tags associated with words. We don't necessarily have to stick to the well-known co-location and semantics-based relationships; we can use tags for domain-specific text.
The weights across the layers of matrices also do not need to be uniform. They can be skewed in favor of the customization layer so that domain-specific processing has more levers for tuning.
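The skewed variant generalizes the sketch above to arbitrary per-layer weights; here, the layers are assumed to be aligned as before and the weights normalized to sum to 1, with the example weights favoring a tag layer purely for illustration:

static double[,] AggregateWeighted(double[][,] layers, double[] weights)
{
    int n = layers[0].GetLength(0), m = layers[0].GetLength(1);
    var combined = new double[n, m];
    for (int k = 0; k < layers.Length; k++)
        for (int i = 0; i < n; i++)
            for (int j = 0; j < m; j++)
                combined[i, j] += weights[k] * layers[k][i, j]; // e.g. weights = {0.2, 0.2, 0.6} to favor tags
    return combined;
}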
A vector space has a point of reference and dimensions. Combining vector spaces is not easy, but it is not impossible; spacetime is an example. Instead of truly combining them, the layered-matrix method above merely assigns the layers different weights in their representation of the whole.
Another approach may be to take a single matrix where the association between words is no longer based on co-location probability alone but on a combination of different heuristics. If we generalize associativity beyond mere collocation, we can still use the same technique of finding the hidden matrix with a neural net and a softmax classifier. In such cases, collocation, semantics, and proprietary tagging, such as shades of meaning, can all become dimensions of the associativity between words.
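One way to read this single-matrix approach: each cell holds a blended associativity score instead of a raw co-location probability, and the blended matrix then feeds the same neural-net/softmax pipeline unchanged. A sketch with three hypothetical signals and illustrative mixing weights:

static double Associativity(double colocationProb, double semanticSim, double tagAffinity)
{
    // illustrative mixing weights; in practice these would be tuned per domain
    return 0.5 * colocationProb + 0.3 * semanticSim + 0.2 * tagAffinity;
}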

#codingexercise
Find the minimum number of swaps needed to balance brackets. For example:
[]][][ => 1
[[][]] => 0
int GetCountSwaps(string p)
{
    Validate(p); // assumed helper: rejects strings with unequal counts of '[' and ']'
    var open = new Stack<char>();
    var close = new Stack<char>();
    for (int i = 0; i < p.Length; i++)
    {
        if (p[i] == '[') {
            open.Push('[');
        }
        if (p[i] == ']') {
            if (open.Count > 0) {
                open.Pop(); // found a match
            } else {
                close.Push(']'); // unmatched close
            }
        }
    }
    // With equal bracket counts, open.Count == close.Count == k. Swapping the
    // first unmatched ']' with the last unmatched '[' repairs two mismatched
    // pairs at once, so k mismatches need ceil(k/2) swaps.
    int k = Math.Min(close.Count, open.Count);
    return (k + 1) / 2;
}
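For example, with the corrected return value, GetCountSwaps("[]][][") is 1 and GetCountSwaps("]]][[[") is 2: the first swap exchanges the outermost mismatched pair and fixes two pairs at once, leaving one pair for a second swap.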
The GeeksforGeeks solution for this appears incorrect for this formulation; it likely counts only adjacent swaps, which is a different variant of the problem.
