Tuesday, June 3, 2014

We will continue our discussion of graph-based random walks today. We saw how we can establish relationships in a bipartite graph. We saw how to perform random walks. We also saw how to compute costs.
What we will see next is how this translates to keyword extraction. This part is more of a conjecture at this point. We want to reuse the concept of contextual similarity, but we want to define the context as more than just adjacent words. We don't want to use a local vector space model (VSM) either, because it would be computationally expensive to do both a random walk and VSM clustering. Is there a way to define collocation as something other than adjacent words and VSM?
One suggestion is that we use something like co-occurrence. Collocation is different from co-occurrence. One defines proximity by boundaries, and the other is defined by counts and groupings. If we can come up with different groups of words and we find patterns in them, for example that they tend to repeat, that's co-occurrence. How we discover the different groups and the different patterns is closely associated with clustering.
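
To make the distinction concrete, here is a minimal sketch in Python. It is only an illustration, not a settled method: it assumes a "group" is simply a sentence, which is one possible choice and not something we have decided on, and the function names and sample sentences are made up for the example. Collocation is counted from adjacent word pairs only, while co-occurrence is counted from words that share a group, wherever they sit in it.

```python
from collections import Counter
from itertools import combinations


def collocation_counts(tokens):
    """Count unordered pairs of adjacent words (proximity by position)."""
    return Counter(tuple(sorted(pair)) for pair in zip(tokens, tokens[1:]))


def cooccurrence_counts(groups):
    """Count unordered pairs of distinct words that share a group."""
    counts = Counter()
    for group in groups:
        # set() so that a word repeated inside one group is counted once
        for pair in combinations(sorted(set(group)), 2):
            counts[pair] += 1
    return counts


# Hypothetical groups: here each sentence is treated as one group.
sentences = [
    "random walk on a graph".split(),
    "graph clustering uses a random walk".split(),
    "keyword extraction from a graph".split(),
]

adjacent = Counter()
for sentence in sentences:
    adjacent.update(collocation_counts(sentence))

together = cooccurrence_counts(sentences)

print(adjacent[("graph", "random")])   # never adjacent
print(together[("graph", "random")])   # but they share two groups
```

The pair "graph" and "random" never appears side by side, so adjacency alone misses it, yet it repeats across groups. Pairs like this are what a co-occurrence based context could feed into the random walk as additional edges.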
