Wednesday, May 24, 2017

Markov networks and probabilistic graphical models continued
Inference is a step performed both during training and when predicting the labels on new inputs. There are two inference problems: 1) during training we compute the marginal distributions over subsets of labels, such as node marginals and edge marginals, and 2) at prediction time we compute the most probable assignment of labels. Both share the same underlying computations, except that the summation in one is replaced by a maximization in the other. These inference tasks are performed by well-known algorithms such as the forward-backward algorithm for computing marginal distributions and the Viterbi algorithm for computing the most probable assignment. Both are special cases of the general belief-propagation algorithms for tree-structured graphical models. For more complex models, approximate inference is necessary. Any inference algorithm for a graphical model is applicable to a CRF, with the consideration that it will be called repeatedly during parameter estimation.
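As a minimal sketch of this sum-versus-max relationship on a linear chain (the emission and transition scores below are made-up placeholders, not from the original post), the forward recursion sums over the previous state, while Viterbi replaces the sum with a max and backtracks:

import numpy as np

def forward(emit, trans):
    """Forward pass: sum over the previous state (sum-product)."""
    T, S = emit.shape
    alpha = np.zeros((T, S))
    alpha[0] = emit[0]
    for t in range(1, T):
        # Summing over the previous state gives the quantities used for marginals.
        alpha[t] = emit[t] * (alpha[t - 1] @ trans)
    return alpha

def viterbi(emit, trans):
    """Same recursion with max in place of sum (max-product), plus backtracking."""
    T, S = emit.shape
    delta = np.zeros((T, S))
    back = np.zeros((T, S), dtype=int)
    delta[0] = emit[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * trans      # candidate scores, previous x current state
        back[t] = scores.argmax(axis=0)             # best previous state for each current state
        delta[t] = emit[t] * scores.max(axis=0)     # max over the previous state
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return list(reversed(path))

# Hypothetical scores for a 3-step chain with 2 states.
emit = np.array([[0.9, 0.1], [0.4, 0.6], [0.3, 0.7]])
trans = np.array([[0.8, 0.2], [0.3, 0.7]])
print(forward(emit, trans)[-1].sum())  # partition function Z
print(viterbi(emit, trans))            # most probable state sequence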
Pgmpy provides a way to convert Markov networks to Bayesian models and to factor graphs, which involve both random variables and factors. The reverse conversions are also supported. Pgmpy can be extended to add more post-inference methods based on conventional graph operations. These include connected components, degree counting, graph coloring, k-core, label propagation, PageRank, shortest path, and triangle counting.
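A small sketch of those conversions, assuming a recent pgmpy release where the class is named MarkovNetwork (older versions call it MarkovModel) and the to_factor_graph, to_bayesian_model, and to_markov_model methods behave as documented; the factor values here are made up:

from pgmpy.factors.discrete import DiscreteFactor
from pgmpy.models import MarkovNetwork

# A tiny Markov network over two binary variables with a single edge factor.
mn = MarkovNetwork([('A', 'B')])
phi = DiscreteFactor(['A', 'B'], cardinality=[2, 2], values=[10, 1, 1, 10])
mn.add_factors(phi)

# Factor-graph view: nodes for both the random variables and the factors.
fg = mn.to_factor_graph()

# Conversion to a Bayesian model, and the reverse conversion back.
bn = mn.to_bayesian_model()
mn_again = bn.to_markov_model()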
Of these graph operations, PageRank is perhaps the one used most often to selectively filter results. Since we already calculate the marginals, both for the nodes and for the edges, we already have a graph on which to compute PageRank. The PageRank of a vertex is the sum of two components: the first comes from the damping factor, and the second is a summation of the PageRanks of the adjacent vertices, each weighted by the inverse of that vertex's out-degree. This is said to correspond to the principal eigenvector of the normalized link matrix.
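In the standard formulation this reads PR(v) = (1 - d)/N + d * sum over u -> v of PR(u)/outdeg(u). A minimal power-iteration sketch (the link matrix below is made up) converges to the principal eigenvector mentioned above:

import numpy as np

def pagerank(adj, d=0.85, iters=100):
    """Power iteration on the normalized link matrix."""
    n = adj.shape[0]
    out_deg = adj.sum(axis=1, keepdims=True)
    out_deg[out_deg == 0] = 1                    # avoid division by zero for sink nodes
    M = (adj / out_deg).T                        # M[v, u] = 1/outdeg(u) if u -> v, else 0
    pr = np.full(n, 1.0 / n)
    for _ in range(iters):
        # damping term plus the weighted sum of the neighbours' ranks
        pr = (1 - d) / n + d * (M @ pr)
    return pr

# Hypothetical 4-node link structure (adj[u, v] = 1 means an edge u -> v).
adj = np.array([[0, 1, 1, 0],
                [0, 0, 1, 0],
                [1, 0, 0, 1],
                [0, 0, 1, 0]], dtype=float)
print(pagerank(adj))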
Centrality can also be measured in terms of PageRank once we have a connectivity matrix based on pairwise cosine similarity between the vectors.
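A sketch of that idea with made-up feature vectors: build the pairwise cosine-similarity matrix, treat it as a weighted adjacency matrix, and run PageRank over it (here via networkx 2.x or newer, which is assumed to be available):

import networkx as nx
import numpy as np

# Hypothetical feature vectors, one row per item.
X = np.array([[1.0, 0.0, 1.0],
              [0.9, 0.1, 1.0],
              [0.0, 1.0, 0.2]])

# Pairwise cosine similarity as a weighted connectivity matrix.
unit = X / np.linalg.norm(X, axis=1, keepdims=True)
sim = unit @ unit.T
np.fill_diagonal(sim, 0.0)                 # ignore self-similarity

# PageRank over the similarity graph yields a centrality score per item.
G = nx.from_numpy_array(sim)
print(nx.pagerank(G, weight='weight'))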
#codingexercise
Get the top K movies by rating when their similarity graph is given
List<Movie> GetMovies(Movie movie, int k)
{
    // Traverse the similarity graph starting from the given movie.
    var related = BFS(movie);
    // Sort the related movies by rating, highest first.
    var ret = related.OrderByDescending(x => x.getRating()).ToList();
    if (ret.Count < k)
        return ret;
    return ret.GetRange(0, k);
}
If we were to get the movies by similarity distance alone, we could say:


List<Movie> GetMoviesByDistance(Movie movie, int k)
{
    // BFS visits movies in order of increasing distance from the starting movie.
    var connected = BFS(movie);
    var ret = connected.ToList();
    if (ret.Count < k)
        return ret;
    return ret.GetRange(0, k);
}
