Cluster computing

Tuesday, May 23, 2017

Markov networks and probabilistic graphical models

Probabilistic graphical model use graphs to represent relationship between random variables or relationship between random variables and factors. Factors are compatibility functions which are smaller in number than the variable set and give more meaningful information to what a probability distribution represents. A factor graph is constructed as a bipartite graph between the random variables and the factors and with the help of a partition function that normalizes the compatibility functions.

Markov network focuses only on the first part. It is constructed by all pairs of variables that share a local function with the nodes as the random variables. Although Factors and not the distributions, play a role in the connectivity in the Markov Network, the connections in a Markov network is the conditional independence relationships in a multivariate distribution. It is therefore simpler than a factor graph.

When data is available and the random variables are defined, a Markov network can be used with the fit and predict methods. The fit method takes the incoming data as training and learns the parameters from it. The predict method accepts the data as test and predicts all the missing variables using the model.

Markov models make use of conditional random fields. They do not represent dependencies between the variables in a joint distribution. Instead they allow the factors to depend on the entire observation vector without breaking the linear graphical structure. The feature functions can then be examined with all the input variables together. The extension from linear chains to more general graphs follows an expansion of the same relations.

There are several graphical method libraries available to fit and predict probabilistic graphical models. But one package by name pgmpy is gaining popularity because it is implemented in Python. Pgmgpy is cited with an introduction in the reference 1

MicrosoftML packages in python do not seem to cover probabilistic graphical models. Instead they make trees, forests and ensembles available.

Pgmpy provides a way to convert Markov Networks to Bayesian graph models involving both random variables and factors. The reverse conversion is also facilitated. Pgmpy can be extended to add more post-inference methods based on conventional graph operations. These include connected components, degree counting, graph coloring, k-core, label-propagation, pagerank, shortest path, and triangle counting.

Reference:

Pgmpy: Probabilistic Graphical Models using Python – Ankur Ankan, Abhinash Panda

http://conference.scipy.org/proceedings/scipy2015/pdfs/ankur_ankan.pdf

An introduction to Conditional Random Fields - Sutton and McCallum
Turi documentation

#codingexercise

There are N gas stations along a circular route, where the amount of gas at station i is gas[i].

We have a car with an unlimited gas tank and it costs cost[i] of gas to travel from station i to its next station i+1. We can begin the journey with an empty tank at one of the gas stations. Return all the gas stations where we can travel around the circuit once.

public List<int> canCompleteCircuit(List<int>gas, List<int> cost)

{

var ret = new List<int>();

for (int i = 0; i < gas.Count; i++)

{

int reserve = gas[i];

if (CanCompleteTour(gas, cost, i, reserve, (i+1)%gas.Count))

{

ret.Add(i);

Console.WriteLine("can complete Tour starting at {0}", i);

}

Console.WriteLine();

}

return ret;

}

static bool CanCompleteTour(List<int> gas, List<int> cost, int start, int reserve, int i)

{

if (i == start) return true;

if (reserve + gas[i] < cost[i]) return false;

Console.WriteLine("Reserve={0} at {1}", reserve + gas[i] - cost[i], i);

return CanCompleteTour(gas, cost, start, reserve+gas[i]-cost[i], (i+1)%gas.Count);

}
The calculations are linear when we need to find where the reserve is the minimum because the gas[i]-cost[i] will remain the same at i.

Cluster computing

Tuesday, May 23, 2017

No comments:

Post a Comment