Wednesday, May 24, 2017

Markov networks and probabilistic graphical models continued
Inference is the step performed both during training and for predicting the labels on new inputs. There are two inference problems: 1) during training, we compute the marginal distributions over subsets of labels, such as the node marginals and edge marginals, and 2) at prediction time, we compute the most likely assignment of labels. Both share the same underlying computations, except that one switches to maximization where the other uses summation. These inference tasks are performed by well-known algorithms: the forward-backward algorithm computes the marginal distributions and the Viterbi algorithm computes the most probable assignment. Both are special cases of the general belief propagation algorithm for tree-structured graphical models. For more complex models, approximate inference is necessary. Any inference algorithm for a graphical model is applicable to a CRF, with the consideration that it will be called repeatedly during parameter estimation.
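As a minimal sketch of that summation-versus-maximization duality on a toy hidden Markov model (the method and its inputs are illustrative assumptions, not a library API), the forward pass and Viterbi share one recurrence and differ only in how values are accumulated:

double Decode(double[,] trans, double[,] emit, double[] init, int[] obs, bool viterbi)
{
    int states = init.Length;
    var cur = new double[states];
    for (int s = 0; s < states; s++)
        cur[s] = init[s] * emit[s, obs[0]];
    for (int t = 1; t < obs.Length; t++)
    {
        var next = new double[states];
        for (int s = 0; s < states; s++)
            for (int p = 0; p < states; p++)
            {
                double v = cur[p] * trans[p, s] * emit[s, obs[t]];
                // the only line that differs between the two inference problems
                next[s] = viterbi ? Math.Max(next[s], v) : next[s] + v;
            }
        cur = next;
    }
    // probability of the best assignment (max) or of the observations (sum)
    return viterbi ? cur.Max() : cur.Sum();
}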
Pgmpy provides a way to convert Markov Networks to Bayesian graph models involving both random variables and factors. The reverse conversion is also facilitated. Pgmpy can be extended to add more post-inference methods based on conventional graph operations. These include connected components, degree counting, graph coloring, k-core, label-propagation, pagerank, shortest path, and triangle counting.  
Out of these methods, PageRank is perhaps used most often, to selectively filter the results. Since we already calculate the marginals, both for the nodes and for the edges, we already have a graph on which to calculate PageRank. PageRank can be found as a sum of two components: the first is a damping-factor term, and the second is the sum of the PageRanks of the adjacent vertices, each weighted by the inverse of that vertex's out-degree. The result is said to correspond to the principal eigenvector of the normalized link matrix.
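A minimal sketch of that computation (the adjacency-list representation, the iteration count, and the damping factor d = 0.85 are assumptions of this sketch, not taken from pgmpy):

Dictionary<string, double> PageRank(Dictionary<string, List<string>> graph, int iterations = 20, double d = 0.85)
{
    int n = graph.Count;
    // start from a uniform rank; assumes every vertex appears as a key
    var rank = graph.Keys.ToDictionary(v => v, v => 1.0 / n);
    for (int iter = 0; iter < iterations; iter++)
    {
        // first component: the damping-factor term
        var next = graph.Keys.ToDictionary(v => v, v => (1.0 - d) / n);
        // second component: ranks of adjacent vertices weighted by inverse out-degree
        foreach (var u in graph.Keys)
            foreach (var v in graph[u])
                next[v] += d * rank[u] / graph[u].Count;
        rank = next;
    }
    return rank;
}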
Centrality can also be measured in terms of PageRank once we have a connectivity matrix based on pairwise cosine similarity between the vectors.
#codingexercise
Get the top k movies by rating when their similarity graph is given
List<Movie> GetMovies(Movie movie, int k)
{
    // BFS over the similarity graph yields the movies related to this one
    var related = BFS(movie);
    var ret = related.OrderByDescending(x => x.getRating()).ToList();
    if (ret.Count < k)
        return ret;
    return ret.GetRange(0, k);
}
If we were to get the movies by similarity distance alone, we could say:

List<Movie> GetMoviesByDistance(Movie movie, int k)
{
    // BFS returns movies in order of increasing distance from the start
    var connected = BFS(movie);
    var ret = connected.ToList();
    if (ret.Count < k)
        return ret;
    return ret.GetRange(0, k);
}

Tuesday, May 23, 2017

Markov networks and probabilistic graphical models 
Probabilistic graphical models use graphs to represent relationships between random variables, or between random variables and factors. Factors are compatibility functions; they are fewer in number than the variables and give more meaningful information about what a probability distribution represents. A factor graph is constructed as a bipartite graph between the random variables and the factors, together with a partition function that normalizes the compatibility functions.
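As a toy sketch of that normalization (the two-variable model and all names here are illustrative assumptions): with two binary variables and a single compatibility function, the partition function sums the compatibility over all assignments so that the normalized values form a distribution.

// one compatibility function over two binary variables; it prefers agreement
double Psi(int a, int b) => (a == b) ? 2.0 : 1.0;

double PartitionFunction()
{
    double z = 0;
    for (int a = 0; a < 2; a++)
        for (int b = 0; b < 2; b++)
            z += Psi(a, b); // Z = 2 + 1 + 1 + 2 = 6
    return z;
}

// the normalized compatibility is the probability of an assignment
double Probability(int a, int b) => Psi(a, b) / PartitionFunction();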
A Markov network focuses only on the random variables. It is constructed with the random variables as nodes and an edge between every pair of variables that share a local function. Although the factors, and not the distributions, determine the connectivity of a Markov network, the connections express the conditional independence relationships in a multivariate distribution. It is therefore simpler than a factor graph.
When data is available and the random variables are defined, a Markov network can be used with fit and predict methods. The fit method takes the incoming data as training and learns the parameters from it. The predict method accepts data as test input and predicts all the missing variables using the model.
Markov models relate closely to conditional random fields. CRFs do not represent dependencies between the variables in a joint distribution; instead they allow the factors to depend on the entire observation vector without breaking the linear graphical structure. The feature functions can then examine all the input variables together. The extension from linear chains to more general graphs follows by expanding the same relations.
There are several graphical-model libraries available to fit and predict probabilistic graphical models, but one package named pgmpy is gaining popularity because it is implemented in Python. Pgmpy is introduced in reference 1.
MicrosoftML packages in Python do not seem to cover probabilistic graphical models. Instead they make trees, forests and ensembles available.
References:
  1. Pgmpy: Probabilistic Graphical Models using Python – Ankur Ankan, Abhinash Panda
http://conference.scipy.org/proceedings/scipy2015/pdfs/ankur_ankan.pdf
  2. An Introduction to Conditional Random Fields – Sutton and McCallum
  3. Turi documentation
#codingexercise
There are N gas stations along a circular route, where the amount of gas at station i is gas[i].
We have a car with an unlimited gas tank and it costs cost[i] of gas to travel from station i to its next station i+1. We can begin the journey with an empty tank at one of the gas stations. Return all the gas stations where we can travel around the circuit once.
        public List<int> canCompleteCircuit(List<int> gas, List<int> cost)
        {
            var ret = new List<int>();
            for (int i = 0; i < gas.Count; i++)
            {
                // fuel left after filling at station i and paying for the leg to i+1
                int reserve = gas[i] - cost[i];
                if (CanCompleteTour(gas, cost, i, reserve, (i + 1) % gas.Count))
                {
                    ret.Add(i);
                    Console.WriteLine("can complete tour starting at {0}", i);
                }
            }
            return ret;
        }

        static bool CanCompleteTour(List<int> gas, List<int> cost, int start, int reserve, int i)
        {
            if (reserve < 0) return false; // ran out of gas on the previous leg
            if (i == start) return true;   // back at the starting station
            Console.WriteLine("Reserve={0} at {1}", reserve, i);
            return CanCompleteTour(gas, cost, start, reserve + gas[i] - cost[i], (i + 1) % gas.Count);
        }
The calculation can also be done in a single linear pass by finding where the running reserve is minimum, because the surplus gas[i] - cost[i] at each station i stays the same.
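A sketch of that single pass for the classic single-start variant (the method name and the -1 not-found convention are assumptions of this sketch): whenever the running reserve drops below zero, no station at or before the current one can be the start, so the candidate start moves just past it.

public int CanCompleteCircuitLinear(List<int> gas, List<int> cost)
{
    int total = 0, reserve = 0, start = 0;
    for (int i = 0; i < gas.Count; i++)
    {
        int surplus = gas[i] - cost[i];
        total += surplus;
        reserve += surplus;
        if (reserve < 0) { start = i + 1; reserve = 0; } // restart past station i
    }
    return total >= 0 ? start : -1; // -1 means no station works
}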

Monday, May 22, 2017

#codingexercise
Given a list of data centers and their connectivity information, find the islands of isolated data centers,
e.g. Input: A<->B, B<->C, D<->E // edges
Output: {A,B,C}, {D,E}
Vertices: A, B, C, D, E

We create an Adjacency List with the given Edges List and Vertices List
A - B
B - A, C
C - B
D - E
E - D

List<List<Node>> GetIslands(List<Tuple<Node, Node>> pairs, List<Node> vertices)
        {
            var ret = new List<List<Node>>();
            var t = GetAdjacencyList(pairs, vertices);
           
            foreach (var vertex in vertices)
            {
                 var connected = BFS(t, vertex);
                 bool merge = false;
                 for (int i = 0; i < ret.Count; i++)
                 {
                     var path = ret[i];
                     if (path.Intersect(connected).Any())
                     {
                         // Union already removes duplicates
                         ret[i] = path.Union(connected).ToList();
                         merge = true;
                         break;
                     }
                 }
                if (merge == false)
                    ret.Add(connected);
            }
           
            return ret;
        }
Dictionary<Node, List<Node>> GetAdjacencyList(List<Tuple<Node, Node>> pairs, List<Node> vertices)
        {
            // build an adjacency list from the edges
            // initialize a list for every vertex
            var t = new Dictionary<Node, List<Node>>();
            foreach (var p in pairs)
            {
                //forward
                if (t.Keys.Contains(p.Item1))
                {
                    t[p.Item1].Add(p.Item2);
                }
                else
                {
                    t.Add(p.Item1, new List<Node>{p.Item2});
                }
                //backward
                if (t.Keys.Contains(p.Item2))
                {
                    t[p.Item2].Add(p.Item1);
                }
                else
                {
                    t.Add(p.Item2, new List<Node>{p.Item1});
                }  
            }
            // Add empty list for every vertex
            var remaining = vertices.Except(t.Keys).ToList();
             foreach (var v in remaining)
                      t.Add(v, new List<Node>());
            return t;
        }

List<Node> BFS(Dictionary<Node, List<Node>> h, Node root)
{
    var ret = new List<Node>();
    if (root == null) return ret;
    var q = new Queue<Node>();
    q.Enqueue(root);
    ret.Add(root); // include the root so an island contains all of its members
    while (q.Count > 0)
    {
        var current = q.Dequeue();
        foreach (var v in h[current])
        {
            if (ret.Contains(v) == false)
            {
                ret.Add(v);
                q.Enqueue(v);
            }
        }
    }
    return ret;
}
We can also optimize the main loop by breaking early once all the vertices are covered, for instance:
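A sketch of that early exit (assuming each list in ret is a complete, disjoint component, which holds when BFS returns whole connected components):

// place at the end of the foreach loop in GetIslands
if (ret.Sum(island => island.Count) >= vertices.Count)
    break; // every vertex already belongs to an island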

Sunday, May 21, 2017

#codingexercise
For a given string and an integer k, find if the string can form a palindrome by removing k characters
bool IsKPalindrome(string s, int k)
{
    if (String.IsNullOrEmpty(s) || k < 0 || k > s.Length) return false;
    var combinations = new List<string>();
    var b = new StringBuilder(s.Length);
    for (int i = 0; i < s.Length; i++)
        b.Append('\0');
    int start = 0;
    int level = 0;
    int length = s.Length - k; // keep subsequences of exactly this length
    // Combine and IsPalindrome are helpers assumed from earlier exercises:
    // Combine fills combinations with all subsequences of the given length
    Combine(s, start, level, length, ref b, ref combinations);
    return combinations.Any(x => IsPalindrome(x));
}
There is also a variation of the problem where we look for at most k character removals.
This is done by comparing the string to its reverse: each removal shows up as one deletion on each side, so the string is a k-palindrome when the deletion-only edit distance between the string and its reverse is at most 2k.
int GetKPalin(string str, string rev, int m, int n)
{
    if (m == 0) return n;
    if (n == 0) return m;
    if (str[m - 1] == rev[n - 1])
        return GetKPalin(str, rev, m - 1, n - 1);
    return 1 + Math.Min(GetKPalin(str, rev, m - 1, n), GetKPalin(str, rev, m, n - 1));
}

bool IsKPalin(string str, int k)
{
    var rev = new string(str.Reverse().ToArray());
    // each removal counts once against str and once against its reverse
    return GetKPalin(str, rev, str.Length, rev.Length) <= 2 * k;
}
The costing can also be based on the longest common subsequence.
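A sketch of that costing (the method names here are mine): the longest palindromic subsequence of a string equals the longest common subsequence of the string and its reverse, so the number of removals needed is the length minus that value.

int LCS(string a, string b, int m, int n)
{
    if (m == 0 || n == 0) return 0;
    if (a[m - 1] == b[n - 1]) return 1 + LCS(a, b, m - 1, n - 1);
    return Math.Max(LCS(a, b, m - 1, n), LCS(a, b, m, n - 1));
}

bool IsKPalinByLCS(string str, int k)
{
    var rev = new string(str.Reverse().ToArray());
    // characters outside the longest palindromic subsequence must be removed
    return str.Length - LCS(str, rev, str.Length, rev.Length) <= k;
}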
Note that we can also make early exclusions with a frequency table.
For example:
var h = new Dictionary<char, int>();
for (int i = 0; i < str.Length; i++)
    if (h.ContainsKey(str[i])) h[str[i]]++;
    else h.Add(str[i], 1);
int pairs = h.Keys.Count(x => h[x] % 2 == 0);
int odd = h.Keys.Count(x => h[x] % 2 == 1);
int singles = h.Keys.Count(x => h[x] == 1);
// we can check conditions for false and return early
If order were not important, we could check for qualifying strings with:
// if (pairs > 0 && singles <= k && odd <= 1) return true;

Saturday, May 20, 2017

We were discussing the MicrosoftML rxFastTrees algorithm.
The gradient boosting algorithm behind rxFastTrees is described by Friedman in his paper and admits several loss functions, including the squared loss.
The algorithm for least-squares regression can be written as follows (a code sketch follows the list):
1. Set the initial approximation.
2. For a set of successive increments, or boosts, each based on the preceding iterations:
3. Calculate the new residuals.
4. Find the line of search by aggregating and minimizing the residuals.
5. Perform the boost along the line of search.
6. Repeat steps 3, 4 and 5 for each iteration of step 2.
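A minimal sketch of these steps for the squared loss (the data, the depth-1 stump base learner, and all names are illustrative assumptions, not the rxFastTrees internals). For squared loss the residuals are exactly the negative gradient, so each round fits the base learner to the current residuals and boosts the prediction along that fit:

using System;
using System.Linq;

class GradientBoostSketch
{
    public static void Main()
    {
        double[] x = { 1, 2, 3, 4, 5, 6 };
        double[] y = { 1.2, 1.9, 3.1, 3.9, 5.2, 5.8 };
        double learningRate = 0.5;
        int rounds = 20;

        double f0 = y.Average();                 // step 1: initial approximation
        var pred = x.Select(_ => f0).ToArray();

        for (int m = 0; m < rounds; m++)         // step 2: successive boosts
        {
            var residual = y.Zip(pred, (yi, pi) => yi - pi).ToArray(); // step 3
            var stump = FitStump(x, residual);   // step 4: minimize the residuals
            for (int i = 0; i < x.Length; i++)   // step 5: boost along the fit
                pred[i] += learningRate * (x[i] < stump.t ? stump.left : stump.right);
        }
        Console.WriteLine(string.Join(", ", pred.Select(p => p.ToString("F2"))));
    }

    // the base learner: a depth-1 regression stump fit to the residuals
    static (double t, double left, double right) FitStump(double[] x, double[] r)
    {
        double bestErr = double.MaxValue;
        (double t, double left, double right) best = (x[0], 0, 0);
        foreach (var t in x)
        {
            double left = r.Where((_, i) => x[i] < t).DefaultIfEmpty(0.0).Average();
            double right = r.Where((_, i) => x[i] >= t).DefaultIfEmpty(0.0).Average();
            double err = r.Select((ri, i) => ri - (x[i] < t ? left : right)).Sum(d => d * d);
            if (err < bestErr) { bestErr = err; best = (t, left, right); }
        }
        return best;
    }
}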

We now discuss the rxFastForest method:
The rxFastForest is a fast forest algorithm also used for binary classification or regression; it can be used for churn prediction, for example. It builds several decision trees using the regression tree learner in rxFastTrees. An aggregation over the resulting trees then finds a Gaussian distribution closest to the combined distribution of all trees in the model.

With rxFastTrees and rxFastForest, we wonder if an rxFastGraph is available. It would help to notice that classification is inherently tree-based and mostly statistical.
The Naive Bayes classifier produces a simple tree model and relates to a logistic regression model. The Hidden Markov Model relates to linear-chain Conditional Random Fields, and the general directed model relates to general Conditional Random Fields. These models are pictured as trees, linear chains, and graphs respectively.
A generative model takes the form of conditional probabilities over the training sample joint distribution. In the case of an HMM, this conditional distribution is a CRF with a particular choice of feature functions. With generalization, we move from a linear-chain factor graph to a more general factor graph, and the conditional distribution is written as a normalization of factors. The noteworthy considerations in a general CRF are the repeated structure and parameter tying. A few methods to specify the repeated structure exist: dynamic conditional random fields are sequence models that allow multiple labels at each step, in the manner of dynamic Bayesian networks, while relational Markov networks allow parameter tying based on a SQL-like syntax.