Monday, May 22, 2017

#codingexercise
Given a list of data centers and their connectivity information, find the islands of isolated data centers.
e.g. Input: A<->B, B<->C, D<->E // Edges
            A, B, C, D, E // Vertices
Output : {A,B,C}, {D,E}

We create an Adjacency List with the given Edges List and Vertices List
A - B
B - A, C
C - B
D - E
E - D

List<List<Node>> GetIslands(List<Tuple<Node, Node>> pairs, List<Node> vertices)
{
    var ret = new List<List<Node>>();
    var t = GetAdjacencyList(pairs, vertices);

    foreach (var vertex in vertices)
    {
        // all vertices reachable from this vertex, including itself
        var connected = BFS(t, vertex);
        bool merged = false;
        for (int i = 0; i < ret.Count; i++)
        {
            var island = ret[i];
            // if the new component overlaps an existing island, merge them
            if (island.Intersect(connected).Any())
            {
                ret[i] = island.Union(connected).Distinct().ToList();
                merged = true;
                break;
            }
        }
        if (!merged)
            ret.Add(connected);
    }

    return ret;
}
Dictionary<Node, List<Node>> GetAdjacencyList(List<Tuple<Node, Node>> pairs, List<Node> vertices)
{
    // build an adjacency list from the edges,
    // with an entry for every vertex
    var t = new Dictionary<Node, List<Node>>();
    foreach (var p in pairs)
    {
        // forward edge
        if (t.ContainsKey(p.Item1))
            t[p.Item1].Add(p.Item2);
        else
            t.Add(p.Item1, new List<Node> { p.Item2 });
        // backward edge
        if (t.ContainsKey(p.Item2))
            t[p.Item2].Add(p.Item1);
        else
            t.Add(p.Item2, new List<Node> { p.Item1 });
    }
    // add an empty list for any vertex that has no edges
    var remaining = vertices.Except(t.Keys).ToList();
    foreach (var v in remaining)
        t.Add(v, new List<Node>());
    return t;
}

List<Node> BFS(Dictionary<Node, List<Node>> h, Node root)
{
    // standard breadth-first traversal that returns every vertex
    // reachable from root, including root itself
    var ret = new List<Node>();
    if (root == null) return ret;
    var q = new Queue<Node>();
    q.Enqueue(root);
    ret.Add(root);
    while (q.Count > 0)
    {
        var current = q.Dequeue();
        foreach (var v in h[current])
        {
            if (!ret.Contains(v))
            {
                ret.Add(v);
                q.Enqueue(v);
            }
        }
    }
    return ret;
}
We can also optimize the main loop by breaking early once all the vertices have been covered, as in the sketch below.
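A minimal sketch of that early exit, reusing the GetAdjacencyList and BFS helpers above; GetIslandsWithEarlyExit is a name introduced here for illustration, not part of the original exercise. Because BFS returns the full connected component of its starting vertex, no merging is needed in this variant.

List<List<Node>> GetIslandsWithEarlyExit(List<Tuple<Node, Node>> pairs, List<Node> vertices)
{
    var ret = new List<List<Node>>();
    var t = GetAdjacencyList(pairs, vertices);
    int covered = 0; // number of vertices already placed in some island
    foreach (var vertex in vertices)
    {
        // skip vertices that already belong to an island
        if (ret.Any(island => island.Contains(vertex)))
            continue;
        var connected = BFS(t, vertex); // the whole component of this vertex
        ret.Add(connected);
        covered += connected.Count;
        if (covered == vertices.Count)
            break; // every vertex is accounted for; stop scanning
    }
    return ret;
}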

Sunday, May 21, 2017

#codingexercise
For a given string and an integer k, find if the string can form a palindrome by removing k characters
bool IsKPalindrome(string s, int k)
{
    if (String.IsNullOrEmpty(s) || k < 0 || k > s.Length) return false;
    var combinations = new List<string>();
    var b = new StringBuilder(s.Length);
    for (int i = 0; i < s.Length; i++)
        b.Append('\0');
    int start = 0;
    int level = 0;
    int length = s.Length - k; // length of the candidates after removing k characters
    // Combine enumerates the subsequences of s of the given length into combinations (helper not shown here)
    Combine(s, start, level, length, ref b, ref combinations);
    return combinations.Any(x => IsPalindrome(x));
}
There is also a variation of the problem where we look for at most k character removals.
This is done by comparing the string to its reverse and checking that the edit distance based on deletions alone is at most 2k, since each removal from the original shows up as one deletion in the string and one in its reverse.
int GetKPalin(string str, string rev, int m, int n)
{
    // deletion-only edit distance between str[0..m) and rev[0..n)
    if (m == 0) return n;
    if (n == 0) return m;
    if (str[m-1] == rev[n-1])
        return GetKPalin(str, rev, m-1, n-1);
    return 1 + Math.Min(GetKPalin(str, rev, m-1, n), GetKPalin(str, rev, m, n-1));
}
bool isKPalin(string str, int k)
{
    string rev = new string(str.Reverse().ToArray());
    return GetKPalin(str, rev, str.Length, rev.Length) <= 2 * k;
}
The cost can also be based on the longest common subsequence: the LCS of the string with its reverse is its longest palindromic subsequence, so the string is a k-palindrome when its length minus that LCS length is at most k.
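A minimal sketch of that check; Lcs and IsKPalinByLcs are names introduced here for illustration:

int Lcs(string a, string b)
{
    // classic O(n*m) longest common subsequence table
    var dp = new int[a.Length + 1, b.Length + 1];
    for (int i = 1; i <= a.Length; i++)
        for (int j = 1; j <= b.Length; j++)
            dp[i, j] = a[i-1] == b[j-1]
                ? dp[i-1, j-1] + 1
                : Math.Max(dp[i-1, j], dp[i, j-1]);
    return dp[a.Length, b.Length];
}

bool IsKPalinByLcs(string s, int k)
{
    string rev = new string(s.Reverse().ToArray());
    // characters outside the longest palindromic subsequence must be removed
    return s.Length - Lcs(s, rev) <= k;
}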
Note that we can also make early exclusions with a frequency table.
For example:
var h = new Dictionary<char, int>();
for (int i = 0; i < str.Length; i++)
    if (h.ContainsKey(str[i])) h[str[i]]++;
    else h.Add(str[i], 1);
int pairs = h.Keys.Count(x => h[x] % 2 == 0);
int odd = h.Keys.Count(x => h[x] % 2 == 1);
int singles = h.Keys.Count(x => h[x] == 1);
// we can check conditions for false and return early
If order were not important, we could also check for qualifying strings directly:
// if (pairs > 0 && singles <= k && odd <= 1) return true;

Saturday, May 20, 2017

 We were discussing the MicrosoftML rxFastTree algorithm.
 The gradient boost algorithm for rxFastTrees is described by Friedman in his paper and can be used with several loss functions, including the squared loss function.
The algorithm for the least squares regression can be written as :
1. Set the initial approximation 
2. For a set of successive increments or boosts each based on the preceding iterations, do
3. Calculate the new residuals
4. Find the line of search by aggregating and minimizing the residuals
5. Perform the boost along the line of search
6. Repeat steps 3, 4 and 5 for each boost in step 2.
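As a rough sketch of those steps in Friedman's notation (a reconstruction from his paper, not a quotation from it), the least-squares version reads:

F_0(x) = \bar{y}
\text{For } m = 1, \dots, M:
\quad \tilde{y}_i = y_i - F_{m-1}(x_i), \qquad i = 1, \dots, N
\quad (\rho_m, a_m) = \arg\min_{\rho, a} \sum_{i=1}^{N} \left[ \tilde{y}_i - \rho\, h(x_i; a) \right]^2
\quad F_m(x) = F_{m-1}(x) + \rho_m\, h(x; a_m)

Here h(x; a) is the base learner (a regression tree in rxFastTrees) and the residuals are the quantities computed in step 3.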

We now discuss the rxFastForests method:
The rxFastForest is a fast forest algorithm also used for binary classification or regression. It can be used for churn prediction. It builds several decision trees using the regression tree learner in rxFastTrees. An aggregation over the resulting trees then finds a Gaussian distribution closest to the combined distribution for all trees in the model.

With rxFastTrees and rxFastForest available, we wonder whether an rxFastGraph is also available. It helps to notice that classification is inherently tree-based and mostly statistical.
The Naive Bayes classifier produces a simple tree model and relates to a logistic regression model. The Hidden Markov Model relates to linear-chain Conditional Random Fields, and the general directed model relates to general Conditional Random Fields. These models are pictured as trees, linear chains and graphs respectively.
A generative model takes the form of conditional probabilities over the training sample joint distribution. This conditional distribution is a CRF with a particular choice of feature functions in the case of an HMM. With generalization, we move from a linear-chain factor graph to a more general factor graph, and the conditional distribution is written as a normalization of factors. The noteworthy considerations in a general CRF are the repeated structure and parameter tying. Methods to specify the repeated structure include dynamic conditional random fields, which are sequence models that allow multiple labels at each step in the manner of dynamic Bayesian networks, and relational Markov networks, which allow parameter tying based on a SQL-like syntax.
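A compact way to write that normalization of factors (the standard form of a general CRF, stated here from memory rather than taken from the post):

p(y \mid x) = \frac{1}{Z(x)} \prod_{A} \Psi_A(y_A, x_A), \qquad Z(x) = \sum_{y} \prod_{A} \Psi_A(y_A, x_A)

where the product runs over the factors A of the factor graph and Z(x) is the input-dependent normalizer.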

Friday, May 19, 2017

 We were discussing the MicrosoftML rxFastTree algorithm.
 The gradient boost algorithm for rxFastTrees is described by Friedman in his paper and can be used with several loss functions, including the squared loss function. The squared loss function especially serves as a "reality check". The reason to treat it as a standard is that it comes closest to the stagewise practice of iteratively fitting the current residuals, as in any gradient descent method.
The algorithm for the least squares regression can be written as :
1. Set the initial approximation 
2. For a set of successive increments or boosts each based on the preceding iterations, do
3. Calculate the new residuals
4. Find the line of search by aggregating and minimizing the residuals
5. Perform the boost along the line of search 
6. Repeat steps 3, 4 and 5 for each boost in step 2.
In a joint distribution of all output and input variables, the goal was to obtain an estimate of the mapping function that minimizes the expected value of the squared loss. This was the original non-parametric approach, which requires numerical optimization in a function space. The evaluation of the mapping function at each data point is itself a parameter over which we optimize. In a function space there are an infinite number of such parameters, but in a dataset only a finite number is involved. We therefore take a summation over these mapped approximations from 1 to M. Each incremental function in this range is called a "step" or a "boost" in the overall optimization method. In the steepest descent method, we write these incremental functions as a multiplier times a gradient. The multiplier is given by the line search, which takes the minimum along the steepest gradient and is expressed through the expected value of the loss function.
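In symbols (a reconstruction of Friedman's steepest-descent step, not a quotation), the m-th increment and its multiplier look like:

F(x) = \sum_{m=0}^{M} f_m(x), \qquad f_m(x) = -\rho_m\, g_m(x)
g_m(x) = \left[ \frac{\partial\, E_y[\,L(y, F(x)) \mid x\,]}{\partial F(x)} \right]_{F(x) = F_{m-1}(x)}
\rho_m = \arg\min_{\rho}\; E_{y,x}\, L\big(y,\, F_{m-1}(x) - \rho\, g_m(x)\big)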
However, Friedman mentions in his paper that this non-parametric approach breaks down when the joint distribution of (y, x) is estimated by a finite data sample. In the case of finite data, the expected value cannot be estimated accurately at each data point, and the data points are all already training points. So the estimate can only be improved by borrowing strength from nearby data points, either by imposing smoothness functions such as the Gaussian, or by choosing a parameterized form of the mapping function and optimizing its parameters one at a time over this set.
If the parameterized form is also not feasible, Friedman recommends a "greedy stagewise" approach. The stagewise approach differs from the stepwise approach in that it does not readjust previously entered terms when new ones are added. This stagewise strategy is also called "matching pursuit" by other researchers, because the parameter function set is taken from a wavelet-like dictionary. In machine learning, the base learner is a classification tree and the approach is called "boosting". Since the convergence is in terms of the best greedy step towards the data-based estimate of the approximation objective, the approach is called "greedy stagewise". It happens that when we do this for one parameter function at a time from the parameterized set, the greedy approach amounts to steepest descent under that constraint: the member of the parameterized class chosen at each step is the one most parallel to the unconstrained negative gradient, and the approximation is then updated along the line of search. In this way we have replaced the smoothness-constraint technique with a constraint applied to the rough solution obtained by fitting the parameterized class to what can now be called "pseudoresponses". Putting it all together as forward stagewise additive modeling, we have the overall algorithm of gradient boost, sketched below.
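The overall algorithm, again reconstructed in Friedman's notation rather than quoted from the paper, is roughly:

F_0(x) = \arg\min_{\rho} \sum_{i=1}^{N} L(y_i, \rho)
\text{For } m = 1, \dots, M:
\quad \tilde{y}_i = -\left[ \frac{\partial L(y_i, F(x_i))}{\partial F(x_i)} \right]_{F = F_{m-1}}, \qquad i = 1, \dots, N
\quad a_m = \arg\min_{a, \beta} \sum_{i=1}^{N} \left[ \tilde{y}_i - \beta\, h(x_i; a) \right]^2
\quad \rho_m = \arg\min_{\rho} \sum_{i=1}^{N} L\big(y_i,\, F_{m-1}(x_i) + \rho\, h(x_i; a_m)\big)
\quad F_m(x) = F_{m-1}(x) + \rho_m\, h(x; a_m)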

Thursday, May 18, 2017

 We were discussing the MicrosoftML rxFastTree algorithm.
 The gradient boost algorithm for rxFastTree is described by Friedman in his paper as :
1. Describe the problem as a minimization function over a parameterized class of functions
2. For each of the parameterized set from 1 to M do
3. Fit the mapping function to the pseudoresponses by calculating the negative gradient from i = 1 to N
4. find the smoothed negative gradient by using any fitting criterion such as least squares
5. Perform the line search along the constrained negative gradient as in steepest descent, taking the step that leads to the minimum
6. Update the approximation by performing a step along the direction of line of search.
The gradient boosting method can be applied to several loss criteria such as least-squares, least-absolute-deviation (LAD), Huber (M) and the logistic binomial log-likelihood (L). The first serves as a reality check, whereas the others lead to new boosting algorithms.
The pseudoresponses in step 3 are found by taking the training data of output y and input x and computing the negative gradient of the loss at the approximation from the previous iteration. This makes step 4 a matter of simply fitting the current residuals and step 5 a line search along that fit, so the sequence of steps 3, 4 and 5 is the usual practice of iteratively fitting the current residuals.
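For the squared loss in particular (a quick check added here, not taken from the post), the pseudoresponses reduce to the ordinary residuals, which is why the least-squares version below amounts to repeated residual fitting:

L(y, F) = \tfrac{1}{2}(y - F)^2 \;\;\Rightarrow\;\; \tilde{y}_i = -\left[ \frac{\partial L(y_i, F(x_i))}{\partial F(x_i)} \right]_{F = F_{m-1}} = y_i - F_{m-1}(x_i)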
Therefore the least squares regression can be written as :
1. Set the initial approximation 
2. For a set of successive increments or boosts each based on the preceding iterations, do
3. Calculate the new residuals
4. Find the line of search by aggregating and minimizing the residuals
5. Perform the boost along the line of search 
6. Repeat steps 3, 4 and 5 for each boost in step 2.

#codingexercise
Given an unsorted array of integers, rearrange it so that every element at an odd index is less than or equal to its even-indexed neighbours.
void Rearrange(ref List<int> A)
{
    // sort, then swap each adjacent pair so that the even index holds the larger value
    A.Sort();
    for (int i = 0; i < A.Count - 1; i += 2)
    {
        Swap(A, i, i + 1);
    }
}
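The Swap helper is not shown in the exercise; a minimal version might look like this:

void Swap(List<int> A, int i, int j)
{
    int temp = A[i];
    A[i] = A[j];
    A[j] = temp;
}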

Wednesday, May 17, 2017

We were discussing the rxFastTrees algorithm. The rxFastTrees is a fast tree algorithm which is used for binary classification or regression. It can be used for bankruptcy prediction. It is an implementation of FastRank, which is a form of the MART gradient boosting algorithm. It builds each regression tree in a stepwise fashion using a predefined loss function. The loss function helps to find the error in the current step and fix it in the next. The term boosting denotes the improvements in numerical optimization in the function space obtained by correlating it with steepest descent minimization. When the individual additive components are regression trees, this boosting is termed TreeBoost. Gradient boosting of regression trees is said to produce competitive, highly robust, interpretable procedures for both regression and classification.
When the mapping function is restricted to be a member of a parameterized class of functions, it can be represented as a weighted summation of the individual functions in the parameterized set. This is called additive expansion, and the technique is very helpful for approximations. With gradient boost, the constraint is applied to the rough solution by fitting the parameterized function set to the "pseudoresponses". This permits the replacement of the difficult minimization problem by a least-squares minimization followed by only a single optimization based on the original criterion.
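The additive expansion itself can be written as (standard notation, not quoted from the post):

F(x; \{\beta_m, a_m\}_{1}^{M}) = \sum_{m=1}^{M} \beta_m\, h(x; a_m)

where each h(x; a_m) is a member of the parameterized class and the \beta_m are the expansion weights.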
Therefore the gradient boost algorithm is described by Friedman as :
1. Describe the problem as a minimization function over a parameterized class of functions
2. For each of the parameterized set from 1 to M do
3. Fit the mapping function to the pseudoresponses by calculating the negative gradient from i = 1 to N
4. find the smoothed negative gradient by using any fitting criterion such as least squares
5. Perform the line search along the constrained negative gradient as in steepest descent, taking the step that leads to the minimum
6. Update the approximation by performing a step along the direction of line of search.

Tuesday, May 16, 2017

We were discussing the MicrosoftML rxFastLinear algorithm, which is a fast linear model trainer based on the Stochastic Dual Coordinate Ascent method and combines the capabilities of logistic regression and SVM algorithms. Today we discuss the rxFastTrees algorithm. The rxFastTrees is a fast tree algorithm which is used for binary classification or regression. It can be used for bankruptcy prediction. It is an implementation of FastRank, which is a form of the MART gradient boosting algorithm. It builds each regression tree in a stepwise fashion using a predefined loss function. The loss function helps to find the error in the current step and fix it in the next. The term boosting denotes the improvements in numerical optimization in the function space obtained by correlating it with steepest descent minimization. When the individual additive components are regression trees, this boosting is termed TreeBoost. Gradient boosting of regression trees is said to produce competitive, highly robust, interpretable procedures for both regression and classification.
Generally we start with a system consisting of a random output or response variable y and a set of random input or explanatory variables x. Using a training sample of known input and output values, the goal is to obtain an estimate or approximation of the function that maps x to y and minimizes the expected value of some specified loss function, such as squared error or absolute error. One approach is to restrict the mapping function to be a member of a parameterized class of functions. Then the mapping function can be represented as a weighted summation of the individual functions in the parameterized set. This is called additive expansion, a technique that is very helpful to many approximation methods. In the steepest descent method, the current negative gradient is taken as the "line of search" and a step is taken along that direction.
With gradient boost, the constraint is applied to the rough solution by fitting the parameterized function set to the "pseudoresponses". This permits the replacement of the difficult minimization problem by a least-squares minimization followed by only a single optimization based on the original criterion. As long as a suitable least-squares algorithm exists for the parameterized function set, any differentiable loss can then be minimized together with the stagewise additive modeling.
courtesy: Friedman's Gradient Boosting Machine
#codingexercise 
http://collabedit.com/sghvg