Monday, November 28, 2016

#codingexercise
Implement AVL tree insertion
Node Insert(Node root, Node z) // usage: root = Insert(root, z)
{
assert(z != null);
z.height = 1; // a new node is a leaf; height(null) is taken as 0
if (root == null)
     return z;
if (z.data < root.data)
     root.left = Insert(root.left, z);
else
     root.right = Insert(root.right, z);

return Balance(root); // recomputes the height and restores the AVL invariant
}

Node Balance(Node root)
{
FixHeight(root); // 1 more than the taller of left and right
if (Bfactor(root) == 2) // Bfactor is height(right) - height(left)
{
      if (Bfactor(root.right) < 0)
           root.right = RightRotate(root.right); // returns root.right.left as the new subtree root
      return LeftRotate(root);
}
if (Bfactor(root) == -2)
{
      if (Bfactor(root.left) > 0)
           root.left = LeftRotate(root.left);
      return RightRotate(root);
}
return root;
}
Courtesy: https://kukuruku.co/hub/cpp/avl-trees and my fixes to the incorrect implementation on GeeksforGeeks.
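The helpers referenced above are not shown; a minimal sketch, assuming height(null) is 0 and each Node carries a height field:

int height(Node n) { return (n != null) ? n.height : 0; }

int Bfactor(Node n) { return height(n.right) - height(n.left); }

void FixHeight(Node n) { n.height = 1 + Math.Max(height(n.left), height(n.right)); }

Node LeftRotate(Node root) // root.right becomes the new subtree root
{
    var pivot = root.right;
    root.right = pivot.left;
    pivot.left = root;
    FixHeight(root);
    FixHeight(pivot);
    return pivot;
}

Node RightRotate(Node root) // root.left becomes the new subtree root
{
    var pivot = root.left;
    root.left = pivot.right;
    pivot.right = root;
    FixHeight(root);
    FixHeight(pivot);
    return pivot;
}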

// BST Tree insert iterative
TREE-INSERT(ref Node root, Node z)
{
Node y = null;
Node x = root;
while (x != null)
{
     y = x;
     if (z.data < x.data)
        x = x.left;
     else
        x = x.right;
}
if (y == null)
    root = z;
else if (z.data < y.data)
    y.left = z;
else
    y.right = z;
}

string reversal using LINQ Aggregate:
string reverse(string input)
{
// prepend each character to the accumulated result
return input.Aggregate(string.Empty, (sent, next) => next + sent);
}


Sunday, November 27, 2016

We continue discussing the paper "Nested sampling for general Bayesian computation" by Skilling.
We were looking at the transformation of computing the evidence based on the prior mass instead of the parameters, and at the integration performed. This paper essentially says don't navigate the parameter space; it is sufficient to explore a likelihood weighted space.
The method replaces the point of the lowest likelihood by a new one drawn from within the likelihood contour.
It increments the evidence by filling in the missing band with weight w = Xj/N for each surviving point. Random values in the range (0,1) suffice to draw the samples within the constrained likelihood contour. The method also calculates the information H during the iterations as a by-product.
The method terminates when a certain number of iterations have been performed. This could be modified to stop when the largest current likelihood would not increase the current evidence by more than a small fraction f. Such a threshold implies that the accumulation of the evidence is tailing off and the sum is therefore nearly complete.
The termination condition determines the total area found. The areas of the strips start by rising, with the likelihood increasing faster than the widths decrease, which means more important regions are being found. When the likelihood flattens off, the areas pass across a maximum and start to fall away. Most of the total area is usually found in the region of this maximum, with likelihood about e raised to the power -H. Since Xi is about e raised to the power -i/N, the method is set to continue iterating "until the count i significantly exceeds NH".
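As a minimal sketch of that modified stopping rule (the names here are illustrative, not taken from the paper):

static bool ShouldTerminate(double Lmax, double X, double evidence, double f)
{
    // Lmax * X bounds the evidence still obtainable from the unexplored
    // prior mass; stop once it is at most a small fraction f of the total
    return Lmax * X <= f * evidence;
}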
#codingexercise

Find if two strings are interleaved in a third string
        static bool isInterleaved(String A, String B, String C)
        {
            int ia = 0;
            int ib = 0;
            for (int i = 0; i < C.Length; i++)
            {
                if (ia < A.Length && C[i] == A[ia])
                {
                    ia++;
                }
                else if (ib < B.Length && C[i] == B[ib])
                {
                    ib++;
                }
                else
                {
                    return false;
                }
            }
            if (ia != A.Length || ib != B.Length)
                return false;
            return true;
        }
Note that this greedy single pass can fail when A and B share characters: for example, with A = "ab", B = "ac", the string C = "acab" is a valid interleaving (both characters of B are taken before those of A), but the greedy match consumes the first 'a' from A and then fails. The recursive version in the November 26 post below handles such cases.

Print numbers vertically
void PrintVertically(int n)
{
var digits = n.ToDigits();
digits.ForEach(x => Console.WriteLine(x));
}
static List<int> ToDigits(this int n)
{
var ret = new List<int>();
while (n > 0)
{
int k = n % 10;
ret.Add(k);
n = n / 10;
}
ret.Reverse();
return ret;
}

Saturday, November 26, 2016

We continue discussing the paper "Nested sampling for general Bayesian computation" by Skilling.
We were looking at the transformation of computing the evidence based on the prior mass instead of the parameters. We looked at the integration performed. This paper essentially says don't navigate the parameter space. It is sufficient to explore a likelihood weighted space.
The nested sampling procedure is therefore:
Choose the number of points (classes) as N and the number of iterations as j
Record the lowest of the current likelihood values
In each iteration
    set the current prior mass to the exponential of a value that depends on the iteration sequence (Xj = exp(-j/N))
    set the weight to be the difference between the previous and the current prior mass
    increment the evidence by the strip area
    then replace the point of the lowest likelihood by a new one drawn from within the likelihood contour
Increment the evidence by filling in the missing band with weight w = Xj/N for each surviving point
There are some salient points to consider in this simplification.
First, only one point is replaced with a new candidate for acceptance, and it is the one with the worst likelihood value, i.e. the largest enclosed prior mass (initially equal to 1).
Second, random values in the range (0,1) suffice to draw the samples within the constrained likelihood contour.
Third, the likelihood contours shrink by factors of exp(-1/N) in area and are roughly followed by successive sampling points.
Fourth, we can calculate the information H during the iterations as a by-product. The fraction of prior mass that contains most of the posterior mass is about e raised to the power -H.
Fifth, the procedure takes about NH +- sqrt(NH) steps, where H is the information; the bulk of the posterior mass is reached in the first term and crossed in the second. This follows from the premise that the individual log t are independent, each with mean -1/N and standard deviation 1/N, so after i steps the prior mass is expected to shrink to log Xi roughly equal to -(i +- sqrt(i))/N.
Since the nested sampling procedure works on a weighted-sum basis, we could consider combining evidence: follow the procedure for the current batch, then merge the evidence from the next batch with that of the current batch once the same log prior mass scale is aligned. A minimal sketch of the whole loop follows.
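Here is that sketch. It is an illustration, not the paper's code: the Point type and the helpers SamplePrior (a draw from the prior), Likelihood (the likelihood of a point) and SampleWithin (a draw from the prior restricted to likelihoods above a floor) are assumptions.

static double NestedSampling(int N, int iterations)
{
    var points = new List<Point>();
    for (int k = 0; k < N; k++)
        points.Add(SamplePrior());
    double evidence = 0.0;
    double Xprev = 1.0; // the prior mass starts at 1 and shrinks toward 0
    for (int j = 1; j <= iterations; j++)
    {
        // the point of lowest likelihood defines the current contour
        var worst = points.OrderBy(p => Likelihood(p)).First();
        double X = Math.Exp(-(double)j / N); // expected shrinkage after j steps
        double w = Xprev - X;                // width of the current strip
        evidence += w * Likelihood(worst);   // increment by the strip area
        Xprev = X;
        // replace the worst point by one drawn from within its contour
        points.Remove(worst);
        points.Add(SampleWithin(Likelihood(worst)));
    }
    // fill in the missing band with weight X/N for each surviving point
    foreach (var p in points)
        evidence += (Xprev / N) * Likelihood(p);
    return evidence;
}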
 #codingexercise
Find if a string is the interleaving of two other strings
        static bool IsInterleaved( StringBuilder A, StringBuilder B, StringBuilder C, int ia, int ib, int ic)
        {
            if (ia == A.Length && ib == B.Length && ic == C.Length) return true;
            if (ic == C.Length) return false;
            if (ia < A.Length && ib < B.Length &&  C[ic] != A[ia] && C[ic] != B[ib]) return false;
            if (ia < A.Length && ib == B.Length && C[ic] != A[ia]) return false;
            if (ia == A.Length && ib < B.Length && C[ic] != B[ib]) return false;
            return (((ia < A.Length && ic < C.Length && C[ic] == A[ia])  && IsInterleaved(A, B, C, ia+1, ib, ic+1)) ||
                    ((ib < B.Length && ic < C.Length && C[ic] == B[ib] && IsInterleaved(A,B,C, ia, ib+1, ic+1))));
        }
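The recursion above re-solves overlapping subproblems; a bottom-up sketch of the same check (using plain strings rather than StringBuilder for brevity) could look like this:

        static bool IsInterleavedDP(string A, string B, string C)
        {
            if (A.Length + B.Length != C.Length) return false;
            // dp[i,j] is true when the first i chars of A and the first j chars
            // of B can interleave to form the first i+j chars of C
            var dp = new bool[A.Length + 1, B.Length + 1];
            for (int i = 0; i <= A.Length; i++)
            {
                for (int j = 0; j <= B.Length; j++)
                {
                    if (i == 0 && j == 0) { dp[i, j] = true; continue; }
                    bool fromA = i > 0 && A[i - 1] == C[i + j - 1] && dp[i - 1, j];
                    bool fromB = j > 0 && B[j - 1] == C[i + j - 1] && dp[i, j - 1];
                    dp[i, j] = fromA || fromB;
                }
            }
            return dp[A.Length, B.Length];
        }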
 

Friday, November 25, 2016

We continue discussing the paper "Nested sampling for general Bayesian computation" by Skilling.
We were looking at the transformation of computing the evidence based on the prior mass instead of the parameters. Today we look at the integration performed.
The integration is straightforward to do online because it is performed by computing the area in strips under the curve. This area is the weighted sum of the likelihood values for the strips. We know the curve is non-increasing between 0 and 1. The strips are arranged from right to left in m strips, with the lower end given by Xm+1 = 0, and the area under the curve has to be greater than or equal to the sum of weighted likelihood values where the weights are the X intervals. There is also an upper bound: the integration cannot exceed the calculated area (the weighted sum) plus the last prior mass times the maximum likelihood value. This follows from the trapezoidal rule, where the differences between the area covered by the strips and the area under the curve are triangular regions whose sum cannot exceed the last strip with likelihood Lmax.
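A literal transcription of those two bounds as stated, under the assumption that the likelihood values L[0..m-1] are ordered against decreasing prior masses X[0..m-1], with X[m] taken as 0:

static (double lower, double upper) EvidenceBounds(double[] L, double[] X)
{
    int m = L.Length;
    double weightedSum = 0.0;
    for (int i = 0; i < m; i++)
    {
        double nextX = (i + 1 < m) ? X[i + 1] : 0.0; // Xm+1 = 0
        weightedSum += L[i] * (X[i] - nextX);        // strip width times likelihood
    }
    double Lmax = L.Max(); // the largest likelihood seen
    return (weightedSum, weightedSum + X[m - 1] * Lmax);
}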
We already know the weighted sum is online: as more likelihood values are encountered, we add the corresponding weighted terms, extending the current evidence to the next. The more interesting question is whether the likelihood values need to be sorted for the computation of the evidence. The answer is no, as we will shortly see. Before that, let us discuss the sampling. The integral for the evidence is dominated by wherever the bulk of the posterior mass is found, and that bulk occupies a small fraction of the prior. To see the width of the posterior in terms of X, just as we did with the prior, suppose the likelihood function is a multivariate in C dimensions with normalized coefficients. The likelihood can then be written as proportional to an exponential in the radius squared, where the radius is taken in the C dimensions. With the likelihood in that form, the enclosed prior mass is proportional to r raised to the power C, and the integral translates to a logarithm; in other words, the posterior is distributed over a range in log X. In order to cover such a range, the sampling should be linear in log X rather than in X. We then set X1 = t1, X2 = t1t2, ..., Xi = t1t2...ti, ..., Xm = t1t2t3...tm, where each ti lies between 0 and 1. By setting values of successive t, we can write the evidence as a weighted sum again, with the weights represented in terms of t and the likelihood values for each interval.
In practice, finding the exact value of t is difficult, but we can certainly do with a statistical value. We could do this by finding a new point Xi from the prior, subject to the condition that it is smaller than the previous one, with the initial prior mass equal to 1. Then our knowledge of the new point would be specified by Xi = ti Xi-1. To find such a point, we could sample uniformly from the points in the restricted range (0, Xi-1) and then investigate, from the original sorted likelihood values, what its parameter would have been. Equivalently, in the parameter space we could draw a point in proportion to the prior density, subject to the constraint that its likelihood exceeds that of the previous interval, with the initial likelihood floor equal to zero. Either way we find a random point with just the same distribution. Therefore we need not choose the latter method, which requires sorting by likelihood value, and can instead perform the sampling based on a random value of t. Thus we avoid sorting. And if we set t to be 0.99, then we can reach the bulk of the posterior in a hundred steps of width proportional to the fraction of prior mass that contains it, instead of taking a thousand such steps as is generally the case.
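A small sketch of that statistical step: the largest of N uniform draws in (0,1) can be drawn by inverse transform as U^(1/N), and the mean of its logarithm is -1/N, consistent with the shrinkage factors discussed here.

static double ShrinkPriorMass(double Xprev, int N, Random rng)
{
    double t = Math.Pow(rng.NextDouble(), 1.0 / N); // largest of N uniforms in (0,1)
    return Xprev * t;                               // Xi = ti * Xi-1
}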
Successive points sampled from the prior within the box defined by likelihood greater than the current value can be found by some Markov chain Monte Carlo approximation, starting at a candidate that obeys the constraint if one is available, or else at the worst point found in the previous iteration, which lies on the previous likelihood contour in the parameter space. This paper essentially says don't navigate the parameter space. It is sufficient to explore a likelihood weighted space.
The nested sampling procedure is therefore:
Choose the number of points (classes) as N and the number of iterations as j
Record the lowest of the current likelihood values
In each iteration
    set the current prior mass to the exponential of a value that depends on the iteration sequence
    set the weight to be the difference between the previous and the current prior mass
    increment the evidence by the strip area
    then replace the point of the lowest likelihood by a new one drawn from within the likelihood contour
Increment the evidence by filling in the missing band with weight w for each surviving point

Notice that this greatly simplified procedure involves only exponentials and a weighted sum.
#codingexercise
Yesterday we discussed
int F(List<int> V, int i, int j)
{
if (i > j || i >= V.Count || j >= V.Count) return 0;
if (i == j) return V[i];
if (i + 1 == j) return Math.Max(V[i], V[j]);
return Math.Max(V[i] + Math.Min(F(V, i+2, j), F(V, i+1, j-1)), V[j] + Math.Min(F(V, i+1, j-1), F(V, i, j-2)));
}

Today we write down a memoized way
static int FMemoized(List<int> V, int n)
{
var table = new int[n, n];
for (int k = 0; k < n; k++)
{
   // fill the table diagonal by diagonal: i and j advance together so that
   // j - i == k, the length of the subsequence currently being solved;
   // the lower left of the matrix stays stationary while the upper right fills in
   for (int i = 0, j = k; j < n; i++, j++)
   {
            int x = ((i+2) <= j) ? table[i+2, j] : 0;
            int y = ((i+1) <= (j-1)) ? table[i+1, j-1] : 0;
            int z = (i <= (j-2)) ? table[i, j-2] : 0;

            table[i, j] = Math.Max(V[i] + Math.Min(x, y), V[j] + Math.Min(y, z));
   }
}
return table[0, n-1];
}

Thursday, November 24, 2016


We continue discussing the paper "Nested sampling for general Bayesian computation" by Skilling, with an emphasis on understanding the transformation from computing the evidence over the parameters to computing it over the unit prior mass, which is what enables this technique. With this technique, nested sampling estimates directly how the likelihood function relates to the enclosed prior mass. It can be used to compare two data models for the same data set by comparing the conditional probability that is assigned after the relevant evidence is taken into account.
This method directly computes the evidence. The technique simplifies the evidence calculation by not summing over the parameters directly, but instead performing it on the cumulated prior mass X that covers likelihood values greater than, say, lambda. As lambda increases, the enclosed mass X decreases from one to zero. Writing the inverse function as L(X), where L is the likelihood function and L(X(lambda)) is equivalent to lambda, the evidence is transformed from the integration of the likelihood L(theta) over elements of prior mass dX = pi(theta) d-theta, where theta is the parameter, to the integration of L over the prior mass directly as L(X) dX. Therefore the summation simplifies to a one-dimensional integral over the unit range.
We are able to redefine the evidence from a variation over the parameters theta to a variation over the prior mass X by dividing the unit prior mass into tiny elements and sorting them by likelihood.
Let's picture it this way. We know that the evidence is the area under the curve in a plot of the likelihood against the prior mass, and we know that the likelihood decreases as the prior mass increases from 0 to 1. The evidence is therefore found by integration over this curve. We obtain this simpler transformation by sorting the elements based on likelihood values and arranging them as tiny strips of width dX, which works because X is cumulative.
This is clearer to see with an example where we have a two-dimensional parameter and the likelihoods fill a matrix. If we take a four by four grid, there are sixteen likelihood values, one per cell, each of which has an equal prior mass of 1/16. They are not sorted, because they correspond to parameters. But if we sort them into a descending sequence, then the likelihood corresponding to X = 1/5 is one fifth of the way along this sorted sequence, which falls at about the fourth sorted cell out of sixteen, and the likelihood at that X can be read off directly. These sorted descending values represent the curve L over X, which we integrate to find the evidence.
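As a tiny sketch of this picture, a hypothetical grid of likelihood values, each cell carrying an equal prior mass, can be sorted descending and summed as strips:

static double EvidenceBySorting(double[,] likelihoods)
{
    // each cell contributes a strip of width 1/(cell count) under the
    // sorted curve L(X): 1/16 for the four by four grid above
    var sorted = likelihoods.Cast<double>().OrderByDescending(l => l).ToList();
    double cellMass = 1.0 / sorted.Count;
    return sorted.Sum(l => l * cellMass);
}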


#codingexercise
Consider a row of n coins of values v1 . . . vn, where n is even. We play a game against an opponent by alternating turns. In each turn, a player selects either the first or last coin from the row, removes it from the row permanently, and receives the value of the coin. Determine the maximum possible amount of money we can definitely win if we move first.
For example in a sequence  8, 15, 3, 7
if we choose  7, the opponent chooses 8 and we choose 15 to win with value 22.
There are two ways to choose in a series i to j:
1) Choose the ith coin with value Vi, and then collect the minimum that remains after the opponent moves:
Vi + min(F(i+2, j), F(i+1, j-1))
2) Choose the jth coin with value Vj, and then collect the minimum that remains after the opponent moves:
Vj + min(F(i+1, j-1), F(i, j-2))

Therefore the solution looks like this:
int F(List<int> V, int i, int j)
{
if (i > j || i >= V.Count || j >= V.Count) return 0;
if (i == j) return V[i];
if (i + 1 == j) return Math.Max(V[i], V[j]);
return Math.Max(V[i] + Math.Min(F(V, i+2, j), F(V, i+1, j-1)), V[j] + Math.Min(F(V, i+1, j-1), F(V, i, j-2)));
}

Wednesday, November 23, 2016

We started discussing the paper "Nested sampling for general Bayesian computation" by Skilling. Nested sampling estimates directly how the likelihood function relates to the enclosed prior mass. It can be used to compare two data models for the same data set by comparing the conditional probability that is assigned after the relevant evidence is taken into account.
This method directly computes the evidence. The calculation of the evidence allows different model assumptions to be compared through the ratios of evidence values, known as Bayes factors.
This is an improvement over the Markov Chain Monte Carlo methods discussed in earlier posts, because they yielded a set of samples representing the normalized posterior and the evidence was of secondary importance. Here the evidence is the direct result, calculated with far greater ease than with MCMC methods. The parameters are sorted by their likelihood values and then summed to give the evidence. Since there are many data points, this method performs nested sampling to simulate the operation statistically. The evidence is then reported with a corresponding numerical uncertainty. Skilling states that this nested sampling is Bayesian in nature. Furthermore, with the nested samples, we can estimate the density of states, obtain samples from the posterior, and quantify arbitrary properties of the parameter.
This technique simplifies the evidence calculation by not summing over the parameters directly, but instead performing it on the cumulated prior mass that covers likelihood values greater than, say, lambda. As lambda increases, the enclosed mass decreases from one to zero. Therefore the summation simplifies to a one-dimensional integral over the unit range.
This is the primary intuition behind this paper. We will consider making it online instead of batch.

This method is implemented in Python here and is not part of the standard numpy package yet.
http://www.inference.phy.cam.ac.uk/bayesys/python/mininest.py

#codingexercise
Find the number of BSTs possible for numbers ranging from 1 to n.
For example, the count is 2 for n = 2 and 5 for n = 3.
        static List<Node> makeBSTs(int start, int end)
        {
            var items = new List<Node>();
            if (start > end)
            {
                items.Add(null); // a null sentinel stands in for the empty subtree
                return items;
            }
            for (int i = start; i <= end; i++)
            {
                var left = makeBSTs(start, i - 1);
                var right = makeBSTs(i + 1, end);
                // the null sentinel guarantees both lists are non-empty, so a
                // single pair of loops covers every combination of subtrees
                foreach (var leftnode in left)
                {
                    foreach (var rightnode in right)
                    {
                        var node = new Node();
                        node.data = i;
                        node.left = leftnode;
                        node.right = rightnode;
                        items.Add(node);
                    }
                }
            }
            return items;
        }

we can verify the BSTs formed with
foreach (var root in items)
{
        printBST(root);
        Console.WriteLine();
}
which should print in ascending order
where the printBST is inorder traversal as follows:
void printBST(Node root)
{
if (root != null)
{
printBST(root.left);
Console.Write("{0} ", root.data);
printBST(root.right);
}
}
which in this case verifies the above code.
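Since the exercise asks for the number of BSTs, the count alone can also be had without enumerating the trees: it is the nth Catalan number. A minimal sketch of the standard recurrence, where each value in turn serves as the root:

        static int CountBSTs(int n)
        {
            var count = new int[n + 1];
            count[0] = 1; // the empty tree
            for (int nodes = 1; nodes <= n; nodes++)
                for (int root = 1; root <= nodes; root++)
                    count[nodes] += count[root - 1] * count[nodes - root];
            return count[n]; // 2 for n = 2, 5 for n = 3
        }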

Tuesday, November 22, 2016

We started discussing the paper "Nested sampling for general Bayesian computation" by Skilling. Nested sampling estimates directly how the likelihood function relates to the enclosed prior mass. It can be used to compare two data models for the same data set by comparing the conditional probability that is assigned after the relevant evidence is taken into account.
This method computes the marginal likelihood directly by integration. Moreover, we can get samples from the unobserved as conditionals on the observed optionally. The sampling proceeds based on the nested contours of the likelihood function and not on their values. This technique allows the method to overcome the limitations that creep into annealing methods.
We looked at the direct summation performed by this method by taking Bayes' theorem in the form
Likelihood x Prior = Evidence x Posterior. The evidence can then be written as a summation over the prior mass elements.
The calculation of the evidence allows different model assumptions to be compared through the ratios of evidence values, known as Bayes factors. This works well even for future models, because the evidence does not have to be recalculated for the current one. In fact, the evidence and the posterior are valuable information for anybody who wants to compare models. The evidence gives information on the strength of the model, and the posterior shows how the observed data modulates the prior beliefs. Together they give good information not only on how the likelihood is constructed but also on how it is used.
This is an improvement over the Markov Chain Monte Carlo methods discussed in earlier posts, because they yielded a set of samples representing the normalized posterior. Here we get the evidence as well, which was not always easy before, because it was found via intermediary distributions that bridge the prior and the posterior; annealing methods took direct likelihood values as intermediary steps. The evidence was expensive to compute, it was a by-product, and it had secondary importance relative to the posterior. This approach reverses the importance and directly calculates the evidence first. Although here too the parameters are sorted by their likelihood values, they are summed over the function to give the evidence. Since there are usually far too many points to do this continuously, sampling is nested to do it in steps.
#codingexercise
Find the lexicographic minimum in a circular array, e.g. for the array BCABDADAB, the lexicographic minimum is ABBCABDAD.
String GetMinRotation(string input)
{
string combined = input + input; // every rotation of input is a length-n window of input + input
string min = input;
for (int i = 0; i < input.Length; i++)
{
    string candidate = combined.Substring(i, input.Length);
    if (string.Compare(candidate, min) < 0)
       min = candidate;
}
return min;
}

Get the count of rotations to reach a desired output
int GetCount(string input, string output)
{
string combined = input + input;
for (int i = 0; i < input.Length; i++)
{
    string candidate = combined.Substring(i, input.Length);
    if (candidate == output)
       return i; // the number of left rotations needed; 0 when input == output
}
return -1;
}