Friday, November 25, 2016

We continue discussing the paper "Nested sampling for general Bayesian computation" by Skilling.
We were looking at the transformation that computes the evidence over the prior mass instead of over the parameters. Today we look at how the integration is performed.
The integration is straightforward to perform online because it amounts to accumulating the areas of strips under the curve. This area is a weighted sum of likelihood values, one for each strip. We know the curve is non-increasing as the prior mass runs from 0 to 1. Arrange the strips from right to left at prior masses X1 > X2 > ... > Xm and set Xm+1 = 0. Because the curve is non-increasing, each likelihood Li bounds the curve from below everywhere to the left of Xi, so the area under the curve has to be greater than or equal to the weighted sum of likelihood values where the weights are the X intervals. There is also an upper bound: the integral cannot exceed the analogous weighted sum plus the last prior mass times the maximum likelihood value. This is in the spirit of the trapezoidal rule, where the differences between the area covered by the strips and the area under the curve are roughly triangular pieces whose sum cannot exceed the final strip at height Lmax.
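As a small illustration, here is a C# sketch of those two bounds, assuming we are given the prior masses X1 > X2 > ... > Xm with their likelihoods Li and the maximum likelihood Lmax (the helper name EvidenceBounds is ours, not the paper's):

static void EvidenceBounds(double[] X, double[] L, double Lmax,
                           out double lower, out double upper)
{
    lower = 0.0;
    upper = 0.0;
    for (int i = 0; i < X.Length; i++)
    {
        double xNext = (i + 1 < X.Length) ? X[i + 1] : 0.0; // X_{m+1} = 0
        double xPrev = (i == 0) ? 1.0 : X[i - 1];           // X_0 = 1
        lower += L[i] * (X[i] - xNext);  // L >= L_i everywhere left of X_i
        upper += L[i] * (xPrev - X[i]);  // L <= L_i everywhere right of X_i
    }
    upper += Lmax * X[X.Length - 1];     // unexplored band below the last strip
}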
We already know the weighted sum can be computed online: as more and more likelihood values are encountered, we add the corresponding weighted terms, extending the current evidence to the next. But the more interesting question is whether the likelihood values need to be sorted for the computation of the evidence. The answer is no, as we will shortly see. Before that, let us discuss the sampling. The integral for the evidence is dominated by wherever the bulk of the posterior mass is found, and that bulk typically occupies only a small fraction of the prior. To see the width of the posterior in terms of X, just as we did with the prior, suppose the likelihood is a multivariate function over C dimensions with normalized coefficients, so that it can be written as proportional to an exponential in the radius squared. The enclosed prior mass then scales as r raised to the power C, because the C dimensions are taken together, and the integral is naturally expressed in terms of log X. In other words, the posterior is spread over a range in log X. To cover such a range, the sampling should be linear in log X rather than in X, so we set X1 = t1, X2 = t1t2, ..., Xi = t1t2...ti, ..., Xm = t1t2...tm, where each ti lies between 0 and 1. By setting values of successive t, we can then write the evidence as a weighted sum again, with the weights expressed in terms of the t's and the likelihood values for each interval.
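To make the dimensional argument concrete (our worked assumption: a prior uniform over the unit ball in C dimensions), if the likelihood is proportional to exp(-r^2/2), the prior mass enclosed within radius r is X(r) = r^C, so inverting gives

    L(X) proportional to exp(-X^(2/C) / 2),   0 < X < 1

and for large C this varies appreciably only on a logarithmic scale in X, which is why the steps should be taken uniformly in log X.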
In practice, finding the exact value of each t is difficult, but we can certainly work with a statistical value. We could do this by finding a new point Xi from the prior, subject to the condition that it is smaller than the previous one, with the initial value being 1; our knowledge of the new point is then specified by Xi = ti Xi-1. To find such a point, we could sample uniformly in the restricted range (0, Xi-1) and look up, from the sorted likelihood values, what its parameter would have been. Alternatively, we could stay in parameter space and sample in proportion to the prior density, subject to the constraint that the likelihood of the chosen parameter is greater than the likelihood bounding the previous interval, with the initial constraint being zero. Either way we find a random point with just the same distribution. Therefore we need not choose the method that requires sorting by likelihood value; we can perform the constrained sampling in parameter space and assign t statistically instead. Thus we avoid sorting. And if t is around 0.99, then we can reach the bulk of the posterior in a hundred steps whose widths shrink in proportion to the remaining prior mass, instead of taking a thousand such steps as is generally the case.
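A small C# sketch of this shrinkage, assuming the usual setup with N live points, where each ti is statistically the largest of N uniform draws so that log X decreases by about 1/N per step (PriorMassSequence is our name for the helper):

static double[] PriorMassSequence(int N, int steps, Random rng)
{
    var X = new double[steps];
    double current = 1.0;
    for (int i = 0; i < steps; i++)
    {
        // t_i ~ largest of N Uniform(0,1) draws, i.e. t = u^(1/N)
        double t = Math.Pow(rng.NextDouble(), 1.0 / N);
        current *= t;      // X_i = t_1 t_2 ... t_i
        X[i] = current;
    }
    return X;
}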
Successive points, sampled from the prior within the region where the likelihood exceeds the current constraint, are found by some Markov chain Monte Carlo approximation, starting at a candidate that obeys the constraint if one is available, or at worst at the point found in the previous iteration, which lies on the previous likelihood contour in parameter space. This paper essentially says: don't navigate the parameter space itself. It is sufficient to explore a likelihood-weighted space.
The nested sampling procedure is therefore:
Choose the number of live points as N and the number of iterations as j, drawing the initial N points from the prior
In each iteration
    record the lowest of the current likelihood values
    set the current prior mass to an exponential that decays with the iteration count (Xi = exp(-i/N))
    set the weight to be the difference between the previous and the current prior mass
    increment the evidence by the strip area, the weight times the lowest likelihood
    then replace the point of the lowest likelihood by a new one drawn from the prior within the likelihood constraint
Finally, increment the evidence by filling in the missing band with weight w using the surviving points

Notice that this greatly simplified procedure involves only exponentials and a weighted sum.
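Here is a minimal, self-contained C# sketch of that loop under a toy assumption of ours: a one-dimensional parameter with uniform prior on (0,1) and a Gaussian-shaped likelihood, with the constrained draw done by rejection sampling rather than the MCMC walk described above. It illustrates the procedure; it is not the paper's reference implementation (that is the mininest.py linked in an earlier post).

static double Likelihood(double theta)
{
    // toy Gaussian-shaped likelihood centered at 0.5 with sigma = 0.1 (assumed)
    return Math.Exp(-Math.Pow(theta - 0.5, 2) / (2 * 0.1 * 0.1));
}

static double NestedSamplingEvidence(int N, int iterations, Random rng)
{
    // N live points drawn from the uniform prior on (0,1)
    var thetas = new List<double>();
    for (int i = 0; i < N; i++) thetas.Add(rng.NextDouble());

    double Z = 0.0;
    double Xprev = 1.0;
    for (int i = 1; i <= iterations; i++)
    {
        // record the lowest of the current likelihood values
        int worst = 0;
        for (int p = 1; p < thetas.Count; p++)
            if (Likelihood(thetas[p]) < Likelihood(thetas[worst]))
                worst = p;
        double Lworst = Likelihood(thetas[worst]);

        double X = Math.Exp(-(double)i / N);  // prior mass shrinks geometrically
        Z += Lworst * (Xprev - X);            // add the strip area
        Xprev = X;

        // replace the worst point by a prior draw with likelihood above Lworst
        // (rejection sampling for simplicity; the text above describes an MCMC walk)
        double candidate;
        do { candidate = rng.NextDouble(); } while (Likelihood(candidate) <= Lworst);
        thetas[worst] = candidate;
    }
    // fill in the remaining band with the surviving live points
    foreach (var t in thetas)
        Z += Likelihood(t) * Xprev / N;
    return Z;
}

With N = 100 and a few hundred iterations, the returned Z should approach roughly 0.25 for this toy likelihood, which is its true integral over the unit interval.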
#codingexercise
Yesterday we discussed
int F(List<int> V, int i, int j)
{
    if (i > j || i >= V.Count || j >= V.Count) return 0;
    if (i == j) return V[i];
    if (j == i + 1) return Math.Max(V[i], V[j]);
    return Math.Max(V[i] + Math.Min(F(V, i+2, j), F(V, i+1, j-1)),
                    V[j] + Math.Min(F(V, i+1, j-1), F(V, i, j-2)));
}

Today we write down a memoized way
static int FMemoized(List<int> V, int n)
{
    var table = new int[n, n];
    // Fill the table diagonal by diagonal: the simultaneous increment of i and j
    // walks one diagonal, and k is the gap between the two indices. Starting j at k
    // skips entries already computed on earlier diagonals, so the lower-left of the
    // matrix stays untouched while the upper-right fills in as the gap grows.
    for (int k = 0; k < n; k++)
    {
        for (int i = 0, j = k; j < n; i++, j++)
        {
            int x = ((i + 2) <= j) ? table[i + 2, j] : 0;
            int y = ((i + 1) <= (j - 1)) ? table[i + 1, j - 1] : 0;
            int z = (i <= (j - 2)) ? table[i, j - 2] : 0;

            table[i, j] = Math.Max(V[i] + Math.Min(x, y), V[j] + Math.Min(y, z));
        }
    }
    return table[0, n - 1];
}

Thursday, November 24, 2016


We continue discussing the paper "Nested sampling for general Bayesian computation" by Skilling, with an emphasis on understanding the transformation from computing the evidence over the parameters to computing it over the unit prior mass, which is what makes this technique work. With this transformation, nested sampling estimates directly how the likelihood function relates to prior mass. It can be used to compare two models for the same data set by comparing the conditional probabilities assigned after the relevant evidence is taken into account.
This method directly computes the evidence. It simplifies the evidence calculation by not summing over the parameters directly but instead performing it over the cumulated prior mass X(lambda) that covers likelihood values greater than some lambda. As lambda increases, the enclosed mass X decreases from one to zero. Writing the inverse function as L(X), where L is the likelihood function and L(X(lambda)) is equivalent to lambda, the evidence is transformed from an integral of the likelihood L(theta) over elements of prior mass dX = pi(theta) d(theta), where theta is the parameter, into an integral of L over the prior mass directly, as Z = integral of L(X) dX. The summation therefore simplifies to a one-dimensional integral over the unit range.
We are able to redefine the evidence as a variation over the prior mass X, rather than over the parameters theta, by dividing the unit prior mass into tiny elements and sorting them by likelihood.
Let's picture it this way. The evidence is the area under the curve in a plot of likelihood against prior mass, and the likelihood decreases as the prior mass increases from 0 to 1. The evidence is therefore found by integrating over this curve. We obtain this simpler transformation by sorting the elements by likelihood value and arranging them as tiny strips of width dX, X being cumulative.
This is clearer to see with an example where we have a two-dimensional parameter whose likelihoods fill a matrix. If we take a four by four grid, there are sixteen likelihood values, one per cell, each with an equal prior mass of 1/16. They are not sorted, because they correspond to parameters. But if we sort them into a descending sequence, then the likelihood corresponding to X = 1/5 lies one fifth of the way along this sorted sequence, in the fourth sorted cell out of sixteen, and the likelihood at that X can be read off directly. These sorted descending values represent the curve L over X, which we integrate to find the evidence.
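A tiny C# illustration of that grid example, with sixteen made-up likelihood values (the numbers and the helper name GridEvidence are our placeholders, not from the paper):

static double GridEvidence()
{
    // sixteen likelihoods, one per cell of a 4x4 grid, each of prior mass 1/16
    var L = new List<double> { 0.10, 0.40, 0.90, 0.20, 0.50, 0.70, 0.30, 0.80,
                               0.60, 0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65 };
    L.Sort();
    L.Reverse();                              // descending: the staircase L(X)
    double LatOneFifth = L[(int)(16 * 0.2)];  // X = 1/5 lands in the fourth sorted cell
    Console.WriteLine("L(1/5) = {0}", LatOneFifth);
    double Z = 0.0;
    foreach (var l in L) Z += l / 16.0;       // area under the staircase = evidence
    return Z;
}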


#codingexercise
Consider a row of n coins of values v1 . . . vn, where n is even. We play a game against an opponent by alternating turns. In each turn, a player selects either the first or last coin from the row, removes it from the row permanently, and receives the value of the coin. Determine the maximum possible amount of money we can definitely win if we move first.
For example in a sequence  8, 15, 3, 7
if we choose  7, the opponent chooses 8 and we choose 15 to win with value 22.
There are two ways to choose in a series i to j:
1) Choose the ith coin with value Vi, and assume the opponent leaves us the minimum of what follows:
Vi + min(F(i+2, j), F(i+1, j-1))
2) Choose the jth coin with value Vj, and assume the opponent leaves us the minimum of what follows:
Vj + min(F(i+1, j-1), F(i, j-2))

Therefore the solution looks like this:
int F(List<int> V, int i, int j)
{
    if (i > j || i >= V.Count || j >= V.Count) return 0;
    if (i == j) return V[i];
    if (j == i + 1) return Math.Max(V[i], V[j]);
    return Math.Max(V[i] + Math.Min(F(V, i+2, j), F(V, i+1, j-1)),
                    V[j] + Math.Min(F(V, i+1, j-1), F(V, i, j-2)));
}

Wednesday, November 23, 2016

We started discussing the paper "Nested sampling for general Bayesian computation" by Skilling. Nested sampling estimates directly how the likelihood function relates to prior mass. It can be used to compare two models for the same data set by comparing the conditional probabilities assigned after the relevant evidence is taken into account.
This method directly computes the evidence. The calculation of the evidence allows different model assumptions to be compared through ratios of evidence values, known as Bayes factors.
This is an improvement over the Markov Chain Monte Carlo methods discussed in earlier posts, because they yielded a set of samples representing the normalized posterior while the evidence was of secondary importance. Here the evidence is the direct result, calculated with far greater ease than with MCMC methods. The parameters are sorted by their likelihood values, and the likelihoods are then summed up, weighted by prior mass, to give the evidence. Since there are far too many points to do this exactly, the method performs nested sampling to simulate the operation statistically. The evidence then comes with a corresponding numerical uncertainty. Skilling states that this nested sampling is Bayesian in nature. Furthermore, with the nested samples we can estimate the density of states, obtain samples from the posterior, and quantify arbitrary properties of the parameter.
This technique simplifies the evidence calculation by not summing over the parameters directly but instead performing it over the cumulated prior mass that covers likelihood values greater than some lambda. As lambda increases, the enclosed mass decreases from one to zero. The summation therefore simplifies to a one-dimensional integral over the unit range.
This is the primary intuition behind this paper. We will consider making it online instead of batch.

This method is implemented in Python here and is not part of the standard numpy package yet.
http://www.inference.phy.cam.ac.uk/bayesys/python/mininest.py

#codingexercise
Find the number of BSTs possible for numbers ranging from 1 to n.
For example, n = 2 gives 2 trees and n = 3 gives 5. The code below enumerates the trees; the count is the size of the returned list.
        static List<Node> makeBSTs(int start, int end)
        {
            var items = new List<Node>();
            if (start > end)
            {
                items.Add(null); // placeholder for an empty subtree
                return items;
            }
            for (int i = start; i <= end; i++)
            {
                // i is the root; smaller keys form the left subtrees, larger the right
                var left = makeBSTs(start, i - 1);
                var right = makeBSTs(i + 1, end);
                // combine every left subtree with every right subtree;
                // the base case guarantees both lists are non-empty (null stands
                // for an absent child), so this single loop covers all cases
                foreach (var leftnode in left)
                {
                    foreach (var rightnode in right)
                    {
                        var node = new Node();
                        node.data = i;
                        node.left = leftnode;
                        node.right = rightnode;
                        items.Add(node);
                    }
                }
            }
            return items;
        }

We can verify the BSTs formed with
foreach( var root in items){
        printBST(root);
        Console.WriteLine();
}
which should print in ascending order
where the printBST is inorder traversal as follows:
void printBST(Node root)
{
    if (root != null)
    {
        printBST(root.left);
        Console.Write("{0} ", root.data);
        printBST(root.right);
    }
}
which in this case verifies the above code.
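As a further cross-check, the number of BSTs on n distinct keys is the Catalan number, which the standard recurrence below computes; items.Count from makeBSTs should match it (CountBSTs is our name for this helper):

static int CountBSTs(int n)
{
    // c[i] = number of BSTs on i keys (Catalan numbers: 1, 1, 2, 5, 14, ...)
    var c = new int[n + 1];
    c[0] = 1;
    for (int i = 1; i <= n; i++)
        for (int j = 0; j < i; j++)
            c[i] += c[j] * c[i - 1 - j]; // j keys in the left subtree
    return c[n];
}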

Tuesday, November 22, 2016

We started discussing the paper "Nested sampling for general Bayesian computation" by Skilling. Nested sampling estimates directly how the likelihood function relates to prior mass. It can be used to compare two models for the same data set by comparing the conditional probabilities assigned after the relevant evidence is taken into account.
This method computes the marginal likelihood directly by integration. Moreover, we can optionally get samples from the unobserved variables as conditionals on the observed. The sampling proceeds based on the nested contours of the likelihood function and not on their values. This allows the method to overcome the limitations that creep into annealing methods.
We looked at the direct summation performed by this method by writing Bayes' theorem in the product form
Likelihood x Prior = Evidence x Posterior. The evidence can then be written as a summation over the prior mass elements.
The calculation of the evidence allows different model assumptions to be compared through ratios of evidence values, known as Bayes factors. This works well even for future models, because the evidence does not have to be recalculated for the current one. In fact, the evidence and the posterior are both valuable to anybody who wants to compare models. The evidence gives information on the strength of the model, and the posterior shows how the observed data modulates the prior beliefs. Together they tell us not only how the likelihood is constructed but also how it is used.
This is an improvement over the Markov Chain Monte Carlo methods discussed in earlier posts, because those yielded a set of samples representing the normalized posterior while the evidence was harder to obtain: it had to be assembled from intermediary distributions that bridge the prior and the posterior. Annealing methods took direct likelihood values as intermediary steps; the evidence was expensive to compute, appeared only as a by-product, and had secondary importance relative to the posterior. This approach reverses that importance and calculates the evidence directly. Although here too the points are sorted by their likelihood values and summed over the function to give the evidence, there are usually far too many points to do this exhaustively, so the sampling is nested to do it in steps.
#codingexercise
Find the lexicographic minimum in a circular array, e.g. for the array BCABDADAB, the lexicographic minimum is ABBCABDAD.
String GetMinRotation(string input)
{
    // every rotation of input is a length-n substring of input + input
    String combined = input + input;
    String min = input;
    for (int i = 0; i < input.Length; i++)
    {
        String candidate = combined.Substring(i, input.Length);
        if (String.Compare(candidate, min, StringComparison.Ordinal) < 0)
            min = candidate;
    }
    return min;
}
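For the example above, GetMinRotation("BCABDADAB") returns "ABBCABDAD", the rotation starting at offset 7.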

Get the count of rotations needed to reach a desired output:
int GetCount(string input, string output)
{
    String combined = input + input;
    for (int i = 0; i < input.Length; i++)
    {
        String candidate = combined.Substring(i, input.Length);
        if (candidate == output)
            return i; // number of left rotations to reach the output
    }
    return -1; // output is not a rotation of input
}

Monday, November 21, 2016

Today we start discussing the paper "Nested sampling for general Bayesian computation" by Skilling. Nested sampling estimates directly how the likelihood function relates to prior mass. It can be used to compare two models for the same data set by comparing the conditional probabilities assigned after the relevant evidence is taken into account.
This has several advantages. First, this method computes the marginal likelihood directly by integration. Moreover, samples from the distribution of the unobserved variables, as conditionals on the observed data, can also be obtained optionally. This method relies on sampling within a hard constraint on the likelihood value, as opposed to the softened likelihood of annealing methods. The sampling proceeds based on the shape of the nested contours of likelihood, and not on the likelihood values. This allows the method to overcome the limitations that creep into annealing methods.
From Bayes' theorem, we often write the model in product form as
Likelihood x Prior  = Evidence x Posterior
which are expressed using parameters of the model.
The likelihood is the probability of the acquired data given the parameters and the model assumptions.
The prior represents the uncertainty over the unknown parameters given the model assumptions; it is set down before we have sampled any data.
The posterior represents the uncertainty over the unknown parameters after the data has been sampled. The posterior therefore involves the sampled data D, which was not considered in the prior. The prior and the posterior let us start with a few beliefs about the world, interact with it, and then update those beliefs. The computation of the posterior with sampled data is in fact an update of our beliefs about the world: the posterior is a conditional distribution on the sampled data which lets the data modulate the prior. The prior and the posterior are usually normalized to unit total. With the likelihood function and these beliefs, we can then estimate the marginal likelihood of the observed data.
When the equation is written in the product form, it lets us find the evidence as a summation over the prior mass elements.
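Restating this in symbols (standard notation, with data D, parameters theta, likelihood L(theta) = P(D | theta) and prior pi(theta)):

    P(D | theta) x pi(theta) = Z x P(theta | D),   where   Z = integral of L(theta) pi(theta) d(theta)

and with the prior mass substitution dX = pi(theta) d(theta) this becomes Z = integral from 0 to 1 of L(X) dX, the one-dimensional form used throughout this series.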
#codingexercise
Find the maximum area of a rectangle under the histogram of unit widths
public static int MaxArea(ref List<int> h)
{
            if (h == null || h.Count == 0) return 0;
            var areas = new List<int>();
            for (int i = 0; i < h.Count; i++)
            {
                // rectangle of height h[i]: extend to both sides while bars are at least as tall
                int area = h[i];
                for (int k = i + 1; k < h.Count && h[k] >= h[i]; k++)
                    area += h[i];
                for (int k = i - 1; k >= 0 && h[k] >= h[i]; k--)
                    area += h[i];
                areas.Add(area);
            }
            return areas.Max();
}
#Find max area under the histogram 
int MaxAreaOfARectangle(List<int> histogram)
{
    int max = 0;
    for (int i = 0; i < histogram.Count; i++)
    {
        int area = GetArea(histogram, i);
        if (area > max)
            max = area;
    }
    return max;
}
int GetArea(List<int> histogram, int center)
{
    // widest rectangle of height histogram[center] through the center bar
    int area = histogram[center] * 1;
    for (int i = center - 1; i >= 0 && histogram[i] >= histogram[center]; i--)
        area += histogram[center];
    for (int i = center + 1; i < histogram.Count && histogram[i] >= histogram[center]; i++)
        area += histogram[center];
    return area;
}

Alternatively,
int MaxArea(ref int[] h, int start, int end, ref int min)
{
    if (start == end)
    {
        min = h[start];
        return min * 1;
    }
    if (start < end)
    {
        int mid = (start + end) / 2;
        int minleft = 0;
        int minright = 0;
        int left = MaxArea(ref h, start, mid, ref minleft);
        int right = MaxArea(ref h, mid + 1, end, ref minright);
        min = Math.Min(minleft, minright);
        int minArea = min * (end - start + 1); // full width at the overall minimum height
        return Math.Max(Math.Max(left, right), minArea);
    }
    return 0;
}
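Note that this divide and conquer, as written, compares only the two halves and the full-width rectangle at the overall minimum height, so it can miss a rectangle that straddles the midpoint at a greater height. A possible completion, our sketch rather than part of the original, expands outward from the midpoint; its result would be folded in with another Math.Max in the branch above:

int CrossArea(int[] h, int start, int mid, int end)
{
    // best rectangle forced to span the boundary between mid and mid + 1
    int i = mid, j = mid + 1;
    int height = Math.Min(h[i], h[j]);
    int best = height * 2;
    while (i > start || j < end)
    {
        // widen toward the taller neighbor, lowering the height as needed
        if (i > start && (j == end || h[i - 1] >= h[j + 1]))
            height = Math.Min(height, h[--i]);
        else
            height = Math.Min(height, h[++j]);
        best = Math.Max(best, height * (j - i + 1));
    }
    return best;
}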

Sunday, November 20, 2016

Today we continue discussing websockets. We said they facilitate duplex communication, are independent of HTTP, and we showed how both the client and the server can act as both producer and consumer.
The frontend updates the page with the statistics on each event that an event listener on the page receives. The frontend is different from the clients that gather the statistics, usually one collection tool per machine. Each client is responsible for its own host.
The client runs in an endless loop: gathering data, publishing data, and waiting for some time. The publishing is done with a POST request to the server, at the endpoint corresponding to the model the server uses for the client. A hypothetical sketch of such an agent loop is shown below.
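For illustration only, here is such a loop in C# (the URL, payload fields and interval are placeholders of ours; the posts themselves describe a Python statsd-style client):

using System;
using System.Net.Http;
using System.Text;
using System.Threading;

class StatsClient
{
    static void Main()
    {
        var http = new HttpClient();
        while (true)
        {
            // gather the stats for this host (placeholder values)
            string json = "{\"ip\": \"10.0.0.1\", \"cpu\": 0.42, \"memory\": 0.63}";
            var body = new StringContent(json, Encoding.UTF8, "application/json");
            // publish with a POST to the server endpoint for the client model
            http.PostAsync("http://127.0.0.1:8000/clientstats/", body).Wait();
            Thread.Sleep(5000); // wait before gathering the next sample
        }
    }
}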
The server is written in Django. It uses signals for creating and destroying the model objects pertaining to the client statistics. Whenever a stat needs to be saved, the server uses a signal handler to notify the endpoint set up by Crossbar.io, which then dispatches a pubsub event with a POST request to the endpoint at a path such as '/notify'.
The HTTP requests and the publisher-subscriber events on the websockets are not mutually exclusive; they are complementary to each other. Without the pub-sub in this architecture, we would be left with a restrictive polling mechanism.
To summarize:
Frontend:
register an event listener
Client:
publish to server
Server:
notify the subscriber
#codingexercise
We were referring to an example of finding the next palindrome of a number with a very large count of digits in the earlier exercises. In order to find that next palindrome, we need to be able to compare such numbers. Here is one way to do it, in the style of a .NET comparator.
        static int compareTo(List<int> current, List<int> digits)
        {
            int ret = 0;
            if (current.Count < digits.Count) return -1;
            if (current.Count > digits.Count) return 1;
            for (int i = 0; i < current.Count && i < digits.Count; i++)
            {
                if (current[i] < digits[i])
                    return -1;
                if (current[i] > digits[i])
                    return 1;
            }
            return ret;
        }

Given a string, its value is defined as the sum of the squares of the frequencies of its letters. Find the minimum value after removing k characters

static int GetValue(string input, int k)
{
    // frequency of each letter
    var freq = new Dictionary<char, int>();
    foreach (char c in input)
    {
        if (!freq.ContainsKey(c)) freq[c] = 0;
        freq[c]++;
    }
    // removing one occurrence of the most frequent letter gives the biggest drop
    for (int j = 0; j < k && freq.Count > 0; j++)
    {
        char candidate = freq.OrderByDescending(p => p.Value).First().Key;
        freq[candidate]--;
        if (freq[candidate] == 0)
            freq.Remove(candidate);
    }
    int val = 0;
    foreach (int count in freq.Values)
        val += count * count; // square of the frequency, not XOR
    return val;
}

Saturday, November 19, 2016

Yesterday, we were discussing websockets. We said they facilitate duplex communication and are independent of HTTP. As an example use case, we considered displaying real-time monitoring of CPU and memory metrics on a dashboard using a Python Django server. The clients publish data from each monitored machine to a central Django server, which in turn publishes to the frontend. The frontend updates on each new statistic because it listens for those events, with say Autobahn. This highlights the difference from the traditional model, where the frontend pulls the information from a time-series database by running a query every once in a while.
The frontend therefore looks something like this:
<code>
      window.addEventListener("load", function(){
        var connection = new autobahn.Connection({
           url: 'ws://127.0.0.1:8080/ws',
           realm: 'realm1'
        });
        connection.onopen = function(session) {
          var clients = document.getElementById("clients");
          session.subscribe('clientstats', function(args){
            var stats = args[0];
            var serverNode = document.getElementById(stats.ip);
            // update the node for this host with the new stats
          });
        };
        connection.open();
      });
</code>

The clients push the data, and this can be done with a statsd client which runs on every source of metrics.
The server, however, is different from the traditional one:
@receiver(post_save, sender=Client, dispatch_uid="server_post_save")
def notify_server_config_changed(sender, instance, **kwargs):
    requests.post("http://127.0.0.1:8080/notify",
                  json={
                      'topic': 'clientconfig',
                      'args': [model_to_dict(instance)]
                  })
The 'notify' is the url configured for the push service.
#codingexercise
We discussed the next palindrome problem yesterday. It involves an addition with a carry-over, which we handle this way:
       static void AddOne(ref List<int> digits, int start)
        {
            if (start >= digits.Count) return;
            if (digits[start] + 1 <= 9)
            {
                digits[start] += 1;
                return;
            }
            int carry_over = 1;
            while (start >= 0 && carry_over != 0)
            {
                int sum = digits[start] + 1; // the +1 is the incoming carry
                digits[start] = sum % 10;
                carry_over = sum / 10;
                if (start == 0 && carry_over != 0)
                {
                    digits.Insert(0, carry_over); // the number grows by one digit
                }
                start--;
            }
        }
        }