Monday, December 7, 2015

#codingexercise
Given a binary tree, find the largest path size from one leaf to another:
int getLargestPathSize(Node root)
{
    // longest leaf-to-leaf path, counted in edges
    int best = 0;
    height(root, ref best);
    return best;
}

// returns the height of the subtree (edges to its deepest leaf),
// and updates best with the longest leaf-to-leaf path seen so far
int height(Node root, ref int best)
{
    if (root == null) return -1;
    int left = height(root.left, ref best);
    int right = height(root.right, ref best);
    // a leaf-to-leaf path must turn at a node with two children;
    // summing the two globally deepest leaf depths overcounts when
    // both deep leaves fall in the same subtree, so track the best
    // candidate at every node instead
    if (root.left != null && root.right != null)
        best = Math.Max(best, left + right + 2);
    return Math.Max(left, right) + 1;
}


Dekel, Gilad-Bachrach, Shamir and Xiao described optimal distributed online prediction using mini-batches. They focus on large-scale, high-rate prediction problems. In the stochastic online prediction problem, a stream of inputs z1, z2, …, zm is sampled independently from a fixed but unknown distribution over a sample space Z. Before observing each sample zi, a point wi is predicted from a set W. After making the prediction wi, we observe zi and suffer the loss f(wi, zi), where f is a predefined loss function. Then zi is used to improve the prediction mechanism for the future. The quality of the predictions is measured using the notion of regret: the difference between the cumulative loss of our predictions and the cumulative loss of the fixed predictor that is optimal with respect to the underlying distribution. Since regret depends on stochastic variables, it is itself a random variable.
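To make the regret definition concrete, here is a small sketch. The squared loss and the running-mean online predictor are my own illustrative choices, not taken from the paper, and I measure regret against the best fixed predictor in hindsight (the empirical minimizer) as a stand-in for the distribution-optimal predictor:

```python
import random

def regret(predictions, samples, loss, best_fixed):
    """Cumulative loss of the online predictions minus the
    cumulative loss of the single best fixed predictor."""
    online = sum(loss(w, z) for w, z in zip(predictions, samples))
    fixed = sum(loss(best_fixed, z) for z in samples)
    return online - fixed

# toy example: scalar squared loss f(w, z) = (w - z)^2
loss = lambda w, z: (w - z) ** 2
random.seed(0)
samples = [random.gauss(1.0, 0.5) for _ in range(100)]

# online predictor: predict the running mean of everything seen so far
predictions, mean = [], 0.0
for i, z in enumerate(samples):
    predictions.append(mean)          # predict before observing z_i
    mean += (z - mean) / (i + 1)      # then update with z_i

# for squared loss, the empirical mean is the optimal fixed predictor,
# so the regret below is guaranteed nonnegative
best = sum(samples) / len(samples)
print(regret(predictions, samples, loss, best))
```

Because the fixed comparator minimizes the cumulative squared loss by construction, the printed regret is nonnegative; it grows sublinearly in the number of samples, which is the behavior the O(sqrt(m)) bound formalizes.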

Furthermore, they bound this regret by O(sqrt(m)), which matches the optimal bound for a serial gradient-based algorithm. Their method can use any gradient-based update rule: it treats the serial online prediction algorithm as a black box and converts it into a parallel or distributed online prediction algorithm.

The serial template they build on can be stated succinctly as:

For j = 1, 2, … do

     predict wj;

     receive input zj sampled i.i.d. from the unknown distribution;

     suffer loss f(wj, zj);

     compute gradient gj = ∇f(wj, zj);

     compute (wj+1, aj+1) = phi(aj, gj, alpha_j);

end

Notice that the last step is an unspecified update rule phi that takes the following three arguments:

- an auxiliary state vector aj that summarizes all of the necessary information about the past,

- a gradient gj of the loss function evaluated at wj,

- and an iteration-dependent step size alpha_j,

and computes the next predictor wj+1 and the new auxiliary state vector aj+1.
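As a minimal sketch of this template, phi can be instantiated as plain SGD, in which case the auxiliary state is just the current predictor itself. The squared loss and the 1/(2j) step size below are my own illustrative choices:

```python
import random

def grad_squared_loss(w, z):
    # gradient of f(w, z) = (w - z)^2 with respect to w
    return 2.0 * (w - z)

def sgd_phi(a, g, alpha):
    """Update rule phi: for plain SGD the auxiliary state a is just
    the current predictor, and the update is one gradient step."""
    w_next = a - alpha * g
    return w_next, w_next   # (next predictor, next auxiliary state)

random.seed(1)
w, a = 0.0, 0.0
total_loss = 0.0
for j in range(1, 1001):
    z = random.gauss(2.0, 0.1)      # receive i.i.d. input z_j
    total_loss += (w - z) ** 2      # suffer loss f(w_j, z_j)
    g = grad_squared_loss(w, z)     # gradient g_j at w_j
    alpha = 1.0 / (2 * j)           # step size alpha_j
    w, a = sgd_phi(a, g, alpha)     # compute (w_{j+1}, a_{j+1})
print(w)  # approaches the distribution mean, 2.0
```

With this particular step size the updates reduce to a running mean of the observed samples, so the predictor converges to the true mean; a different phi (e.g., one carrying momentum in aj) drops into the same loop unchanged, which is exactly the black-box property the authors exploit.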

This is very similar in nature to what I tried to do by evaluating the data once, as it arrives, using a stochastic-gradient-based summary, but initializing the current residual with a similar update so as to incorporate the new data while fully accounting for the data from previous iterations. For example, we can initialize the residual for the next iteration with a weighted average of the conventional initial residual and the previous initial residual, and we can measure the loss with the sum of squares.
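A rough sketch of that residual-initialization idea follows; the function names, the component-wise weighted average, and the blending weight are all my own guesses at the details, not a worked-out method:

```python
def blended_residual(conventional, previous, weight=0.5):
    """Initialize the next iteration's residual as a weighted average
    of the conventional initial residual (from the new data) and the
    previous iteration's initial residual. 'weight' is a hypothetical
    blending parameter."""
    return [weight * c + (1 - weight) * p
            for c, p in zip(conventional, previous)]

def sum_of_squares(residual):
    # loss measured as the sum of squared residual components
    return sum(r * r for r in residual)

prev = [0.5, -0.2, 0.1]    # initial residual from the previous iteration
conv = [0.3, 0.0, -0.1]    # conventional initial residual for the new data
r0 = blended_residual(conv, prev, weight=0.7)
print(sum_of_squares(r0))
```

The weight controls how strongly the new data dominates the starting point: weight=1 reduces to the conventional initialization, while smaller values carry more of the previous iteration forward.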

 
