#codingexercise
Given a binary tree, find the largest path size from one leaf to another:
int getLargestPathSize(Node root)
{
    // Collect the depth of every leaf, then sum the two largest depths.
    var leafdepths = new List<int>();
    setdepth(root, 0, leafdepths);
    leafdepths.Sort();
    leafdepths.Reverse();
    if (leafdepths.Count >= 2)
        return leafdepths[0] + leafdepths[1];
    return 0;
}

// Records the depth (number of nodes from the root, inclusive) of every leaf
// in the subtree rooted at root.
void setdepth(Node root, int level, List<int> leafdepths)
{
    if (root == null) return;
    int curLevel = level + 1;
    setdepth(root.Left, curLevel, leafdepths);
    setdepth(root.Right, curLevel, leafdepths);
    if (root.Left == null && root.Right == null)
        leafdepths.Add(curLevel);
}
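The exercise assumes a binary tree node type that is not shown; a minimal definition (my own, added for completeness) could be:

// Minimal binary tree node assumed by the exercise above.
class Node
{
    public int Value;
    public Node Left;
    public Node Right;
}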
Dekel, Gilad-Bachrach, Shamir, and Xiao described optimal distributed online prediction using mini-batches. They focus on large-scale, high-rate prediction problems. The stochastic online prediction problem is one where a stream of inputs z1, z2, …, zm is sampled independently from a fixed, unknown distribution over a sample space Z. Before observing each sample zi, a point wi is predicted from a set W. After making the prediction wi, we observe zi and suffer the loss f(wi, zi), where f is a predefined loss function. Then zi is used to improve the prediction mechanism for the future. The quality of the predictions is measured using the notion of regret, which is the difference between the cumulative loss of our predictions and the cumulative loss of the fixed predictor that is optimal with respect to the underlying distribution. Since the regret depends on the sampled inputs, it is itself a random variable.
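Written out (the notation w* for the best fixed predictor is mine, but it matches the description above), the regret after m rounds is:

R(m) = \sum_{i=1}^{m} f(w_i, z_i) - \sum_{i=1}^{m} f(w^*, z_i), \qquad w^* = \arg\min_{w \in W} \mathbb{E}_{z}\left[ f(w, z) \right]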
Furthermore, they bound this regret by O(√m), which is asymptotically the same as the optimal bound for a serial gradient-based algorithm. Their method can use any gradient-based update rule: it treats the serial online prediction algorithm as a black box and converts it into a parallel or distributed online prediction algorithm.
Their method is therefore succinctly put as:

for j = 1, 2, … do
    predict wj
    receive input zj, sampled i.i.d. from the unknown distribution
    suffer loss f(wj, zj)
    define the gradient gj = ∇f(wj, zj)
    compute (wj+1, aj+1) = φ(aj, gj, αj)
end
Notice that the last step is an unspecified update rule φ that takes three arguments:
- an auxiliary state vector aj that summarizes all of the necessary information about the past,
- the gradient gj of the loss function evaluated at wj, and
- an iteration-dependent parameter αj used as the step size,
and computes the next predictor wj+1 and the new auxiliary state vector aj+1.
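Not the authors' implementation, but a minimal sketch of this serial template, assuming online gradient descent as the update rule φ, a scalar squared loss f(w, z) = (w − z)², an auxiliary state that is just the current predictor, and a step size αj = α0/√j; all of these choices are illustrative assumptions:

// A minimal sketch of the serial online prediction template above.
// Assumes online gradient descent as the update rule phi and the
// squared loss f(w, z) = (w - z)^2 on scalar inputs.
double RunSerialOnlinePrediction(double[] stream, double alpha0)
{
    double w = 0.0;              // predictor w_j; the auxiliary state here is just w itself
    double cumulativeLoss = 0.0;
    for (int j = 1; j <= stream.Length; j++)
    {
        double z = stream[j - 1];             // receive input z_j, sampled i.i.d. from the unknown distribution
        double loss = (w - z) * (w - z);      // suffer loss f(w_j, z_j)
        cumulativeLoss += loss;
        double g = 2.0 * (w - z);             // gradient g_j = ∇f(w_j, z_j)
        double alpha = alpha0 / Math.Sqrt(j); // iteration-dependent step size α_j
        w -= alpha * g;                       // update rule φ produces w_{j+1}
    }
    return cumulativeLoss;
}

As I read the paper, the distributed variant keeps this template intact and feeds φ gradients accumulated over a mini-batch of inputs rather than a single input.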
This is very similar in nature to what I tried to do by evaluating the data once, as it arrives, using a stochastic gradient based summary, but initializing the current residual with a similar update so that the new data is treated while the previous iteration's data has already been reduced completely. For example, we can initialize the residual for the next iteration with a weighted average of the conventional initial residual and the previous initial residual, and we can measure the loss with the sum of squares.
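To make that concrete, here is a rough sketch of the residual initialization and the sum-of-squares loss described above; the mixing weight beta and the array representation of the residual are assumptions for illustration, not a worked-out method:

// Initialize the next iteration's residual as a weighted average of the
// conventional initial residual (from the newly arrived data) and the
// previous iteration's initial residual. beta is an assumed mixing weight.
double[] InitializeResidual(double[] conventionalResidual, double[] previousResidual, double beta)
{
    var blended = new double[conventionalResidual.Length];
    for (int i = 0; i < blended.Length; i++)
        blended[i] = beta * conventionalResidual[i] + (1.0 - beta) * previousResidual[i];
    return blended;
}

// Measure the loss of a residual as its sum of squares.
double SumOfSquares(double[] residual)
{
    double loss = 0.0;
    foreach (double r in residual)
        loss += r * r;
    return loss;
}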