Saturday, November 28, 2015

The stochastic gradient method mentioned in the earlier post differs from the batch gradient method in the following ways:
1) The batch gradient method looks at the entire training set before making each update, whereas the stochastic gradient method updates after processing a single training example.
2) If the number of training samples is very large, the batch gradient method can be very slow. The stochastic gradient method, on the other hand, starts making progress from the very first training example.
3) The error function is not minimized as precisely by SGD as by batch GD: SGD converges to a neighbourhood of the minimum and then oscillates around the optimal value rather than settling on it exactly. In practice, however, it has often been found to perform as well as or better than batch GD.
4) While the batch gradient method performs the following:
Repeat until convergence {
  Theta_j := Theta_j + correction  (for every j, with the correction computed over all m training examples)
}
the stochastic gradient method performs the following:
Loop {
  for i = 1 to m {
    Theta_j := Theta_j + correction  (for every j, with the correction computed from training example i alone)
  }
}
A runnable sketch of both loops follows this list.
5) The batch gradient method converges to the global minimum for a convex error function such as the least-squares cost, since each step follows the exact gradient computed over the entire data set and a convex function has no other local minima to get trapped in.
6) The latter method is called stochastic because the approximate descent direction computed at each step depends on which training example is being processed, and so it can be thought of as a random variable drawn from a stochastic process.
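To make the difference in point 4 concrete, here is a minimal NumPy sketch of both loops applied to a least-squares linear regression cost of the kind discussed in the earlier post. The function names, the synthetic data, and the learning rates (alpha) are my own choices for illustration, not part of the original derivation:

import numpy as np

def batch_gradient_descent(X, y, alpha=0.5, epochs=2000):
    """Batch method: every update uses the gradient over the entire data set."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(epochs):
        errors = y - X @ theta                 # y - h_theta(x) for all m examples
        theta += alpha * (X.T @ errors) / m    # the "correction" averages over all m examples
    return theta

def stochastic_gradient_descent(X, y, alpha=0.05, epochs=50):
    """Stochastic method: every update uses a single training example."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(epochs):
        for i in np.random.permutation(m):     # visit the examples in random order
            error = y[i] - X[i] @ theta
            theta += alpha * error * X[i]      # the "correction" uses example i only
    return theta

# Toy usage: fit y = 3 + 2x from noisy synthetic data.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 200)
X = np.column_stack([np.ones_like(x), x])      # column of ones for the intercept term
y = 3.0 + 2.0 * x + rng.normal(0.0, 0.1, 200)
print(batch_gradient_descent(X, y))
print(stochastic_gradient_descent(X, y))

Both runs should end up near the true parameters (3, 2). The stochastic version gets close after touching far less data, but its estimate keeps jittering around the optimum (point 3) unless alpha is decreased over time, while the batch version moves smoothly but needs a full pass over the data for every single step (point 2).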
