Tuesday, October 28, 2014

In today's post I will return to matrix factorization and optimization techniques. We were discussing Newton's method, which finds the factors iteratively using a Taylor series expansion and converges rapidly once the iterates are near a root. Keeping the Taylor series to the first order, each step constructs the tangent line to the curve (x, f(x)) at the current iterate x_k and takes the point where that tangent intersects the x-axis as the next iterate, x_{k+1} = x_k - f(x_k) / f'(x_k).
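To make this concrete, here is a minimal sketch of the one-dimensional Newton iteration in Python. The function name and the example f(x) = x^2 - 2 are my own choices, purely for illustration, not from the post.

def newton(f, f_prime, x0, tol=1e-10, max_iter=50):
    """Iterate x_{k+1} = x_k - f(x_k)/f'(x_k): take the x-intercept of the
    tangent line to (x, f(x)) at the current iterate x_k."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:           # close enough to a root
            break
        x = x - fx / f_prime(x)     # Newton update
    return x

# Example: the root of f(x) = x^2 - 2 starting from x0 = 1.5
print(newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.5))   # ~1.4142135623...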
Assuming that the iterates x_k converge towards the root x*, we can define the error at the k-th step as e_k = x_k - x*; when we expand f in a Taylor series about the root and substitute into the update, we see that the method converges quadratically. We will next discuss line search and trust region methods, which we use to compute the incremental step. In a line search, we pick a direction and then decide how far to move along that direction.
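For completeness, here is a sketch of the standard textbook expansion behind that claim, written in LaTeX (x^\ast denotes the root and \xi_k is some point between x_k and x^\ast):

\[
e_k = x_k - x^\ast, \qquad
0 = f(x^\ast) = f(x_k) - e_k f'(x_k) + \tfrac{1}{2} e_k^2 f''(\xi_k),
\]
\[
e_{k+1} = x_{k+1} - x^\ast = e_k - \frac{f(x_k)}{f'(x_k)}
        = \frac{f''(\xi_k)}{2\, f'(x_k)}\, e_k^2 ,
\]

so near the root the new error is proportional to the square of the old one.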
Each iteration of the line search is computed this way:
x_{k+1} = x_k + alpha_k p_k
where the next iterate depends on both the search direction p_k and the step length alpha_k. The direction p_k is generally chosen to be a descent direction, so that moving along it reduces the function. The search is done in two phases: a bracketing phase, where a range is found that bounds the step length, and a selection phase, where a good step length is picked within that range. The iteration terminates when the step gives a sufficient reduction of the function.
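As an illustration, here is a minimal backtracking line search in Python that keeps shrinking alpha until the step gives a sufficient reduction. This is only a sketch of the simplest strategy; the constants rho and c1 and the quadratic example are my own choices, not from the post.

import numpy as np

def backtracking_line_search(f, grad, x, p, alpha0=1.0, rho=0.5, c1=1e-4):
    """Shrink alpha until f(x + alpha*p) <= f(x) + c1*alpha*(grad(x) . p),
    i.e. until the step gives a sufficient reduction in f."""
    alpha = alpha0
    fx = f(x)
    slope = grad(x) @ p                  # directional derivative; negative for a descent direction
    for _ in range(50):                  # cap the number of shrinkages
        if f(x + alpha * p) <= fx + c1 * alpha * slope:
            break                        # sufficient reduction achieved
        alpha *= rho                     # step too long; shrink it
    return alpha

# Example on f(x) = x.x with the steepest-descent direction p = -grad f(x)
x = np.array([1.0, 1.0])
p = -2.0 * x
alpha = backtracking_line_search(lambda v: v @ v, lambda v: 2.0 * v, x, p)
x_next = x + alpha * p                   # the update x_{k+1} = x_k + alpha_k p_k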
The trust region approach is more robust. It recognizes that the approximation model chosen to compute the next iterate is only applicable locally; therefore there is a trust region within which the model is assumed to apply. The model behind a line search can be viewed as a local approximation too. As the model is revised at each iteration, the trust region shrinks while the function converges, and this shrinking radius provides the termination condition for the iteration.
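One way to picture the bookkeeping: after each step we compare the actual reduction in f with the reduction the local model predicted, and grow or shrink the trust region accordingly. The sketch below uses the conventional 0.25/0.75 thresholds; the function name and constants are illustrative, not from the post.

def update_trust_region(actual_reduction, predicted_reduction, radius,
                        step_norm, max_radius):
    """Adjust the trust-region radius based on how well the model
    predicted the actual change in f."""
    rho = actual_reduction / predicted_reduction   # agreement ratio
    if rho < 0.25:
        radius = 0.25 * radius                     # poor agreement: shrink the region
    elif rho > 0.75 and step_norm >= radius:
        radius = min(2.0 * radius, max_radius)     # good agreement at the boundary: expand
    accept_step = rho > 0.0                        # accept the step only if f actually decreased
    return radius, accept_step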
Trust region methods are considered newer than line search methods. The difficulty with line search methods was ensuring a sufficient reduction in f at each step. The condition we maintain is that the reduction in f should be proportional to both the step length alpha_k and the directional derivative: f(x_k + alpha_k p_k) <= f(x_k) + c1 alpha_k (grad f_k . p_k) for some small constant c1 between 0 and 1. If we plot phi(alpha) = f(x_k + alpha p_k) against alpha, the curve may decrease, increase, and then dip below its starting value again; the right-hand side of the condition, built from the step length and the directional derivative, is a straight line with a gentle negative slope cutting through this curve. The line separates the graph into three regions: the first and the last portions, where the actual value of phi lies below the line and the step length is therefore acceptable, and the middle portion, where it is not. Still, the sufficient decrease condition is not enough by itself to ensure that the algorithm makes reasonable progress, because it is satisfied for all sufficiently small step lengths.
To rule out unacceptably short steps, a second requirement called the curvature condition is introduced: the slope of phi at the chosen step length must be at least a constant c2 times the initial slope (both slopes being negative for a descent direction). In other words, if the slope at alpha_k is still strongly negative, we have an indication that we can reduce f significantly by moving further along the chosen direction, so such short steps are rejected; step lengths where the slope has flattened out, or even become slightly positive, are accepted, since little further reduction can be expected in that direction. This rules out the very small steps that give negligible progress. Together, the sufficient decrease and curvature conditions are known collectively as the Wolfe conditions.
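Putting the two tests together, here is a small Python helper that checks whether a candidate step length satisfies both Wolfe conditions. The constants c1 and c2, with 0 < c1 < c2 < 1, are the usual textbook defaults; the function itself is just a sketch.

import numpy as np

def satisfies_wolfe(f, grad, x, p, alpha, c1=1e-4, c2=0.9):
    """Return True if step length alpha along direction p satisfies
    both the sufficient decrease and the curvature condition."""
    slope0 = grad(x) @ p                                  # initial directional derivative (< 0 for descent)
    sufficient_decrease = f(x + alpha * p) <= f(x) + c1 * alpha * slope0
    curvature = grad(x + alpha * p) @ p >= c2 * slope0    # slope at alpha must be no steeper than c2 * slope0
    return sufficient_decrease and curvature

# Example on f(x) = x.x with the steepest-descent direction
x = np.array([1.0, 1.0])
p = -2.0 * x
print(satisfies_wolfe(lambda v: v @ v, lambda v: 2.0 * v, x, p, alpha=0.25))   # True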

For sparse matrices, conjugate gradient methods are better suited. We will discuss these next.
