Wednesday, November 26, 2014

We were reviewing the non-linear conjugate gradient method. We set the residual to the negation of the gradient. We compute the Gram-Schmidt conjugation of the residuals to find the search directions. We find the step length by ensuring that the gradient is orthogonal to the search direction; that is, we solve for the step length that makes the gradient at the next step, transposed and applied to the current search direction, equal to zero. We set the Gram-Schmidt constants by one of two techniques: the Fletcher-Reeves (FR) formula and the Polak-Ribiere (PR) formula. FR was used in the linear CG method and is easy to compute. PR is similar, but its numerator uses the difference of the next residual from the current residual instead of just the next residual. FR converges if the starting point is close to the desired minimum. PR may or may not converge, but when it does it is usually quicker than FR. To force convergence in the PR case, we ignore negative values of the constant and choose zero instead. This is equivalent to restarting the CG search: we forget the previous search directions and start again in the direction of steepest descent.
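As a minimal sketch (not from the original discussion), the two update constants can be written as small C# helpers, assuming rNext and rCurr hold the residuals (negative gradients) at the next and current iterates, and Dot is a hypothetical inner-product helper:

// Inner product of two vectors (assumed helper).
static double Dot(double[] a, double[] b)
{
    double s = 0.0;
    for (int i = 0; i < a.Length; i++) s += a[i] * b[i];
    return s;
}

// Fletcher-Reeves: ratio of the squared norms of the next and current residuals.
static double BetaFletcherReeves(double[] rNext, double[] rCurr)
{
    return Dot(rNext, rNext) / Dot(rCurr, rCurr);
}

// Polak-Ribiere: the numerator uses the difference of the next residual from the current one.
// Negative values are clamped to zero, which restarts the search along steepest descent.
static double BetaPolakRibiere(double[] rNext, double[] rCurr)
{
    double beta = (Dot(rNext, rNext) - Dot(rNext, rCurr)) / Dot(rCurr, rCurr);
    return beta > 0.0 ? beta : 0.0;
}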
We see that in the non-linear CG case, there are fewer convergence guarantees. When the function deviates from the quadratic form, the search directions tend to lose conjugacy. Moreover, the global minimum may not be found, and even a local minimum may not be found if the function has no lower bound.
A useful technique to improve convergence is to restart CG after every n iterations. This is particularly helpful when n is small, and it follows from the fact that CG can only generate n conjugate vectors in an n-dimensional space.
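As a rough illustration (assumed names, not a definitive implementation), the restart is just a counter check inside the CG loop that resets the direction to steepest descent. The gradient and lineSearch delegates stand in for the problem-specific pieces, and Dot and BetaPolakRibiere are the helpers from the sketch above; System is assumed for Func.

static double[] NonlinearCgWithRestart(Func<double[], double[]> gradient,
                                       Func<double[], double[], double> lineSearch,
                                       double[] x, int maxIters)
{
    int n = x.Length;
    double[] r = gradient(x);
    for (int i = 0; i < n; i++) r[i] = -r[i];      // residual = -gradient
    double[] d = (double[])r.Clone();              // start along steepest descent
    for (int k = 1; k <= maxIters; k++)
    {
        double alpha = lineSearch(x, d);           // step length along d
        for (int i = 0; i < n; i++) x[i] += alpha * d[i];
        double[] rNext = gradient(x);
        for (int i = 0; i < n; i++) rNext[i] = -rNext[i];
        double beta = BetaPolakRibiere(rNext, r);  // clamped PR constant
        if (k % n == 0) beta = 0.0;                // restart every n iterations
        for (int i = 0; i < n; i++) d[i] = rNext[i] + beta * d[i];
        r = rNext;
    }
    return x;
}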
In the non-linear CG, we could also use a fast algorithm to find the zero of the gradient transposed with the search vector. If the gradient is a polynomial in the step length, two iterative methods for zero-finding are the Newton-Raphson method and the Secant method. The Newton-Raphson (NR) method additionally requires that the second derivative of the function along the search direction, as a function of the step length, be computable. The NR method relies on a Taylor series approximation in terms of the function, its first derivative, and its second derivative. This truncated Taylor series approximates the function along the line with a parabola; setting its derivative with respect to the step length to zero gives the next point at the base of the parabola. This procedure is iterated until convergence is reached. To perform an exact line search of the non-quadratic form, repeated steps may need to be taken along the line until the gradient transposed search direction is zero, so a single CG iteration may include several Newton-Raphson iterations; whether this is cheap or expensive depends on how easily the second derivative can be computed.
To perform an exact line search without computing the second derivative, the secant method evaluates the first derivative at two distinct points: one where the step length is zero and one where the step length is an arbitrary small non-zero number. Like the NR method, the secant method approximates the function with a parabola, but instead of using the first and second derivatives at a single point, it uses the first derivative at two different points. Both the Newton-Raphson and the Secant iterations stop when the answer is reasonably close to the exact solution.
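For concreteness, here is a minimal sketch (assumed names, not from the original text) of both updates applied along the line: dPhi(alpha) stands for the gradient transposed search direction at step length alpha, ddPhi(alpha) for the directional second derivative, and sigma for the arbitrary small non-zero starting step of the secant method. In practice the loops would stop once dPhi(alpha) is reasonably close to zero.

// Newton-Raphson: repeatedly jump to the base of the parabola fitted from
// the first and second derivatives at the current step length.
static double NewtonRaphsonLineSearch(Func<double, double> dPhi,
                                      Func<double, double> ddPhi,
                                      double alpha, int iters)
{
    for (int i = 0; i < iters; i++)
        alpha -= dPhi(alpha) / ddPhi(alpha);
    return alpha;
}

// Secant: fit the parabola from the first derivative at two different points
// instead of using the second derivative at one point.
static double SecantLineSearch(Func<double, double> dPhi, double sigma, int iters)
{
    double prev = 0.0, curr = sigma;   // step lengths zero and a small non-zero sigma
    for (int i = 0; i < iters; i++)
    {
        double dPrev = dPhi(prev), dCurr = dPhi(curr);
        double next = curr - dCurr * (curr - prev) / (dCurr - dPrev);
        prev = curr;
        curr = next;
    }
    return curr;
}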
#codingexercise
static int GetDistinctMin(int[] A)
{
    // Guard against a missing input; DistinctMin is assumed to be an
    // extension method defined elsewhere for this exercise.
    if (A == null) return 0;
    return A.DistinctMin();
}
