Thursday, November 26, 2015

We bring out the similarities between batch gradient descent and the conjugate gradient method, both of which we have discussed earlier.
Essentially, both try to solve a linear equation of the form Ax = b.
Both calculate gradients for the descent; the gradient is a vector field that, at a given point, points in the direction of steepest change of the objective. For a symmetric positive-definite A, the objective is the quadratic form whose minimizer solves Ax = b, so the residual b - Ax is exactly the negative gradient.
Steepest descent moves along the residual direction, stretching each step as far as the line search allows, which makes each new direction orthogonal to the previous one.
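As a quick sketch of the formulas behind this (writing f for the quadratic objective and assuming A is symmetric positive definite, as in the standard derivation), the objective, its gradient, and the exact line-search step size along the residual are:

\[
f(x) = \tfrac{1}{2}\, x^{\top} A x - b^{\top} x, \qquad
\nabla f(x) = A x - b = -r, \qquad
\alpha = \frac{r^{\top} r}{r^{\top} A r}.
\]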
The steepest descent algorithm proceeds this way:
set i to 0
set r, the residual, to b - Ax
set delta to r^T r
set delta_0 to delta
while i < i_max and delta > epsilon^2 * delta_0 do:
    q = A r
    alpha = delta / (r^T q)
    x = x + alpha * r
    if i is divisible by 50
        r = b - Ax
    else
        r = r - alpha * q
    delta = r^T r
    i = i + 1
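To make the pseudocode concrete, here is a minimal Python/NumPy sketch of the same loop; the function name, default arguments, and the small 2x2 usage example are illustrative choices, not taken from the post or the paper.

import numpy as np

def steepest_descent(A, b, x, i_max=1000, epsilon=1e-8):
    # Minimal sketch of the loop above; assumes A is symmetric positive definite.
    i = 0
    r = b - A @ x                  # residual = negative gradient
    delta = r @ r
    delta_0 = delta
    while i < i_max and delta > epsilon**2 * delta_0:
        q = A @ r
        alpha = delta / (r @ q)    # exact line search along r
        x = x + alpha * r
        if i % 50 == 0:
            r = b - A @ x          # periodically recompute the exact residual
        else:
            r = r - alpha * q      # cheap residual update
        delta = r @ r
        i += 1
    return x

# Example: solve a small SPD system (solution is approximately [2, -2]).
A = np.array([[3.0, 2.0], [2.0, 6.0]])
b = np.array([2.0, -8.0])
print(steepest_descent(A, b, np.zeros(2)))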

The reason for the number 50 is that, after a certain number of iterations, the exact residual is recalculated to remove accumulated floating point error. Of course, the number 50 is arbitrary; for large n, square_root(n) may be more appropriate.
If the tolerance is large, the residual need not be corrected at all. If the tolerance is close to the limits of the machine's floating point precision, a test should be added after delta is evaluated to check whether delta is less than epsilon^2 * delta_0, and if this test holds true, the exact residual should be recomputed and delta reevaluated. This prevents the procedure from terminating early due to floating point roundoff error.
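A sketch of how that guard might slot into the loop, in the same illustrative Python style as above:

import numpy as np

def steepest_descent_guarded(A, b, x, i_max=1000, epsilon=1e-12):
    # Variant of the earlier sketch with the roundoff guard described above.
    i = 0
    r = b - A @ x
    delta = r @ r
    delta_0 = delta
    while i < i_max and delta > epsilon**2 * delta_0:
        q = A @ r
        alpha = delta / (r @ q)
        x = x + alpha * r
        r = b - A @ x if i % 50 == 0 else r - alpha * q
        delta = r @ r
        # With a tolerance near machine precision, confirm an apparent
        # convergence against the exact residual before stopping.
        if delta < epsilon**2 * delta_0:
            r = b - A @ x
            delta = r @ r
        i += 1
    return x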
Courtesy: An Introduction to the Conjugate Gradient Method Without the Agonizing Pain, Jonathan Richard Shewchuk
