Saturday, May 13, 2017

We were discussing the MicrosoftML rxFastLinear algorithm, a fast linear model trainer based on the Stochastic Dual Coordinate Ascent (SDCA) method. It combines the capabilities of logistic regression and SVM algorithms, and it solves the optimization problem associated with the regularized loss minimization of linear predictors. The usual approach for solving SVM-type problems is stochastic gradient descent (SGD). The advantage of SGD is that the cost of each iteration does not depend on the number of training examples, so it works very well for large training sets. SGD reaches a moderately accurate solution pretty quickly, but it is slow to converge to a highly accurate one; this is true of many gradient descent methods. An alternative approach is dual coordinate ascent. Here the objective is framed as a maximization in terms of a dual variable associated with each example in the training set, and in each iteration a single dual variable is optimized while the rest are kept intact. At the optimal solution the primal and dual objectives coincide, so the duality gap between them gives us an upper bound on the sub-optimality of the current solution. The method is stochastic: at each round of the iteration, a dual coordinate is chosen at random.
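A minimal sketch of the dual coordinate ascent idea, assuming squared loss and a tiny made-up dataset (the closed-form per-coordinate step below is specific to the squared-loss case; this is an illustration of the method, not rxFastLinear's actual implementation):

```python
import random

# Tiny regression dataset: instances x_i and targets y_i (made up for illustration).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
y = [1.0, -1.0, 0.5, 0.8]
n, d = len(X), len(X[0])
lam = 0.1  # regularization parameter lambda

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# One dual variable per training example, and the induced primal predictor
# w(alpha) = (1 / (lam * n)) * sum_i alpha_i * x_i, kept in sync incrementally.
alpha = [0.0] * n
w = [0.0] * d

random.seed(0)
for _ in range(200 * n):  # stochastic passes: optimize one dual coordinate at a time
    i = random.randrange(n)
    # Exact maximizer of the dual objective in coordinate i for squared loss
    # phi_i(a) = (a - y_i)^2; the coordinate objective is quadratic, so the
    # step has a closed form.
    delta = (y[i] - dot(w, X[i]) - 0.5 * alpha[i]) / (0.5 + dot(X[i], X[i]) / (lam * n))
    alpha[i] += delta
    for j in range(d):  # update w to match the new alpha
        w[j] += delta * X[i][j] / (lam * n)

# Primal objective P(w) and dual objective D(alpha) for squared loss; the
# duality gap P(w(alpha)) - D(alpha) upper-bounds the sub-optimality of w.
primal = sum((dot(w, X[i]) - y[i]) ** 2 for i in range(n)) / n + 0.5 * lam * dot(w, w)
dual = sum(alpha[i] * y[i] - alpha[i] ** 2 / 4 for i in range(n)) / n - 0.5 * lam * dot(w, w)
gap = primal - dual
print(gap)  # nonnegative, and shrinks toward zero as the iterations proceed
```

Note that the predictor is never optimized directly; it is always the regularized average of the instances weighted by their dual coordinates, and the gap between the two objectives serves as a certificate of how close we are to the optimum.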
Today we try to understand the duality gap as described by the Shalev-Shwartz and Zhang paper. The generic optimization problem is stated as a sum of two components: the first is the average of the scalar convex loss functions applied to the inner products of the linear predictor with the training instances, and the second is the regularization term on the predictor. The dual problem associates a dual coordinate with each training instance, and the dual objective is optimized with respect to a single dual variable while the rest of the variables are kept intact. Any dual vector induces a corresponding primal predictor, formed as the regularized average of the instances weighted by their dual coordinates; we can call this the computed predictor. At the optimal dual solution, the computed predictor is also the optimal solution of the generic problem. For any dual vector, the generic objective at the computed predictor must be greater than or equal to the dual objective at that dual vector, and this difference is the duality gap. Because the generic objective at the computed predictor is at least the generic objective at the optimal solution, which in turn is at least the dual objective, the duality gap is an upper bound on the sub-optimality of the computed predictor.
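Written out in the paper's notation (a sketch; x_i are the instances, phi_i the scalar convex losses with conjugates phi_i*, and lambda the regularization parameter):

\[
P(w) = \frac{1}{n}\sum_{i=1}^{n} \phi_i(w^{\top} x_i) + \frac{\lambda}{2}\|w\|^2,
\qquad
D(\alpha) = \frac{1}{n}\sum_{i=1}^{n} -\phi_i^{*}(-\alpha_i) - \frac{\lambda}{2}\Big\|\frac{1}{\lambda n}\sum_{i=1}^{n} \alpha_i x_i\Big\|^2 .
\]

The computed predictor is \(w(\alpha) = \frac{1}{\lambda n}\sum_{i=1}^{n} \alpha_i x_i\), and weak duality gives \(D(\alpha) \le P(w^{*}) \le P(w(\alpha))\) for the optimal predictor \(w^{*}\), so

\[
P(w(\alpha)) - D(\alpha) \;\ge\; P(w(\alpha)) - P(w^{*}),
\]

which is exactly the sense in which the duality gap upper-bounds the sub-optimality.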
