We were discussing the MicrosoftML rxFastLinear algorithm, a fast linear model trainer based on the Stochastic Dual Coordinate Ascent (SDCA) method. It combines the capabilities of logistic regression and SVM algorithms. The algorithm solves the optimization problem associated with regularized loss minimization of linear predictors. The usual approach to solving an SVM is stochastic gradient descent on the primal problem; this method instead works on the dual. The objective is framed as a maximization over a dual variable associated with each example in the training set. In each iteration a single dual variable is optimized while the rest are kept intact, and the method is stochastic: at each round the dual coordinate is chosen at random. At the optimal solution the primal and dual objectives coincide, so the duality gap gives an upper bound on the sub-optimality of the current solution.

The duality gap is described in the Shalev-Shwartz and Zhang paper. The primal (generic) optimization problem is stated as the sum of two components: the first is the average of scalar convex loss functions applied to the inner products of the training instances with the linear predictor, and the second is a regularizer on the predictor. The dual problem associates a dual coordinate with each instance, and the dual objective is optimized with respect to a single dual variable while the rest of the variables are kept intact. Any dual vector induces a primal candidate, obtained as the regularized, normalized combination of the instances weighted by their dual coordinates; we call this the computed predictor. At the dual optimum this computed predictor is also the optimal solution of the primal problem. For every dual vector, the primal objective at the computed predictor is greater than or equal to the dual objective; the difference between the two is the duality gap, and because the optimal primal value lies between them, the duality gap upper-bounds how far the computed predictor is from the primal optimum.
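To make the preceding description concrete, the primal and dual problems and the computed predictor can be written as follows, using the standard notation of the Shalev-Shwartz and Zhang paper rather than anything defined in this post:

P(w) = \frac{1}{n}\sum_{i=1}^{n}\phi_i(w^{\top}x_i) + \frac{\lambda}{2}\lVert w\rVert^{2}

D(\alpha) = \frac{1}{n}\sum_{i=1}^{n}-\phi_i^{*}(-\alpha_i) - \frac{\lambda}{2}\Big\lVert \frac{1}{\lambda n}\sum_{i=1}^{n}\alpha_i x_i\Big\rVert^{2}, \qquad w(\alpha) = \frac{1}{\lambda n}\sum_{i=1}^{n}\alpha_i x_i

Since D(\alpha) \le P(w^{*}) \le P(w(\alpha)) for every dual vector \alpha, the duality gap P(w(\alpha)) - D(\alpha) is an upper bound on the primal sub-optimality P(w(\alpha)) - P(w^{*}).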
The paper provides the rigor to understand the convergence of the duality gap for Stochastic Dual Coordinate Ascent (SDCA). Their analysis of the randomized variant, in which the coordinate is picked at random rather than cyclically, improved on the analyses performed earlier. They derived the rate of convergence of SDCA for two types of loss functions: Lipschitz losses and smooth losses.
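As an illustration of the coordinate update described above, here is a minimal NumPy sketch of SDCA for the hinge loss (a Lipschitz loss), using the closed-form single-coordinate update given in the Shalev-Shwartz and Zhang paper. The function names and parameters are my own, and this is not the rxFastLinear implementation, which adds its own feature normalization, bias handling, and stopping rules.

```python
import numpy as np

def sdca_hinge(X, y, lam=0.01, epochs=20, seed=0):
    """Illustrative SDCA for the hinge loss (linear SVM), y in {-1, +1}.

    Follows the closed-form coordinate update from Shalev-Shwartz & Zhang (2013);
    this is a sketch, not the MicrosoftML rxFastLinear implementation.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alpha = np.zeros(n)                      # one dual variable per training example
    w = np.zeros(d)                          # computed predictor: w = (1/(lam*n)) * sum_i alpha_i x_i
    sq_norms = np.einsum("ij,ij->i", X, X)   # ||x_i||^2 for each instance

    for _ in range(epochs):
        for i in rng.permutation(n):         # stochastic choice of the dual coordinate
            if sq_norms[i] == 0.0:
                continue
            # Closed-form maximization of the dual over coordinate i, all others kept intact
            margin = 1.0 - y[i] * (X[i] @ w)
            delta = y[i] * max(0.0, min(1.0, margin * lam * n / sq_norms[i] + alpha[i] * y[i])) - alpha[i]
            alpha[i] += delta
            w += delta * X[i] / (lam * n)    # keep the computed predictor consistent with alpha

    return w, alpha

def duality_gap(X, y, w, alpha, lam):
    """Primal minus dual objective: an upper bound on the primal sub-optimality."""
    primal = np.mean(np.maximum(0.0, 1.0 - y * (X @ w))) + 0.5 * lam * (w @ w)
    dual = np.mean(alpha * y) - 0.5 * lam * (w @ w)
    return primal - dual
```

The duality gap computed by the helper can be used as a stopping criterion: once it falls below a tolerance, the current predictor is guaranteed to be within that tolerance of the primal optimum.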