Friday, May 12, 2017

Today we continue to discuss Machine Learning which is not limited to using NoSQL databases or graph databases. We can now run Python in SQL Server using stored procedures or remote compute contexts. The package used in Python for machine learning purposes is revoscalepy module. This module has a subset of algorithms and contexts in RevoScaleR  The R utilities are made available in SQL Server 2017 and Microsoft R Server. These include supported compute contexts such as RxSpark and RxInSQLServer. By making the models and contexts as citizens of the database, we can now control who uses them. A certain package is published by Microsoft and this is called the MicrosoftML package and it brings speed, performance and scale to handling a large corpus of text data and  high dimensional categorical data in R models. It provides several algorithms. One such is rxFastLinear algorithm which is a fast linear model trainer based on the Stochastic Dual Coordinate Ascent method. It combines the capabilities of logistic regressions and  SVM algorithms. The algorithm implements an optimization associated with the regularized loss minimization  of linear predictors.
The approach for solving SVM is usually stochastic gradient descent. The advantage of this method is that it does not depend on the number of linear predictors. Therefore it works very well for a large number of linear predictors. This method reaches an approximate solution pretty quickly but it does not converge. This is true for many gradient descent methods. An alternative approach is to use dual coordinate descent. Here the objective is framed as a maximization in terms of a dual variable associated with each example in the training set. In each iteration, a single dual variable is optimized while keeping the rest in tact.  At the optimal solution the objective is known, so we have a duality gap which helps us put an upper bound on the sub-optimality.
The method is stochastic and at each round of the iteration, a dual coordinate is chosen at random.
Courtesy Shwartz and Zhang
#programmingexercise
Find the number of unique BSTs from a range
repeatedly call makeBST with different start and end of range.
Choose each number of the range as a splitting point and recur for the sub ranges. If the left and right sub range are empty form the root. if the left sub range is empty recur on the right sub range for each of the values. repeat for the left if right is empty. finally repeat with one of the sub range.
        static int numUniqueBSTs(int n)
        {
            int count = 0;
            if (n == 0) return 1;
            for (int i = 1; i <= n; i++)
                count += numUniqueBSTs(i - 1) * numUniqueBSTs(n - i);
            return count;
        }
        which is the same as Catalan number
        static int GetNChooseKDP(int n, int k)
        {
            if (k == 0 || k == n)
                return 1;
            return GetNChooseKDP(n - 1, k - 1) + GetNChooseKDP(n - 1, k);
        }
        static int GetCatalan(int n)
        {
            int c = GetNChooseKDP(2 * n, n);

            return c / (n + 1); // Catalan number
        }

No comments:

Post a Comment