Wednesday, May 10, 2017

Yesterday we were discussing that machine learning is not limited to NoSQL databases or graph databases. SQL Server recently announced Machine Learning Services, which are supported inside the database. We can now run Python in SQL Server using stored procedures or remote compute contexts. The Python package used for machine learning is the revoscalepy module, which provides a subset of the algorithms and compute contexts in RevoScaleR. The R utilities are available in SQL Server 2017 and Microsoft R Server. The supported compute contexts include RxSpark and RxInSQLServer.
By making the models and contexts first-class citizens of the database, we can now control who uses them. Users can also be assigned to roles that give them the right to install their own packages or share packages with other users. Users who belong to these roles can install and uninstall R packages on the SQL Server computer from a remote development client, without having to go through the database administrator each time.
Microsoft publishes one such package, MicrosoftML, which brings speed, performance and scale to handling a large corpus of text data and high-dimensional categorical data in R models. The package provides machine learning transform pipelines, in which we specify the transforms to apply to our data for featurization before training or testing.
The MicrosoftML package provides fast and scalable machine learning algorithms for classification, regression and anomaly detection.
The rxFastLinear algorithm is a fast linear model trainer based on the Stochastic Dual Coordinate Ascent (SDCA) method. It combines the capabilities of logistic regression and SVM algorithms. SDCA works on the dual of the regularized loss minimization problem: it maximizes a dual objective built from the scalar convex loss functions and a regularization term on the weight vector, updating one dual coordinate at a time. It supports three types of loss functions: log loss, hinge loss and smoothed hinge loss. Typical applications are mortgage default prediction and email spam filtering.
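The three loss functions named above can be sketched in a few lines of Python. This is a minimal illustration written from the textbook definitions, not the MicrosoftML implementation; the function names, the labels in {-1, +1} convention and the `smoothing` parameter are assumptions made for this example.

```python
import math

def signed_margin(weights, bias, features, label):
    """Margin y * (w . x + b) for a label y in {-1, +1} (assumed convention)."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return label * score

def log_loss(margin):
    """Logistic loss log(1 + exp(-margin)), arranged to avoid overflow."""
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

def hinge_loss(margin):
    """SVM hinge loss: zero once the margin reaches 1, linear below that."""
    return max(0.0, 1.0 - margin)

def smoothed_hinge_loss(margin, smoothing=1.0):
    """Hinge loss with the kink at margin 1 replaced by a quadratic piece.

    `smoothing` (hypothetical name) is the width of the quadratic region.
    """
    if margin >= 1.0:
        return 0.0
    if margin <= 1.0 - smoothing:
        return 1.0 - margin - smoothing / 2.0
    return (1.0 - margin) ** 2 / (2.0 * smoothing)
```

All three take the same margin as input, which is why a single linear trainer can switch between logistic regression behavior (log loss) and SVM behavior (hinge losses) by swapping the loss.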
The rxOneClassSVM algorithm is used for anomaly detection, for example credit card fraud detection. It is a simple one-class support vector machine: because the training set contains only examples from the target class, the model learns what that class looks like and flags as outliers the points that do not belong to it.
The rxFastTrees algorithm is a fast tree learner used for binary classification or regression, for example bankruptcy prediction. It is an implementation of FastRank, which is a form of the MART gradient boosting algorithm. It builds the regression trees in a step-wise fashion using a predefined loss function: the loss function measures the error at the current step so that the next tree can correct it.
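The step-wise idea can be shown with a small gradient boosting sketch in Python. It assumes squared-error loss, a single numeric feature and one-split trees (stumps); these simplifications and all the names are choices made for this illustration and do not describe how FastRank is implemented.

```python
def fit_stump(xs, targets):
    """Fit a one-split regression tree on one feature by least squares.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((t - left_mean) ** 2 for t in left)
                 + sum((t - right_mean) ** 2 for t in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def boost(xs, ys, rounds=20, learning_rate=0.5):
    """Each round fits a stump to the current residuals and adds it in."""
    base = sum(ys) / len(ys)
    trees = []
    predictions = [base] * len(ys)
    for _ in range(rounds):
        # For squared error, the residual is the negative gradient of the loss.
        residuals = [y - p for y, p in zip(ys, predictions)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        predictions = [p + learning_rate * tree(x)
                       for p, x in zip(predictions, xs)]
    return lambda x: base + learning_rate * sum(tree(x) for tree in trees)

def mean_squared_error(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)
```

With made-up data such as `xs = [1, 2, 3, 4, 5, 6]` and `ys = [1.0, 1.2, 3.1, 2.9, 5.0, 5.2]`, the training error shrinks as `rounds` grows, because every new stump is fitted to whatever error the previous ones left behind.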
The rxFastForest algorithm is a fast forest learner, also used for binary classification or regression, for example churn prediction. It builds several decision trees using the regression tree learner in rxFastTrees. An aggregation over the resulting trees then finds the Gaussian distribution closest to the combined distribution of all trees in the model.
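One way to read this aggregation step is as moment matching. If every tree reports a Gaussian (a mean and a variance) for its prediction, the single Gaussian closest in KL divergence to the equal-weight mixture of those Gaussians has the same mean and variance as the mixture. The Python sketch below works under that assumption; it is not taken from the MicrosoftML internals, and the function name and input format are hypothetical.

```python
def combine_gaussians(tree_outputs):
    """Collapse per-tree Gaussians into one by matching mean and variance.

    tree_outputs: list of (mean, variance) pairs, one per tree, all trees
    weighted equally (an assumption of this sketch).
    """
    count = len(tree_outputs)
    mean = sum(m for m, _ in tree_outputs) / count
    # Second moment of the mixture: average of (variance + mean squared).
    second_moment = sum(v + m * m for m, v in tree_outputs) / count
    variance = second_moment - mean * mean
    return mean, variance
```

The combined variance has two parts: the average variance inside each tree and the spread between the tree means, so trees that disagree widen the final distribution.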
The rxNeuralNet algorithm is a neural network implementation that helps with multiclass classification and regression. It is helpful for applications such as signature prediction, OCR and click prediction. A neural network is a weighted directed graph arranged in layers, where the nodes in one layer are connected by weighted edges to the nodes in the next layer. The algorithm tries to adjust the weights on the graph edges based on the training data.
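The weight adjustment can be illustrated with a tiny network in Python: one hidden layer, sigmoid activations, squared-error loss and plain gradient descent. These are assumptions chosen to keep the sketch short (there are no bias terms, for instance); nothing here reflects the architecture or optimizer that rxNeuralNet actually uses.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    """w_hidden[j][i] is the edge weight from input i to hidden node j;
    w_out[j] is the edge weight from hidden node j to the output node."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, output

def train_step(x, target, w_hidden, w_out, learning_rate=0.5):
    """One backpropagation step; updates the weights in place and returns
    the squared-error loss measured before the update."""
    hidden, output = forward(x, w_hidden, w_out)
    # Error signal at the output node (derivative of loss through sigmoid).
    delta_out = (output - target) * output * (1.0 - output)
    # Error signals at the hidden nodes, computed before w_out changes.
    delta_hidden = [delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
                    for j in range(len(hidden))]
    for j in range(len(hidden)):
        w_out[j] -= learning_rate * delta_out * hidden[j]
        for i in range(len(x)):
            w_hidden[j][i] -= learning_rate * delta_hidden[j] * x[i]
    return 0.5 * (output - target) ** 2
```

Calling `train_step` repeatedly on the same example drives the output toward the target, which is the "adjust the weights on the edges" behavior in miniature.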
The rxLogisticRegression algorithm is a binary and multiclass classifier that can be used, for example, to classify sentiment from feedback. It is a regular regression model in which the categorical dependent variable is predicted from its relationship to one or more independent variables that are assumed to have a logistic distribution.
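A binary version of this model fits in a short Python sketch trained with batch gradient descent. The feature encoding (counts of positive and negative words per feedback item), the function names and the hyperparameters are all assumptions for illustration; rxLogisticRegression itself uses a different optimizer and also handles the multiclass case.

```python
import math

def sigmoid(z):
    """Logistic function, arranged to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_probability(weights, bias, features):
    """Probability that the example belongs to class 1."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)

def train_logistic_regression(rows, labels, epochs=500, learning_rate=0.1):
    """Batch gradient descent on the average log loss; labels are 0 or 1."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    count = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, label in zip(rows, labels):
            error = predict_probability(weights, bias, features) - label
            for i, x in enumerate(features):
                grad_w[i] += error * x
            grad_b += error
        weights = [w - learning_rate * g / count
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / count
    return weights, bias

# Hypothetical feedback data: [positive word count, negative word count].
rows = [[3, 0], [2, 1], [4, 1], [0, 3], [1, 2], [0, 4]]
labels = [1, 1, 1, 0, 0, 0]
weights, bias = train_logistic_regression(rows, labels)
```

After training on this toy data, `predict_probability(weights, bias, [3, 0])` is above 0.5 and `predict_probability(weights, bias, [0, 3])` is below it, so thresholding the probability at 0.5 separates positive from negative feedback.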
