Applying the MicrosoftML rxFastLinear algorithm:
While rxLogisticRegression is a binary and multiclass classification
algorithm that uses a regularized regression model, the rxFastLinear algorithm
is a fast linear model trainer based on the Stochastic Dual Coordinate Ascent
(SDCA) method. It combines the capabilities of
logistic regression and SVM algorithms. SDCA works on the dual of the
regularized loss-minimization problem: it performs coordinate ascent to
maximize a dual objective built from the scalar convex loss functions of the
individual examples, adjusted by the regularization of the weight vector. It
supports three types of loss functions: log
loss, hinge loss, and smoothed hinge loss. Typical applications include payment
default prediction and email spam filtering.
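The loss is chosen through rx_fast_linear's loss_function argument. A minimal
sketch, assuming the microsoftml loss constructors (hinge_loss and friends) and
the bundled infert sample dataset; the names follow the package documentation,
but verify them against your installation:

from microsoftml import rx_fast_linear, hinge_loss
from microsoftml.datasets.datasets import get_dataset

# Binary classification with hinge loss instead of the default log loss.
infert = get_dataset("infert").as_df()
infert["isCase"] = infert["case"] == 1
model = rx_fast_linear(
    formula="isCase ~ age + parity + spontaneous",
    data=infert,
    method="binary",
    loss_function=hinge_loss())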
This form of regression uses statistical measures, is highly
flexible, accepts many kinds of input, and supports different analytical tasks.
It dampens the effect of extreme values and can evaluate several factors
that affect a pair of outcomes. Regression is very useful for calculating a
linear relationship between a dependent and an independent variable, and then
using that relationship for prediction. Errors show up as elongated scatter
plots in specific categories, and even when errors in the same category carry
different details, they can be plotted against each other and correlated. This
makes the technique suitable for analyzing specific error categories from an
account.
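As a toy illustration of that dependent/independent relationship, here is a
least-squares line fit and a prediction from it (the data values are made up):

import numpy as np

# Fit y = a*x + b by least squares, then use the line for prediction.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
a, b = np.polyfit(x, y, deg=1)   # slope and intercept
print(a * 6.0 + b)               # predicted dependent value at x = 6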
Default
detection rates can be boosted, and false positives reduced, by combining
real-time behavioral profiling with historical profiling. Big Data,
commodity hardware, and historical data going back as far as three years all
help with accuracy. This enables a payment default to be detected almost as
soon as it is committed. True real-time processing implies stringent response
times.
The algorithm for least squares regression by boosting can be written
as follows (a sketch in code appears after the list):
1. Set the initial approximation.
2. For a set of successive increments, or boosts, each based on the
preceding iterations, do:
3. Calculate the new residuals.
4. Find the line of search by aggregating and minimizing the
residuals.
5. Perform the boost along the line of search.
6. Repeat steps 3, 4, and 5 for each iteration of step 2.
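A minimal sketch of these steps in Python. The base learner (a depth-1
regression tree) and the closed-form step size are illustrative assumptions,
since the text does not fix them:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boosted_least_squares(X, y, n_boosts=50, max_depth=1):
    f0 = y.mean()                       # Step 1: initial approximation
    f = np.full(len(y), f0)
    learners, gammas = [], []
    for _ in range(n_boosts):           # Step 2: successive boosts
        r = y - f                       # Step 3: new residuals
        h = DecisionTreeRegressor(max_depth=max_depth).fit(X, r)
        direction = h.predict(X)        # Step 4: the line of search
        # Step size minimizing ||r - gamma * direction||^2 in closed form.
        gamma = direction @ r / (direction @ direction + 1e-12)
        f += gamma * direction          # Step 5: boost along the line
        learners.append(h)
        gammas.append(gamma)
    return f0, learners, gammas

def boosted_predict(f0, learners, gammas, X):
    # Step 6: the prediction sums every boost made in the loop above.
    return f0 + sum(g * h.predict(X) for h, g in zip(learners, gammas))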
Conjugate gradient descent, given an input matrix A, a vector b, a starting
value x, a maximum number of iterations i-max, and an error tolerance
epsilon < 1, can be described this way:

set i to 0
set residual to b - Ax
set search-direction to residual
set delta-new to the dot product residual-transposed . residual
set delta-0 to delta-new
while i < i-max and delta-new > epsilon^2 . delta-0, do:
    q = A . search-direction
    alpha = delta-new / (search-direction-transposed . q)
    x = x + alpha . search-direction
    if i is divisible by 50:
        residual = b - Ax        (periodically recompute the exact residual)
    else:
        residual = residual - alpha . q
    delta-old = delta-new
    delta-new = residual-transposed . residual
    beta = delta-new / delta-old
    search-direction = residual + beta . search-direction
    i = i + 1
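A minimal NumPy sketch of the routine above, assuming A is symmetric positive
definite (the standard requirement for conjugate gradient):

import numpy as np

def conjugate_gradient(A, b, x, i_max=1000, epsilon=1e-8):
    i = 0
    r = b - A @ x                        # initial residual
    d = r.copy()                         # initial search direction
    delta_new = r @ r
    delta_0 = delta_new
    while i < i_max and delta_new > epsilon**2 * delta_0:
        q = A @ d
        alpha = delta_new / (d @ q)      # step length along d
        x = x + alpha * d
        if i % 50 == 0:
            r = b - A @ x                # periodically recompute exact residual
        else:
            r = r - alpha * q
        delta_old = delta_new
        delta_new = r @ r
        beta = delta_new / delta_old
        d = r + beta * d                 # new conjugate search direction
        i += 1
    return x

# Example: solve a small symmetric positive definite system.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(conjugate_gradient(A, b, np.zeros(2)))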
Sample application:
#!/usr/bin/env python
import numpy
import pandas
from microsoftml import rx_fast_linear, rx_predict
from revoscalepy.etl.RxDataStep import rx_data_step
from microsoftml.datasets.datasets import get_dataset

attitude = get_dataset("attitude")

import sklearn
if sklearn.__version__ < "0.18":
    from sklearn.cross_validation import train_test_split
else:
    from sklearn.model_selection import train_test_split

attitudedf = attitude.as_df()
data_train, data_test = train_test_split(attitudedf)

model = rx_fast_linear(
    formula="rating ~ complaints + privileges + learning + raises + critical + advance",
    method="regression",
    data=data_train)

# RuntimeError: The type (RxTextData) for file is not supported.
score_ds = rx_predict(model, data=data_test,
                      extra_vars_to_write=["rating"])

# Print the first five rows
print(rx_data_step(score_ds, number_rows_read=5))
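The predictions come back in score_ds alongside the original rating column
requested via extra_vars_to_write; for a regression model the predicted value
is written to a Score column (column name per the microsoftml documentation;
verify against your installed version).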