Applying the MicrosoftML rxLogisticRegression algorithm:
While rxNeuralNet is a neural network implementation that helps
with multiclass classification and regression, logistic regression helps with
binary outcomes. rxLogisticRegression supports both binary and multiclass
classification and can be used, for example, to classify sentiment from
feedback. It is a regression model in which the probability of a category
depends on one or more independent variables, combined linearly and passed
through the logistic function.
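To make this concrete, here is a minimal sketch of how a logistic regression model turns a weighted sum of inputs into a probability and then into a binary label. The weights, bias, and feature values are made-up numbers for illustration, and the helper names are hypothetical; none of this is the rxLogisticRegression API.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(weights, bias, features):
    """Probability of the positive class for one observation."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Hypothetical coefficients and one observation (illustrative values only).
weights = [0.8, -0.5]
bias = -0.2
features = [2.0, 1.0]

p = predict_probability(weights, bias, features)
label = p >= 0.5  # threshold the probability to get a binary outcome
print(round(p, 3), label)
```

Training amounts to choosing the weights and bias that make these probabilities agree with the observed labels as closely as possible.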
This form of regression uses statistical measures, is highly
flexible, accepts many kinds of input, and supports different analytical tasks.
It dampens the effect of extreme values and evaluates several factors
that affect a pair of outcomes. Regression in general is useful for estimating a
linear relationship between a dependent and an independent variable and then
using that relationship for prediction. Errors in specific categories show up as
elongated scatter plots. Even when errors in the same category come with
different error details, they can be plotted with correlation. This technique is
suitable for specific error categories from an account.
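The fit-then-predict use of regression described above can be sketched with NumPy. The data points below are invented for illustration only.

```python
import numpy as np

# Hypothetical observations: x is the independent variable, y the dependent one.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Least-squares fit of a straight line y = slope * x + intercept.
slope, intercept = np.polyfit(x, y, deg=1)

# Residuals show how far each observation falls from the fitted line.
residuals = y - (slope * x + intercept)

# Use the fitted relationship to predict the outcome for an unseen input.
x_new = 6.0
y_pred = slope * x_new + intercept
print(slope, intercept, y_pred)
```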
Default detection rates can be boosted, and false positives reduced, by using
real-time behavioral profiling as well as historical profiling. Big Data,
commodity hardware, and historical data going back as far as three years help
with accuracy. This enables a payment default to be detected almost as soon as
it is committed. True real-time processing implies stringent response
times.
The algorithm for least squares regression can be written as:
1. Set the initial approximation.
2. For a set of successive increments or boosts, each based on the
preceding iterations, do the following:
3. Calculate the new residuals.
4. Find the line of search by aggregating and minimizing the
residuals.
5. Perform the boost along the line of search.
6. Repeat steps 3, 4, and 5 for each increment in step 2.
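One possible reading of the steps above can be sketched with NumPy. This sketch assumes that the "line of search" is the steepest descent direction of the squared residual and that the "boost" is an exact line search along it; the function name and the sample data are hypothetical.

```python
import numpy as np

def least_squares_line_search(A, b, x0, n_steps=2000, tol=1e-20):
    """Minimize ||b - A x||^2 by repeated steps with an exact line search."""
    x = np.asarray(x0, dtype=float)          # 1. initial approximation
    for _ in range(n_steps):                 # 2. successive increments
        residual = b - A @ x                 # 3. new residuals
        direction = A.T @ residual           # 4. search line (steepest descent)
        Ad = A @ direction
        denom = Ad @ Ad
        if denom <= tol:                     # gradient is numerically zero
            break
        # Step length that minimizes the residual norm along the search line.
        step = (direction @ direction) / denom
        x = x + step * direction             # 5. boost along the search line
    return x

# Illustrative data: fit y = c0 + c1 * t to four made-up points.
A = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
b = np.array([6.0, 5.0, 7.0, 10.0])
x_fit = least_squares_line_search(A, b, x0=np.zeros(2))
print(x_fit)
```

Steepest descent is simple but can need many iterations on poorly conditioned problems, which is one motivation for the conjugate gradient approach.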
Conjugate gradient descent can be described, given an input matrix A, a
vector b, a starting value x, a maximum number of iterations i-max, and an
error tolerance epsilon < 1, in this way:
    set i to 0
    set residual to b - A.x
    set search-direction to residual
    set delta-new to the dot-product of residual-transposed and residual
    set delta-0 to delta-new
    while i < i-max and delta-new > epsilon^2 . delta-0 do:
        q = A . search-direction
        alpha = delta-new / (search-direction-transposed . q)
        x = x + alpha . search-direction
        if i is divisible by 50:
            residual = b - A.x
        else:
            residual = residual - alpha . q
        delta-old = delta-new
        delta-new = residual-transposed . residual
        beta = delta-new / delta-old
        search-direction = residual + beta . search-direction
        i = i + 1
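The pseudocode above translates almost line for line into NumPy. The following is a minimal sketch, assuming A is a symmetric positive-definite matrix (the standard requirement for conjugate gradient); the function name, default values, and the small test system are illustrative choices.

```python
import numpy as np

def conjugate_gradient(A, b, x, i_max=1000, epsilon=1e-10):
    """Solve A x = b iteratively; A is assumed symmetric positive-definite."""
    x = np.asarray(x, dtype=float).copy()
    i = 0
    residual = b - A @ x
    direction = residual.copy()
    delta_new = residual @ residual
    delta_0 = delta_new
    while i < i_max and delta_new > epsilon**2 * delta_0:
        q = A @ direction
        alpha = delta_new / (direction @ q)
        x = x + alpha * direction
        if i % 50 == 0:
            # Recompute the residual exactly to remove accumulated round-off.
            residual = b - A @ x
        else:
            # Cheaper recurrence that reuses the product q = A @ direction.
            residual = residual - alpha * q
        delta_old = delta_new
        delta_new = residual @ residual
        beta = delta_new / delta_old
        direction = residual + beta * direction
        i += 1
    return x

# Small illustrative system with a symmetric positive-definite matrix.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
solution = conjugate_gradient(A, b, x=np.array([2.0, 1.0]))
print(solution)
```

In exact arithmetic the method converges in at most n iterations for an n-by-n system, so the loop usually stops on the tolerance test long before i_max is reached.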
Sample application:
    #!/usr/bin/env python
    import numpy
    import pandas
    from microsoftml import rx_logistic_regression, rx_predict
    from revoscalepy.etl.RxDataStep import rx_data_step
    from microsoftml.datasets.datasets import get_dataset

    # Load the sample "infert" dataset that ships with microsoftml.
    infert = get_dataset("infert")

    # train_test_split moved to sklearn.model_selection in scikit-learn 0.18.
    try:
        from sklearn.model_selection import train_test_split
    except ImportError:
        from sklearn.cross_validation import train_test_split

    # Build a boolean label column and split into training and test sets.
    infertdf = infert.as_df()
    infertdf["isCase"] = infertdf.case == 1
    data_train, data_test, y_train, y_test = train_test_split(
        infertdf, infertdf.isCase)

    # Train the logistic regression model on the training set.
    model = rx_logistic_regression(
        formula=" isCase ~ age + parity + education + spontaneous + induced ",
        data=data_train)

    print(model.coef_)

    # RuntimeError: The type (RxTextData) for file is not supported.
    score_ds = rx_predict(model, data=data_test,
                          extra_vars_to_write=["isCase", "Score"])

    # Print the first five rows
    print(rx_data_step(score_ds, number_rows_read=5))