Applying the MicrosoftML rxFastLinear algorithm to insurance payment default prediction:
Logistic regression is a well-known statistical technique used to model binary outcomes, and it can be applied to detect the root causes of payment errors. It relies on statistical measures, is highly flexible, takes almost any kind of input, and supports different analytical tasks. The regression dampens the effect of extreme values and weighs the several factors that influence a pair of outcomes.
Logistic regression differs from other regression techniques in its use of statistical measures. Regression in general is useful for calculating a linear relationship between a dependent and an independent variable and then using that relationship for prediction. Errors tend to show up as elongated scatter plots in specific categories, and even when errors in the same category carry different error details, they can be plotted with correlation. This makes the technique suitable for isolating specific error categories from an account.
One advantage of logistic regression is that the algorithm is highly flexible, taking any kind of input, and supports several different analytical tasks (a minimal sketch follows this list):
· Use demographics to make predictions about outcomes, such as the probability of defaulting on payments.
· Explore and weigh the factors that contribute to a result. For example, find the factors that influence customers to make a repeat past-due payment.
· Classify claims, payments, or other objects that have many attributes.
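As a minimal sketch of the first of these tasks, the snippet below trains a logistic regression on a small made-up customer table with the microsoftml package; the column names (age, income, past_due_count, defaulted) are hypothetical placeholders for demographic and behavioral inputs.
#! /bin/python
import pandas
from microsoftml import rx_logistic_regression, rx_predict

# Hypothetical customer data; "defaulted" is the binary outcome to model.
customers = pandas.DataFrame({
    "age": [23.0, 35.0, 47.0, 29.0, 52.0, 41.0],
    "income": [28000.0, 54000.0, 61000.0, 33000.0, 72000.0, 45000.0],
    "past_due_count": [3.0, 0.0, 1.0, 2.0, 0.0, 4.0],
    "defaulted": [True, False, False, True, False, True]})

model = rx_logistic_regression("defaulted ~ age + income + past_due_count",
                               data=customers)
scores = rx_predict(model, customers)   # per-customer default probabilities
print(scores.head())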
Support Vector Machines, on the other hand, can detect non-linear and complex patterns with good predictive power. These are sophisticated classification machines that build a predictive model by finding the dividing line between two categories. Among the candidate lines, the one from which the data on both sides is most distant is usually chosen as the best. The points that are closest to the line are the ones that determine it and are called support vectors. Once the line is found, classifying new data is just a matter of placing it on the correct side.
The MicrosoftML rxFastLinear algorithm is a fast linear model trainer based on the Stochastic Dual Coordinate Ascent (SDCA) method, and it combines the capabilities of logistic regression and SVM algorithms. SDCA works on the dual problem: it ascends the dual objective, coordinate by coordinate, maximizing a sum of scalar convex functions adjusted by the regularization of the weight vector. It supports three types of loss functions: log loss, hinge loss, and smoothed hinge loss.
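Because the loss function is what moves rxFastLinear between logistic-regression-like and SVM-like behavior, selecting among the losses might look like the sketch below; it reuses the hypothetical customers table from the earlier snippet and assumes the loss constructors exported by microsoftml.
#! /bin/python
from microsoftml import rx_fast_linear, log_loss, hinge_loss

# Log loss gives logistic-regression-style probabilistic outputs.
logit_like = rx_fast_linear("defaulted ~ age + income + past_due_count",
                            data=customers, loss_function=log_loss())

# Hinge loss (or smoothed_hinge_loss) gives SVM-style max-margin behavior.
svm_like = rx_fast_linear("defaulted ~ age + income + past_due_count",
                          data=customers, loss_function=hinge_loss())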
An application of the rxFastLinear algorithm that encapsulates logistic regression and Support Vector Machines for payment default prediction would score individual transactions rather than broad segments of them. Default detection rates can be boosted, and false positives reduced, by using real-time behavioral profiling as well as historical profiling. Big Data, commodity hardware, and historical data going as far back as three years help with accuracy. This enables a payment default to be detected almost as soon as it occurs, although true real-time processing implies stringent response times.
The algorithm for least squares regression with boosting can be written as (a sketch in code follows the steps):
1. Set the initial approximation.
2. For a set of successive increments, or boosts, each based on the preceding iterations, do steps 3 through 5.
3. Calculate the new residuals.
4. Find the line of search by aggregating and minimizing the residuals.
5. Perform the boost along the line of search.
6. Repeat steps 3, 4, and 5 for each iteration of step 2.
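As a minimal sketch of this loop in pure numpy, with a deliberately simple base learner that fits the residuals against a single feature x (all names here are illustrative):
#! /bin/python
import numpy as np

def boosted_least_squares(x, y, n_boosts=20, learning_rate=0.5):
    f = np.full_like(y, y.mean())             # 1. initial approximation
    for _ in range(n_boosts):                 # 2. successive boosts
        residuals = y - f                     # 3. new residuals
        # 4. Line of search: the slope that minimizes the aggregated
        #    squared residuals against the feature x.
        slope = (x * residuals).sum() / (x * x).sum()
        f = f + learning_rate * slope * x     # 5. boost along the line of search
    return f

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 4.2, 5.9, 8.1])
print(boosted_least_squares(x, y))   # approaches the least-squares fit of y on x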
Conjugate gradient descent, given an input matrix A, a vector b, a starting value x, a maximum number of iterations i-max, and an error tolerance epsilon < 1, can be described in this way (a numpy translation follows the pseudocode):
set i to 0
set r to b - Ax
set the search direction d to r
set delta-new to the dot-product of r-transposed and r
initialize delta-0 to delta-new
while i < i-max and delta-new > epsilon^2 . delta-0 do:
    q = dot-product(A, d)
    alpha = delta-new / dot-product(d-transposed, q)
    x = x + alpha . d
    if i is divisible by 50:
        r = b - Ax
    else:
        r = r - alpha . q
    delta-old = delta-new
    delta-new = dot-product(r-transposed, r)
    beta = delta-new / delta-old
    d = r + beta . d
    i = i + 1
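Below is a direct numpy translation of this pseudocode, a minimal sketch assuming A is symmetric positive definite; the example matrix and vector at the end are made up for illustration.
#! /bin/python
import numpy as np

def conjugate_gradient(A, b, x, i_max=1000, epsilon=1e-8):
    i = 0
    r = b - A.dot(x)                    # initial residual
    d = r.copy()                        # initial search direction
    delta_new = r.dot(r)
    delta_0 = delta_new
    while i < i_max and delta_new > epsilon ** 2 * delta_0:
        q = A.dot(d)
        alpha = delta_new / d.dot(q)    # step length along d
        x = x + alpha * d
        if i % 50 == 0:                 # periodically recompute the exact residual
            r = b - A.dot(x)
        else:
            r = r - alpha * q
        delta_old = delta_new
        delta_new = r.dot(r)
        beta = delta_new / delta_old
        d = r + beta * d                # new conjugate search direction
        i += 1
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(conjugate_gradient(A, b, np.zeros(2)))   # converges to [0.0909, 0.6364]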
Sample application:
#! /bin/python
from microsoftml import rx_fast_linear, rx_predict

# Train a binary classifier for the label "clas" from features x and y.
model = rx_fast_linear("clas ~ x + y", data=data)
# Score the same data, carrying x and y through to the output columns.
pred = rx_predict(model, data, extra_vars_to_write=["x", "y"])
print(pred.head())
#codingexercise
Print all nodes in a binary tree that have exactly k leaves in their subtrees.
// Returns the number of leaves under root, and adds to result every node
// whose subtree contains exactly k leaves. A leaf itself contributes a
// count of 1 but is never collected.
int GetNodeWithKLeaves(Node root, int k, ref List<Node> result)
{
    if (root == null) return 0;
    if (root.left == null && root.right == null) return 1;
    int left = GetNodeWithKLeaves(root.left, k, ref result);
    int right = GetNodeWithKLeaves(root.right, k, ref result);
    if (left + right == k)
    {
        result.Add(root);
    }
    return left + right;
}