Cluster computing

Thursday, December 14, 2023

Applying the MicrosoftML rxNeuralNet algorithm:

While logistic regression is used to model binary outcomes, rxNeuralNet is a neural network implementation that supports multiclass classification and regression. It is helpful for applications such as signature prediction, OCR, and click prediction. A neural network is a weighted directed graph arranged in layers, where the nodes in one layer are connected by weighted edges to the nodes in the next layer. The algorithm adjusts the weights on these edges based on the training data.
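To make the "layers connected by weighted edges" picture concrete, here is a minimal NumPy sketch of a one-hidden-layer network for multiclass classification. It is not how rxNeuralNet is implemented internally. The tanh hidden layer, softmax output, full-batch gradient descent, layer size, learning rate, and the toy three-cluster data are all assumptions chosen for illustration.

```python
import numpy as np


def softmax(z):
    # Subtract the row max for numerical stability before exponentiating
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def forward(params, X):
    """Propagate inputs through the layered graph: input -> hidden -> output."""
    W1, b1, W2, b2 = params
    H = np.tanh(X @ W1 + b1)   # weighted edges from the input layer to the hidden layer
    P = softmax(H @ W2 + b2)   # weighted edges from the hidden layer to the output layer
    return H, P


def train(X, y, n_classes, n_hidden=8, lr=0.1, epochs=300, seed=0):
    """Adjust the edge weights with full-batch gradient descent on cross-entropy."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=(n_hidden, n_classes))
    b2 = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]   # one-hot targets
    losses = []
    for _ in range(epochs):
        H, P = forward((W1, b1, W2, b2), X)
        losses.append(-np.mean(np.sum(Y * np.log(P + 1e-12), axis=1)))
        # Backpropagate the error to get a gradient for every edge weight
        dZ2 = (P - Y) / X.shape[0]
        dW2, db2 = H.T @ dZ2, dZ2.sum(axis=0)
        dZ1 = (dZ2 @ W2.T) * (1 - H ** 2)
        dW1, db1 = X.T @ dZ1, dZ1.sum(axis=0)
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return (W1, b1, W2, b2), losses


# Toy three-class data: three Gaussian clusters in two dimensions
rng = np.random.default_rng(42)
centers = np.array([[-2.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
y = np.repeat(np.arange(3), 30)
X = centers[y] + rng.normal(scale=0.5, size=(90, 2))

params, losses = train(X, y, n_classes=3)
_, P = forward(params, X)
accuracy = np.mean(P.argmax(axis=1) == y)
print(f"loss {losses[0]:.3f} -> {losses[-1]:.3f}, training accuracy {accuracy:.2f}")
```

The falling loss is the "adjusting the weights based on the training data" step in action; a library such as MicrosoftML wraps the same idea behind a formula interface.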

Logistic regression helps to detect the root causes of payment errors. It uses statistical measures, is highly flexible, accepts many kinds of input, and supports different analytical tasks. Because the logistic function saturates, it dampens the effect of extreme values, and it evaluates several factors that affect a pair of outcomes, such as error versus no error. Regression in general is very useful for calculating a linear relationship between a dependent and an independent variable and then using that relationship for prediction. Within specific categories, errors tend to produce elongated scatter plots. Even when errors in the same category come with different error details, they can still be plotted and correlated. This makes the technique suitable for the specific error categories of an account.
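As a rough illustration, the sketch below fits a logistic regression by gradient descent on two numeric factors and predicts a binary outcome. The data is synthetic and hypothetical (label 1 standing in for "payment error"); the learning rate and epoch count are assumptions, not values from any payment system.

```python
import numpy as np


def sigmoid(z):
    # Squashes any real-valued score into (0, 1), so an extreme input
    # cannot push the predicted probability outside that range
    return 1.0 / (1.0 + np.exp(-z))


def add_intercept(X):
    return np.hstack([np.ones((X.shape[0], 1)), X])


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit weights by gradient descent on the mean log-loss."""
    Xb = add_intercept(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = sigmoid(Xb @ w)
        w -= lr * Xb.T @ (p - y) / len(y)
    return w


def predict_proba(w, X):
    return sigmoid(add_intercept(X) @ w)


# Hypothetical data: two numeric factors per payment, label 1 = error
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)

w = fit_logistic(X, y)
accuracy = np.mean((predict_proba(w, X) > 0.5) == y)
print("weights:", np.round(w, 2), "training accuracy:", accuracy)
```

The fitted weights show how strongly each factor pushes toward one of the two outcomes, which is the kind of evidence used when looking for root causes.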

Combining real-time behavioral profiling with historical profiling can raise default detection rates and reduce false positives. Big Data, commodity hardware, and historical data going back as far as three years help with accuracy. This allows a payment default to be detected almost as soon as it is committed. True real-time processing, however, implies stringent response times.
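One simple way to picture how two profiles can cut false positives is to flag a payment only when it looks anomalous against both the account's long-term history and its recent behavior. The helper names, the z-score measure, and the threshold of 3 below are hypothetical choices for illustration, not the method of any particular detection system.

```python
from statistics import mean, pstdev


def zscore(value, history):
    """How many standard deviations a value lies from a profile's mean."""
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return 0.0 if value == mu else float("inf")
    return (value - mu) / sigma


def flag_payment(amount, historical, recent, threshold=3.0):
    """Hypothetical rule: flag only if both the historical and the
    real-time (recent) profile consider the amount unusual."""
    hist_z = abs(zscore(amount, historical))
    recent_z = abs(zscore(amount, recent))
    return hist_z > threshold and recent_z > threshold


historical = [100, 110, 95, 105, 98, 102]   # long-term payment amounts
recent = [120, 130, 125]                    # latest behavior on the account

print(flag_payment(500, historical, recent))  # far from both profiles
print(flag_payment(128, historical, recent))  # unusual historically, normal recently
```

A payment of 128 would exceed the threshold against the historical profile alone, but the recent profile shows the account's behavior has shifted, so the combined rule does not raise an alert.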

The algorithm for least-squares boosting, that is, gradient boosting of a regression model under a squared-error loss, can be written as:

1. Set the initial approximation.

2. For each of a set of successive increments, or boosts, each based on the preceding iterations, do:

3. Calculate the new residuals.

4. Find the line of search by aggregating and minimizing the residuals.

5. Perform the boost along the line of search.

6. Repeat steps 3, 4, and 5 for each boost in step 2.
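The steps above can be sketched in Python as follows. Using a one-split regression stump as the base learner and a shrinkage factor of 0.5 are assumptions made for this illustration; the steps themselves do not name a base learner.

```python
import numpy as np


def fit_stump(x, r):
    """Least-squares regression stump: one threshold, a constant on each side."""
    best = None
    for t in np.unique(x)[:-1]:
        left = x <= t
        lv, rv = r[left].mean(), r[~left].mean()
        sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    _, t, lv, rv = best
    return lambda z: np.where(z <= t, lv, rv)


def ls_boost(x, y, n_boosts=50, shrinkage=0.5):
    f0 = y.mean()                       # 1. initial approximation
    pred = np.full(len(y), f0)
    stages = []
    for _ in range(n_boosts):           # 2. successive boosts
        residuals = y - pred            # 3. new residuals
        h = fit_stump(x, residuals)     # 4. fit a base learner to the residuals...
        hx = h(x)
        denom = hx @ hx
        # ...and line-search the step size that minimizes the squared residuals
        rho = (residuals @ hx) / denom if denom > 0 else 0.0
        pred = pred + shrinkage * rho * hx   # 5. boost along the search direction
        stages.append((rho, h))

    def predict(z):
        out = np.full(len(z), f0)
        for rho, h in stages:
            out = out + shrinkage * rho * h(z)
        return out

    return predict


x = np.linspace(0.0, 1.0, 50)
y = np.sin(2 * np.pi * x)
model = ls_boost(x, y)
baseline = np.mean((y - y.mean()) ** 2)
mse = np.mean((y - model(x)) ** 2)
print(f"mean-only MSE {baseline:.3f}, boosted MSE {mse:.4f}")
```

With a squared-error loss and leaf means as the stump values, the line search returns rho = 1 up to rounding, because the stump is already the least-squares fit to the residuals; the search matters more for other losses.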

The conjugate gradient method can be described as follows, given a symmetric positive-definite input matrix A, a vector b, a starting value x, a maximum number of iterations i-max, and an error tolerance epsilon < 1:

set i to 0
set residual to b - Ax
set search-direction to residual
set delta-new to residual-transposed . residual
set delta-0 to delta-new

while i < i-max and delta-new > epsilon^2 . delta-0 do:
    q = A . search-direction
    alpha = delta-new / (search-direction-transposed . q)
    x = x + alpha . search-direction
    if i is divisible by 50:
        residual = b - Ax          (recompute exactly to clear accumulated round-off)
    else:
        residual = residual - alpha . q
    delta-old = delta-new
    delta-new = residual-transposed . residual
    beta = delta-new / delta-old
    search-direction = residual + beta . search-direction
    i = i + 1
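A direct NumPy translation of the conjugate gradient pseudocode follows, as a sketch rather than a production solver. The function name and the default values for i_max and epsilon are assumptions; the loop mirrors the pseudocode line by line, including the exact residual recomputation every 50 iterations.

```python
import numpy as np


def conjugate_gradient(A, b, x=None, i_max=1000, epsilon=1e-10):
    """Solve A x = b for symmetric positive-definite A; returns (x, iterations)."""
    b = np.asarray(b, dtype=float)
    x = np.zeros_like(b) if x is None else np.array(x, dtype=float)
    residual = b - A @ x
    direction = residual.copy()
    delta_new = residual @ residual
    delta_0 = delta_new
    i = 0
    while i < i_max and delta_new > epsilon ** 2 * delta_0:
        q = A @ direction
        alpha = delta_new / (direction @ q)
        x = x + alpha * direction
        if i % 50 == 0:
            # Periodically recompute the residual exactly to remove round-off drift
            residual = b - A @ x
        else:
            residual = residual - alpha * q
        delta_old = delta_new
        delta_new = residual @ residual
        beta = delta_new / delta_old
        direction = residual + beta * direction
        i += 1
    return x, i


A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x, iterations = conjugate_gradient(A, b)
print(x, iterations)
```

For this 2 x 2 system the exact solution is x = (1/11, 7/11); in exact arithmetic conjugate gradient reaches it in at most n = 2 iterations, and in floating point the tolerance test stops it at about that point.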

Sample application: 

#!/usr/bin/env python
import numpy
import pandas
from microsoftml import rx_neural_network, rx_predict
from revoscalepy.etl.RxDataStep import rx_data_step
from microsoftml.datasets.datasets import get_dataset

# train_test_split moved to sklearn.model_selection in scikit-learn 0.18
try:
    from sklearn.model_selection import train_test_split
except ImportError:
    from sklearn.cross_validation import train_test_split

# Load the iris dataset bundled with microsoftml and split it
iris = get_dataset("iris")
irisdf = iris.as_df()
irisdf["Species"] = irisdf["Species"].astype("category")
data_train, data_test, y_train, y_test = train_test_split(irisdf, irisdf.Species)

model = rx_neural_network(
    formula=" Species ~ Sepal_Length + Sepal_Width + Petal_Length + Petal_Width ",
    method="multiClass",
    data=data_train)

# RuntimeError: The type (RxTextData) for file is not supported.
score_ds = rx_predict(model, data=data_test,
                      extra_vars_to_write=["Species", "Score"])

# Print the first five rows
print(rx_data_step(score_ds, number_rows_read=5))
