Wednesday, May 26, 2021

Decision Tree modeling on API Error root cause analysis   

Problem statement: Given a method to collect root causes from the many data points generated by API error codes in logs, can the relief time be determined? 

   

Solution: There are two stages to solving this problem:  

1. Stage 1 – discover the root cause and create a summary to capture it  

2. Stage 2 – use decision tree modeling to determine the relief time.  

 

Stage 1:  

The first stage involves a data pipeline that converts log entries to a JSON object with request and response details, status code, error message, remote server information, and query and request parameters, and hashes them into buckets. When the dictionaries are collected from a batch of log entries, we can transform them into a vector representation, using the notable request-response pairs as features. Then we can generate a hidden weight matrix for the neural network.  

We use that hidden layer to determine salience using the gradient descent method.       
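To make the bucketing concrete, here is a minimal C# sketch of hashing parsed log entries into a fixed number of buckets and normalizing the counts into a vector. The field names and bucket count are illustrative assumptions, not a fixed schema.

using System;
using System.Collections.Generic;

static class LogVectorizer
{
    const int Buckets = 1024;   // assumed bucket count, not prescribed by the text

    // Hash one parsed request-response pair into a bucket index.
    static int Bucket(string requestPath, int statusCode, string errorMessage)
    {
        int h = HashCode.Combine(requestPath, statusCode, errorMessage);
        return Math.Abs(h % Buckets);
    }

    // Turn a batch of parsed log entries into a normalized feature vector.
    public static double[] Vectorize(IEnumerable<(string Path, int Status, string Error)> entries)
    {
        var v = new double[Buckets];
        double total = 0;
        foreach (var e in entries)
        {
            v[Bucket(e.Path, e.Status, e.Error)] += 1.0;
            total += 1.0;
        }
        if (total > 0)
            for (int k = 0; k < v.Length; k++) v[k] /= total;   // values land in [0, 1]
        return v;
    }
}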

   

All values lie within the [0, 1] co-occurrence probability range.      

   

The solution to the quadratic form representing the embeddings is found by arriving at the minimum, which occurs where Ax = b, using the conjugate gradient method.    

We are given an input matrix A, a vector b, a starting value x, a maximum number of iterations i-max, and an error tolerance epsilon < 1.    

   

This method proceeds as follows:     

    set i to 0
    set residual to b - Ax
    set search-direction to residual
    set delta-new to dot-product(residual, residual)
    set delta-0 to delta-new

    while i < i-max and delta-new > epsilon^2 * delta-0 do:
        q = dot-product(A, search-direction)
        alpha = delta-new / dot-product(search-direction, q)
        x = x + alpha * search-direction
        if i is divisible by 50:
            residual = b - Ax        (recompute exactly to flush accumulated floating-point error)
        else:
            residual = residual - alpha * q
        delta-old = delta-new
        delta-new = dot-product(residual, residual)
        beta = delta-new / delta-old
        search-direction = residual + beta * search-direction
        i = i + 1
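For concreteness, a minimal C# sketch of the loop above; A is assumed symmetric positive-definite, and the helper names (Dot, MatVec, Subtract) are illustrative, not from any particular library.

using System;

static class ConjugateGradient
{
    // Solves Ax = b, following the pseudocode above.
    public static double[] Solve(double[][] A, double[] b, double[] x, int iMax, double epsilon)
    {
        int n = b.Length;
        double[] r = Subtract(b, MatVec(A, x));      // residual = b - Ax
        double[] d = (double[])r.Clone();            // search direction
        double deltaNew = Dot(r, r);
        double delta0 = deltaNew;

        for (int i = 0; i < iMax && deltaNew > epsilon * epsilon * delta0; i++)
        {
            double[] q = MatVec(A, d);
            double alpha = deltaNew / Dot(d, q);
            for (int k = 0; k < n; k++) x[k] += alpha * d[k];

            if (i % 50 == 0)                          // periodic exact residual refresh
                r = Subtract(b, MatVec(A, x));
            else
                for (int k = 0; k < n; k++) r[k] -= alpha * q[k];

            double deltaOld = deltaNew;
            deltaNew = Dot(r, r);
            double beta = deltaNew / deltaOld;
            for (int k = 0; k < n; k++) d[k] = r[k] + beta * d[k];
        }
        return x;
    }

    static double Dot(double[] u, double[] v)
    {
        double s = 0;
        for (int k = 0; k < u.Length; k++) s += u[k] * v[k];
        return s;
    }

    static double[] MatVec(double[][] A, double[] v)
    {
        var w = new double[A.Length];
        for (int i = 0; i < A.Length; i++) w[i] = Dot(A[i], v);
        return w;
    }

    static double[] Subtract(double[] u, double[] v)
    {
        var w = new double[u.Length];
        for (int k = 0; k < u.Length; k++) w[k] = u[k] - v[k];
        return w;
    }
}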

   

Root cause capture – API error summaries that are captured from various sources and appear in the logs can be stack hashed. The root cause can be described by a specific summary, its associated point in time, the duration over which it appears, and the time of the fix introduced, if known.   
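One possible C# shape for such a summary; the property names are assumptions, not a fixed schema.

using System;

class RootCauseSummary
{
    public string StackHash { get; set; }          // hash bucket of the error summary
    public string Summary { get; set; }            // the error summary text
    public DateTime FirstSeen { get; set; }        // point in time it was first observed
    public TimeSpan Duration { get; set; }         // duration over which it appears
    public DateTime? FixIntroduced { get; set; }   // time of the fix, if known
}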

   

Stage 2: Decision tree modeling can help predict relief time. It involves both classification and regression trees. A function divides the rows into two datasets based on the value of a specific column. The two lists of rows returned are such that one set matches the criteria for the split while the other does not. When the attribute to split on is clear, this works well. 
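A hedged C# sketch of such a divide function; treating numeric and text values in one code path echoes the note further below that the algorithm does not need to differentiate them.

using System;
using System.Collections.Generic;

static class TreeSplit
{
    // Divide rows into two sets on one column: numeric values split by >=,
    // everything else (e.g. text) by equality.
    public static (List<object[]> set1, List<object[]> set2) DivideSet(
        List<object[]> rows, int column, object value)
    {
        Func<object[], bool> matches;
        if (value is int || value is double)
        {
            double threshold = Convert.ToDouble(value);
            matches = row => Convert.ToDouble(row[column]) >= threshold;
        }
        else
        {
            matches = row => Equals(row[column], value);
        }

        var set1 = new List<object[]>();   // rows matching the split criteria
        var set2 = new List<object[]>();   // rows that do not
        foreach (var row in rows)
            (matches(row) ? set1 : set2).Add(row);
        return (set1, set2);
    }
}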

To see how good an attribute is, the entropy of the whole group is calculated. Then the group is divided by the possible values of each attribute, and the entropy of the two new groups is calculated. To determine which attribute is best to divide on, the information gain is calculated: the difference between the current entropy and the weighted-average entropy of the two new groups. The algorithm calculates the information gain for every attribute and chooses the one with the highest information gain. 
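A minimal C# sketch of the entropy and information-gain computation, assuming each row carries its target value in the last column (the convention the text adopts below):

using System;
using System.Collections.Generic;

static class TreeScore
{
    // Shannon entropy of the target values (last column) in a set of rows.
    public static double Entropy(List<object[]> rows)
    {
        var counts = new Dictionary<object, int>();
        foreach (var row in rows)
        {
            object label = row[row.Length - 1];
            counts[label] = counts.TryGetValue(label, out var c) ? c + 1 : 1;
        }
        double entropy = 0.0;
        foreach (var count in counts.Values)
        {
            double p = (double)count / rows.Count;
            entropy -= p * Math.Log(p, 2);
        }
        return entropy;
    }

    // Information gain: parent entropy minus the weighted-average entropy
    // of the two child groups produced by a split.
    public static double InformationGain(
        List<object[]> parent, List<object[]> set1, List<object[]> set2)
    {
        double p = (double)set1.Count / parent.Count;
        return Entropy(parent) - p * Entropy(set1) - (1 - p) * Entropy(set2);
    }
}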

Each set is then subdivided by applying the step above recursively. The recursion terminates when no split improves on the current node, that is, when the best information gain from splitting is zero or less. The branches keep dividing, creating a tree by calculating the best attribute for each new node. If a threshold on entropy is set instead, the decision tree is ‘pruned’.  
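Putting the pieces together, a sketch of the recursive build with the zero-gain stopping rule, reusing the hypothetical TreeSplit and TreeScore helpers above:

using System.Collections.Generic;

// A node is either a leaf holding result rows or an internal split.
class TreeNode
{
    public int Column;                 // column tested at this node
    public object Value;               // split value
    public TreeNode TrueBranch, FalseBranch;
    public List<object[]> Results;     // non-null only at leaves
}

static class TreeBuilder
{
    public static TreeNode BuildTree(List<object[]> rows)
    {
        double bestGain = 0.0;
        (int column, object value) bestCriteria = default;
        (List<object[]>, List<object[]>) bestSets = default;

        int columnCount = rows[0].Length - 1;          // last column is the target
        for (int col = 0; col < columnCount; col++)
        {
            var seen = new HashSet<object>();
            foreach (var row in rows) seen.Add(row[col]);
            foreach (var value in seen)
            {
                var (set1, set2) = TreeSplit.DivideSet(rows, col, value);
                if (set1.Count == 0 || set2.Count == 0) continue;
                double gain = TreeScore.InformationGain(rows, set1, set2);
                if (gain > bestGain)
                {
                    bestGain = gain;
                    bestCriteria = (col, value);
                    bestSets = (set1, set2);
                }
            }
        }

        if (bestGain <= 0.0)                           // no positive gain: stop recursing
            return new TreeNode { Results = rows };

        return new TreeNode
        {
            Column = bestCriteria.column,
            Value = bestCriteria.value,
            TrueBranch = BuildTree(bestSets.Item1),
            FalseBranch = BuildTree(bestSets.Item2),
        };
    }
}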

When working with a set of tuples, it is easier to reserve the last element for the result at each recursion level. Text and numeric data do not have to be differentiated for this algorithm to run. The algorithm takes all the existing rows and assumes the last column is the target value. A training/testing split is applied to each dataset; usually a 70/30 train/test split is used in this regard.  
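For the 70/30 split, one simple sketch in C# (the shuffle-then-cut approach and fixed seed are choices of this example, not prescribed by the text):

using System;
using System.Collections.Generic;
using System.Linq;

static class DataSplit
{
    // Shuffle rows and cut them into a 70/30 train/test split.
    public static (List<object[]> train, List<object[]> test) Split(
        List<object[]> rows, double trainFraction = 0.7, int seed = 42)
    {
        var rng = new Random(seed);
        var shuffled = rows.OrderBy(_ => rng.Next()).ToList();
        int cut = (int)(shuffled.Count * trainFraction);
        return (shuffled.Take(cut).ToList(), shuffled.Skip(cut).ToList());
    }
}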

 


Tuesday, May 25, 2021

Authenticating using Azure SDK

Introduction: The previous article mentioned the way to use the new Azure.Identity credentials with the Fluent library from Azure. This article describes the use of DefaultAzureCredential for the logged-in user.  

Description: The DefaultCredential class in the previous versions of the SDK and the DefaultAzureCredential in the current version both support common developer workflows. The DefaultAzureCredential in the Azure SDK is the recommended way to handle authentication across the local workstation and the deployment environment. It automates finding the right credential to use, iterating through four specific sources: environment variables, managed identity, the MSAL shared token cache, and the Azure CLI. The environment variables used are AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, and AZURE_TENANT_ID. Once these are set and used, other credentials can be read from the key-vault store. It is also possible to bring up an interactive browser for login with SDKs specific to some languages. Credentials, once formed, can be stored in encrypted locations, reducing exposure to others. DefaultAzureCredential also supports and attempts to authenticate with a few more credential types, such as VisualStudioCredential, SharedTokenCacheCredential, and InteractiveBrowserCredential. The VisualStudioCodeCredential integrates with the Azure Account extension and is the same one used with the “Azure: Sign In” command. 

It is also possible to exclude some credentials. This is done with the help of DefaultAzureCredentialOptions, which has flags to exclude each of the credentials mentioned. 
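For example, in .NET (a minimal sketch; the specific exclusions chosen here are arbitrary): 

var options = new DefaultAzureCredentialOptions 
{ 
    // Skip the interactive browser prompt and the shared token cache; 
    // the remaining credentials in the chain are still tried in order. 
    ExcludeInteractiveBrowserCredential = true, 
    ExcludeSharedTokenCacheCredential = true 
}; 
var credential = new DefaultAzureCredential(options); 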


There is also a technique to fail the authentication rather than fall through to the next credential. This is done with the help of the CredentialUnavailableException exception type. The next credential in the DefaultAzureCredential chain is tried only when a CredentialUnavailableException is thrown from the current credential. If a different exception is thrown, it is propagated and the next credential is not tried. 
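The same fall-through contract applies to a custom chain built with ChainedTokenCredential, as in this .NET sketch: 

// Tries managed identity first; falls through to the Azure CLI only if 
// a CredentialUnavailableException is thrown. 
var credential = new ChainedTokenCredential( 
    new ManagedIdentityCredential(), 
    new AzureCliCredential()); 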


Connecting to a client with the DefaultAzureCredential is a breeze. Let’s review the syntax: 

In .Net: 

var client = new SecretClient(new Uri(keyVaultUrl), new DefaultAzureCredential(true)); // true = includeInteractiveCredentials 


And in Java: 

SecretClient client = new SecretClientBuilder() 

        .vaultUrl(keyVaultUrl) 

        .credential(new DefaultAzureCredentialBuilder().build()) 

        .buildClient(); 
