Sunday, June 26, 2022

Exception StackTrace associations for root cause analysis    

Problem statement: Given a method for collecting root causes from many error data points in logs, can we determine associations between those root causes?

Solution: There are two stages to solving this problem:   

Stage 1 – discover the root cause and create a summary that captures it

Stage 2 – use an association data mining algorithm on root causes.

Stage 1:  

The first stage involves a data pipeline that converts log entries to exception stack traces and hashes them into buckets. A sample is included. Once the exception stack traces are collected from a batch of log entries, we can transform them into a vector representation, using the notable stack frames as features. We can then generate a hidden weight matrix for a neural network.

We use that hidden layer to determine salience using the gradient descent method.

   

All values fall within the co-occurrence probability range [0, 1].
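The vector representation can be sketched roughly as follows. This is a minimal illustration, not the pipeline from the post: it assumes every frame counts as "notable", that each trace becomes a binary vector over the frame vocabulary, and that a co-occurrence probability is simply the fraction of traces containing both frames. The function names and the sample frames are hypothetical.

```python
def frame_vectors(traces):
    """Return (vocabulary, binary vectors) for a batch of stack traces.

    Each trace is a list of frame names. Assumption: every frame is
    treated as a notable feature.
    """
    vocabulary = sorted({frame for trace in traces for frame in trace})
    index = {frame: i for i, frame in enumerate(vocabulary)}
    vectors = []
    for trace in traces:
        row = [0] * len(vocabulary)
        for frame in trace:
            row[index[frame]] = 1
        vectors.append(row)
    return vocabulary, vectors


def cooccurrence_probabilities(vectors):
    """Fraction of traces in which each pair of frames appears together."""
    n = len(vectors)
    size = len(vectors[0]) if vectors else 0
    counts = [[0] * size for _ in range(size)]
    for row in vectors:
        present = [i for i, flag in enumerate(row) if flag]
        for i in present:
            for j in present:
                counts[i][j] += 1
    return [[c / n for c in count_row] for count_row in counts]


# Hypothetical batch of three stack traces.
traces = [
    ["App.main", "Db.query", "Socket.read"],
    ["App.main", "Db.query"],
    ["App.main", "Cache.get"],
]
vocabulary, vectors = frame_vectors(traces)
matrix = cooccurrence_probabilities(vectors)
```

Because each entry is a count divided by the number of traces, every value stays within [0, 1].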

   

The solution to the quadratic form representing the embeddings is found by arriving at its minimum, which satisfies Ax = b, using the conjugate gradient method. The method requires A to be symmetric and positive-definite.

We are given an input matrix A, a vector b, a starting value x, a maximum number of iterations i-max, and an error tolerance epsilon < 1.

   

The method proceeds as follows:

set i to 0
set residual to b - Ax
set search-direction to residual
set delta-new to residual-transposed . residual
set delta-0 to delta-new
while i < i-max and delta-new > epsilon^2 . delta-0 do:
    q = A . search-direction
    alpha = delta-new / (search-direction-transposed . q)
    x = x + alpha . search-direction
    if i is divisible by 50:
        residual = b - Ax
    else:
        residual = residual - alpha . q
    delta-old = delta-new
    delta-new = residual-transposed . residual
    beta = delta-new / delta-old
    search-direction = residual + beta . search-direction
    i = i + 1
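The steps above can be sketched in Python. This is a minimal, dependency-free illustration that assumes A is a small dense symmetric positive-definite matrix given as a list of rows. The function name, default arguments, and the 2x2 example system are hypothetical. Recomputing the residual from scratch every 50 iterations follows the steps above and limits accumulated floating-point error.

```python
def conjugate_gradient(A, b, x, i_max=1000, epsilon=1e-8):
    """Solve A x = b for a symmetric positive-definite matrix A."""

    def matvec(M, v):
        return [sum(m * vi for m, vi in zip(row, v)) for row in M]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    x = list(x)
    residual = [bi - ai for bi, ai in zip(b, matvec(A, x))]
    direction = list(residual)
    delta_new = dot(residual, residual)
    delta_0 = delta_new
    i = 0
    while i < i_max and delta_new > epsilon ** 2 * delta_0:
        q = matvec(A, direction)
        alpha = delta_new / dot(direction, q)
        x = [xi + alpha * di for xi, di in zip(x, direction)]
        if i % 50 == 0:
            # Exact residual, to remove accumulated rounding error.
            residual = [bi - ai for bi, ai in zip(b, matvec(A, x))]
        else:
            # Cheaper recurrence that reuses q.
            residual = [ri - alpha * qi for ri, qi in zip(residual, q)]
        delta_old = delta_new
        delta_new = dot(residual, residual)
        beta = delta_new / delta_old
        direction = [ri + beta * di for ri, di in zip(residual, direction)]
        i += 1
    return x


# Hypothetical 2x2 system whose exact solution is x = (1/11, 7/11).
solution = conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0], [2.0, 1.0])
```

For large or sparse systems, a library routine would normally be preferable to this hand-written loop.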

   

Root cause capture – Exception stack traces that are captured from various sources and appear in the logs can be stack hashed. A root cause can then be described by a specific stack trace, its associated point in time, the duration over which it appears, and the time at which a fix was introduced, if known.
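A root cause record of this kind might look like the sketch below. It is illustrative only: the choice of SHA-256, hashing only the top five frames, the 16-character key, and the `RootCause` field names are all assumptions, not details from the post.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


def stack_hash(frames, top_n=5):
    """Hash the top frames of a stack trace into a stable bucket key.

    Assumption: only the first top_n frames identify the failure.
    """
    normalized = "\n".join(frame.strip() for frame in frames[:top_n])
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]


@dataclass
class RootCause:
    bucket: str                          # stack hash of the trace
    seen_at: datetime                    # associated point in time
    duration: timedelta                  # how long it kept appearing
    fixed_at: Optional[datetime] = None  # time of fix, if known


# Hypothetical trace and record.
frames = ["Db.query", "Repo.load", "Service.handle", "App.main"]
cause = RootCause(
    bucket=stack_hash(frames),
    seen_at=datetime(2022, 6, 26, 9, 0),
    duration=timedelta(hours=3),
)
```

Traces that share the same top frames land in the same bucket, which is what makes counting occurrences per root cause possible.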

   

Stage 2:

Association data mining determines whether two root causes occur together. The computation involves two computed columns, namely Support and Probability. Support defines the percentage of cases in which a rule must exist before it is considered valid. We require that a rule be found in at least 1 percent of cases.

Probability defines how likely an association must be before it is considered valid. We consider any association with a probability of at least 10 percent.

Bayesian conditional probability and confidence can also be used. Association rules are formed from a pair of item-sets, the antecedent and the consequent, so named because we want to find the value of taking one item with another. Let I be a set of items and T be a set of transactions. An association A is then defined as a subset of I whose items occur together in T. Support(S1) is the fraction of T containing S1. Let S1 and S2 be subsets of I; the association rule that associates S1 with S2 has Support(S1->S2) = Support(S1 union S2) and Confidence(S1->S2) = Support(S1 union S2) / Support(S1). A third metric, Lift, is defined as Confidence(S1->S2) / Support(S2). It is preferred because a popular S2 yields high confidence for any S1; lift corrects for that by taking a value greater than 1.0 only when S1 and S2 occur together more often than they would if they were independent.
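The three metrics can be computed directly from their definitions. The sketch below is a minimal illustration restricted to rules between single root causes; it treats the Probability threshold as a minimum confidence, which is an assumption. The function names, the tuple layout, and the sample transactions (each standing for the set of root causes seen in one time window) are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def pair_rules(transactions, min_support=0.01, min_probability=0.10):
    """Return (s1, s2, support, confidence, lift) for rules s1 -> s2."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        for s1, s2 in ((a, b), (b, a)):
            joint = support({s1, s2}, transactions)
            if joint < min_support:
                continue
            confidence = joint / support({s1}, transactions)
            if confidence < min_probability:
                continue
            lift = confidence / support({s2}, transactions)
            rules.append((s1, s2, joint, confidence, lift))
    return rules


# Hypothetical windows: root causes A and B tend to appear together.
windows = [{"A", "B"}, {"A", "B"}, {"A"}, {"C"}]
rules = pair_rules(windows)
```

Here both A -> B and B -> A pass the 1 percent and 10 percent thresholds with a lift above 1.0, while pairs involving C never co-occur and are dropped.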

Certain databases allow the creation of association models that can be persisted and evaluated against each incoming request. Usually, a 70/30 training/testing data split is used for this purpose.
Sample: https://jsfiddle.net/g2snw4da/  

      
