Tuesday, May 31, 2022

 

Continuous root cause analysis via analysis of time-series events:  

 

Problem statement: Given a method for collecting many data points about errors in logs, can we predict the resolution time of the next root cause?

 

Solution: There are two stages to solving this problem:

1. Stage 1: discover the root cause and create a summary that captures it.

2. Stage 2: use a time-series algorithm to predict the relief time.

 

Stage 1:

We start with the hidden weight matrix that the neural network layer generates, and then use that hidden layer to determine salience with the gradient descent method.

 

All values lie within [0, 1], the range of a co-occurrence probability.
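To make the [0, 1] range concrete, here is a minimal sketch of one common way to compute co-occurrence probabilities, assuming each error log line has already been tokenized into terms. The estimate P(j | i) = (lines containing both i and j) / (lines containing i) is our assumption; the post does not say how its probabilities are computed, and the function name cooccurrence_probabilities and the sample tokens are hypothetical.

```python
from collections import Counter
from itertools import permutations


def cooccurrence_probabilities(lines):
    """Estimate P(j | i): the fraction of log lines containing term i that also contain term j.

    `lines` is a list of token lists, one per error log line (hypothetical input format).
    """
    term_counts = Counter()
    pair_counts = Counter()
    for tokens in lines:
        terms = set(tokens)  # count each term at most once per line
        term_counts.update(terms)
        pair_counts.update(permutations(terms, 2))  # ordered pairs (i, j) with i != j
    return {(i, j): n / term_counts[i] for (i, j), n in pair_counts.items()}


logs = [
    ["NullPointerException", "OrderService"],
    ["NullPointerException", "OrderService", "timeout"],
    ["timeout", "PaymentGateway"],
]
probs = cooccurrence_probabilities(logs)
print(probs[("NullPointerException", "OrderService")])  # 1.0
print(probs[("timeout", "OrderService")])  # 0.5
```

A pair can never co-occur in more lines than its first term appears in, so every value lies in [0, 1]; pairs that never co-occur are simply absent, which corresponds to probability 0.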

 

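The embeddings are found by minimizing a quadratic form. For a symmetric positive-definite matrix A, the minimum of that quadratic form is exactly the solution of Ax = b, so the minimum is reached by solving this linear system with the conjugate gradient method.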
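A quick numerical check of this equivalence, as a sketch using NumPy: the quadratic form is assumed to be f(x) = ½ xᵀAx − bᵀx, whose gradient is Ax − b when A is symmetric, so the gradient vanishes exactly where Ax = b. The 2×2 matrix and vector are arbitrary illustrative values, not data from this post.

```python
import numpy as np


def quadratic_form(A, b, x):
    """f(x) = 0.5 * x^T A x - b^T x (assumed form of the objective)."""
    return 0.5 * x @ A @ x - b @ x


A = np.array([[3.0, 2.0], [2.0, 6.0]])  # symmetric positive-definite example
b = np.array([2.0, -8.0])

x_star = np.linalg.solve(A, b)  # solution of Ax = b
gradient = A @ x_star - b       # gradient of f at x_star

# Random nearby points should never have a lower value of f than x_star.
rng = np.random.default_rng(0)
perturbed = [x_star + rng.normal(scale=0.1, size=2) for _ in range(100)]

print(x_star)                    # [ 2. -2.]
print(np.allclose(gradient, 0))  # True
print(all(quadratic_form(A, b, x_star) <= quadratic_form(A, b, p) for p in perturbed))  # True
```

The last check holds because f(x* + d) − f(x*) = ½ dᵀAd, which is positive for any nonzero d when A is positive-definite.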

We are given the input matrix A, the vector b, a starting value x, a maximum number of iterations i-max, and an error tolerance epsilon < 1.

 

The method proceeds as follows:

 

set i to 0
set residual to b - Ax
set search-direction to residual
set delta-new to the dot product residual-transposed . residual
set delta-0 to delta-new
while i < i-max and delta-new > epsilon^2 . delta-0 do:
    q = A . search-direction
    alpha = delta-new / (search-direction-transposed . q)
    x = x + alpha . search-direction
    if i is divisible by 50:
        residual = b - Ax
    else:
        residual = residual - alpha . q
    delta-old = delta-new
    delta-new = residual-transposed . residual
    beta = delta-new / delta-old
    search-direction = residual + beta . search-direction
    i = i + 1

Every 50 iterations the residual is recomputed directly as b - Ax; in between it is updated cheaply as residual - alpha . q, which accumulates floating-point error that the periodic recomputation discards.
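The steps above translate almost line for line into Python with NumPy. This is a sketch: the defaults i_max=1000 and epsilon=1e-8 are our own choices, while the recompute interval of 50 comes from the steps above. It assumes A is symmetric positive-definite, which the conjugate gradient method requires.

```python
import numpy as np


def conjugate_gradient(A, b, x, i_max=1000, epsilon=1e-8):
    """Solve Ax = b for a symmetric positive-definite A using conjugate gradients."""
    x = np.asarray(x, dtype=float).copy()
    i = 0
    residual = b - A @ x
    search_direction = residual.copy()
    delta_new = residual @ residual
    delta_0 = delta_new
    # Stop when the squared residual norm drops below epsilon^2 times its initial value.
    while i < i_max and delta_new > epsilon**2 * delta_0:
        q = A @ search_direction
        alpha = delta_new / (search_direction @ q)
        x = x + alpha * search_direction
        if i % 50 == 0:
            # Recompute the exact residual to discard accumulated rounding error.
            residual = b - A @ x
        else:
            residual = residual - alpha * q
        delta_old = delta_new
        delta_new = residual @ residual
        beta = delta_new / delta_old
        search_direction = residual + beta * search_direction
        i += 1
    return x


A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = conjugate_gradient(A, b, x=np.zeros(2))
print(np.allclose(A @ x, b))  # True
```

Only matrix-vector products with A are needed, which is why the method suits large, sparse systems where forming an inverse or factorization would be too expensive.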

 

Root cause capture: Exception stack traces that are captured from various sources and appear in the logs can be stack-hashed. A root cause can then be described by a specific stack trace, the point in time associated with it, the duration over which it appears, and the time the fix was introduced, if known.
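One way to stack-hash a trace and record the root cause summary, sketched under assumptions: the hash covers only the exception type and the "at ..." frame lines, so messages containing request IDs or timestamps do not split one root cause into many; SHA-256 is our choice; and the RootCause record, its field names, and the sample Java trace are hypothetical. The fields mirror the description above: the stack trace, when it was seen, how long it appeared, and when a fix was introduced, if known.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


def stack_hash(stack_trace: str) -> str:
    """Hash a stack trace on its exception type and frames, ignoring the message text."""
    lines = [line.strip() for line in stack_trace.strip().splitlines()]
    exception_type = lines[0].split(":", 1)[0]  # e.g. "java.lang.NullPointerException"
    frames = [line for line in lines[1:] if line.startswith("at ")]
    canonical = "\n".join([exception_type, *frames])
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class RootCause:
    """Hypothetical summary record for one root cause."""
    trace_hash: str
    stack_trace: str
    first_seen: datetime
    last_seen: datetime
    fix_introduced: Optional[datetime] = None  # unknown until a fix ships

    @property
    def duration(self) -> timedelta:
        return self.last_seen - self.first_seen


trace_a = """java.lang.NullPointerException: order 1234 missing
    at com.example.OrderService.submit(OrderService.java:42)
    at com.example.Api.handle(Api.java:17)"""
trace_b = trace_a.replace("order 1234", "order 5678")
print(stack_hash(trace_a) == stack_hash(trace_b))  # True

cause = RootCause(
    trace_hash=stack_hash(trace_a),
    stack_trace=trace_a,
    first_seen=datetime(2022, 5, 1, 9, 0),
    last_seen=datetime(2022, 5, 1, 13, 30),
)
print(cause.duration)  # 4:30:00
```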

 

Stage 2: A time-series algorithm needs no attributes other than the historical collection of relief times to predict the next relief time. It looks only at a scalar value, regardless of the type of incident, the factors that played into its relief time, or the attributes of its root cause. The historical data is used to estimate the value for the incoming event, treating the relief times as a scatter plot along the timeline. Unlike other data mining algorithms that use additional attributes of the event, this approach applies a single autoregressive method to the continuous data to make a short-term prediction. The regression is retrained automatically as the data accrues.
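A minimal autoregressive sketch, assuming relief times are available as a chronologically ordered list of numbers (for example, hours): an AR(p) model with an intercept is fit by ordinary least squares over the whole history, and the next relief time is predicted from the last p values. The order p, the least-squares fit, and the function name predict_next_relief_time are our assumptions; the post does not specify the order or the fitting method. Refitting on every call is one simple way to have the regression retrain as data accrues.

```python
import numpy as np


def predict_next_relief_time(history, p=2):
    """Fit AR(p) with an intercept by least squares and forecast one step ahead.

    history: past relief times in chronological order (e.g. hours).
    """
    y = np.asarray(history, dtype=float)
    if len(y) <= 2 * p:
        raise ValueError("need more than 2*p observations to fit AR(p)")
    # Row t of X is [1, y[t-1], ..., y[t-p]] for each target y[t], t = p .. n-1.
    lags = [y[p - k : len(y) - k] for k in range(1, p + 1)]
    X = np.column_stack([np.ones(len(y) - p), *lags])
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    latest = np.concatenate([[1.0], y[::-1][:p]])  # [1, y[-1], ..., y[-p]]
    return float(latest @ coef)


relief_hours = [10.0, 6.0, 4.0, 3.0, 2.5]  # illustrative values, not real incident data
print(predict_next_relief_time(relief_hours, p=1))  # ≈ 2.25, since each value is 1 + 0.5 * previous
```

Only the scalar history enters the model, matching the point above that no incident or root-cause attributes are needed.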
