Decision Tree modeling on root cause analysis
Problem statement: Given a method to collect root causes from many
data points in the errors found in logs, can the relief time be determined?
Solution: There are two stages to solving this problem:
1. Stage 1 – discover the root cause and create a summary to capture it.
2. Stage 2 – use decision tree modeling to determine relief time.
Stage 1:
The first stage involves a data pipeline that
converts log entries to exception stack traces and hashes them into buckets. Sample
included. When the exception stack traces are collected from a batch of
log entries, we can transform them into a vector representation, using the
notable stack traces as features. Then we can generate a hidden weight matrix
for the neural network.
We use that hidden layer to determine the
salience using the gradient descent method.
All values lie within the [0, 1] co-occurrence
probability range.
The quadratic form representing the embeddings
is minimized at the solution of Ax = b, which is found using the
conjugate gradient method.
We are given an input matrix A, a vector b, a starting value
x, a maximum number of iterations i-max, and an error tolerance epsilon < 1.
This method proceeds as follows:
set i to 0
set residual to b - Ax
set search-direction to residual
set delta-new to dot-product(residual-transposed, residual)
set delta-0 to delta-new
while i < i-max and delta-new > epsilon^2 * delta-0 do:
    q = dot-product(A, search-direction)
    alpha = delta-new / dot-product(search-direction-transposed, q)
    x = x + alpha * search-direction
    if i is divisible by 50:
        residual = b - Ax
    else:
        residual = residual - alpha * q
    delta-old = delta-new
    delta-new = dot-product(residual-transposed, residual)
    beta = delta-new / delta-old
    search-direction = residual + beta * search-direction
    i = i + 1
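The steps above can be sketched in Python. This is a minimal illustration, not a tuned solver: it assumes A is a symmetric positive-definite matrix given as a list of lists, and the function name and the default values for i_max and epsilon are chosen here only for the example.

```python
def conjugate_gradient(A, b, x, i_max=1000, epsilon=1e-8):
    """Approximately solve Ax = b; A is assumed symmetric positive-definite."""

    def matvec(M, v):
        return [sum(m * vj for m, vj in zip(row, v)) for row in M]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    x = list(x)
    i = 0
    residual = [bi - axi for bi, axi in zip(b, matvec(A, x))]
    direction = list(residual)
    delta_new = dot(residual, residual)
    delta_0 = delta_new
    while i < i_max and delta_new > epsilon ** 2 * delta_0:
        q = matvec(A, direction)
        alpha = delta_new / dot(direction, q)
        x = [xi + alpha * di for xi, di in zip(x, direction)]
        if i % 50 == 0:
            # Recompute the exact residual periodically to shed rounding drift.
            residual = [bi - axi for bi, axi in zip(b, matvec(A, x))]
        else:
            residual = [ri - alpha * qi for ri, qi in zip(residual, q)]
        delta_old = delta_new
        delta_new = dot(residual, residual)
        beta = delta_new / delta_old
        direction = [ri + beta * di for ri, di in zip(residual, direction)]
        i += 1
    return x


# Small illustrative system; the exact solution is [1/11, 7/11].
solution = conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0], [2.0, 1.0])
```

The loop stops once the squared residual norm has shrunk by a factor of epsilon^2 relative to its starting value, so the tolerance is relative to the initial guess rather than absolute.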
Root cause capture – Exception stack traces that
are captured from various sources and appear in the logs can be stack hashed.
The root cause can be described by a specific stack trace, its associated point
in time, the duration over which it appears, and the time at which a fix was
introduced, if known.
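As a rough sketch of stack hashing and root cause capture, the snippet below groups stack traces into buckets and summarizes each bucket. The input shape (pairs of timestamp and stack trace text), the choice to mask line numbers and hex addresses before hashing, and the summary field names are all assumptions made for illustration; the actual pipeline may normalize differently.

```python
import hashlib
import re
from collections import defaultdict


def stack_hash(stacktrace):
    """Hash a stack trace after masking details that vary between occurrences."""
    frames = []
    for line in stacktrace.strip().splitlines():
        line = line.strip()
        # Assumption: line numbers and hex addresses are noise for bucketing.
        line = re.sub(r":\d+", ":N", line)
        line = re.sub(r"0x[0-9a-fA-F]+", "0xADDR", line)
        frames.append(line)
    digest = hashlib.sha256("\n".join(frames).encode("utf-8")).hexdigest()
    return digest[:16]


def bucket_stacktraces(entries):
    """Group (timestamp, stacktrace) pairs into buckets keyed by stack hash."""
    buckets = defaultdict(list)
    for timestamp, trace in entries:
        buckets[stack_hash(trace)].append(timestamp)
    return buckets


def summarize(buckets):
    """Describe each bucket by first appearance, duration, and count."""
    return {
        key: {
            "first_seen": min(stamps),
            "duration": max(stamps) - min(stamps),
            "count": len(stamps),
        }
        for key, stamps in buckets.items()
    }


# Hypothetical entries: timestamps in seconds, made-up stack traces.
entries = [
    (100, "NullPointerException\n at app.Service.run(Service.java:42)"),
    (160, "NullPointerException\n at app.Service.run(Service.java:57)"),
    (130, "TimeoutError\n at app.Client.call(Client.java:10)"),
]
summary = summarize(bucket_stacktraces(entries))
```

The time of the fix is not derivable from the logs alone, so it would have to be joined in from another source when it is known.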
Stage 2:
Decision tree modeling can help predict relief time. It involves both a classification and a
regression tree. A function divides the rows into two datasets based on the
value of a specific column. The two lists of rows that are returned are such
that one set matches the criteria for the split while the other does not. When
the attribute to be chosen is clear, this works well.
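A dividing function of this kind might look like the sketch below. The convention that numeric columns split on "greater than or equal" while other columns split on equality is an assumption, and the sample rows (stack hash label, service, occurrence count, relief class) are invented for the example.

```python
def divide_set(rows, column, value):
    """Split rows into (matching, non-matching) on one column."""
    if isinstance(value, (int, float)):
        # Numeric columns: the criterion is "at least this value".
        def matches(row):
            return row[column] >= value
    else:
        # Everything else: the criterion is equality.
        def matches(row):
            return row[column] == value

    matching = [row for row in rows if matches(row)]
    non_matching = [row for row in rows if not matches(row)]
    return matching, non_matching


# Hypothetical rows; the last element is the target value.
rows = [
    ("npe", "api", 12, "slow"),
    ("timeout", "db", 3, "fast"),
    ("npe", "db", 7, "slow"),
]
by_hash = divide_set(rows, 0, "npe")
by_count = divide_set(rows, 2, 7)
```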
To see how good an attribute is, the
entropy of the whole group is calculated. Then the group is divided by
the possible values of each attribute and the entropy of the two new groups is
calculated. To determine which attribute is best to divide on, the information
gain is calculated, which is the difference between the current entropy and the
weighted-average entropy of the two new groups. The algorithm calculates the
information gain for every attribute and chooses the one with the highest
information gain.
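The entropy and information gain calculation can be illustrated as follows. This sketch assumes the target value is the last element of each row and uses Shannon entropy in bits; the function names and the sample rows are made up for the example.

```python
from collections import Counter
from math import log2


def entropy(rows):
    """Shannon entropy (in bits) of the target values in the last column."""
    total = len(rows)
    counts = Counter(row[-1] for row in rows)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def information_gain(rows, set1, set2):
    """Current entropy minus the weighted-average entropy of the two groups."""
    weight = len(set1) / len(rows)
    return entropy(rows) - weight * entropy(set1) - (1 - weight) * entropy(set2)


# Hypothetical rows: (occurrence count, relief class).
rows = [(12, "slow"), (15, "slow"), (3, "fast"), (2, "fast")]
perfect_split = information_gain(rows, rows[:2], rows[2:])
useless_split = information_gain(rows, [rows[0], rows[2]], [rows[1], rows[3]])
```

A split that separates the classes completely recovers all of the original entropy as gain, while a split that leaves both groups as mixed as the parent gains nothing.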
Each set is subdivided only if the
recursion of the above step can proceed. The recursion terminates when a solid
conclusion has been reached, which is a way of saying that the information gain
from splitting a node is no more than zero. Otherwise the branches keep dividing,
creating a tree by calculating the best attribute for each new node. If a
threshold is set on the entropy reduction, the decision tree is ‘pruned’:
splits that do not meet the threshold are not kept.
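A compact sketch of the recursion is shown below. The nested-dictionary tree layout, the min_gain parameter, and the sample rows are assumptions for illustration. Note that min_gain here stops a split before it is made, which is a simple stand-in for pruning rather than the merge-after-building form.

```python
from collections import Counter
from math import log2


def entropy(rows):
    total = len(rows)
    counts = Counter(row[-1] for row in rows)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def matches(row, column, value):
    # Numeric columns split on ">=", other columns on equality (an assumption).
    if isinstance(value, (int, float)):
        return row[column] >= value
    return row[column] == value


def build_tree(rows, min_gain=0.0):
    """Return a nested dict; leaves hold counts of the target values."""
    current = entropy(rows)
    best_gain, best_split, best_sets = 0.0, None, None
    for column in range(len(rows[0]) - 1):  # the last column is the target
        for value in {row[column] for row in rows}:
            set1 = [r for r in rows if matches(r, column, value)]
            set2 = [r for r in rows if not matches(r, column, value)]
            if not set1 or not set2:
                continue
            weight = len(set1) / len(rows)
            gain = current - weight * entropy(set1) - (1 - weight) * entropy(set2)
            if gain > best_gain:
                best_gain, best_split, best_sets = gain, (column, value), (set1, set2)
    # Stop when no split improves on the current node by more than min_gain.
    if best_split is None or best_gain <= min_gain:
        return {"leaf": dict(Counter(row[-1] for row in rows))}
    return {
        "split": best_split,
        "true": build_tree(best_sets[0], min_gain),
        "false": build_tree(best_sets[1], min_gain),
    }


def classify(tree, observation):
    """Walk the tree and return the target counts at the leaf reached."""
    while "leaf" not in tree:
        column, value = tree["split"]
        tree = tree["true" if matches(observation, column, value) else "false"]
    return tree["leaf"]


# Hypothetical rows: (stack hash label, occurrence count, relief class).
rows = [("npe", 12, "slow"), ("npe", 3, "fast"),
        ("timeout", 15, "slow"), ("timeout", 2, "fast")]
tree = build_tree(rows)
prediction = classify(tree, ("timeout", 20))
```

For a numeric relief time, the same recursion could score splits by variance reduction instead of entropy, which gives the regression-tree counterpart.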
When working with a set of tuples, it
is easier to reserve the last element of each tuple for the result at every
recursion level. Text and numeric data do not have to be differentiated for
this algorithm to run. The algorithm takes all the existing rows and assumes
the last column is the target value. The application is run with a training
dataset and a testing dataset. Usually, a training/testing data split of
70/30% is used in this regard.
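One way to produce such a split is sketched here. The function name, the fixed seed, and the sample rows are assumptions for the example; shuffling before cutting is a common choice but may not suit data where time order matters.

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and cut it into training and testing parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical rows: (occurrence count, relief class); the last element is the target.
rows = [(n, "slow" if n >= 5 else "fast") for n in range(10)]
train, test = train_test_split(rows)

# Separate the attributes from the target column for evaluation.
test_features = [row[:-1] for row in test]
test_targets = [row[-1] for row in test]
```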