Thursday, December 31, 2020

Applying the Naïve Bayes data mining technique to IT service requests

The Naïve Bayes algorithm is a statistical, probability-based data mining algorithm. It is generally considered easier to understand and visualize than the other algorithms in its family.

A probability is simply the fraction of interesting cases out of the total cases. A Bayesian probability is a conditional probability: it adjusts the probability based on a premise. If the premise takes a factor into account, we get one conditional probability; if it does not, we get another. Naïve Bayes builds on these conditional states across attributes, which are easy to visualize. This lets experts show the reasoning process and lets users judge the quality of a prediction. All of these algorithms need training data in our use case. Naïve Bayes uses it for exploration and for predictions based on earlier requests, such as determining whether self-help was useful or not, by evaluating both probabilities conditionally.
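To make these fractions concrete, here is a minimal Python sketch that computes a plain probability and the two conditional probabilities, one with the factor and one without it. The request records and the field names `kb_present` and `self_help_useful` are hypothetical, chosen only for illustration.

```python
# Hypothetical historical service requests (illustrative data only).
requests = [
    {"kb_present": True, "self_help_useful": True},
    {"kb_present": True, "self_help_useful": True},
    {"kb_present": True, "self_help_useful": False},
    {"kb_present": True, "self_help_useful": True},
    {"kb_present": False, "self_help_useful": True},
    {"kb_present": False, "self_help_useful": False},
    {"kb_present": False, "self_help_useful": False},
    {"kb_present": False, "self_help_useful": False},
]


def probability(cases, event):
    """Fraction of cases for which `event` holds (interesting cases / total cases)."""
    if not cases:
        raise ValueError("no cases to count")
    return sum(1 for c in cases if event(c)) / len(cases)


def conditional_probability(cases, event, premise):
    """P(event | premise): keep only the cases matching the premise, then count."""
    return probability([c for c in cases if premise(c)], event)


def useful(c):
    return c["self_help_useful"]


p_useful = probability(requests, useful)
p_useful_with_kb = conditional_probability(requests, useful, lambda c: c["kb_present"])
p_useful_without_kb = conditional_probability(requests, useful, lambda c: not c["kb_present"])
print(p_useful, p_useful_with_kb, p_useful_without_kb)  # 0.5 0.75 0.25
```

With this toy data, taking the KB factor into account moves the probability that self-help was useful from 0.5 to 0.75 (KB present) or 0.25 (KB absent).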

This algorithm is widely used where conditions apply, especially binary conditions such as "with" or "without". It can be applied if the input variables are independent, if their states can be calculated as probabilities, and if there is at least one predictable output. Its states are computed simply by counting, for each class, how often each value of each input variable occurs. Displaying those counts against the input variables for a given value makes the algorithm easy to visualize, debug, and use as a predictor.
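The counting described above can be sketched in a few lines of Python. This is a minimal categorical Naïve Bayes under stated assumptions, not the implementation behind any particular viewer: the column names and data are hypothetical, and the Laplace smoothing parameter `alpha` is an addition of this sketch so that a value never seen with a class does not force that class's probability to zero.

```python
from collections import Counter, defaultdict


def train(rows, inputs, target):
    """Count class totals and, for each (input, value) state, how often each class occurs."""
    class_counts = Counter(r[target] for r in rows)
    state_counts = defaultdict(Counter)  # (input, value) -> Counter of classes
    for r in rows:
        for col in inputs:
            state_counts[(col, r[col])][r[target]] += 1
    return class_counts, state_counts


def predict(model, case, inputs, alpha=1.0):
    """Posterior P(class | case), assuming the inputs are independent given the class."""
    class_counts, state_counts = model
    total = sum(class_counts.values())
    # Distinct observed values per input, used in the smoothing denominator.
    n_values = {col: len({v for (c, v) in state_counts if c == col}) for col in inputs}
    scores = {}
    for cls, n_cls in class_counts.items():
        score = n_cls / total  # prior P(class)
        for col in inputs:
            # .get avoids inserting unseen states into the trained model
            n = state_counts.get((col, case[col]), Counter())[cls]
            score *= (n + alpha) / (n_cls + alpha * n_values[col])  # smoothed P(value | class)
        scores[cls] = score
    norm = sum(scores.values())
    return {cls: s / norm for cls, s in scores.items()}


# Hypothetical past requests: kb_present and category are inputs; self_help is predicted.
rows = [
    {"kb_present": "yes", "category": "password", "self_help": "useful"},
    {"kb_present": "yes", "category": "password", "self_help": "useful"},
    {"kb_present": "yes", "category": "network", "self_help": "useful"},
    {"kb_present": "yes", "category": "network", "self_help": "not_useful"},
    {"kb_present": "no", "category": "password", "self_help": "not_useful"},
    {"kb_present": "no", "category": "network", "self_help": "not_useful"},
    {"kb_present": "no", "category": "network", "self_help": "not_useful"},
    {"kb_present": "no", "category": "password", "self_help": "useful"},
]
inputs = ["kb_present", "category"]
model = train(rows, inputs, "self_help")
print(predict(model, {"kb_present": "yes", "category": "password"}, inputs))
```

For a new password request with a KB article, this sketch assigns about 0.8 to `useful` and 0.2 to `not_useful`. Because every state is just a count, the `state_counts` table can be printed directly to see why a prediction came out the way it did.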

Conditional probability can be used for exploration as well as for prediction. The algorithm calculates a state for each input column in the dataset, and these states are then used to assign a state to the predictable column. For example, the availability of a Knowledge Base (KB) article might show a distribution of input values that differs significantly from the others, which indicates that it is a potential predictor.
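One simple way to sketch this exploration is to compare each input's value distribution across the predictable states and flag the inputs whose distributions differ most. The columns, data, and the 0.2 threshold are hypothetical, and the "largest gap" score is an illustrative choice, not the measure any particular viewer uses.

```python
from collections import Counter


def value_distribution(rows, col, target, cls):
    """P(col = value | target = cls) for every value observed in col."""
    subset = [r for r in rows if r[target] == cls]
    counts = Counter(r[col] for r in subset)
    return {v: counts[v] / len(subset) for v in {r[col] for r in rows}}


def predictor_strength(rows, col, target):
    """Largest gap in P(value | class) between any two classes, over all values of col."""
    classes = {r[target] for r in rows}
    dists = [value_distribution(rows, col, target, cls) for cls in classes]
    values = {r[col] for r in rows}
    return max(max(d[v] for d in dists) - min(d[v] for d in dists) for v in values)


# Hypothetical requests: kb_present should separate the outcomes, priority should not.
rows = [
    {"kb_present": "yes", "priority": "high", "self_help": "useful"},
    {"kb_present": "yes", "priority": "low", "self_help": "useful"},
    {"kb_present": "yes", "priority": "high", "self_help": "useful"},
    {"kb_present": "no", "priority": "low", "self_help": "useful"},
    {"kb_present": "no", "priority": "high", "self_help": "not_useful"},
    {"kb_present": "no", "priority": "low", "self_help": "not_useful"},
    {"kb_present": "no", "priority": "high", "self_help": "not_useful"},
    {"kb_present": "yes", "priority": "low", "self_help": "not_useful"},
]
THRESHOLD = 0.2  # hypothetical cut-off for "potential predictor"
for col in ["kb_present", "priority"]:
    strength = predictor_strength(rows, col, "self_help")
    label = "potential predictor" if strength > THRESHOLD else "weak"
    print(col, strength, label)
```

Here `kb_present` scores 0.5 because its distribution flips between the two outcomes, while `priority` scores 0.0 because it is split evenly in both.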

The viewer also shows the values of each distribution, so KB articles that suggest opening service requests with specific attributes become easier to follow and act upon, leading to a resolution. The algorithm can then compute a probability both with and without that criterion.

All the algorithm requires is a single key column, input columns that are treated as independent variables, and at least one predictable column. A sample query for viewing the information the algorithm maintains for a particular attribute would look like this:

SELECT NODE_TYPE, NODE_CAPTION, NODE_PROBABILITY, NODE_SUPPORT, NODE_SCORE FROM NaiveBayes.CONTENT WHERE ATTRIBUTE_NAME = 'KB_PRESENT';
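The counts behind a query like this can be approximated in Python. This sketch assumes a hypothetical case table with a unique `RequestID` key, one input column `KB_PRESENT`, and the predictable column `SELF_HELP_USEFUL`. It first checks the minimal requirements listed above, then reports a support count and a probability for each attribute value and predictable state. It only loosely mirrors the content rows; it does not reproduce the exact semantics of the NODE_* columns.

```python
from collections import Counter


def validate_cases(rows, key, inputs, predictable):
    """Check the minimal shape: one unique key column, input columns, a predictable column."""
    keys = [r[key] for r in rows]
    if len(keys) != len(set(keys)):
        raise ValueError("key column must be unique")
    for r in rows:
        missing = [c for c in (*inputs, predictable) if c not in r]
        if missing:
            raise ValueError(f"case {r[key]} is missing {missing}")


def attribute_content(rows, attribute, predictable):
    """Rows of (attribute value, predictable state, support, probability), where support
    counts cases with both values and probability = support / cases in that state."""
    state_totals = Counter(r[predictable] for r in rows)
    pair_counts = Counter((r[attribute], r[predictable]) for r in rows)
    return sorted(
        (value, state, n, n / state_totals[state])
        for (value, state), n in pair_counts.items()
    )


# Hypothetical case table (illustrative data only).
rows = [
    {"RequestID": 1, "KB_PRESENT": "yes", "SELF_HELP_USEFUL": "yes"},
    {"RequestID": 2, "KB_PRESENT": "yes", "SELF_HELP_USEFUL": "yes"},
    {"RequestID": 3, "KB_PRESENT": "no", "SELF_HELP_USEFUL": "no"},
    {"RequestID": 4, "KB_PRESENT": "no", "SELF_HELP_USEFUL": "yes"},
    {"RequestID": 5, "KB_PRESENT": "yes", "SELF_HELP_USEFUL": "no"},
    {"RequestID": 6, "KB_PRESENT": "no", "SELF_HELP_USEFUL": "no"},
]
validate_cases(rows, "RequestID", ["KB_PRESENT"], "SELF_HELP_USEFUL")
content = attribute_content(rows, "KB_PRESENT", "SELF_HELP_USEFUL")
for row in content:
    print(row)
```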

Bayesian conditional probability is not restricted to this classifier; it can be used in association data mining as well.
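In association mining, the confidence of a rule A → B is the conditional probability P(B | A): the support of A and B together divided by the support of A. A minimal sketch over hypothetical sets of service-request attributes (the item names are illustrative only):

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent, i.e. P(consequent | antecedent)."""
    base = support(transactions, antecedent)
    if base == 0:
        raise ValueError("antecedent never occurs")
    return support(transactions, antecedent | consequent) / base


# Hypothetical attribute sets observed on past service requests.
transactions = [
    {"vpn", "kb_viewed", "resolved"},
    {"vpn", "kb_viewed"},
    {"vpn", "kb_viewed", "resolved"},
    {"email", "resolved"},
    {"vpn"},
]
print(confidence(transactions, {"kb_viewed"}, {"resolved"}))
print(confidence(transactions, {"vpn"}, {"resolved"}))
```

Here two of the three requests where a KB article was viewed were resolved, so the rule `kb_viewed → resolved` has a confidence of 2/3.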

Implementation:

https://jsfiddle.net/za52wjkv/
