Thursday, January 14, 2021

Predicting relief time on service tickets – nuances between decision tree and time-series algorithms – a data science essay (continued...)

The viewer also provides values for the distribution, so that KB articles that suggest opening service requests with specific attributes become easier to follow, act upon, and resolve. The algorithm can then compute a probability both with and without those criteria.
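As a minimal sketch of that with-and-without comparison, the snippet below contrasts the frequency of fast resolution for tickets that do and do not carry a given attribute (a KB article attached, say). The counts are invented for illustration and are not from the article:

```python
# Hypothetical sketch: probability of fast resolution with and without a
# criterion such as KB_PRESENT, computed as simple frequencies.
# All counts below are assumed, purely for illustration.

def probability(favorable, total):
    """P(fast resolution) estimated as a frequency."""
    return favorable / total

# tickets partitioned by whether the attribute holds
fast_with_kb, total_with_kb = 80, 100
fast_without_kb, total_without_kb = 30, 100

p_with = probability(fast_with_kb, total_with_kb)          # 0.8
p_without = probability(fast_without_kb, total_without_kb)  # 0.3

print(p_with, p_without)
```

Comparing the two numbers is what makes the attribute's contribution visible: a large gap suggests the criterion matters to relief time.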

All that the algorithm requires is a single key column, a set of input columns (the independent variables), and at least one predictable column. A sample query for viewing the information the algorithm maintains for a particular attribute would look like this:

SELECT NODE_TYPE, NODE_CAPTION, NODE_PROBABILITY, NODE_SUPPORT, NODE_SCORE
FROM [NaiveBayes].CONTENT
WHERE ATTRIBUTE_NAME = 'KB_PRESENT';

The use of Bayesian conditional probability is not restricted to this classifier; it can be used in association data mining as well. The two differ in the metrics used and in the nature of the prediction. Association data mining relies on two parameters, namely Support and Probability. Support defines the percentage of cases in which a rule must appear before it is considered valid; for example, a rule might be required to occur in at least 1 percent of cases.

Probability defines how likely an association must be before it is considered valid; for example, an association might be considered only if it has a probability of at least 10 percent.
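These two thresholds act as a filter over candidate rules. A small sketch of that filtering, with invented rule names and (support, probability) values:

```python
# Hypothetical sketch: keep only the candidate rules that clear both
# thresholds the article describes. Rule names and values are invented.
MINIMUM_SUPPORT = 0.01       # rule must appear in at least 1% of cases
MINIMUM_PROBABILITY = 0.10   # association must be at least 10% likely

# candidate rule -> (support, probability)
candidates = {
    "vpn -> login": (0.05, 0.40),
    "printer -> email": (0.004, 0.60),  # fails the support threshold
    "login -> email": (0.02, 0.08),     # fails the probability threshold
}

valid = {rule for rule, (s, p) in candidates.items()
         if s >= MINIMUM_SUPPORT and p >= MINIMUM_PROBABILITY}
print(valid)  # {'vpn -> login'}
```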

Association rules are formed from a pair of antecedent and consequent itemsets, so named because we want to find the support for taking one item together with another. Let I be a set of items and T a set of transactions. An itemset S1 is a subset of I whose items occur together in T, and Support(S1) is the fraction of transactions in T that contain S1. For subsets S1 and S2 of I, the rule S1 -> S2 has Support(S1 -> S2) = Support(S1 union S2) and Confidence(S1 -> S2) = Support(S1 union S2) / Support(S1). A third metric, Lift, is defined as Confidence(S1 -> S2) / Support(S2) and is often preferred: a popular S1 yields high confidence for almost any S2, and lift corrects for that, taking a value greater than 1.0 only when S1 and S2 occur together more often than their individual popularities would predict.
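The three metrics fall out directly from those definitions. Below is a minimal sketch over a toy transaction set; the problem-type names and transactions are invented, and this is not the Analysis Services implementation, just the arithmetic:

```python
# Hypothetical sketch: support, confidence, and lift over toy transactions.
# Each transaction is the set of problem types a user opened tickets for.

def support(itemset, transactions):
    """Fraction of transactions containing every item in the itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(s1, s2, transactions):
    """Support(S1 union S2) / Support(S1)."""
    return support(set(s1) | set(s2), transactions) / support(s1, transactions)

def lift(s1, s2, transactions):
    """Confidence(S1 -> S2) / Support(S2); > 1.0 means a real association."""
    return confidence(s1, s2, transactions) / support(s2, transactions)

T = [
    {"password_reset", "vpn_issue"},
    {"password_reset", "vpn_issue"},
    {"password_reset"},
    {"printer"},
    {"printer"},
]

print(support({"password_reset"}, T))                       # 0.6
print(confidence({"password_reset"}, {"vpn_issue"}, T))     # ~0.667
print(lift({"password_reset"}, {"vpn_issue"}, T))           # ~1.667, > 1.0
```

Here the rule password_reset -> vpn_issue has lift above 1.0, so the pairing occurs more often than the two problem types' individual popularities would predict.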

While the Naïve Bayes classifier predicts an attribute of the test data, association data mining finds associations within the data, such as users who opened a ticket for one problem type also opening a ticket for another problem type.
