Introduction: Service requests are opened by customers who report a problem and ask for it to be mitigated. The IT department receives virtually all of the computing, storage, and networking tasks requested by its customers, which is usually far more than the IT team can easily resolve. For this reason, IT teams look for automation that gives users self-service capabilities. Association data mining lets these users see helpful messages such as “users who opened a ticket for this problem type also opened a ticket for this other problem type”. This article describes how this data mining technique can be implemented.
Description:
The centerpiece of this solution is the computation of two columns, namely Support and Probability. Support is the minimum percentage of cases in which a rule must appear before it is considered valid. Here we require a rule to be found in at least 1 percent of cases.
Probability defines how likely an association must be before it is considered valid. We will consider any association with a probability of at least 10 percent.
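The two thresholds can be illustrated with a minimal Python sketch. It assumes that each transaction is the set of problem types a single customer opened tickets for, and it only considers rules between single problem types; the function name `mine_pair_rules` and the sample tickets are hypothetical, not taken from any particular product. Probability is computed here as the fraction of transactions containing the antecedent that also contain the consequent.

```python
from collections import Counter
from itertools import combinations

# Thresholds taken from the text above.
MIN_SUPPORT = 0.01      # rule must appear in at least 1% of cases
MIN_PROBABILITY = 0.10  # rule must hold at least 10% of the time

def mine_pair_rules(transactions, min_support=MIN_SUPPORT,
                    min_probability=MIN_PROBABILITY):
    """Return {(antecedent, consequent): (support, probability)}
    for single-item rules that pass both thresholds (hypothetical helper)."""
    n = len(transactions)
    item_counts = Counter()   # transactions containing each item
    pair_counts = Counter()   # transactions containing each unordered pair
    for t in transactions:
        items = sorted(set(t))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = {}
    for (a, b), together in pair_counts.items():
        support = together / n
        if support < min_support:
            continue
        # Evaluate the rule in both directions: a -> b and b -> a.
        for antecedent, consequent in ((a, b), (b, a)):
            probability = together / item_counts[antecedent]
            if probability >= min_probability:
                rules[(antecedent, consequent)] = (support, probability)
    return rules


# Hypothetical sample: each set holds the problem types one customer reported.
tickets = [
    {"password reset", "vpn"},
    {"password reset", "email"},
    {"vpn", "email"},
    {"password reset", "vpn", "printer"},
    {"printer"},
]

for (ante, cons), (sup, prob) in sorted(mine_pair_rules(tickets).items()):
    print(f"{ante} -> {cons}: support={sup:.2f}, probability={prob:.2f}")
```

Counting each unordered pair once and then checking both directions keeps the pass over the data linear in the number of transactions; the support of a rule is the same in both directions, only the probability differs.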
Bayesian conditional probability and confidence can also be used. An association rule is formed from a pair of item-sets, an antecedent and a consequent, so named because we want to find the value of taking one item with another. Let I be a set of items and T be a set of transactions. An association A is then a subset of I whose items occur together in transactions of T. The support of an item-set S1, Support(S1), is the fraction of transactions in T that contain S1. Let S1 and S2 be subsets of I; the association rule S1 -> S2 has support Support(S1 -> S2) = Support(S1 ∪ S2) and confidence Confidence(S1 -> S2) = Support(S1 ∪ S2) / Support(S1). A third metric, Lift, is defined as Lift(S1 -> S2) = Confidence(S1 -> S2) / Support(S2). Lift is often preferred because a popular S2 yields a high confidence for almost any S1; dividing by Support(S2) corrects for this, so that a lift greater than 1.0 means S1 and S2 occur together more often than they would if they were independent.
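The three formulas translate directly into code. This is a minimal sketch assuming transactions are Python sets of problem types and item-sets are passed as sets; the sample data is hypothetical and was chosen so that one problem type ("password reset") is very popular, which is exactly the case lift is meant to correct.

```python
def support(itemset, transactions):
    """Support(S) = fraction of transactions containing every item of S."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset.issubset(t)) / len(transactions)

def confidence(s1, s2, transactions):
    """Confidence(S1 -> S2) = Support(S1 ∪ S2) / Support(S1). Pass sets, not strings."""
    return support(set(s1) | set(s2), transactions) / support(s1, transactions)

def lift(s1, s2, transactions):
    """Lift(S1 -> S2) = Confidence(S1 -> S2) / Support(S2)."""
    return confidence(s1, s2, transactions) / support(s2, transactions)


# Hypothetical data: "password reset" appears in 9 of 10 transactions.
transactions = (
    [{"vpn", "password reset"} for _ in range(4)]
    + [{"vpn"}]
    + [{"password reset"} for _ in range(5)]
)

print(confidence({"vpn"}, {"password reset"}, transactions))  # about 0.8
print(lift({"vpn"}, {"password reset"}, transactions))        # about 0.89
```

A confidence of 0.8 for vpn -> password reset looks strong, but because 90 percent of all transactions already contain a password reset, the lift falls below 1.0: customers with a VPN ticket are in fact slightly less likely than average to also open a password reset ticket.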
Certain databases allow the creation of association models that can be persisted and evaluated against each incoming request. A 70/30 training/testing split of the data is commonly used to build and then validate such a model.
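Outside of a database engine, the 70/30 split can be sketched in plain Python. This is only an illustration under stated assumptions: the helper names `train_test_split` and `rule_probability` are hypothetical, the shuffle uses a fixed seed for reproducibility, and "validation" here simply means re-measuring a rule's probability on the held-out 30 percent to see whether it still holds.

```python
import random

def train_test_split(transactions, train_fraction=0.7, seed=42):
    """Shuffle a copy of the transactions and split it into training and testing lists."""
    shuffled = list(transactions)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def rule_probability(antecedent, consequent, transactions):
    """Fraction of transactions with the antecedent that also contain the consequent.
    Returns None when no transaction contains the antecedent."""
    with_antecedent = [t for t in transactions if antecedent in t]
    if not with_antecedent:
        return None
    hits = sum(1 for t in with_antecedent if consequent in t)
    return hits / len(with_antecedent)


# Hypothetical ticket history: one set of problem types per customer.
history = (
    [{"vpn", "password reset"} for _ in range(6)]
    + [{"vpn"} for _ in range(2)]
    + [{"email"} for _ in range(2)]
)
train, test = train_test_split(history)

# Measure the rule on the 70%, then check whether it still holds on the 30%.
# The test value may be None if no held-out transaction contains "vpn".
print("train:", rule_probability("vpn", "password reset", train))
print("test: ", rule_probability("vpn", "password reset", test))
```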
Without a persisted prediction model, association rules can be evaluated by taking the Cartesian product of all known problem types and computing the probability and support of each pair. The resulting static rules can then be ranked by support, and the top ten can even be displayed to customers.
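The static approach can be sketched as follows, again assuming one set of problem types per customer and reusing the 1 percent support and 10 percent probability thresholds from above. The function `static_rules`, the problem-type list, and the tickets are hypothetical. Self-pairs from the Cartesian product are skipped, and the message format mirrors the one quoted in the introduction.

```python
from itertools import product

def static_rules(problem_types, transactions,
                 min_support=0.01, min_probability=0.10, top_n=10):
    """Evaluate every ordered pair of problem types and return the
    top_n rules (antecedent, consequent, support, probability) by support."""
    n = len(transactions)
    rules = []
    for a, b in product(problem_types, repeat=2):  # Cartesian product
        if a == b:
            continue  # a problem type trivially co-occurs with itself
        with_a = sum(1 for t in transactions if a in t)
        if with_a == 0:
            continue
        both = sum(1 for t in transactions if a in t and b in t)
        support = both / n
        probability = both / with_a
        if support >= min_support and probability >= min_probability:
            rules.append((a, b, support, probability))
    rules.sort(key=lambda r: r[2], reverse=True)  # highest support first
    return rules[:top_n]


problem_types = ["email", "password reset", "printer", "vpn"]
tickets = [
    {"password reset", "vpn"},
    {"password reset", "email"},
    {"vpn", "email"},
    {"password reset", "vpn", "printer"},
    {"printer"},
]

for a, b, sup, prob in static_rules(problem_types, tickets):
    print(f"Users who opened a ticket for {a} also opened a ticket for {b} "
          f"(support {sup:.0%}, probability {prob:.0%})")
```

Scanning the transactions once per pair is quadratic in the number of problem types, which is acceptable for a short, fixed catalog whose rules are precomputed periodically rather than per request.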