Tuesday, January 12, 2021

Predicting relief time on service tickets – nuances between decision tree and time-series algorithms – a data science essay


Problem statement: Service requests are opened by customers who report a problem and ask for mitigation. The IT department fields virtually all computing, storage, and networking tasks requested by a company's employees, and these requests usually far exceed what the IT team can resolve quickly. IT teams therefore look for automation that provides self-service capabilities to users and sets their expectations. One way to inform the ticket creator is to estimate the relief time for the opened request. A decision tree can predict relief time from the attributes of earlier requests. A time-series algorithm with an auto-regressive model is also useful for this prediction. This article explores the nuances between the two approaches.
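The auto-regressive idea mentioned above can be sketched with a minimal AR(1) fit by ordinary least squares. The weekly relief-time figures below are made up for illustration, and a real project would more likely use a library model such as statsmodels' AutoReg; this is a sketch of the underlying regression, not a production forecaster.

```python
import numpy as np

# Hypothetical weekly average relief times in hours (illustration only).
relief = np.array([10.0, 9.5, 9.8, 9.2, 9.4, 9.0, 9.1, 8.8, 8.9, 8.6])

def fit_ar1(series):
    """Fit y_t = c + phi * y_{t-1} by ordinary least squares."""
    y, x = series[1:], series[:-1]
    X = np.column_stack([np.ones_like(x), x])   # intercept + lagged value
    (c, phi), *_ = np.linalg.lstsq(X, y, rcond=None)
    return c, phi

def forecast_next(series):
    """One-step-ahead forecast from the fitted AR(1) coefficients."""
    c, phi = fit_ar1(series)
    return c + phi * series[-1]

print(f"next-week relief estimate: {forecast_next(relief):.2f} hours")
```

The model regresses each observation on its predecessor, so the forecast leans on the most recent ticket history rather than on per-ticket attributes, which is the key contrast with the decision tree discussed below.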

Solution: A decision tree can serve as both a classification tree and a regression tree. A function divides the rows into two datasets based on the value of a specific column: one returned list of rows matches the criterion for the split while the other does not. When the attribute to split on is clear, this works well.
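A minimal sketch of that splitting function, assuming rows are plain lists where each position is an attribute (the ticket data below is hypothetical):

```python
def divide_set(rows, column, value):
    """Split rows into two lists on the given column.

    Numeric columns split on >= value; all other columns split on equality.
    Returns (matching rows, non-matching rows).
    """
    if isinstance(value, (int, float)):
        criterion = lambda row: row[column] >= value
    else:
        criterion = lambda row: row[column] == value
    set1 = [row for row in rows if criterion(row)]
    set2 = [row for row in rows if not criterion(row)]
    return set1, set2

# Hypothetical tickets: [priority, relief hours]
tickets = [["high", 8], ["low", 2], ["high", 6]]
matched, rest = divide_set(tickets, 0, "high")
```

Here `matched` holds the two high-priority tickets and `rest` the remaining one; the same function handles numeric thresholds, e.g. `divide_set(tickets, 1, 6)` separates tickets by relief hours.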

To judge how good an attribute is, the entropy of the whole group is calculated first. The group is then divided by the possible values of each attribute, and the entropy of each of the two new groups is calculated. To determine which attribute is best to divide on, the information gain is computed: the difference between the current entropy and the weighted-average entropy of the two new groups. The algorithm calculates the information gain for every attribute and chooses the one with the highest gain.
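The two quantities can be sketched as follows, assuming each row's last element is the label being predicted (relief-time bucket, say):

```python
import math
from collections import Counter

def entropy(rows):
    """Shannon entropy of the label distribution (last column of each row)."""
    counts = Counter(row[-1] for row in rows)
    total = len(rows)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(rows, set1, set2):
    """Current entropy minus the weighted-average entropy of the two splits."""
    p = len(set1) / len(rows)
    return entropy(rows) - p * entropy(set1) - (1 - p) * entropy(set2)
```

A split that produces two pure groups recovers all of the parent's entropy as gain, while a split that leaves both groups as mixed as the parent yields a gain of zero.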

Each set is subdivided recursively in the same way. The recursion terminates when a solid conclusion has been reached, which is to say when the information gain from splitting a node is no more than zero. Until then, the branches keep dividing, building the tree by calculating the best attribute for each new node. Raising that minimum-gain threshold effectively 'prunes' the decision tree.
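The recursive construction above, including the minimum-gain stopping rule, can be sketched end to end. Everything here is a self-contained illustration on made-up ticket rows, not the article's exact implementation; for simplicity every split tests equality against a candidate value.

```python
import math
from collections import Counter

class Node:
    """Either an internal split (column, value) or a leaf with label counts."""
    def __init__(self, column=None, value=None, results=None,
                 true_branch=None, false_branch=None):
        self.column = column
        self.value = value
        self.results = results        # label counts at a leaf, else None
        self.true_branch = true_branch
        self.false_branch = false_branch

def entropy(rows):
    counts = Counter(row[-1] for row in rows)
    return -sum((c / len(rows)) * math.log2(c / len(rows))
                for c in counts.values())

def build_tree(rows, min_gain=0.0):
    """Recursively split rows; stop when no split beats min_gain."""
    if not rows:
        return Node(results={})
    current = entropy(rows)
    best_gain, best_split, best_sets = 0.0, None, None
    for col in range(len(rows[0]) - 1):          # last column is the label
        for value in {row[col] for row in rows}:
            set1 = [r for r in rows if r[col] == value]
            set2 = [r for r in rows if r[col] != value]
            if not set1 or not set2:
                continue
            p = len(set1) / len(rows)
            gain = current - p * entropy(set1) - (1 - p) * entropy(set2)
            if gain > best_gain:
                best_gain = gain
                best_split, best_sets = (col, value), (set1, set2)
    if best_gain > min_gain:
        return Node(column=best_split[0], value=best_split[1],
                    true_branch=build_tree(best_sets[0], min_gain),
                    false_branch=build_tree(best_sets[1], min_gain))
    return Node(results=dict(Counter(row[-1] for row in rows)))
```

Calling `build_tree` with a large `min_gain` collapses the tree toward a single leaf, which is the pruning behaviour described above.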

