Decision Tree modeling on IT Service requests:
Introduction: Service requests are opened by customers who report a problem and ask for mitigation. The IT department attracts virtually all computing, storage, and networking tasks requested by its customers, and the volume is usually far more than the IT team can resolve quickly. Consequently, IT teams look for automation that can provide self-service capabilities to users. A decision tree allows predictions based on earlier requests, such as estimating the relief time of a new request from the attributes of past ones. This article describes the implementation aspects of decision tree modeling. Although linear regression is useful for predicting a single column variable, decision trees build themselves from the data and are easy to visualize, which lets experts show the reasoning process and lets users judge the quality of a prediction.
Description:
The centerpiece of this technique is the classification and regression tree. A function divides the rows into two datasets based on the value of a specific column: of the two lists of rows returned, one set matches the criteria for the split while the other does not. When the attribute to divide on is clear, this works well.
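As a minimal sketch in Python, such a split function could look like the following; the divide_set name and the row layout are illustrative assumptions, not a reference implementation:

    def divide_set(rows, column, value):
        # Numeric columns are split by >=, all others by equality
        if isinstance(value, (int, float)):
            matches = lambda row: row[column] >= value
        else:
            matches = lambda row: row[column] == value
        # Partition the rows into those that satisfy the criteria and those that do not
        set1 = [row for row in rows if matches(row)]
        set2 = [row for row in rows if not matches(row)]
        return set1, set2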
To see how good an attribute is, the entropy of the whole group is calculated first. Then the group is divided by the possible values of each attribute, and the entropy of the two new groups is calculated. To determine which attribute is best to divide on, the information gain is calculated: the difference between the current entropy and the weighted-average entropy of the two new groups. The algorithm calculates the information gain for every attribute and chooses the one with the highest information gain.
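Continuing the sketch, entropy and information gain could be computed as follows; the convention that the last element of each row holds the target value is an assumption carried through the rest of the examples:

    import math
    from collections import Counter

    def entropy(rows):
        # Shannon entropy of the target values (last element of each row)
        counts = Counter(row[-1] for row in rows)
        total = len(rows)
        return -sum((c / total) * math.log2(c / total) for c in counts.values())

    def information_gain(rows, set1, set2):
        # Difference between the current entropy and the weighted-average
        # entropy of the two subsets produced by a split
        p = len(set1) / len(rows)
        return entropy(rows) - p * entropy(set1) - (1 - p) * entropy(set2)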
Each set is then subdivided by applying the step above recursively. The recursion terminates when a solid conclusion has been reached, which is to say the information gain from splitting a node is no more than zero. Until then, the branches keep dividing, building the tree by calculating the best attribute for each new node. If a higher threshold for the gain is set instead of zero, the decision tree is 'pruned'.
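A recursive builder along those lines might look like the sketch below; it reuses divide_set, entropy, and information_gain from the earlier sketches, and min_gain plays the role of the pruning threshold mentioned above:

    from collections import Counter

    class DecisionNode:
        def __init__(self, column=-1, value=None, results=None, tb=None, fb=None):
            self.column = column    # index of the column tested at this node
            self.value = value      # value the column is compared against
            self.results = results  # target counts at a leaf; None for internal nodes
            self.tb = tb            # branch taken when the test is true
            self.fb = fb            # branch taken when the test is false

    def build_tree(rows, min_gain=0.0):
        if not rows:
            return DecisionNode()
        best_gain, best_test, best_sets = 0.0, None, None
        for column in range(len(rows[0]) - 1):       # last column is the target
            for value in set(row[column] for row in rows):
                set1, set2 = divide_set(rows, column, value)
                if not set1 or not set2:
                    continue
                gain = information_gain(rows, set1, set2)
                if gain > best_gain:
                    best_gain, best_test, best_sets = gain, (column, value), (set1, set2)
        if best_gain > min_gain:
            # Keep dividing: attach a subtree for each of the two new groups
            return DecisionNode(column=best_test[0], value=best_test[1],
                                tb=build_tree(best_sets[0], min_gain),
                                fb=build_tree(best_sets[1], min_gain))
        # No split gains anything: stop and record the target counts at a leaf
        return DecisionNode(results=Counter(row[-1] for row in rows))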
When working with a set of tuples, it is easiest to reserve the last element of each tuple for the result at every recursion level. Text and numeric data do not have to be differentiated for this algorithm to run. The algorithm takes all the existing rows and assumes the last column is the target value. A training/testing split of the dataset is used when applying the model; usually a 70/30% training/testing split is used in this regard.
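To make the 70/30 split and a prediction concrete, here is one more sketch continuing the examples above; the service-request rows, their columns, and the relief-time buckets are entirely hypothetical:

    import random

    def train_test_split(rows, train_fraction=0.7, seed=42):
        # Shuffle and cut the rows into training and testing partitions
        shuffled = rows[:]
        random.Random(seed).shuffle(shuffled)
        cut = int(len(shuffled) * train_fraction)
        return shuffled[:cut], shuffled[cut:]

    def classify(node, row):
        # Walk the tree until a leaf is reached, then report its target counts
        if node.results is not None:
            return node.results
        value = row[node.column]
        if isinstance(value, (int, float)):
            branch = node.tb if value >= node.value else node.fb
        else:
            branch = node.tb if value == node.value else node.fb
        return classify(branch, row)

    # Hypothetical rows: (category, priority, reassignments, relief time)
    requests = [('network', 'high', 2, '24h'),
                ('storage', 'low', 0, '4h'),
                ('compute', 'high', 1, '24h'),
                ('network', 'low', 0, '4h')]
    train, test = train_test_split(requests)
    tree = build_tree(train)
    print(classify(tree, test[0]))   # e.g. Counter({'24h': 1})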