Tuesday, January 12, 2021

Predicting relief time on service tickets – nuances between decision tree and time-series algorithms – a data science essay

 

Problem statement: Service requests are opened by customers who report a problem and request mitigation. The IT department is a magnet for virtually all computing, storage, and networking tasks requested by a company's employees. These requests usually far exceed what the IT team can resolve promptly. IT teams therefore look for automation that can provide self-service capabilities to users and set their expectations. One approach to inform the ticket creator is to estimate the relief time on the opened request. A decision tree allows this prediction to be made from the attributes of earlier requests. A time-series algorithm with an auto-regressive model is also useful for predicting relief time. There are nuances between the two that this article explores.

Solution: The decision tree serves as both a classification and a regression tree. A function divides the rows into two datasets based on the value of a specific column: one of the returned sets matches the split criterion while the other does not. When the attribute to split on is clear, this works well.

To see how good an attribute is, the entropy of the whole group is calculated. Then the group is divided by the possible values of each attribute and the entropy of the two new groups is calculated. To determine which attribute is best to divide on, the information gain is calculated: the difference between the current entropy and the weighted-average entropy of the two new groups. The algorithm calculates the information gain for every attribute and chooses the one with the highest information gain.

Each set is subdivided only if the recursion of the above step can proceed. The recursion terminates when a solid conclusion has been reached, that is, when the information gain from splitting a node is no more than zero. Otherwise the branches keep dividing, creating a tree by calculating the best attribute for each new node. If a threshold for entropy is set, the decision tree is 'pruned'.
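The entropy and information-gain steps above can be sketched in a few lines. This is a minimal illustration, not a full tree builder; the `tickets` rows and their attribute values are hypothetical, and the class label is assumed to be the last column of each row.

```python
from collections import Counter
from math import log2

def entropy(rows):
    # Shannon entropy of the class labels (last column of each row)
    counts = Counter(row[-1] for row in rows)
    total = len(rows)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def split(rows, col, value):
    # Divide the rows into a set that matches the criterion and one that does not
    match = [r for r in rows if r[col] == value]
    rest = [r for r in rows if r[col] != value]
    return match, rest

def information_gain(rows, col, value):
    match, rest = split(rows, col, value)
    p = len(match) / len(rows)
    # Gain = current entropy minus weighted-average entropy of the two new groups
    return entropy(rows) - p * entropy(match) - (1 - p) * entropy(rest)

# Hypothetical ticket rows: (category, kb_present, relief_class)
tickets = [
    ("network", "yes", "fast"),
    ("network", "no", "slow"),
    ("storage", "yes", "fast"),
    ("storage", "no", "slow"),
]
gain = information_gain(tickets, 1, "yes")  # gain of splitting on kb_present
```

The algorithm would evaluate `information_gain` for every attribute/value pair and split on the one with the highest gain, recursing until the gain is no more than zero.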


Monday, January 11, 2021

The role of linear regression continued..

Service Requests have many attributes in addition to category. These attributes include joins with other tables that describe the affected product, its version, the number of customers impacted, and the like. When the service requests are filtered by one or more of these attributes, they tend to form scatter plots suitable for linear regression. It is very helpful to use these linear regressions to predict the next data point, especially given that the data points progress along the timeline.

The prediction parameter need not be restricted to relief time. It can be any parameter that affects the next service request: cost, duration, knowledge base, or any suitable attribute that gives the customer an indication up front. Opening a service request signals a concern, and any information that alleviates such concerns by setting expectations or providing self-service hints will be tremendously appreciated. For example, if the customer knows an estimate for the relief time of the incident, based on the linear regression of past service requests of this nature, then the customer can wait to poll the resolver. If the duration of a specific step of the relief is indicated, the customer may even willingly wait longer.


The number of incidents in certain filtered categories might be very small. A linear regression on a small data set is prone to error, but the regression can be reevaluated as more data points accrue in that narrow category. This calls for an application that applies the linear regression automatically. Such auto-regressors may even be run on every new data point.
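One way to run such an auto-regressor on every new data point is to keep running sums, so the least-squares fit can be recomputed cheaply each time a ticket closes. The class below is a hypothetical sketch; the day/relief-hour values are made up for illustration.

```python
class AutoRegressor:
    """Refits a simple linear regression each time a data point accrues,
    by maintaining running sums. A hypothetical sketch, not production code."""

    def __init__(self):
        self.n = self.sx = self.sy = self.sxx = self.sxy = 0.0

    def add(self, x, y):
        # Accumulate the sums needed for the closed-form least-squares fit
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y

    def fit(self):
        # Closed-form slope and intercept from the running sums
        denom = self.n * self.sxx - self.sx ** 2
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return slope, intercept

reg = AutoRegressor()
# Hypothetical (day, relief_hours) points accruing in a narrow category
for day, relief_hours in [(1, 2.0), (2, 4.1), (3, 5.9), (4, 8.0)]:
    reg.add(day, relief_hours)          # refit is cheap after every new ticket
slope, intercept = reg.fit()
```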

Sunday, January 10, 2021

The role of linear regression continued..

Non-linear equations can also be "linearized" by selecting a suitable change of variables. This is quite popular because it makes the analysis simpler, but reducing the dimensions can distort the error structure. It oversimplifies the model, violates key assumptions, and impacts the resulting parameter values. All of this contributes to incorrect predictions and is best avoided. Non-linear least-squares analysis has well-defined techniques that are not too difficult with computing. Therefore, it is better to perform non-linear least-squares analysis when dealing with non-linear inverse models.

Linear regression differs from correlation. The correlation coefficient describes the strength of the association between two variables. If the two variables increase together, the correlation coefficient tends to +1. If one decreases as the other increases, it tends to -1. If they are not related to one another, it stays at zero. In addition, the correlation coefficient can be related to the results of the regression. This is helpful because we now find a correlation not between parameters but between our notions of cause and effect. It also lets us use the correlation between any x and y that are not necessarily independent and dependent variables. This follows from the fact that the correlation coefficient (denoted by r) is symmetric in x and y, which differentiates the coefficient from the regression.
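The symmetry of r in x and y is easy to verify directly from its definition. The sketch below computes the coefficient from means and deviations; the sample series are made up for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    # Correlation coefficient: covariance divided by the product of the
    # standard deviations. Note the formula is symmetric in x and y.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sdx = sqrt(sum((a - mx) ** 2 for a in x))
    sdy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sdx * sdy)

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]   # increases together with x, so r tends to +1
r_xy = pearson_r(x, y)
r_yx = pearson_r(y, x)  # same value: the roles of x and y are interchangeable
```

Unlike the regression, swapping which variable we call dependent leaves r unchanged.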

Service Requests have many attributes in addition to category. These attributes include joins with other tables that describe the affected product, its version, the number of customers impacted, and the like. When the service requests are filtered by one or more of these attributes, they tend to form scatter plots suitable for linear regression. It is very helpful to use these linear regressions to predict the next data point, especially given that the data points progress along the timeline.


Friday, January 8, 2021

The role of linear regression in IT service requests:

Problem statement: The relief times for IT service requests can land anywhere on the chart, ranging from a few seconds to well over months. The nature of the service request, its priority, and its severity determine the response toward remediation. Since requests come from all quarters, the relief times do not necessarily follow a line. Linear regression suits data that lies somewhat coherently along a line on the chart. Let us see how to apply linear regression.


Solution: IT service requests demonstrate such a distribution in specific categories. Even when requests in the same category demand different resolutions, the relief times are bounded and can be plotted along the timeline. One of the best advantages of linear regression is prediction with time as the independent variable. When the data points have many factors contributing to their occurrence, a linear regression gives an immediate ability to predict where the next occurrence may fall. This is far easier than coming up with a model that fits all the data points well. It gives an indication of the trend, which is generally more helpful than the data points themselves. Also, a scatter plot varies in only one dependent variable in conjunction with the independent variable. This lets us pick the dimension we consider fitting for the linear regression, independent of the others. Lastly, linear regression also indicates how closely the data adheres to the trend via the estimation of errors.


To determine the best parameters for the slope and intercept of the line, we calculate the partial derivatives of the squared error with respect to them and set them to zero. This yields two equations to be solved for the two unknowns. The standard error of the estimate quantifies the standard deviation of the data at a given value of the independent variable. The standard errors of the slope and intercept can be used to place confidence intervals.
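The closed-form solution of those two equations, together with the standard errors, can be sketched as follows. The relief-time values in the example are hypothetical.

```python
from math import sqrt

def fit_line(x, y):
    """Least-squares slope and intercept, obtained by setting the partial
    derivatives of the squared error to zero, plus the standard errors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Standard error of the estimate: spread of the data about the fitted line
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    se = sqrt(sum(r * r for r in residuals) / (n - 2))
    # Standard errors of slope and intercept, used to place confidence intervals
    se_slope = se / sqrt(sxx)
    se_intercept = se * sqrt(1 / n + mx ** 2 / sxx)
    return slope, intercept, se, se_slope, se_intercept

# Hypothetical relief times (hours) against request sequence along the timeline
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept, se, se_slope, se_intercept = fit_line(x, y)
```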

Thursday, January 7, 2021

Applying Naive Bayes ... (continued)

Comparing the decision tree and the time-series algorithm to the Naïve Bayes classifier, it is easy to see that while those two algorithms work with new rows, the Bayes classifier works with the attributes of the rows against the last column as the predictor. Although linear regressions are useful for predicting a variable, Naïve Bayes builds on conditional states across attributes and is easy to visualize, which allows experts to show the reasoning process and allows users to judge the quality of prediction. All these algorithms need training data in our use case, but Naïve Bayes uses it for explorations and predictions based on earlier requests, such as determining whether the self-help was useful or not, evaluating both probabilities conditionally.

The conditional probability can be used both for exploration as well as for prediction. Each input column in the dataset has a state calculated by this algorithm which is then used to assign a state to the predictable column.  For example, the availability of a Knowledge Base article might show a distribution of input values significantly different from others which indicates that this is a potential predictor. 

The viewer also provides values for the distribution so that KB articles that suggest opening service requests with specific attributes will be easier to follow, act upon and get resolution. The algorithm can then compute a probability both with and without that criteria.  

All that the algorithm requires is a single key column, input columns as independent variables, and at least one predictable column. A sample query for viewing the information maintained by the algorithm for a particular attribute would look like this:

SELECT NODE_TYPE, NODE_CAPTION, NODE_PROBABILITY, NODE_SUPPORT, NODE_SCORE FROM NaiveBayes.CONTENT WHERE ATTRIBUTE_NAME = 'KB_PRESENT';  

Wednesday, January 6, 2021

Applying Naïve Bayes data mining technique


The Naïve Bayes algorithm is a statistical, probability-based data mining algorithm and is considered somewhat easier to understand and visualize than others in its family.

Probability is simply the fraction of interesting cases out of total cases. Bayes probability is a conditional probability that adjusts the probability based on a premise. If the premise takes a factor into account, we get one conditional probability; if we don't take the factor into account, we get another. Naïve Bayes builds on conditional states across attributes and is easy to visualize. This allows experts to show the reasoning process and users to judge the quality of prediction. All these algorithms need training data in our use case, but Naïve Bayes uses it for explorations and predictions based on earlier requests, such as determining whether the self-help was useful or not, evaluating both probabilities conditionally.

This is widely used for cases where conditions apply, especially binary conditions such as with or without. If the input variables are independent, if their states can be calculated as probabilities, and if there is at least one predictable output, this algorithm can be applied. The simplicity of computing states, by counting occurrences of each input variable's values for each class and then displaying those states against those variables for a given value, makes this algorithm easy to visualize, debug, and use as a predictor.
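The counting described above can be sketched as follows. The ticket attributes (KB article present, priority) and the "was self-help useful" label are hypothetical, and the sketch omits refinements such as smoothing for unseen attribute values.

```python
from collections import defaultdict

def train(rows):
    # Count class frequencies and, within each class, the occurrences of
    # each (column index, value) pair of the input attributes
    class_counts = defaultdict(int)
    attr_counts = defaultdict(lambda: defaultdict(int))
    for *attrs, label in rows:
        class_counts[label] += 1
        for i, v in enumerate(attrs):
            attr_counts[label][(i, v)] += 1
    return class_counts, attr_counts

def predict(class_counts, attr_counts, attrs):
    # Score each class by prior * product of conditional probabilities;
    # no smoothing, so an unseen value zeroes out a class
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        p = count / total
        for i, v in enumerate(attrs):
            p *= attr_counts[label][(i, v)] / count
        scores[label] = p
    return max(scores, key=scores.get)

# Hypothetical tickets: (kb_present, priority) -> was self-help useful?
tickets = [
    ("yes", "low", "useful"),
    ("yes", "low", "useful"),
    ("yes", "high", "useful"),
    ("no", "high", "not_useful"),
    ("no", "low", "not_useful"),
]
model = train(tickets)
prediction = predict(*model, ("yes", "low"))
```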

The conditional probability can be used both for exploration as well as for prediction. Each input column in the dataset has a state calculated by this algorithm which is then used to assign a state to the predictable column.  For example, the availability of a Knowledge Base article might show a distribution of input values significantly different from others which indicates that this is a potential predictor.  

The viewer also provides values for the distribution so that KB articles that suggest opening service requests with specific attributes will be easier to follow, act upon, and get resolution. The algorithm can then compute a probability both with and without that criteria.   

Tuesday, January 5, 2021

Performing Association Data mining on IT service requests continued ...

We were discussing that association data mining allows IT users to see helpful messages such as “users who opened a ticket for this problem type also opened a ticket for this other problem type”. This article describes the implementation aspect of this data mining technique.  

Evaluating the three metrics for each association results in an Association.content table where product pairs have support, confidence, and lift. The associations can then be filtered to those with a lift > 1.0.

The apriori algorithm works from the superset down to the associations with the required sizes for the antecedent and consequent itemsets. In this case, both itemsets have size 1. The idea behind the apriori algorithm is that if including an item in an itemset does not increase the lift of that itemset, it will not increase the lift of any itemset formed from it that contains that item. This way the cartesian products of antecedent-consequent itemsets can be trimmed by eliminating those consequents where that item is present. Consider several layers of cartesian products where the consequent itemset grows by one at each layer, with one or more associations eliminated at each layer; the consequent then grows to the optimum size in the final set. In our case, we require only one layer, and the associations can be sorted in descending order of lift.

SELECT A.name, B.name FROM Associations.Content
ORDER BY lift_y_x DESC
LIMIT 10;
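The three metrics behind that query can be sketched for a single size-1 antecedent and consequent. The ticket problem types per customer below are hypothetical.

```python
def association_metrics(transactions, x, y):
    """Support, confidence, and lift for the rule x -> y over a list of
    itemsets (here, the problem types each customer opened tickets for)."""
    n = len(transactions)
    n_x = sum(1 for t in transactions if x in t)
    n_y = sum(1 for t in transactions if y in t)
    n_xy = sum(1 for t in transactions if x in t and y in t)
    support = n_xy / n            # how often x and y occur together
    confidence = n_xy / n_x       # how often y occurs given x
    lift = confidence / (n_y / n) # > 1.0 means x makes y more likely
    return support, confidence, lift

# Hypothetical problem types per customer
tickets = [
    {"vpn", "email"},
    {"vpn", "email"},
    {"vpn"},
    {"email"},
    {"storage"},
]
support, confidence, lift = association_metrics(tickets, "vpn", "email")
```

A rule with lift above 1.0, like this one, is the kind that survives the filtering step and backs a message such as "users who opened a ticket for this problem type also opened a ticket for this other problem type."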


Sample Implementation: https://jsfiddle.net/g2snw4da/