Tuesday, January 19, 2021

When-to-use-what data mining algorithms:

 

Outliers Mining Algorithm 

Outliers are the rows that are most dissimilar from the rest. Given a relation R(A1, A2, ..., An) and a similarity function between rows of R, find the rows in R that are dissimilar to most points in R. The objective is to maximize the aggregate dissimilarity function under a constraint on the number of outliers, or on their significance if one is given.
The choices for similarity measures between rows include distance functions such as the Euclidean (L2) metric, Manhattan distance, string edit distance, and graph distance. The choices for aggregate dissimilarity measures include the distance to the K nearest neighbors, the density of the neighborhood falling outside the expected range, and the attribute differences with nearby neighbors.

The steps to determine outliers can be listed as: 1. cluster R, for example via K-means; 2. compute the distance of each tuple in R to its nearest cluster center; and 3. choose the top-K rows, or those with scores outside the expected range. A sketch of these steps follows below. Finding outliers manually is often impossible because the volume of cases is quite high. Outliers are important for discovering new strategies to encompass them: if there are numerous outliers, they significantly increase organizational costs, and if there are not, the patterns help identify efficiencies.
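A minimal sketch of those three steps with scikit-learn's KMeans (an assumption of this example; the feature values and cluster count are hypothetical), scoring each row by its distance to the nearest cluster center:

import numpy as np
from sklearn.cluster import KMeans

# Hypothetical numeric feature vectors for the rows of R.
X = np.array([[1, 2], [1, 3], [2, 2], [8, 8], [9, 8], [2, 3], [50, 50]])

# Step 1: cluster the regular rows via K-means.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Step 2: distance of each tuple to its nearest cluster center.
distances = kmeans.transform(X).min(axis=1)

# Step 3: the top-K rows by distance are the outlier candidates.
K = 1
print(np.argsort(distances)[::-1][:K])  # [50, 50] is farthest from any center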

Decision tree 

This is probably one of the most heavily used and easiest-to-visualize mining algorithms. The decision tree serves as both a classification and a regression tree. A function divides the rows into two datasets based on the value of a specific column, returning two lists of rows such that one set matches the criteria for the split while the other does not, as in the sketch below. This works well when the attribute to split on is clear.
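A minimal sketch of that split function in pure Python, with hypothetical row data; numeric columns split on a threshold and categorical columns on equality:

def divide_rows(rows, column, value):
    """Split rows into (matching, non_matching) on one column's value."""
    if isinstance(value, (int, float)):
        matches = lambda row: row[column] >= value
    else:
        matches = lambda row: row[column] == value
    set1 = [row for row in rows if matches(row)]
    set2 = [row for row in rows if not matches(row)]
    return set1, set2

# Hypothetical service-request rows: [priority, category, relief_hours]
rows = [[1, "network", 4], [3, "email", 2], [2, "network", 6]]
print(divide_rows(rows, 0, 2))  # rows with priority >= 2 vs. the rest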

A Decision Tree algorithm uses the attributes of the service requests to make a prediction, such as the relief time on a case resolution. The ease of visualizing the split at each level helps throw light on the importance of those attributes. This information becomes useful both to prune the tree and to draw it.
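A minimal sketch with scikit-learn's DecisionTreeRegressor (an assumption; the encoded attributes, relief times, and the depth limit standing in for pruning are all hypothetical):

import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# Hypothetical encoded request attributes: [priority, category_code, kb_present]
X = np.array([[1, 0, 1], [3, 1, 0], [2, 0, 1], [3, 0, 0], [1, 1, 1]])
y = np.array([4.0, 12.0, 6.0, 10.0, 3.0])  # relief time in hours

# max_depth caps the tree, a simple stand-in for pruning.
tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)

# Draw the splits to see which attributes matter most.
print(export_text(tree, feature_names=["priority", "category_code", "kb_present"]))
print(tree.feature_importances_)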

Monday, January 18, 2021

When-to-use-what data mining algorithms:

 

Segmentation algorithms

A segmentation algorithm divides data into groups, or clusters, of items that have similar properties.

Customer segmentation based on the service-request feature set is a very common application of this algorithm. It helps prioritize the response to certain customers.
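A minimal sketch of such segmentation with scikit-learn's KMeans, assuming a hypothetical per-customer feature set:

import numpy as np
from sklearn.cluster import KMeans

# Hypothetical per-customer features: [tickets_per_month, avg_severity]
X = np.array([[1, 1], [2, 1], [9, 4], [10, 5], [2, 2], [11, 4]])

segments = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(segments)  # e.g., frequent high-severity customers vs. the rest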

Association algorithms

This is used for finding correlations between different attributes in a data set.

Association data mining allows users to see helpful messages such as “users who opened a ticket for this problem type also opened a ticket for this other problem type”.

Sequence Analysis Algorithms

This is used for finding groups via paths in sequences. A Sequence Clustering algorithm is like the clustering algorithm mentioned above, but instead of finding groups based on similar attributes, it finds groups based on similar paths in a sequence. A sequence is a series of events; for example, a series of web clicks by a user is a sequence. It can also be compared to the IDs of any sortable data maintained in a separate table. Usually, there is support for a sequence column. The sequence data has a nested table that contains a sequence ID, which can be any sortable data type.

This is very useful for finding sequences of service requests opened across customers. For example, a network failure could result in a database connection failure, which could lead to an application failure. Determining such sequences in a data-driven manner helps find new sequences and target them proactively, even suggesting them to the customers who open the requests so that they can be better prepared.

Sequence Analysis also helps with ChatBot experience for IT users as described here.

Sunday, January 17, 2021

When-to-use-what data mining algorithms:

 


The following table summarizes the use of data mining algorithms as it pertains to the service requests that an IT department receives from the employees of an organization. The idea in data mining is to apply a data-driven, inductive, backward technique to identify a model. This differs from forward deductive methods, which build the model first, deduce conclusions, and then match them against the data; if there is a mismatch between the model's prediction and reality, the model is tuned.

Data Mining Algorithm: Classification algorithms

Description: This is useful for finding similar groups based on discrete variables. It is used for true/false binary classification, and multiple-label classification is also supported. There are many techniques, but the data should either have distinct regions on a scatter plot with their own centroids or, if that is hard to tell, be scanned breadth-first for neighbors within a given radius, forming trees, or leaves if they fall short.

 

Use case: Useful for categorization of service requests beyond the nomenclature. The primary use case is to see clusters of service requests that match based on features. By translating the requests to a vector space and assessing the quality of a cluster with a sum of squared errors, it is easy to analyze a large number of requests as belonging to specific clusters from a management perspective.
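When the centroids are hard to tell apart, the breadth-first neighborhood scan mentioned in the description can be sketched with scikit-learn's DBSCAN (the radius and feature values are hypothetical):

import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical service-request feature vectors.
X = np.array([[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [25, 25]])

# eps is the neighborhood radius; rows without enough neighbors get label -1.
print(DBSCAN(eps=2.0, min_samples=2).fit_predict(X))  # [0 0 0 1 1 -1]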

Data Mining Algorithm: Regression algorithms

Description: This is very useful to calculate a linear relationship between a dependent and an independent variable, and then use that relationship for prediction.

Use case: IT service requests demonstrate elongated scatter plots in specific categories. Even when the service requests come demanding different resolutions in the same category, the relief times are bounded and can be plotted along the timeline. One of the best advantages of linear regression is prediction with time as the independent variable. When the data points have many factors contributing to their occurrence, a linear regression gives an immediate ability to predict where the next occurrence may happen. This is far easier than coming up with a model that behaves like a good fit for all the data points.
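A minimal sketch of that prediction with scikit-learn's LinearRegression, regressing hypothetical relief times against the day they were observed:

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical relief times (hours) observed on successive days.
days = np.array([[1], [2], [3], [4], [5]])
relief = np.array([4.0, 4.5, 5.0, 5.2, 5.8])

model = LinearRegression().fit(days, relief)
print(model.predict([[6]]))  # projected relief time for day 6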

Saturday, January 16, 2021

Predicting relief time on service tickets – nuances between decision tree and time-series algorithms – a data science essay (continued...)

Logistic regression differs from the regression techniques in its use of statistical measures. Regression is very useful to calculate a linear relationship between a dependent and an independent variable, and then use that relationship for prediction. IT service requests demonstrate elongated scatter plots in specific categories. Even when the service requests come demanding different resolutions in the same category, the relief times are bounded and can be plotted along the timeline. One of the best advantages of linear regression is prediction with time as the independent variable. When the data points have many factors contributing to their occurrence, a linear regression gives an immediate ability to predict where the next occurrence may happen. This is far easier than coming up with a model that behaves like a good fit for all the data points.

Another use of the statistical regression technique in the data mining of IT service tickets is the case when the factors are beyond the control of the IT department, such as holidays, human resources, response to high-severity incidents and outages, critical vulnerability response, and other parameters that are not part of the routine. It is easier to parameterize these by their probabilities and compare them with a model that otherwise considers only the routine response times.

Thus, we see that the choice of a data mining algorithm is strictly based on the articulation of its associated use case.

Comparisons between other high-level algorithms are described here: https://1drv.ms/w/s!Ashlm-Nw-wnWxBFlhCtfFkoVDRDa?e=aVT37e


Friday, January 15, 2021

Predicting relief time on service tickets – nuances between decision tree and time-series algorithms – a data science essay (continued...)

Associations work for sequences also. A Sequence Clustering algorithm is like the clustering algorithm mentioned above, but instead of finding groups based on similar attributes, it finds groups based on similar paths in a sequence. A sequence is a series of events; for example, a series of web clicks by a user is a sequence. It can also be compared to the IDs of any sortable data maintained in a separate table. Usually, there is support for a sequence column. The sequence data has a nested table that contains a sequence ID, which can be any sortable data type.

It is a hybrid algorithm that combines clustering techniques with Markov chain analysis. A Markov chain, or model, is a directed graph that stores the transitions between different states, and the graph is sufficient to capture the transitions for all sequences. This algorithm examines all the transition probabilities and calculates the distance between all possible sequences in a dataset in order to determine which are best for clustering. From these candidates, it uses the sequence information as an input for clustering, just as centroids anchor coherent clusters. Any metric can be used for the clustering, and Expectation Maximization suits it well.

When a model is trained, the results can be stored as a set of patterns. The most common sequences in the data are used to predict the next likely step of a new sequence, as in the sketch below.
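A minimal sketch of the Markov-chain half of this approach: the snippet below estimates first-order transition probabilities from observed ticket sequences and predicts the most likely next step (the event names and sequences are hypothetical):

from collections import Counter, defaultdict

# Hypothetical ticket sequences observed across customers.
sequences = [
    ["network_failure", "db_connection_failure", "app_failure"],
    ["network_failure", "db_connection_failure", "app_failure"],
    ["network_failure", "app_failure"],
]

# Count first-order transitions: state -> next state.
transitions = defaultdict(Counter)
for seq in sequences:
    for current, nxt in zip(seq, seq[1:]):
        transitions[current][nxt] += 1

def next_step(state):
    """Most likely next event after the given state, with its probability."""
    counts = transitions[state]
    event, count = counts.most_common(1)[0]
    return event, count / sum(counts.values())

print(next_step("network_failure"))  # ('db_connection_failure', 0.66...)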

The sequence clustering algorithm allows several fine-tunings. These include controlling the number of clusters, reducing the number of sequences included as attributes, grouping related attributes to keep the model simpler, controlling the length of the sequences, programmatically reducing the value of n in the n-order Markov chain, and storing only the probabilities that exceed a threshold. Using a Recurrent Neural Network, the state from the sequences can be used to build both an encoder and a decoder.

There are hints we can provide the model about the data, such as that a column value cannot be null or that it may be missing or existing. The sequence information is stored as a nested table, and it must have a single Key Sequence column. Both the case table and the nested table are sorted in ascending order on the key that relates the tables.

Logistic regression can also be applied to IT requests. This is a form of regression that supports binary outcomes. It uses statistical measures, is highly flexible, takes any kind of input, and supports different analytical tasks. This regression folds in the effects of extreme values and evaluates several factors that affect a pair of outcomes. IT requests based on demographics can be used to predict the likelihood of a category of request from a customer. It can also be used for finding repetitions in requests.
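A minimal sketch with scikit-learn's LogisticRegression, assuming hypothetical numeric features per request and a binary label for whether the request is a repeat:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical features: [requester_tenure_years, prior_ticket_count]
X = np.array([[0.5, 1], [1.0, 4], [3.0, 0], [4.0, 6], [6.0, 1], [7.0, 8]])
y = np.array([0, 1, 0, 1, 0, 1])  # 1 = repeat request, 0 = first occurrence

model = LogisticRegression().fit(X, y)
print(model.predict_proba([[2.0, 5]])[:, 1])  # probability of a repeat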


Thursday, January 14, 2021

Predicting relief time on service tickets – nuances between decision tree and time-series algorithms – a data science essay (continued...)

The viewer also provides values for the distribution so that KB articles that suggest opening service requests with specific attributes will be easier to follow, act upon, and reach resolution. The algorithm can then compute a probability both with and without that criterion.

All that the algorithm requires is a single key column, input columns (the independent variables), and at least one predictable column. A sample query for viewing the information maintained by the algorithm for a particular attribute would look like this:

SELECT NODE_TYPE, NODE_CAPTION, NODE_PROBABILITY, NODE_SUPPORT, NODE_SCORE
FROM NaiveBayes.CONTENT
WHERE ATTRIBUTE_NAME = 'KB_PRESENT';

The use of Bayesian conditional probability is not restricted just to this classifier; it can be used in Association data mining as well. The classifiers differ in the metrics used and the nature of the prediction. Association data mining relies on the computation of two measures, namely Support and Probability. Support defines the percentage of cases in which a rule must exist before it is considered valid; for example, a rule must be found in at least 1 percent of cases.

Probability defines how likely an association must be before it is considered valid; for example, an association might be considered only if it has a probability of at least 10 percent.

Associations have association rules formed with a pair of antecedent and consequent itemsets, so named because we want to find the support for taking one item with another. Let I be a set of items and T a set of transactions; an itemset is a subset of I whose items occur together in transactions of T. Support(S1) is the fraction of transactions in T containing the itemset S1. For itemsets S1 and S2 drawn from I, the rule S1 -> S2 has support(S1 -> S2) = support(S1 union S2) and confidence(S1 -> S2) = support(S1 union S2) / support(S1). A third metric, lift(S1 -> S2) = confidence(S1 -> S2) / support(S2), is preferred because a popular S1 gives high confidence for almost any S2; lift corrects for that, exceeding 1.0 only when S1 and S2 occur together more often than they would independently.
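A minimal sketch of these three metrics over hypothetical ticket-type transactions (the problem types are illustrative):

# Each transaction is the set of problem types a user opened tickets for.
transactions = [
    {"vpn", "email"},
    {"vpn", "email", "printer"},
    {"vpn"},
    {"email", "printer"},
    {"vpn", "email"},
]

def support(itemset):
    """Fraction of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(s1, s2):
    return support(s1 | s2) / support(s1)

def lift(s1, s2):
    return confidence(s1, s2) / support(s2)

s1, s2 = {"vpn"}, {"email"}
print(support(s1 | s2), confidence(s1, s2), lift(s1, s2))
# 0.6, 0.75, 0.9375: vpn and email co-occur slightly less than independence predicts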

While the Naïve Bayes Classifier predicts an attribute of the test data, association data mining finds associations between data, such as users who opened a ticket for one problem type also opening a ticket for another problem type.

Wednesday, January 13, 2021

Predicting relief time on service tickets – nuances between decision tree and time-series algorithms – a data science essay (continued...)

In this way, a decision tree uses the attributes of the service requests to make a prediction on the relief time. A time-series algorithm, on the other hand, does not need any attributes other than the historical collection of relief times to predict the next relief time. It looks only at the scalar values, regardless of the type of factors playing into the relief time of an individual request. The historical data is utilized to predict an estimate of the incoming event, as if the relief times were a scatter plot along the timeline. Unlike other data mining algorithms that involve additional attributes of the event, this approach uses a single auto-regressive method on the continuous data to make a short-term prediction. The regression is automatically retrained as the data accrue.
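A minimal sketch of that auto-regressive idea, fitting an order-2 model to a hypothetical series of relief times with an ordinary least-squares solve (numpy only; the values are illustrative):

import numpy as np

# Hypothetical historical relief times (hours) for closed tickets, in order.
relief = np.array([4.0, 5.0, 4.5, 6.0, 5.5, 6.5, 6.0, 7.0])

# Order-2 AR model: predict each value from the previous two plus an intercept.
X = np.column_stack([relief[1:-1], relief[:-2], np.ones(len(relief) - 2)])
y = relief[2:]

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef @ [relief[-1], relief[-2], 1.0])  # short-term forecast of the next relief time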

Comparing the decision tree and the time-series algorithms to the Naïve Bayes Classifier, it is easy to see that while those two work with new rows, the Bayes classifier works with the attributes of the rows against a designated column as the predictor. Although linear regressions are useful in the prediction of a variable, Naïve Bayes builds on conditional probabilities across attributes and is easy to visualize, which allows experts to show the reasoning process and users to judge the quality of the prediction. All these algorithms need training data in our use case, but Naïve Bayes uses it for explorations and predictions based on earlier requests, such as determining whether the self-help was useful or not, evaluating both probabilities conditionally.


The conditional probability can be used both for exploration and for prediction. Each input column in the dataset has a state distribution calculated by this algorithm, which is then used to assign a state to the predictable column. For example, the availability of a Knowledge Base article might show a distribution of input values significantly different from the others, which indicates that it is a potential predictor.
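A minimal sketch of those conditional probabilities over hypothetical tickets, counting P(kb_present | resolved_quickly) directly (the column and value names are illustrative):

from collections import Counter

# Hypothetical tickets: (kb_present, resolved_quickly)
tickets = [
    ("yes", True), ("yes", True), ("yes", False),
    ("no", False), ("no", False), ("no", True),
]

# Count input-column states within each class of the predictable column.
joint = Counter(tickets)
class_totals = Counter(quick for _, quick in tickets)

for quick in (True, False):
    p = joint[("yes", quick)] / class_totals[quick]
    print(f"P(kb_present='yes' | resolved_quickly={quick}) = {p:.2f}")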


The viewer also provides values for the distribution so that KB articles that suggest opening service requests with specific attributes will be easier to follow, act upon, and reach resolution. The algorithm can then compute a probability both with and without that criterion.