Friday, January 15, 2021

Predicting relief time on service tickets – nuances between decision tree and time-series algorithms – a data science essay (continued...)

 Associations work for sequences also. A Sequence Clustering algorithm is like a clustering algorithm mentioned above but instead of finding groups based on similar attributes, it finds groups based on similar paths in a sequence.  A sequence is a series of events. For example, a series of web clicks by a user is a sequence. It can be also be compared to the IDs of any sortable data maintained in a separate table. Usually, there is support for a sequence column. The sequence data has a nested table that contains a sequence ID which can be any sortable data type.

It is a hybrid algorithm that combines clustering techniques with Markov chain analysis. A Markov chain/model is a directed graph that stores the transitions between different states. The graph is sufficient to capture transitions for all sequences. This algorithm examines all the transition probabilities and calculates the distance between all possible sequences in a dataset in order to determine those that are best for clustering. From these candidates, it uses the sequence information as an input for clustering just like we have centroids for coherent clusters. Any metric can be used for clustering and Expectation Maximization suits well.

When a model is trained, the results can be stored as set of patterns. The most common sequences in data are used to predict the next likely step of a new sequence. 

The sequence clustering algorithm allows several fine tunings. These include controlling the number of clusters, reducing the number of sequences included as attributes, grouping related attributes for the model to be simpler, controlling the length of the sequences, programmatically reducing the value of n in the n-order Markov chain, storing only the probabilities that exceed the threshold.  Using Recursive Neural Network, the state from the sequences can be used to build both an encoder and a decoder.

There are hints we can provide the model about the data such as the column value cannot be null or that it may be missing or existing.  The sequence information is stored as a nested table and it must have a single Key Sequence column. Both the case table and the nested table are sorted in the ascending order on the key that relates the table.

Logistic regression can also be applied to IT requests. This is a form of regression that supports binary outcomes. It uses statistical measures, is highly flexible, takes any kind of input and supports different analytical tasks. This regression folds the effects of extreme values and evaluates several factors that affects a pair of outcomes. IT requests based on demographics can be used to predict the likelihood of a category of request from a customer. It can also be used for finding repetitions in requests  


No comments:

Post a Comment