Sunday, January 17, 2021

When-to-use-what data mining algorithms:

 


The following table summarizes the use of data mining algorithms as they pertain to the service requests that an IT department receives from the employees of an organization. The idea in data mining is to apply a data-driven, inductive, backward technique to identify a model. This differs from forward, deductive methods, which build a model first, then deduce conclusions, and then match them against the data. If the model's predictions do not match reality, the model is then tuned.

Data Mining Algorithms

Description

Use case

Classification algorithms

This is useful for finding similar groups based on discrete variables.

It is used for true/false binary classification, and multi-label classification is also supported. There are many techniques, but they suit data of one of two shapes. Either the data forms distinct regions on a scatter plot, each with its own centroid, or, when the regions are hard to tell apart, the points can be scanned breadth-first for neighbors within a given radius, growing into trees where enough neighbors are found and remaining leaves where they fall short.

 

Useful for categorizing service requests beyond their existing nomenclature. The primary use case is to see clusters of service requests that match based on their features. By translating the requests into a vector space and assessing cluster quality with the sum of squared errors, it becomes easy to analyze a large number of requests as belonging to specific clusters, which helps from a management perspective.
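The vector-space and sum-of-squared-errors idea above can be sketched as follows. This is a minimal illustration and not a definitive implementation: it assumes k-means as the grouping technique (the table does not name one), and the feature vectors (hours to resolve, number of reassignments) are hypothetical values made up for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Component-wise mean (centroid) of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def nearest(vector, centroids):
    """Index of the centroid closest to the given vector."""
    return min(range(len(centroids)),
               key=lambda c: squared_distance(vector, centroids[c]))


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain k-means; returns (labels, centroids, sum of squared errors)."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)  # start from k distinct data points
    for _ in range(iterations):
        labels = [nearest(v, centroids) for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean_point(members)
    labels = [nearest(v, centroids) for v in vectors]
    sse = sum(squared_distance(v, centroids[lab])
              for v, lab in zip(vectors, labels))
    return labels, centroids, sse


# Hypothetical service requests: (hours to resolve, number of reassignments).
requests = [(1.0, 0.0), (1.5, 1.0), (2.0, 0.0),
            (9.0, 4.0), (10.0, 5.0), (11.0, 4.0)]

labels, centroids, sse = kmeans(requests, k=2)
print("cluster labels:", labels)
print("sum of squared errors:", round(sse, 4))
```

A lower sum of squared errors for the same number of clusters indicates tighter clusters, so the value can be used to compare different choices of features or of k.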

Regression algorithms

This is useful for calculating a linear relationship between a dependent variable and an independent variable, and then using that relationship for prediction.

IT service requests in specific categories produce elongated scatter plots. Even when requests in the same category demand different resolutions, their relief times are bounded and can be plotted along a timeline. A key advantage of linear regression is that it supports prediction with time as the independent variable. When many factors contribute to the occurrence of a data point, a linear regression gives an immediate way to predict where the next occurrence may fall. This is far easier than coming up with a model that fits every data point well.
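As a rough sketch of prediction with time as the independent variable, the example below fits a line by ordinary least squares and extrapolates one step ahead. The weekly relief times are hypothetical numbers chosen for illustration, and the helper names `fit_line` and `predict` are not from any particular library.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Value of the fitted line at x."""
    return intercept + slope * x


# Hypothetical data: week number (time) vs. average relief time in hours
# for one category of service requests.
weeks = [1, 2, 3, 4, 5]
relief_hours = [4.0, 4.5, 5.5, 6.0, 7.0]

slope, intercept = fit_line(weeks, relief_hours)
next_week = predict(slope, intercept, 6)
print("slope:", round(slope, 3), "intercept:", round(intercept, 3))
print("predicted relief time for week 6:", round(next_week, 3))
```

The fit needs only the two series, which is what makes it quick to apply per category before investing in a richer model.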
