Cluster computing

Applications of Data Mining to Reward points collection service

Problem statement: As with any data collected by a web service, analysis is not restricted to mere queries on the accumulated data. Deep learning techniques and data mining provide insights that can empower organizations beyond the employee appreciation for which the service is used. We review some of these uses cases in this article.

Solution: Data mining is a tried and tested method for gaining insights into relational data. The Reward points are collected with a relation between users and their accumulated peer appreciation. Standard mining techniques such as Clustering, sequence mining, decision tree, regression, segmentation, and association algorithms provide a lot of insights. When there are many choices between data mining algorithms that can be applied to a given dataset, it might require some exploration of the data.

If the use case was well articulated, the choice for the data mining algorithm becomes immediately clear. The use case becomes clear only when the data is well-known and the objective for the business purpose is known. Usually, only the latter is mentioned such as the prediction of an attribute associated with the data. For example, the dataset, suitable for supervised learning could have labels that are best determined with some exploration of training data. These techniques are required to determine the rules with which to assign labels to the raw data. If the rules were available for business purposes, then the assignment of labels is merely an automation task and helps prepare the training set for the data.

In the absence of business rules to assign labels to the data, the dataset for data mining is usually large and cannot be compared by mere inspection. Some visualization tools are necessary. In this regard, two algorithms stand out for making this task easier. First, the decision tree algorithm can be used to find the relationships between the rows, and the visualization in the form of attributes that are significant to the outcome can be established. The tree can be pruned to see which attributes matter and which do not matter. The split of the nodes on each level helps visualize the relative strength of those attributes across rows. This is very helpful when the tree is generated without supervision.

The other algorithm is the use of the Naive Bayes Classifier to assign data. This classifier is helpful to explore data, finding relationships between input columns and predictable columns, and then using the initial exploration to create additional algorithms. Since it compares across columns for a given row, it evaluates the binary probabilities for with and without that attribute in each column.

Together these attributes can help with the initial exploration of data to choose the right algorithm for a given purpose. Usually, the split between training data and test data for the purpose of prediction, is 70% for training data and 30% for test data.

Some use cases that stand out from the others for reward points collection service include the following. Classification algorithms are useful for the categorization of reward point assignments based on source attributes. The primary use case is to see clusters of appreciation patterns that match based on attributes. By translating to a vector space and assessing the quality of a cluster with a sum of squares of errors, it is easy to analyze a large number of grants as belonging to specific clusters which can provide insight to management for group dynamics. Reward points grant for a user demonstrate elongated scatter plots in specific categories. Even when the grants come from varying contexts, the time to next appreciation can be plotted along the timeline. One of the best advantages of linear regression is the prediction of time as an independent variable. When the data point has many factors contributing to their occurrence, a linear regression gives an immediate ability to predict where the next occurrence may happen. This is far easier to do than come with up a model that behaves like a good fit for all the data points. Customer segmentation based on reward points is a very common application of this algorithm. It helps prioritize the response to certain customers. Association data mining allows the management of an organization to see helpful patterns such as “employees who appreciated this user also appreciated this other user”. Sequence clustering can be used for patterns of appreciation from the same user. With these examples, it is possible for organizations to understand and appreciate what may be missing from mere performance evaluation from the organizational hierarchy.

Conclusion: The rewards point service is an investment for the organization where the costs and benefits are both improved by promoting organizational health, satisfaction, and productivity. The use of data mining algorithms on this data also empowers the management with better knowledge of group dynamics.

Cluster computing

Monday, March 29, 2021

No comments:

Post a Comment