Thursday, April 8, 2021

 Applications of Data Mining to Reward points collection service

Continuation of use cases:   

Beyond the platform metrics that help monitor and troubleshoot the production deployment of machine learning systems, the model itself has performance and quality metrics that can be used to evaluate and tune it. These metrics and key performance indicators can be domain-specific, such as accuracy (the ratio of correct predictions to total predictions), the confusion matrix of positive and negative predictions across all class labels in classification, the Receiver Operating Characteristic (ROC) curve and the area under it (AUC), the F1 score combining precision and recall, cross-entropy loss, mean squared error, and mean absolute error. These post-processing steps on predictions are just as important as the data preparation steps for a well-performing model.
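As a minimal sketch of a few of these metrics, the snippet below computes accuracy, the binary confusion-matrix counts, precision, recall, and the F1 score from scratch; the labels are illustrative data only, and in practice a library such as scikit-learn would supply these functions.

```python
def evaluate(actual, predicted):
    """Compute accuracy, confusion-matrix counts, precision, recall, and F1
    for binary labels (1 = positive class, 0 = negative class)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    accuracy = (tp + tn) / len(actual)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "tp": tp, "tn": tn, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}

# Illustrative actual vs. predicted labels.
metrics = evaluate([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 0, 1])
```

The same counts generalize to the multi-class confusion matrix by tallying one cell per (actual, predicted) label pair.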

One of the often-overlooked aspects of deploying machine learning models in production is that good infrastructure alleviates the concerns of model deployment, tuning, and performance improvement. A containerization framework such as Docker combined with a container orchestration framework such as Kubernetes removes concerns about load balancing, scalability, HTTP proxying, ingress control, and monitoring: the model hosted in a container can be scaled out to as many running instances as needed, and the infrastructure is better positioned to meet peak-load demands with no semantic changes to the model.

Features and labels are helpful for the evaluation of the model as well. When the data comes from IoT sensors, it is typically streaming in nature, which makes it suitable for streaming stacks such as Apache Kafka and Apache Flink. The production models are loaded into a scoring pipeline to predict product quality.
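A scoring pipeline over such a stream can be sketched as below. The `score` function is a hypothetical stand-in for a loaded production model, and the sensor readings are illustrative; in a real deployment the stream would be consumed from Kafka and the pipeline could run as a Flink job.

```python
def score(reading):
    """Stand-in model: predict product quality (1 = good, 0 = defective)
    from a single sensor reading. A real pipeline would load a trained
    model artifact instead of this hard-coded rule."""
    return 1 if reading["temperature"] < 80.0 else 0

def scoring_pipeline(stream):
    """Consume sensor readings and yield (reading, predicted_quality) pairs."""
    for reading in stream:
        yield reading, score(reading)

# Illustrative readings standing in for a Kafka topic of IoT sensor events.
readings = [{"sensor_id": "s1", "temperature": 72.5},
            {"sensor_id": "s1", "temperature": 85.1},
            {"sensor_id": "s2", "temperature": 78.0}]
predictions = [label for _, label in scoring_pipeline(iter(readings))]
```

Writing the pipeline as a generator keeps it streaming-friendly: readings are scored one at a time as they arrive rather than collected into a batch.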

Drift is monitored by joining actual product quality labels with predicted quality labels and summarizing them over a time window to trend model quality. Multiple such KPIs can be used to cover the scoring criteria. For example, a lagging indicator could determine whether the actual labels arrive delayed compared to the predicted labels. Thresholds can be set on the delay to capture the acceptable business requirements and to trigger a notification or alert.
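The join-and-summarize step above can be sketched as follows. The record layout, timestamps, and the two-hour lag threshold are all assumptions for illustration: predicted and actual labels are keyed by item id, windowed accuracy trends model quality, and an alert fires when an actual label arrives later than the threshold allows.

```python
from datetime import datetime, timedelta

# Hypothetical records: item id -> (label, timestamp). Actual labels may
# arrive late, or not at all, relative to the predictions.
predicted = {"a": (1, datetime(2021, 4, 8, 10, 0)),
             "b": (0, datetime(2021, 4, 8, 10, 5)),
             "c": (1, datetime(2021, 4, 8, 10, 10))}
actual = {"a": (1, datetime(2021, 4, 8, 10, 30)),
          "b": (1, datetime(2021, 4, 8, 12, 45))}

MAX_LAG = timedelta(hours=2)  # assumed business threshold for label delay

def drift_summary(predicted, actual):
    """Join predicted and actual labels, compute accuracy over the matched
    window, and flag items whose actual label breached the lag threshold."""
    matched = correct = 0
    alerts = []
    for key, (pred_label, pred_ts) in predicted.items():
        if key not in actual:
            continue  # actual label has not arrived yet
        act_label, act_ts = actual[key]
        matched += 1
        correct += int(pred_label == act_label)
        if act_ts - pred_ts > MAX_LAG:
            alerts.append(key)  # lagging indicator exceeded the threshold
    accuracy = correct / matched if matched else None
    return accuracy, alerts

accuracy, alerts = drift_summary(predicted, actual)
```

Running this summary once per time window and plotting the accuracy values gives the model-quality trend described above, while the alert list drives notifications.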


The following chart compares the data mining algorithms discussed, including neural networks: https://1drv.ms/w/s!Ashlm-Nw-wnWxBFlhCtfFkoVDRDa?e=aVT37e


Thank you.
