Cluster computing

Wednesday, April 7, 2021

Applications of Data Mining to Reward points collection service

Monitoring and pipelines contribute significantly towards streamlining the process and answering questions such as: Why did the model predict this? When was it trained? Who deployed it? Which release was it deployed in? At what time was the production system updated? What were the changes in the predictions? What did the key performance indicators show after the update? Public cloud services have enabled both ML pipelines and their monitoring. Creating a pipeline usually involves the following steps: configuring a workspace and creating a datastore; downloading and storing sample data; registering and using objects for transferring intermediate data between pipeline steps; downloading and registering the model; creating and attaching the remote compute target; writing a processing script; setting up the environment and stack necessary to execute that script; creating the configuration to wrap the script; creating the pipeline step with the above-mentioned environment, compute resource, input and output data, and a reference to the script; and submitting the pipeline. Many of these steps are easily automated with the help of built-in objects published by the public cloud services to build and run such a pipeline. A pipeline is a reusable object, and one that can be invoked over the wire with a web request.
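The shape of such a pipeline can be illustrated with a minimal sketch in plain Python. It does not use any cloud SDK: the names PipelineStep, Pipeline, submit, prepare and score are hypothetical and chosen only for illustration, a dictionary stands in for the datastore, and ordinary functions stand in for the processing scripts. It only shows how steps declare their inputs and outputs and pass intermediate data to one another.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class PipelineStep:
    """One unit of work: a script (here a function) plus named inputs and one output."""
    name: str
    script: Callable[..., Any]
    inputs: List[str]
    output: str


@dataclass
class Pipeline:
    """A reusable, ordered collection of steps that share intermediate data."""
    steps: List[PipelineStep] = field(default_factory=list)

    def submit(self, initial_data: Dict[str, Any]) -> Dict[str, Any]:
        # The dictionary is a stand-in for a datastore (an assumption of this sketch).
        store = dict(initial_data)
        for step in self.steps:
            args = [store[key] for key in step.inputs]
            store[step.output] = step.script(*args)
        return store


def prepare(raw):
    # Data preparation: drop records with a missing points value.
    return [r for r in raw if r["points"] is not None]


def score(rows):
    # Toy stand-in for a model: flag members holding at least 100 points.
    return [{"member": r["member"], "eligible": r["points"] >= 100} for r in rows]


pipeline = Pipeline(steps=[
    PipelineStep("prepare", prepare, inputs=["raw"], output="clean"),
    PipelineStep("score", score, inputs=["clean"], output="predictions"),
])

result = pipeline.submit({"raw": [
    {"member": "a", "points": 120},
    {"member": "b", "points": None},
    {"member": "c", "points": 40},
]})
print(result["predictions"])
```

Because the pipeline object holds only the step definitions, the same object can be submitted again with different input data, which is what makes it reusable behind a web endpoint.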

Machine learning services collect the same kinds of monitoring data as the other public cloud resources. These logs, metrics, and events can then be collected, routed, and analyzed to tune the machine learning model.

Other than the platform metrics that help monitor and troubleshoot issues with the production deployment of machine learning systems, the model itself may have performance and quality metrics that can be used to evaluate and tune it. These metrics and key performance indicators can be domain-specific, or they can be standard measures such as accuracy, which is the ratio of the number of correct predictions to the total number of predictions; the confusion matrix of positive and negative predictions for all the class labels in classification; the Receiver Operating Characteristic (ROC) curve and the area under it (AUC); the F1 score, which combines precision and recall; cross-entropy loss; mean squared error; and mean absolute error. These steps for the post-processing of predictions are just as important as the data preparation steps for a well-performing model.
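Several of these metrics are simple enough to compute by hand. The following is a minimal sketch in plain Python, assuming binary labels encoded as 0 and 1 for the classification metrics; the function names and the sample values are illustrative only and do not come from any particular library.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary classification problem."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    """Ratio of correct predictions to the total number of predictions."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    tp, fp, _, fn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative values only.
labels = [1, 0, 1, 1, 0, 0]
predicted = [1, 0, 0, 1, 1, 0]
print(confusion_counts(labels, predicted))
print(accuracy(labels, predicted), f1_score(labels, predicted))
print(mean_squared_error([3.0, 5.0, 2.0], [2.0, 5.0, 4.0]))
print(mean_absolute_error([3.0, 5.0, 2.0], [2.0, 5.0, 4.0]))
```

In practice a library implementation would normally be preferred; the sketch is only meant to make the definitions concrete.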

The following chart makes a comparison of all the data mining algorithms including the neural networks: https://1drv.ms/w/s!Ashlm-Nw-wnWxBFlhCtfFkoVDRDa?e=aVT37e

Thank you.
