Applications of Data Mining to Reward points
collection service
Continuation of use cases:
In the previous
section, we discussed the training data and the deployment of the trained
model. That does not complete the production system; on the contrary, it is
just the beginning of the lifecycle for that model. Over the time that the
model is used for prediction, its accuracy or predictive power may deteriorate.
This occurs due to one of three categories of change: changes in the concept,
changes in the data, and changes in the upstream systems. The first
reflects changes to the assumptions made when building the model. As with all
business requirements, they may change over time, and the assumptions made
earlier may no longer hold or may need to be refined. For example, a
fraud detection model may have encapsulated a set of policies that need to be
changed, or the statistical model may have made assumptions about the
prediction variable that need to be redefined. The second type of
deterioration comes from differences between the training data and the data
seen in production. A 70/30 train/test split usually lets us find and overcome
the eccentricities in the data, but production data is real-world data that
arrives continuously, unlike the frozen training data. It can change over time
or exhibit preferences and variations that were not known earlier, and such
change requires the model to be tuned. Lastly,
upstream data changes can be operational changes that alter data quality
and consequently impact the model. These changes, and the deterioration they
cause in the model, are collectively called drift. Overcoming drift requires
ways to measure it and to actively improve the model; the relevant metrics are
model performance metrics and model quality metrics.
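To make the data-drift case concrete, here is a minimal sketch of comparing a feature's distribution in the frozen training data against newly arriving production data. It assumes scipy is available; the feature values, the significance threshold, and the reward-points interpretation are illustrative assumptions, not part of the system described above.

import numpy as np
from scipy.stats import ks_2samp

def detect_drift(train_col: np.ndarray, live_col: np.ndarray, alpha: float = 0.05) -> bool:
    """Two-sample Kolmogorov-Smirnov test: a small p-value suggests the
    live distribution has shifted away from the training distribution."""
    statistic, p_value = ks_2samp(train_col, live_col)
    return p_value < alpha

# Synthetic data standing in for a reward-points feature (illustrative only).
rng = np.random.default_rng(0)
train_points = rng.normal(loc=100, scale=15, size=5_000)   # frozen training snapshot
live_points = rng.normal(loc=120, scale=15, size=5_000)    # recent production data
if detect_drift(train_points, live_points):
    print("Drift detected: consider tuning or retraining the model.")

A check like this can run on a schedule; a positive result is the signal to revisit the assumptions or retrain, as discussed above.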
Monitoring and
pipelines contribute significantly towards streamlining the process and
answering questions such as: Why did the model predict this? When was it
trained? Who deployed it? Which release was it deployed in? At what time was
the production system updated? What were the changes in the predictions? What
did the key performance indicators show after the update? Public cloud services
have enabled both ML pipelines and their monitoring. Creating a pipeline
usually involves the following steps, sketched in the code after this list:
1. Configure a workspace and create a datastore.
2. Download and store sample data.
3. Register and use objects for transferring intermediate data between pipeline steps.
4. Download and register the model.
5. Create and attach the remote compute target.
6. Write a processing script.
7. Build the pipeline by setting up the environment and stack necessary to execute the script that runs in this pipeline.
8. Create the configuration to wrap the script.
9. Create the pipeline step with the above-mentioned environment, resources, input and output data, and a reference to the script.
10. Submit the pipeline.
Many of these steps are easily automated with the help of built-in objects
published by the public cloud services to build and run such a pipeline. A
pipeline is a reusable object that can be invoked over the wire with a web request.
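A minimal sketch of the steps above, assuming the v1 Azure Machine Learning Python SDK (azureml) as one public cloud example; the compute target name, script name, source directory, and conda file are placeholder assumptions.

from azureml.core import Workspace, Experiment, Environment, RunConfiguration
from azureml.pipeline.core import Pipeline, PipelineData
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config()                      # configure the workspace
datastore = ws.get_default_datastore()            # datastore for sample data

# Object for transferring intermediate data between pipeline steps.
processed_data = PipelineData("processed_data", datastore=datastore)

# Environment and run configuration that wrap the processing script.
env = Environment.from_conda_specification("pipeline-env", "environment.yml")
run_config = RunConfiguration()
run_config.environment = env

# Pipeline step tying together the compute target, configuration, data, and script.
step = PythonScriptStep(
    name="process-and-train",
    script_name="train.py",                       # the processing script
    source_directory="./scripts",
    compute_target="cpu-cluster",                 # previously created and attached compute
    runconfig=run_config,
    outputs=[processed_data],
)

pipeline = Pipeline(workspace=ws, steps=[step])
run = Experiment(ws, "reward-points-pipeline").submit(pipeline)
run.wait_for_completion()

Once published, such a pipeline becomes the reusable object mentioned above and can be triggered with a web request instead of re-running these steps by hand.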
Machine learning
services emit the same kinds of monitoring data as other public cloud
resources. These logs, metrics, and events can be collected, routed, and
analyzed to tune the machine learning model.
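As one way to turn those collected logs into the model performance metrics mentioned earlier, here is a minimal sketch of a rolling accuracy monitor. The log schema (predicted and actual labels arriving one at a time), the window size, and the alert threshold are illustrative assumptions.

from collections import deque

class RollingAccuracyMonitor:
    """Tracks accuracy over the most recent predictions so that a drop in
    the metric can trigger an alert or a retraining pipeline run."""
    def __init__(self, window: int = 1000, alert_threshold: float = 0.9):
        self.window = deque(maxlen=window)
        self.alert_threshold = alert_threshold

    def record(self, predicted, actual) -> None:
        # Each log entry contributes a hit (True) or miss (False).
        self.window.append(predicted == actual)

    def accuracy(self) -> float:
        return sum(self.window) / len(self.window) if self.window else 1.0

    def needs_attention(self) -> bool:
        return self.accuracy() < self.alert_threshold

# Example: feed logged predictions as they arrive from the production system.
monitor = RollingAccuracyMonitor(window=500, alert_threshold=0.85)
monitor.record(predicted="redeem", actual="redeem")
monitor.record(predicted="redeem", actual="hold")
print(monitor.accuracy(), monitor.needs_attention())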
The following
chart compares the data mining algorithms discussed, including neural
networks: https://1drv.ms/w/s!Ashlm-Nw-wnWxBFlhCtfFkoVDRDa?e=aVT37e
Thank you.