This article describes how to evaluate models using Azure
Machine Learning Studio. We evaluate
foundation models using our own test data. Microsoft-developed foundation
models are a great way to get started with analysis of your data. The
Model Catalog is the hub for both foundation models and OpenAI models.
It can be used to discover, evaluate, fine-tune, deploy, and import models. Your
own test data can be used to evaluate these models. The model card of any
foundation model can be used to pass in the test data, map the columns
for the input data, based on the schema needed for the task, provide a compute
to run the evaluation on, and submit the job. The results include evaluation
metrics, and these can help decide whether you would like to fine-tune the model
using your own training data. Every pre-trained model from the model catalog
can be fine-tuned for a specific set of tasks such as text classification,
token classification, and question answering. The data can be in JSONL, CSV, or TSV
format, and the steps are just like evaluation except that you will pass in
validation data for validation and test data to evaluate the fine-tuned model.
Once the models are evaluated and fine-tuned, they can be deployed to endpoints
for inferencing. There must be enough quota available for deployment.
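As a rough sketch of that last step, a model from the Model Catalog can be deployed to a managed online endpoint with the Python SDK v2. The endpoint name, model ID, and instance type below are illustrative assumptions, and ml_client is an authenticated MLClient handle to the workspace (its creation is shown later in the pipeline section):

from azure.ai.ml.entities import ManagedOnlineEndpoint, ManagedOnlineDeployment

# Assumed endpoint name; key-based auth is the simplest mode for testing.
endpoint = ManagedOnlineEndpoint(name="fm-eval-endpoint", auth_mode="key")
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# Assumed catalog model and SKU; pick ones for which you have quota.
deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name="fm-eval-endpoint",
    model="azureml://registries/azureml/models/bert-base-uncased/labels/latest",
    instance_type="Standard_DS3_v2",
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()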
OpenAI models differ from the foundation models in that they
require a connection with Azure OpenAI. The process of evaluating, fine-tuning,
and deploying remains the same. An Azure Machine Learning Pipeline can be used
to complete a machine learning task, which usually consists of three steps:
prepare data, train a model, and score the model. The pipeline optimizes the
workflow with speed, portability, and reuse so you can focus on
machine learning instead of infrastructure and automation. A pipeline comprises
components for each of the three tasks and is built using the Python SDK v2, the
CLI, or the UI. All the necessary libraries, such as azure.identity, azure.ai.ml,
and azure.ai.ml.dsl, can be imported and a client handle to the workspace created.
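A minimal setup sketch, with the workspace details as placeholders to be replaced:

from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient

# Authenticate and get a handle to the Azure Machine Learning workspace.
credential = DefaultAzureCredential()
ml_client = MLClient(
    credential=credential,
    subscription_id="<SUBSCRIPTION_ID>",
    resource_group_name="<RESOURCE_GROUP>",
    workspace_name="<AML_WORKSPACE_NAME>",
)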
A component is a self-contained piece of code that performs one step in a
machine learning pipeline. For each component, we need to prepare the Python
script containing the execution logic, define the interface of the component,
and add any other metadata of the component. The interface is defined by
applying the @command_component decorator to a Python function. The studio UI
displays the pipeline as a graph and the components as blocks. The input_data,
training_data, and test_data are the ports of the component, which connect to
other components for data streaming.
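Here is a minimal sketch of such a data-preparation component, using the @command_component decorator from the mldesigner package; the environment, file names, and split logic are placeholder assumptions:

from pathlib import Path

import pandas as pd
from mldesigner import command_component, Input, Output

@command_component(
    name="prep_data",
    display_name="Prepare data",
    environment=dict(
        conda_file="./conda.yaml",  # assumed environment file listing pandas
        image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04",
    ),
)
def prepare_data_component(
    input_data: Input(type="uri_folder"),
    training_data: Output(type="uri_folder"),
    test_data: Output(type="uri_folder"),
):
    """Split the raw data into training and test sets (placeholder logic)."""
    df = pd.read_csv(Path(input_data) / "raw.csv")  # assumed file name
    train = df.sample(frac=0.8, random_state=0)
    test = df.drop(train.index)
    train.to_csv(Path(training_data) / "train.csv", index=False)
    test.to_csv(Path(test_data) / "test.csv", index=False)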
Training and scoring are defined with their respective Python functions. The
components can also be imported into the code, for example with load_component.
Once all the components and input data are loaded, they can be composed into a
pipeline, as sketched below.
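The training and scoring components (train_model_component, score_data_component) and their port names (model_output, model_input) are assumed to be defined in the same style as the data-preparation component above, and the compute cluster name and data path are placeholders:

from azure.ai.ml import Input
from azure.ai.ml.dsl import pipeline

@pipeline(description="Prepare data, train a model, and score it")
def training_pipeline(pipeline_input_data):
    # Wire the three components together through their ports.
    prep_node = prepare_data_component(input_data=pipeline_input_data)
    train_node = train_model_component(training_data=prep_node.outputs.training_data)
    score_data_component(
        model_input=train_node.outputs.model_output,
        test_data=prep_node.outputs.test_data,
    )

# Bind the pipeline input to a local folder and submit the job.
pipeline_job = training_pipeline(
    pipeline_input_data=Input(type="uri_folder", path="./data"),  # assumed path
)
pipeline_job.settings.default_compute = "cpu-cluster"  # assumed compute target
pipeline_job = ml_client.jobs.create_or_update(
    pipeline_job, experiment_name="pipeline_samples"
)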
The Azure Machine Learning Studio allows us to view the
pipeline graph, check its outputs, and debug it. The logs and outputs of each
component are available for inspection. Optionally, components can be registered
to the workspace so they can be shared and reused.
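For example, a component loaded from its YAML definition can be registered to the workspace; the file path here is an assumption:

from azure.ai.ml import load_component

# Load the component definition and register it so it can be shared and reused.
train_model_component = load_component(source="./train.yml")  # assumed path
train_model_component = ml_client.components.create_or_update(train_model_component)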
A pipeline component can also be deployed as a batch endpoint. This is helpful to run
a machine learning pipeline from other platforms such as custom Java code, Azure
DevOps, GitHub Actions, and Azure Data Factory. A batch endpoint serves a REST
API, so it can be invoked from those platforms. By isolating the pipeline
component behind a batch endpoint, we can change the logic of the pipeline without
affecting downstream consumers. A pipeline must first be converted to a pipeline
component before being deployed as a batch endpoint.
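A rough sketch of that flow with the SDK v2, assuming the pipeline has been registered as a pipeline component (registered_pipeline_component below) and that a compute cluster name is supplied:

from azure.ai.ml.entities import BatchEndpoint, PipelineComponentBatchDeployment

# Create the batch endpoint that exposes the pipeline as a REST API.
endpoint = BatchEndpoint(
    name="training-pipeline-batch",
    description="Batch endpoint wrapping the training pipeline component.",
)
ml_client.batch_endpoints.begin_create_or_update(endpoint).result()

# Deploy the registered pipeline component behind the endpoint.
deployment = PipelineComponentBatchDeployment(
    name="training-pipeline-deployment",
    endpoint_name=endpoint.name,
    component=registered_pipeline_component,  # assumed, registered earlier
    settings={"default_compute": "cpu-cluster", "continue_on_step_failure": False},
)
ml_client.batch_deployments.begin_create_or_update(deployment).result()

# Other platforms call the endpoint's REST API; from Python the same invocation is:
job = ml_client.batch_endpoints.invoke(endpoint_name=endpoint.name)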
Time-based schedules can be used to take care of routine jobs. A schedule
associates a job with a trigger, which can be a cron expression or a recurrence
pattern.
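A minimal sketch of a cron-based schedule for the pipeline job defined earlier; the schedule name and cadence are placeholders:

from azure.ai.ml.entities import JobSchedule, CronTrigger

# Run the pipeline job every day at 06:00 (placeholder cadence).
schedule = JobSchedule(
    name="daily-training-schedule",
    trigger=CronTrigger(expression="0 6 * * *"),
    create_job=pipeline_job,
)
ml_client.schedules.begin_create_or_update(schedule).result()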