Saturday, February 3, 2024

This article describes how to evaluate models using Azure Machine Learning Studio, using our own test data to evaluate foundation models. Microsoft-curated foundation models are a great way to get started with analysis of your data. The Model Catalog is the hub for both foundation models and OpenAI models; it can be used to discover, evaluate, fine-tune, deploy, and import models. Your own test data can be used to evaluate these models: from the model card of any foundation model, you pass in the test data, map the input columns to the schema the task expects, provide a compute target to run the evaluation on, and submit the job. The results include evaluation metrics that help you decide whether to fine-tune the model with your own training data. Every pre-trained model in the Model Catalog can be fine-tuned for a specific set of tasks such as text classification, token classification, and question answering. The data can be in JSONL, CSV, or TSV format, and the steps mirror evaluation except that you also pass in validation data for validation and test data to evaluate the fine-tuned model. Once evaluated and fine-tuned, models can be deployed to endpoints for inferencing, provided enough quota is available for the deployment.
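As a concrete starting point, here is a minimal sketch of looking up a foundation model with the Python SDK v2, assuming the azure-ai-ml and azure-identity packages; the model name bert-base-uncased is just an illustrative example.

```python
# Minimal sketch, assuming azure-ai-ml and azure-identity are installed
# and you are signed in (for example via the Azure CLI).
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient

# The Model Catalog is backed by the shared "azureml" registry.
registry_client = MLClient(
    credential=DefaultAzureCredential(),
    registry_name="azureml",
)

# Look up a foundation model by name; "bert-base-uncased" is an example.
model = registry_client.models.get(name="bert-base-uncased", label="latest")
print(model.name, model.version)
```

For a text-classification evaluation, the JSONL test data would hold one example per line, e.g. {"sentence": "great movie", "label": "positive"}; the column names here are hypothetical, and the model card UI is where you map them to the schema the task expects.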

OpenAI models differ from the foundation models in that they require a connection to Azure OpenAI; the process of evaluating, fine-tuning, and deploying remains the same.

An Azure Machine Learning pipeline can be used to complete a machine learning task, which usually consists of three steps: prepare the data, train a model, and score the model. The pipeline optimizes the workflow for speed, portability, and reuse, so you can focus on machine learning instead of infrastructure and automation. A pipeline is composed of components, one for each of the three steps, and is built using the Python SDK v2, the CLI, or the studio UI. The necessary libraries, such as azure.identity, azure.ai.ml, and azure.ai.ml.dsl, can be imported. A component is a self-contained piece of code that performs one step in a machine learning pipeline. For each component, we prepare the Python script containing the execution logic, define the interface of the component, and add the component's other metadata. The interface is defined by applying the @command_component decorator to a Python function. The studio UI displays the pipeline as a graph and the components as blocks. The input_data, training_data, and test_data ports of a component connect to other components for data streaming. Training and scoring are defined with their respective Python functions. Components can also be imported into the code. Once all the components and the input data are loaded, they can be composed into a pipeline, as in the sketch below.
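The following is a minimal sketch of that structure, assuming the mldesigner package (which provides @command_component) alongside azure-ai-ml; all component and function names, and the cpu-cluster compute target, are illustrative assumptions, and the execution logic is elided.

```python
# Sketch of a three-step pipeline; names are illustrative, logic elided.
from mldesigner import command_component, Input, Output
from azure.ai.ml.dsl import pipeline

@command_component(name="prep_data", display_name="Prepare data")
def prepare_data_component(
    input_data: Input(type="uri_folder"),
    training_data: Output(type="uri_folder"),
    test_data: Output(type="uri_folder"),
):
    """Split the raw input into training and test sets (logic elided)."""
    ...

@command_component(name="train_model", display_name="Train model")
def train_component(
    training_data: Input(type="uri_folder"),
    model_output: Output(type="uri_folder"),
):
    """Train a model on the prepared data (logic elided)."""
    ...

@command_component(name="score_model", display_name="Score model")
def score_component(
    model_input: Input(type="uri_folder"),
    test_data: Input(type="uri_folder"),
    score_output: Output(type="uri_folder"),
):
    """Score the trained model against the test split (logic elided)."""
    ...

# Compose the components into a pipeline; the ports wire the steps together.
@pipeline(default_compute="cpu-cluster")  # compute name is an assumption
def train_and_score_pipeline(pipeline_input_data):
    prep = prepare_data_component(input_data=pipeline_input_data)
    train = train_component(training_data=prep.outputs.training_data)
    score_component(
        model_input=train.outputs.model_output,
        test_data=prep.outputs.test_data,
    )
```

Note how the connections are expressed purely by passing one component's outputs into the next component's inputs; this is what the studio UI renders as edges between blocks.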

Azure Machine Learning Studio allows us to view the pipeline graph, check its output, and debug it. The logs and outputs of each component are available for inspection. Optionally, components can be registered to the workspace so they can be shared and reused.
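A short sketch of submitting the pipeline and registering a component, assuming an MLClient named ml_client already scoped to your workspace and the names from the previous sketch; the data-asset path and experiment name are placeholders.

```python
# Sketch: ml_client is a workspace-scoped MLClient; names are placeholders.
from azure.ai.ml import Input

pipeline_job = train_and_score_pipeline(
    pipeline_input_data=Input(type="uri_folder", path="azureml:raw-data@latest")
)

# Submit the pipeline; the studio UI then shows it as a graph of blocks.
submitted = ml_client.jobs.create_or_update(
    pipeline_job, experiment_name="train-and-score"
)
print(submitted.studio_url)  # open to inspect each component's logs and outputs

# Optionally register a component so it can be shared and reused.
registered = ml_client.components.create_or_update(prepare_data_component)
print(registered.name, registered.version)
```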

A pipeline component can also be deployed as a batch endpoint. This is helpful for running a machine learning pipeline from other platforms such as custom Java code, Azure DevOps, GitHub Actions, and Azure Data Factory. A batch endpoint serves a REST API, so it can be invoked from those platforms. By isolating the pipeline behind a batch endpoint, we can change the logic of the pipeline without affecting downstream consumers. A pipeline must first be converted to a pipeline component before being deployed as a batch endpoint. Time-based schedules can be used to take care of routine jobs; a schedule associates a job with a trigger, which can be a cron expression.
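Here is a minimal sketch of both ideas, again assuming a workspace-scoped ml_client, plus a pipeline component (pipeline_component) and the pipeline_job from earlier; the endpoint, deployment, schedule names, and cron expression are illustrative.

```python
# Sketch: ml_client is a workspace-scoped MLClient; pipeline_component is a
# registered pipeline component; all names below are illustrative.
from azure.ai.ml.entities import (
    BatchEndpoint,
    PipelineComponentBatchDeployment,
    JobSchedule,
    CronTrigger,
)

# Expose the pipeline component behind a REST-callable batch endpoint.
endpoint = BatchEndpoint(name="train-and-score-batch")
ml_client.batch_endpoints.begin_create_or_update(endpoint).result()

deployment = PipelineComponentBatchDeployment(
    name="default",
    endpoint_name=endpoint.name,
    component=pipeline_component,
)
ml_client.batch_deployments.begin_create_or_update(deployment).result()

# A time-based schedule: run the pipeline job every day at 06:00.
schedule = JobSchedule(
    name="nightly-train-and-score",
    trigger=CronTrigger(expression="0 6 * * *"),
    create_job=pipeline_job,
)
ml_client.schedules.begin_create_or_update(schedule).result()
```

Downstream platforms then call the endpoint over its REST scoring URI (or via ml_client.batch_endpoints.invoke), so the pipeline's internals can evolve without breaking callers.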

