An Azure Machine Learning workspace integrates with an Azure Container Registry, an Azure Key Vault, an Azure Storage account, and Azure Application Insights. Model building requires exploratory data analysis, data preprocessing, and prototyping to validate hypotheses. What sets it apart from other interactive, experimentation-oriented ML platforms such as the Databricks workspace is that it aims to provide a unified, seamless experience, with its own libraries that automate much of the work needed to build a machine learning model that serves the business needs.
For example, the following code automates the creation of the compute needed to build a model:

from azureml.core import Workspace
from azureml.core.compute import AmlCompute, ComputeTarget

# load the workspace from the config file on disk
ws = Workspace.from_config()
amlcompute_cluster_name = "cpu-cluster"
provisioning_config = AmlCompute.provisioning_configuration(vm_size="STANDARD_D2_V2",
                                                            max_nodes=1)
compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)
compute_target.wait_for_completion(show_output=True, min_node_count=None,
                                   timeout_in_minutes=20)
The compute is part of the workspace, so the Identity and Access Management configured on the workspace is sufficient to create the compute.
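Since the workspace tracks its compute targets, a common pattern is to reuse a cluster when it already exists instead of re-creating it. A minimal sketch of that check, assuming the ws, amlcompute_cluster_name, and provisioning_config variables from the example above:

from azureml.core.compute import ComputeTarget
from azureml.core.compute_target import ComputeTargetException

try:
    # returns the existing cluster if the workspace already has one by this name
    compute_target = ComputeTarget(workspace=ws, name=amlcompute_cluster_name)
    print("Found existing cluster; reusing it.")
except ComputeTargetException:
    compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)
    compute_target.wait_for_completion(show_output=True)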
Storage accounts, on the other hand, are external and might have their own access restrictions. There are quite a few ways to connect to an external Azure Storage account from an Azure Machine Learning workspace, and this section reviews some of them. All of them require instantiating a Datastore class, which stores the connection information for Azure Storage services.
The first method uses the azureml.core library to instantiate a datastore as follows:

from azureml.core import Workspace, Datastore
from azureml.core.dataset import Dataset
from azureml.data.datapath import DataPath

ws = Workspace.from_config()
datastore = Datastore.register_azure_blob_container(ws,
                                                    datastore_name="ds1",
                                                    container_name="temporary",
                                                    account_name="somestorageaccount",
                                                    sas_token="<for-connecting-with-SAS-URL>")
dataset = Dataset.Tabular.from_parquet_files(
    path=[(datastore, 'temporary/yellow_tripdata_2023-08.parquet')])
# preview the first 3 rows of the dataset
dataset.take(3).to_pandas_dataframe()
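The same Datastore class can register other storage types with other credential kinds. The sketch below is illustrative only; the datastore names, file share name, and account key are placeholders rather than values from this article:

# a blob container secured with an account key instead of a SAS token
Datastore.register_azure_blob_container(ws,
                                        datastore_name="ds2",
                                        container_name="temporary",
                                        account_name="somestorageaccount",
                                        account_key="<storage-account-key>")

# an Azure file share on the same account
Datastore.register_azure_file_share(ws,
                                    datastore_name="ds3",
                                    file_share_name="myshare",
                                    account_name="somestorageaccount",
                                    account_key="<storage-account-key>")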
The Datastore is a resource shared across many usages; it is registered with the ML workspace along with the credentials required to connect to the external storage account.
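Once registered, a datastore can be looked up by name from anywhere in the workspace without re-supplying credentials, as in this minimal sketch:

# retrieve a registered datastore by name, or fall back to the workspace default
datastore = Datastore.get(ws, datastore_name="ds1")
default_ds = ws.get_default_datastore()
# list the names of all datastores registered with the workspace
print(list(ws.datastores))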
The fully qualified URL for locating a blob on the storage account associated with the ML workspace is:

uri = f'azureml://subscriptions/{subscription}/resourcegroups/{resource_group}/workspaces/{workspace}/datastores/{datastore_name}/paths/{path_on_datastore}'
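Assuming the azureml-fsspec package is installed, pandas can read such a URI directly because the package registers a filesystem for the azureml:// scheme; a hedged sketch:

%pip install azureml-fsspec
import pandas as pd

# pandas resolves the azureml:// scheme through fsspec,
# provided path_on_datastore points at a parquet file
df = pd.read_parquet(uri)
df.head(3)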
The notebook executes with the default credentials of the logged-in user, so it is possible to omit the credentials when creating the datastore:
%pip install azure-ai-ml

from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential
from azure.ai.ml import command, Input
from azure.ai.ml.entities import AzureBlobDatastore
from azure.ai.ml.entities import Environment

# subscription, resource_group, workspace and account_name are assumed to be set
ml_client = MLClient(DefaultAzureCredential(), subscription, resource_group, workspace)
blob_credless_datastore = AzureBlobDatastore(
    name="ds4",
    description="Credential-less datastore pointing to a blob container.",
    account_name=account_name,
    container_name="temporary",
)
ml_client.create_or_update(blob_credless_datastore)
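As a quick check, the registration can be read back through the same client; this sketch assumes the ml_client from above. For a credential-less datastore, data access is authorized through the caller's identity, which needs a data-plane role such as Storage Blob Data Reader on the storage account:

# confirm the credential-less datastore was registered
ds = ml_client.datastores.get("ds4")
print(ds.name, ds.account_name, ds.container_name)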
With the help of the datastore, accessing a dataset is as simple as:

datastore = Datastore.get(ws, datastore_name="ds4")
dataset = Dataset.Tabular.from_parquet_files(
    path=[(datastore, 'temporary/yellow_tripdata_2023-08.parquet')])
# preview the first 3 rows of the dataset
dataset.take(3).to_pandas_dataframe()
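If the dataset will be reused, it can also be registered with the workspace under a name so that later runs retrieve it without rebuilding the path; a minimal sketch, where the dataset name is an illustrative choice:

# register the dataset so it can be retrieved by name and versioned
dataset = dataset.register(workspace=ws,
                           name="yellow-tripdata-2023-08",
                           create_new_version=True)
retrieved = Dataset.get_by_name(ws, name="yellow-tripdata-2023-08")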