Wednesday, March 6, 2024

 

Sample code for reading and writing data in Azure Machine Learning workspace notebooks is easy to find online, but working examples can be elusive because it is rarely called out clearly that the v1 azureml.core package has been superseded by the v2 azure.ai.ml package. The following example demonstrates how to do it with the v2 SDK.

The core object used in this sample is the Datastore, which describes the connection information for a storage location; the data itself is then read and written through that location's URI.

 

from azure.ai.ml import MLClient
from azure.ai.ml.entities import AzureDataLakeGen2Datastore, ServicePrincipalConfiguration

# Reads connection details from the workspace config.json
# (found automatically on an Azure ML compute instance).
ml_client = MLClient.from_config()

store = AzureDataLakeGen2Datastore(
    name="adls_gen2_example",
    description="Datastore pointing to an Azure Data Lake Storage Gen2 filesystem.",
    account_name="mytestdatalakegen2",
    filesystem="my-gen2-container",
    credentials=ServicePrincipalConfiguration(
        tenant_id="00000000-0000-0000-0000-000000000000",
        client_id="00000000-0000-0000-0000-000000000000",
        client_secret="XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
    ),
)

ml_client.create_or_update(store)

Note that the credentials class in the released v2 SDK is ServicePrincipalConfiguration from the public azure.ai.ml.entities namespace; the private azure.ai.ml.entities._datastore.credentials path seen in older samples no longer works.

The v2 azure.ai.ml package does not carry over the v1 Dataset class at all; tabular reads go through the separate mltable package (pip install mltable) instead:

import mltable

data_path = "<fully-qualified-url-from-datastore-path>"  # this could also be substituted with
# abfss://container@storageaccount.dfs.core.windows.net/path/to/file
tbl = mltable.from_delimited_files(paths=[{"file": data_path}])
tbl.take(5).to_pandas_dataframe()
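Either way the result ends up as a plain pandas DataFrame, so the preview pattern can be rehearsed locally before pointing at cloud storage. A minimal sketch against an in-memory CSV (the contents here are made up for illustration):

```python
import pandas as pd
from io import StringIO

# Stand-in for a delimited file sitting in the datastore.
csv_text = "Name,Age,Salary\nAlice,25,60000\nBob,30,70000\nCharlie,22,55000\n"

df = pd.read_csv(StringIO(csv_text))

# Equivalent of take(5) followed by to_pandas_dataframe():
# peek at (up to) the first five rows.
preview = df.head(5)
print(preview.shape)  # (3, 3) -- only three rows exist
```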

 

Writing a CSV back out does not need a Dataset class at all; pandas can write directly to the storage URL once an fsspec filesystem implementation for ADLS Gen2 (such as adlfs or azureml-fsspec) is installed in the notebook environment:

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 22],
    'Salary': [60000, 70000, 55000]
}
df = pd.DataFrame(data)

csv_path = "<fully-qualified-url-from-datastore-path>"  # this could also be substituted with
# abfss://container@storageaccount.dfs.core.windows.net/path/to/file
df.to_csv(csv_path, index=False)
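Because pandas dispatches on the URL scheme via fsspec, the same to_csv/read_csv pair behaves identically against local paths, which makes the round trip easy to verify before touching the datastore. A sketch using a temporary directory (the file name is illustrative):

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 22],
    'Salary': [60000, 70000, 55000],
})

with tempfile.TemporaryDirectory() as tmp:
    # With adlfs/azureml-fsspec installed, an abfss:// URL works
    # in place of this local path with no other code changes.
    csv_path = os.path.join(tmp, "people.csv")
    df.to_csv(csv_path, index=False)

    round_tripped = pd.read_csv(csv_path)
    # index=False on write keeps the frame identical on read-back.
    assert round_tripped.equals(df)
```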
