Cluster computing

Thursday, February 1, 2024

Question: How is data access performed from non-interactive clusters in an Azure Machine Learning workspace?

Answer: There are several ways to access data, and most of them resemble interactive compute usage from a Python notebook. Connection objects in the form of AzureML Datastores, together with objects representing credentials, are used with clients such as the Blob Service client and the AzureML client to access the data. The primary difference is that interactive mode does not require an explicit credential because it uses the logged-in identity, whereas a job can be executed with different credentials.
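For the datastore route, a job typically refers to data by an azureml:// URI instead of a storage account URL. The following is a minimal sketch of building such a URI; the helper name datastore_uri and the sample path are hypothetical, and workspaceblobstore is assumed to be the workspace's default blob datastore.

```python
def datastore_uri(datastore_name: str, relative_path: str) -> str:
    """Return an azureml:// URI for a path inside a registered datastore."""
    if not datastore_name:
        raise ValueError("datastore_name must not be empty")
    # Strip leading slashes so the path stays relative to the datastore root.
    return f"azureml://datastores/{datastore_name}/paths/{relative_path.lstrip('/')}"


# Hypothetical datastore path, used only for illustration.
uri = datastore_uri("workspaceblobstore", "/samples/data.csv")
print(uri)  # azureml://datastores/workspaceblobstore/paths/samples/data.csv
```

With the azure-ai-ml package, a URI like this can be passed as Input(type="uri_file", path=uri) when defining a job, so the cluster resolves the data through the datastore rather than through credentials embedded in the script.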

The following code snippets demonstrate data access with the Blob Service client, first with an account key and then with an identity:

1. With account key:

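# Import the blob client from the azure-storage-blob package
import io
import pandas as pd
from azure.storage.blob import BlobServiceClient

# Create a BlobServiceClient object with the storage account URL and key
account_name = "your_storage_account_name"
account_key = "your_storage_account_key"
account_url = f"https://{account_name}.blob.core.windows.net"
blob_service_client = BlobServiceClient(account_url, credential=account_key)

# Read a blob from the storage account as bytes
container_name = "your_container_name"
blob_name = "your_blob_name"
blob_client = blob_service_client.get_blob_client(container_name, blob_name)
blob_data = blob_client.download_blob().readall()

# Convert the blob data to a pandas DataFrame
# (read_csv expects a path or a file-like object, so wrap the bytes in a buffer)
df = pd.read_csv(io.BytesIO(blob_data))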
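Because a job runs unattended, the account key is usually supplied from outside the script rather than hardcoded. One possible sketch, assuming the job definition sets environment variables, is shown here; the variable names and the StorageSettings type are hypothetical, not part of any Azure SDK.

```python
import os
from typing import Mapping, NamedTuple

# Hypothetical variable names; use whatever names the job definition sets.
REQUIRED_VARS = (
    "STORAGE_ACCOUNT_NAME",
    "STORAGE_ACCOUNT_KEY",
    "STORAGE_CONTAINER_NAME",
    "STORAGE_BLOB_NAME",
)


class StorageSettings(NamedTuple):
    account_url: str
    account_key: str
    container_name: str
    blob_name: str


def load_storage_settings(env: Mapping[str, str] = os.environ) -> StorageSettings:
    """Read the storage settings from the environment, failing early if any is unset."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return StorageSettings(
        account_url=f"https://{env['STORAGE_ACCOUNT_NAME']}.blob.core.windows.net",
        account_key=env["STORAGE_ACCOUNT_KEY"],
        container_name=env["STORAGE_CONTAINER_NAME"],
        blob_name=env["STORAGE_BLOB_NAME"],
    )
```

The returned values would take the place of the string literals when the blob client is constructed. For a real key, a secret store such as Azure Key Vault is generally a safer source than plain job environment variables.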

2. Without account key:

# Import the azure-identity and azure-storage-blob packages
import io
import pandas as pd
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

# Create a DefaultAzureCredential object that uses the VM's identity
credential = DefaultAzureCredential()

# Create a BlobServiceClient object with the storage account URL and credential
account_url = "https://your_storage_account_name.blob.core.windows.net"
blob_service_client = BlobServiceClient(account_url, credential=credential)

# Read a blob from the storage account as a stream
container_name = "your_container_name"
blob_name = "your_blob_name"
blob_client = blob_service_client.get_blob_client(container_name, blob_name)
blob_stream = blob_client.download_blob()

# Convert the blob stream to a pandas DataFrame
# (read the stream fully and wrap the bytes in a buffer that read_csv accepts)
df = pd.read_csv(io.BytesIO(blob_stream.readall()))
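The two approaches differ only in the credential handed to the client, so a job script can choose between them at run time. The following is a rough sketch under that assumption; resolve_credential and make_blob_service_client are hypothetical helper names, and the Azure packages are imported lazily so each path needs only the package it uses.

```python
from typing import Optional


def resolve_credential(account_key: Optional[str] = None):
    """Return the account key if one is supplied, otherwise a DefaultAzureCredential."""
    if account_key:
        return account_key
    # Imported lazily so the key-based path does not require azure-identity.
    from azure.identity import DefaultAzureCredential
    return DefaultAzureCredential()


def make_blob_service_client(account_url: str, account_key: Optional[str] = None):
    """Create a BlobServiceClient using whichever credential is available."""
    from azure.storage.blob import BlobServiceClient
    return BlobServiceClient(account_url, credential=resolve_credential(account_key))
```

Calling make_blob_service_client(account_url) with no key falls back to the identity available on the compute, while passing account_key keeps the key-based behavior of the first snippet.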

Also, it is important to make sure that the Azure ML workspace can create the UI/submissions folder for the jobs in the associated storage account. Unless the code is uploaded and the job details are persisted in the storage account, the job cannot run.

Previous writings: IaCResolutionsPart70.docx
