Question: How is data access performed from non-interactive
clusters in an Azure Machine Learning workspace?
Answer: There are several ways to access data, and most of them resemble
interactive compute usage from a Python notebook. Connection objects in the
form of AzureML Datastores, together with objects representing credentials,
are used with clients such as the Blob Service Client and the AzureML client
to access the data. The main difference is that interactive mode does not
require an explicit credential because it uses the logged-in identity, whereas
a job can be executed with different credentials.
The following code snippets demonstrate how to do that:
1. With account key:
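# Import the azure-storage-blob package (v12 SDK) and pandas
import io
import pandas as pd
from azure.storage.blob import BlobServiceClient
# Create a BlobServiceClient object with the account URL and key
account_name = "your_storage_account_name"
account_key = "your_storage_account_key"
account_url = f"https://{account_name}.blob.core.windows.net"
blob_service_client = BlobServiceClient(account_url, credential=account_key)
# Read a blob from the storage account as a string
container_name = "your_container_name"
blob_name = "your_blob_name"
blob_client = blob_service_client.get_blob_client(container_name, blob_name)
blob_data = blob_client.download_blob().readall().decode("utf-8")
# Convert the blob data to a pandas DataFrame
# (read_csv treats a plain string as a path, so wrap the text in StringIO)
df = pd.read_csv(io.StringIO(blob_data))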
2. Without account key:
# Import the azure-identity and azure-storage-blob packages
import io
import azure.identity
import azure.storage.blob
# Create a DefaultAzureCredential object that uses the VM's identity
credential = azure.identity.DefaultAzureCredential()
# Create a BlobServiceClient object with the storage account URL and credential
account_url = "https://your_storage_account_name.blob.core.windows.net"
blob_service_client = azure.storage.blob.BlobServiceClient(account_url, credential)
# Read a blob from the storage account as a stream
container_name = "your_container_name"
blob_name = "your_blob_name"
blob_client = blob_service_client.get_blob_client(container_name, blob_name)
blob_stream = blob_client.download_blob()
# Convert the blob stream to a pandas DataFrame
# (read the downloaded bytes into an in-memory buffer that read_csv can consume)
import pandas as pd
df = pd.read_csv(io.BytesIO(blob_stream.readall()))
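Both options end with the same BlobServiceClient call and differ only in the credential passed to it, so the choice can be isolated in one place. The following is a minimal sketch of that idea, not a prescribed pattern. The environment variable name STORAGE_ACCOUNT_KEY and the helper names build_account_url and select_credential are hypothetical, and the sketch assumes the azure-identity package is installed whenever no key is supplied.

```python
import os
from typing import Any, Mapping, Optional


def build_account_url(account_name: str) -> str:
    """Return the public-cloud blob endpoint for a storage account name."""
    return f"https://{account_name}.blob.core.windows.net"


def select_credential(env: Optional[Mapping[str, str]] = None) -> Any:
    """Pick the credential to pass to BlobServiceClient.

    STORAGE_ACCOUNT_KEY is a hypothetical variable name chosen for this sketch.
    """
    env = os.environ if env is None else env
    account_key = env.get("STORAGE_ACCOUNT_KEY")
    if account_key:
        # Option 1: the account key string is passed as the credential.
        return account_key
    # Option 2: fall back to the identity available to the compute.
    # Imported lazily so the key-based path does not need azure-identity.
    from azure.identity import DefaultAzureCredential
    return DefaultAzureCredential()
```

The result would then be passed along as BlobServiceClient(build_account_url(account_name), credential=select_credential()), which keeps the rest of the job code identical for both options.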
Also, it is important to make sure that the Azure ML workspace can create
the UI/submissions folder for jobs in the associated storage account. Unless
the code is uploaded and the job details are persisted in the storage account,
the job cannot be run.
Previous writings: IaCResolutionsPart70.docx