Monday, March 25, 2024

Azure Machine Learning Data Governance:

The following is a list of practices and features to consider:

Security posture: Azure Policy helps ensure compliance with organizational policies.
Azure Machine Learning provides built-in policy definitions that audit and govern resource state.
Custom policies can further control access to specific operations and resources within Azure Machine Learning.
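
As a rough illustration of what a custom policy might look like, the sketch below builds an Azure Policy rule as a Python dictionary and prints it as JSON. The choice of rule (flagging workspaces that allow public network access) and the helper name build_workspace_policy are assumptions made for this example, and the alias string should be verified against the aliases published for Microsoft.MachineLearningServices before use.

```python
import json

# Assumed alias for illustration; confirm it against the published
# aliases for Microsoft.MachineLearningServices before relying on it.
PUBLIC_ACCESS_ALIAS = (
    "Microsoft.MachineLearningServices/workspaces/publicNetworkAccess"
)


def build_workspace_policy(effect: str = "Audit") -> dict:
    """Build a policy rule that flags workspaces allowing public network access."""
    if effect not in {"Audit", "Deny", "Disabled"}:
        raise ValueError(f"unsupported effect: {effect}")
    return {
        "mode": "Indexed",
        "policyRule": {
            "if": {
                "allOf": [
                    {
                        "field": "type",
                        "equals": "Microsoft.MachineLearningServices/workspaces",
                    },
                    {"field": PUBLIC_ACCESS_ALIAS, "notEquals": "Disabled"},
                ]
            },
            "then": {"effect": effect},
        },
    }


policy = build_workspace_policy("Deny")
print(json.dumps(policy, indent=2))
```

Starting with the Audit effect and switching to Deny only after reviewing the compliance results is usually the safer rollout order.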

Microsoft Entra ID serves as the identity service provider for Azure Machine Learning.
Security objects (such as users, groups, service principals, and managed identities) can be created and managed using Microsoft Entra ID. Creating dedicated groups for each pod, scoped to a subscription by prefixing the group name with part of the subscription ID, makes access control more granular.
Multifactor authentication (MFA) is supported when it is configured in Microsoft Entra ID.
The authentication process involves obtaining an Azure Resource Manager token and a Machine Learning service token for secure access to resources.
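
The suggestion above about dedicated, subscription-prefixed groups can be sketched as a small naming helper. The convention shown here (first eight hex characters of the subscription ID, then "aml", the pod name, and a role) is purely hypothetical, as is the function name group_name; the point is only that a deterministic name makes the scope of a group obvious at a glance.

```python
import re

GUID_PATTERN = re.compile(
    r"[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}"
)


def group_name(subscription_id: str, pod: str, role: str, prefix_len: int = 8) -> str:
    """Return a group name prefixed with part of the subscription ID.

    Hypothetical convention: <sub-prefix>-aml-<pod>-<role>.
    """
    if not GUID_PATTERN.fullmatch(subscription_id):
        raise ValueError("subscription_id must be a GUID")
    # Take the leading hex characters of the subscription ID as the prefix.
    prefix = subscription_id.replace("-", "")[:prefix_len].lower()
    # Normalize the pod name to lowercase words joined by hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", pod.lower()).strip("-")
    return f"{prefix}-aml-{slug}-{role.lower()}"


example = group_name(
    "1a2b3c4d-0000-1111-2222-333344445555", "Fraud Detection", "Readers"
)
print(example)
```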

Azure Machine Learning uses various compute resources and data stores on the Azure platform.
Data is encrypted both in transit and at rest.
Each of these resources supports encryption to maintain data security, and the documentation for each resource describes how that encryption is applied.

The Azure Machine Learning environment and its associated container registry can be scanned for vulnerabilities.
Mitigations for any vulnerabilities found follow the same process as for existing registries.

Compliance is enforced by enabling auditing.
These audit policies can be implemented with security postures and common modules.
Insights and Azure Resource Graph queries can be added.
Alerts based on Azure Monitor can be set up.
A metadata repository in the form of a structured database can provide knowledge management capabilities and integrate with Kusto Query Language (KQL).
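
To make the Resource Graph idea concrete, here is a minimal sketch that assembles a KQL query listing Azure Machine Learning workspaces. The helper name workspaces_query and the projected columns are choices made for this example; the query text itself is what would be pasted into Resource Graph Explorer or passed to a Resource Graph client.

```python
from typing import Iterable, Optional


def workspaces_query(subscription_ids: Optional[Iterable[str]] = None) -> str:
    """Build a Resource Graph (KQL) query that lists Azure ML workspaces."""
    lines = [
        "resources",
        "| where type =~ 'microsoft.machinelearningservices/workspaces'",
    ]
    ids = list(subscription_ids or [])
    if ids:
        # Optionally narrow the result to specific subscriptions.
        quoted = ", ".join(f"'{s}'" for s in ids)
        lines.append(f"| where subscriptionId in ({quoted})")
    lines.append("| project name, resourceGroup, location, subscriptionId")
    return "\n".join(lines)


query = workspaces_query(["1a2b3c4d-0000-1111-2222-333344445555"])
print(query)
```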

Microsoft Purview and Azure Machine Learning (AzureML) have a powerful integration that enhances data governance and responsible AI practices:

Azure Machine Learning introduces ML assets as a new object in Purview.
This integration allows you to associate ML models with the data used for training, enabling emerging ML and AI risk and governance scenarios.
ML models are essentially representations of data, and by linking them to their training data, there is better visibility and control over the entire ML lifecycle.
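
One governance scenario this linkage enables is finding every model that was trained on a given dataset, for example after that dataset is reclassified as sensitive. The sketch below is a plain in-memory illustration of that lookup; ModelRecord and models_trained_on are hypothetical names and are not part of the Purview or Azure Machine Learning APIs.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class ModelRecord:
    """A model version together with the datasets it was trained on."""
    name: str
    version: str
    training_datasets: FrozenSet[str]


def models_trained_on(records: Iterable[ModelRecord], dataset: str) -> List[str]:
    """Return 'name:version' for every model whose training data includes dataset."""
    return sorted(
        f"{r.name}:{r.version}" for r in records if dataset in r.training_datasets
    )


records = [
    ModelRecord("churn", "1", frozenset({"customers", "orders"})),
    ModelRecord("churn", "2", frozenset({"customers"})),
    ModelRecord("forecast", "1", frozenset({"orders"})),
]
affected = models_trained_on(records, "orders")
print(affected)
```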

When an Azure Machine Learning workspace is registered in Microsoft Purview, metadata from the workspace is automatically pushed to Purview on a daily basis.
No manual scanning is required; the integration handles metadata synchronization seamlessly.

Metadata Extraction: Extract technical metadata from Azure Machine Learning, including workspace details, models, datasets, and jobs.
Lineage Tracking: Understand the lineage of ML assets and their connections to data.
Data Sharing: Facilitate collaboration by sharing metadata across teams.
Access Policy: Control access to ML assets.
View: Visualize ML assets within the Purview Data Map.

Microsoft Purview provides a holistic view of the data landscape.
Features include automated data discovery, sensitive data classification, and end-to-end data lineage.
Data consumers gain access to trustworthy, well-managed data, which supports compliance and security.

Data scientists and infrastructure teams can discover, track, and govern ML assets throughout the MLOps lifecycle within the context of Microsoft Purview.
NOTE: Connecting Azure ML workspaces to Purview requires subscription owner permissions. The same applies when connecting Azure Data Factory (ADF) and Azure Data Lake Storage (ADLS) to Purview.
The Data Curator role on the root collection of the Microsoft Purview account must be assigned to the managed identity of the connected resource.

Cluster computing