Azure Machine Learning Data Governance:
The following is a list of practices and features to consider:
- Restrict Access to Resources and Operations:
  - Azure Policy helps ensure that the security posture complies with organizational policies.
  - Azure Machine Learning provides built-in policies that audit and govern resource state.
  - Custom policies can be defined to control access to specific operations and resources within Azure Machine Learning.
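As a rough illustration of the custom-policy point above, the sketch below assembles an Azure Policy definition body in Python and prints it as JSON. The rule itself (flagging workspaces whose public network access is not disabled), the display name, and the helper name `build_workspace_policy` are assumptions made for this example. The property alias in particular should be checked against the aliases available in your tenant before use.

```python
import json

ALLOWED_EFFECTS = ("Audit", "Deny", "Disabled")


def build_workspace_policy(default_effect: str = "Audit") -> dict:
    """Return a custom Azure Policy definition body (hypothetical example rule)."""
    if default_effect not in ALLOWED_EFFECTS:
        raise ValueError(f"default_effect must be one of {ALLOWED_EFFECTS}")
    return {
        "properties": {
            # Display name chosen for this sketch only.
            "displayName": "Audit ML workspaces that allow public network access",
            "policyType": "Custom",
            "mode": "Indexed",
            "parameters": {
                "effect": {
                    "type": "String",
                    "allowedValues": list(ALLOWED_EFFECTS),
                    "defaultValue": default_effect,
                }
            },
            "policyRule": {
                "if": {
                    "allOf": [
                        {
                            "field": "type",
                            "equals": "Microsoft.MachineLearningServices/workspaces",
                        },
                        {
                            # Assumed alias; verify it exists before deploying.
                            "field": "Microsoft.MachineLearningServices/workspaces/publicNetworkAccess",
                            "notEquals": "Disabled",
                        },
                    ]
                },
                # The effect is parameterised so the same rule can audit or deny.
                "then": {"effect": "[parameters('effect')]"},
            },
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_workspace_policy(), indent=2))
```

Starting with `Audit` and switching the parameter to `Deny` once the findings have been reviewed is one common way to roll a rule like this out gradually.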
- Authentication and Identity Management:
  - Microsoft Entra ID serves as the identity service provider for Azure Machine Learning.
  - Security objects (such as users, groups, service principals, and managed identities) can be created and managed using Microsoft Entra ID. Creating dedicated groups for each pod, scoped to a subscription by prefixing the group name with part of the subscription ID, makes access more granular.
  - Multifactor authentication (MFA) is already configured in Microsoft Entra ID.
  - The authentication process involves obtaining an Azure Resource Manager token and a Machine Learning service token for secure access to resources.
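The idea of pod-specific groups prefixed with the subscription ID can be sketched as a small naming helper. The exact convention here (first eight hex characters of the subscription ID, then pod, then role) and the function name `pod_group_name` are assumptions for illustration, not a prescribed standard. The helper only builds the name; creating the group in Microsoft Entra ID is a separate step.

```python
import re

_GUID = re.compile(r"[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}")


def _slug(text: str) -> str:
    """Lower-case the text and collapse anything non-alphanumeric into '-'."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def pod_group_name(subscription_id: str, pod: str, role: str, prefix_len: int = 8) -> str:
    """Build a group name such as '<sub-prefix>-<pod>-<role>' (hypothetical convention)."""
    sub = subscription_id.strip().lower()
    if not _GUID.fullmatch(sub):
        raise ValueError("subscription_id must be a GUID")
    prefix = sub.replace("-", "")[:prefix_len]
    return f"{prefix}-{_slug(pod)}-{_slug(role)}"


if __name__ == "__main__":
    # The subscription ID below is a made-up placeholder.
    print(pod_group_name("12345678-90ab-cdef-1234-567890abcdef", "Fraud Pod", "data-scientists"))
    # prints: 12345678-fraud-pod-data-scientists
```

Keeping the subscription prefix first means groups sort by subscription in the directory, which can make access reviews easier.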
- Data Encryption:
  - Azure Machine Learning uses various compute resources and data stores on the Azure platform.
  - Data is encrypted both in transit and at rest.
  - Each resource supports encryption to maintain data security, and each has its own encryption documentation.
- Vulnerability Scanning:
  - Azure Machine Learning environments and the associated container registry can be scanned for vulnerabilities.
  - Vulnerabilities are mitigated in the same way as for existing container registries.
- Controlling Notebooks, Jobs, Assets, and Access:
  - Segregate notebooks by requiring GitHub integration.
  - Secure GitHub independently so that only selected people and teams have access.
  - Enable jobs to use datastores.
  - Grant data scientists custom permissions to use datastores in jobs.
  - Ensure access control on the shared ADLS is respected so that pod users stay isolated.
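Custom permissions for data scientists are typically expressed as a custom Azure role. The sketch below builds such a role definition as JSON, in the shape accepted by `az role definition create --role-definition`. The role name, description, and the particular split between `Actions` and `NotActions` are assumptions for this example and should be adapted to what pod users actually need.

```python
import json


def data_scientist_job_role(subscription_id: str, resource_group: str, workspace: str) -> dict:
    """Return a custom role definition scoped to a single workspace (illustrative only)."""
    scope = (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.MachineLearningServices/workspaces/{workspace}"
    )
    return {
        "Name": "Pod Data Scientist (jobs)",  # hypothetical role name
        "IsCustom": True,
        "Description": "Can run jobs and use datastores, but cannot manage the workspace or compute.",
        "Actions": [
            "Microsoft.MachineLearningServices/workspaces/*/read",
            "Microsoft.MachineLearningServices/workspaces/*/action",
            "Microsoft.MachineLearningServices/workspaces/*/write",
            "Microsoft.MachineLearningServices/workspaces/*/delete",
        ],
        "NotActions": [
            # Block changes to the workspace itself and to compute resources.
            "Microsoft.MachineLearningServices/workspaces/delete",
            "Microsoft.MachineLearningServices/workspaces/write",
            "Microsoft.MachineLearningServices/workspaces/computes/*/write",
            "Microsoft.MachineLearningServices/workspaces/computes/*/delete",
            # Block role assignments so users cannot widen their own access.
            "Microsoft.Authorization/*",
        ],
        "AssignableScopes": [scope],
    }


if __name__ == "__main__":
    # All three arguments are placeholders.
    role = data_scientist_job_role("<subscription-id>", "<resource-group>", "<workspace-name>")
    print(json.dumps(role, indent=2))
```

Scoping `AssignableScopes` to one workspace keeps the role from being assigned elsewhere in the subscription, which fits the goal of isolating pod users.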
- Configuration Policies:
  - Compliance is enforced by enabling audit.
  - These policies can be implemented with security postures and common modules.
  - Insights and Azure Resource Graph queries can be added.
  - Azure Monitor-based alerts can be set up.
  - A metadata repository in the form of a structured database can provide knowledge management capabilities and integrate with Kusto Query Language (KQL).
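To make the Resource Graph point more concrete, the sketch below composes a KQL query that inventories Azure Machine Learning workspaces. The projected columns and the helper name `workspace_inventory_query` are assumptions for illustration, and `properties.publicNetworkAccess` is assumed to be present on the workspace resource. The resulting text could be pasted into Azure Resource Graph Explorer or passed to `az graph query -q`.

```python
import uuid
from typing import Iterable, Optional


def workspace_inventory_query(subscription_ids: Optional[Iterable[str]] = None) -> str:
    """Compose a Resource Graph (KQL) query listing ML workspaces."""
    lines = [
        "Resources",
        "| where type =~ 'microsoft.machinelearningservices/workspaces'",
    ]
    if subscription_ids:
        # Parsing each value as a UUID rejects anything that is not a GUID,
        # so arbitrary text cannot be spliced into the query.
        quoted = ", ".join(f"'{uuid.UUID(s)}'" for s in subscription_ids)
        lines.append(f"| where subscriptionId in ({quoted})")
    lines.append(
        "| project name, resourceGroup, subscriptionId, location, "
        "publicNetworkAccess = tostring(properties.publicNetworkAccess)"
    )
    lines.append("| order by name asc")
    return "\n".join(lines)


if __name__ == "__main__":
    print(workspace_inventory_query())
```

A query like this can also back a scheduled compliance report, with the output stored in the metadata repository mentioned above.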
- Microsoft Purview and Azure Machine Learning (AzureML) integrate to strengthen data governance and responsible AI practices:
  - ML Assets can be brought to the Microsoft Purview Data Map:
    - Azure Machine Learning introduces ML assets as a new object in Purview.
    - This integration allows you to associate ML models with the data used for training, enabling emerging ML and AI risk and governance scenarios.
    - ML models are essentially representations of data, so linking them to their training data gives better visibility and control over the entire ML lifecycle.
  - Automatic Metadata Push:
    - When an Azure Machine Learning workspace is registered in Microsoft Purview, metadata from the workspace is automatically pushed to Purview on a daily basis.
    - No manual scanning is required; the integration handles metadata synchronization.
  - Supported Capabilities:
    - When scanning the Azure Machine Learning source, Microsoft Purview supports:
      - Metadata Extraction: Extract technical metadata from Azure Machine Learning, including workspace details, models, datasets, and jobs.
      - Lineage Tracking: Understand the lineage of ML assets and their connections to data.
      - Data Sharing: Facilitate collaboration by sharing metadata across teams.
      - Access Policy: Control access to ML assets.
      - View: ML assets can be visualized within the Purview Data Map.
  - Unified Data Governance:
    - Microsoft Purview provides a holistic view of the data landscape.
    - Features include automated data discovery, sensitive data classification, and end-to-end data lineage.
    - Data consumers get access to trustworthy, well-managed data, which supports compliance and security.
Data Scientists and Infrastructure teams can discover, track and govern ML assets throughout the MLOps Lifecycle within the context of Microsoft Purview.
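The value of linking models to their training data can be shown with a toy, in-memory lineage index. This is only a conceptual sketch: it does not call the Purview API, and the class name `LineageIndex` and the asset names are invented for illustration. It shows the two questions lineage typically answers: which data trained a given model, and which models are affected when a dataset changes.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


class LineageIndex:
    """Minimal two-way index between models and their training datasets."""

    def __init__(self) -> None:
        self._inputs: Dict[str, Set[str]] = defaultdict(set)     # model -> datasets
        self._consumers: Dict[str, Set[str]] = defaultdict(set)  # dataset -> models

    def record_training(self, model: str, datasets: Iterable[str]) -> None:
        """Register that `model` was trained on each of `datasets`."""
        for dataset in datasets:
            self._inputs[model].add(dataset)
            self._consumers[dataset].add(model)

    def training_data(self, model: str) -> List[str]:
        """Datasets that went into a model (empty list if unknown)."""
        return sorted(self._inputs.get(model, ()))

    def models_using(self, dataset: str) -> List[str]:
        """Models that would need review if this dataset changed."""
        return sorted(self._consumers.get(dataset, ()))


if __name__ == "__main__":
    index = LineageIndex()
    index.record_training("churn-model:3", ["customers_v2", "transactions_2023"])
    index.record_training("fraud-model:1", ["transactions_2023"])
    print(index.training_data("churn-model:3"))    # ['customers_v2', 'transactions_2023']
    print(index.models_using("transactions_2023"))  # ['churn-model:3', 'fraud-model:1']
```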
NOTE: Connecting Azure ML workspaces to Purview requires subscription owner permissions. The same applies to connecting ADF and ADLS to Purview.
The Data Curator role on the root collection of the Microsoft Purview account must be assigned to the managed identity of the connected resource.