Monday, March 25, 2024

 

Azure Machine Learning Data Governance:

The following is a list of practices and features to consider:

  1. Restrict Access to Resources and Operations:
  2. Authentication and Identity Management:
  3. Data Encryption:
    • Azure Machine Learning uses various compute resources and data stores on the Azure platform.
    • Data is encrypted both in transit and at rest.
    • Each resource supports encryption to maintain data security and comes with documentation.
  4. Vulnerability Scanning:
    • The Azure Machine Learning environment and its associated container registry can be scanned for vulnerabilities.
    • Mitigations for vulnerabilities follow the same process as for existing registries.
  5. Controlling notebooks, jobs, assets, and access:
    • Segregate notebooks by requiring GitHub integration.
    • Secure GitHub independently so that only selected people and teams have access.
    • Enable jobs to use datastores.
    • Grant data scientists custom permissions to use those datastores in jobs.
    • Ensure access control on shared ADLS is respected to isolate pod users.
  6. Configuration Policies:
    • Compliance is enforced by enabling audit.
    • These policies can be implemented with security postures and common modules.
    • Insights and resource graph queries can be added.
    • Azure Monitor-based alerts can be set up.
    • A metadata repository in the form of a structured database can provide knowledge management capabilities and integrate with Kusto Query Language (KQL).
  7. Microsoft Purview and Azure Machine Learning (AzureML) have a powerful integration that enhances data governance and responsible AI practices:
    1. ML Assets can be brought to the Microsoft Purview Data Map:
      • Azure Machine Learning introduces ML assets as a new object in Purview.
      • This integration allows you to associate ML models with the data used for training, enabling emerging ML and AI risk and governance scenarios.
      • ML models are essentially representations of data; linking them to their training data gives better visibility and control over the entire ML lifecycle.
    2. Automatic Metadata Push:
      • When an Azure Machine Learning workspace is registered in Microsoft Purview, metadata from the workspace is automatically pushed to Purview on a daily basis.
      • No manual scanning is required; the integration handles metadata synchronization seamlessly.
    3. Supported Capabilities:
      • When scanning the Azure Machine Learning source, Microsoft Purview supports:
        • Metadata Extraction: Extract technical metadata from Azure Machine Learning, including workspace details, models, datasets, and jobs.
        • Lineage Tracking: Trace the lineage of ML assets and their connections to data.
        • Data Sharing: Facilitate collaboration by sharing metadata across teams.
        • Access Policy: Control access to ML assets.
        • View: Visualize ML assets within the Purview Data Map.
    4. Unified Data Governance:
      • Microsoft Purview provides a holistic view of the data landscape.
      • Features include automated data discovery, sensitive data classification, and end-to-end data lineage.
      • Data consumers get access to trustworthy, well-managed data that meets compliance and security requirements.
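
To make the lineage idea in item 7 concrete, here is a minimal sketch of how linking models to their training data answers the question "which assets are affected if this dataset changes?". It is a toy in-memory graph, not the Purview API; the class `LineageGraph` and every asset name below are hypothetical and exist only for illustration.

```python
from collections import defaultdict


class LineageGraph:
    """Toy lineage store. An edge points from an input asset to the asset derived from it."""

    def __init__(self):
        self._downstream = defaultdict(set)

    def add_edge(self, source, derived):
        # Record that `derived` was produced from `source`.
        self._downstream[source].add(derived)

    def impacted_by(self, asset):
        """Return every asset reachable downstream of `asset`."""
        seen = set()
        stack = [asset]
        while stack:
            current = stack.pop()
            for child in self._downstream.get(current, ()):
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        return seen


# Hypothetical assets: one dataset feeding two training jobs, each producing a model version.
graph = LineageGraph()
graph.add_edge("dataset:customer_churn_v1", "job:train_churn_001")
graph.add_edge("job:train_churn_001", "model:churn_classifier:1")
graph.add_edge("dataset:customer_churn_v1", "job:train_churn_002")
graph.add_edge("job:train_churn_002", "model:churn_classifier:2")

impacted = sorted(graph.impacted_by("dataset:customer_churn_v1"))
for name in impacted:
    print(name)
```

A governance catalog stores the same kind of relationship at scale, which is what lets a reviewer start from a sensitive dataset and find every job and model that touched it.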

        Data scientists and infrastructure teams can discover, track, and govern ML assets throughout the MLOps lifecycle within the context of Microsoft Purview.
        NOTE: Connecting Azure ML workspaces to Purview requires subscription owner permissions. The same applies when connecting ADF and ADLS to Purview.
        The Data Curator role on the root collection of the Microsoft Purview account must be assigned to the managed identity of the connected resource.
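
For the custom permissions mentioned under item 5, a role definition along the following lines is one possible starting point. This is a sketch under assumptions: the helper `build_data_scientist_role`, the role name, the placeholder subscription ID, and the resource group name are all hypothetical, and the action strings should be checked against the current Azure Machine Learning resource provider operations before use.

```python
import json


def build_data_scientist_role(subscription_id, resource_group):
    """Build an Azure RBAC custom role definition as a plain dict.

    The role allows everything except deleting or rewriting the workspace,
    creating or deleting compute, and changing role assignments.
    """
    scope = f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
    return {
        "Name": "Data Scientist Custom",  # hypothetical role name
        "IsCustom": True,
        "Description": "Can run jobs but cannot create or delete compute or the workspace.",
        "Actions": ["*"],
        "NotActions": [
            "Microsoft.MachineLearningServices/workspaces/*/delete",
            "Microsoft.MachineLearningServices/workspaces/write",
            "Microsoft.MachineLearningServices/workspaces/computes/*/write",
            "Microsoft.MachineLearningServices/workspaces/computes/*/delete",
            "Microsoft.Authorization/*/write",
        ],
        "AssignableScopes": [scope],
    }


# Placeholder identifiers, assumed for illustration only.
role = build_data_scientist_role(
    "00000000-0000-0000-0000-000000000000", "rg-ml-example"
)
role_json = json.dumps(role, indent=2)
print(role_json)
```

The printed JSON can be saved to a file and passed to `az role definition create --role-definition <file>`. Scoping `AssignableScopes` to a single resource group keeps the role from being assigned more broadly than intended.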

