Using Azure Data Factory to upload data to, and transform data in, Azure Data Lake Storage Gen2.
This article continues the series on the Azure Data Platform that appears here and discusses data security and compliance for Azure Data Factory (ADF) and data lakes.
ADF can work with many types of data sources and can ingest files and folders that vary widely in size and number, up to petabytes of data. Microsoft Purview can be used to govern, protect, and manage the data estate. It provides integrated coverage and helps address the fragmentation of data at rest and in transit. A solution of this kind helps an organization protect sensitive data across clouds, applications, and devices, identify data risks, and manage regulatory compliance requirements. It helps create an up-to-date map of the entire data estate that includes data classification and end-to-end lineage, identify sensitive data, and provide a secure environment for data consumers to find valuable data and generate insights about how the data is stored and used. With data in ADF and the data lake, such reporting is very helpful for meeting compliance with standards such as SOC, ISO, HITRUST, FedRAMP, and HIPAA.
ADF can be connected to Microsoft Purview. There are two options to do so:
1. Connect to a Microsoft Purview account from Azure Data Factory, or
2. Register the Data Factory in Microsoft Purview.
Either option can be complemented with Azure Monitor based alerts.
The prerequisites are an Owner or Contributor role on the ADF to connect to a Microsoft Purview account, and a system-assigned managed identity enabled on the ADF. Establishing the connection takes little more than choosing the Azure subscription to locate the Purview accounts and selecting one of them. The connection information is stored in the ADF resource. ADF's managed identity is used to authenticate lineage push operations from the ADF to the Microsoft Purview account, so the Data Curator role on the root collection of the Microsoft Purview account must be assigned to the managed identity of the ADF.
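As a rough illustration, the following Python sketch uses the azure-identity and azure-mgmt-datafactory packages to inspect a factory's system-assigned managed identity (the principal that needs the Data Curator role) and its Purview connection. The subscription, resource group, and factory names are placeholders, and the purview_configuration property is assumed to be present only in recent SDK versions; treat this as a sketch, not a verified procedure.

```python
# Sketch: inspect a Data Factory's managed identity and Purview connection.
# Assumes: pip install azure-identity azure-mgmt-datafactory
# The subscription, resource group, and factory names are hypothetical placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

subscription_id = "<subscription-id>"
resource_group = "<resource-group>"
factory_name = "<data-factory-name>"

credential = DefaultAzureCredential()
adf_client = DataFactoryManagementClient(credential, subscription_id)

factory = adf_client.factories.get(resource_group, factory_name)

# The system-assigned managed identity is what must be granted the Data Curator
# role on the Purview account's root collection.
if factory.identity:
    print("Identity type:", factory.identity.type)
    print("Principal (object) id:", factory.identity.principal_id)
else:
    print("No managed identity enabled on this factory.")

# The Purview connection made from the ADF UI is stored on the factory resource;
# recent SDK versions are assumed to surface it as purview_configuration.
purview = getattr(factory, "purview_configuration", None)
if purview:
    print("Connected Purview account:", purview.purview_resource_id)
else:
    print("No Purview account connected to this factory.")
```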
Both this connection and the Purview integration capabilities can be monitored. The default integration capability is the data lineage pipeline. When this pipeline is executed, the lineage information is transmitted to the Purview account. The search bar at the top center of the Data Factory authoring UI can be used to search for data and perform actions. This is very helpful for understanding the data based on metadata, lineage, and annotations. Many organizations rely heavily on tagging and metadata, even to the point of specifying paths and dedicating storage containers for such information.
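The same kind of keyword search can also be run programmatically against the Purview account. The sketch below assumes the azure-purview-catalog package and its discovery query operation; the account endpoint and keyword are hypothetical, and the response field names are assumed from the search REST API and may differ by SDK version.

```python
# Sketch: keyword search against a Microsoft Purview catalog.
# Assumes: pip install azure-identity azure-purview-catalog
# The account endpoint and keyword below are hypothetical placeholders.
from azure.identity import DefaultAzureCredential
from azure.purview.catalog import PurviewCatalogClient

endpoint = "https://<purview-account>.purview.azure.com"
client = PurviewCatalogClient(endpoint=endpoint, credential=DefaultAzureCredential())

# Search the data map by keyword, much like the search bar in the ADF authoring UI.
search_request = {"keywords": "sales", "limit": 10}
response = client.discovery.query(search_request)

# Field names here are assumed and may vary with the API version.
for item in response.get("value", []):
    print(item.get("name"), "-", item.get("qualifiedName"), "-", item.get("entityType"))
```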
With data discovered through the Microsoft Purview search, it is possible to create a linked service, dataset, or data flow over that data.
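For example, a minimal sketch of wiring up a discovered ADLS Gen2 path as an ADF linked service and dataset with azure-mgmt-datafactory might look like the following. The storage account URL, file system, and folder path are hypothetical, and the linked service is assumed to authenticate with ADF's managed identity, which must separately be granted access to the storage account.

```python
# Sketch: create an ADLS Gen2 linked service and a delimited-text dataset in ADF.
# Assumes: pip install azure-identity azure-mgmt-datafactory (a recent version)
# All resource names, URLs, and paths below are hypothetical placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    LinkedServiceResource, AzureBlobFSLinkedService,
    DatasetResource, DelimitedTextDataset,
    LinkedServiceReference, AzureBlobFSLocation,
)

subscription_id = "<subscription-id>"
resource_group = "<resource-group>"
factory_name = "<data-factory-name>"

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# Linked service pointing at the ADLS Gen2 account; with only the URL supplied,
# ADF is assumed to fall back to its managed identity for authentication.
ls = AzureBlobFSLinkedService(url="https://<storageaccount>.dfs.core.windows.net")
adf_client.linked_services.create_or_update(
    resource_group, factory_name, "AdlsGen2LinkedService",
    LinkedServiceResource(properties=ls),
)

# Dataset over a folder of CSV files in that account.
ds = DelimitedTextDataset(
    linked_service_name=LinkedServiceReference(
        type="LinkedServiceReference", reference_name="AdlsGen2LinkedService"
    ),
    location=AzureBlobFSLocation(file_system="raw", folder_path="sales/2024"),
    column_delimiter=",",
    first_row_as_header=True,
)
adf_client.datasets.create_or_update(
    resource_group, factory_name, "SalesCsvDataset", DatasetResource(properties=ds)
)
```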
Every activity run from ADF reports its status along with the copy duration, throughput, data read, files read, data written, files written, peak connections for both read and write, the number of parallel copies used, the data integration units consumed, and the queue and transfer durations, providing complete information on the activities performed for monitoring and troubleshooting.
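These figures can also be pulled programmatically with the run-query operations in azure-mgmt-datafactory, as in the sketch below. The metric field names come from the documented copy activity output and will differ for other activity types; resource names are placeholders.

```python
# Sketch: query recent pipeline runs and print copy activity metrics.
# Assumes: pip install azure-identity azure-mgmt-datafactory
# Resource names are hypothetical placeholders.
from datetime import datetime, timedelta, timezone

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import RunFilterParameters

subscription_id = "<subscription-id>"
resource_group = "<resource-group>"
factory_name = "<data-factory-name>"

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# Look at runs updated in the last 24 hours.
now = datetime.now(timezone.utc)
filters = RunFilterParameters(last_updated_after=now - timedelta(days=1),
                              last_updated_before=now)

runs = adf_client.pipeline_runs.query_by_factory(resource_group, factory_name, filters)
for run in runs.value:
    print(f"Pipeline {run.pipeline_name} run {run.run_id}: {run.status}")
    activities = adf_client.activity_runs.query_by_pipeline_run(
        resource_group, factory_name, run.run_id, filters)
    for act in activities.value:
        out = act.output or {}
        # Copy activity output fields (dataRead, dataWritten, throughput, etc.)
        # are documented; other activity types expose different fields.
        print(" ", act.activity_name, act.status,
              "dataRead:", out.get("dataRead"),
              "dataWritten:", out.get("dataWritten"),
              "filesRead:", out.get("filesRead"),
              "filesWritten:", out.get("filesWritten"),
              "throughput:", out.get("throughput"),
              "copyDuration:", out.get("copyDuration"),
              "usedParallelCopies:", out.get("usedParallelCopies"),
              "usedDataIntegrationUnits:", out.get("usedDataIntegrationUnits"))
```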