Thursday, April 6, 2023

 

This article continues the series on the Azure Data Platform and discusses data lakes:

Access Control Model:

This section covers the access control model in Azure Data Lake Storage Gen2, which supports the following mechanisms: shared key authorization, shared access signature (SAS) authorization, role-based access control (RBAC), attribute-based access control (ABAC), and access control lists (ACLs). Of these, RBAC, ABAC, and ACLs work together as one identity-based group, while shared key and SAS form a second group. The first group has no effect on the second: shared key and SAS both grant access to a caller without requiring an identity, so role assignments and ACLs are not evaluated for such requests. RBAC grants coarse-grained access, such as read or write access to all the data in a storage account or a container. ABAC refines RBAC role assignments by adding conditions. ACLs grant fine-grained access, such as write access to a specific directory or file.

The security principals recognized by Azure are users, groups, service principals, and managed identities defined in Azure Active Directory. A role is a permission set granted to a security principal. Well-known management roles are Owner, Contributor, Reader, and Storage Account Contributor. All four can manage the storage account, but none of them grants access to the data by itself; data access comes from data roles such as Storage Blob Data Owner, Storage Blob Data Contributor, and Storage Blob Data Reader. The management roles, except Reader, can however obtain the storage account keys and with them use shared key or SAS authorization to reach the data. Access is resolved in the order RBAC first, ABAC next, and ACLs last, and this order applies to all operations such as list, get, create, update, and delete.

Security groups are particularly useful for assigning ACLs. For example, suppose Azure Data Factory (ADF) ingests data into a directory named /LogData, a specific service engineering team uploads data to the container, and various users analyze the data. We create a LogsWriter group and a LogsReader group to enable these activities. The POSIX permissions on the directory will show rwx assigned to the LogsWriter group and r-x assigned to the LogsReader group. The service principal or managed service identity (MSI) for ADF will be part of the LogsWriter group, while the service principal or MSI for Databricks will be part of the LogsReader group. Groups make it possible to add or remove members without disturbing the assignments. They also help avoid exceeding the maximum number of role assignments per subscription and the maximum number of ACL entries per file or directory. The limits are 4000 Azure role assignments per subscription and 32 ACL entries per file or directory.
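
The /LogData example above can be sketched in a few lines of Python. This is only an illustrative in-memory model, not the Azure SDK: the `Directory` class, the `is_allowed` function, and the principal names `adf-msi` and `databricks-msi` are hypothetical, and the model is simplified to named group entries only (it ignores the owning user, owning group, mask, and other entries that a real ACL also carries). It shows the evaluation order described above, where a role assignment is consulted first and the ACL is checked only when no role grants access.

```python
from dataclasses import dataclass, field

MAX_ACL_ENTRIES = 32  # per file or directory, the limit quoted in the text


@dataclass
class Directory:
    """Hypothetical stand-in for a directory with group ACL entries."""
    path: str
    # Maps a group name to a POSIX-style permission string, e.g. "rwx" or "r-x".
    group_acl: dict = field(default_factory=dict)

    def grant(self, group: str, perms: str) -> None:
        # Simplification: only named group entries count toward the limit here.
        if group not in self.group_acl and len(self.group_acl) >= MAX_ACL_ENTRIES:
            raise ValueError(f"{self.path}: more than {MAX_ACL_ENTRIES} ACL entries")
        self.group_acl[group] = perms


def is_allowed(directory, member_of, needed, role_allows=False):
    """Return True if access is granted.

    role_allows models the outcome of RBAC (with any ABAC condition already
    applied); the ACL is consulted only when no role grants the operation.
    """
    if needed not in ("r", "w", "x"):
        raise ValueError("needed must be one of 'r', 'w', 'x'")
    if role_allows:
        return True
    return any(needed in directory.group_acl.get(g, "") for g in member_of)


log_data = Directory("/LogData")
log_data.grant("LogsWriter", "rwx")
log_data.grant("LogsReader", "r-x")

# Hypothetical principals and their group memberships.
memberships = {
    "adf-msi": {"LogsWriter"},
    "databricks-msi": {"LogsReader"},
}

print(is_allowed(log_data, memberships["adf-msi"], "w"))         # True
print(is_allowed(log_data, memberships["databricks-msi"], "w"))  # False
print(is_allowed(log_data, memberships["databricks-msi"], "r"))  # True
```

Because permissions are attached to the two groups rather than to individual principals, moving a principal between groups changes its access without touching the ACL on the directory.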

Premium tier:

ADLS Gen2 supports the premium tier through premium block blob storage accounts, which are ideal for big data analytics and for workloads that require low latency and a high number of operations, such as machine learning. The premium tier supports the hierarchical namespace, which accelerates big data analytics workloads and enables file-level access control lists. It also supports the Azure Blob File System (ABFS) driver for Hadoop.
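
Hadoop-compatible tools address data through the ABFS driver with a URI of the form `abfs[s]://<container>@<account>.dfs.core.windows.net/<path>`. As a minimal sketch, the hypothetical helper below builds such a URI; the account name `mystorageacct` and the container name `logs` are made up for illustration.

```python
def abfs_uri(account: str, container: str, path: str = "", secure: bool = True) -> str:
    """Build an ABFS URI; 'abfss' selects the TLS-secured scheme."""
    scheme = "abfss" if secure else "abfs"
    # Strip leading slashes so the path is not doubled after the host.
    return f"{scheme}://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


print(abfs_uri("mystorageacct", "logs", "/LogData/2023/04"))
# abfss://logs@mystorageacct.dfs.core.windows.net/LogData/2023/04
```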

