This is a continuation of the articles on the Azure
Data Platform and discusses Data
Lakes.
Access Control Model:
This section covers the access control model in
Azure Data Lake Storage Gen2, which supports the following mechanisms: shared
key authorization, shared access signature (SAS) authorization, role-based
access control (RBAC), attribute-based access control (ABAC), and access
control lists (ACLs). Of these, RBAC, ABAC, and ACLs on one hand and shared key
and SAS on the other hand are complementary. The former have no effect on
requests authorized with the latter. Shared key and SAS both grant access to a
user without the need for an identity. RBAC grants coarse-grained access to
users, such as read-write access to all data. ABAC refines RBAC role
assignments by adding conditions. ACLs grant fine-grained access, such as write
access to a specific directory or file. The security
principals recognized by Azure are a user, group, service principal, or
managed identity defined in Azure Active Directory. A permission set grants
coarse-grained access, such as read or write access to all the data in a
storage account or all the data in a
container. Permission sets are granted through roles. The roles Storage Blob
Data Owner, Storage Blob Data Contributor, and Storage Blob Data Reader grant
access to the data. The well-known roles Owner, Contributor, Reader, and
Storage Account Contributor manage the storage account but do not by
themselves grant access to the data in it. Of these four, only Owner can grant
access to other security principals, but all except Reader can obtain the
account keys and so provide shared key and shared access signatures. Access is
resolved in the order RBAC first, ABAC next, and ACLs last, and this order
applies to all operations such as list, get, create, update, and delete.
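
The evaluation order described above can be sketched as a small model. This is
a simplified illustration, not Azure's implementation: the Request and
RoleAssignment shapes, the principal names, and the acl_allows callback are all
hypothetical, and the sketch assumes that a matching role assignment whose
condition fails simply falls through to the ACL check.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Request:
    """Hypothetical request shape, for illustration only."""
    principal: str
    operation: str  # e.g. "read" or "write"
    path: str
    attributes: dict = field(default_factory=dict)


@dataclass
class RoleAssignment:
    """Hypothetical role assignment with an optional ABAC condition."""
    principal: str
    allowed_operations: frozenset
    # None means the assignment is unconditional.
    condition: Optional[Callable[[Request], bool]] = None


def is_authorized(request, role_assignments, acl_allows):
    """Check RBAC first, then ABAC conditions, then fall back to ACLs."""
    for assignment in role_assignments:
        if assignment.principal != request.principal:
            continue
        if request.operation not in assignment.allowed_operations:
            continue
        # The role matches; an ABAC condition, if present, must also hold.
        if assignment.condition is None or assignment.condition(request):
            return True
    # No role assignment granted access, so the ACLs decide.
    return acl_allows(request)


# Made-up sample data: one unconditional role and one conditional role.
assignments = [
    RoleAssignment("adf-sp", frozenset({"read", "write"})),
    RoleAssignment(
        "analyst",
        frozenset({"read"}),
        condition=lambda r: r.path.startswith("/LogData"),
    ),
]


def acl_allows(request):
    """Stand-in ACL lookup: a single read grant on one file."""
    granted = {("analyst", "/Scratch/notes.txt", "read")}
    return (request.principal, request.path, request.operation) in granted
```

In this model the analyst can read under /LogData through the conditional role
and can read /Scratch/notes.txt only because the ACL allows it. Any other
request from the analyst is denied.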
Security groups are particularly useful when adding ACLs. For example, if Azure
Data Factory (ADF) ingests data into a folder named /LogData, a specific service
engineering team uploads data into the container, and various users analyze the
data, then we create a LogsWriter group and a LogsReader group to enable these
activities. The POSIX permissions on the directory will show rwx permissions
assigned to the LogsWriter group and r-x permissions assigned to the LogsReader
group. The service principal or the managed service identity (MSI) for ADF will
be part of the LogsWriter group, while the service principal or MSI for
Databricks will be part of the LogsReader group. Groups make it possible to add
or remove members without disturbing the assignments. They also help to
avoid exceeding the maximum number of role assignments per subscription and the
maximum number of ACL entries per file or directory. The limits are 4000 Azure
role assignments in a subscription and 32 ACL entries per file or directory.
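
The group-based ACL for /LogData can be sketched as a short helper that
composes the ACL string and guards the entry limit. This is a minimal sketch
under stated assumptions: the two group object IDs are made-up placeholders,
the base entries for the owning user, owning group, and others are an arbitrary
choice, and counting those base entries toward the limit of 32 is an assumption
of the sketch.

```python
MAX_ACL_ENTRIES = 32  # per file or directory, as stated above

# Placeholder object IDs for the two security groups (not real values).
LOGS_WRITER_ID = "11111111-1111-1111-1111-111111111111"
LOGS_READER_ID = "22222222-2222-2222-2222-222222222222"


def build_acl(group_permissions, base="user::rwx,group::r-x,other::---"):
    """Return an ACL string with one named-group entry per group.

    group_permissions maps a group object ID to a POSIX triplet such as
    "rwx" or "r-x".
    """
    entries = base.split(",")
    for object_id, perms in group_permissions.items():
        # Each position must hold its own letter (r, w, x) or a dash.
        valid = len(perms) == 3 and all(
            p in (letter, "-") for p, letter in zip(perms, "rwx")
        )
        if not valid:
            raise ValueError(f"invalid permission triplet: {perms!r}")
        entries.append(f"group:{object_id}:{perms}")
    if len(entries) > MAX_ACL_ENTRIES:
        raise ValueError("too many ACL entries for one file or directory")
    return ",".join(entries)


# rwx for LogsWriter and r-x for LogsReader, as in the example above.
log_data_acl = build_acl({LOGS_WRITER_ID: "rwx", LOGS_READER_ID: "r-x"})
```

Because membership lives in the groups, the string above stays at two named
entries no matter how many writers or readers are added. A string in this form
appears to match what the acl argument of set_access_control in the
azure-storage-file-datalake package accepts, though that should be checked
against the SDK documentation before use.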
Premium tier:
ADLS Gen2 supports the premium tier through
premium block blob storage accounts, which are ideal for big data analytics and
for workloads that require low latency and a high number of operations, such
as machine learning. The premium tier supports the hierarchical
namespace, which accelerates big data analytics workloads and enables file-level
access control lists. It also supports the Azure Blob File System (ABFS) driver
for Hadoop.
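
Hadoop-compatible tools reach the hierarchical namespace through the ABFS
driver by way of abfs:// or abfss:// URIs. The helper below is a small sketch
of how such a URI is composed. The account and container names are
hypothetical, and the dfs.core.windows.net suffix assumes the public Azure
cloud.

```python
def abfs_uri(account, container, path="", secure=True):
    """Compose an ABFS URI for a path in an ADLS Gen2 container.

    secure=True selects the TLS scheme (abfss); otherwise abfs is used.
    """
    scheme = "abfss" if secure else "abfs"
    host = f"{account}.dfs.core.windows.net"
    # Drop any leading slash so the URI has exactly one after the host.
    return f"{scheme}://{container}@{host}/{path.lstrip('/')}"


# Hypothetical account "contosologs" and container "logs".
log_data_uri = abfs_uri("contosologs", "logs", "/LogData")
```

A URI of this shape is what a Spark or Hadoop job would typically be given as
its input or output location.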