Cluster computing: Troubleshooting of role assignments via IaC: a case study.

Introduction: One of the least expected but most frequent problems is the Forbidden error. It indicates that an action was not authorized or that the target of the action could not be reached. The corresponding HTTP status code is 403. When this error occurs in a complex system, the process of elimination alone takes considerable time and effort. On the other hand, the same error can help in building a least-privilege system by continually verifying that unintended access is indeed forbidden. This article presents a case study on restricted data access to a storage account.

Description: The infrastructure-as-code (IaC) for deploying a storage account in commercial systems is usually incomplete without corresponding role assignments and restrictions on the sources that can reach the account over public and private networks. When a storage account is deployed as a data lake with a hierarchical file system, it usually has two names by which it can be reached. For example, binary large objects, or blobs for short, can be accessed programmatically at https://storageaccount.blob.core.windows.net and files can be addressed at https://storageaccount.dfs.core.windows.net . If the storage account must be reachable only over a private network, then there must be a private endpoint for each of the two DNS names. Checking that the source subnet or its public IP address is allow-listed on the storage account helps rule out the network as a potential culprit.
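
The network check described above can be sketched with the Python standard library alone. The helper names, the account name and the sample address ranges below are illustrative assumptions; in practice the allow-list would be read from the storage account's network rules.

```python
import ipaddress

def account_endpoints(account_name: str) -> dict[str, str]:
    """Return the blob and dfs hostnames of a hierarchical-namespace account."""
    return {
        "blob": f"{account_name}.blob.core.windows.net",
        "dfs": f"{account_name}.dfs.core.windows.net",
    }

def is_allow_listed(source_ip: str, allowed_ranges: list[str]) -> bool:
    """True if source_ip falls inside any allow-listed address or CIDR range."""
    address = ipaddress.ip_address(source_ip)
    return any(
        address in ipaddress.ip_network(rule, strict=False)
        for rule in allowed_ranges
    )

# Hypothetical allow-list: one CIDR range and one single address.
rules = ["203.0.113.0/24", "198.51.100.7"]
print(account_endpoints("storageaccount"))
print(is_allow_listed("203.0.113.25", rules))  # inside the /24 range
print(is_allow_listed("192.0.2.10", rules))    # not on the list
```

If the caller's address is not on the list, the Forbidden error is likely a network matter and the role and ACL checks can wait.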

Then, role-based access control can be studied. In this case, the storage account has multiple containers, and each container has been assigned Access Control Lists (ACLs) that grant read-write-execute permissions to different groups. The idea is to consolidate the containers in one storage account so that the layout of each container can be enforced consistently from an infrastructure perspective, while still allowing different teams to own their respective containers. The members of the groups allowed on each container must also be granted the Reader role on the storage account, which lets them view the storage account in the management portal from the browser. When they navigate to it, only the contents of their own container are visible to them, and they can download and upload items.

It often surprises people that this works: Reader is a built-in control plane role with permissions only to list and read the account, and it carries no data plane permissions to read or write the contents of a container, yet the user can still do both. This is not a defect but a feature. Roles and ACLs can both grant the ability to read and write, but while a data plane role grants blanket permission across all the contents of the storage account, ACLs are scoped to specific containers, directories, and files; here it is the ACLs that grant the read and write access. Should the role have blanket permissions at the data plane level, the ACLs are skipped.
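
The precedence between roles and ACLs can be illustrated with a deliberately simplified model. This is a sketch for reasoning about the behavior, not the service's actual evaluation logic; the function name and the action labels are assumptions, while the role names are the built-in ones.

```python
# Built-in roles that carry data plane permissions on blobs and files.
# "Reader" is absent on purpose: it is a control plane role only.
DATA_PLANE_ROLES = {
    "Storage Blob Data Reader": {"read"},
    "Storage Blob Data Contributor": {"read", "write"},
    "Storage Blob Data Owner": {"read", "write"},
}

def is_authorized(roles: set[str], acl_permissions: str, action: str) -> str:
    """Simplified model: a matching data plane role wins, otherwise the ACL decides."""
    for role in roles:
        if action in DATA_PLANE_ROLES.get(role, set()):
            return "allowed by role (ACL not evaluated)"
    needed = {"read": "r", "write": "w"}[action]
    if needed in acl_permissions:
        return "allowed by ACL"
    return "forbidden"

print(is_authorized({"Reader"}, "rwx", "write"))
print(is_authorized({"Reader"}, "r-x", "write"))
print(is_authorized({"Reader", "Storage Blob Data Contributor"}, "---", "write"))
```

The third call shows why a blanket data plane role defeats per-container ownership: the write is allowed even though the ACL grants nothing.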

The trouble arises when an allowed member of a group tries to download an item from their container in the storage account and gets a Forbidden error. The programmatic way of doing this, in the case of the Azure public cloud for instance, leverages a construct called DefaultAzureCredential. When a client for reading the blob or the file is instantiated with this credential using the Azure Storage Software Development Kit (SDK), the call is already authenticated by virtue of the calling identity of the principal. The Forbidden error, however, comes from the authorization of the read action.

Given the assignment of the Reader role and the ACL, authorization should be granted, so the Forbidden error is misleading. There are usually two steps to resolving this. The first verifies the identity at the client end, and the second inspects the role and the ACL on the target. With the credential object returned by the constructor, a method can be invoked to retrieve an access token for the resource https://management.azure.com. This token can then be decoded to view its claims. For a user principal, one of the claims will typically be an email address or user principal name, and this indicates the caller. If the expected and the actual values match, then there is no error on the caller side; otherwise the environment must be checked for proper configuration so that the credential object is constructed properly.
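
The client-side identity check can be sketched as follows. The helper names and the list of claim names inspected are assumptions, and the token is decoded without signature verification, which is acceptable for diagnostics only. The credential argument is any object exposing a `get_token(scope)` method that returns an object with a `token` attribute.

```python
import base64
import json

def decode_claims(jwt: str) -> dict:
    """Decode the payload segment of a JWT without verifying its signature."""
    payload = jwt.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))

def caller_claims(credential, scope="https://management.azure.com/.default") -> dict:
    """Request a token from the credential and return its decoded claims."""
    return decode_claims(credential.get_token(scope).token)

def identity_matches(claims: dict, expected: str) -> bool:
    """Compare the expected caller with whichever identity claims are present."""
    names = ("email", "upn", "unique_name", "preferred_username")
    present = {claims[name].lower() for name in names if claims.get(name)}
    return expected.lower() in present

# Demonstration with a locally built, unsigned token (hypothetical claims).
def _segment(data: dict) -> str:
    raw = json.dumps(data).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()

sample_token = ".".join([_segment({"alg": "none"}), _segment({"upn": "user@example.com"}), ""])
print(identity_matches(decode_claims(sample_token), "User@Example.com"))
```

With the azure-identity package installed, `caller_claims(DefaultAzureCredential())` would return the claims of whichever identity the environment resolves to. A service principal or managed identity token may carry no email-like claim at all, in which case the object or application identifier claims are the ones to compare.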

Next, on the target side, the Check access functionality under the Access control (IAM) menu item can be used to find out whether the calling principal has at least the minimum role required to communicate with the target. Once this is verified, the ACLs can be inspected to ensure that the action is authorized. A failure in either the role match or the ACL check requires a suitable remedy.
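
The ACL inspection can be sketched as below, assuming the ACL is available in the POSIX-style short text form (comma-separated `type:id:permissions` entries). The function names and the group object ID are hypothetical, and the mask handling is a simplification of how effective permissions are derived.

```python
def parse_acl(acl: str) -> dict[tuple[str, str], str]:
    """Map (entry type, object id) to its permission triplet; default entries are skipped."""
    entries = {}
    for item in acl.split(","):
        parts = item.strip().split(":")
        if len(parts) != 3:
            continue  # skips blanks and "default:..." entries, which only seed new children
        kind, principal, perms = parts
        entries[(kind, principal)] = perms
    return entries

def group_can(acl: str, group_id: str, needed: str) -> bool:
    """True if the named group holds every permission in `needed` once the mask is applied."""
    entries = parse_acl(acl)
    granted = entries.get(("group", group_id), "---")
    mask = entries.get(("mask", ""), "rwx")
    effective = {p for p, m in zip(granted, mask) if p != "-" and p == m}
    return set(needed) <= effective

# Hypothetical group object ID and ACL; the mask removes the write bit.
team = "11111111-1111-1111-1111-111111111111"
acl = f"user::rwx,group::r-x,group:{team}:rwx,mask::r-x,other::---"
print(group_can(acl, team, "r"))
print(group_can(acl, team, "w"))
```

The second call returns False even though the group entry reads `rwx`, which is one way an ACL that looks correct can still produce a Forbidden error. Reading an item also requires execute permission on each parent directory, so the check may need to be repeated along the path.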

Conclusion: The logged-in identity is verified against the identity provider and passed through to the target. Roles and ACLs must authorize this identity before any data plane operation. If the user tries an action not governed by identity, such as generating a Shared Access Signature (SAS) link for an item in their container, which encapsulates authentication and authorization segments in the link for use by the bearer, then that action falls outside the integrated identity-based and role-based access control. Because a SAS URL is an alternative to integrated authentication and authorization, granting read and write permissions through a SAS URL requires the principal generating it to hold data plane roles and permissions. Granting those roles bypasses the ACLs. While SAS URLs can benefit those principals, who are typically data scientists, by overcoming the challenges of isolated networks, broken passthroughs, and heterogeneous data sources and destinations during initial onboarding, mature systems often streamline identity-based access.
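
To see what a SAS link carries independently of the caller's identity, its query string can be inspected. This is a minimal sketch: the URL, item name and signature value are made up, and only a handful of the standard SAS query parameters are read.

```python
from urllib.parse import parse_qs, urlsplit

# Standard SAS query parameters mapped to readable labels.
SAS_FIELDS = {
    "sp": "permissions",
    "st": "start",
    "se": "expiry",
    "sr": "resource",
    "sv": "version",
}

def describe_sas(url: str) -> dict[str, str]:
    """Return the authorization details embedded in a SAS URL's query string."""
    query = parse_qs(urlsplit(url).query)
    return {label: query[key][0] for key, label in SAS_FIELDS.items() if key in query}

# Hypothetical SAS URL with a placeholder signature.
sas_url = (
    "https://storageaccount.blob.core.windows.net/container/item.csv"
    "?sv=2022-11-02&sr=b&sp=r&se=2024-03-03T00:00:00Z&sig=REDACTED"
)
print(describe_sas(sas_url))
```

Whoever holds the link gets the listed permissions until the expiry time, regardless of group membership or ACLs.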

Saturday, March 2, 2024
