Wednesday, July 19, 2023

 

Previous articles in this series discussed resolutions for shortcomings in the use of Infrastructure-as-Code (IaC) in various scenarios. This section discusses the resolution for the case where there are cascading resource locks.

Resources can be locked to prevent unexpected changes. A subscription, resource group, or resource can be locked to prevent other users from accidentally deleting or modifying critical resources. The lock overrides any permissions the users may have. The lock level can be set to CanNotDelete or ReadOnly, with ReadOnly being the more restrictive. A lock applied at a parent scope is inherited by all resources within that scope. Some considerations still apply after locking. For example, a CanNotDelete lock on a storage account does not prevent data within that account from being deleted. A ReadOnly lock on an application gateway prevents you from getting the backend health of the application gateway because that operation uses POST. Among the built-in roles, only Owner and User Access Administrator are granted access to Microsoft.Authorization/locks/* actions.
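
To make the inheritance rule concrete, here is a minimal Python sketch that works out which lock level effectively applies to a resource when locks sit at several scopes. It assumes that scopes are plain Azure resource ID strings and that the most restrictive inherited lock wins; the `Lock` class and `effective_lock` function are hypothetical names used only for illustration, and the level is spelled `CanNotDelete` as Azure does.

```python
from dataclasses import dataclass
from typing import List, Optional

# Restrictiveness order: ReadOnly is stricter than CanNotDelete.
_RANK = {"CanNotDelete": 1, "ReadOnly": 2}


@dataclass(frozen=True)
class Lock:
    scope: str  # resource ID of the subscription, resource group, or resource
    level: str  # "CanNotDelete" or "ReadOnly"


def _covers(scope: str, resource_id: str) -> bool:
    """True if the lock scope is the resource itself or one of its ancestors."""
    scope = scope.rstrip("/").lower()
    resource_id = resource_id.rstrip("/").lower()
    return resource_id == scope or resource_id.startswith(scope + "/")


def effective_lock(resource_id: str, locks: List[Lock]) -> Optional[str]:
    """Return the most restrictive lock level that applies, or None."""
    levels = [lock.level for lock in locks if _covers(lock.scope, resource_id)]
    return max(levels, key=_RANK.__getitem__, default=None)
```

With a CanNotDelete lock on the subscription and a ReadOnly lock on one resource group, every resource in that group comes out as ReadOnly, while resources in other groups remain CanNotDelete.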

When the IaC is applied, it can be quite frustrating to find resources locked in the public cloud, preventing the IaC actions from completing. For example, a resource might have a private endpoint, which in turn might be associated with a DNS configuration and a network interface (NIC). If these sub-resources are locked, the private endpoint cannot be deleted, which in turn fails the IaC application. The resolution for the owner of the subscription is to delete the lock from the said resource via the Azure Portal or the command-line interface and then proceed to apply the IaC again, iterating over the ‘apply’ and ‘unlock’ steps until there are no further obstructions.
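
The apply-then-unlock iteration can be sketched as a small retry loop. This is only an illustration under stated assumptions: `apply` and `unlock` are callables supplied by the caller (for example, wrappers around the IaC tool and the lock-removal command), and `LockedError` is a hypothetical exception that `apply` raises with the locked scopes it ran into. None of these names come from a real library.

```python
from typing import Callable, Iterable, List


class LockedError(Exception):
    """Hypothetical error raised by apply() when locks block the run."""

    def __init__(self, scopes: Iterable[str]):
        self.scopes = list(scopes)
        super().__init__(f"locked scopes: {self.scopes}")


def apply_until_unlocked(
    apply: Callable[[], None],
    unlock: Callable[[str], None],
    max_rounds: int = 5,
) -> List[str]:
    """Run apply(); on LockedError, unlock the reported scopes and retry.

    Returns the scopes that were unlocked, in order, so that the caller
    can re-create those locks once the changes have gone through.
    """
    unlocked: List[str] = []
    for _ in range(max_rounds):
        try:
            apply()
            return unlocked
        except LockedError as err:
            for scope in err.scopes:
                unlock(scope)
                unlocked.append(scope)
    raise RuntimeError(f"still locked after {max_rounds} rounds")
```

Capping the number of rounds keeps the loop from spinning forever when something keeps putting the locks back.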

While this works for a role with elevated privileges, many developers who use the CI/CD pipeline's credentials to make changes to the subscription do not have that privilege and might find the situation harrowing to resolve without external intervention. One way to overcome this is to run the unlock commands in a pipeline step prior to the application of the IaC. Fortunately, there are ways to unlock at the subscription scope rather than resource by resource. Even so, it might not be clear when the locks reappear, and the unlocking might need to be repeated. It is good practice to check the policies to make sure that locking is not enforced automatically, since that would interfere with the infrastructure changes made by code; the policies can also indicate the intent behind the locking. If the locking was simply meant to prevent accidental deletions across a broad range of resources, then unlocking before applying the changes is straightforward.
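
A pipeline step for the unlocking might look like the following sketch. It assumes the step has already captured the JSON output of `az lock list` for the subscription and that each entry carries `id` and `level` fields; `unlock_commands` is a hypothetical helper, and it only builds the `az lock delete --ids ...` commands rather than running them. The identity that runs them still needs access to the Microsoft.Authorization/locks/* actions.

```python
import json
from typing import List, Sequence


def unlock_commands(
    lock_list_json: str,
    levels: Sequence[str] = ("CanNotDelete", "ReadOnly"),
) -> List[List[str]]:
    """Turn `az lock list` JSON into one `az lock delete` command per lock.

    Only locks whose level is in `levels` are included, so a step can
    choose to remove, say, only the ReadOnly locks.
    """
    locks = json.loads(lock_list_json)
    return [
        ["az", "lock", "delete", "--ids", lock["id"]]
        for lock in locks
        if lock.get("level") in levels
    ]
```

Each returned command is an argument list that the step could hand to `subprocess.run(cmd, check=True)`. Keeping the original JSON around makes it possible to re-create the same locks after the IaC has been applied.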

Let us consider a specific association between, say, a firewall and a network resource such as a gateway. The firewall must be associated with the gateway so that it can prevent traffic from flowing through that appliance. While they remain associated, each remembers the identifier and the state of the other. Initially, the firewall may remain in detection mode, where it is merely passive; it becomes active in prevention mode. When an attempt is made to toggle the mode, the association prevents it. Neither end of the association can tell what state to be in without exchanging information, and when they are deployed or updated in place, neither knows about nor informs the other.

The above resolutions are easy when the error messages are descriptive and indicate that the failure of the IaC is exclusively due to locks. Other errors may not have a straightforward cause. In such cases, the activity log on the resources or at the subscription level can be quite helpful, because the JSON content of a logged event explains exactly what happened. The activity log also helps to determine whether something was changed by actions other than the deployment of the infrastructure changes.
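
As an illustration of reading the activity log, the sketch below summarizes failed events from a list of logged events that have already been loaded from JSON (for example, exported with `az monitor activity-log list`). The field names used here (`status.value`, `operationName.value`, `caller`, `resourceId`, and a JSON-encoded `properties.statusMessage`) are assumptions about the event shape and should be checked against real output; `failed_events` is a hypothetical helper.

```python
import json
from typing import Dict, List


def _error_of(event: dict) -> dict:
    """Best-effort parse of the JSON error embedded in the status message."""
    raw = (event.get("properties") or {}).get("statusMessage")
    try:
        parsed = json.loads(raw)
    except (TypeError, ValueError):
        return {}  # missing or non-JSON message
    error = parsed.get("error") if isinstance(parsed, dict) else None
    return error if isinstance(error, dict) else {}


def failed_events(events: List[dict]) -> List[Dict[str, str]]:
    """Summarize failed events: who did what, to which resource, and why."""
    summary = []
    for event in events:
        if (event.get("status") or {}).get("value") != "Failed":
            continue
        error = _error_of(event)
        summary.append({
            "caller": event.get("caller", ""),
            "operation": (event.get("operationName") or {}).get("value", ""),
            "resource": event.get("resourceId", ""),
            "code": error.get("code", ""),
            "message": error.get("message", ""),
        })
    return summary
```

The `caller` field is what helps to tell a pipeline's own deployment apart from changes made by someone or something else.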
