Friday, October 13, 2023

 

This article continues a series on Infrastructure-as-Code (IaC) and covers the locking of resources. Locks, policies, and IaC all protect resources, and they sometimes step on each other, producing conflicts that must be resolved. Each has a part to play and none can be done away with, so the hope is that every pipeline run goes smoothly and leaves a clean state behind.

That can be wishful thinking when the resources to be created, modified, or deleted have sub-resources or are associated with other resources. With such dependencies, an operation can fail with an error stating that one resource or another has a lock on it. Locking is essential to prevent accidental modification of resources. The assumption is that an authorized operation unlocks the resources before the change and locks them again afterwards. Take a private endpoint that gives an Azure public cloud resource a private IP address for incoming traffic: the associated resources, including the parent resource, private links, and DNS zones, might all be locked. The operation succeeds only when all of those locks are released, which makes it hard to know up front which locks to release and reapply.

One approach in pipeline automation is the cascaded unlocking of all resources in a resource hierarchy, at the resource group or subscription level. The identity that performs the Azure operations must be privileged: among the built-in roles, only Owner and User Access Administrator can create and delete management locks. The corresponding permissions are the Microsoft.Authorization/* or Microsoft.Authorization/locks/* actions, so a custom role that grants them can also be sufficient. Going through every resource and sub-resource in the hierarchy to unlock it before the operations begin and to lock it again at the end can be time consuming, and the script often has to specify some wait time. But this leaves the resources in a clean state so that changes propagate from the IaC to the management portal. It is also possible to run these steps conditionally, for changes that carry certain labels or other distinguishing features, such as a filter on operations.
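The unlock, change, relock sequence can be sketched as a Python context manager that wraps the az CLI. This is a minimal sketch under stated assumptions, not a definitive implementation: the names `unlocked`, `lock_scope`, and `create_args` are hypothetical, the 30-second settle time is an arbitrary placeholder for the wait mentioned above, and it assumes that `az lock list --resource-group` also returns locks on resources inside the group and that `az lock create --resource` accepts a full resource ID. The `run` and `sleep` parameters are injectable so the flow can be dry-run without touching Azure.

```python
import json
import subprocess
import time
from contextlib import contextmanager

# Path segment that separates a locked scope from the lock's own name.
LOCK_SEGMENT = "/providers/microsoft.authorization/locks/"


def az(args):
    """Run an az CLI command and return its stdout as text."""
    result = subprocess.run(["az", *args], check=True,
                            capture_output=True, text=True)
    return result.stdout


def lock_scope(lock_id):
    """Return the ID of the resource or resource group the lock is attached to."""
    idx = lock_id.lower().rfind(LOCK_SEGMENT)
    if idx < 0:
        raise ValueError(f"not a management lock ID: {lock_id}")
    return lock_id[:idx]


def create_args(lock):
    """Build the 'az lock create' arguments that restore one saved lock."""
    scope = lock_scope(lock["id"])
    args = ["lock", "create", "--name", lock["name"],
            "--lock-type", lock["level"]]
    parts = scope.strip("/").split("/")
    if len(parts) == 4 and parts[2].lower() == "resourcegroups":
        # Lock sits on the resource group itself.
        return args + ["--resource-group", parts[3]]
    # Assumption: --resource accepts a full resource ID.
    return args + ["--resource", scope]


@contextmanager
def unlocked(resource_group, run=az, settle_seconds=30, sleep=time.sleep):
    """Delete every lock found in the group, yield, then recreate them."""
    listing = run(["lock", "list", "--resource-group", resource_group,
                   "--output", "json"])
    locks = json.loads(listing)
    try:
        for lock in locks:
            run(["lock", "delete", "--ids", lock["id"]])
        sleep(settle_seconds)  # give lock removal time to propagate
        yield locks
    finally:
        # Runs even if the wrapped change fails, so locks are restored.
        for lock in locks:
            run(create_args(lock))
```

A pipeline step would then wrap its change in `with unlocked("my-resource-group"):` so that the locks come back even when the change raises an error. Lock notes are not restored in this sketch.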

A policy can act as a catch-all that applies locks to resources where they were missed, but a policy on the Azure public cloud is evaluated for compliance on a 24-hour cycle. It is also a default-allow, explicit-deny system. If a resource violates a policy, it is marked as non-compliant. The effects a policy can take are detection or prevention. The IaC code is the ultimate source of truth for the resources, and there are ways to specify locks in the IaC for resources that must behave independently of the collective approach taken by the policy. Any time a policy changes the locks and the IaC is unaware of it, there is a conflict. It is preferable to keep locking as simple as possible, without customizations for any subset of resources, so that the pipeline automation alone is sufficient to coordinate the locking and unlocking.
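One way to surface such a conflict is to compare the locks the IaC declares with the locks actually present. The following is a minimal sketch, assuming both sides have already been reduced to a mapping from scope ID to lock level; the function name `lock_drift` and the shape of its inputs are hypothetical, and how the two mappings are gathered is left out.

```python
def lock_drift(declared, actual):
    """Compare declared and actual locks, keyed by scope ID.

    Both arguments map a scope ID to a lock level such as
    "CanNotDelete" or "ReadOnly". IDs are compared case-insensitively.
    """
    want = {scope.lower(): level for scope, level in declared.items()}
    have = {scope.lower(): level for scope, level in actual.items()}
    return {
        # Declared in the IaC but not present on the resource.
        "missing": sorted(s for s in want if s not in have),
        # Present on the resource but unknown to the IaC.
        "unexpected": sorted(s for s in have if s not in want),
        # Present on both sides with a different lock level.
        "changed": sorted(s for s in want if s in have and want[s] != have[s]),
    }
```

An empty result in all three lists means the policy and the IaC agree; anything else is a candidate for resolution before the next pipeline run.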

Finally, locking and unlocking are much easier to do from a command-line interface than anywhere else. Both pipeline scripts and public cloud automations can execute these commands, and although Runbooks might not be able to execute them in PowerShell, the az CLI can certainly be run from functions or similar resources. Invoking a script to lock or unlock does not require the resource or its state to change.
