Cluster computing: Complex deployments using IaC:

Saturday, December 23, 2023

Complex deployments using IaC:

A complex deployment is one that has multiple layers, resource groups, and resource types. Creating a complex deployment using IaC is prone to errors at both the plan and execution stages. The IaC compiler can detect only those errors that can be statically determined from the IaC. Runtime execution errors are more common because policy violations are not known until the actual deployment and, given the diverse set of resources that must be deployed, the errors are not always well known. Name length limits, invalid security principals, locked resources, mutual incompatibility of resource pairs, and conflicting settings between resources are just a few of these errors.
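Some of these runtime errors, such as name length limits, can be checked before a deployment is attempted. The following is a minimal sketch of such a pre-flight check. The `NameRule` type, the `NAME_RULES` table, and the `check_names` function are hypothetical names introduced for illustration, and the limits in the table are assumptions that should be verified against the cloud provider's documentation.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class NameRule:
    min_len: int
    max_len: int
    pattern: str  # regular expression the whole name must match


# Illustrative rules only (assumed values); confirm the real limits
# with the cloud provider before relying on them.
NAME_RULES = {
    "storage_account": NameRule(3, 24, r"[a-z0-9]+"),
    "key_vault": NameRule(3, 24, r"[A-Za-z0-9-]+"),
}


def check_names(resources):
    """Return a list of problems for (resource_type, name) pairs."""
    problems = []
    for rtype, name in resources:
        rule = NAME_RULES.get(rtype)
        if rule is None:
            continue  # no rule known for this type, so nothing to check
        if not rule.min_len <= len(name) <= rule.max_len:
            problems.append(
                f"{rtype} '{name}': length {len(name)} is outside "
                f"{rule.min_len}-{rule.max_len}"
            )
        if not re.fullmatch(rule.pattern, name):
            problems.append(f"{rtype} '{name}': contains disallowed characters")
    return problems
```

A check like this only covers what can be known ahead of time. Policy violations and locked resources would still surface at deployment.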

As the size and scale of infrastructure grow, a realization dawns that the vaunted tenets of IaC, such as reproducibility, self-documentation, visibility, freedom from errors, lower TCO, drift prevention, joy of automation, and self-service, diminish somewhat when the time and effort needed to overcome its brittleness rise steeply. Packages go out of date, features become deprecated and stop working, backward compatibility is hard to maintain, and all existing resource definitions have a shelf life. Similarly, assumptions are challenged when the cloud provider and the IaC provider describe attributes differently. The information contained in IaC can be hard to summarize in an encompassing review unless we go block by block. It is also easy to shoot oneself in the foot with a typo or with a command that destroys and recreates a resource instead of changing it in place, especially when the state tracked by the IaC disagrees with what the portal shows.
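The risk of an unintended create-and-destroy can be reduced by inspecting the plan before it is applied. The sketch below assumes the IaC tool is Terraform, which the text does not state, and that the plan has been exported as JSON (for example with `terraform show -json`), where each entry under `resource_changes` carries a `change.actions` list. The function name `find_destructive_changes` and the resource addresses in `SAMPLE_PLAN` are made up for illustration.

```python
import json


def find_destructive_changes(plan):
    """Return (address, actions) for each resource the plan would delete or replace."""
    flagged = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        # A replacement appears as a delete paired with a create.
        if "delete" in actions:
            flagged.append((rc.get("address", "<unknown>"), actions))
    return flagged


def load_plan(path):
    """Load a plan that was previously exported to a JSON file."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


# Hypothetical plan fragment, shaped like the assumed JSON export.
SAMPLE_PLAN = {
    "resource_changes": [
        {"address": "azurerm_storage_account.logs",
         "change": {"actions": ["delete", "create"]}},
        {"address": "azurerm_resource_group.rg",
         "change": {"actions": ["update"]}},
        {"address": "azurerm_key_vault.kv",
         "change": {"actions": ["no-op"]}},
    ]
}

for address, actions in find_destructive_changes(SAMPLE_PLAN):
    print(f"REVIEW BEFORE APPLY: {address} -> {actions}")
```

Run as a gate in a pipeline, a check of this kind could require a manual approval whenever the flagged list is not empty.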

The TCO of IaC for a complex deployment does not include the man-hours required to keep it in working condition and to assist with redeployments and syncing. When deployments are large and complex, one-off investigations are too many to count on one hand. Tracking the sheer number of resources by their names and identifiers can be exhausting. A sophisticated CI/CD pipeline for managing accounts and deployments is good automation, but it is also likely to be run by several contributors. When edits are allowed and common automation accounts are used, it can be difficult to know who made a change and why.

Some flexibility is required to make judicious use of both automation and manual intervention in keeping deployments robust. Continuous updating of the IaC, especially by the younger members of the team, is not only a comfort but also a necessity. The more mindshare a complex IaC gets, the more likely it is that the cost of maintaining it will fall and that some of the limitations mentioned earlier will be dispelled.

As with all solutions, scope and boundaries apply. It is best not to let IaC spread so far that high-priority and high-severity deployments are affected. IaC can also be treated like code, with its own index, model, and co-pilot.

References to build the first co-pilot: 

1. https://github.com/raja0034/azureml-examples 

2. https://github.com/raja0034/openaidemo/blob/main/copilot.py 

References: previous articles on IaC
