Cluster computing

Thursday, April 4, 2024

Some methods of organization for large scale Infrastructure-as-a-Code deployments.

The purpose of IaC is to provide a dynamic, reliable, and repeatable infrastructure suitable for cases where manual approaches and management practices cannot keep up. When automation increases to the point of becoming a cloud-based service responsible for the deployment of cloud resources and stamps that provision other services that are diverse, consumer facing and public cloud general availability services, some learnings can be called out that apply universally across a large spectrum of industry clouds.

A service that deploys other services must accept IaC deployment logic with templates, intrinsics, and deterministic execution that works much like any other workflow management system. This helps to determine the order in which to run them and with retries. The tasks are self-described. The automation consists of a scheduler to trigger scheduled workflows and to submit tasks to the executor to run, an executor to run the tasks, a web server for a management interface, a folder for the directed acyclic graph representing the deployment logic artifacts, and a metadata database to store state. The workflows don’t restrict what can be specified as a task which can be an Operator or a predefined task using say Python, a Sensor which is entirely about waiting for an external event to happen, and a Custom task that can be specified via a Python function decorated with a @task.

The organization of such artifacts posed two necessities. First, to leverage the builtin templates and deployment capabilities of the target IaC provider as well as their packaging in the format suitable to the automation that demands certain declarations, phases, and sequences to be called out. Second the co—ordination of context management switches between automation service and IaC provider. This involved a preamble and an epilogue to a context switch for bookkeeping and state reconciliation.

This taught us that large IaC authors are best served by uniform, consistent and global naming conventions, registries that can be published by the system for cross subscription and cross region lookups, parametrizing diligently at every scope including hierarchies, leveraging dependency declarations, and reducing the need for scriptability in favor of system and user defined organizational units of templates. Leveraging supportability via read-only stores and frequently publishing continuous and up-to-date information on the rollout helps alleviate the operations from the design and development of IaC.

IaC writers frequently find themselves in positions where the separation between pipeline automation and IaC declarations are not clean, self-contained or require extensive customizations. One of the approaches that worked on this front is to have multiple passes on the development. With one pass providing initial deployment capability and another pass consolidating and providing best practice via refactoring and reusability. Enabling the development pass to be DevOps based, feature centric and agile helps converge to a working solution with learnings that can be carried from iteration to iteration. The refactoring pass is more generational in nature. It provides cross-cutting perspectives and non-functional guarantees.

A library of routines, operators, data types, global parameters and registries are almost inevitable with large scale IaC deployments but unlike the support for programming language-based packages, these are often organically curated in most cases and often self-maintained. Leveraging tracking and versioning support of source control, its possible to provide compatibility as capabilities are made native to the IaC provider or automation service.

Cluster computing

Thursday, April 4, 2024

No comments:

Post a Comment