Sunday, June 18, 2023


Translators:

This article introduces a concept that could come in handy to overcome some of the drudgery of IaC routines and is a continuation of the previous articles on IaC shortcomings and resolutions. Infrastructure-as-Code, aka IaC, has no hard and fast rules about the use of one form or another to do what it does. At best, it can provide proper scope and better articulation for automation purposes. The Azure public cloud, for instance, provides ways to write automation with PowerShell, the command-line interface, and REST APIs. These cover scripts; for templates, there are different IaC providers such as Azure Resource Manager and Terraform.

Dynamic code generation from one existing form to another is routine and doable with techniques similar to those a compiler uses. The set of principles involved is the same. Each unit of IaC can be considered a program, and its dependencies can be collected via iterations. With the tokens and the mappings between one form and another, it is possible to translate one set of artifacts into another, say from Azure resource templates to Terraform IaC.
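To make the idea concrete, here is a minimal sketch in Python, assuming a hand-written mapping table and a single storage-account resource. The mapping table, attribute names, and sample template are illustrative assumptions, not a complete converter.

import json

# Illustrative mapping from ARM resource types to Terraform resource types.
# A real azurerm_storage_account block needs more attributes; omitted here.
ARM_TO_TF = {
    "Microsoft.Storage/storageAccounts": "azurerm_storage_account",
}

def arm_resource_to_hcl(resource):
    # Rewrite one ARM template resource as a Terraform HCL block.
    tf_type = ARM_TO_TF[resource["type"]]
    arm_name = resource["name"]
    location = resource.get("location", "eastus")
    label = arm_name.replace("-", "_")
    return "\n".join([
        f'resource "{tf_type}" "{label}" {{',
        f'  name     = "{arm_name}"',
        f'  location = "{location}"',
        "}",
    ])

arm_template = json.loads("""
{
  "resources": [
    { "type": "Microsoft.Storage/storageAccounts",
      "name": "demostorage",
      "location": "eastus" }
  ]
}
""")

for res in arm_template["resources"]:
    print(arm_resource_to_hcl(res))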

The justification for such a feature is that neither of the IaC providers mentioned would find it in their business interest to provide tools to convert one form to another, even though they are best placed to do so because they already have a runtime that creates a plan before applying the IaC. The planning step is the one that understands the IaC and is best suited for tokenizing it.

In the absence of programmability from both IaC providers to expose their internal parsing and translation of their IaC formats, it should be perfectly acceptable to come up with an independent solution that facilitates conversion of one form to another.

This translation involves the following:
1) Lexical analysis: This is the part where the compiler divides the text of the program into tokens, each of which corresponds to a symbol such as a variable name, keyword, or number.
2) Syntax analysis: This is the part where the tokens generated in the previous step are 'parsed' and arranged in a tree structure (called the syntax tree) that reflects the structure of the program.
3) Type checking: This is the part where the syntax tree is analyzed to determine whether the program violates certain consistency requirements, for example if a variable is used in a context where its type does not permit it.
4) Intermediate code generation: This is the part where the program is translated into a simple, machine-independent intermediate language.
5) Register allocation: This is the part where symbolic variable names are translated to numbers, each of which corresponds to a register in the target machine code.
6) Machine code generation: This is the part where the intermediate language is translated to assembly language for a specific architecture.
7) Assembly and linking: This is the part where the assembly language code is translated to a binary representation and the addresses of variables, functions, etc. are determined.
The first three parts are called the frontend and the last three form the backend.
There are checks and transformations at each step of the processing in the order listed above, such that each step passes stronger invariants to the next. The type checker, for instance, can assume the absence of syntax errors.
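As a rough illustration of that ordering, the following sketch chains placeholder frontend phases, with each phase asserting the invariant established by the one before it. The phase bodies are stand-ins, not a real compiler.

# A minimal sketch of the phase ordering; the phase bodies are placeholders.

def lex(source):
    # text -> tokens (placeholder tokenizer: whitespace split)
    return source.split()

def parse(tokens):
    # tokens -> syntax tree; assumes lexing already succeeded
    assert all(isinstance(t, str) for t in tokens)
    return ("program", tokens)

def typecheck(tree):
    # syntax tree -> checked tree; assumes parsing already succeeded
    assert tree[0] == "program"
    return tree

def frontend(source):
    return typecheck(parse(lex(source)))

print(frontend("location = eastus"))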
Lexical analysis is done with regular expressions and precedence rules. Precedence rules are similar to algebraic conventions. Regular expressions are transformed into efficient programs using non-deterministic finite automata, each of which consists of a set of states, including a starting state and a subset of accepting states, and transitions from one state to another on an input symbol. Because they are non-deterministic, compilers use a more restrictive form called the deterministic finite automaton (DFA). This conversion from a language description written as a regular expression into an efficiently executable representation, a DFA, is done by the lexer generator.
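Here is a minimal sketch of regex-driven lexing in Python, using the re module to stand in for a lexer generator. The token names and patterns are assumptions chosen for illustration, and Python's engine is backtracking rather than a true DFA.

import re

# Token names and patterns are illustrative; their order encodes precedence,
# since an earlier alternative wins when several patterns could match.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_][A-Za-z0-9_]*"),
    ("ASSIGN", r"="),
    ("STRING", r'"[^"]*"'),
    ("SKIP",   r"[ \t]+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    # Yield (kind, value) pairs, skipping whitespace.
    for match in MASTER.finditer(text):
        if match.lastgroup != "SKIP":
            yield match.lastgroup, match.group()

print(list(tokenize('location = "eastus"')))
# [('IDENT', 'location'), ('ASSIGN', '='), ('STRING', '"eastus"')]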
Syntax analysis recombines the tokens that lexical analysis split the input into. This results in a syntax tree which has the tokens as leaves, and their left-to-right sequence is the same as in the input text. As in lexical analysis, we rely on building automata, and in this case the context-free grammars we find can be converted to recursive programs called stack automata. There are two ways to generate such automata: the LL parser (the first L indicates the reading direction and the second L indicates the derivation order) and the SLR parser (S stands for simple).
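As a small illustration of the LL approach, here is a recursive-descent parser for a single assumed rule, assignment -> IDENT '=' (STRING | NUMBER), producing a tiny syntax tree from a token list like the one the lexer sketch above emits. The grammar is an assumption chosen to keep the example short.

def parse_assignment(tokens):
    toks = list(tokens)
    pos = 0

    def expect(*kinds):
        # Consume the next token if its kind is allowed, else report an error.
        nonlocal pos
        if pos >= len(toks) or toks[pos][0] not in kinds:
            raise SyntaxError(f"expected one of {kinds} at token {pos}")
        tok = toks[pos]
        pos += 1
        return tok

    name = expect("IDENT")
    expect("ASSIGN")
    value = expect("STRING", "NUMBER")
    return ("assign", name[1], value)

tokens = [("IDENT", "location"), ("ASSIGN", "="), ("STRING", '"eastus"')]
print(parse_assignment(tokens))
# ('assign', 'location', ('STRING', '"eastus"'))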
Symbol tables are used to track the scope and binding of all named objects. A symbol table supports operations such as initializing an empty symbol table, binding a name to an object, looking up a name, entering a new scope, and exiting a scope.
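A minimal sketch of such a table, assuming a stack of dictionaries where the innermost binding shadows outer ones:

class SymbolTable:
    def __init__(self):
        self.scopes = [{}]           # empty symbol table with one global scope

    def bind(self, name, obj):
        self.scopes[-1][name] = obj  # bind a name in the innermost scope

    def lookup(self, name):
        for scope in reversed(self.scopes):   # innermost binding wins
            if name in scope:
                return scope[name]
        raise KeyError(f"unbound name: {name}")

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

table = SymbolTable()
table.bind("x", "int")
table.enter_scope()
table.bind("x", "string")
print(table.lookup("x"))   # 'string': the inner binding shadows the outer one
table.exit_scope()
print(table.lookup("x"))   # 'int'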
Bootstrapping a compiler is interesting because the compiler itself is a program. We resolve this with a quick and dirty compiler or intermediate compilers. 
