Translators:
This article introduces a concept that could come in handy to
overcome some of the drudgery of IaC routines, and it continues the
previous articles on IaC shortcomings and their resolutions. Infrastructure-as-Code (IaC) has no hard
and fast rules about the use of one form or another to do what it does. At
best, it can provide proper scope and better articulation for automation
purposes. The Azure public cloud, for
instance, provides ways to write automation with PowerShell, the command-line
interface, and REST APIs. These cover scripts; for templates, we have
different IaC providers such as Azure Resource Manager and Terraform.
Dynamic code generation from one form to another is
routine and can be done with techniques similar to those a compiler uses; the
set of principles involved is the same. Each unit of IaC can be considered a
program, and its dependencies can be collected iteratively. With the tokens
and the mappings between one form and another, it is possible to convert one set
of artifacts into another, say from Azure resource templates to Terraform IaC, as
the sketch below suggests.
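As a rough sketch of what such a mapping could look like, the Python snippet below converts one resource from a parsed ARM template into a Terraform-style block. The single-entry type map and the handful of properties handled are deliberate simplifications invented for this illustration; a real converter would need a complete mapping table and per-property translation rules.

import json

# Illustrative mapping from an ARM resource type to a Terraform resource type;
# a real converter would need a much larger table and per-property rules.
TYPE_MAP = {
    "Microsoft.Storage/storageAccounts": "azurerm_storage_account",
}

def arm_resource_to_hcl(resource):
    """Render a single parsed ARM template resource as a Terraform-style block."""
    tf_type = TYPE_MAP[resource["type"]]
    name = resource["name"]
    return "\n".join([
        f'resource "{tf_type}" "{name}" {{',
        f'  name     = "{name}"',
        f'  location = "{resource.get("location", "eastus")}"',
        "}",
    ])

arm_template = json.loads("""
{
  "resources": [
    { "type": "Microsoft.Storage/storageAccounts",
      "name": "mystorageacct",
      "location": "eastus" }
  ]
}
""")

for res in arm_template["resources"]:
    print(arm_resource_to_hcl(res))

The point of the sketch is that the mapping table, not the rendering code, carries the knowledge of how one provider's resource types correspond to the other's, which is what keeps such a converter extensible.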
The justification for such a feature is that neither of the
IaC providers mentioned would find it in its business interest to provide
tools to convert one form to the other, even though they are best placed to do
so because they already have a runtime that creates a plan before
applying the IaC. The planning step is the one that understands the IaC and is
best suited for tokenizing it.
In the absence of programmability from either IaC provider to
expose its internal parsing and translation of its IaC format, it should
be perfectly acceptable to come up with an independent solution that
facilitates the conversion of one form to another.
This translation involves the following:
1) Lexical analysis: This is the part where the compiler divides the text of the
program into tokens, each of which corresponds to a symbol such as a variable
name, keyword, or number.
2) Syntax analysis: This is the part where the tokens generated in the previous
step are 'parsed' and arranged in a tree structure (called the syntax tree)
that reflects the structure of the program.
3) Type checking: This is the part where the syntax tree is analyzed to
determine if the program violates certain consistency requirements, for example
if a variable is used in a context where the type of the variable doesn't
permit it.
4) Intermediate code generation: This is the part where the program is
translated to a simple machine-independent intermediate language.
5) Register allocation: This is the part where the symbolic variable names are
translated to numbers, each of which corresponds to a register in the target
machine code.
6) Machine code generation: This is the part where the intermediate language is
translated to assembly language for a specific architecture.
7) Assembly and linking: This is the part where the assembly language code is
translated to a binary representation and the addresses of variables, functions,
etc., are determined.
The first three parts are collectively called the frontend and the last three form the
backend, with intermediate code generation sitting between the two.
There are checks and transformations at each step of the processing, in the order
listed above, such that each step passes stronger invariants to the next. The
type checker, for instance, can assume the absence of syntax errors. A minimal
sketch of how such a staged pipeline hangs together follows.
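The Python skeleton below only illustrates the wiring: each stage consumes the output of the previous one and may assume its invariants already hold. The stage bodies are placeholders, and none of the names come from a real compiler or IaC tool.

# Each stage consumes the output of the previous stage and may assume its
# invariants already hold (the type checker never sees a malformed tree).
# All bodies are placeholders; the names are illustrative only.

def lexical_analysis(source: str) -> list:
    return source.split()                     # placeholder tokenizer

def syntax_analysis(tokens: list):
    return ("program", tokens)                # placeholder syntax tree

def type_checking(tree):
    return tree                               # placeholder: assume the tree is well typed

def intermediate_code(tree) -> list:
    return [("nop",)]                         # placeholder intermediate language

def backend_stages(ir: list) -> bytes:        # register allocation, code generation, linking
    return b""                                # placeholder binary

def translate(source: str) -> bytes:
    return backend_stages(intermediate_code(type_checking(syntax_analysis(lexical_analysis(source)))))

print(translate("resource example {}"))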
Lexical analysis is done with regular expressions and precedence rules.
Precedence rules are similar to algebraic conventions. Regular expressions are
transformed into efficient programs using non-deterministic finite automata
(NFAs), each of which consists of a set of states, including a starting state and
a subset of accepting states, together with transitions from one state to another
on an input symbol. Because they are non-deterministic, compilers use a more
restrictive form called the deterministic finite automaton (DFA). This conversion
from a language description written as regular expressions into an efficiently
executable representation, a DFA, is done by a lexer generator.
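For illustration, a small lexer can be written in Python directly on top of the standard re module. The token names and patterns below describe an HCL-like surface syntax invented for this example rather than any real IaC grammar; listing KEYWORD ahead of IDENT is the precedence rule in action.

import re

# A minimal, illustrative lexer: patterns are tried in the order listed, so
# KEYWORD takes precedence over IDENT; the \b word boundaries keep an
# identifier like "resources" from matching as the keyword "resource".
TOKEN_SPEC = [
    ("KEYWORD", r"\b(?:resource|variable|output)\b"),
    ("IDENT",   r"[A-Za-z_][A-Za-z0-9_]*"),
    ("NUMBER",  r"\d+"),
    ("STRING",  r'"[^"]*"'),
    ("LBRACE",  r"\{"),
    ("RBRACE",  r"\}"),
    ("EQUALS",  r"="),
    ("SKIP",    r"[ \t\n]+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Yield (token_kind, token_text) pairs, skipping whitespace."""
    for match in MASTER.finditer(text):
        if match.lastgroup != "SKIP":
            yield (match.lastgroup, match.group())

print(list(tokenize('resource "storage" acct { tier = "Standard" }')))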
Syntax analysis recombines the tokens that lexical analysis split the input into.
This results in a syntax tree which has the tokens as its leaves, and their left-to-right
sequence is the same as in the input text. As in lexical analysis, we rely on
building automata, and in this case the context-free grammars we find can be
converted to recursive programs called stack automata. There are two ways to
generate such automata: the LL parser (the first L indicates the left-to-right
reading direction and the second L the leftmost derivation order) and the SLR
parser (S stands for simple).
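A minimal LL-style recursive-descent parser can be sketched in Python over the token stream produced by the lexer above. The grammar here, a flat sequence of name = value bindings, is purely illustrative.

# A minimal, illustrative LL-style recursive-descent parser: it reads the
# token list left to right, looks one token ahead to choose a production,
# and builds a syntax tree whose leaves keep the input's left-to-right order.

def parse(tokens):
    tokens = list(tokens)
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def expect(kind):
        nonlocal pos
        if peek() != kind:
            raise SyntaxError(f"expected {kind}, got {peek()}")
        token = tokens[pos]
        pos += 1
        return token

    bindings = []
    while peek() == "IDENT":                  # binding -> IDENT '=' (STRING | NUMBER)
        name = expect("IDENT")
        expect("EQUALS")
        value = expect("STRING") if peek() == "STRING" else expect("NUMBER")
        bindings.append(("bind", name[1], value[1]))
    return ("program", bindings)

# Works on the output of the tokenize sketch above or on hand-written tokens:
print(parse([("IDENT", "tier"), ("EQUALS", "="), ("STRING", '"Standard"')]))

The single token of lookahead in peek() is what makes this LL(1)-style: the next token alone decides which production to take.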
Symbol tables are used to track the scope and binding of all named objects. A
symbol table supports operations such as initializing an empty symbol table,
binding a name to an object, looking up a name, entering a new scope, and
exiting a scope.
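A minimal sketch of such a scoped symbol table in Python, using a list of dictionaries as a stack of scopes, might look like the following; the class and its method names are invented for this example.

# A stack of scopes: bind writes into the innermost scope, and lookup
# searches from the innermost scope outward, so inner bindings shadow outer ones.

class SymbolTable:
    def __init__(self):            # initialize an empty symbol table
        self.scopes = [{}]

    def enter_scope(self):         # enter a new scope
        self.scopes.append({})

    def exit_scope(self):          # exit the current scope
        self.scopes.pop()

    def bind(self, name, obj):     # bind a name to an object
        self.scopes[-1][name] = obj

    def lookup(self, name):        # look up a name, innermost scope first
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None

table = SymbolTable()
table.bind("location", "eastus")
table.enter_scope()
table.bind("location", "westus")
print(table.lookup("location"))    # westus: the inner binding shadows the outer
table.exit_scope()
print(table.lookup("location"))    # eastus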
Bootstrapping a compiler is interesting because the compiler is itself a
program, often written in the very language it compiles. This chicken-and-egg
problem is resolved with a quick-and-dirty initial compiler or with a chain of
intermediate compilers.