Tuesday, October 24, 2023

The application of data mining and machine learning techniques to Reverse Engineering of IaC.

 



An earlier article introduced the notion and purpose of reverse engineering. This article explains why and how IaC and application code can become quite complex and require reverse engineering.

Software organization seldom appears simple and straightforward, even with a microservice architecture. With IaC becoming the de facto standard for describing infrastructure and deployments driven by cross-cutting business objectives, IaC repositories can become quite complex and multi-layered, with physical and logical organizations that differ from each other, and they require due diligence in their reverse engineering.

The premise for doing this resembles what a compiler does when it creates a symbol table and maintains dependencies. With the symbols as nodes and their dependencies as edges, we get a rich graph on which relationships can be superimposed and queried for different insights. These insights help build a better representation of the knowledge model. Well-known data mining algorithms can assist with this reverse engineering. Even a basic linear or non-linear ranking of the symbols, followed by thresholding, can be very useful for representing the architecture.
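As a rough illustration of ranking and thresholding on a symbol graph, the sketch below scores symbols with a PageRank-style power iteration and keeps those above a cutoff. The resource names, the edge list, and the choice of PageRank are all assumptions made for the example; any other linear or non-linear ranking could be substituted.

```python
from collections import defaultdict

# Hypothetical dependency edges: (symbol, symbol it depends on).
EDGES = [
    ("app_service", "app_plan"),
    ("app_service", "vnet"),
    ("sql_db", "sql_server"),
    ("sql_server", "vnet"),
    ("app_plan", "resource_group"),
    ("vnet", "resource_group"),
    ("sql_server", "resource_group"),
]

def build_graph(edges):
    """Return (sorted node list, outgoing adjacency) for dependency edges."""
    out = defaultdict(set)
    nodes = set()
    for src, dst in edges:
        out[src].add(dst)
        nodes.update((src, dst))
    return sorted(nodes), out

def rank_symbols(edges, damping=0.85, iterations=50):
    """PageRank-style score: symbols that many others depend on score higher."""
    nodes, out = build_graph(edges)
    n = len(nodes)
    score = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        nxt = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            # A symbol with no dependencies spreads its score evenly.
            targets = out[v] if out[v] else nodes
            share = damping * score[v] / len(targets)
            for t in targets:
                nxt[t] += share
        score = nxt
    return score

def threshold(score, cutoff):
    """Keep symbols scoring at or above the cutoff, highest first."""
    kept = [s for s, v in score.items() if v >= cutoff]
    return sorted(kept, key=lambda s: -score[s])

scores = rank_symbols(EDGES)
# Assumed cutoff: the mean score, so only above-average symbols remain.
core = threshold(scores, cutoff=1.0 / len(scores))
```

Here `core` holds the symbols that would be drawn as the backbone of an architecture view; the cutoff is a tuning knob, not a fixed rule.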

We cover just a few of the data mining algorithms to begin with and close with a discussion of machine learning methods, including softmax classification, that can make excellent use of co-occurrence data. Finally, we suggest that this does not need to be a one-pass KDM builder and that pipelines and metrics can help enhance the KDM incrementally or continually. The symbol and dependency graph is merely the persisted form of the information learned, which can be leveraged for analysis and reporting such as rendering a KDM.
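To make the softmax idea concrete, here is a minimal sketch of a softmax (multinomial logistic) classifier trained on co-occurrence counts. The context terms, the symbols, the layer labels, and the hyperparameters are hypothetical; a real pipeline would derive the counts from the symbol and dependency graph.

```python
import math

# Hypothetical co-occurrence counts: how often each symbol appears in the
# same module as each context term (columns follow CONTEXT).
CONTEXT = ["subnet", "database", "container"]
TRAIN = {
    "vnet":        ([5, 0, 1], "network"),
    "firewall":    ([4, 1, 0], "network"),
    "sql_server":  ([1, 6, 0], "data"),
    "storage":     ([0, 4, 1], "data"),
    "aks_cluster": ([1, 0, 6], "compute"),
    "app_service": ([0, 1, 4], "compute"),
}
LABELS = sorted({label for _, label in TRAIN.values()})

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def normalize(counts):
    """Turn raw co-occurrence counts into relative frequencies."""
    total = sum(counts) or 1
    return [c / total for c in counts]

def predict_proba(weights, bias, counts):
    x = normalize(counts)
    logits = [sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)

def train(data, labels, epochs=500, lr=0.5):
    """Stochastic gradient descent on the cross-entropy loss."""
    dim = len(CONTEXT)
    weights = [[0.0] * dim for _ in labels]
    bias = [0.0] * len(labels)
    for _ in range(epochs):
        for counts, label in data.values():
            x = normalize(counts)
            probs = predict_proba(weights, bias, counts)
            for k, name in enumerate(labels):
                # Gradient of cross-entropy w.r.t. logit k: p_k - y_k.
                grad = probs[k] - (1.0 if name == label else 0.0)
                bias[k] -= lr * grad
                for j in range(dim):
                    weights[k][j] -= lr * grad * x[j]
    return weights, bias

weights, bias = train(TRAIN, LABELS)

def classify(counts):
    """Return the most probable layer label for a co-occurrence vector."""
    probs = predict_proba(weights, bias, counts)
    return LABELS[probs.index(max(probs))]
```

Calling `classify([0, 5, 1])` on a previously unseen symbol assigns it to the layer whose training symbols share its co-occurrence profile. Retraining as new modules are scanned is one way such a classifier could fit an incremental pipeline.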

Types of analysis:

Classification algorithms

Regression algorithms

Segmentation algorithms

Association algorithms

Sequence analysis algorithms

Outlier mining algorithms

Decision tree

Logistic regression

Neural network

Naïve Bayes algorithm

Plug-in algorithms

Simultaneous classifiers and regions-of-interest regressors

Collaborative filtering

Collaborative filtering via item-based filtering

Hierarchical clustering

NLP algorithms
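As one example from the list above, item-based collaborative filtering can be sketched by treating modules as "users" and resource types as "items". The module names and their contents below are made up for illustration, and cosine similarity over binary membership is an assumed choice of measure.

```python
import math
from collections import defaultdict

# Hypothetical usage matrix: which resource types each module declares.
MODULES = {
    "web":     {"app_service", "app_plan", "vnet"},
    "api":     {"app_service", "app_plan", "key_vault"},
    "batch":   {"storage", "key_vault", "vnet"},
    "reports": {"sql_server", "storage", "vnet"},
}

def item_similarity(modules):
    """Cosine similarity between resource types over binary module membership."""
    members = defaultdict(set)
    for module, items in modules.items():
        for item in items:
            members[item].add(module)
    sim = defaultdict(dict)
    for a in members:
        for b in members:
            if a != b:
                shared = len(members[a] & members[b])
                sim[a][b] = shared / math.sqrt(len(members[a]) * len(members[b]))
    return sim

def related(sim, item, top=2):
    """Return up to `top` resource types most similar to `item`."""
    ranked = sorted(sim[item].items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, score in ranked[:top] if score > 0]

SIM = item_similarity(MODULES)
```

With this data, `related(SIM, "app_service")` ranks `app_plan` first because the two always appear together. Pairings like this hint at logical groupings that the physical file layout may not show.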

Where Lucene search indexes and symbol stores fall short, data mining insights into code organization make up the difference and yield a more elaborate knowledge model.
