Cluster computing: The application of data mining and machine learning techniques to Reverse Engineering of IaC.

Tuesday, October 24, 2023


An earlier article introduced the notion and purpose of reverse engineering. This article explains why and how Infrastructure-as-Code (IaC) and application code can become complex enough to require reverse engineering.

Software organization is seldom simple and straightforward, even in a microservice architecture. IaC has become the de facto standard for describing infrastructure and deployments that serve cross-cutting business objectives. As a result, IaC code bases can become complex and multi-layered, with physical and logical organizations that differ, and they require due diligence to reverse engineer.

The premise is similar to what a compiler does when it builds a symbol table and tracks dependencies. Treating the symbols as nodes and their dependencies as edges yields a rich graph. Relationships can be superimposed on this graph and queried for different insights, and those insights improve the representation of the knowledge model. Well-known data mining algorithms can assist with this reverse engineering. Even a basic linear or non-linear ranking of the symbols, followed by thresholding, can be very useful in representing the architecture.
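A minimal sketch of the ranking-and-thresholding idea follows. It assumes that dependencies have already been extracted from IaC sources into (dependent, dependency) pairs. The Terraform-style symbol names are hypothetical. PageRank, computed by power iteration, stands in here as one example of a non-linear ranking; the article does not prescribe a specific algorithm. The cutoff of 1/n, which keeps symbols that score above average, is likewise an arbitrary choice for illustration.

```python
"""Rank IaC symbols in a dependency graph and keep the most central ones.

The symbol names and edges are hypothetical; in practice they would come from
parsing IaC sources (e.g. resource references) into a symbol table.
"""
from collections import defaultdict


def build_graph(edges):
    """Return adjacency sets (symbol -> symbols it depends on) and the node set."""
    graph = defaultdict(set)
    nodes = set()
    for src, dst in edges:
        graph[src].add(dst)
        nodes.update((src, dst))
    return graph, nodes


def pagerank(graph, nodes, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank; score flows from a dependent to its dependencies."""
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Score held by dangling nodes (no outgoing edges) is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src in nodes:
            targets = graph.get(src)
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


def threshold(rank, cutoff):
    """Keep symbols whose score meets the cutoff, highest score first."""
    return sorted((v for v, r in rank.items() if r >= cutoff), key=lambda v: -rank[v])


# Hypothetical dependencies: (dependent symbol, symbol it references).
edges = [
    ("aws_subnet.app", "aws_vpc.main"),
    ("aws_subnet.db", "aws_vpc.main"),
    ("aws_instance.web", "aws_subnet.app"),
    ("aws_db_instance.orders", "aws_subnet.db"),
    ("aws_security_group.web", "aws_vpc.main"),
    ("aws_instance.web", "aws_security_group.web"),
]
graph, nodes = build_graph(edges)
scores = pagerank(graph, nodes)
core = threshold(scores, cutoff=1.0 / len(nodes))  # keep above-average symbols
print(core)
```

Symbols that many others depend on, such as the shared network, accumulate score. Leaf resources fall below the cutoff. Swapping in a linear ranking such as in-degree would need only a different scoring function ahead of `threshold`.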

We cover just a few of the data mining algorithms to begin with. We close with a discussion of machine learning methods, including softmax classification, which can make excellent use of co-occurrence data. Finally, we suggest that this need not be a one-pass builder of the Knowledge Discovery Metamodel (KDM). A pipeline with metrics can help enhance the KDM incrementally or continually. The symbol and dependency graph are merely the persistence of what has been learned, and that can be leveraged for analysis and reporting, such as rendering a KDM.
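The following is a minimal sketch of softmax (multinomial logistic) classification over co-occurrence data. It assumes that each IaC symbol is described by how often it co-occurs with a handful of context tokens. The tokens, the layer labels (`network`, `compute`, `data`), the counts and the hyperparameters are all hypothetical. Training uses plain batch gradient descent on the cross-entropy loss, so the example stays dependency-free. It is not a tuned or definitive implementation.

```python
"""Softmax (multinomial logistic) classification over co-occurrence counts.

Hypothetical setup: each IaC symbol is described by how often it co-occurs
with a few context tokens in the same file; the layer labels are made up.
"""
import math

CONTEXT = ["cidr", "subnet", "ami", "instance_type", "engine", "backup"]  # feature order
LABELS = ["network", "compute", "data"]


def softmax(scores):
    """Numerically stable softmax over a list of raw class scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def normalize(counts):
    """Turn raw co-occurrence counts into relative frequencies."""
    total = sum(counts)
    return [c / total for c in counts] if total else list(counts)


def class_scores(W, b, x):
    """Linear score per class: W[k] . x + b[k]."""
    return [sum(w * xi for w, xi in zip(W[k], x)) + b[k] for k in range(len(W))]


def train(X, y, n_classes, lr=0.5, epochs=500, l2=1e-3):
    """Batch gradient descent on cross-entropy loss with L2 penalty on W."""
    n_features = len(X[0])
    W = [[0.0] * n_features for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        grad_W = [[0.0] * n_features for _ in range(n_classes)]
        grad_b = [0.0] * n_classes
        for x, label in zip(X, y):
            probs = softmax(class_scores(W, b, x))
            for k in range(n_classes):
                # d(loss)/d(score_k) = p_k - 1[k == label]
                err = probs[k] - (1.0 if k == label else 0.0)
                grad_b[k] += err
                for j in range(n_features):
                    grad_W[k][j] += err * x[j]
        m = len(X)
        for k in range(n_classes):
            b[k] -= lr * grad_b[k] / m
            for j in range(n_features):
                W[k][j] -= lr * (grad_W[k][j] / m + l2 * W[k][j])
    return W, b


def predict(W, b, counts):
    """Return the most likely label and the full probability vector."""
    probs = softmax(class_scores(W, b, normalize(counts)))
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs


# Hypothetical training data: (co-occurrence counts in CONTEXT order, label index).
train_counts = [
    ([5, 4, 0, 0, 0, 0], 0), ([3, 6, 0, 1, 0, 0], 0),
    ([0, 1, 5, 4, 0, 0], 1), ([0, 0, 3, 6, 0, 1], 1),
    ([0, 1, 0, 0, 5, 4], 2), ([0, 0, 0, 1, 3, 5], 2),
]
X = [normalize(c) for c, _ in train_counts]
y = [label for _, label in train_counts]
W, b = train(X, y, n_classes=len(LABELS))

label, probs = predict(W, b, [4, 5, 0, 0, 0, 1])
print(label, [round(p, 3) for p in probs])
```

The classifier outputs a probability per class rather than only a hard label. In an incremental pipeline, those probabilities could serve as confidence metrics for deciding which KDM assignments to revisit.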

Types of analysis:

Classification algorithms

Regression algorithms

Segmentation algorithms

Association algorithms

Sequence analysis algorithms

Outlier mining algorithms

Decision tree

Logistic Regression

Neural Network

Naïve Bayes algorithm

Plug-in algorithms

Simultaneous classifiers and regions-of-interest regressors

Collaborative filtering

Collaborative filtering via item-based filtering

Hierarchical clustering

NLP algorithms
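To illustrate one of the techniques listed above, here is a minimal sketch of hierarchical (agglomerative) clustering for grouping IaC modules into candidate architectural units. It assumes that each module is described by the set of attributes or symbols it references; the module names and sets are hypothetical. Jaccard distance, average linkage and the 0.6 cutoff are illustrative choices rather than anything the article specifies.

```python
"""Agglomerative (hierarchical) clustering of IaC modules.

Hypothetical setup: each module is described by the set of attributes or
symbols it references; clusters are merged bottom-up using average linkage
over Jaccard distance until the closest pair is farther than a cutoff.
"""
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |intersection| / |union|; two empty sets are treated as identical."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def average_linkage(c1, c2, features):
    """Mean pairwise distance between the members of two clusters."""
    dists = [jaccard_distance(features[p], features[q]) for p in c1 for q in c2]
    return sum(dists) / len(dists)


def agglomerative(features, max_distance):
    """Merge the closest clusters until no pair is within max_distance."""
    clusters = [frozenset([name]) for name in features]
    while len(clusters) > 1:
        # Find the closest pair of clusters under average linkage.
        (i, j), dist = min(
            (((i, j), average_linkage(clusters[i], clusters[j], features))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if dist > max_distance:
            break  # remaining clusters are too dissimilar to merge
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


# Hypothetical modules and the attributes each one references.
features = {
    "module.vpc": {"cidr", "subnet", "route_table"},
    "module.subnets": {"cidr", "subnet", "nat_gateway"},
    "module.web": {"ami", "instance_type", "security_group"},
    "module.workers": {"ami", "instance_type", "autoscaling"},
    "module.orders_db": {"engine", "backup", "subnet_group"},
    "module.cache": {"engine", "backup", "node_type"},
}
groups = agglomerative(features, max_distance=0.6)
for group in groups:
    print(sorted(group))
```

Lowering `max_distance` yields finer groups, and raising it yields coarser layers. Recording each merge together with its distance would produce the full dendrogram instead of a single cut.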

Where Lucene search indexes and the symbol store fall short, data mining insights into code organization make up the difference and yield a more elaborate knowledge model.
