Abstract:
Metamodels have been the traditional way to discover
complex legacy applications. Unfortunately, this technique has remained manual
with tools that are limited mainly to academic interests. There are seven
metamodels including Knowledge Discovery Metamodel, Abstract Syntax Tree
Metamodel, the Software Measurement Metamodel, analysis program, visualization,
refactoring, and transformation. This paper argues that the final state for
application modernization routinely converges to a known form. Since the
monolithic legacy application is refactored into a ring of independent
microservices, it is better for the tools to work backwards from this state
rather than attempt to accurately describe the metamodels. There are two main
proposals in this document. First, that interfaces in a legacy system are all
evaluated as candidates for microservices and run through a set of rules in a
classifier, grouped, ranked, sorted, and selected into a shortlist of
microservices. Second, that designing each microservice holistically and
individually, in its own pass over the legacy system, offers more benefit than a
single pass over the whole system. The benefits of both proposals can be summed
up in terms of developer satisfaction.
Description:
Model-driven software development evolves existing
systems and facilitates the creation of new software systems.
The salient features of model-driven software development
include:
1.
Domain-specific languages (DSLs) that express models at different
abstraction levels.
2.
DSL notation syntaxes collected separately.
3.
Model transformations for generating code from models
either directly by model-to-text transformations or indirectly by intermediate
model-to-model transformations.
An abstract syntax is defined by a metamodel that uses a
metamodeling language to describe a set of concepts and their relationships.
These languages use object-oriented constructs to build metamodels. The
relationship between a model and a metamodel can be described by a “conforms-to”
relationship.
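The “conforms-to” relationship can be illustrated with a minimal sketch. The
metamodel below is a hypothetical one invented for illustration (the concept
names Table and Trigger, and the Concept and Element classes, are assumptions
and are not taken from KDM or ASTM). A model conforms if every element names a
known concept, uses only that concept's attributes, and points its references
at elements of the expected concept.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Concept:
    """A metamodel concept: its attributes and its allowed references."""
    name: str
    attributes: Set[str] = field(default_factory=set)
    # reference name -> name of the concept the reference must point to
    references: Dict[str, str] = field(default_factory=dict)


@dataclass
class Element:
    """A model element that claims to be an instance of a concept."""
    concept: str
    attributes: Dict[str, object] = field(default_factory=dict)
    references: Dict[str, "Element"] = field(default_factory=dict)


def conforms_to(model: List[Element], metamodel: Dict[str, Concept]) -> bool:
    """Return True if every element of the model is valid under the metamodel."""
    for element in model:
        concept = metamodel.get(element.concept)
        if concept is None:
            return False  # the element uses a concept the metamodel lacks
        if not set(element.attributes) <= concept.attributes:
            return False  # the element carries an undeclared attribute
        for ref_name, target in element.references.items():
            if concept.references.get(ref_name) != target.concept:
                return False  # the reference points at the wrong concept
    return True


# Hypothetical metamodel with two concepts.
METAMODEL = {
    "Table": Concept("Table", {"name"}),
    "Trigger": Concept("Trigger", {"name", "event"}, {"on": "Table"}),
}

orders = Element("Table", {"name": "ORDERS"})
audit = Element("Trigger", {"name": "AUDIT_ORDERS", "event": "INSERT"}, {"on": orders})
bad = Element("Trigger", {"name": "BROKEN"}, {"on": audit})  # "on" must be a Table

print(conforms_to([orders, audit], METAMODEL))  # True
print(conforms_to([orders, bad], METAMODEL))    # False
```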
There are seven metamodels including Knowledge Discovery
Metamodel, Abstract Syntax Tree Metamodel, the Software Measurement Metamodel,
analysis program, visualization, refactoring, and transformation.
ASTM and KDM are complementary in modeling software
systems’ syntax and semantics. ASTM uses abstract syntax trees mainly to
represent the source code’s syntax, while KDM represents semantic information
about a software system, ranging from source code to higher levels of abstraction.
KDM is the language of architecture and provides a common interchange format
intended for representing software assets and for tool interoperability. Platform,
user interface, and data can each have their own KDM model, and these are
organized as packages. These packages are grouped into four abstraction layers
to improve modularity and separation of concerns: infrastructure, program
elements, runtime resources, and abstractions.
SMM is the metamodel that can represent both metrics and
measurements. It includes a set of elements to describe the metrics in KDM
models and their measurements.
Consider the modernization of a database Forms application
that is migrated to a Java platform. An important part of the
migration could involve PL/SQL triggers in legacy Forms code. In a Forms
application, the sets of SQL statements corresponding to triggers are tightly
coupled to the User Interface. The cost of the migration project is
proportional to the number and complexity of these couplings. The reverse
engineering process involves extracting KDM models from the SQL
code.
An extractor that generates the KDM model from SQL code
can be automated. A framework that provides domain-specific languages for
model extraction is available, and it can be used to create a model that
conforms to a target KDM from a program that conforms to a grammar. Dedicated
parsers can help with this code-to-model transformation.
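As a rough illustration of such a code-to-model extraction, the sketch below
maps each Forms trigger to the tables its SQL touches and then inverts that
mapping to show the coupling per table. It is a deliberate simplification: the
regular expression stands in for a real SQL parser, the output is a plain
dictionary rather than a KDM model, and the trigger sources and table names are
invented for illustration.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Simplified stand-in for a SQL parser: captures the identifier that follows
# FROM, INTO, UPDATE, or JOIN. Bind variables such as :blk.total are skipped
# because they do not start with a letter or underscore.
TABLE_REF = re.compile(
    r"\b(?:FROM|INTO|UPDATE|JOIN)\s+([A-Za-z_][A-Za-z0-9_$#]*)", re.IGNORECASE
)


def extract_model(triggers: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each trigger name to the sorted list of tables its SQL references."""
    model = {}
    for name, source in triggers.items():
        tables = {match.upper() for match in TABLE_REF.findall(source)}
        model[name] = sorted(tables)
    return model


def coupling_by_table(model: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert the model: for each table, list the triggers coupled to it."""
    usage = defaultdict(list)
    for trigger, tables in model.items():
        for table in tables:
            usage[table].append(trigger)
    return dict(usage)


# Hypothetical Forms triggers and their PL/SQL bodies.
FORMS_TRIGGERS = {
    "WHEN-BUTTON-PRESSED": "SELECT total INTO :blk.total FROM orders WHERE id = :blk.id;",
    "POST-QUERY": "SELECT name INTO :blk.name FROM customers c "
                  "JOIN orders o ON o.cust_id = c.id;",
    "PRE-UPDATE": "UPDATE orders SET status = 'X' WHERE id = :blk.id;",
}

model = extract_model(FORMS_TRIGGERS)
coupling = coupling_by_table(model)
print(model["POST-QUERY"])  # ['CUSTOMERS', 'ORDERS']
print(coupling["ORDERS"])   # ['WHEN-BUTTON-PRESSED', 'POST-QUERY', 'PRE-UPDATE']
```

The number of triggers listed per table gives a first, coarse indicator of the
couplings that drive the migration cost discussed above.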
With the popularity of machine learning techniques and
SoftMax classification, extracting domain classes according to syntax tree
meta-model and semantic graphical information has become more meaningful. The
two-step process of parsing to yield Abstract Syntax Tree Meta-model and
restructuring to express Abstract Knowledge Discovery Model becomes enhanced
with collocation and dependency information. This enables classification at the
level of code organization units, which was previously omitted. For example, code
organization and call graphs can be used for such learning as shown in
reference.
The discovery of KDM and SMM can also be broken down into
independent learning mechanisms with the Dependency Complexity being one of
them.
The migration to microservices is sometimes referred to
as the “horseshoe model” comprising three steps: reverse engineering,
architectural transformations, and forward engineering. The system before the
migration is the pre-existing system. The system after the migration is the new
system. The transitions between the pre-existing system and the new system can
be described via pre-existing architecture and microservices
architecture.
The reverse engineering step comprises analysis by
means of code analysis tools or existing documentation and identifies the
legacy elements that are candidates for transformation to services. The
transformation step restructures the pre-existing architecture into a
microservice-based one by reshaping the design elements, restructuring the
architecture, and altering business models and business strategies. Finally, in
the forward engineering step, the design of the new system is finalized.
Therefore, the pattern of parsing, reverse engineering,
restructuring, and forward engineering is common whether it is done once or
individually for each microservice. Repeating the cycle end-to-end for
each microservice provides significant improvements, including some that were
humanly impossible earlier. After all the microservices have been formed, the
repetitions provide significant learnings that make them leaner, thus
improving their quality and separation of concerns.
The motivation behind this approach is that
application readiness is usually understood by going through a checklist. The
operational and application readiness checklist assesses several dozen
characteristics. This supports a data-driven, quantitative analysis of
an approach to modernization.
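A checklist of this kind can be reduced to weighted rules that are evaluated in
order, which also shows how interface candidates might be scored, grouped,
ranked, and shortlisted. The following is only a sketch under stated
assumptions: the Interface fields, the three rules, and their weights are
hypothetical and are not taken from any actual readiness checklist.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Interface:
    """A legacy interface considered as a microservice candidate (hypothetical fields)."""
    name: str
    module: str
    callers: int
    tables_touched: int
    has_tests: bool


# Each rule is (description, weight, predicate); rules are evaluated in order.
Rule = Tuple[str, int, Callable[[Interface], bool]]
RULES: List[Rule] = [
    ("touches few tables", 3, lambda i: i.tables_touched <= 2),
    ("has test coverage", 2, lambda i: i.has_tests),
    ("is widely used", 1, lambda i: i.callers >= 5),
]


def score(interface: Interface) -> int:
    """Sum the weights of all rules that the interface satisfies."""
    return sum(weight for _, weight, rule in RULES if rule(interface))


def shortlist(interfaces: List[Interface], top_n: int) -> List[Tuple[str, int]]:
    """Group by module, keep the best candidate per group, rank, and select."""
    best: Dict[str, Tuple[int, Interface]] = {}
    for interface in interfaces:
        points = score(interface)
        if interface.module not in best or points > best[interface.module][0]:
            best[interface.module] = (points, interface)
    ranked = sorted(best.values(), key=lambda pair: (-pair[0], pair[1].name))
    return [(interface.name, points) for points, interface in ranked[:top_n]]


CANDIDATES = [
    Interface("OrderService", "orders", callers=12, tables_touched=2, has_tests=True),
    Interface("OrderReport", "orders", callers=3, tables_touched=5, has_tests=False),
    Interface("BillingFacade", "billing", callers=7, tables_touched=4, has_tests=True),
    Interface("LegacyBatch", "batch", callers=1, tables_touched=9, has_tests=False),
]

print(shortlist(CANDIDATES, top_n=2))  # [('OrderService', 6), ('BillingFacade', 3)]
```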
The use of a classifier to run these rules is
well-established in the industry with plenty of precedent. Typically, the rules
are evaluated as an ordered sequence of conditions. Learning about interfaces
can also be improved with data mining techniques. These include:
1.
Classification algorithms - These are useful for finding
similar groups based on discrete variables.
They are used for true/false binary classification; multi-label
classification is also supported. There are many techniques, but the
data should either form distinct regions on a scatter plot, each with its own
centroid, or, if the regions are hard to tell apart, be scanned breadth first
for the neighbors within a given radius, forming trees, or leaves if they fall
short.
Use Case: Useful for categorization of symbols beyond the
nomenclature. The primary use case is to see which clusters of symbols match
based on features. By translating symbols to a vector space and assessing the
quality of each cluster with a sum of squared errors, it is easy to assign a
substantial number of symbols to specific clusters for management
purposes.
2.
Regression algorithms - These
are particularly useful for calculating a linear relationship
between a dependent and an independent variable, and then using that relationship
for prediction.
Use Case: Source code symbols demonstrate elongated scatter plots
in specific categories. Even when the symbols are dedicated to a category, their
lifetimes are bounded and can be plotted along a timeline. One of the main
advantages of linear regression is prediction with time as the independent
variable. When a data point has many factors contributing to its
occurrence, a linear regression gives an immediate ability to predict where the
next occurrence may happen. This is far easier to do than coming up with a
model that is a good fit for all the data points.
3.
Segmentation algorithms - A
segmentation algorithm divides data into groups, clusters, or items that have
similar properties.
Use Case: Customer segmentation based on a symbol feature set is a
common application of this algorithm. It helps prioritize usage
among consumers.
4.
Association algorithms - These
are used for finding correlations between different attributes in a data set.
Use Case: Association data
mining allows users to see helpful messages such as “consumers who used
this set of symbols also used this other set of symbols.”
5.
Sequence Analysis Algorithms: These
are used for finding groups via paths in sequences. A sequence clustering
algorithm is like the clustering algorithms mentioned above, but instead of
finding groups based on similar attributes, it finds groups based on similar
paths in a sequence. A sequence is
a series of events. For example, a series of web clicks by a user is a
sequence. It can also be compared to the IDs of any sortable data maintained in
a separate table. Usually, there is support for a sequence column. The sequence
data has a nested table that contains a sequence ID, which can be any sortable
data type.
Use Case: This is especially useful
to find sequences in symbol usages across a variety of components. Generally, a
set of SELECT SQL statements would follow the opening of a database connection,
which could lead to an interpretation that this querying is useful for resource
state representation. This sort of sequence determination in a data-driven
manner helps find new sequences and target them actively, even suggesting
transitions that might have escaped the casual source code reader.
Sequence analysis helps with leveraging the state-based
meaning encoded in the use of symbols.
6.
Outliers Mining Algorithm: Outliers
are the rows that are most dissimilar. Given a relation R(A1, A2, ..., An) and
a similarity function between rows of R, find the rows in R that are dissimilar
to most points in R. The objective is to maximize the dissimilarity function
subject to a constraint on the number of outliers, or on significant outliers
if given.
The choices for similarity measures between rows include
distance functions such as Euclidean (L2), Manhattan, string-edit, and graph
distance. The choices for aggregate dissimilarity measures are the
distance to the K nearest neighbors, a neighborhood density outside the expected
range, and the attribute differences with nearby neighbors.
Use Case: The steps to
determine outliers can be listed as: 1.
cluster the regular rows via K-means, 2. compute the distance of each tuple in R
to the nearest cluster center, and 3. choose the top-K rows, or those with
scores outside the expected range. Finding outliers manually is sometimes
humanly impossible because the volume of symbols might be quite high. Outliers
are important for discovering new insights that encompass them. If there are
numerous outliers, they will significantly increase KDM building costs. If
there are not, then the patterns help identify efficiencies.
7.
Decision tree: This
is one of the most heavily used and easy-to-visualize mining algorithms. A
decision tree can serve as both a classification and a regression tree. A function
divides the rows into two datasets based on the value of a specific column. The
two lists of rows that are returned are such that one set matches the criteria
for the split while the other does not. When the attribute to be chosen is
clear, this works well.
Use Case: A decision tree
algorithm uses the attributes of the service symbols to make a prediction, such
as whether a set of symbols representing a component should be included or
excluded. The ease of visualizing the split at each level helps throw light on
the importance of those sets. This information becomes useful for pruning the
tree and for drawing it.
8.
Logistic Regression: This
is a form of regression that supports binary outcomes. It uses statistical
measures, is highly flexible, takes any kind of input, and supports different
analytical tasks. This regression dampens the effects of extreme values and
evaluates several factors that affect a pair of outcomes.
Use Case: This can be used for finding repetitions in symbol usages.
9.
Neural Network: This
is a widely used method for machine learning involving neurons that have one or
more gates for input and output. Each neuron assigns a weight, usually based on
probability, to each feature, and the weights are normalized, resulting in
a weight matrix that articulates the underlying model in the training
dataset. The network can then be used with a test data set to predict the outcome
probability. Neurons are organized in layers; each layer is independent of
the others, and layers can be stacked so that the output of one is the input to
the next.
Use Case: This is widely
used for a SoftMax classifier in NLP applied to source code as text. It
finds latent semantics in the usage of symbols based on their
co-occurrence.
10.
Naïve Bayes algorithm: This is probably the most
straightforward statistical probability-based data mining algorithm compared to
others. The probability is simply the
fraction of interesting cases out of total cases. Bayes probability is a
conditional probability that adjusts the probability based on the premise.
Use Case: This is widely used for cases where conditions
apply, especially binary conditions such as with or without. If the input
variables are independent, their states can be calculated as probabilities, and
if there is at least one predictable output, this algorithm can be applied. The
simplicity of computing states by counting, for each class, using each input
variable and then displaying those states against those variables for a given
value makes this algorithm easy to visualize, debug, and use as a predictor.
11.
Plugin Algorithms: Several
algorithms get customized to the domain they are applied to, resulting in
unconventional or new algorithms. For example, a hybrid approach of association
and clustering can help determine relevant associations when the matrix is
quite large and has a long tail of irrelevant associations from the cartesian
product. In such cases, clustering could be done prior to association to
determine the key items before the market-basket analysis.
Use Case: Source code symbols that pertain to the same category are notoriously
susceptible to appearing with variations. These symbols do not have
pre-populated fields from a template, and
everyone enters values for inputs that differ from one to another. Using a
hybrid approach, it is possible to preprocess these symbols with clustering
before analyzing them, such as with association clustering.
12.
Simultaneous classifiers and regions-of-interest
regressors: Neural net algorithms typically
involve a classifier for use with the tensors or vectors, while
regions-of-interest regressors provide bounding-box localizations. This form of
layering allows incremental semantic improvements to the underlying raw
data.
Use Case: Symbol usages
are time-series data, and as more and more accumulate, specific time ranges
become as important as the semantic classification of the symbols.
13.
Collaborative filtering: Recommendations
include suggestions for a knowledge base, or for finding model service symbols.
To make a recommendation, first a group sharing similar tastes is found,
and then the preferences of the group are used to make a ranked list of
suggestions. This technique is called collaborative filtering. A common data
structure that helps with keeping track of people and their preferences is a
nested dictionary. This dictionary could use a quantitative ranking say on a
scale of 1 to 5 to denote the preferences of the people in the selected
group. To find similar people to form a group, we use some form of a
similarity score. One way to calculate this score is to plot the items that the
people have ranked in common and use them as axes in a chart. Then the people
who are close together on the chart can form a group.
Use Case: Several
approaches mentioned earlier provide a perspective on solving this case. This
one is different from those in that opinions from multiple pre-established
profiles in a group are taken to determine the best set of interfaces to
recommend.
14.
Collaborative Filtering via Item-based filtering: This
filtering is like the previous one, except that the previous one takes a
user-based approach and this one takes an item-based approach. It is
significantly faster than the user-based
approach but requires storage for an item similarity table.
Use Case: There are
certain filtering cases where divulging which profiles go with which preferences
is helpful to the profiles. At other times, it is preferable to use item-based
similarity. Similarity scores are computed in both cases. All other
considerations being the same, the item-based approach is better for a sparse
dataset. Both user-based and item-based approaches perform similarly for a dense
dataset.
15.
Hierarchical clustering: Although
classification algorithms vary quite a lot, the hierarchical algorithm stands out
and is called out separately in this category. It creates a dendrogram where
the nodes are arranged in a hierarchy.
Use Case: A specific
domain-based ontology in the form of a dendrogram can be quite helpful to mining
algorithms.
16.
NLP algorithms: Popular
NLP models like BERT can be used for text mining.
NLP models are extremely useful for processing text from
work notes and other associated attachments in the symbols.
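Of the techniques listed above, the three outlier-mining steps (cluster via
K-means, compute each row's distance to its nearest cluster center, and choose
the top-K rows) are concrete enough to sketch directly. The following is a
minimal standard-library illustration, not a production implementation. The
two-dimensional feature vectors and the fixed initial centers are assumptions
made to keep the run deterministic; a real pipeline would derive features from
the symbols and would typically use a library implementation of K-means.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], centers: Sequence[Point],
           iterations: int = 20) -> List[Point]:
    """Step 1: plain K-means, starting from the given initial centers."""
    centers = list(centers)
    for _ in range(iterations):
        clusters: List[List[Point]] = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        # Recompute each center as the mean of its cluster (keep it if empty).
        new_centers = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster
            else centers[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    return centers


def top_outliers(points: Sequence[Point], centers: Sequence[Point],
                 k: int) -> List[Point]:
    """Steps 2 and 3: score rows by distance to the nearest center, keep top K."""
    scored = [(min(math.dist(p, c) for c in centers), p) for p in points]
    scored.sort(reverse=True)
    return [p for _, p in scored[:k]]


# Hypothetical feature vectors for symbols, e.g. (fan-in, fan-out).
SYMBOLS = [(1, 1), (1, 2), (2, 1), (2, 2),
           (8, 8), (8, 9), (9, 8), (9, 9),
           (5, 20)]

centers = kmeans(SYMBOLS, centers=[(1, 1), (9, 9)])
print(top_outliers(SYMBOLS, centers, k=1))  # [(5, 20)]
```

Note that the outlier itself pulls its cluster center toward it, so the scores
understate how unusual it is. This is a known limitation of scoring against
K-means centers.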
Machine learning algorithms are a tiny fraction of the
overall code that is used to realize prediction systems in production. As noted
in the paper “Hidden Technical Debt in Machine Learning Systems” by Sculley,
Holt and others, the machine learning code consists mainly of the model, while
all the other components, such as configuration, data collection, feature
extraction, data verification, process management tools, machine resource
management, serving infrastructure, and monitoring, comprise the rest of the
stack. These components usually form hybrid stacks, especially when
the model is hosted on-premises. Public clouds do have pipelines and relevant
automation with better management and monitoring programmability than
on-premises environments, but it is usually easier for startups to embrace
public clouds than for established large companies that have significant
investments in their inventory, DevOps, and datacenters.
Monitoring and pipelines contribute significantly towards
streamlining the process and answering questions such as: Why did the model
predict this? When was it trained? Who deployed it? Which release was it
deployed in? At what time was the production system updated? What were the
changes in the predictions? What did the key performance indicators show after
the update? Public cloud services have enabled both ML pipelines and their
monitoring. Creating a pipeline usually involves the following steps:
configuring a workspace and creating a datastore; downloading and storing
sample data; registering and using objects for transferring intermediate data
between pipeline steps; downloading and registering the model; creating and
attaching the remote compute target; writing a processing script; building the
pipeline by setting up the environment and stack necessary to execute the
script that is run in this pipeline; creating the configuration to wrap the
script; creating the pipeline step with the above-mentioned environment,
resource, input and output data, and reference to the script; and submitting
the pipeline. Many of these steps are easily automated with the help of
built-in objects published by the public cloud services to build and run such a
pipeline. A pipeline is a reusable object, one that can be invoked over
the wire with a web request.
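The idea of a pipeline as a reusable object whose steps hand intermediate data
to one another can be sketched without any cloud SDK. The Pipeline class, the
step functions, and the sample values below are all hypothetical; they are not
the API of any public cloud service and only mirror the shape of the steps
described above (load data, train, evaluate, submit).

```python
from typing import Any, Callable, Dict, List

# A step reads the data produced so far and returns the new entries it adds.
Step = Callable[[Dict[str, Any]], Dict[str, Any]]


class Pipeline:
    """A reusable, ordered list of steps that share intermediate data."""

    def __init__(self, name: str, steps: List[Step]) -> None:
        self.name = name
        self.steps = steps

    def submit(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        data = dict(inputs)  # copy, so the same pipeline can be submitted again
        for step in self.steps:
            data.update(step(data))  # each step sees all earlier outputs
        return data


def load_samples(data: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for downloading and storing sample data."""
    return {"samples": [2.0, 4.0, 6.0]}


def train(data: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for a training script: the 'model' is just the sample mean."""
    samples = data["samples"]
    return {"model": sum(samples) / len(samples)}


def evaluate(data: Dict[str, Any]) -> Dict[str, Any]:
    """Report the mean absolute error of the model on the samples."""
    samples, model = data["samples"], data["model"]
    return {"mean_abs_error": sum(abs(x - model) for x in samples) / len(samples)}


pipeline = Pipeline("symbol-usage", [load_samples, train, evaluate])
result = pipeline.submit({"requested_by": "ci"})
print(result["model"])  # 4.0
```

Keeping the submitter and similar metadata in the shared data is one simple way
to retain answers to questions such as who triggered a run.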
Machine learning services collect the same kinds of
monitoring data as the other public cloud resources. These logs, metrics and
events can then be collected, routed, and analyzed to tune the machine learning
model.
Conclusion:
Many companies
will say that they are in the initial stages of the migration process because
the number and size of legacy elements in their software portfolio continue to
be a challenge to get through. That said, these companies also deploy anywhere
from a handful to hundreds of microservices while still going through the
deployment. Some migrations require several months and even a couple of years.
The management is usually supportive of migrations. The business-IT alignment,
comprising technical solutions and business strategies, is even more
overwhelmingly supportive of migrations.
The
overall quality of the microservices refactored from the original source code
can be evaluated based on a score from a set of well-known criteria involving
DRY principles.
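One possible criterion of this kind is the share of code that is repeated across
services. The sketch below is an assumption about how such a score could be
computed, not a criterion prescribed by this document: it treats a non-blank
line as duplicated when its whitespace-normalized text appears in more than one
service, and the service sources are invented for illustration.

```python
from collections import Counter
from typing import Dict, List


def normalized_lines(source: str) -> List[str]:
    """Return non-blank lines with whitespace collapsed, so indentation is ignored."""
    return [" ".join(line.split()) for line in source.splitlines() if line.strip()]


def duplication_ratio(services: Dict[str, str]) -> float:
    """Fraction of non-blank lines whose text appears in more than one service."""
    per_service = {name: normalized_lines(src) for name, src in services.items()}
    seen_in = Counter()
    for lines in per_service.values():
        seen_in.update(set(lines))  # count each line once per service
    total = sum(len(lines) for lines in per_service.values())
    if total == 0:
        return 0.0
    duplicated = sum(
        1 for lines in per_service.values() for line in lines if seen_in[line] > 1
    )
    return duplicated / total


# Hypothetical sources of two extracted microservices.
SERVICES = {
    "orders": (
        "def to_cents(x):\n    return round(x * 100)\n\n"
        "def total(items):\n    return sum(items)\n"
    ),
    "billing": (
        "def to_cents(x):\n    return round(x * 100)\n\n"
        "def invoice(amount):\n    return to_cents(amount)\n"
    ),
}

ratio = duplication_ratio(SERVICES)
print(ratio)        # 0.5
print(1.0 - ratio)  # 0.5, a simple DRY score where higher is better
```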
Microservices are implemented as small services by small
teams that suit Amazon’s definition of a Two-Pizza Team. The migration
activities begin with an understanding of both the low-level and the high-level
sources of information. The source code and test suites comprise the
low-level sources. The higher-level sources comprise textual documents,
architectural documents, data models or schemas, and box-and-line diagrams. The
relevant knowledge about the system also resides with people and, in some
extreme cases, as tribal knowledge. Less common but useful sources of
information include UML diagrams, contracts with customers, architecture
recovery tools for information extraction, and performance data. Very rare, but
also found, are cases where the
pre-existing system is considered so bad that its owners do not look at the
source code.
Such an understanding can also be used towards
determining whether it is better to implement new functionalities in the
pre-existing system or in the new system. This could also help with improving
documentation, or for understanding what to keep or what to discard in the new
system.