Friday, April 28, 2023

Microservice extractors for on-premises enterprise applications

Abstract:

Metamodels have been the traditional way to discover and describe complex legacy applications. Unfortunately, the technique has remained largely manual, with tools that are mainly of academic interest. There are seven metamodels, including the Knowledge Discovery Metamodel, the Abstract Syntax Tree Metamodel, the Software Measurement Metamodel, and metamodels for program analysis, visualization, refactoring, and transformation. This paper argues that the final state of application modernization routinely converges to a known form. Since the monolithic legacy application is refactored into a ring of independent microservices, tools are better off working backwards from that end state than attempting to describe the legacy system exhaustively through metamodels. There are two main proposals in this document. First, all interfaces in a legacy system are evaluated as candidates for microservices: they are run through a set of rules in a classifier, then grouped, ranked, sorted, and selected into a shortlist of microservices. Second, extracting each microservice holistically and individually from the legacy system offers more benefit than a single end-to-end pass. The benefits of both can be summed up in terms of developer satisfaction.

Description:

Model-driven software development evolves existing systems and facilitates the creation of new ones. 

The salient features of model-driven software development include: 

1.       Domain-specific languages (DSLs) that express models at different abstraction levels. 

2.       Notation (concrete) syntaxes for the DSLs that are defined separately from the abstract syntax. 

3.       Model transformations for generating code from models either directly by model-to-text transformations or indirectly by intermediate model-to-model transformations. 

An abstract syntax is defined by a metamodel, which uses a metamodeling language to describe a set of concepts and their relationships. These languages use object-oriented constructs to build metamodels. The relationship between a model and its metamodel can be described as a “conforms-to” relationship. 

There are seven metamodels, including the Knowledge Discovery Metamodel (KDM), the Abstract Syntax Tree Metamodel (ASTM), the Software Measurement Metamodel (SMM), and metamodels for program analysis, visualization, refactoring, and transformation. 

ASTM and KDM are complementary in modeling a software system’s syntax and semantics. ASTM uses abstract syntax trees mainly to represent the source code’s syntax, while KDM represents semantic information about a software system, ranging from the source code to higher levels of abstraction. KDM is the language of architecture: it provides a common interchange format intended for representing software assets and enabling tool interoperability. Platform, user interface, and data concerns can each have their own KDM representation, organized as packages. These packages are grouped into four abstract layers to improve modularity and separation of concerns: infrastructure, program elements, runtime resources, and abstractions.  

SMM is the metamodel that can represent both metrics and measurements. It includes a set of elements to describe the metrics in KDM models and their measurements.  

Consider the example of modernizing a database Forms application by migrating it to a Java platform; an important part of the migration could involve the PL/SQL triggers in the legacy Forms code. In a Forms application, the sets of SQL statements that make up triggers are tightly coupled to the user interface, and the cost of the migration project is proportional to the number and complexity of these couplings. The reverse engineering process involves extracting KDM models from the SQL code.  

An extractor that generates the KDM model from SQL code can be automated. Frameworks exist that provide domain-specific languages for model extraction; they can be used to create a model that conforms to a target KDM metamodel from a program that conforms to a grammar. Dedicated parsers can help with this code-to-model transformation. 
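As a rough illustration of such an extractor (not the framework referred to above), the following Python sketch scans PL/SQL trigger text with regular expressions and emits a nested dictionary loosely shaped like a KDM code model. The trigger, the element names, and the model layout are all hypothetical, and a real extractor would rely on a dedicated parser rather than regexes.

import re
import json

# Hypothetical, minimal code-to-model extractor: it scans PL/SQL trigger text
# for table references and emits a nested dictionary loosely shaped like a
# KDM segment. Real extractors use dedicated parsers, not regexes.
TRIGGER_SQL = """
CREATE OR REPLACE TRIGGER orders_validate
BEFORE INSERT ON orders
FOR EACH ROW
BEGIN
  SELECT credit_limit INTO :NEW.credit_limit FROM customers WHERE id = :NEW.customer_id;
  UPDATE audit_log SET last_action = 'insert' WHERE entity = 'orders';
END;
"""

def extract_kdm_like_model(sql_text: str) -> dict:
    """Build a KDM-flavoured dictionary: one ActionElement per statement,
    with reads/writes relationships pointing at the tables it touches."""
    actions = []
    for stmt in re.split(r";\s*\n", sql_text):
        reads = re.findall(r"\bFROM\s+(\w+)", stmt, flags=re.IGNORECASE)
        writes = re.findall(r"\b(?:UPDATE|INSERT\s+INTO)\s+(\w+)", stmt, flags=re.IGNORECASE)
        if reads or writes:
            actions.append({
                "kind": "ActionElement",
                "source": " ".join(stmt.split())[:60],
                "reads": sorted(set(t.lower() for t in reads)),
                "writes": sorted(set(t.lower() for t in writes)),
            })
    return {"kind": "Segment", "model": {"kind": "CodeModel", "elements": actions}}

if __name__ == "__main__":
    print(json.dumps(extract_kdm_like_model(TRIGGER_SQL), indent=2))

The same idea scales up by swapping the regexes for a grammar-driven parser while keeping the model-building step unchanged.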

With the popularity of machine learning techniques and softmax classification, extracting domain classes from the syntax tree metamodel and from semantic graph information has become more meaningful. The two-step process of parsing to yield the Abstract Syntax Tree Metamodel and restructuring to express the Knowledge Discovery Metamodel is enhanced with collocation and dependency information. This yields classifications at code organization units that were previously omitted. For example, code organization and call graphs can be used as inputs to such learning, as shown in the reference.
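To make the idea of feeding code organization and call graphs into such learning concrete, here is a minimal sketch that uses Python's built-in ast module as a stand-in for the legacy language's parser. The sample functions are made up, and a production extractor would resolve calls across modules rather than matching simple names.

import ast
from collections import defaultdict

# Minimal sketch: derive a call graph from source text. The edge list is the
# kind of collocation and dependency information a classifier could consume.
SOURCE = """
def load_order(order_id):
    return fetch(order_id)

def fetch(order_id):
    return {"id": order_id}

def ship_order(order_id):
    order = load_order(order_id)
    notify(order)

def notify(order):
    print("shipping", order["id"])
"""

def build_call_graph(source: str) -> dict:
    tree = ast.parse(source)
    graph = defaultdict(set)
    for func in [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]:
        for node in ast.walk(func):
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                graph[func.name].add(node.func.id)
    return {caller: sorted(callees) for caller, callees in graph.items()}

if __name__ == "__main__":
    for caller, callees in build_call_graph(SOURCE).items():
        print(f"{caller} -> {', '.join(callees)}")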

The discovery of KDM and SMM models can also be broken down into independent learning mechanisms, with dependency complexity being one of them.  

The migration to microservices is sometimes described with the “horseshoe model,” which comprises three steps: reverse engineering, architectural transformation, and forward engineering. The system before the migration is the pre-existing system, and the system after the migration is the new system. The transition between the two is described in terms of the pre-existing architecture and the target microservices architecture. 

The reverse engineering step analyzes the system by means of code analysis tools or existing documentation and identifies the legacy elements that are candidates for transformation into services. The transformation step restructures the pre-existing architecture into a microservice-based one, which includes reshaping design elements, restructuring the architecture, and altering business models and business strategies. Finally, in the forward engineering step, the design of the new system is finalized. 

Therefore, the pattern of parsing, reverse engineering, restructuring, and forward engineering is the same whether it is performed once for the whole system or individually for each microservice. Repeating the cycle end-to-end for each microservice provides significant improvements, including ones that were humanly impossible earlier. After all the microservices have been formed, the repetitions yield significant learnings that make them leaner and meaner, improving their quality and separation of concerns.

The motivation behind this approach is that application readiness is usually understood by going through a checklist. An operational and application readiness checklist assesses several dozen characteristics, which supports a data-driven, quantitative analysis of the modernization approach.

The use of a classifier to run these rules is well established in the industry, with plenty of precedent. Typically, the rules are evaluated in program order as a sequence of conditions. Learning about interfaces can also be improved with data mining techniques. These include:

1.       Classification algorithms - This is useful for finding similar groups based on discrete variables. 

They are used for true/false binary classification, and multi-label classification is also supported. There are many techniques, but the data should either show distinct regions on a scatter plot, each with its own centroid, or, if that is hard to tell, be scanned breadth-first for neighbors within a given radius, forming trees, or leaves if the neighbors fall short.
Use Case: Useful for categorizing symbols beyond their nomenclature. The primary use case is to see clusters of symbols that match based on features. By translating symbols into a vector space and assessing cluster quality with a sum of squared errors, it is easy to analyze a substantial number of symbols as belonging to specific clusters from a management perspective.   
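A minimal sketch of this use case, assuming scikit-learn and NumPy are available: the three-dimensional "symbol features" are fabricated, KMeans reports the sum of squared errors mentioned above via its inertia_ attribute, and DBSCAN illustrates the radius-based neighbor scan.

import numpy as np
from sklearn.cluster import KMeans, DBSCAN

# Hypothetical feature vectors for source-code symbols,
# e.g. [fan_in, fan_out, lines_of_code] after scaling.
rng = np.random.default_rng(0)
symbols = np.vstack([
    rng.normal(loc=[1, 1, 1], scale=0.2, size=(20, 3)),   # UI-ish symbols
    rng.normal(loc=[5, 5, 5], scale=0.2, size=(20, 3)),   # data-access symbols
])

# Centroid-based grouping; inertia_ is the sum of squared errors to centroids.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(symbols)
print("cluster labels:", kmeans.labels_[:5], "...")
print("sum of squared errors:", round(kmeans.inertia_, 3))

# Density-based alternative: scan for neighbors within a radius (eps);
# points that fall short of min_samples are labelled -1 (noise).
dbscan = DBSCAN(eps=0.8, min_samples=3).fit(symbols)
print("dbscan labels:", set(dbscan.labels_))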

2.       Regression algorithms   - This is particularly useful to calculate a linear relationship between a dependent and independent variable, and then use that relationship for prediction.  
Use Case: Source code symbols show elongated scatter plots within specific categories. Even when symbols are dedicated to a category, their lifetimes are bounded and can be plotted along a timeline. One of the best advantages of linear regression is prediction with time as the independent variable. When many factors contribute to the occurrences of data points, linear regression gives an immediate ability to predict where the next occurrence may happen, which is far easier than devising a model that fits all the data points well.
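A minimal sketch of such a prediction with NumPy: the per-release occurrence counts are fabricated, and a straight line is fitted with time (the release number) as the independent variable.

import numpy as np

# Hypothetical data: number of times a deprecated symbol appears, per release.
releases = np.arange(1, 9)                                 # independent variable (time)
occurrences = np.array([42, 40, 37, 35, 31, 30, 27, 25])   # dependent variable

# Fit a line: occurrences = slope * release + intercept.
slope, intercept = np.polyfit(releases, occurrences, deg=1)
print(f"trend: {slope:.2f} occurrences per release")

# Predict the next release and estimate when usage reaches zero.
next_release = 9
print("predicted occurrences at release 9:", round(slope * next_release + intercept, 1))
print("usage projected to reach zero around release", round(-intercept / slope, 1))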

3.        Segmentation algorithms - A segmentation algorithm divides data into groups or clusters of items that have similar properties.  
Use Case: Customer segmentation based on a symbol feature set is quite a common application of this algorithm. It helps prioritize usage among consumers. 

4.       Association algorithms - This is used for finding correlations between different attributes in a data set.

Use Case: Association data mining allows users to see helpful messages such as “consumers who used this set of symbols also used this other set of symbols.”
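A small sketch of this "also used" analysis with only the Python standard library: the component-to-symbol baskets are fabricated, and frequent symbol pairs are surfaced once they meet a minimum support threshold.

from collections import Counter
from itertools import combinations

# Hypothetical "baskets": the set of symbols each component uses.
components = {
    "billing":   {"open_conn", "run_query", "format_invoice"},
    "reporting": {"open_conn", "run_query", "render_chart"},
    "audit":     {"open_conn", "run_query", "format_invoice"},
    "ui_shell":  {"render_chart", "load_theme"},
}

pair_counts = Counter()
for symbols in components.values():
    for pair in combinations(sorted(symbols), 2):
        pair_counts[pair] += 1

min_support = 2  # a pair must appear in at least two components
for (a, b), count in pair_counts.most_common():
    if count >= min_support:
        print(f"components that use {a} also tend to use {b} ({count} of {len(components)})")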

5.       Sequence Analysis Algorithms: This is used for finding groups via paths in sequences. A sequence clustering algorithm is like the clustering algorithms mentioned above, but instead of finding groups based on similar attributes, it finds groups based on similar paths in a sequence. A sequence is a series of events; for example, a series of web clicks by a user is a sequence. It can also be compared against the IDs of any sortable data maintained in a separate table. Usually, there is support for a sequence column: the sequence data has a nested table that contains a sequence ID, which can be any sortable data type.
Use Case: This is especially useful for finding sequences in symbol usages across a variety of components. For example, a set of SELECT SQL statements generally follows the opening of a database connection, which can be interpreted as querying for a resource's state representation. Determining sequences in this data-driven manner helps find new sequences and target them actively, even suggesting transitions that might have escaped a casual reading of the source code.

Sequence analysis helps leverage the state-based meaning encoded in the use of symbols.  
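A minimal sketch of sequence analysis using only the standard library: the per-trace symbol sequences are fabricated, and counting observed transitions (bigrams) surfaces the "open a connection, then SELECT" style of pattern described above.

from collections import Counter

# Hypothetical per-trace sequences of symbol usages, ordered by a sequence ID.
traces = [
    ["open_connection", "select_orders", "select_customers", "close_connection"],
    ["open_connection", "select_orders", "update_orders", "close_connection"],
    ["load_config", "open_connection", "select_orders", "close_connection"],
]

# Count observed transitions (bigrams); frequent transitions suggest that one
# symbol's use habitually follows another, hinting at a resource-access pattern.
transitions = Counter()
for trace in traces:
    for current, nxt in zip(trace, trace[1:]):
        transitions[(current, nxt)] += 1

for (current, nxt), count in transitions.most_common(3):
    print(f"{current} -> {nxt}: seen {count} times")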

6.       Outliers Mining Algorithm: Outliers are the rows that are most dissimilar. Given a relation R(A1, A2, ..., An) and a similarity function between rows of R, find the rows in R that are dissimilar to most other rows in R. The objective is to maximize the dissimilarity function under a constraint on the number of outliers, or to report the significant outliers if a threshold is given.   
The choices for similarity measures between rows include distance functions such as Euclidean (L2), Manhattan, string-edit, and graph distances. The choices for aggregate dissimilarity measures include the distance to the K nearest neighbors, the density of the neighborhood falling outside the expected range, and the attribute differences with nearby neighbors.  

Use Case: The steps to determine outliers can be listed as: 1. cluster the regular rows via K-means, 2. compute the distance of each tuple in R to the nearest cluster center, and 3. choose the top-K rows, or those with scores outside the expected range. Finding outliers manually is sometimes impossible because the volume of symbols can be quite high. Outliers are important for discovering new insights so that they can be encompassed. If there are numerous outliers, they will significantly increase the cost of building the KDM; if not, the patterns help identify efficiencies.
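A sketch of those three steps, assuming scikit-learn and NumPy: the two regular groups and the injected anomalies are fabricated so that the distance-to-nearest-centroid ranking can be shown end to end.

import numpy as np
from sklearn.cluster import KMeans

# Hypothetical 2-D feature vectors for symbols: two regular groups plus a few
# injected anomalies.
rng = np.random.default_rng(1)
regular = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=1.0, size=(50, 2)),
    rng.normal(loc=[6.0, 6.0], scale=1.0, size=(50, 2)),
])
anomalies = np.array([[15.0, -10.0], [-12.0, 14.0], [18.0, 18.0]])
rows = np.vstack([regular, anomalies])

# Step 1: cluster the regular structure with K-means.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=1).fit(rows)

# Step 2: distance of each row to its nearest cluster center.
diffs = rows[:, None, :] - kmeans.cluster_centers_[None, :, :]
distance_to_nearest = np.min(np.linalg.norm(diffs, axis=2), axis=1)

# Step 3: choose the top-K rows (largest distances) as outliers.
top_k = 3
outlier_idx = np.argsort(distance_to_nearest)[-top_k:]
print("outlier rows:\n", rows[outlier_idx].round(2))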

7.       Decision tree: This is one of the most heavily used and easiest to visualize mining algorithms. A decision tree can serve as both a classification and a regression tree. A split function divides the rows into two datasets based on the value of a specific column: one set matches the split criterion while the other does not. When the attribute to split on is clear, this works well.

Use Case: A decision tree algorithm uses the attributes of the service symbols to make a prediction, such as whether a set of symbols representing a component should be included or excluded. The ease of visualizing the split at each level throws light on the importance of those sets, and this information is useful both for pruning the tree and for drawing it.  
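A minimal sketch assuming scikit-learn: the symbol attributes and include/exclude labels are fabricated, and export_text prints the split at each level so the tree is easy to inspect and prune.

from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical attributes per symbol: [fan_in, fan_out, is_ui_facing]
# Label: 1 if the symbol was included in an extracted service, else 0.
X = [
    [12, 1, 0], [10, 2, 0], [9, 1, 0], [11, 3, 0],   # heavily used backend symbols
    [1, 6, 1],  [2, 7, 1],  [0, 5, 1], [1, 8, 1],    # UI-facing glue symbols
]
y = [1, 1, 1, 1, 0, 0, 0, 0]

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# The printed rules make the split at each level easy to inspect and prune.
print(export_text(tree, feature_names=["fan_in", "fan_out", "is_ui_facing"]))
print("prediction for [8, 2, 0]:", tree.predict([[8, 2, 0]])[0])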

8.       Logistic Regression: This is a form of regression that supports binary outcomes. It uses statistical measures, is highly flexible, takes any kind of input, and supports different analytical tasks. This regression dampens the effects of extreme values and evaluates several factors that affect a pair of outcomes.

Use Case: This can be used for finding repetitions in symbol usages.

9.       Neural Network: This is a widely used machine learning method involving neurons that have one or more gates for input and output. Each neuron assigns a weight, usually probability-based, to each feature, and the weights are normalized across the network, resulting in a weight matrix that articulates the underlying model in the training dataset. The network can then be used with a test dataset to predict outcome probabilities. Neurons are organized in layers; the layers are independent of one another and can be stacked so that the output of one becomes the input of the next.

Use Case: This is widely used for a softmax classifier in NLP applied to source code as text. It finds latent semantics in the usage of symbols based on their co-occurrence.  
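A minimal sketch of the softmax step itself in NumPy: the logits and the candidate categories are fabricated; in a real network these scores would come from the stacked layers described above.

import numpy as np

# A softmax layer turns raw scores (logits) into normalized class
# probabilities; this is the final step of a softmax classifier.
def softmax(logits: np.ndarray) -> np.ndarray:
    shifted = logits - logits.max(axis=-1, keepdims=True)   # for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum(axis=-1, keepdims=True)

# Hypothetical logits for one symbol over three candidate categories.
classes = ["data-access", "ui", "business-rule"]
logits = np.array([2.1, 0.3, 1.2])
probs = softmax(logits)

for name, p in zip(classes, probs):
    print(f"{name}: {p:.2f}")
print("predicted class:", classes[int(np.argmax(probs))])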

10.   Naïve Bayes algorithm: This is probably the most straightforward of the statistical, probability-based data mining algorithms. The probability is simply the fraction of interesting cases over total cases. Bayes probability is conditional probability, which adjusts the probability based on the premise.

Use Case: This is widely used for cases where conditions apply, especially binary conditions such as with or without. If the input variables are independent, their states can be calculated as probabilities, and as long as there is a predictable output, the algorithm can be applied. Computing the states by counting per class for each input variable, and then displaying those states against the variables for a given value, makes this algorithm easy to visualize, debug, and use as a predictor.
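A tiny count-based sketch of this "count states per class" idea using only the standard library; the binary symbol features, labels, and add-one smoothing are illustrative choices, not a reference implementation.

from collections import defaultdict

# Features per symbol: (touches_database, is_ui_facing); label: owning service.
rows = [
    ((1, 0), "billing"), ((1, 0), "billing"), ((1, 1), "billing"),
    ((0, 1), "other"),   ((0, 1), "other"),   ((1, 1), "other"),
]

class_counts = defaultdict(int)
feature_counts = defaultdict(lambda: defaultdict(int))  # class -> (index, value) -> count
for features, label in rows:
    class_counts[label] += 1
    for i, value in enumerate(features):
        feature_counts[label][(i, value)] += 1

def predict(features):
    scores = {}
    for label, n in class_counts.items():
        score = n / len(rows)  # prior probability of the class
        for i, value in enumerate(features):
            # conditional probability with add-one (Laplace) smoothing
            score *= (feature_counts[label][(i, value)] + 1) / (n + 2)
        scores[label] = score
    return max(scores, key=scores.get), scores

print(predict((1, 0)))   # a database-touching, non-UI symbol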

11.   Plugin Algorithms: Several algorithms get customized to the domain they are applied to, resulting in unconventional or new algorithms. For example, a hybrid approach that combines clustering with association mining can help determine the relevant associations when the matrix is quite large and the Cartesian product produces a long tail of irrelevant associations. In such cases, clustering can be performed first to determine the key items before the market-basket analysis.
Use Case: Source code symbols are notoriously susceptible to appearing similar yet with variations, even when they pertain to the same category. These symbols do not have fields pre-populated from a template, and every author enters values that differ from one another. Using a hybrid approach, it is possible to preprocess these symbols with clustering before analyzing them, for example with association mining.   

12.   Simultaneous classifiers and regions-of-interest regressors: Neural network algorithms typically involve a classifier operating on tensors or vectors, while regions-of-interest regressors provide bounding-box localization. This form of layering allows incremental semantic improvements over the underlying raw data.  

Use Case: Symbol usages are time-series data, and as more and more of them are recorded, specific time ranges become as important as the semantic classification of the symbols.

13.   Collaborative filtering: Recommendations include suggestions for a knowledge base, or for finding model service symbols. In order to make a recommendation, first a group sharing similar tastes is found, and then the preferences of the group are used to make a ranked list of suggestions. This technique is called collaborative filtering. A common data structure for keeping track of people and their preferences is a nested dictionary. This dictionary could use a quantitative ranking, say on a scale of 1 to 5, to denote the preferences of the people in the selected group. To find similar people to form a group, some form of similarity score is used. One way to calculate this score is to plot the items that the people have ranked in common and use them as axes in a chart; the people who are close together on the chart can then form a group.  

Use Case: Several of the approaches mentioned earlier provide a perspective on this case. Collaborative filtering differs in that opinions from multiple pre-established profiles in a group are combined to determine the best set of interfaces to recommend.  
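A minimal sketch of user-based collaborative filtering over the nested dictionary described above; the team profiles, interface names, and 1-5 scores are fabricated, and similarity is taken as the inverse of the Euclidean distance over commonly ranked interfaces.

from math import sqrt

# Nested dictionary of profiles -> interface -> preference on a 1-5 scale.
prefs = {
    "team_billing":  {"OrderAPI": 5, "InvoiceAPI": 5, "ReportAPI": 2},
    "team_shipping": {"OrderAPI": 4, "InvoiceAPI": 1, "TrackingAPI": 5},
    "team_finance":  {"OrderAPI": 5, "InvoiceAPI": 4, "ReportAPI": 3},
}

def similarity(p1: str, p2: str) -> float:
    """Euclidean-distance based similarity over the interfaces ranked in common."""
    shared = [i for i in prefs[p1] if i in prefs[p2]]
    if not shared:
        return 0.0
    distance = sqrt(sum((prefs[p1][i] - prefs[p2][i]) ** 2 for i in shared))
    return 1.0 / (1.0 + distance)

def recommend(target: str):
    """Rank interfaces the target has not rated, weighted by profile similarity."""
    totals, weights = {}, {}
    for other in prefs:
        if other == target:
            continue
        sim = similarity(target, other)
        for item, score in prefs[other].items():
            if item not in prefs[target]:
                totals[item] = totals.get(item, 0.0) + sim * score
                weights[item] = weights.get(item, 0.0) + sim
    return sorted(((totals[i] / weights[i], i) for i in totals), reverse=True)

print("similarity(billing, finance):", round(similarity("team_billing", "team_finance"), 2))
print("recommendations for team_billing:", recommend("team_billing"))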

14.   Collaborative Filtering via Item-based filtering: This filtering is like the previous one, except that the previous approach is user-based and this one is item-based. It is significantly faster than the user-based approach but requires storage for an item similarity table.

Use Case: There are filtering cases where divulging which profiles go with which preferences is helpful to those profiles; at other times it is preferable to use item-based similarity. Similarity scores are computed in both cases. All other considerations being the same, the item-based approach is better for sparse datasets, while user-based and item-based approaches perform similarly on dense datasets.

15.   Hierarchical clustering: Although clustering algorithms vary quite a lot, the hierarchical algorithm stands out and is called out separately here. It creates a dendrogram in which the nodes are arranged in a hierarchy.

Use Case: A domain-specific ontology in the form of a dendrogram can be quite helpful to mining algorithms. 

16.   NLP algorithms: Popular NLP algorithms like BERT can be used for text mining.  

NLP models are extremely useful for processing text from work notes and other attachments associated with the symbols.  
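A sketch of embedding work-note text with BERT, assuming the Hugging Face transformers library and PyTorch are installed and the bert-base-uncased checkpoint can be downloaded; the notes are fabricated, and mean pooling plus cosine similarity is just one simple way to compare them.

import torch
from transformers import AutoModel, AutoTokenizer

# Embed work-note text with a pretrained BERT model, then compare notes
# by cosine similarity of their mean-pooled token embeddings.
texts = [
    "Trigger validates the credit limit before inserting an order.",
    "Order insert is blocked when the customer credit limit is exceeded.",
    "Nightly job rebuilds the reporting index.",
]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

with torch.no_grad():
    encoded = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = model(**encoded).last_hidden_state           # (batch, tokens, 768)
    mask = encoded["attention_mask"].unsqueeze(-1)         # ignore padding tokens
    embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)

sims = torch.nn.functional.cosine_similarity(embeddings[0], embeddings[1:], dim=-1)
print("note 0 vs note 1:", round(sims[0].item(), 3))
print("note 0 vs note 2:", round(sims[1].item(), 3))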

Machine learning algorithms are a tiny fraction of the overall code used to realize prediction systems in production. As noted in the paper “Hidden Technical Debt in Machine Learning Systems” by Sculley, Holt, and others, the machine learning code consists mainly of the model, while all the other components, such as configuration, data collection, feature extraction, data verification, process management tools, machine resource management, serving infrastructure, and monitoring, comprise the rest of the stack. These components usually form hybrid stacks, especially when the model is hosted on-premises. Public clouds do have pipelines and the relevant automation, with better management and monitoring programmability than on-premises, but it is usually easier for startups to embrace public clouds than for established large companies with significant investments in their inventory, DevOps, and datacenters. 

Monitoring and pipelines contribute significantly towards streamlining the process and answering questions such as: Why did the model predict this? When was it trained? Who deployed it? Which release was it deployed in? At what time was the production system updated? What were the changes in the predictions? What did the key performance indicators show after the update? Public cloud services have enabled both ML pipelines and their monitoring. The steps involved in creating a pipeline usually include: configuring a workspace and creating a datastore; downloading and storing sample data; registering and using objects for transferring intermediate data between pipeline steps; downloading and registering the model; creating and attaching the remote compute target; writing a processing script; building the pipeline by setting up the environment and stack needed to execute the script; creating the configuration that wraps the script; creating the pipeline step with the above-mentioned environment, resources, input and output data, and reference to the script; and submitting the pipeline. Many of these steps are easily automated with the help of built-in objects published by the public cloud services for building and running such pipelines. A pipeline is a reusable object and one that can be invoked over the wire with a web request.  
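As one hedged example of what those steps look like in code, the sketch below uses the Azure Machine Learning v1 SDK (azureml-core); the workspace configuration, compute target name, script, and experiment name are placeholders, other clouds expose equivalent but differently named objects, and newer SDK versions change the details.

from azureml.core import Experiment, Workspace
from azureml.pipeline.core import Pipeline, PipelineData
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config()                      # configure the workspace
datastore = ws.get_default_datastore()            # datastore for sample data

# Intermediate data passed between pipeline steps.
features = PipelineData("extracted_features", datastore=datastore)

# Wrap the processing script, its compute target, and its outputs in a step.
extract_step = PythonScriptStep(
    name="extract_symbols",
    script_name="extract.py",                     # the processing script (placeholder)
    source_directory="./scripts",
    outputs=[features],
    compute_target="cpu-cluster",                 # previously attached compute (placeholder)
    allow_reuse=True,
)

# Build and submit the pipeline; publishing it afterwards exposes a REST
# endpoint so the pipeline can be invoked over the wire with a web request.
pipeline = Pipeline(workspace=ws, steps=[extract_step])
run = Experiment(workspace=ws, name="modernization-pipeline").submit(pipeline)
run.wait_for_completion(show_output=True)
# published = pipeline.publish(name="modernization-pipeline")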

  

Machine learning services collect the same kinds of monitoring data as the other public cloud resources. These logs, metrics and events can then be collected, routed, and analyzed to tune the machine learning model.

 

Conclusion:

Many companies will say that they are in the initial stages of the migration process because the number and size of legacy elements in their software portfolio continue to be a challenge to get through. That said, these companies also deploy anywhere from a handful to hundreds of microservices while still working through the migration. Some migrations require several months, and some even a couple of years. Management is usually supportive of migrations, and the business-IT alignment of technical solutions with business strategies is even more overwhelmingly supportive of them. 

The overall quality of the microservices refactored from the original source code can be evaluated with a score based on a set of well-known criteria, including the DRY principle.

Microservices are implemented as small services by small teams, which suits Amazon’s definition of a two-pizza team. Migration activities begin with an understanding of both the low-level and high-level sources of information. Source code and test suites comprise the low level; the high level comprises textual documents, architectural documents, data models or schemas, and box-and-line diagrams. Relevant knowledge about the system also resides with people, in some extreme cases only as tribal knowledge. Less common but useful sources of information include UML diagrams, contracts with customers, architecture recovery tools for information extraction, and performance data. Very rarely, there are also cases where the pre-existing system is considered so bad that its owners do not look at the source code. 

Such an understanding can also be used to determine whether it is better to implement new functionality in the pre-existing system or in the new one. It can also help improve documentation, or clarify what to keep and what to discard in the new system.
