Wednesday, December 21, 2022

Previous post continued

My application data must be encrypted. (Select all that apply)

- at rest 

- in transit

- none of the above

My data has PII or PHI and is subject to governance and compliance.

- Yes

- No

With the adoption of cloud technologies, the number of users of my application is expected to increase by

 - < 5%

 - 5-15%

 - 15-50%

 - > 50%

My application depends on an on-premises message broker. The throughput of the queues is in the range

 - 0 - 1 KB/sec

 - 1 - 5 KB/sec

 - < 1 MB/sec

 - None of the above

My application has an ETL, data pipelining and/or batch automation job

 - Yes

 - No

My application has vendor lock-ins for OLAP and data warehouses.

 - Yes

 - No

My application has disaster recovery considerations and/or involves data ageing and archival

 - Yes 

 - No

My application response times must be in the range:

 - < 100 ms

 - 100 - 250 ms

 - > 250 ms

My data remains on-premises

 - Yes

 - No

My application has significant security restrictions and makes use of firewalls, vulnerability assessments and periodic threat assessments

 - Yes

 - No

My application must meet security certifications such as ISO 27001, ISO 27017 (cloud security), ISO 27018 (privacy), ISO 9001, PCI DSS, SOC 1, 2 and 3, HIPAA, FERPA, CJIS, SEC Rule 17a-4(f), IRS 1075, and SRG Impact Levels 2 and 4 for DoD systems

 - Yes

 - No

My application must work for government and must comply with FedRAMP at the Moderate and High levels, the GDPR, PCI-DSS, or other such standards

 - Yes

 - No

My application must reduce the mean time to detect (MTTD) an intrusion or security failure, or the mean time to resolve (MTTR) issues such as a security breach or outage

 - Yes

 - No

My application must demonstrate an order of magnitude higher availability after moving to the cloud

 - Yes

 - No

My application must localize workloads to a specific geographic region

 - Yes

 - No

I have a preference for a specific cloud and I'm willing to sacrifice one or more of the following (Select all that apply)

 - technological scale and expertise to handle critical and highly complex workloads

 - anticipated cost savings

 - internal IT burden

 - None of the above

Tuesday, December 20, 2022

Application Modernization Readiness Assessment

This checklist evaluates the dependencies of an application to help with its modernization. It strives to collect all the information about the application to give a more detailed and precise picture of its readiness to be built and run in the cloud.


1. The drive for the modernization of this application comes from:

- Changing business requirements

- Technical debt

- Pending deadline

- Budgetary considerations

- None of the above

2. My application is an N-tier web application and has a customer-facing portal.

- Yes

- No

3. My application modernization journey requires all three: planning, executing and monitoring.

- Yes

- No

4. My application is accessible via: (Select all that apply.)

- Web Interface

- Command line

- Scripts

- SDK

- None of the above

5. I would like my cloud adoption strategy to be:

- Retain/Retire

- Lift-and-shift

- Lift-and-reshape

- Replace, drop and shop

- Refactor (rewriting/de-coupling applications)

6. My application has specific requirements from: (Select all that apply)

- Programming languages

- Operating systems

- Databases

- Services

- Application frameworks

7. My application must maintain the same programming language.

- Yes

- No

8. My application is sensitive to the flavor and/or version of the operating system.

- Yes

- No

9. My application requires the database to be the same as before.

- Yes

- No

10. My application is dependent on other services that are not available in the cloud.

- Yes

- No

11. My application requires profiling to generate:

- a mapping of system components

- topology maps

- coverage of the technology stack

- automations

- a baseline

- a view/simulation of real-world conditions

- full stress testing

12. I'm fine with a phased migration, with phases for:

- service-by-service migration

- improving performance and scalability

- integration and full DevOps support

- meeting the SLAs required of the application

13. I can point to SLAs for the application.

- Yes

- No

14. I need CI/CD enhancements for:

- visibility over the migration strategy and roadmap

- investing in quality controls

- surfacing results on dashboards

- integrating with my monitoring solution

15. I need investment in fault detection for:

- maintaining availability and performance

- leveraging my monitoring investment

- improving visibility

- reducing false alerts

16. I have specific queries or demands about my application's behavior over time.

- Yes

- No

17. The number of web requests to my application is in the range:

- < 100 per hour

- < 100 per minute

- < 100 per second

- 100 - 1000 per second

- > 1000 per second

18. The size of data stored in the database increases by:

- a few hundred bytes per day

- a few hundred kilobytes per day

- a few hundred megabytes per day

- a few hundred gigabytes per day

- a few terabytes per day

- greater than a few terabytes a day



Monday, December 19, 2022

 

The application of data mining and machine learning techniques to Reverse Engineering.

An earlier article [1] introduced the notion and purpose of reverse engineering. This article focuses on the transition from text to model for source code so that the abstract Knowledge Discovery Metamodel (KDM) can be enhanced.

The premise for doing this is similar to what a compiler does in creating a symbol table and maintaining dependencies. In particular, treating symbols as nodes and their dependencies as edges yields a rich graph on which relationships can be superimposed and queried for different insights. These insights help with a better representation of the KDM. Some queries can be based on well-known architectural patterns, such as model-view-controller, that leverage both the functionality and layout of the source code. But the purpose of this article is to leverage well-known data mining algorithms to glean more insights. Even a basic linear or non-linear ranking of the symbols, followed by thresholding, can be very useful for representing the architecture.
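As a rough illustration, here is a minimal sketch of that idea in Python using networkx: symbols become nodes, dependencies become edges, and a ranking with a threshold surfaces the symbols most worth representing in the architecture. The symbol names, edges, and threshold are hypothetical placeholders.

# A minimal sketch: symbols as nodes, dependencies as edges, then a simple
# ranking with thresholding to surface architecturally significant symbols.
# The symbol names and edges below are hypothetical placeholders.
import networkx as nx

# Directed graph: an edge (a, b) means symbol a depends on symbol b.
graph = nx.DiGraph()
graph.add_edges_from([
    ("OrderController", "OrderService"),
    ("OrderService", "OrderRepository"),
    ("InvoiceService", "OrderRepository"),
    ("OrderRepository", "DbConnection"),
])

# PageRank is one non-linear ranking; in-degree would be a basic linear one.
ranks = nx.pagerank(graph)

# Threshold the ranking to keep only the most central symbols.
threshold = 0.2
significant = {symbol: rank for symbol, rank in ranks.items() if rank >= threshold}
print(sorted(significant.items(), key=lambda kv: -kv[1]))

The surviving symbols and their edges are what would be persisted and later rendered into the KDM.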

We cover just a few of the data mining algorithms to begin with and close with a discussion of machine learning methods, including SoftMax classification, which can make excellent use of co-occurrence data. Finally, we suggest that this does not need to be a one-pass KDM builder: a pipeline and metrics can help enhance the KDM incrementally or continually. The symbol-and-dependency graph is merely the persistence of the information learned, which can be leveraged for analysis and reporting such as rendering a KDM.

Classification algorithms

This is useful for finding similar groups based on discrete variables.

It is used for true/false binary classification; multi-label classification is also supported. There are many techniques, but the data should either show distinct regions on a scatter plot, each with its own centroid, or, if that is hard to tell, be scanned breadth-first for neighbors within a given radius, forming trees, or leaves when they fall short.

 

Clustering is useful for categorizing symbols beyond their nomenclature. The primary use case is to see which symbols cluster together based on features. By translating symbols into a vector space and assessing cluster quality with the sum of squared errors, it is easy to analyze a large number of symbols as belonging to specific clusters from a management perspective.
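A minimal sketch of this, assuming scikit-learn is available: hypothetical feature vectors for a handful of symbols are clustered with k-means, and the sum of squared errors (inertia) is used to judge cluster quality. The feature values are made up for illustration.

# Cluster symbols by feature vectors and judge quality with the sum of
# squared errors. The features (fan-in, fan-out, size) are hypothetical.
import numpy as np
from sklearn.cluster import KMeans

# Rows: symbols. Columns: fan-in, fan-out, lines of code.
features = np.array([
    [12, 2, 340],
    [11, 3, 310],
    [1, 9, 80],
    [2, 8, 95],
    [5, 5, 500],
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)
print("cluster labels:", kmeans.labels_)
print("sum of squared errors:", kmeans.inertia_)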

Decision tree

This is probably one of the most heavily used and easiest to visualize mining algorithms. The decision tree serves as both a classification and a regression tree. A function divides the rows into two datasets based on the value of a specific column: one set matches the criteria for the split while the other does not. This works well when the attribute to split on is clear.

A decision tree algorithm uses the attributes of the service symbols to make a prediction, such as whether a set of symbols representing a component should be included or excluded. The ease of visualizing the split at each level helps throw light on the importance of those sets. This information becomes useful for pruning the tree and for drawing it.
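A minimal sketch of the split described above, with hypothetical rows and column index:

# Divide rows into two sets based on the value of a specific column, as a
# decision tree node would. Rows and the chosen column are hypothetical.
def divide_rows(rows, column, value):
    """Return (matching, non_matching) row lists for a split on `column`."""
    if isinstance(value, (int, float)):
        matches = lambda row: row[column] >= value
    else:
        matches = lambda row: row[column] == value
    set1 = [row for row in rows if matches(row)]
    set2 = [row for row in rows if not matches(row)]
    return set1, set2

# Example: split symbols on a hypothetical "fan-in" column (index 1).
rows = [
    ("OrderService", 12, "include"),
    ("Logger", 1, "exclude"),
    ("OrderRepository", 9, "include"),
]
included, excluded = divide_rows(rows, 1, 5)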


Sunday, December 18, 2022

 Writing a book:

The journey for writing and publishing a book has many milestones. This article captures some of the essential steps.

The first step for a good book is to prepare the manuscript. Many authors will vouch for the effort this takes, and it is seldom a collaboration, unlike the other milestones on the roadmap to realizing a book. It is also an opportunity for the author to find their true voice as they articulate it in the book. Planning the writing is an essential step because it consumes several hours at the very least. There are quite a few incentives available to write a book. The famous author Alexander Chee described his love of writing while on the train and his wish that Amtrak would provide residencies for that very purpose. Amtrak established its residency program to that effect in 2014.

The next step for the author is to decide between self-publishing and utilizing a white-glove, full-service offering from a publishing house. This affects planning for such things as an ISBN, which must be purchased and can be used with a book only after it has been applied for and paid for. Many publishing companies offer advantages such as working with graphic designers to create the book cover.

The beginning and end of a book have several sections and language necessary for the proper branding and compliance that can only come from experience. It is better to hire a publisher at least for the first book.

The choices between publishing houses vary quite a bit. Some are based on reputation, precedent, or simply affiliation, but there are also negotiations involved that can often make one better than another among equals. The art of negotiation is in full swing when it comes to compensation.

There are shops that help authors self-publish their book and keep 100% of the royalties, but these take a while.

Editing, book design, and selling are all essential considerations in publishing a book. These include several aspects that publishers can help with, and some might take the author more time if she were to do them herself. Selling channels and advertising depend immensely on how sellers perceive the book, more than on what the author may want to say. Positioning and branding the book to the proper selling channels is just as important as the royalty discussion.

There are many creative aspects in which competitors differentiate themselves from each other in the book publishing industry but following the well-trodden path comes with some predictability.

Lastly, authors must explore the opportunities to read the book for audio, because audiobooks sell just as well as the print editions.

 

Friday, December 16, 2022

Reverse engineering of applications

Security experts, DevOps and SRE personnel often find themselves in situations where they are given an application that they do not own but must know enough about it to do their job. Their concerns overlap over the discipline of reverse engineering. It is not enough to know what an application is by doing static analysis whenever possible, but it is also necessary to know what it does at runtime. Observability of an application does not come easily with even sophisticated and learning tools that are now so ubiquitous on-premises and in the cloud. System center agents for public clouds, third-party monitoring tools, on-premises support for telemetry, logging and tracing and cloud-based x-rays of workloads can help with many cases. But they do not adequately cover standalone application-database pairs that are remnants of an age gone by. While arguably every organization can claim to have only a finite number of these applications, more recently container images have proliferated by mutations to the point where organizations might not even bother to make an inventory of all usages.

There are many runtime analysis tools designed to closely monitor the behavior of an application or container image. Companies like Twistlock and Tenable.io have opened new ways of investigating them for the purpose of vulnerability assessment. Some tools instrument the source code, providing dynamic analysis of the application while it is running on either a native or an embedded target platform. Tools also vary by purpose. For example, code coverage tools perform code coverage analysis, memory profilers analyze memory usage and detect memory leaks, performance profilers provide performance and load monitoring, and runtime tracing draws a real-time UML sequence diagram of the application. Tools can be used alone or together. When the source code is run with any of these runtime analysis tools engaged, the instrumented code is executed and the result is dynamically displayed in the corresponding reports.

Among these, runtime tracing is perhaps the most comprehensive way of observing real-time dynamic interactions in the source code: it generates trace data, which can then be used to create UML diagrams.
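As a rough illustration, a minimal sketch of runtime tracing in Python: sys.settrace records caller-to-callee events that could later be rendered as a sequence diagram. The traced functions are toy placeholders, not a real application.

# Record caller -> callee call events that could be turned into a UML
# sequence diagram. The traced functions are hypothetical placeholders.
import sys

trace_events = []

def tracer(frame, event, arg):
    if event == "call":
        caller = frame.f_back.f_code.co_name if frame.f_back else "<top>"
        callee = frame.f_code.co_name
        trace_events.append((caller, callee))
    return tracer

def load_order():
    return validate_order()

def validate_order():
    return True

sys.settrace(tracer)
load_order()
sys.settrace(None)

for caller, callee in trace_events:
    print(f"{caller} -> {callee}")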

Model-driven software discovery evolves existing systems and facilitates the creation of new software systems.

The salient features of model-driven software discovery include:

1. Domain-specific languages (DSLs) that express models at different abstraction levels.

2. DSL notation syntaxes that are collected separately.

3. Model transformations for generating code from models, either directly by model-to-text transformations or indirectly by intermediate model-to-model transformations.

An abstract syntax is defined by a metamodel that uses a metamodeling language to describe a set of concepts and their relationships. These languages use object-oriented constructs to build metamodels. The relationship between a model and a metamodel can be described by a “conforms-to” relationship. 

KDM helps to represent semantic information about a software system, ranging from source code to higher levels of abstraction. KDM is the language of architecture and provides a common interchange format intended for representing software assets and enabling tool interoperability. Platform, user interface, and data can each have their own KDM, organized as packages. These packages are grouped into four abstract layers to improve modularity and separation of concerns: infrastructure, program elements, runtime resources, and abstractions. SMM is the metamodel that can represent both metrics and measurements; it includes a set of elements to describe the metrics in KDM models and their measurements.
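As a rough illustration of that organization, here is a minimal sketch, with hypothetical element names, of model elements grouped into packages and packages grouped into the abstract layers named above:

# A toy in-memory view of KDM-style content: elements grouped into packages,
# packages grouped into abstract layers. Names and layers are illustrative only.
from dataclasses import dataclass, field

@dataclass
class KdmPackage:
    name: str
    layer: str  # infrastructure | program elements | runtime resources | abstractions
    elements: list = field(default_factory=list)

model = [
    KdmPackage("code", "program elements", ["ClassUnit:OrderService", "MethodUnit:placeOrder"]),
    KdmPackage("data", "runtime resources", ["RelationalTable:ORDERS"]),
    KdmPackage("ui", "runtime resources", ["Screen:OrderForm"]),
]

# Group packages by layer, mirroring the modularity described above.
by_layer = {}
for pkg in model:
    by_layer.setdefault(pkg.layer, []).append(pkg.name)
print(by_layer)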

These are some of the ways reverse engineering is evolving, fueled by the requirements called out above.

Thursday, December 15, 2022

Overview of a modernization tool and process in architecture-driven modernization.

 


 

This section is explained in the context of the modernization of a database forms application to a Java platform. An important part of the migration could involve PL/SQL triggers in legacy Forms code. In a Forms application, the sets of SQL statements corresponding to triggers are tightly coupled to the User Interface. The cost of the migration project is proportional to the number and complexity of these couplings. The reverse engineering process involves extracting KDM models from the SQL code.  

An extractor that generates the KDM model from SQL code can be automated. A framework that provides domain-specific languages for model extraction is available, and it can be used to create a model that conforms to a target KDM from a program that conforms to a grammar. Dedicated parsers can help with this code-to-model transformation.
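A minimal sketch of such an extractor, in which a crude regular expression stands in for the dedicated parser and the PL/SQL trigger text is hypothetical:

# Pull SQL statements out of a trigger body and record them as elements of a
# simple KDM-like model. The trigger text and model shape are illustrative only.
import re

trigger_source = """
BEGIN
  SELECT price INTO :ORDER_BLOCK.price FROM products WHERE id = :ORDER_BLOCK.product_id;
  UPDATE orders SET total = :ORDER_BLOCK.price WHERE id = :ORDER_BLOCK.order_id;
END;
"""

def extract_model(source):
    """Return a KDM-like dict of action elements found in the trigger body."""
    statements = re.findall(r"\b(?:SELECT|INSERT|UPDATE|DELETE)\b[^;]*;", source, re.IGNORECASE)
    return {"CodeModel": {"ActionElements": [stmt.strip() for stmt in statements]}}

print(extract_model(trigger_source))

A real extractor would use a full PL/SQL grammar rather than a regular expression, but the shape of the transformation is the same: source text in, model elements out.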

 

A major factor that determines the time and effort required to migrate a trigger is its coupling to the user interface, which includes the number and kind of statements accessing the User Interface. A tool to analyze this coupling helps to estimate the modernization costs. Several metrics can be defined to measure the coupling that influences the effort of migrating triggers; for example, metrics based on the UI statements' count, location, and type, such as whether they read or write. The couplings can be classified as reflective, declarative, and imperative. The extracted KDM models can then be transformed into SMM models for measurement.
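A minimal sketch of such a coupling metric, assuming the usual :BLOCK.item bind-variable convention for UI access; the trigger text is hypothetical:

# Count UI item accesses in a trigger and classify each as a read or a write.
# The ":BLOCK.item" convention and the trigger text are illustrative only.
import re

trigger_source = """
BEGIN
  :ORDER_BLOCK.total := :ORDER_BLOCK.price * :ORDER_BLOCK.quantity;
  IF :ORDER_BLOCK.total > 1000 THEN
    :ORDER_BLOCK.discount := 0.1;
  END IF;
END;
"""

def ui_coupling(source):
    """Count UI item references, split into writes (assigned to) and reads."""
    writes = reads = 0
    for match in re.finditer(r":\w+\.\w+", source):
        rest = source[match.end():]
        if re.match(r"\s*:=", rest):
            writes += 1
        else:
            reads += 1
    return {"ui_statements": writes + reads, "writes": writes, "reads": reads}

print(ui_coupling(trigger_source))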

With the popularity of machine learning techniques and SoftMax classification, extracting domain classes according to the syntax tree metamodel and semantic graph information has become more meaningful. The two-step process of parsing to yield an Abstract Syntax Tree metamodel and restructuring to express the abstract Knowledge Discovery Model becomes enhanced with collocation and dependency information. This results in classifications at code organization units that were previously omitted. For example, code organization and call graphs can be used for such learning, as shown in reference 1. The discovery of the KDM and SMM can also be broken down into independent learning mechanisms, with Dependency Complexity being one of them.
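A minimal sketch of the idea, assuming scikit-learn: each code unit is represented by how often it co-occurs with a few anchor symbols, and a multinomial (SoftMax) logistic regression assigns it to a hypothetical layer. All counts and labels are made up.

# SoftMax-style classification over co-occurrence features. The anchor symbols,
# counts, and layer labels below are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Rows: code units. Columns: co-occurrence counts with anchors
# ["Screen", "Entity", "Connection"].
X = np.array([
    [9, 1, 0],
    [8, 2, 1],
    [1, 9, 2],
    [0, 8, 3],
    [1, 2, 9],
    [0, 1, 8],
])
y = ["ui", "ui", "domain", "domain", "persistence", "persistence"]

# With more than two classes, the default lbfgs solver fits a multinomial
# (SoftMax) model over the classes.
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict([[2, 7, 1]]))        # expected to lean towards "domain"
print(clf.predict_proba([[2, 7, 1]]))  # SoftMax probabilities over the classes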

Wednesday, December 14, 2022

 

Networking Modernization

Networking and storage are taken for granted, but their modernization is as important to the enterprise as that of applications and databases. When businesses embrace the cloud, they must consider whether their on-premises network will scale up to the cloud traffic. The cloud acts like a massive aggregator of traffic, and with hybrid cloud, the on-premises network can get overwhelmed because it was not designed for cloud capacity. This section of the book deals with these considerations for multi-cloud adoption and hybrid computing.

Networking modernization is essential to digital transformation. When networks age, they don't just pose a higher risk as a fault domain; they also increase complexity by being among the lowest layers of virtualization and compute. When a single network interface card failed on the corporate network associated with a production system, it was easy to diagnose given the reservations made and the stack that was dedicated to it. In a hybrid world, customers have gone well beyond the traditional application/database landscape to more modular applications with deep divisions and even segregated hardware. Communication is assumed to be a resource as free as storage, and one that does not factor in beyond the latency of a single call. With cloud traffic, managing application usage via a single pane of glass has elevated customers from on-premises to the cloud. The public cloud supports rich monitoring that even spans on-premises with the help of agents running on the enterprise hosts, but this does not help in determining the root cause of failure when the symptoms become scattered, sparse, and even random or non-deterministic.

Newer networks have become software-defined, and rightfully so, although this adds an abstraction layer over the hardware. Software-defined networking (SDN) is an architectural approach to data center networking in the cloud era, bringing the flexibility and economy of software to datacenter hardware. It helps enterprise network infrastructure meet the needs of application workloads by providing (1) automated orchestration and agile provisioning, (2) programmatic network management, (3) application-oriented, network-wide visibility, and (4) direct integration with cloud orchestration platforms. SDN is even built into each operating system. When IT wants the ability to deploy applications quickly, SDN and a network controller can be used, and policy can be managed with scripts. Hyper-V and the network controller can be used to create virtual local area network overlays, which do not require the reassignment of IP addresses. Hybrid SDN gateways can be used to assign and manage resources independently.
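As a rough, hypothetical illustration of managing policy with scripts, the sketch below posts a virtual network overlay definition to a network controller's REST endpoint; the endpoint, payload shape, and policy names are placeholders rather than a real controller API.

# Provision a virtual network overlay by calling a (hypothetical) network
# controller REST endpoint. Endpoint, payload, and policy names are placeholders.
import requests

CONTROLLER = "https://sdn-controller.example.local"  # hypothetical endpoint

payload = {
    "virtualNetwork": {
        "name": "app-tier-overlay",
        "addressSpace": ["10.20.0.0/16"],   # overlay keeps the existing IP plan
        "policies": ["deny-east-west-by-default"],
    }
}

response = requests.post(f"{CONTROLLER}/networks", json=payload, timeout=30)
response.raise_for_status()
print("provisioned:", response.json())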

There is greater security and isolation of workloads with the use of network security groups and distributed firewalls for micro-segmentation. North-south internet traffic and east-west intranet traffic can be handled differently. User-defined routing can be configured with service chains established through third-party appliances such as firewalls, load balancers, or content inspection. Cost is driven down by converging storage and network on Ethernet and activating Remote Direct Memory Access (RDMA).

Network modernization might seem like an overwhelming challenge by virtue of the number of entities impacted by the effort. It can even be a struggle to get a clear picture of the evolving application environment or to document the changing requirements over the infrastructure and operations. Organizations that don't know where to begin can start by identifying gaps that might hinder SDN deployment, determining automation needs, defining an orchestration strategy, and developing a roadmap.

A strategy for orchestration and automation becomes critical to such implementation plans. Some of these activities of network modernization include enabling self-service functions for development teams, reducing risk through integrated governance and management, preventing vendor lock-ins on hardware-based platforms, saving time by orchestrating and automating integration complexities and boosting overall quality through intelligent and aware operations such as self-healing.