Wednesday, November 30, 2022

Towards architecture driven modernization

 

Architecture-driven application modernization involves meta-modeling and model transformations that help reduce system evolution costs by automating the modernization of systems. This is done in three phases: reverse engineering, restructuring, and forward engineering. Reverse engineering technologies can analyze a legacy software system, identify its components (such as widgets) and their interconnections, and create a representation at a higher level of abstraction from the extracted information. Some requirements for modernization tools can be called out here. A tool must allow extracting domain classes according to a Concrete Syntax Tree meta-model along with semantic and graphical information, then analyze the extracted information to raise it to a higher level of abstraction as a knowledge-discovery model.

A software modernization approach becomes a necessity for creating new business value from legacy applications. Modernization tools are required to extract a model from the text of source code that conforms to a grammar, by manipulating the concrete syntax tree of the source code. For example, there is a tool that converts Java Swing applications to the Android platform using two Object Management Group standards: the Abstract Syntax Tree metamodel, for representing data extracted from the Java Swing code in the reverse engineering phase, and the Knowledge Discovery Metamodel (KDM), as the platform-independent model. Some tools go further and propose a Rich Internet Application Graphical User Interface. The three phases articulated by this tool can be separated into stages: the reverse engineering phase, which uses the Eclipse JDT API to parse the Java Swing code and fill in an AST and a Graphical User Interface model; the restructuring phase, a model transformation that generates an abstract KDM model; and the forward phase, which includes the elaboration of the target model and a Graphical User Interface.

The overall process can be described as the following transitions:

Legacy system –parsing-> AST metamodel –restructuring algorithm-> abstract knowledge model (KDM) –forward engineering-> GUI metamodel.

The reverse engineering phase is dedicated to the extraction and representation of information. It is the first phase of reengineering in the Architecture Driven Modernization process. It covers the parsing technique and the representation of the extracted information in the form of a model. Parsing can focus on the structural aspect of header and source files, and on the presentation layer that determines the layout of functionalities such as widgets. A sketch of what such parsing might look like follows.
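
As a rough illustration (not the actual tool described above), a reverse engineering pass might use the Eclipse JDT API to parse Java Swing source and record widget instantiations, the kind of facts that would populate an AST and GUI model. The class name and the widget heuristic here are assumptions for the sketch:

```java
import org.eclipse.jdt.core.dom.AST;
import org.eclipse.jdt.core.dom.ASTParser;
import org.eclipse.jdt.core.dom.ASTVisitor;
import org.eclipse.jdt.core.dom.ClassInstanceCreation;
import org.eclipse.jdt.core.dom.CompilationUnit;

public class SwingWidgetExtractor {

    /** Parse Java Swing source text and report widget instantiations. */
    public static void extractWidgets(String javaSource) {
        ASTParser parser = ASTParser.newParser(AST.JLS8);
        parser.setKind(ASTParser.K_COMPILATION_UNIT);
        parser.setSource(javaSource.toCharArray());
        CompilationUnit unit = (CompilationUnit) parser.createAST(null);

        unit.accept(new ASTVisitor() {
            @Override
            public boolean visit(ClassInstanceCreation node) {
                String type = node.getType().toString();
                // Crude, assumed heuristic for Swing widgets (JButton, JPanel, ...)
                if (type.startsWith("J")) {
                    System.out.println("widget: " + type + " args: " + node.arguments());
                }
                return true;
            }
        });
    }
}
```

A real extractor would emit model elements conforming to the AST and KDM metamodels rather than printing to the console.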

The restructuring phase aims at deriving an enriched, conceptual, technology-independent specification of the legacy system in a knowledge model (KDM) from the information stored in the models generated in the previous phase. KDM is an OMG standard and can involve up to four layers: the Infrastructure layer, the Program Elements layer, the Resource layer, and the Abstractions layer. Each layer is dedicated to a particular application viewpoint.

Forward engineering is the process of moving from high-level abstractions, by means of transformational techniques, to automatically obtain representations on a new platform, such as microservices, or constructs in a programming language, such as interfaces and classes. Even the user interface can go through forward engineering into a Rich Internet Application model with a new representation describing the organization and positioning of widgets.

Automation is key to developing a tool that enables these transitions via reverse engineering, restructuring and forward engineering.

 

 

Tuesday, November 29, 2022

Application modernization continued

 

This section of the article discusses a case study on the incremental code-migration strategy of a large monolithic codebase used in a supply system. The code migration strategy considers a set of factors that includes scaffolding code, balancing iterations, and grouping related functionality.

Incremental migration only works when it is progressive. Care must be taken to ensure that progress is measured by means of key indicators, such as tests, percentage of code migrated, and signoffs. Correspondingly, the backlog of code in the legacy system that must be migrated should decrease.

Since modernized components are being deployed prior to the completion of the entire system, it is necessary to combine elements from the legacy system with the modernized components to maintain the existing functionality during the development period. Adapters and other wrapping techniques may be needed to provide a communication mechanism between the legacy system and the modernized systems, when dependencies exist.
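
As an illustration of the wrapping technique, here is a minimal, hypothetical adapter in Java: legacy callers keep the interface they depend on while calls are forwarded to the modernized component. All names are invented for the sketch:

```java
/** Legacy-facing interface that existing callers still depend on (assumed name). */
interface LegacyInventoryService {
    int stockLevel(String partNumber);
}

/** Stand-in for the client of the modernized microservice (assumed name). */
interface ModernInventoryClient {
    StockRecord getStock(String partNumber);
    record StockRecord(String partNumber, int quantity) { }
}

/** Adapter: translates legacy calls into the modernized component's API. */
class LegacyInventoryAdapter implements LegacyInventoryService {

    private final ModernInventoryClient modernClient;

    LegacyInventoryAdapter(ModernInventoryClient modernClient) {
        this.modernClient = modernClient;
    }

    @Override
    public int stockLevel(String partNumber) {
        // Legacy callers are unaware that the answer now comes from the new system
        return modernClient.getStock(partNumber).quantity();
    }
}
```

Scaffolding like this is throwaway by design: once the last legacy caller is migrated, the adapter is retired.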

An incremental modernization approach incurs no downtime; this kind of modernization effort tries to keep the system fully operational at all times while reducing the amount of rework and technical risk during the modernization. One way to overcome challenges in this regard is to plan the modernization effort. A modernization plan must also include the order in which functionality is going to be modernized.

Another way to overcome challenges is to build and use adapters, bridges, and other scaffolding code. This represents an added expense, as the code must be designed, developed, tested, and maintained during the development period, but it eventually reduces the overall development and deployment costs.

Supporting an aggressive and yet predictable schedule also helps in this regard. The componentization strategy should seek to minimize the time required to develop and deploy the modernized system.

This does not necessarily trade off with quality: both the interim and final stages must be tested, and the gates for release and progression toward revisions only help with the overall predictability and timeline of the new system.

Risk occurs in different forms, and some risk is acceptable if it is managed and mitigated properly. Due to the overall size and investment required to complete a system migration, it is important that the overall risk be kept low. Setting expectations around the system, including its performance, helps to mitigate these risks.

 

Monday, November 28, 2022

Application modernization for massively parallel applications

  Part 6 of this article on Application Modernization covered the migration process. This section focuses on specialty applications.

Every organization must determine its own roadmap to application modernization. Fortunately, patterns and best practices continue to help and provide guidance. This section describes the application modernization for a representative case study for those applications that do not conform to cookie cutter web applications.

Consider a specialty application involving massively parallel, compute-intensive workloads that provide predictions. The initial approach is to treat the model as a black box and work around its dependencies. But the modernization effort need not remain constrained by the technology stack on which the model depends. Instead, this is an opportunity to refine the algorithm and describe it with a class and an interface that lend themselves to isolation and testing. This has the added benefit of providing testability beyond what was available before. The algorithm can also be implemented with design patterns such as the Bridge pattern, so that the abstraction and the implementation can vary independently, or the Strategy pattern, which facilitates a family of interchangeable algorithms, as sketched below.
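
A minimal sketch of the Strategy pattern applied to such a prediction algorithm might look like the following; the interface and the linear scorer are hypothetical stand-ins:

```java
/** Family of interchangeable prediction algorithms (assumed interface). */
interface PredictionStrategy {
    double[] predict(double[] features);
}

/** One member of the family; others can be added without changing callers. */
class LinearPrediction implements PredictionStrategy {

    private final double[] weights;

    LinearPrediction(double[] weights) { this.weights = weights; }

    @Override
    public double[] predict(double[] features) {
        double sum = 0.0;
        for (int k = 0; k < features.length; k++) {
            sum += weights[k] * features[k];
        }
        return new double[] { sum };
    }
}

/** The consumer depends only on the abstraction, not on any one algorithm. */
class Predictor {

    private final PredictionStrategy strategy;

    Predictor(PredictionStrategy strategy) { this.strategy = strategy; }

    double[] run(double[] features) { return strategy.predict(features); }
}
```

Because the consumer depends only on the abstraction, competing algorithms can be swapped in, tested in isolation, and benchmarked without touching the rest of the system.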

Microservices developed in an agile manner with Continuous Integration and Continuous Deployment pipeline provide an unprecedented opportunity to compare algorithms and fine tune them in a standalone manner where the investments for infrastructure and data preparation need to be considered only once.

Algorithms for massively parallel systems often involve some variation of a batched map-reduce summation form or a continuous one-by-one record-processing streaming form. In either case, the stateless form of the microservice demonstrates superior scalability and reduced execution time compared with other conventional forms of software applications. The leap from microservices to serverless computing can be taken for lightweight processing, or where the model has already been trained so that it can be hosted with few resources.
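
A minimal sketch of the batched summation form, assuming weighted records, shows why statelessness scales: each record is mapped independently and the reduction is a simple sum, so the work parallelizes freely with no coordination:

```java
import java.util.List;

public class BatchSummation {

    /** Map each record to a score, then reduce by summation; no shared state. */
    public static double mapReduceSum(List<double[]> records, double[] weights) {
        return records.parallelStream()
                .mapToDouble(r -> {
                    double s = 0.0;
                    for (int k = 0; k < r.length; k++) {
                        s += weights[k] * r[k];
                    }
                    return s;
                })
                .sum();
    }
}
```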

Parallel computing works on immense volumes of data, and the considerations for data modernization continue to apply independently of application modernization.

Sunday, November 27, 2022

Part 6: Application Migration process

There are at least two frequently encountered approaches to the migration process itself. In some cases, the migration towards a microservices architecture is organized in small increments rather than as one big overall migration project. In those cases, the migration is implemented as an iterative and incremental process, sometimes referred to as phased adoption. This has been the practice even for migrations towards Service Oriented Architecture. There are times when the migration has a predefined starting point but not necessarily an endpoint defined upfront.

Agility is a very relevant aspect when moving towards a microservices architecture. New functionalities are often added during the migration, which clearly shows that the preexisting system was hindering development and improvements. New functionalities are added as microservices, and existing functionalities are reimplemented as microservices. The difficulty lies mainly in getting the infrastructure ready for adding microservices. Domain-driven design practices can certainly help here.

Not all the existing functionality is migrated, and sometimes the data is not migrated either. Keeping data in the legacy store aligns neither with the “hide the internal implementation detail” principle of microservices nor with the typical MSA characteristic of decentralized data management. If the data is not migrated, it may hinder the evolution of independent services, and both service and data scalability are hindered. If scalability is not a concern, then the data migration can be avoided altogether.

The main challenges in architecture transformation are (i) the high level of coupling, (ii) the difficulty of identifying the boundaries of services, and (iii) system decomposition. There could be more improvement and visibility in this area with the use of architecture recovery tools, so that the services are well-defined at the architectural level.

Some good examples of microservices have consistently shown a pattern of following the “model around business concepts” principle.

The general rule of thumb inferred from various microservices migrations continues to be: 1) build and share reusable technical competence and knowledge, which includes (i) kickstarting an MSA and (ii) reusing solutions; 2) check business-IT alignment, which is a key concern during the migration; and 3) monitor the development effort and migrate when it grows too much, since there is a high correlation between migration to microservices and increasingly prohibitive effort in implementing new functionalities in the monolith.


Saturday, November 26, 2022

Part 5: Application Modernization and the migration towards Microservices architecture

The path towards a microservice-based architecture is anything but straightforward in many companies. There are plenty of challenges to address from both technical and organizational perspectives. This section covers both the activities performed and the challenges faced during the migration process.

The migration to microservices is sometimes referred to as the “horseshoe model” comprising three steps: reverse engineering, architectural transformations, and forward engineering. The system before the migration is the pre-existing system. The system after the migration is the new system. The transitions between the pre-existing system and the new system can be described via pre-existing architecture and microservices architecture. 

The reverse engineering step comprises the analysis by means of code analysis tools or some existing documentation and identifies the legacy elements which are candidates for transformation to services. The transformation step involves the restructuring of the pre-existing architecture into a microservice based one as with reshaping the design elements, restructuring the architecture, and altering business models and business strategies. Finally, in the forward engineering step, the design of the new system is finalized. 

Many companies will say that they are in the early stages of the migration process because the number and size of legacy elements in their software portfolio continues to be a challenge to get through. That said, these companies also deploy anywhere from a handful to hundreds of microservices while still going through the migration. Some migrations require several months and even a couple of years. Management is usually supportive of migrations, and the business-IT alignment comprising technical solutions and business strategies is even more overwhelmingly supportive.

Microservices are implemented as small services by small teams, in line with Amazon’s definition of a two-pizza team. The migration activities begin with an understanding of both the low-level and the high-level sources of information. The source code and test suites comprise the low level. The high level comprises textual documents, architectural documents, data models or schemas, and box-and-lines diagrams. Relevant knowledge about the system also resides with people, in some extreme cases as tribal knowledge. Less common but useful sources of information include UML diagrams, contracts with customers, architecture recovery tools for information extraction, and performance data. Very rarely, there are also cases where the pre-existing system is considered so bad that its owners do not look at the source code.

Such an understanding can also be used towards determining whether it is better to implement new functionalities in the pre-existing system or in the new system. This could also help with improving documentation, or for understanding what to keep or what to discard in the new system. 

Friday, November 25, 2022

 

Part 3 discussed microservices. This one focuses on maintainability, performance, and security. The maintainability of microservices is somewhat different from conventional software, where the software, once finished, is handed over to a maintenance team. This model is not favored for microservices. Instead, a common practice in microservices development is for the owning team to continue owning the service for its lifecycle. This idea is inspired by Amazon’s “you build it, you run it” philosophy. Developers working daily with their software and communicating with their customers creates a feedback loop for the improvement of the microservice.

Microservices suffer a weakness in their performance in that communication happens over a network. Microservices often send requests to one another, and performance is dependent on these external request-responses. A microservice with well-defined bounded contexts will experience a smaller performance hit. The issues related to microservice connectivity can be mitigated in two ways: making less frequent, more batched calls, and converting the calls to be asynchronous. Asynchronous calls can be issued in parallel, so the performance hit is that of the slowest call, as sketched below.
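
A minimal sketch of the asynchronous mitigation, using Java’s CompletableFuture; fetchFrom is a placeholder for a real HTTP client call:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class ParallelCalls {

    /** Placeholder for an asynchronous HTTP call to a downstream microservice. */
    static CompletableFuture<String> fetchFrom(String serviceUrl) {
        return CompletableFuture.supplyAsync(() -> "response from " + serviceUrl);
    }

    /** Issue all dependent calls in parallel; wait time is bounded by the slowest. */
    public static List<String> fanOut(List<String> serviceUrls) {
        List<CompletableFuture<String>> futures =
                serviceUrls.stream().map(ParallelCalls::fetchFrom).toList();
        // Block once for the whole batch instead of once per call
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        return futures.stream().map(CompletableFuture::join).toList();
    }
}
```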

Microservices have the same security vulnerabilities as any other distributed software. They can always be targeted by denial-of-service attacks, so endpoint protection, rate limits, and retries can be included with the microservices. Requests and responses can be encrypted so that the data is never in the clear. If the “east-west” security between internal services cannot be guaranteed, at least the edge-facing microservices must be protected with a firewall, a proxy, a load balancer, or some combination of these. East-west security refers to traffic between internal services, by analogy with the land connecting the east and the west while the oceans are external. Another significant security concern is that breaking a monolith into many microservices can increase the attack surface significantly. It is best to perform threat modeling of each microservice independently, for example with STRIDE, an acronym for the following threats:

Spoofing identity – a user can impersonate another user.

Tampering with data – a user can access resources or modify the contents of security artifacts.

Repudiation – a user can perform an illegal action that the microservice cannot deter.

Information disclosure – a guest user, say, can access resources as if the guest were the owner.

Denial of service – a crucial component in the operations of the microservice is overwhelmed by requests so that others experience an outage.

Elevation of privilege – a user gains access to the components within the trust boundary, and the system is therefore compromised.

Migration of microservices comes with three challenges: multitenancy, statefulness and data consistency. The best way to address these challenges involves removing statefulness from migrated legacy code, implementing multitenancy, and paying increased attention to data consistency.

Thursday, November 24, 2022

Part 3: The refactoring of old code to new microservices

Part 2 of this article described microservices versus monolithic architecture. With the introduction of microservices, it became easy to host not only a dedicated database but also a dedicated database server instance, and to separate the concerns for each functionality that the user interface comprises. When we use microservices with Mesos-based clusters and shared volumes, we can even have many copies of the server for high availability and failover. This is great for small and segregated data, but larger companies often require massive investments in their data, often standardizing tools, processes, and workflows to better manage it. In such cases, consumers of the data don't talk to the database directly but via a service that sits behind, say, even a message bus. If the consumers proliferate, they end up creating and sharing many different instances of services for the same data, each with its own view rather than the actual table.

APIs for these services are more domain-based rather than implementing a query-friendly interface that lets you work directly with the data. As services are organized, data may get translated or massaged as it makes its way from one to another. It is possible to have a ring of microservices that can take care of most data processing for business requirements. Data may even be at most one or two fields of an entity along with its identifier for such services. This works very well to alleviate the onus and rigidity that comes with organization, the interactions between the components, and the various chores that need to be performed to keep it flexible to suit changing business needs. The microservices are independent, standing by themselves as if spreading out from the data for their respective functionalities. This is business-friendly because each service can now be modified and tested independently of others.

The transition to microservices from legacy monolithic code is not straightforward. The functionalities must be separated beyond components. And in the process of doing so, we cannot risk regression. Tests become a way to scope out behavior at boundaries such as interface and class interactions.  Adequate coverage of tests will guarantee backward compatibility for the system as it is refactored. The microservices are independently testable both in terms of unit tests as well as end-to-end tests. Services usually have a REST interface which makes it easy to invoke them from clients and comes with the benefits of using browser-based developer tools. The data store does not need to be divided between services. In some cases, only a data access service is required which other microservices can call. The choice and design of microservices stem from the minimal functionalities that need to be separated and articulated. If the services don’t need to be refactored at a finer level, they can remain encapsulated in a singleton.

The rule of thumb for refactoring the code is to follow the Don’t Repeat Yourself (DRY) principle, defined as “Every piece of knowledge must have a single, unambiguous, authoritative representation within a system”. This calls for every algorithm or piece of logic that has been cut and pasted for different usages to be consolidated at a single point of maintenance. This improves flexibility, because enhancements such as the use of a new data structure can be made in one place, and it also reduces the bugs that arise when similar changes must be made in several places. This principle also reduces the code when it is refactored, especially if the old code had several duplications. It provides a way to view the minimal skeleton of the microservices when aimed at the appropriate scope and breadth. Even inter-service calls can be reduced with this principle. A small illustration follows.
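
A small, hypothetical illustration: a discount computation that had been cut and pasted into several services is consolidated into one authoritative representation:

```java
public final class Pricing {

    private Pricing() { }

    /** Single point of maintenance: change the policy here, not at N call sites. */
    public static double discountedTotal(double subtotal, double discountRate) {
        if (discountRate < 0.0 || discountRate > 1.0) {
            throw new IllegalArgumentException("discountRate must be in [0, 1]");
        }
        return subtotal * (1.0 - discountRate);
    }
}
```

When the discount policy changes, only this class changes, and every service that calls it picks up the fix.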

 

Good microservices are not only easy to discover from their APIs but also easy to read from their documentation which can be autogenerated from the code with markdowns. Different tools are available for this purpose and both the approach of using microservices as well as the enhanced comments describing the APIs provide sufficient information for the documentation.

Wednesday, November 23, 2022

 

Part 1 of this article describes application modernization. This section deals with microservices architecture, which suits application modernization very well. Microservices break away from the monolithic architecture that has been the norm in legacy systems for a while. Monolithic applications tend to grow indefinitely, which also increases complexity. Finding bugs and creating new features take a long time. If a part of the application needs to be updated, the whole application must be restarted, which can mean considerable downtime for large systems. Monolithic applications are harder to deploy since some parts of the application might have different requirements: some parts are computationally heavy, and others are memory heavy. The one-size-fits-all environment to satisfy all requirements of the application is usually expensive and suboptimal. Nor are monoliths scalable: a peak in traffic can lead to failures in various components, and if the number of instances of the entire application is increased, it wastes resources. These systems also evolve slowly because they are locked into their technology; the same programming language and framework must be used from the first to the last module.

Microservices are self-sufficient processes that can interact with other microservices to form a distributed application.  Generally, a ring of microservices is developed that are small independent services that have their own isolated environment with operating systems, databases and other support software.  Each microservice might be dedicated to a distinct set of resources that it supports with create, update and delete operations. They often use message passing via web requests to communicate with one another. Each microservice can be built with different programming languages and different environments depending on the requirements.

Microservices facilitate cross-functional team organization based on business capabilities. This leads to faster delivery and higher quality due to testability and focused effort. It avoids the immense cross-team interactions of component-based software development. It also keeps developers from writing logic in whichever layer is closest to them, be it user interface, service, or database.

Cloud service platforms have made operating and deploying microservices-based applications easier and cheaper. They allow teams to build microservices using continuous integration and continuous delivery. The pipeline automates testing, building, developing, deploying, and delivering the microservices. Updates to one microservice do not affect the others. But when a single microservice goes down, it can have a cascading effect on other services, because they have high fault density. This is true also for components that grow in size. It is generally overcome by keeping microservices focused and small.

The reliability of microservices is dependent on the reliability of the communication between them. HTTP and protocol buffers are the communication protocols of choice. Development and deployment are also owned by the same team, an idea inspired by Amazon’s “you build it, you run it” philosophy. The transition to microservices from legacy monolithic code is not straightforward. The functionalities must be separated beyond components, and in the process of doing so, we cannot risk regression. Tests become a way to scope out behavior at boundaries such as interface and class interactions. Adequate coverage of tests will guarantee backward compatibility for the system as it is refactored. The microservices are independently testable both in terms of unit tests as well as end-to-end tests. The choice and design of microservices stem from the minimal functionalities that need to be separated and articulated. If the services don’t need to be refactored at a finer level, they can remain encapsulated in a singleton.

The rule of thumb for refactoring the code is to follow the Don’t Repeat Yourself (DRY) principle, defined as “Every piece of knowledge must have a single, unambiguous, authoritative representation within a system”. This calls for every algorithm or piece of logic that has been cut and pasted for different usages to be consolidated at a single point of maintenance. This improves flexibility, because enhancements such as the use of a new data structure can be made in one place, and it also reduces the bugs that arise when similar changes must be made in several places. This principle also reduces the code when it is refactored, especially if the old code had several duplications. It provides a way to view the minimal skeleton of the microservices when aimed at the appropriate scope and breadth. Even inter-service calls can be reduced with this principle.

- courtesy Kristian Tuusjärvi

Tuesday, November 22, 2022

Application Modernization:

Software used by companies is critical to their business and will continue to provide return on investment; companies will try to maximize this for as long as possible. Some maintenance is required for these software systems to satisfy business and customer needs and to address technical debt that accrues over time. Maintenance works well for short-term needs, but as time progresses, the systems become increasingly complex and out of date. Eventually maintenance will no longer be efficient or cost-effective. At this point, modernization is required to improve the system’s maintainability, performance, and business value. It takes much more effort than maintenance. If a software system can no longer be maintained or modernized, it will need to be replaced.

The risks of modernizing legacy systems primarily come from missing documentation. Legacy systems seldom have complete documentation specifying the whole system with all its functions and use cases. In most cases, the documentation is largely missing, which makes it hard to rewrite a system to function identically to the previous one. Companies usually couple their legacy software with their business processes, so changing legacy software can cause unpredictable consequences to the business processes that rely on it. Replacing legacy systems with new ones is inherently risky, since the new system can be more expensive on a total-cost-of-ownership basis and there can be problems with its delivery schedule.

There are at least three strategies for dealing with legacy systems: scrap the legacy system, keep maintaining it, or replace it altogether. Companies generally have limited budgets for legacy systems, so they want the best return on the investment. Scrapping the system can be an option if its value has diminished sufficiently. Maintenance can be chosen when it is cost-effective; some improvement is possible by adding new interfaces to make the system easier to maintain. Replacement can be attempted when support has gone, the maintenance is too expensive, and the cost of the new system is not too high.

Both technical and business perspectives are involved. If a legacy system has low quality and low business value, the system should be removed. Those with low quality but high business value must be maintained or modernized, depending on the expense. Systems with high quality can be left running.

Modernization is a more extensive process than maintenance because it often incorporates restructuring, functional changes, and new software attributes. Modernization can be either white-box or black-box depending on the level of abstraction. White-box modernization requires a lot of information about the internals of the legacy system; black-box modernization, by contrast, only requires its external interfaces and compatibility. Replacement is an option when neither approach works.

Software modernization is also an evolution of systems. White-box modernization is more popular than black-box modernization, which might be counterintuitive given the notion that black-box modernization is easier; the tools for white-box methods could have become good enough to drive the shift. Legacy systems are harder to integrate. Software integration allows companies to better control their resources, remove duplicate business rules, re-use existing software, and reduce the cost of development. The effort needed to keep legacy systems running often takes resources away from other projects. Legacy systems also suffer from diminishing ownership and knowledge base, which makes changes difficult to make. On the other hand, their business value makes them appear like rare diamonds even when they cost a lot.

Monday, November 21, 2022

Data Modernization:

 


Data technologies in recent years have popularized both structured and unstructured storage, fueled by applications that are embracing cloud resources. The two trends are happening simultaneously and reinforce each other.

Data modernization means moving data from legacy databases to modern databases. It comes at a time when many organizations are doubling their digital footprint. Unstructured data is the biggest contributor to this growth and includes images, audio, video, social media comments, clinical notes, and the like. Organizations have shifted from a data architecture based on relational, enterprise-wide data warehouses to data lakes based on big data. If surveys of IT spending are to be believed, a great majority of organizations are already on their way towards data modernization, with financial services firms leading the way. These organizations reported data security planning as part of their data modernization activities, and they consider the tools and technology available in the marketplace the third most important factor in their decision making.

Drivers for a one-time data modernization plan include security and governance, strategy and planning, tools and technology, and talent. Data modernization is a key component of, or reason for, migrating to the cloud. The rate of adoption of external services in data planning and implementation is about 44% for these organizations.

The perceived obstacles to implementing data modernization include budget and cost constraints, lack of understanding of technology, lack of consensus among decision-makers, absence of clarity on success metrics, and other such causes. The cloud is already a dominant storage location for nine out of ten of these organizations, and it is both a means and an important consequence. A majority of these organizations have all their important applications and data in the cloud. Applications and data can be moved independently, but many organizations are putting them on modernized platforms at the same time as moving them from on-premises to the cloud. Traditional IT architectures and on-premises data centers often come with their own cost concerns, which makes cost a key driver of cloud migration. Those organizations that have combined cloud migration and data modernization could deliver on their strategic goals.

This leads to the assertion that almost all data management approaches will likely eventually be modernized and almost all data and applications will be in the cloud. Cloud migration and data modernization will continue to mutually reinforce each other. Since these two trends support and overlap each other, most companies will do well with both trends.

Sunday, November 20, 2022

Collaborative caching for multitenant solutions:

With the case study for developing a multitenant solution for the delivery of user-generated content to a designated audience, different algorithms are required to tune the system for higher performance. These include collaborative caching, context-aware streaming, user redirection, distribution trees, and others. This section discusses one of them: collaborative caching.

Content caches are strategically located throughout the network and support services by optimally distributing the content from its source to its consumers. Collaborative caching allows software to use hints to influence cache management. When the hints are about the inclusion property, optimal caching can be achieved if the access sequence and cache size are known beforehand. An example of such a hint could be a binary choice between the Least Recently Used (LRU) algorithm and the Most Recently Used (MRU) algorithm. Another way a hint could be provided is with a number encoding a property. The result is a new algorithm that can be described as priority LRU and that captures the full range between MRU and LRU.

Hints are added by annotating each memory access with a numerical priority. Data accessed with a higher priority takes precedence over data accessed with a lower priority. Collaborative caches enable the caller to specify the importance and the system to determine the priority with which to manage the data. LRU/MRU used to infer this importance, but with a sliding-scale priority specified by the caller, a new type of inclusion, non-uniform inclusion, becomes possible.

The word inclusion derives from the hierarchy of caches at the machine level, namely L1, L2, and so on. The property states that the larger cache will always contain the contents of the smaller cache. Generally, a stack data structure is used to visualize the hits and misses in play. The stack distance is a useful metric derived from stack simulations, such as for LRU; it denotes the amount of data accessed between consecutive reuses of the same entry and suggests system locality. Data elements at the top c stack positions are the ones in a cache of size c. The stack position defines the priority of the stored data, and all accessed data are ordered by their priority in a priority list. The stack distance gives the minimal cache size that makes an access a cache hit, and it is calculated by simulating a cache of infinite size. In the simple LRU case, the data is prioritized by the most recent access time. The data in an MRU cache is also prioritized by the most recent access time, but unlike LRU, the lowest priority belongs to the data element with the most recent access time. LRU and MRU can also be mixed, with a hint indicating whether an access is LRU or MRU. The stack distance would then be computed by the stack-based algorithm by assigning the current access time as the priority for LRU accesses and the corresponding negation for MRU accesses. The priority hint changes the default scheme.

The effects of the priority hints on cache management include four cases for cache hits and two cases for cache misses. Consider an access to w with priority i, written (w, i), arriving in a cache of size m, with w at current stack position j. The change in priority leads to the item being moved up, not at all, or moved down in the stack of m items; the move is conceptual, pertaining only to the organization of the stack.

These cases include:

1. 1 <= i < j <= m and w is found in the cache: a hit with an up move, where the item moves up to position i and the entries from i to j-1 move one position lower.

2. 1 <= j = i <= m and w is found in the cache: a hit with no movement.

3. 1 <= j < i <= m and w is found in the cache: a hit with a down move, where the item moves down to position i and the entries from j+1 to i move one position higher.

4. 1 <= j <= m < i: w is moved out of the cache and the entries from j+1 to m move one position higher. This is a voluntary eviction.

5. j = infinity and 1 <= i <= m: the accessed data element w misses the cache and is moved into position i, and the entries below it move one position lower; the lowest-priority entry is evicted.

6. j = infinity and i > m: a miss bypass; the entries in the cache are unaffected.

In this way, the priorities stated in the hints and those implied by the access sequence are reconciled. A sketch of this stack simulation follows.
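
A minimal sketch, not the paper's implementation, of the six cases above, simulating the cache as a stack where position 1 is the highest priority:

```java
import java.util.ArrayList;
import java.util.List;

/** Priority-hint cache of size m; each access (w, i) asks for w at position i. */
public class PriorityHintCache {

    private final int m;                                   // cache size
    private final List<String> stack = new ArrayList<>();  // index 0 is position 1

    public PriorityHintCache(int m) { this.m = m; }

    /** Access item w with priority hint i; returns true on a hit. */
    public boolean access(String w, int i) {
        int j = stack.indexOf(w) + 1;          // current position; 0 means a miss ("infinity")
        if (j > 0) {                           // cases 1-4: w is in the cache
            stack.remove(j - 1);
            if (i <= m) {
                // Cases 1-3: hit with up move, no movement, or down move
                stack.add(Math.min(i, stack.size() + 1) - 1, w);
            }                                  // case 4 (i > m): voluntary eviction
            return true;
        }
        if (i <= m) {                          // case 5: miss, insert at position i
            stack.add(Math.min(i, stack.size() + 1) - 1, w);
            if (stack.size() > m) {
                stack.remove(stack.size() - 1); // evict the lowest-priority entry
            }
        }                                      // case 6 (i > m): miss bypass
        return false;
    }
}
```

For instance, on a cold cache of size 4, access("w", 3) exercises case 5, and a repeated access with a different hint exercises one of the hit cases.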

Saturday, November 19, 2022

The case for disk quotas:

Oracle Solaris ZFS demonstrated the use of resource reservations and quotas. With the evolution towards cluster-based computing, network-accessible storage significantly widened the horizon for unlimited disk space, because capacity could now be added in the form of additional nodes and their disks. This represented virtualized storage, but there were no reservations. While ZFS demonstrated effective resource management with isolation of workloads, the cluster would not keep up with the same practice without some form of roles, in the form of control and data nodes. In addition, the reservations need not be governed exclusively by the user; they can be decided by the system as quality-of-service levels. To achieve service levels on the same shared disks, we can create disk groups.


There will be times when node disk groups become overloaded by I/O requests. At such times, it is difficult to identify where the I/O requests predominantly originate, so that those accounts can be throttled while well-behaved accounts are not affected. Each node disk group keeps track of the accounts that issue I/O requests. The system can then use a Sample-and-Hold algorithm to track the request-rate history of the top N busiest accounts. This information can be used to determine whether an account is well-behaved: if the traffic reduces when the account is throttled, it is well-behaved. If a node disk group is getting overloaded, it can use this information to selectively limit the incoming traffic, targeting the accounts that are causing the issue. As an example of a metric serving this purpose, a node disk group can compute a throttling probability for each account from the account's request-rate history; if the request rate is high, the account has a higher probability of being throttled, and the opposite also holds. As the metric builds up a history of measurements, it becomes easier to tell whether an account is well-behaved. A sketch of such a tracker follows.
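
A minimal sketch of such a tracker, with hypothetical names. True Sample-and-Hold samples incoming requests probabilistically and holds counters only for sampled flows; this simplified version counts every request and derives a throttling probability from each account's share of the traffic:

```java
import java.util.HashMap;
import java.util.Map;

public class AccountThrottler {

    private final Map<String, Long> requestCounts = new HashMap<>();
    private long totalRequests = 0;
    private final double fairShare;   // e.g., 1.0 / expected number of active accounts

    public AccountThrottler(double fairShare) { this.fairShare = fairShare; }

    /** Record one I/O request from the given account. */
    public void record(String account) {
        requestCounts.merge(account, 1L, Long::sum);
        totalRequests++;
    }

    /** Throttling probability grows as the account exceeds its fair share. */
    public double throttleProbability(String account) {
        if (totalRequests == 0) {
            return 0.0;
        }
        double share = requestCounts.getOrDefault(account, 0L) / (double) totalRequests;
        return Math.max(0.0, Math.min(1.0, (share - fairShare) / (1.0 - fairShare)));
    }
}
```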


Load balancing will continue to keep the servers loaded within an acceptable limit. If the access patterns cannot be load balanced, then there is probably high traffic. In such cases, the accounts will be limited, and they will be well-behaved again.  


A node may have more than one disk group so it can form logical partitions in its existing disk space while not requiring cluster level changes in providing service levels from its disks. This is the intelligent node model. If the cluster can perform more effectively by grouping data nodes or their disks and with the help of dedicated control nodes, then the data nodes are freed up to focus on the data path and merely read-write from disk groups. In either case the group id is merely a part of the metadata.
The word group is generally used with client-facing artifacts such as requests, while system resources such as disks are referred to as pools. By that definition, I should apologize for using the word group; the above argument can be read as applying to disk pools.
 

Friday, November 18, 2022

 

This article picks up the previous discussion on the requirements from a multi-tenant solution for the delivery of user-generated content to designated consumers and delves into the architecture.

The architecture comprises a cloud infrastructure layer, a content service virtualization layer, and an Identity and Access Management (IAM) layer that provides authentication, authorization, and auditing. These can be integrated through an inter-cloud messaging bus that facilitates a distributed protocol for resource management, account management, content management, and service orchestration.

The cloud infrastructure layer spans the public, private, and personal cloud. The purpose of stretching a cloud infrastructure layer over hybrid IT resources is to form a dynamic content delivery service overlay. This overlay consists of a content distribution tree that 1) provides the required Quality-of-Service and 2) has minimum cost. In addition to the standard storage and bandwidth required by the overlay, some computing resources are reserved for content processing, rendering, and management functionalities, each of which requires a dedicated application engine. The key module to pool all the required IT resources together is a distributed algorithm for cloud resource management. Any message-based consensus protocol could be used here, such as Paxos or Raft.

The content service virtualization layer is the one providing a suite of middleware components to support different application engines. These include:

Content distribution: provides services for optimally distributing the content from its source to its consumers. These services are supported by content caches strategically located throughout the network. The distribution component also includes the actual mechanisms and protocols used for transmitting data over the underlying network and serving the content to the consumers.

Content processing: provides services for sourcing and adapting content based on available bandwidth, user preferences, and device capabilities. Modification and conversion also include adaptations for wireless devices.

Content storage: includes modules that provision directories and artifacts for storing user-generated content with the security specified by the providers. The implementation of an S3 store is a good analogy here. Functionalities include distributed secure storage, content naming and resolution, and content replication.

Request routing: provides services for routing both providers’ requests to upload their content and consumers’ requests to retrieve the content from the nearest available location. The selection of this location depends on both proximity and the availability of system resources.

Service orchestration: provides services for integrating different media services behind internal and external service entry points. For example, a social TV application would draw on many individual media services, such as a buddy service from social networking and a media streaming service.

The IAM and Service Bus are left out to be completed as appropriate for the above application engines.

Thursday, November 17, 2022

 

A case study for a multitenant solution for delivery of user-generated content to designated consumers

Content delivery networks do not solve the massive technical challenges of delivering media uploaded by users to specific groups of consumers. Media can include photos, videos, and podcasts, and are created by individuals, organizations, and small-to-medium businesses. The majority of these are of low interest to the general public, which runs counter to the principle of content delivery networks where content is for public consumption. The leading content delivery networks tailor their network architecture and operational structure toward popular content.

A multitenant solution that provides content delivery services can fulfill the requirements by virtue of isolation and scalability. When the implementation is designed for the cloud, it can become both elastic and pay-as-you-go. Such a service would redistribute content among edge servers and render it in a context-aware fashion.

YouTube and Facebook represent the content web platforms available for common use today. They have native support for social networking functionality, but they do not provide private delivery services, because all the content is aggregated together for delivery. Some web services support content delivery, such as CloudFront and object storage as a cloud service, but they do not support access management and segmentation requirements. It is therefore hard to find a solution that addresses all of these requirements.

The essentials for a multitenant service that fulfills these requirements include the following:

1. Media cloud – The media cloud is hybrid in nature. It can include on-premises appliances or public cloud services that support unconfigured content distribution with an underlying network infrastructure. It exhibits resource virtualization in terms of computing, storage, and networking resources. For example, networks can be virtualized with point-to-site VPN connectivity or OpenFlow technologies.

2. Content provider – When the content is published, a content-delivery request is submitted to a media service provider of choice. Each request has the following parts (a sketch of such a request appears after this list):

a. A list of locations for the content, which could be private servers or storage space such as Amazon S3.

b. A group of targeted consumers, which could be a group of friends on a specific social networking platform, or all the users in some geographical location or within some subnet.

c. A list of desired Quality-of-Service metrics, e.g., bandwidth, delay, time jitter, security, and others.

d. A time window during which the contents can be consumed.

3. Workflow – Carving out a content delivery service from the underlying media cloud supports a virtual overlay that provides the required QoS with minimum cost. The storage, bandwidth, and compute required for delivery are reserved. Managerial routines can be delegated to compute that can be scaled up or down.

4. Identity and access management – secures the targeted audience. This could be as simple as public-private key pairs to gain access.
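
A minimal sketch of such a content-delivery request as a Java record; the field and type names are hypothetical:

```java
import java.time.Instant;
import java.util.List;

/** Content locations, targeted consumers, desired QoS, and a consumption window. */
public record ContentDeliveryRequest(
        List<String> contentLocations,   // e.g., private servers or S3 URIs
        List<String> targetedConsumers,  // e.g., a social group, a region, or a subnet
        QosMetrics desiredQos,
        Instant windowStart,
        Instant windowEnd) {

    /** Assumed QoS fields, mirroring the metrics listed above. */
    public record QosMetrics(long bandwidthKbps, long maxDelayMs,
                             long maxJitterMs, boolean encrypted) { }
}
```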

When the virtual content delivery service is configured, contents are acquired from the storage locations and pushed into the set of edge servers from which a targeted audience can consume.

Reference: https://1drv.ms/w/s!Ashlm-Nw-wnWhLMfc6pdJbQZ6XiPWA?e=fBoKcN

 

 

 

Wednesday, November 16, 2022

Are multi-tenant SaaS Applications easy to maintain?

The perception is that SaaS applications are maintenance-free. They certainly alleviate that concern for many customers, allow full scale across tenants who share application and database instances, and remove deployment chores. Tenants can continue to download applications via the application store without impacting one another. There is even a reduction in overall application costs, which makes the technology attractive to service providers. The proper architectural choices can address numerous challenges such as resource sharing, configurability, shared instances, and scalability. Improper choices can significantly increase the costs of maintenance and operations.

Companies often take the shortest path to a multitenant system: modifying a single-tenant system to be multi-tenant. They encounter two barriers:

·        The initial startup costs of re-engineering their existing single-tenant software systems into multi-tenant software systems.

·        The high configurability of the new systems which might even eliminate the maintenance advantage.

On the other hand, there are significant costs, and the architectural choices could have been very different had the multitenant system been built from scratch. It follows that new system design and reengineering provide two different perspectives on the problem space and the tradeoffs between the choices.

The choices for application and database sharing include the following:

AD – a dedicated application server is running for each tenant, and therefore each tenant receives a dedicated application instance.

AS – a single application server is running for multiple tenants, and each tenant receives a dedicated application instance.

AI – a single application server is running for multiple tenants, and a single application instance is running for multiple tenants.

DD – a dedicated database server is running for each tenant, and therefore the database is also isolated.

DS – a single database server is running for multiple tenants, and each tenant gets an isolated database.

DB – a single database server and a single database are being used for multiple tenants.

DC – a single database server is running for multiple tenants, and data from multiple tenants is stored in a single database, in a single set of tables with the same database schema, separated on a per-record basis.
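
As an illustration of the DC choice, a common realization (an assumption here, not prescribed above) is a tenant_id column on every shared table, with every query filtered on it:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/** Shared tables, record-level separation: every query is scoped by tenant_id. */
public class InvoiceRepository {

    private final Connection connection;

    public InvoiceRepository(Connection connection) { this.connection = connection; }

    /** One tenant never sees another tenant's rows; the table name is hypothetical. */
    public ResultSet findInvoices(long tenantId) throws SQLException {
        PreparedStatement stmt = connection.prepareStatement(
                "SELECT id, amount, issued_at FROM invoices WHERE tenant_id = ?");
        stmt.setLong(1, tenantId);
        return stmt.executeQuery();
    }
}
```

The discipline is easy to state and hard to enforce by hand, which is why row-level security or a query interceptor is often layered on top.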

The challenges to overcome can be explained as:

·        Sharing of resources: With higher-than-average hardware utilization, performance may be compromised. It must be ensured that all tenants get to consume resources. If one tenant clogs up resources, the performance of all other tenants may be compromised. This problem is specific to multitenancy, not single tenancy. In a virtualized-instances situation, it is solved by assigning an equal amount of resources to each instance; that solution may lead to very inefficient utilization of resources and may not suit all multitenant systems.

·        Scalability: When tenants share the same application and database, scalability suffers. An assumption with single tenancy is that tenants do not need more than one application or database, but no such limitation exists when placing multiple tenants on one server. Tenants from various geographies can use the same application, which affects its scalability. In addition, geographies pose constraints, legislation, and regulations; for example, the EU mandates that invoices sent from within the EU be stored within the EU. A tenant can bring additional constraints, such as placing all of its data on the same server to speed up queries.

·        Security: When security is compromised, the risk of data theft is high. In a multitenant environment, a security breach can result in the exposure of data to other, possibly competing, tenants. Data protection becomes an important challenge to tackle.

·        Zero-downtime: Introducing new tenants or adapting to the changing business requirements of existing tenants brings along the need for constant growth and evolution of a multi-tenant system.

Tuesday, November 15, 2022

Content reselling platform continued

 

Components:

Storage:

Media content is binary and immutable in nature. S3 storage gives it a web-accessible address. Generating a presigned URL to upload an object enables the user interface to bypass the middle tier for uploads, which significantly improves the time it takes to upload content; a sketch follows. S3 Transfer Acceleration can help with downloads by routing through the closest edge location. The metadata and user information can be collected in a relational store for use by the WebAPI.
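
A minimal sketch of generating such a presigned upload URL with the AWS SDK for Java v2; the class name, bucket, and key are placeholders:

```java
import java.net.URL;
import java.time.Duration;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;
import software.amazon.awssdk.services.s3.presigner.S3Presigner;
import software.amazon.awssdk.services.s3.presigner.model.PutObjectPresignRequest;

public class UploadUrlService {

    /** Return a time-limited URL the UI can PUT the media file to directly. */
    public URL presignUpload(String bucket, String key) {
        try (S3Presigner presigner = S3Presigner.create()) {
            PutObjectRequest put = PutObjectRequest.builder()
                    .bucket(bucket)
                    .key(key)
                    .build();
            PutObjectPresignRequest presignRequest = PutObjectPresignRequest.builder()
                    .signatureDuration(Duration.ofMinutes(15))  // URL validity window
                    .putObjectRequest(put)
                    .build();
            return presigner.presignPutObject(presignRequest).url();
        }
    }
}
```

The upload then flows straight from the browser to S3, and only the resulting object key needs to reach the middle tier.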

When the application needs to serve users at the scale of social networking applications, Presto can be used with the high-volume data in NoSQL stores, bridging a SQL query over the data. Presto, from Facebook, is a distributed SQL query engine that can operate on streams from various data sources, supporting ad-hoc queries in near real time. It does not partition work based on MapReduce; it executes queries with a custom SQL execution engine written in Java. Its pipelined data model can run multiple stages at once, pipelining the data between stages as it becomes available. This reduces end-to-end time while maximizing parallelization via stages on large data sets.

A data warehouse can be supported in the cloud in virtual data centers. It can support data ingestion in the form of JSON from data pipelines. Queries over this warehouse follow the conventional Online Analytical Processing model and serve the reward points very well. While Presto does not dictate the data source, a data warehouse forms a veritable data store. Both can scale, but there are cost-benefit ratios to consider when choosing custom stores versus something offered by public clouds.

WebAPI:

The APIs for media and content publishing can simply be REST/GraphQL APIs for a resource named, say, Movie, and look like the following:

·        Storing movies in a database with user details such as  

o   Movie-form,  

o   user relationship,  

o   target URL to fulfil the reward from backend store, and  

o   active versus inactive state for movie

·        API to create/update/delete movies that includes: 

o   GET /api/v1/movie/ to list 

o   POST /api/v1/movie/ to upload

o   GET /api/v1/movie/:id/ to lookup  

o   PUT /api/v1/movie/:id/ to edit 

o   DELETE /api/v1/movie/:id to delete

·        List and implement movie types 

o   a name in the name.verb syntax. 

o   a payload to simply mirror the representation from the standard API. 

·        send hooks with POST to each of the target URLs for each matching movie 

o   compiling and POSTing the combined payload for the triggering resource and hook resource 

o   sending to known online retail stores with the Movie where X-Hook-Secret header has a unique string and one that matches what was issued by the backend retail store. 

o   confirming the hook legitimacy with a X-Hook-Signature header 

o   Handling responses like the 410 Gone and optionally retrying connection or other 4xx/5xx errors. 
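
Confirming hook legitimacy with the X-Hook-Signature header might look like the following sketch, assuming (as is common, though not specified above) that the signature is an HMAC-SHA256 over the request body keyed with the shared hook secret:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class HookSignatureVerifier {

    /** Recompute the HMAC of the body and compare it with the header value. */
    public static boolean isLegitimate(String body, String secret, String signatureHex)
            throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
        byte[] expected = mac.doFinal(body.getBytes(StandardCharsets.UTF_8));
        // Constant-time comparison to avoid leaking information via timing
        return MessageDigest.isEqual(
                toHex(expected).getBytes(StandardCharsets.UTF_8),
                signatureHex.getBytes(StandardCharsets.UTF_8));
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("%02x", b));
        return sb.toString();
    }
}
```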

 

Rules Engine:

A Rules Engine can be provided that introduces a callback URL for the Movie API servers to call, permitting a dynamic policy for the translation of movies and the ability for the organization to set policies.

·        Rule Execution Sets should be provided by the organization administrators 

o   Any complex rule involving nested if and else should be flattened to an ordered list of if statements with appropriate conditions (see the sketch after this list).

o   Ordering of the conditions should be specified by the administrators. 

·        Exports or Objects returned by the Rule Engine should be published 

o   Multiple objects should be returned as a list 

o   These should be filtered by the object filter specified by the client 

o   They should not include any sensitive information.
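
A minimal sketch of such a flattened rule execution set; the names and the shape of the rule objects are hypothetical:

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Nested if/else flattened to an ordered list of independent conditions. */
public class RuleExecutionSet {

    public record Rule(String name,
                       Predicate<Map<String, Object>> condition,
                       String export) { }

    private final List<Rule> orderedRules;   // order chosen by the administrators

    public RuleExecutionSet(List<Rule> orderedRules) { this.orderedRules = orderedRules; }

    /** Returns the exports of all matching rules, in administrator-specified order. */
    public List<String> evaluate(Map<String, Object> movieAttributes) {
        return orderedRules.stream()
                .filter(r -> r.condition().test(movieAttributes))
                .map(Rule::export)
                .toList();
    }
}
```

The returned list of exports can then be filtered by the client's object filter before being published.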