Tuesday, February 28, 2023

Continuing the survey of problem types in fleet management science, rail and intermodal transportation offer several noteworthy solution approaches.

Several interesting algorithms have been proposed for the rail and intermodal problem space. As described in the previous post, this is a complex system of different transport networks, infrastructures, transport modes, and operators, whose many decision makers must coordinate for the system to run smoothly; its development calls for better decision-support tools for the stakeholders.

When train operations are perturbed, a new conflict-free timetable must be recomputed so that the deviation from the original is minimized. This scheduling problem has been modeled with an alternative graph formulation, for which a branch-and-bound algorithm was developed. Some approaches use an integrated framework that combines signal layout optimization with train scheduling optimization at the microscopic level.
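
To make the alternative graph idea concrete, here is a minimal Python sketch on an invented toy instance: fixed arcs carry running times, each alternative pair offers two mutually exclusive precedence arcs at a conflict point, and a small branch-and-bound search picks one arc per pair to minimize the longest path to the sink. The two-train instance, node names, weights, and headway are illustrative assumptions, not a production formulation.

    # Toy instance: 's' is the source, 't' the sink; weights are running times
    # in minutes. Trains A and B share one block section.
    fixed = [('s', 'A1', 0), ('A1', 'A2', 3), ('A2', 't', 2),
             ('s', 'B1', 1), ('B1', 'B2', 4), ('B2', 't', 2)]

    # Alternative pair: exactly one arc is chosen, fixing which train passes
    # the shared block first (a 2-minute headway is assumed).
    alternatives = [(('A2', 'B1', 2), ('B2', 'A1', 2))]

    def longest_path(arcs):
        # Longest-path lengths from 's' by repeated relaxation; returns None
        # if the chosen precedences create a positive cycle (infeasible).
        nodes = {n for u, v, _ in arcs for n in (u, v)}
        dist = {n: float('-inf') for n in nodes}
        dist['s'] = 0.0
        for _ in range(len(nodes)):
            changed = False
            for u, v, w in arcs:
                if dist[u] + w > dist[v]:
                    dist[v] = dist[u] + w
                    changed = True
            if not changed:
                return dist
        return None

    def branch(selected, pairs, best):
        dist = longest_path(fixed + selected)
        if dist is None or dist['t'] >= best[0]:
            return  # infeasible or already dominated: prune this branch
        if not pairs:
            best[0], best[1] = dist['t'], list(selected)
            return
        for arc in pairs[0]:  # branch on the next unresolved conflict
            branch(selected + [arc], pairs[1:], best)

    best = [float('inf'), None]
    branch([], alternatives, best)
    print('makespan:', best[0], 'chosen precedences:', best[1])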

Heuristic approaches include a look-ahead greedy heuristic and a global neighborhood search algorithm, both evaluated in terms of total train delay on the railway. Scheduling additional train services for integration into the current timetable has been modeled with hybrid job-shop scheduling techniques that operate on a disjunctive graph model of trains.

One approach presented a two-phase train-set routing algorithm to cover a weekly train timetable with the minimal working days of a minimal number of train sets. The first phase relaxes the maintenance requirements and obtains minimum-cost routes by solving the polynomial relaxation; maintenance-feasible routes are then generated from crossovers of those minimum-cost routes. This pragmatic approach seems particularly effective for high-speed railway systems, which are simpler, with fewer end stations and a higher frequency of trains.

Corman et al. addressed train conflict detection and resolution, a problem that has become quite popular among traffic controllers. Their approach proposed a family of techniques referred to as ROMA (Railway Optimization by Means of Alternative graphs), involving effective rescheduling algorithms and local rerouting strategies in a tabu search scheme. A fast heuristic and a truncated branch-and-bound algorithm are alternated to compute train schedules within a short computation time, and different neighborhood structures can be investigated for train rerouting; tabu search is reported to be faster than ROMA. Another approach proposed a train slot selection model based on multicommodity network flow concepts for determining freight train timetables that schedule rail services along multiple interconnected routes. The model seeks to minimize the operating costs incurred by carriers and the delays incurred by shippers while ensuring that schedules and demands are mutually consistent.
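
As a rough illustration of the network flow view, the sketch below routes two invented freight demands one at a time on a tiny time-expanded network using networkx's min-cost flow, removing the capacity consumed by earlier demands. A real multicommodity slot-selection model solves all commodities jointly as a mathematical program; the sequential simplification, node names, capacities, and costs here are all assumptions.

    import networkx as nx

    # Arcs of a time-expanded network: (from_slot, to_slot, capacity, cost).
    arcs = [('yardA_t0', 'junction_t1', 2, 3), ('yardA_t0', 'junction_t2', 2, 5),
            ('junction_t1', 'yardB_t2', 1, 2), ('junction_t2', 'yardB_t3', 2, 2)]
    # Each commodity: (origin slot, destination slot, volume in trains).
    demands = [('yardA_t0', 'yardB_t2', 1), ('yardA_t0', 'yardB_t3', 1)]

    residual = {(u, v): c for u, v, c, _ in arcs}
    for origin, dest, volume in demands:
        G = nx.DiGraph()
        for u, v, c, w in arcs:
            if residual[(u, v)] > 0:
                G.add_edge(u, v, capacity=residual[(u, v)], weight=w)
        G.add_node(origin, demand=-volume)   # supply at the origin slot
        G.add_node(dest, demand=volume)      # requirement at the destination
        flow = nx.min_cost_flow(G)
        for u, succ in flow.items():
            for v, f in succ.items():
                residual[(u, v)] -= f        # later commodities see less capacity
        print(origin, '->', dest, flow)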

Monday, February 27, 2023

 

Another problem space that defines fleet management in its own way is rail and intermodal transport. This is a complex system composed of different transport networks, infrastructures, transport means, and operators, such as drayage operators, terminal operators, and network operators. Intermodality means that many decision makers must work in a coordinated manner for the system to run smoothly. If intermodal transport is to be developed, it will require more decision-support tools to assist the decision-makers and stakeholders.

For example, the operational level involves day-to-day management decisions about the load order of trains and barges and the redistribution of railcars, push barges, and load units; the fleet, in this view, comprises the load units. The assignment of a set of trailers and containers to the available flatcars that can move the equipment is a classic problem in this space. Routing involves a lot more than that: while minimum-cost path algorithms are the norm on road networks, here the routing decision is essentially a modal choice problem for specific trajectories between origin and destination points, involving specific freight volumes and specific time constraints.

Container transportation, carried out by a combination of truck, rail, and ocean shipping, is a major component of intermodal transportation. Fleet management covers the whole range of planning and management issues, from the procurement of power units and vehicles to vehicle dispatch and the scheduling of crews and maintenance operations. Rail transport, however, is characterized by different kinds of trains traveling on the network and by the subdivision of resources between passenger and freight trains, so train scheduling and routing must consider both timetables.


Sunday, February 26, 2023

 

Another problem space that defines fleet management in its own way is maritime transport. This is not restricted to seaborne transportation and includes inland waterborne transportation as well. Ships and ports, their logistics, container management, and the interconnections among vessels characterize this field.

There are three modes of operation in maritime transportation: liner, industrial, and tramp. Adjustments to fleet size and mix, fleet deployment, and ship routing and scheduling are some of the tactical problems. As with the airline transportation discussed earlier, robustness needs to be addressed here as well, since disruptions are common; robustness is therefore factored into the optimization models used for planning.

Methodologies used toward solutions include varying the input parameters, deterministic models that incorporate penalties, and stochastic optimization models. One established technique addresses fleet composition and routing jointly, while another offers a decision-support methodology for strategic planning in tramp and industrial shipping. The latter combines simulation and optimization: a Monte Carlo simulation framework is built around an optimization-based decision-support system, with short-term routing and scheduling done on a rolling-horizon principle in which information is revealed as time goes by. This helps with a wide range of strategic planning problems.
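
The simulation-around-optimization loop can be sketched in a few lines of Python. The "optimizer" below is only a placeholder rule (own ships take the largest cargoes, the remainder is chartered at a premium), and the demand distribution, cost coefficients, and horizon are invented; the point is the structure of Monte Carlo replications wrapped around a rolling horizon.

    import random

    def optimize_day(requests, fleet_size):
        # Placeholder for the routing/scheduling optimizer: own ships take the
        # largest cargoes; the rest is chartered on the spot market at a 50%
        # premium. Cost units are arbitrary.
        requests = sorted(requests, reverse=True)
        own, spot = requests[:fleet_size], requests[fleet_size:]
        return sum(own) + 1.5 * sum(spot)

    def expected_cost(fleet_size, horizon_days=30, replications=200, seed=7):
        rng = random.Random(seed)
        totals = []
        for _ in range(replications):          # Monte Carlo replications
            cost = 0.0
            for _ in range(horizon_days):      # rolling horizon: demand is
                n = rng.randint(2, 6)          # revealed one day at a time
                cost += optimize_day([rng.uniform(10, 40) for _ in range(n)],
                                     fleet_size)
            totals.append(cost)
        return sum(totals) / len(totals)

    # Compare candidate fleet sizes on expected cost, as a strategic planner
    # would when evaluating fleet size and mix.
    for size in (2, 3, 4):
        print(size, round(expected_cost(size), 1))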

When it comes to ferry scheduling, the analogy to the public transit systems discussed earlier holds true. The common objective is the minimization of costs and the maximization of customer satisfaction; decreasing operational costs and reducing passengers' travel and waiting times require reevaluating and improving the ferry schedule. A 'logit' model determines the passengers' service choices, and the formulation uses it to find the best mixed-fleet operating strategy, including interlining schemes, so as to minimize an objective function that combines both operator and passenger performance measures.
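
A multinomial logit model assigns each service a utility and converts utilities into choice probabilities. The sketch below uses invented utility coefficients and service attributes; only the softmax form of the logit shares is standard.

    import math

    def logit_shares(utilities):
        # P(choice i) = exp(V_i) / sum_j exp(V_j)
        exps = [math.exp(v) for v in utilities]
        total = sum(exps)
        return [e / total for e in exps]

    # Assumed utility: V = -0.05*fare - 0.10*travel_min - 0.15*wait_min
    services = {'fast ferry': (12.0, 25, 5), 'regular ferry': (6.0, 45, 10)}
    V = [-0.05 * fare - 0.10 * travel - 0.15 * wait
         for fare, travel, wait in services.values()]
    for name, share in zip(services, logit_shares(V)):
        print(f'{name}: {share:.2%}')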

Saturday, February 25, 2023

Another class of problems in fleet management, aside from those discussed, concerns air transport. This space is characterized by network design and schedule construction, fleet assignment, aircraft routing, crew scheduling, revenue management, irregular operations, air traffic control and ground delay programs, gate assignment, fuel management, and short-term fleet assignment swapping. These problems were mostly solved by operations research techniques, and the majority of applications utilized network-based models.

The airline scheduling process is carried out sequentially, so that flight, aircraft, and crew schedules are created one after another over the several months prior to the day of operations. A detailed flight schedule might be based on marketing decisions. The first step in operational scheduling is the assignment of an aircraft fleet type to each flight, based on demand forecasts and on the capacity and availability of the aircraft. After fleet assignment, an individual aircraft is assigned to each flight while respecting maintenance constraints; this step is known as aircraft routing. Crew scheduling can be broken down into two steps: the first, crew pairing, builds anonymous crew itineraries subject to constraints such as the maximum allowed working or flying time per duty; the second, crew rostering, assigns individual crew members to those itineraries. The goal of this scheduling process is to reduce costs.

Fleet routing and fleet scheduling also affect costs, but they additionally determine the airline's level of service and its competitive capability in the market. Network flow techniques are adopted for modeling and solving such complex mathematical problems. The full optimization problem can be hard, so it is solved in parts sequentially, with the output of one stage serving as the input to the next.

The limitations of the sequential approach were subsequently addressed with an integrated approach that reduces costs even further.

The fleet assignment problem (FAP) deals with assigning aircraft types, each having a different capacity, to the scheduled flights, based on equipment capabilities and availability, operational costs, and potential revenues. When there are many flights each day, this problem becomes difficult. Some remediations include: 1) integrating the FAP with other decision processes such as schedule design, aircraft maintenance routing, and crew scheduling; 2) proposing solution techniques that introduce additional parameters and constraints into the traditional fleeting models, such as itinerary-based demand forecasts and the recapture effect; and 3) studying dynamic fleeting mechanisms that update the initial fleeting solution as departures approach and more information is gathered on demand patterns. In a few models, a non-linear integer multicommodity network flow is formulated, and new branch-and-bound strategies are developed.
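
A stripped-down fleet assignment sketch, using the open-source PuLP modeler in Python, is given below. The flights, fleet types, costs, and the crude availability constraint are invented, and the time-space network balance constraints of full FAP formulations are omitted; the sketch shows only the cover-and-availability core of the integer program.

    from pulp import LpBinary, LpMinimize, LpProblem, LpVariable, lpSum

    flights = ['F100', 'F200', 'F300', 'F400']
    fleets = {'A320': 2, 'B737': 2}          # aircraft available per type
    # cost[f][t]: net cost of flying flight f with fleet type t (invented)
    cost = {'F100': {'A320': 10, 'B737': 12}, 'F200': {'A320': 14, 'B737': 9},
            'F300': {'A320': 11, 'B737': 11}, 'F400': {'A320': 13, 'B737': 8}}

    x = {(f, t): LpVariable(f'x_{f}_{t}', cat=LpBinary)
         for f in flights for t in fleets}

    prob = LpProblem('fleet_assignment', LpMinimize)
    prob += lpSum(cost[f][t] * x[f, t] for f in flights for t in fleets)
    for f in flights:                        # cover: one fleet type per flight
        prob += lpSum(x[f, t] for t in fleets) == 1
    for t, avail in fleets.items():          # crude proxy for availability
        prob += lpSum(x[f, t] for f in flights) <= 2 * avail

    prob.solve()
    for (f, t), var in x.items():
        if var.value() == 1:
            print(f, '->', t)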

Traffic disruptions are one characteristic of this problem space. They might make the aircraft and crew schedules infeasible on the day of operations, and recovery to a reasonable schedule must then be attempted. Short-term recovery actions might increase operational costs, sometimes even above the planned costs. Recovery options can instead be factored into the schedule at design time, an approach generally called robust scheduling. Sometimes robustness is articulated as a measure; for example, a non-robustness measure penalizes restricted aircraft changes according to the slack time available during an aircraft change.
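
As an illustration, a non-robustness measure of this kind can be computed as below, where an aircraft change is penalized more heavily the less slack it leaves beyond the minimum turnaround time. The penalty shape, the 45-minute turnaround, and the 60-minute threshold are invented for the sketch.

    MIN_TURN = 45  # minimum turnaround time in minutes (assumed)

    def change_penalty(arrival_min, departure_min, threshold=60):
        # Penalty for one restricted aircraft change; zero once the slack
        # beyond the minimum turnaround reaches the threshold.
        slack = (departure_min - arrival_min) - MIN_TURN
        if slack < 0:
            return float('inf')      # the connection is not even feasible
        return max(0.0, (threshold - slack) / threshold)

    def non_robustness(changes):
        # Total measure over all aircraft changes in a schedule.
        return sum(change_penalty(a, d) for a, d in changes)

    # Two tight changes and one comfortable one (minutes since midnight).
    print(non_robustness([(600, 650), (700, 790), (900, 1020)]))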

Global stochastic models have been attacked with an iterative approach. The iterative approach yields a set of different solutions spanning the trade-off between cost and robustness, whereas an integrated approach returns mostly one near-optimal solution for a given robustness penalty; the iterative approach is therefore more favorable to a decision maker. When multiple airlines must coordinate, the models are formulated as multicommodity network flow problems, which can be solved with standard mathematical programming techniques.

 


Thursday, February 23, 2023

 

Continuing the survey of problem types in the fleet management area, urban public transport and dial-a-ride are more recent.

Urban public transport deserves to be called a class of problems by itself. It consists of determining how to provide a good quality of service to passengers with finite resources and operating costs. The planning process often involves 1. network route design, 2. frequency setting and timetable development, 3. vehicle scheduling, and 4. crew scheduling and rostering. Some state-of-the-art models tune the routing and scheduling by minimizing passenger cost functions; metaheuristic schemes that combine simulated annealing, tabu search, and greedy search methods serve this purpose. One of the distinguishing features of this problem space is that customers often formulate two requests per day, specifying an outbound request from pick-up to drop-off and an inbound request for the return trip. Another is that the quality of service needs to be maximized while minimizing the operating costs incurred to satisfy all the requests.
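
As a small illustration of such metaheuristics, the sketch below applies simulated annealing to a toy frequency-setting problem, trading passenger waiting cost against operating cost. The routes, demand figures, cost coefficients, and cooling schedule are all invented.

    import math, random

    routes = {'R1': 900, 'R2': 400, 'R3': 250}    # passengers/hour per route
    BUS_COST, WAIT_WEIGHT = 80.0, 0.5             # assumed cost coefficients

    def cost(freq):
        total = 0.0
        for route, demand in routes.items():
            headway = 60.0 / freq[route]                 # minutes between buses
            total += WAIT_WEIGHT * demand * headway / 2  # passenger waiting
            total += BUS_COST * freq[route]              # operating cost
        return total

    def anneal(steps=5000, t0=100.0, seed=1):
        rng = random.Random(seed)
        freq = {r: 4 for r in routes}             # start at 4 buses/hour
        best, best_cost = dict(freq), cost(freq)
        for step in range(steps):
            temp = t0 * (1 - step / steps) + 1e-9  # linear cooling
            cand = dict(freq)
            r = rng.choice(list(routes))
            cand[r] = max(1, min(20, cand[r] + rng.choice([-1, 1])))
            delta = cost(cand) - cost(freq)
            if delta < 0 or rng.random() < math.exp(-delta / temp):
                freq = cand                        # accept, possibly uphill
                if cost(freq) < best_cost:
                    best, best_cost = dict(freq), cost(freq)
        return best, best_cost

    print(anneal())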

Dial-a-ride transport is a move toward more economical and more flexible transport services. Demand-responsive transportation systems require planning routes and scheduling customer pick-ups and drop-offs on the basis of received requests, while dealing with multiple vehicles of limited capacity and with time windows. The problem of working out optimal routes and times is referred to as the dial-a-ride problem. As with many problem spaces in fleet management, it can be treated as an NP-hard combinatorial optimization problem, and attempts to develop optimal solutions have been limited to simple, small-size instances.

Such a service may operate in a static or a dynamic mode. In the static setting, all the customer requests are known beforehand, and the system computes a tour for each vehicle within the constraints of the pick-up and drop-off time windows while minimizing the solution cost. In the dynamic mode, customer requests arrive over time at a control station, and the solution may change over time; processing must also keep up with the incoming rate without interfering with the optimization cycle at the end of the service. The goal is two-fold: reduce overall costs and improve the quality of service to customers. Several algorithms have been tried for this purpose, including tabu search heuristics, dynamic programming, branch and cut, heuristic two-phase solution methods, genetic algorithms, and variable neighborhood search.
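
The flavor of a heuristic for the static case can be seen in this greedy insertion sketch for a heavily simplified dial-a-ride instance: each request is inserted at the cheapest position that keeps the pick-up before the drop-off and respects vehicle capacity. The coordinates, capacity, and single-vehicle setting are invented, and a real DARP would also check time windows and ride-time limits.

    import math

    CAPACITY = 3
    requests = [((0, 0), (5, 5)), ((1, 4), (6, 2)), ((3, 1), (2, 6))]

    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    def route_cost(stops):
        return sum(dist(stops[i], stops[i + 1]) for i in range(len(stops) - 1))

    def load_feasible(kinds):
        load = 0
        for k in kinds:            # +1 at each pick-up, -1 at each drop-off
            load += 1 if k == 'P' else -1
            if load > CAPACITY:
                return False
        return True

    route, kinds = [], []
    for pick, drop in requests:
        best = None
        for i in range(len(route) + 1):              # pick-up position
            for j in range(i + 1, len(route) + 2):   # drop-off after pick-up
                r = route[:i] + [pick] + route[i:j - 1] + [drop] + route[j - 1:]
                k = kinds[:i] + ['P'] + kinds[i:j - 1] + ['D'] + kinds[j - 1:]
                if load_feasible(k):
                    c = route_cost(r)
                    if best is None or c < best[0]:
                        best = (c, r, k)
        route, kinds = best[1], best[2]
    print(route, round(route_cost(route), 2))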

The users can be differentiated, as can the transportation modes. Measures such as vehicle waiting time with passengers on board can also be incorporated, together with branch-and-cut algorithms, for this purpose.

Studying quality of service in this context has evolved into models that use various measurement scales. The quality of service provided also depends on the type of organization and the operational rules it uses.

Wednesday, February 22, 2023

 Fleet Management continued...


Vehicle routing and scheduling is one such class of problems. A fleet of vehicles with limited capacity, based at one or several depots, must be routed to serve a certain number of customers while minimizing the number of routes, the total traveling time, and the distance traveled. Additional restrictions can specialize this class of problems, such as time windows within which each customer must be served. This class of problems is central to the fields of transportation, distribution, and logistics.
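
A bare-bones construction heuristic for the time-window variant is sketched below: each route repeatedly serves the customer with the earliest feasible arrival until capacity or time windows force a new vehicle. The coordinates, demands, and windows are invented, and production solvers rely on much stronger methods (savings heuristics, metaheuristics, branch and price).

    import math

    DEPOT, CAPACITY = (0, 0), 10
    # customer: (x, y, demand, earliest, latest); travel time equals distance
    customers = {1: (2, 3, 4, 0, 20), 2: (5, 1, 3, 5, 25),
                 3: (6, 6, 5, 10, 40), 4: (1, 7, 2, 0, 30)}

    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    unserved, routes = set(customers), []
    while unserved:
        pos, time, load, route = DEPOT, 0.0, 0, []
        while True:
            feasible = []
            for c in unserved:
                x, y, demand, early, late = customers[c]
                arrive = max(time + dist(pos, (x, y)), early)  # wait if early
                if arrive <= late and load + demand <= CAPACITY:
                    feasible.append((arrive, c))
            if not feasible:
                break
            arrive, c = min(feasible)        # earliest feasible arrival next
            x, y, demand, _, _ = customers[c]
            pos, time, load = (x, y), arrive, load + demand
            route.append(c)
            unserved.discard(c)
        if not route:
            break    # no remaining customer individually reachable
        routes.append(route)
    print(routes)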

Dynamic fleet management is another class of problems. While classical fleet management problems address routing and scheduling plans in advance, unforeseen events impose additional requirements. When communication is leveraged to obtain this additional information, real-time usage of fleet resources can be improved: changes in vehicle location, travel time, and customer orders can feed an efficient re-optimization procedure that updates the route plan as dynamic information arrives. When there is no time to react to real-time events, the alternative is to anticipate future events in an effective way. Data processing and forecasting methods, optimization-simulation models, and decision heuristics can all be included to build comprehensive decision-support systems.
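
Structurally, a dynamic fleet manager is an event loop around a plan-repair step, as in the toy sketch below; the event types and the naive repair rule (append a new order to the shortest route) are invented placeholders for a real re-optimization procedure.

    import heapq

    # Event stream: (minute, kind, payload); invented for the sketch.
    events = [(5, 'order', 'C7'), (9, 'travel_time', ('A', 'B', 14)),
              (12, 'order', 'C8')]
    heapq.heapify(events)

    plan = {'vehicle1': ['C1', 'C2'], 'vehicle2': ['C5']}

    def reoptimize(plan, kind, payload):
        if kind == 'order':
            # Naive repair: append the new customer to the shortest route; a
            # real system would run an insertion or full re-optimization step.
            shortest = min(plan, key=lambda v: len(plan[v]))
            plan[shortest].append(payload)
        elif kind == 'travel_time':
            pass  # new estimates would trigger re-sequencing of routes
        return plan

    while events:
        minute, kind, payload = heapq.heappop(events)
        plan = reoptimize(plan, kind, payload)
        print(f't={minute}: {plan}')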

Another field of increasing interest is urban freight transportation and the development of new organizational models for the management of freight. As with any complex system, city logistics transportation systems require planning at the strategic, tactical, and operational levels. While wide-area road networks can be routed based on distances, routing within a city logistics network demands time-dependent travel time estimates for every route section. Static approaches are well studied, but time-dependent vehicle routing still appears largely unexplored. One way to bridge this gap has been an integration framework that brings dedicated systems together into a holistic simulation that acts as a dynamic router and scheduler.


Tuesday, February 21, 2023

Fleet Management

 

The need for fleet management arose from the requirements of passengers and freight transportation services. Usually, their fleet is considered heterogeneous because it includes a variety of vehicles. Some of the fleets must perform tasks that may be known beforehand or are done repetitively. Most of them respond to demand. The scale and size of the fleet can be massive.

The complexity is clearest in the case of public transport, which usually operates a scheduled transportation network. Solutions use techniques and ideas from mathematics as well as computer science; the tools and concepts include graph and network algorithms, combinatorial optimization, approximation and online algorithms, and stochastic and robust optimization. Newer models and algorithms can improve the productivity of resources, efficiency, and network capacity. One way to do that has been to leverage a database with parameterized queries: when the data is organized appropriately, the query methods can return accurate and complete result sets, though the results might differ in consistency, responsiveness, and coverage depending on whether a relational, batch, or streaming mode is used.

When transportation problems were modeled, they were often treated as combinatorial optimization problems, covering vehicle routing, scheduling, and network design. These are notoriously difficult to solve, even in a static context, which led to the need for a human dispatcher in many fleet management scenarios. The emergence of powerful computing, including metaheuristics and distributed and parallel computing, has made this somewhat easier. One of the main remaining challenges is the need to handle dynamic data.


Because these routing and scheduling problems are inherently NP-hard, mathematical formulations have bounded certain parameters and changed the criteria so as to obtain approximate solutions instead of optimal ones. In the last fifteen years, a growing number of metaheuristic algorithms have been designed, including simulated annealing, genetic algorithms, artificial neural networks, tabu search, ant colony optimization, greedy randomized adaptive search procedures (GRASP), guided local search, and variable neighborhood search, along with several hybrid techniques. Local search is the most frequently used heuristic technique for solving combinatorial optimization problems, and sequential search is a general technique for the efficient exploration of local search neighborhoods. One of its key concepts is the systematic decomposition of moves, which allows pruning within the local search based on associated partial gains.
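
The sketch below shows the idea on the simplest case, 2-opt local search on a tour: each candidate move is decomposed into two partial gains, and candidates whose first partial gain is not positive are pruned early, in the spirit of sequential search. The random instance is invented; the move decomposition follows the Lin-Kernighan tradition.

    import math, random

    rng = random.Random(3)
    pts = [(rng.random(), rng.random()) for _ in range(12)]
    tour = list(range(len(pts)))

    def d(i, j):
        return math.dist(pts[i], pts[j])

    improved = True
    while improved:
        improved = False
        n = len(tour)
        for i in range(n - 1):
            a, b = tour[i], tour[i + 1]
            for j in range(i + 2, n - (i == 0)):
                c, e = tour[j], tour[(j + 1) % n]
                g1 = d(a, b) - d(a, c)        # first partial gain
                if g1 <= 0:
                    continue                  # prune the whole move early
                if g1 + d(c, e) - d(b, e) > 1e-12:   # full 2-opt gain
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                    improved = True
                    a, b = tour[i], tour[i + 1]  # refresh after the reversal
    print('tour:', tour)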

Monday, February 20, 2023

 

One of the benefits of migrating workloads to the public cloud is the cost savings. There are many cost management capabilities available from the AWS Management Console, but this article focuses on a pattern that works well across many migration projects.

This pattern requires us to configure user-defined cost allocation tags. For example, consider creating detailed cost and usage reports for AWS Glue jobs by using AWS Cost Explorer. These tags can be created for jobs across multiple dimensions, so we can track usage costs at the team, project, or cost center level. An AWS account is a prerequisite. AWS Glue uses other AWS services to orchestrate ETL (extract, transform, and load) jobs that build data warehouses and data lakes. Since it takes care of provisioning and managing the resources required to run our workload, the costs can vary. The target technology stack comprises just the AWS Glue jobs and AWS Cost Explorer.

The workflow includes the following:

1. A data engineer or AWS administrator creates user-defined cost allocation tags for the AWS Glue jobs.

2. An AWS administrator activates the tags.

3. The tags report metadata to AWS Cost Explorer.

The steps on the path to realizing these savings include the following (a scripted boto3 alternative is sketched after the list):

1. Tags must be added to an existing AWS Glue job.

a. This can be done with the help of the AWS Glue console after signing in.

b. In the "Jobs" section, select the name of the job being tagged.

c. After expanding the advanced properties, add a new tag.

d. The key for the tag can be a custom name; a value is optional but can be associated with the key.

2. Tags can be added to a new AWS Glue job once it has been created.

3. The administrator activates the user-defined cost allocation tags.

4. Cost and usage reports can then be created for the AWS Glue jobs. This involves:

a. Selecting a cost-and-usage report from the left navigation pane and creating a report.

b. Choosing "Service" as the filter and applying it; the tags can be associated with the filters.

c. Similarly, selecting the team and specifying the duration for which the report must be generated.
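
For teams that prefer automation over the console, the same flow can be sketched with boto3. The job ARN, tag keys, account ID, and dates below are placeholders, and the tags must already be activated as cost allocation tags before Cost Explorer can group by them.

    import boto3

    glue = boto3.client('glue')
    glue.tag_resource(
        ResourceArn='arn:aws:glue:us-east-1:111122223333:job/my-etl-job',
        TagsToAdd={'team': 'data-eng', 'cost-center': 'cc-42'})

    ce = boto3.client('ce')
    report = ce.get_cost_and_usage(
        TimePeriod={'Start': '2023-01-01', 'End': '2023-02-01'},
        Granularity='MONTHLY',
        Metrics=['UnblendedCost'],
        Filter={'Dimensions': {'Key': 'SERVICE', 'Values': ['AWS Glue']}},
        GroupBy=[{'Type': 'TAG', 'Key': 'team'}])
    for group in report['ResultsByTime'][0]['Groups']:
        print(group['Keys'], group['Metrics']['UnblendedCost']['Amount'])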

This pattern is repeatable for cost management routines associated with various workloads and resources.

Sunday, February 19, 2023

 Migrating remote desktops 

Most migrations discuss workloads and software applications. When it comes to users, identity federation is taken as the panacea for bringing all users to the cloud, but migrating remote desktops is just as important for the users who need them. Fortunately, this comes with a well-known migration pattern.

Autoscaling of virtual desktop infrastructure (VDI) is done by using NICE EnginFrame and NICE DCV Session Manager. NICE DCV is a high-performance remote display protocol that helps us stream remote desktops and applications from any cloud or data center to any device, over varying network conditions. When used with EC2 instances, NICE DCV enables us to run graphics-intensive applications remotely on EC2 instances and stream their user interfaces to commodity remote client machines. This eliminates the need for expensive dedicated workstations and the need to transfer large amounts of data between the cloud and client machines.

The desktop is accessible through a web-based user interface. The VDI solution provides research and development users with an accessible and performant user interface for submitting graphics-intensive analysis requests and reviewing results remotely.

The components of this VDI solution include a VPC, a public subnet, a private subnet, an EnginFrame portal, a Session Manager broker, and a VDI cluster that can be either Linux or Windows. Both types of VDI clusters can also be attached side by side via an Application Load Balancer. The user connects to the AWS Cloud via another Application Load Balancer that is hosted in a public subnet, while all the other components are hosted in a private subnet; both subnets are part of a VPC. The user's request flows through the Application Load Balancer to NICE EnginFrame and then to the DCV Session Manager.

There is an automation available that creates a custom VPC, public and private subnets, an internet gateway, NAT Gateway, Application Load Balancer, security groups, and IAM policies. CloudFormation is used to create the fleet of Linux and Windows NICE DCV servers. This automation is available from the elastic-vdi-infrastructure GitHub repository. 

The steps to take to realize this pattern are listed below: 

  1. The mentioned code repository is cloned.

  2. The AWS CDK libraries are installed.

  3. The parameters to the automation script are updated. These include the region, account, key pair, and optionally the ec2_type_enginframe and ec2_type_broker instance types and sizes.

  4. The solution is then deployed using the CDK commands.

  5. When the deployment is complete, there are two outputs: Elastic-vdi-infrastructure and Elastic-Vdi-InfrastruSecretEFadminPassword.

  6. The fleet of servers is deployed with this information.

  7. The EnginFrame administrator password is retrieved, and the portal is accessed.

  8. This is then used to start a session.

This completes the pattern for migrating the remote desktops for users. 

Saturday, February 18, 2023

Extending datacenters to the public cloud:

 


A specific pattern used toward hybrid computing involves extending datacenters to the public cloud. Many companies have significant investments in their immovable datacenters, and while they can create a private cloud, such as a VMWare cloud, within the public cloud, they might find it costly to maintain both an on-premises cloud and one in the public cloud. A reasonable approach between these choices is to extend the existing datacenters to the public cloud. This article explores that pattern.

 

Although technology products are usually not referred to by their brands or product names in a technical discussion of an architectural pattern, doing so simplifies this narrative by providing a specific example of the technology discussed, and since many technological innovations are patented, it is hard to refer to them without using product names. In this case, we use the example of a private cloud built with VMWare cloud and refer to its products for manageability. VMWare vCenter is a centralized management utility that can manage virtual machines, hosts, and dependent components. VMWare vSphere is VMWare's virtualization platform, which transforms datacenters into aggregated computing infrastructures that include CPU, storage, and networking resources.

The pattern to extend the datacenter to VMWare Cloud on AWS uses Hybrid Linked Mode, so that inventories in both places can be managed through a single VMWare vSphere Client interface. This ensures consistent operations and simplified administration, and it uses a VMWare Cloud Gateway Appliance, which can manage both applications and virtual machines that remain on-premises.

There are two mutually exclusive options for configuration. The first option installs the Cloud Gateway Appliance and uses it to link from the on-premises vCenter Server to the cloud SDDC. The second option configures Hybrid Linked Mode from the cloud SDDC. Hybrid Linked Mode can connect only one on-premises vCenter Server Enhanced Linked Mode domain and supports on-premises vCenter Servers running recent versions. When a cloud gateway appliance provides the Hybrid Linked Mode connection, multiple vCenter Servers can be connected to the appliance, but when the cloud SDDC is connected directly, there can be only one vCenter Server.

Different workloads can be migrated using either a cold migration or a live migration with VMWare vSphere vMotion. Factors that must be considered when choosing the migration method include virtual switch type and version, the connection type to the cloud SDDC, and the virtual hardware version.

A cold migration is appropriate for virtual machines that can tolerate downtime: they are shut down, migrated, and then powered back on. The migration time is shorter because there is no need to copy active memory, and this holds true for applications as well. A live migration, on the other hand, uses vMotion to perform a rolling migration without downtime and is advisable for mission-critical applications. The idea behind vMotion is that a destination instance is prepared and made ready, and the switch from source to destination happens near-instantaneously.

This pattern promotes the visibility of existing infrastructure in the cloud.

IT organizations building a presence in the cloud have a lot in common with the datacenter operations of a private cloud. The focus used to be primarily on agile and flexible infrastructure, which became challenging with the distributed nature of the applications deployed by the various teams within the company. The operation of these application stacks evolved with the tools that transformed how IT operates, but these organizations continue to be measured by the speed, simplicity, and security with which they support their business objectives.

 

Speed is a key competitive differentiator for the customers of any infrastructure, whether on-premises or in the cloud. Leveraging datacenter locations, along with a service-centric cloud operations model, has become mission critical. Fueled by a workforce that now works from anywhere at any time, business resiliency and agility depend on a connective-fabric network.

 

The network connects the on-premises, cloud, and edge applications to the workforce, and building it is a multi-disciplinary effort among NetOps, SecOps, CloudOps, and DevOps teams. Each has a perspective on building the infrastructure and the tools that manage where the workloads run, on the service level objectives defining the user experience, and on the implementation of zero trust security to protect vital business assets.

 

Enabling these teams requires real-time insights, usually delivered with an automation platform. Both cloud and datacenter operations can then adapt to the new normal of shifting workloads and distributed workforces. Delivering a consistent, simplified experience to the teams with such a platform empowers them to align and collaborate more efficiently than before. Architectural patterns and manageability interfaces that unify and simplify these administrative routines are more than welcome, given the scale of the inventory.

 

Some datacenter automations can be fabric agnostic, but they all must share some common characteristics. These include a unified view into proactive operations with continuous assurance and actionable insights, an orchestrator to coordinate activities, and seamless access to network controllers and third-party tools or services. The orchestrator can also enforce policies across multiple network sites and enable end-to-end automation across datacenters and networks. A dashboard offers the ability to view all aspects of management through a single pane of glass; it must also define multiple personas to provide role-based access to specific teams.

 

Some gaps do exist, say between NetOps and DevOps, which can be bridged with a collaborative focal point that delves into integration: ticketing frameworks for incident management; mapping compute, storage, and network contexts for monitoring; identifying bottlenecks affecting workloads; and the consequent fine-tuning.

 

Automation also has the potential to describe infrastructure as code, infrastructure as a resource, or infrastructure as a policy. Flexible deployment operations are required throughout. Complexity is the enemy of efficiency, so tools and processes must be friendly to the operators. Automation, together with analytics, can enable them to respond quickly and make incremental progress toward their goals.


Friday, February 17, 2023

 

As enterprises and organizations survey their applications and assets to be moved to the cloud, one of the often-overlooked processes involves the entrenched, almost boutique build systems they have invested in over the years. The public clouds advocate cloud-native DevOps pipelines and automations that work well for new repositories and small projects, but when it comes to billion-dollar-plus revenue-generating source code assets, the transition of build and deployment to the cloud becomes surprisingly challenging.

New code projects and businesses can start out with a code repository in GitHub or GitHub Enterprise, with files conforming to the 100MB limit and repository sizes conforming to the 5GB limit. When we start clean on cloud-based DevOps, managing the inventory so that only text remains in the source repository while binaries move to object storage is easy. When enterprises have accrued massive repositories over time, even a copy operation becomes difficult to automate: what used to be robocopy on Windows with large payloads must now become a transfer over S3.
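
The S3 transfer itself is straightforward with boto3's managed uploader, which handles multipart uploads and parallelism for large artifacts; the bucket, paths, and tuning values below are placeholders.

    import boto3
    from boto3.s3.transfer import TransferConfig

    s3 = boto3.client('s3')
    config = TransferConfig(
        multipart_threshold=64 * 1024 * 1024,  # multipart above 64 MB
        multipart_chunksize=64 * 1024 * 1024,
        max_concurrency=8,                     # parallel part uploads
        use_threads=True)

    s3.upload_file(
        Filename='build/artifacts/toolchain.tar.gz',  # placeholder local path
        Bucket='example-migration-artifacts',         # placeholder bucket
        Key='repos/bigrepo/binaries/toolchain.tar.gz',
        Config=config)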

One of the first challenges in moving build and capacity-planning infrastructure to the cloud is preparing the migration. External dependencies and redundancies can cause these repositories to become very large, not to mention branches and versions. Using a package manager, or its equivalent, to separate the dependencies into their own packages helps their reusability; Bundler, Node's package manager, and Maven are testament to this effect. Object storage, or Artifactory and its equivalents, can store binary data and executables. Backup and restore can easily be added from cloud services when they are not already configurable via the respective cloud services.

Another challenge is the proper mapping of infrastructure to handle the large processes involved in continuous integration and continuous deployment. GitHub Enterprise can provide up to 32 cores and 50,000 minutes per month for public repositories of sizes up to 50GB. The cloud, on the other hand, offers virtually limitless compute, storage, and networking, all with the convenience of pay-as-you-go billing. With an effective transformation of the DevOps automations, both the infrastructure required and the automations it supports become easier to host in the cloud. As with the first challenge, taking stock of the inventory of infrastructure resources and automation logic can be daunting; consequently, some form of organization and nomenclature that divides the inventory into sizeable chunks can help with the transformation and even with parallelization.

A third challenge involves environment provisioning and manual testing. Subscriptions, resource groups, regions, and configurations proliferate in the cloud when such DevOps are transformed and migrated. This infrastructure and its state become veritable assets to guard, just like the source delivered by the DevOps. Importing and exporting these infrastructure-as-code templates, along with their states, and forming blueprints that can include policies and reconcile state become a necessity. A proper organization and naming convention are needed for these as well.

Other miscellaneous challenges include, but are not limited to, forming best practices and centers of excellence, creating test data, providing manual deployments and overrides, ensuring suppliers, determining governance, integrating an architecture for tools (say, in the form of runbooks), handling manual releases, determining telemetry, defining teams and managing accesses, supporting regulatory compliance, providing service virtualization, and providing education for special skill sets. In addition, managing size and inconsistencies, maintaining sanctity as a production-grade system, providing an escalation path for feedback, and garnering collaboration across a landscape of organizations and teams must all be dealt with.

Finally, people, process, and technology must come together in a planned and streamlined manner to make this happen. These challenges provide a glimpse of the roadmap for migrating builds and deployments to the cloud.

Thursday, February 16, 2023

 

One of the architectural patterns for application migration concerns managing AWS Service Catalog products in multiple AWS accounts and AWS Regions. AWS Service Catalog is used to create, share, organize, and govern curated IaC templates; governance and distribution of infrastructure are thereby simplified and accelerated. AWS uses CloudFormation templates to define a collection of AWS resources, aka a stack, required for a solution or a product. StackSets extend this functionality by enabling us to create, update, or delete stacks across multiple accounts and AWS Regions with a single operation.

If a CloudFormation template must be made available to other AWS accounts or organizational units, then the portfolio is typically shared. A portfolio is a container that includes one or more products. 

This architectural pattern, on the other hand, is an alternative approach based on AWS CloudFormation StackSets. Instead of sharing the portfolio, we use StackSet constraints to set the AWS Regions and accounts where the resources can be deployed and used. This approach helps provision the Service Catalog products in multiple accounts, OUs, and AWS Regions, managed from a central location, which meets governance requirements.

The benefits of this approach are the following:

1. The product is provisioned and managed from a primary account and is not shared with other accounts.

2. This approach provides a consolidated view of all provisioned products (stacks) that are based on a specific set of templates.

3. The use of a primary account makes configuration with the AWS Service Management Connector easier.

4. It is easier to query and use products from the AWS Service Catalog.

The architecture involves an AWS management account and a target organizational unit (OU) account. The CloudFormation template and the Service Catalog product live in the management account, while the CloudFormation stack and its resources live in the target OU account. The user creates an AWS CloudFormation template, in JSON or YAML format, to provision AWS resources. The template is used to create a product in AWS Service Catalog, which is added to a portfolio. The user then creates a provisioned product, which creates CloudFormation stacks in the target accounts, and each stack provisions the resources specified in the CloudFormation template.

The steps to provision products across accounts include the following; a boto3 sketch follows the list:

1. Create a portfolio, say with the AWS Command Line Interface.

2. Create the template that describes the resources.

3. Create a product with a version title and description.

4. Apply constraints to the portfolio to configure product deployment options, such as multiple AWS accounts, Regions, and permissions.

5. Provide permissions to users so that they can launch the products in the portfolio.
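
A hedged boto3 sketch of these steps follows. The names, template URL, account IDs, and Regions are placeholders, and the exact JSON shape of the STACKSET constraint parameters should be verified against current AWS documentation.

    import json
    import boto3

    sc = boto3.client('servicecatalog')

    # 1. Create the portfolio in the primary (management) account.
    portfolio_id = sc.create_portfolio(
        DisplayName='central-it-portfolio',
        ProviderName='central-it')['PortfolioDetail']['Id']

    # 2-3. Create the product from a CloudFormation template (version v1).
    product = sc.create_product(
        Name='vpc-baseline', Owner='central-it',
        ProductType='CLOUD_FORMATION_TEMPLATE',
        ProvisioningArtifactParameters={
            'Name': 'v1', 'Description': 'baseline VPC stack',
            'Info': {'LoadTemplateFromURL':
                     'https://example-bucket.s3.amazonaws.com/vpc.yaml'},
            'Type': 'CLOUD_FORMATION_TEMPLATE'})
    product_id = product['ProductViewDetail']['ProductViewSummary']['ProductId']

    sc.associate_product_with_portfolio(
        ProductId=product_id, PortfolioId=portfolio_id)

    # 4. STACKSET constraint: pin the accounts and Regions where stacks may
    # be deployed (assumed parameter shape; verify against the docs).
    sc.create_constraint(
        PortfolioId=portfolio_id, ProductId=product_id, Type='STACKSET',
        Parameters=json.dumps({
            'Version': '2.0',
            'Properties': {'AccountList': ['111122223333', '444455556666'],
                           'RegionList': ['us-east-1', 'eu-west-1']}}))

    # 5. Let a role launch products from the portfolio.
    sc.associate_principal_with_portfolio(
        PortfolioId=portfolio_id,
        PrincipalARN='arn:aws:iam::111122223333:role/Developers',
        PrincipalType='IAM')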