Wednesday, February 22, 2023

 Fleet Management continued...

The need for fleet management arose from the requirements of passenger and freight transportation services. Such fleets are usually heterogeneous because they include a variety of vehicles. Some fleets perform tasks that are known beforehand or repeated on a schedule; most respond to demand as it arrives. The scale of a fleet can be massive.

Vehicle routing and scheduling is one such class of problems. A fleet of vehicles with limited capacity, based at one or several depots, must be routed to serve a set of customers while minimizing the number of routes, the total travel time, and the distance traveled. Additional restrictions, such as time windows in which each customer must be served, specialize this class of problems. This class of problems is central to the fields of transportation, distribution, and logistics.

Dynamic fleet management is another class of problems. While classical fleet management addresses routing and scheduling plans, unforeseen events can impose additional requirements. When communication is leveraged to obtain this additional information, real-time usage of fleet resources can be improved. Changes in vehicle location, travel time, and customer orders can feed an efficient re-optimization procedure that updates the route plan as dynamic information arrives. When there is no time to react to real-time events, the alternative is to anticipate future events effectively. Data processing and forecasting methods, optimization-simulation models, and decision heuristics can be combined into comprehensive decision-support systems.
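To make the re-optimization idea concrete, here is a minimal, illustrative Python sketch of how a newly arriving customer order might be folded into an existing route plan with a cheapest-insertion step. The distance matrix `dist`, the route encoding, and the function names are assumptions for this example, not part of any particular system.

```python
def insertion_cost(route, order, dist):
    """Cheapest extra distance incurred by inserting `order` between two consecutive stops."""
    return min(dist[route[i]][order] + dist[order][route[i + 1]] - dist[route[i]][route[i + 1]]
               for i in range(len(route) - 1))

def handle_new_order(routes, order, dist):
    """React to a dynamic event: place the new order on the route where it fits most cheaply."""
    best_route = min(routes, key=lambda r: insertion_cost(r, order, dist))
    pos = min(range(len(best_route) - 1),
              key=lambda i: dist[best_route[i]][order] + dist[order][best_route[i + 1]]
                            - dist[best_route[i]][best_route[i + 1]])
    best_route.insert(pos + 1, order)
    return routes

# Example: two depot-to-depot routes (node 0 is the depot) and a new order for customer 4.
dist = [[0, 2, 9, 10, 7],
        [2, 0, 6, 4, 3],
        [9, 6, 0, 8, 5],
        [10, 4, 8, 0, 6],
        [7, 3, 5, 6, 0]]
routes = [[0, 1, 3, 0], [0, 2, 0]]
print(handle_new_order(routes, 4, dist))  # -> [[0, 1, 3, 4, 0], [0, 2, 0]]
```

A full system would of course re-run a heavier optimization in the background; the point here is only that an incremental update can keep the plan usable between those runs.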

Another field of increasing interest is urban freight transportation and the development of new organizational models for the management of freight. As with any complex system, city logistics transportation systems require planning at the strategic, tactical, and operational levels. While wide-area road networks can be routed based on distances, routing within a city logistics network demands time-dependent travel-time estimates for every route section. While static approaches are well studied, time-dependent vehicle routing remains relatively unexplored. One way to bridge this gap has been to use an integration framework that brings dedicated systems together into a holistic simulation that acts as a dynamic router and scheduler.

Urban public transport deserves to be called a class of problems by itself. It consists of determining how to provide good quality of service to passengers with finite resources and operating costs. The planning process often involves (1) network route design, (2) frequency setting and timetable development, (3) vehicle scheduling, and (4) crew scheduling and rostering. Some state-of-the-art models tune the routing and scheduling by minimizing passenger cost functions. Metaheuristic schemes that combine simulated annealing, tabu search, and greedy search serve this purpose. One of the distinguishing features of this problem space is that customers often formulate two requests per day: an outbound request from pick-up to drop-off and an inbound request for the return trip. Another feature is that the quality of service needs to be maximized while minimizing the operating costs incurred to satisfy all the requests.
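As a small illustration of that paired-request structure, a request type might look like the following Python sketch; the field names and example values are invented for this post and not taken from any specific system.

```python
from dataclasses import dataclass
from datetime import time

@dataclass
class RideRequest:
    """One leg of a passenger's day; customers typically submit an outbound and an inbound leg."""
    customer_id: int
    pickup: str          # pickup location id
    dropoff: str         # drop-off location id
    earliest: time       # start of the desired service time window
    latest: time         # end of the desired service time window
    outbound: bool       # True for the outbound leg, False for the return leg

# A customer formulating two requests for the same day:
requests = [
    RideRequest(42, "home_42", "clinic_7", time(8, 30), time(9, 0), outbound=True),
    RideRequest(42, "clinic_7", "home_42", time(16, 0), time(16, 45), outbound=False),
]
```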

Tuesday, February 21, 2023

Fleet Management

 

The need for fleet management arose from the requirements of passenger and freight transportation services. Such fleets are usually heterogeneous because they include a variety of vehicles. Some fleets perform tasks that are known beforehand or repeated on a schedule; most respond to demand as it arrives. The scale of a fleet can be massive.

The complexity is clearer in the case of public transport, which usually operates a scheduled transportation network. These systems use techniques and ideas from mathematics as well as computer science. Tools and concepts include graph and network algorithms, combinatorial optimization, approximation and online algorithms, and stochastic and robust optimization. Newer models and algorithms can improve the productivity of resources, efficiency, and network capacity. One way to do that has been to leverage a database and use parameterized queries; when the data in the database is well organized, the query methods return an accurate and complete set of results. The results might differ in consistency, responsiveness, and coverage depending on whether a relational, batch, or streaming mode is used.
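For the database-with-parameterized-queries point, here is a small, self-contained sketch using Python's built-in sqlite3 module; the table layout and values are made up purely to illustrate the idea.

```python
import sqlite3

# Illustrative schema: scheduled trips for a transport network (names are made up).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE trips (
    route_id TEXT, vehicle_id TEXT, departs TEXT, arrives TEXT)""")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?, ?)",
    [("R1", "bus-01", "08:00", "08:40"),
     ("R1", "bus-02", "08:15", "08:55"),
     ("R2", "bus-03", "08:05", "09:00")])

# Parameterized query: the placeholder keeps the statement reusable and the input safe.
route = "R1"
rows = conn.execute(
    "SELECT vehicle_id, departs FROM trips WHERE route_id = ? ORDER BY departs",
    (route,)).fetchall()
print(rows)  # [('bus-01', '08:00'), ('bus-02', '08:15')]
```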

Transportation problems have often been modeled as combinatorial optimization problems, which include vehicle routing, scheduling, and network design. These are notoriously difficult to solve, even in a static context, which led to the need for a human dispatcher in many fleet management scenarios. The emergence of powerful computing, including metaheuristics and distributed and parallel computing, has made this somewhat easier. One of the main remaining challenges is the need to handle dynamic data.

Vehicle routing and scheduling is one such class of problems. A fleet of vehicles with limited capacity, based at one or several depots, must be routed to serve a set of customers while minimizing the number of routes, the total travel time, and the distance traveled. Additional restrictions, such as time windows in which each customer must be served, specialize this class of problems. This class of problems is central to the fields of transportation, distribution, and logistics.

Mathematical formulations of this class of problems have bounded certain parameters and changed the criteria to obtain approximate solutions instead of optimal ones, because the class is inherently NP-hard. In the last fifteen years, a growing number of metaheuristic algorithms have been designed. These include simulated annealing, genetic algorithms, artificial neural networks, tabu search, ant colony optimization, greedy randomized adaptive search procedures, guided local search, and variable neighborhood search, along with several hybrid techniques. Local search is the most frequently used heuristic technique for solving combinatorial optimization problems. Sequential search is a general technique for the efficient exploration of local search neighborhoods. One of its key concepts is the systematic decomposition of moves, which allows pruning options within the local search based on the associated partial gains.
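As a concrete, hedged illustration of local search, the following Python sketch applies first-improvement 2-opt moves to a single route and skips candidate moves whose first partial gain is non-positive, loosely in the spirit of the sequential-search decomposition described above. The distance matrix and route encoding are assumptions for this example.

```python
def two_opt(route, dist):
    """First-improvement 2-opt on one route (a list of node ids, depot at both ends)."""
    improved = True
    while improved:
        improved = False
        for i in range(1, len(route) - 2):
            a, b = route[i - 1], route[i]
            for j in range(i + 1, len(route) - 1):
                c, d = route[j], route[j + 1]
                g1 = dist[a][b] - dist[a][c]          # partial gain from the first edge swap
                if g1 <= 0:
                    continue                           # prune: this move cannot open with a gain
                if g1 + dist[c][d] - dist[b][d] > 0:   # total gain of the 2-opt move
                    route[i:j + 1] = reversed(route[i:j + 1])
                    b = route[i]                       # the endpoint changed after the reversal
                    improved = True
    return route

# Tiny symmetric example: the crossed route 0-2-1-3-0 gets untangled.
dist = [[0, 1, 5, 4],
        [1, 0, 2, 6],
        [5, 2, 0, 3],
        [4, 6, 3, 0]]
print(two_opt([0, 2, 1, 3, 0], dist))  # -> [0, 1, 2, 3, 0]
```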

Monday, February 20, 2023

 

One of the benefits of migrating workloads to the public cloud is the savings in cost. There are many cost management functionalities available from the AWS Management Console, but this article focuses on a pattern that works well across many migration projects.

This pattern requires us to configure user-defined cost allocation tags. For example, let us consider the creation of detailed cost and usage reports for AWS Glue jobs by using AWS Cost Explorer. These tags can be created for jobs across multiple dimensions, and we can track usage costs at the team, project, or cost-center level. An AWS account is a prerequisite. AWS Glue uses other AWS services to orchestrate ETL (extract, transform, and load) jobs that build data warehouses and data lakes. Since it takes care of provisioning and managing the resources that are required to run our workload, the costs can vary. The target technology stack comprises just AWS Glue jobs and AWS Cost Explorer.
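As a hedged sketch of what the first step of the workflow below can look like programmatically (the console steps are described later in this post), the snippet uses boto3's Glue tag_resource call; the account ID, Region, job name, and tag keys are placeholders.

```python
import boto3

# Placeholder values; replace with your own account, Region, and job.
REGION = "us-east-1"
ACCOUNT_ID = "111122223333"
JOB_NAME = "nightly-sales-etl"

glue = boto3.client("glue", region_name=REGION)

# Glue tagging APIs take the job's ARN rather than just its name.
job_arn = f"arn:aws:glue:{REGION}:{ACCOUNT_ID}:job/{JOB_NAME}"

glue.tag_resource(
    ResourceArn=job_arn,
    TagsToAdd={
        "team": "data-platform",      # user-defined cost allocation tags
        "project": "sales-refresh",
        "cost-center": "cc-1234",
    },
)
```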

The workflow includes the following:

1.       A data engineer or AWS administrator creates user-defined cost allocation tags for the AWS Glue jobs.

2.       An AWS administrator activates the tags.

3.       The activated tags appear in AWS Cost Explorer, where they can be used to filter and group cost and usage data.

The steps to realize these savings include the following:

1.       Tags must be added to an existing AWS Glue Job

a.       This can be done from the AWS Glue console after signing in.

b.       In the “Jobs” section, select the name of the job to be tagged.

c.       After expanding the advanced properties, add a new tag.

d.       The tag key can be a custom name; the value is optional and can be associated with the key.

2.       The tags can be added to a new AWS Glue Job once it has been created.

3.       The administrator activates the user-defined cost allocation tags.

4.       Cost and usage reports can be created for the AWS Glue jobs (see the sketch after this list). These include:

a.       Selecting the cost and usage report option from the left navigation pane and then creating a report.

b.       Choosing “Service” as a filter and applying it. The activated tags can also be used as filters.

c.       Similarly, the team tag can be selected, and the duration for which the report must be generated can be specified.
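The same report can also be pulled programmatically. The following is a hedged boto3 sketch that queries the Cost Explorer API with the activated tag as a filter; the tag key, values, and dates are placeholders, and the cost allocation tag must already be activated for it to appear here.

```python
import boto3

ce = boto3.client("ce")  # AWS Cost Explorer

# Hypothetical report: daily AWS Glue costs for one team tag.
response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2023-01-01", "End": "2023-02-01"},
    Granularity="DAILY",
    Metrics=["UnblendedCost"],
    Filter={
        "And": [
            {"Dimensions": {"Key": "SERVICE", "Values": ["AWS Glue"]}},
            {"Tags": {"Key": "team", "Values": ["data-platform"]}},
        ]
    },
)

for result in response["ResultsByTime"]:
    amount = result["Total"]["UnblendedCost"]["Amount"]
    print(result["TimePeriod"]["Start"], amount)
```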

This pattern is repeatable for cost management routines associated with various workloads and resources.

Sunday, February 19, 2023

 Migrating remote desktops 

Most migrations discuss workloads and software applications. When it comes to users, identity federation is taken as the panacea for bringing all users to the cloud. But migrating remote desktops is just as important for those users when they need them. Fortunately, this comes with a well-known pattern for migration. 

Autoscaling of virtual desktop infrastructure (VDI) is done by using NICE EnginFrame and NICE DCV Session Manager. NICE DCV is a high-performance remote display protocol that helps us stream remote desktops and applications from any cloud or data center to any device, over varying network conditions. When used with EC2 instances, NICE DCV enables us to run graphics-intensive applications remotely on EC2 instances and stream their user interfaces to commodity remote client machines. This eliminates the need for expensive dedicated workstations and the need to transfer large amounts of data between the cloud and client machines. 

The desktop is accessible through a web-based user interface. The VDI solution provides research and development users with an accessible and performant interface for submitting graphics-intensive analysis requests and reviewing results remotely.

The components of this VDI solution include: a VPC, a public subnet, a private subnet, an EnginFrame portal, a Session Manager broker, and a VDI cluster that can be either Linux or Windows. Both types of VDI clusters can also be attached side by side via an Application Load Balancer. The user connects to the AWS Cloud via another Application Load Balancer that is hosted in a public subnet, while all the other mentioned components are hosted in a private subnet. Both the public and the private subnets are part of a VPC. The user's request flows through the Application Load Balancer to NICE EnginFrame and then to the DCV Session Manager. 

There is an automation available that creates a custom VPC, public and private subnets, an internet gateway, NAT Gateway, Application Load Balancer, security groups, and IAM policies. CloudFormation is used to create the fleet of Linux and Windows NICE DCV servers. This automation is available from the elastic-vdi-infrastructure GitHub repository. 
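To give a feel for what such an automation provisions, here is a minimal AWS CDK (Python) sketch of the networking skeleton only: a VPC with public and private subnets, a NAT gateway, and an internet-facing Application Load Balancer. This is an illustration with placeholder names, not the contents of the elastic-vdi-infrastructure repository.

```python
from aws_cdk import App, Stack
from aws_cdk import aws_ec2 as ec2, aws_elasticloadbalancingv2 as elbv2
from constructs import Construct

class VdiNetworkStack(Stack):
    """Sketch of the VPC, subnets, and load balancer that a VDI automation might provision."""
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # VPC with one public and one private subnet per AZ; the private subnets
        # reach the internet through a NAT gateway, as in the described architecture.
        vpc = ec2.Vpc(
            self, "VdiVpc",
            max_azs=2,
            nat_gateways=1,
            subnet_configuration=[
                ec2.SubnetConfiguration(name="public", subnet_type=ec2.SubnetType.PUBLIC),
                ec2.SubnetConfiguration(name="private", subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS),
            ],
        )

        # Internet-facing ALB in the public subnets; the EnginFrame / DCV components
        # would be registered as targets in the private subnets.
        elbv2.ApplicationLoadBalancer(self, "VdiAlb", vpc=vpc, internet_facing=True)

app = App()
VdiNetworkStack(app, "ElasticVdiNetworkSketch")
app.synth()
```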

The steps to take to realize this pattern are listed below: 

  1. The mentioned code repository is cloned. 

  2. The AWS CDK libraries are installed. 

  3. The parameters to the automation script are updated. These include the region, account, key pair, and optionally the ec2_type_enginframe and ec2_type_broker instance types and their sizes. 

  4. The solution is then deployed using the CDK commands. 

  5. When the deployment is complete, there are two outputs: Elastic-vdi-infrastructure and Elastic-Vdi-InfrastruSecretEFadminPassword. 

  6. The fleet of servers is deployed with this information. 

  7. The EnginFrame administrator password is retrieved and the portal is accessed. 

  8. This is then used to start a session. 

This completes the pattern for migrating the remote desktops for users. 

Saturday, February 18, 2023

Extending datacenters to the public cloud:

 


A specific pattern used toward hybrid computing involves extending datacenters to the public cloud. Many companies have significant investments in their immovable datacenters, and while they can create a private cloud, such as a VMware cloud, within the public cloud, they might find it costly to maintain both an on-premises cloud and one in the public cloud. A reasonable approach between these choices is to extend the existing datacenters to the public cloud. This article explores this pattern.

 

Although technology products are usually not referred to by their brands or product names in a technical discussion of an architectural pattern, doing so simplifies this narrative by providing a specific example of the technology discussed. Since many technological innovations are patented, it is hard to refer to them without using product names. In this case, we use the example of a private cloud built with VMware Cloud and refer to its products for manageability. VMware vCenter is a centralized management utility that can manage virtual machines, hosts, and dependent components. VMware vSphere is VMware's virtualization platform, which transforms datacenters into aggregated computing infrastructures that include CPU, storage, and networking resources.

The pattern to extend the datacenter to VMware Cloud on AWS uses Hybrid Linked Mode. Inventories in both places can be managed through a single VMware vSphere Client interface. This ensures consistent operations and simplified administration and uses a VMware Cloud Gateway Appliance. It can be used to manage both applications and virtual machines that reside on premises.

There are two mutually exclusive options for configuration. The first option installs the Cloud Gateway Appliance and uses it to link from the on-premises vCenter Server to the cloud SDDC. The second option configures Hybrid Linked Mode from the cloud SDDC. Hybrid Linked Mode can connect only one on-premises vCenter Server Enhanced Linked Mode domain and supports on-premises vCenter Server instances running sufficiently recent versions. When a Cloud Gateway Appliance is used for Hybrid Linked Mode, multiple vCenter Server instances can be connected to the appliance, but when the cloud SDDC is linked directly, only one vCenter Server can be connected.

Different workloads can be migrated using either a cold migration or a live migration with VMware vSphere vMotion. Factors that must be considered when choosing the migration method include the virtual switch type and version, the connection type to the cloud SDDC, and the virtual hardware version.

A cold migration is appropriate for virtual machines that can tolerate downtime. These virtual machines can be shut down, migrated, and then powered back on. The migration time is faster because there is no need to copy active memory. This holds true for applications as well. A live migration, on the other hand, uses vMotion to perform a rolling migration without downtime and is advisable for mission-critical applications. The idea behind vMotion is that a destination instance is prepared and made ready, and the switch from source to destination happens near-instantaneously.

This pattern promotes the visibility of existing infrastructure in the cloud.

IT organizations building a presence in the cloud have a lot in common with the datacenter operations of a private cloud. The focus used to be primarily on agile and flexible infrastructure, which became challenging with the distributed nature of the applications deployed by various teams within the company. The operations of these application stacks evolved with the tools that transformed how IT operates, but these organizations continue to be measured by the speed, simplicity, and security with which they support their business objectives.

 

Speed is a key competitive differentiator for the customers of any infrastructure, whether on-premises or in the cloud. Leveraging datacenter locations as well as a service-centric cloud operations model has become mission critical. Fueled by a workforce that now works from anywhere at any time, business resiliency and agility depend on a connective-fabric network.

 

The network connects the on-premises, cloud, and edge applications to the workforce, and it is a multi-disciplinary effort among NetOps, SecOps, CloudOps, and DevOps teams. Each has a perspective on building the infrastructure and the tools that manage where workloads run, the service-level objectives that define the user experience, and the implementation of zero-trust security to protect vital business assets.

 

Enablement of these teams requires real-time insights, usually delivered with an automation platform. Both cloud and datacenter operations can be adapted to the new normal of shifting workloads and distributed workforces. Delivering a consistent, simplified experience to the teams with such a platform empowers them to align and collaborate more efficiently than before. Architectural patterns and manageability interfaces that unify and simplify these administrative routines are more than welcome given the scale of the inventory.

 

Some datacenter automations can be fabric-agnostic, but they all must have some common characteristics. These include a unified view into proactive operations with continuous assurance and actionable insights, an orchestrator to coordinate activities, and seamless access to network controllers and third-party tools or services. The orchestrator can also enforce policies across multiple network sites and enable end-to-end automation across datacenters and networks. A dashboard offers the ability to view all aspects of management through a single pane of glass. It must also define multiple personas to provide role-based access to specific teams.

 

Some gaps do exist, say between NetOps and DevOps, which can be bridged with a collaborative focal point that delves into integration: ticketing frameworks for incident management, mapping compute, storage, and network contexts for monitoring, identifying bottlenecks that affect workloads, and the consequent fine-tuning.

 

Automation also has the potential to describe infrastructure as code, infrastructure as a resource, or infrastructure as policy. Flexible deployment operations are required throughout. Complexity is the enemy of efficiency, and tools and processes must be friendly to the operators. Automation together with analytics can enable operators to respond quickly and make incremental progress toward their goals.


Friday, February 17, 2023

 

As enterprises and organizations survey the applications and assets to be moved to the cloud, one of the often-overlooked processes involves the entrenched and almost boutique build systems that they have invested in over the years. The public clouds advocate cloud-native DevOps pipelines and automations that work well for new repositories and small projects, but when it comes to billion-dollar-plus revenue-generating source code assets, the transition of build and deployment to the cloud becomes surprisingly challenging.

New code projects and businesses can start out with a code repository in GitHub or GitHub Enterprise, with files conforming to the 100 MB limit and repository sizes conforming to the 5 GB limit. When we start clean on cloud-based DevOps, managing the inventory to retain only text in the source and move the binaries to an object storage is easy. When enterprises have accrued massive repositories over time, even a copy operation becomes difficult to automate. What used to be a robocopy on Windows involving large payloads must now involve a transfer over S3.
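As a hedged sketch of that last point, the following Python snippet uses boto3's transfer manager, which switches to multipart uploads for large payloads automatically; the bucket name, key, and local path are placeholders.

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Placeholders: an artifact bucket and a large build output on the build machine.
BUCKET = "example-build-artifacts"
LOCAL_FILE = r"C:\builds\release\payload.zip"
KEY = "releases/2023-02-17/payload.zip"

# Multipart settings for large payloads: 100 MB parts, several threads in parallel.
config = TransferConfig(multipart_threshold=100 * 1024 * 1024,
                        multipart_chunksize=100 * 1024 * 1024,
                        max_concurrency=8)

s3.upload_file(LOCAL_FILE, BUCKET, KEY, Config=config)
print(f"uploaded s3://{BUCKET}/{KEY}")
```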

One of the first challenges in the movement of build and capacity-planning infrastructure to the cloud is preparing the migration. External dependencies and redundancies can cause these repositories to become very large, not to mention branches and versions. Using a package manager or its equivalents to separate out the dependencies into their own packages can help with reusability. Bundler, Node's package manager, and Maven are a testament to this. Object storage, Artifactory, and their equivalents can store binary data and executables. Backup and restore can easily be added from cloud services when they are not configurable via the respective storage services.

Another challenge is the proper mapping of infrastructure to handle the large processes involved in continuous integration and continuous deployment. GitHub Enterprise can provide up to 32 cores and 50,000 minutes/month for public repositories of sizes up to 50 GB. The cloud, on the other hand, offers virtually limitless compute, storage, and networking, all with the convenience of pay-as-you-go billing. If the DevOps automations are transformed effectively, both the infrastructure required and the automations they support become easier to host in the cloud. As with the first challenge, taking stock of the inventory of infrastructure resources and automation logic can be daunting. Consequently, some form of organization and nomenclature to divide the inventory into sizeable chunks can help with the transformation and even with parallelization.

A third challenge is environment provisioning and manual testing. Subscriptions, resource groups, regions, and configurations proliferate in the cloud when such DevOps pipelines are transformed and migrated. This infrastructure and its state become veritable assets to guard, just the same as the source that is delivered with the DevOps pipelines. Importing and exporting these infrastructure-as-code templates as well as their states, and forming blueprints that can include policies and reconcile the state, become a necessity. A proper organization and naming convention are needed for these as well.

Other miscellaneous challenges include, but are not limited to, forming best practices and centers of excellence, creating test data, providing manual deployments and overrides, ensuring suppliers, determining governance, integrating an architecture for tools say in the form of runbooks, manual releases, determination of telemetry, determining teams and managing accesses, supporting regulatory compliance, providing service virtualization, and providing education for special skill sets. In addition, managing size and inconsistencies, maintaining the sanctity of a production-grade system, providing an escalation path for feedback, and garnering collaboration across a landscape of organizations and teams must all be dealt with.

Finally, people, process and technology must come together in a planned and streamlined manner to make this happen. These provide a glimpse of the roadmap towards the migration of build and deployments to the cloud.

Thursday, February 16, 2023

 

One of the architectural patterns for application migration is about managing AWS Service Catalog products in multiple AWS accounts and AWS Regions. AWS Service Catalog is used to create, share, organize, and govern curated IaC templates. Governance and distribution of infrastructure are simplified and accelerated. AWS CloudFormation templates define collections of AWS resources, aka stacks, required for a solution or a product. StackSets extend this functionality by enabling us to create, update, or delete stacks across multiple accounts and AWS Regions with a single operation.

If a CloudFormation template must be made available to other AWS accounts or organizational units, then the portfolio is typically shared. A portfolio is a container that includes one or more products. 

This architectural pattern, on the other hand, is an alternative approach that is based on AWS CloudFormation StackSets. Instead of sharing the portfolio, we use StackSet constraints to set the AWS Regions and accounts where the resources can be deployed and used. This approach helps provision the Service Catalog products in multiple accounts, OUs, and AWS Regions, and manage them from a central location, which meets governance requirements.

The benefits of this approach are the following:

1.       The product is provisioned and managed from a primary account and not shared with other accounts.

2.       This approach provides a consolidated view of all provisioned products (stacks) that are based on a specific set of templates.

3.       The use of a primary account makes the configuration with AWS Service Management Connector easier.

4.       It is easier to query and use products from the AWS Service Catalog.

The architecture involves an AWS management account and a target organizational unit (OU) account. The CloudFormation template and the Service Catalog product are in the management account. The CloudFormation stack and its resources are in the target OU account. The user creates an AWS CloudFormation template, in JSON or YAML format, to provision AWS resources. The template is used to create a product in AWS Service Catalog, which is added to a portfolio. The user then creates a provisioned product, which creates CloudFormation stacks in the target accounts. Each stack provisions the resources specified in the CloudFormation template.

The steps to provision products across accounts include the following:

1.       Create a portfolio, say with the AWS Command Line Interface.

2.       Create the template that describes the resources.

3.       Create a product with a version title and description.

4.       Apply constraints to the portfolio to configure product deployment options such as the target AWS accounts, Regions, and permissions.

5.       Provide permissions to users so that they can launch the products in the portfolio.
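A hedged boto3 sketch of these five steps is shown below; the names, template URL, account IDs, and role ARNs are placeholders, and the JSON keys accepted by the STACKSET constraint should be verified against the AWS Service Catalog documentation.

```python
import json
import boto3

sc = boto3.client("servicecatalog")

# 1. Create a portfolio in the management (primary) account.
portfolio = sc.create_portfolio(
    DisplayName="shared-infrastructure",
    ProviderName="platform-team",
)["PortfolioDetail"]

# 2 and 3. Create a product from a CloudFormation template (placeholder URL) and attach it.
product = sc.create_product(
    Name="standard-vpc",
    Owner="platform-team",
    ProductType="CLOUD_FORMATION_TEMPLATE",
    ProvisioningArtifactParameters={
        "Name": "v1.0",
        "Description": "Initial version",
        "Type": "CLOUD_FORMATION_TEMPLATE",
        "Info": {"LoadTemplateFromURL": "https://example-bucket.s3.amazonaws.com/vpc.yaml"},
    },
)["ProductViewDetail"]["ProductViewSummary"]

sc.associate_product_with_portfolio(
    ProductId=product["ProductId"],
    PortfolioId=portfolio["Id"],
)

# 4. Apply a STACKSET constraint to control target accounts and Regions.
#    (Key names such as AccountList / RegionList are assumptions; check the docs.)
sc.create_constraint(
    PortfolioId=portfolio["Id"],
    ProductId=product["ProductId"],
    Type="STACKSET",
    Parameters=json.dumps({
        "AccountList": ["111122223333", "444455556666"],
        "RegionList": ["us-east-1", "eu-west-1"],
        "AdminRole": "arn:aws:iam::999999999999:role/AWSCloudFormationStackSetAdministrationRole",
        "ExecutionRole": "AWSCloudFormationStackSetExecutionRole",
    }),
)

# 5. Grant a principal (for example, an IAM role) access to launch the portfolio's products.
sc.associate_principal_with_portfolio(
    PortfolioId=portfolio["Id"],
    PrincipalARN="arn:aws:iam::999999999999:role/DeveloperRole",
    PrincipalType="IAM",
)
```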