Saturday, February 4, 2023

 

Migrating remote desktops

Most migration discussions focus on workloads and software applications. When it comes to users, identity federation is often presented as the panacea that brings everyone to the cloud. But migrating the remote desktops those users depend on is just as important. Fortunately, this comes with a well-known pattern for migration.

Autoscaling of virtual desktop infrastructure (VDI) is handled with NICE EnginFrame and NICE DCV Session Manager. NICE DCV is a high-performance remote display protocol that streams remote desktops and applications from any cloud or data center to any device, over varying network conditions. When used with EC2 instances, NICE DCV lets us run graphics-intensive applications remotely and stream their user interfaces to commodity client machines, which eliminates the need for expensive dedicated workstations and for transferring large amounts of data between the cloud and client machines.

The desktop is accessible through a web-based user interface. The VDI solution gives research and development users an accessible, performant interface for submitting graphics-intensive analysis requests and reviewing results remotely.

The components of this VDI solution include a VPC, a public subnet, a private subnet, an EnginFrame portal, a DCV Session Manager broker, and a VDI cluster that can be either Linux or Windows. Both types of VDI clusters can also be attached side by side behind an Application Load Balancer. The user connects to the AWS Cloud through another Application Load Balancer hosted in the public subnet, while all of the other components are hosted in the private subnet; both subnets are part of the VPC. The user's request flows through the Application Load Balancer to NICE EnginFrame and then to the DCV Session Manager.

There is an automation available that creates a custom VPC, public and private subnets, an internet gateway, NAT Gateway, Application Load Balancer, security groups, and IAM policies. CloudFormation is used to create the fleet of Linux and Windows NICE DCV servers. This automation is available from the elastic-vdi-infrastructure GitHub repository.

The steps to realize this pattern are listed below:

1.       The mentioned code repository is cloned.

2.       The AWS CDK libraries are installed.

3.       The parameters to the automation script are updated. These include the Region, account, and key pair, and optionally the ec2_type_enginframe and ec2_type_broker instance types and their sizes (see the sketch after this list).

4.       The solution is then deployed using the CDK commands.

5.       When the deployment is complete, there are two outputs: Elastic-vdi-infrastructure and Elastic-Vdi-InfrastruSecretEFadminPassword.

6.       The fleet of servers is deployed with this information.

7.       The EnginFrame Administrator password is retrieved and the portal is accessed.

8.       This is then used to start a session.
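
To make steps 3 and 4 concrete, here is a minimal AWS CDK sketch in JavaScript. The stack class, property names, account number, and Region below are illustrative placeholders rather than the repository's actual code:

const cdk = require('aws-cdk-lib');

// Placeholder stack standing in for the repository's VDI infrastructure stack.
class VdiStack extends cdk.Stack {
  constructor(scope, id, props) {
    super(scope, id, props);
    // The real stack provisions the VPC, subnets, Application Load Balancer,
    // EnginFrame portal, and DCV Session Manager broker here.
  }
}

const app = new cdk.App();
new VdiStack(app, 'Elastic-Vdi-Infrastructure', {
  env: { account: '111122223333', region: 'eu-west-1' }, // target account and Region
  // Illustrative equivalents of the automation's parameters:
  // keyPairName: 'vdi-key-pair',
  // ec2TypeEnginframe: 'm5.xlarge',
  // ec2TypeBroker: 'm5.xlarge',
});
app.synth();

With the parameters set, running the usual cdk bootstrap and cdk deploy commands from the cloned repository performs the deployment described in step 4 and produces the outputs listed in step 5.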

This completes the pattern for migrating the remote desktops for users.

Friday, February 3, 2023

One of the benefits of migrating workloads to the public cloud is cost savings. There are many cost management capabilities available from the AWS Management Console, but this article focuses on a pattern that works well across many migration projects.

This pattern requires us to configure user-defined cost allocation tags. For example, consider creating detailed cost and usage reports for AWS Glue jobs by using AWS Cost Explorer. Tags can be created for jobs across multiple dimensions, so we can track usage costs at the team, project, or cost-center level. An AWS account is a prerequisite. AWS Glue uses other AWS services to orchestrate ETL (extract, transform, and load) jobs that build data warehouses and data lakes. Because it takes care of provisioning and managing the resources required to run our workload, the costs can vary. The target technology stack comprises just AWS Glue jobs and AWS Cost Explorer.

The workflow includes the following:

1.       A data engineer or AWS administrator creates user-defined cost allocation tags for the AWS Glue jobs (see the sketch after this list).

2.       An AWS administrator activates the tags.

3.       The tags report metadata to the AWS Cost Explorer.
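
As an illustration of step 1, the same tag can also be applied programmatically. The sketch below uses the AWS SDK for JavaScript v3 Glue client; the job ARN, Region, and tag values are placeholders:

const { GlueClient, TagResourceCommand } = require('@aws-sdk/client-glue');

const glue = new GlueClient({ region: 'us-east-1' }); // Region is illustrative

async function tagGlueJob() {
  // The ARN below is a placeholder for an existing AWS Glue job.
  await glue.send(new TagResourceCommand({
    ResourceArn: 'arn:aws:glue:us-east-1:111122223333:job/sample-etl-job',
    TagsToAdd: {
      team: 'data-engineering',
      project: 'warehouse-migration',
      'cost-center': '1234',
    },
  }));
}

tagGlueJob().catch(console.error);

The tag keys only start appearing in billing data after an administrator activates them as cost allocation tags (step 2).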

The steps in the path to realize these savings include the following:

1.       Tags must be added to an existing AWS Glue job.

a.       This can be done from the AWS Glue console after signing in.

b.       In the “Jobs” section, the name of the job we are tagging must be selected.

c.       After expanding the advanced properties, we must add a new tag.

d.       The key for the tag can be a custom name; the value is optional and can be associated with the key.

2.       The tags can be added to a new AWS Glue Job once it has been created.

3.       The administrator activates the user-defined cost allocation tags.

4.       Cost and usage reports can then be created for the AWS Glue jobs (a query sketch follows this list). These include:

a.       Selecting a cost-and-usage report from the left navigation pane and then creating a report.

b.       Choosing “Service” as the filter and applying it. The tags can be associated with the filters.

c.       Similarly, the team tag can be selected, and the duration for which the report must be generated can be specified.
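
For step 4, the same report can be approximated programmatically. The sketch below uses the AWS SDK for JavaScript v3 Cost Explorer client; the time period, tag key, and tag value are placeholders:

const { CostExplorerClient, GetCostAndUsageCommand } = require('@aws-sdk/client-cost-explorer');

const ce = new CostExplorerClient({ region: 'us-east-1' });

async function glueCostByTeam() {
  const result = await ce.send(new GetCostAndUsageCommand({
    TimePeriod: { Start: '2023-01-01', End: '2023-02-01' }, // illustrative reporting month
    Granularity: 'MONTHLY',
    Metrics: ['UnblendedCost'],
    Filter: {
      And: [
        { Dimensions: { Key: 'SERVICE', Values: ['AWS Glue'] } },
        { Tags: { Key: 'team', Values: ['data-engineering'] } },
      ],
    },
    GroupBy: [{ Type: 'TAG', Key: 'team' }],
  }));
  console.log(JSON.stringify(result.ResultsByTime, null, 2));
}

glueCostByTeam().catch(console.error);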

This pattern is repeatable for cost management routines associated with various workloads and resources.

Thursday, February 2, 2023

 

Among all the established cloud migration patterns for different workloads, one pattern stands alone because it is not specific to a resource. This is the pattern for software development and testing in the cloud, so that logic can be incrementally migrated from an on-premises application to, say, serverless computing. By eliminating a one-time deployment of migrated code, the path to incremental delivery and enhancement of cloud-hosted logic becomes more gradual and deliberate, and can even reach an expanding audience. Versioning of the logic helps retain existing customers while onboarding new ones.

Let us take the specific example of a Node.js application developed on GitHub and built with AWS CodeBuild. The instructions in this document help set up a continuous integration and continuous delivery workflow that runs unit tests from a GitHub repository. Unit tests reduce refactoring time, help engineers get up to speed on their code base more quickly, and provide confidence in the expected behavior. They involve testing individual functions, including Lambda functions. Use of an AWS Cloud9 instance, which is an integrated development environment, is suggested but not mandatory; it can be accessed through a web browser.

Setting up this pipeline involves a few milestones, which developers call ‘epics’. The first epic involves running unit tests on a personal GitHub repository with CodeBuild. The tasks involved are:

Sign in to the AWS Management Console and open the CodeBuild console at https://console.aws.amazon.com/codesuite/codebuild/home

Create a build project, and in the project configuration, type the name of the project.

In the source section, specify the provider as GitHub and point to the existing personal repository in the GitHub account by specifying its URL.

In the primary source webhook events section, specify to rebuild every time a code change is pushed to this repository.

In the environment section, choose a managed image and the latest Amazon Linux image.

Leave the default settings and complete the project creation.

Start the build.

The CodeBuild console displays the tests that run, and the unit test results can be reviewed. These results validate the repository's integration with the project created in the steps above. When the webhook is applied, code changes automatically start a build.

Unit tests often involve assertions, spies, stubs, and mocks.
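
The post does not show the class under test or name the libraries, but the examples below read like Mocha tests written with Chai's expect and Sinon's spy, stub, and mock helpers. A minimal setup under that assumption, including a hypothetical Target class for the examples to exercise, might look like this:

// Illustrative test setup, assuming Mocha as the runner with Chai and Sinon.
const { expect } = require('chai');
const { spy, stub, mock } = require('sinon');

// A hypothetical class under test for the examples that follow.
class Target {
  constructor() {
    this.id = Math.floor(Math.random() * 999) + 1; // id in the range 1..999
    this.running = false;
  }
  getId() { return this.id; }
  start() { this.running = true; }
  stop() { this.running = false; }
}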

An assertion is used to verify an expected result. For example, the following code validates that the results are in a given range:

describe('Target Function Group', () => {
  it('checks that the result is between 0 and 1000', function () {
    const target = new Target();
    expect(target.id).to.be.above(0).and.below(1000); // Chai range assertion
  });
});

A spy is used to observe what is happening when a function is running. The following example shows whether a set of methods were invoked.

describe('Target Function Group', () => {
  it('should verify that the proper methods were called', () => {
    const spyStart = spy(Target.prototype, 'start');
    const spyStop = spy(Target.prototype, 'stop');
    const target = new Target();
    target.start();
    target.stop();
    expect(spyStart.called).to.be.true;
    expect(spyStop.called).to.be.true;
    // restore the original methods so later tests are unaffected
    spyStart.restore();
    spyStop.restore();
  });
});

A stub is used to override a function's default response. For example, a stub can be used to force a fixed return value from the getId function:

describe('Target Function Group', () => {
  it('checks that getId returns the stubbed value', function () {
    const generateIdStub = stub(Target.prototype, 'getId').returns(99999);
    const target = new Target();
    expect(target.getId()).to.equal(99999); // call the stubbed function and compare its return value
    generateIdStub.restore(); // restore the original getId
  });
});

A mock is a fake method with predefined behavior for testing different scenarios. A mock can be considered an extended form of a stub that can carry out multiple tasks simultaneously. For example, a mock can be used to validate three things at once: that a function is called, that it is called with the expected arguments, and that it returns an integer, say 9.

describe('Target Function Group', () => {
  it('checks that getId is called with no arguments and returns 9', function () {
    const targetMock = mock(Target.prototype);
    targetMock.expects('getId').withArgs().returns(9);
    const target = new Target();
    const id = target.getId();
    targetMock.verify();  // fails the test if the expectation was not met
    targetMock.restore();
    expect(id).to.equal(9);
  });
});

Wednesday, February 1, 2023

One of the advantages of migrating workloads to the public cloud is that there are well-known patterns to leverage, which make the assess -> deploy -> release path quite predictable. For example, we can treat migration strategy identification itself as a pattern that leverages the notion of an AppScore. The idea behind this pattern is that the challenge of missing operational data crucial for studying applications, such as recovery time objective, recovery point objective, or data privacy, can be overcome by using an application-centric view of the portfolio of applications to migrate. This includes a recommended transformation route for each application against the 5 R's model described in the earlier post.

The score helps to capture application information, determine the ideal transformation route, identify the risk, complexity, and benefits of cloud adoption, and quickly define the migration scopes, move groups and schedules.

The score drives a recommendation based on the following three categories of application attributes (a representative sketch in code follows the list):

1.       Risk – which is the business criticality of the application, whether it contains confidential data, data sovereignty requirements, and the number of application users or interfaces.

2.       Complexity – which is the application’s development language, age, UI, or number of interfaces.

3.       Benefit – which is the batch processing demand, application profile, disaster recovery model, development, and test environment use.
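
To make the categories concrete, the following JavaScript sketch shows one way such application attributes might be recorded before scoring. The field names and values are purely illustrative and are not AppScore's actual data model:

// Illustrative application record grouping the three scoring categories.
const application = {
  name: 'order-management',
  risk: {
    businessCriticality: 'high',
    containsConfidentialData: true,
    dataSovereigntyRequirements: ['EU'],
    userCount: 1200,
  },
  complexity: {
    developmentLanguage: 'Java',
    ageInYears: 9,
    hasUserInterface: true,
    interfaceCount: 14,
  },
  benefit: {
    batchProcessingDemand: 'medium',
    disasterRecoveryModel: 'cold-standby',
    usesDevTestEnvironments: true,
  },
};

console.log(JSON.stringify(application, null, 2));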

There are four phases of iterative data capture, which include:

1.       Signposting – questions that are combined with server data to produce the assessments.

2.       Scoring – questions that produce scores for risk, benefit, and complexity.

3.       Current state assessment – questions that provide a current state assessment of the application.

4.       Transformation – questions that comprehensively evaluate the application for future state design.

Only the signposting and scoring stages are required to receive application scores and assessments and to enable group planning. After the applications are grouped and scopes are formed, the latter two stages are used to build a more detailed overview of each application.

While the migration evaluator helps to create a directional business case for planning and migration, the application-centric view helps to bridge the gap between discovery and migration implementation and provides a recommended route to the cloud.

This workflow can be described with the following stages:

1.       Start the discovery and assessment.

2.       Align applications and servers.

a.       Capture the application and business information.

b.       Import server and technical details.

3.       Obtain recommendations, scoring and costs for each application.

4.       Plan costed schedules using move groups.

5.       Design application migration or transformation activities.

6.       Export application assessment and transformation reports from steps 3, 4, and 5.

7.       Perform approved migration or transformation activities.

With this pattern, the determination of the recommended migration strategy becomes more predictable.  

 

 

Tuesday, January 31, 2023

 

Application Modernization

One of the Gartner reports for information technology called out the paths to adopting the public cloud as one of five R's – Rehost, Refactor, Rearchitect, Rebuild, and Retire. Many solution architects would tend to agree with such separate workstreams for the progression from assessment through deployment and finally release of the cloud-hosted applications. But while analysts and even customers tend to focus on one approach, solution architects often see many paths, sometimes involving intermediary steps between the start and the finish. While they fine-tune the path by optimizing several parameters, often on a case-by-case basis, they generally agree with the breakout of these definitions.

Rehosting is a recommendation to change the infrastructure configuration of the application in order to “Lift and Shift” it to the cloud using Infrastructure-as-a-service.

Refactor is a recommendation to perform modest modifications of the application code without changing the architecture or functionality so that it can be migrated to the cloud in a container using Container-as-a-service or using Platform-as-a-service.

Rearchitect is a recommendation to dramatically modify the application code thereby altering the architecture to improve the health of the application and enable it to be migrated to the cloud using Platform-as-a-service or deployed serverless using Function-as-a-service.

Rebuild is a recommendation to discard the code of the application and develop it again in the cloud using Platform-as-a-service or serverless Function-as-a-service.

Retire is a recommendation to discard the application altogether or potentially replace it with a commercial software-as-a-service alternative.

All of the recommendations above can be made on an application-by-application basis, but they must be elaborated with additional information such as the lines of code analyzed, the number of files read, a score or ranking, the total number of full-time employees required, a quantitative assessment of cloud readiness, the number of roadblocks encountered, the estimated effort, Operations and Business Support system dedication, the SR, SA, and SE assessments, and tentative dates for the release.

As you can see, this process can be quite intensive, and repeating it for dozens of legacy applications can be quite tedious. Compiling a table of the results of this rigor across all the applications, and a dashboard to represent the insights via canned queries, can easily become error prone. Fortunately, some trail and documentation are maintained as this analysis is conducted, and there are quite a few tools that can help with visualizing the data even as it is incrementally added to the dataset. An end-to-end automation that enables detailed analysis, drill-downs, and articulated charts and graphs is possible if those applications' source code were merely scanned. Indeed, a rapid scan of a portfolio of applications can be a great start on the roadmap that requires us to assess, mobilize, migrate, and modernize, but iterations will remain inevitable and even desirable as they help toward optimizing continuously.

Monday, January 30, 2023

 

Zero Cost for IT Cloud Automation and managed services:

Introduction: Analogies from open-source total cost of ownership (TCO) apply directly to the inventory of an IT provider in any organization. Costs such as acquisition, operations, and personnel are not only amplified at cloud scale for an IT provider but also incur increased workflow automation and complexity costs. While the term TCO was coined during the era of the war between managed software and open source, it continues to draw parallels and hold as much relevance in the new IT world dominated by public and private cloud inventories. In this write-up, we review some of these costs in detail.

 

Description: An IT provider within an organization has the following costs:

Costs of resource instances – Today we rely more and more on virtual and shared resources, and we now have more granular and fragmented resources such as containers, virtual networks, and migratory applications and services. There is no longer a concept of ownership so much as a weak reference established via tenancy.

Cost of operations and services: Erstwhile notions of patch management, backup, and security no longer apply on a periodic-assessment basis, and instances are more short-lived than ever. Furthermore, management of resources is now done via dedicated agents and a central command that transcend cloud and on-premises boundaries. These services are no longer triggered manually so much as they are serviced automatically via alerts and notifications.

Cost of manpower: With increased complexity of cloud computing, there is more manpower needed than before. This runs contrary to the general belief that the cloud services are self-serve and increasingly automated.

 

Detail: We reduce the cost in each of these categories above.

Today, as private cloud customers request compute and storage, we slap on self-service automation and maintenance workflows by default so that the resources get leased and serviced in a routine manner. However, these workflows all rely on existing infrastructure that has either reached end of life or has not kept pace with the public cloud. Moreover, the discrepancy between the services offered in a private cloud and those offered in the public cloud only grows with the emphasis on legacy tools and platforms. Take examples such as Infoblox and Zabbix for our network address management and monitoring utilities, and the evidence becomes clearer. If we rely on static IP addressing versus DHCP, we may have workflows and automations that build on these services in layers, and each addition is costly because the foundation is not right. There is no evidence of static IP addressing as the primary mode of address assignment in the public cloud.

Furthermore, the public cloud is thought through in how it will scale on many fronts, whereas the private cloud groans to scale because of bottlenecks in how these products handle cloud loads. Monitoring with clunky products like Zabbix is another example of how a product masquerading behind a UI, CLI, and SDK is nowhere near services at cloud scale such as Amazon CloudWatch or Azure Monitor. Automations and scripts are relatively inexpensive compared to products, but misguided automation and incorrect emphasis only lead to more and more technical debt that may not only weigh down capabilities at some point but also sink the offerings if users find the ubiquity and norm of cloud computing more appealing.

This write-up does not belabor the point that betting against the public cloud is foolish; instead it draws attention to the investments being made in the short term on the private cloud versus the onboarding for the public cloud. We will look at these differences in investment and come up with an onboarding strategy for the public cloud in a separate section, but right now we simply make the case that the differences between the technology stack on the private cloud and its comparables in the public cloud should be few. For the sake of clarity, we refrain from a discussion of hybrid cloud, because the state of affairs in the private cloud deserves to take precedence in this discussion, and the differences are much better called out between private and public than between hybrid and anything else.

Moreover, our workflows are not that simple. We repackage existing functionality from out-of-the-box providers that are largely on-premises in their deployments, sometimes with a false claim of being cloud enabled. There is a reason why no third-party applications and appliances are put into the mix of resources or services available from a public cloud provider. With a native stack and homogeneous resources, the public cloud can clone regions and avoid significant costs and labor associated with external purchases, maintenance, relicensing, and support tools. These public clouds are betting on people and their deliverables to grow and expand their capabilities, with very few dependencies or churn thrown their way. Naturally, the number and type of services have grown significantly in the public cloud.

The technology stack of the private cloud must not recreate services from scratch to be on par with the public cloud; it should be designed to fit over public cloud services in the first place, before extending to legacy hardware and private cloud infrastructure. Infrastructure-as-a-service and platform-as-a-service must be differentiated. Today we go to one or the other based on whether we are requesting resources or software stacks. I believe the right choice is to differentiate the offerings based on usage requests. For example, managed clusters with Marathon stacks should come from IaaS, and the programming stack for deploying applications should come from PaaS. In the former case, we can rely on computing resources from the public cloud.

Some private cloud offerings are hesitant to let go of resources and usages because they cite that the equivalent is not available in the public cloud. For example, they say that the operating system flavors and images offered from the private cloud are not available in the same form in the public cloud. In such cases, the private cloud has the advantage that it can offer better variations and customized flavors for the customer. In addition, the machines can also be joined to a domain.

This sort of reasoning is fallacious, for two reasons. First, if the customers cannot make do with public cloud offerings, then they are digging themselves into a rut from which they will find it difficult to climb out later. A private cloud is offered not merely because it is an alternative in the portfolio of cloud-based investments but more so because the stack is completely managed by the organization. As such, this ownership-based slice must be a smaller percentage than the leased public cloud investments. The second reason is that the private cloud does not come homogeneous. It is based on either OpenStack or VMware stacks, and the tools and services are separate for each. Very few features of the private cloud make use of both equally.

The differences between private cloud and public cloud also involve existing and legacy inventory, both of which have their own limitations that require workarounds and workflow rework that cost significantly to build and operate. The rollover and upgrade from existing to new resources become more drawn out with the age of the systems, as applications written at the time, or customers using those resources, may not move as quickly or as nimbly as the new ones being written and deployed on, say, PaaS. Customers are slow to take action on their compute resources when notified by email with a link to self-help.

As a use case, take the example of coordination software such as Chef, Puppet, Ansible, Salt, or BladeLogic used to manage all systems, whether on-premises or in the cloud. Each of these systems has a purchase cost, relicensing cost, training cost, and operational cost, and continues to add its own complexities to workflows and automations with what it supports and what it doesn't. On the other hand, there are tools from the public cloud that span both public cloud and on-premises assets by virtue of an agent installed on the machine that talks over HTTP. These System Center–style tools from either public cloud are designed for organization-wide asset management and chores and have been widely accepted by many companies.

Conclusion: The adage of total cost of ownership holds true even in the new IT world although not by the same name. Managed versus unmanaged services show clear differentiation and place value in favor of managed anywhere.

 

Sunday, January 29, 2023

 These are a few SQL exercises to demonstrate querying beyond spreadsheets:

Let us consider the following schema:

-- Employees at Company X
-- their names must be unique.
CREATE TABLE employees (
  id SERIAL PRIMARY KEY,
  name TEXT
);

-- Contacts are people an employee knows
CREATE TABLE contacts (
  id SERIAL PRIMARY KEY,
  name TEXT,
  email TEXT
);

-- employees will refer people they know
-- must be cleaned up after an employee leaves
CREATE TABLE referrals (
  employee_id INT REFERENCES employees(id),
  contact_id INT REFERENCES contacts(id)
);

Q.1: Which of the following statements is TRUE?

Creating employees with the same name will generate a SQL Error (FALSE)

Creating employees with the same name will not generate a SQL Error (TRUE)

Q.2: Which of the following statements is TRUE?

When an employee or contact is deleted, referrals will be deleted (FALSE)

When an employee or contact is deleted, referrals will NOT be deleted (TRUE)

Query ONE

SELECT * FROM (
  SELECT
    employees.name AS employee_name,
    contact_id,
    contacts.name AS contacts_name,
    contacts.email AS contacts_email,
    (SELECT COUNT(*) FROM referrals AS r WHERE r.contact_id = referrals.contact_id) AS num_referrals
  FROM referrals
  JOIN employees ON employees.id = employee_id
  JOIN contacts ON contacts.id = contact_id
) AS q
WHERE q.num_referrals > 1;

Query TWO

SELECT
  employees.name AS employee_name,
  contact_id,
  contacts.name AS contacts_name,
  contacts.email AS contacts_email,
  (SELECT COUNT(*) FROM referrals AS r WHERE r.contact_id = referrals.contact_id) AS num_referrals
FROM referrals
JOIN employees ON employees.id = employee_id
JOIN contacts ON contacts.id = contact_id
WHERE contact_id IN (
  SELECT contact_id FROM referrals GROUP BY contact_id HAVING COUNT(*) > 1
);

Q.3: Which of the two queries is more efficient?

Query ONE (TRUE)

Query TWO (FALSE)

Q.4: A ride-hailing company has their DB structured in 3 major tables, as described in the SCHEMA section below. Write a query to fetch the top 100 users who traveled the most distance using the service. The output should be structured as: users.name distance_traveled. Sort the output by distance_traveled in descending order, then by user name in ascending order. Show only the top 100 users, ignoring ties at the last position. Note: there could be multiple users with the same name, but they will have different IDs.

CITIES:
id (string) – assigned id of the city, represented as a 32-character UUID
name (string) – the name of the city

USERS:
id (string) – assigned id of the user, represented as a 32-character UUID
city_id (string) – the id of the city in which this user resides
name (string) – the name of the user
email (string) – the email of the user

SELECT u.name, SUM(r.distance) AS distance_traveled
FROM RIDES r
JOIN USERS u ON r.user_id = u.id
GROUP BY r.user_id, u.name
ORDER BY distance_traveled DESC, u.name ASC
LIMIT 100;