Cluster computing

Tuesday, May 24, 2016

#codingexercise
Fix two swapped nodes of a BST
// An in-order traversal of a BST is sorted, so two swapped nodes show up as one or two inversions.
// If the swapped nodes are adjacent in the traversal, they are captured by first and middle;
// otherwise they are captured by first and last.
void correctBSTUtil(node root, ref node first, ref node middle, ref node last, ref node prev)
{
    if (root == null) return;
    correctBSTUtil(root.left, ref first, ref middle, ref last, ref prev);
    if (prev != null && root.data < prev.data)
    {
        if (first == null)
        {
            first = prev;
            middle = root;
        }
        else
            last = root;
    }
    prev = root;
    // the right subtree is visited only for a non-null root
    correctBSTUtil(root.right, ref first, ref middle, ref last, ref prev);
}

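void correctBST(node root)
{
    node first = null, middle = null, last = null, prev = null;
    correctBSTUtil(root, ref first, ref middle, ref last, ref prev);
    if (first != null && last != null)
    {
        // the swapped nodes were not adjacent in the in-order sequence
        swap(first.data, last.data);
    }
    else if (first != null && middle != null)
    {
        // the swapped nodes were adjacent in the in-order sequence
        swap(first.data, middle.data);
    }
}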
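The same idea can be tried out as a small runnable Python sketch. The Node class, the inorder generator and the function name correct_bst are illustrative names, not part of the original exercise; the logic follows the first/middle/last bookkeeping described above.

```python
class Node:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def inorder(root):
    """Yield the nodes of the tree in in-order sequence."""
    if root is not None:
        yield from inorder(root.left)
        yield root
        yield from inorder(root.right)


def correct_bst(root):
    """Swap back the data of the two misplaced nodes, if there are any."""
    first = middle = last = prev = None
    for node in inorder(root):
        if prev is not None and node.data < prev.data:
            if first is None:
                first, middle = prev, node  # first inversion found
            else:
                last = node                 # second inversion found
        prev = node
    if first is not None and last is not None:
        first.data, last.data = last.data, first.data
    elif first is not None and middle is not None:
        first.data, middle.data = middle.data, first.data


# Usage: a BST over 1..7 rooted at 4 in which the values 2 and 6 were swapped.
root = Node(4, Node(6, Node(1), Node(3)), Node(2, Node(5), Node(7)))
correct_bst(root)
values = [n.data for n in inorder(root)]
```

After the call, values should be the sorted sequence 1 through 7.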

Snapshots and AMI from AWS

We were discussing backups and snapshots. AWS handles this differently: it takes snapshots of volumes and creates images (Amazon Machine Images, AMIs) from instances. Both are required for a full restore, although an instance can be launched from either. This is because the root drive of the instance is an EBS volume, which can itself be snapshotted. Ideally the machine is powered off for a consistent backup; otherwise it can be restarted so that running programs flush their data to disk. As long as the disk captures the active state, we are good to back up, because then we do not lose any transient data.

AWS offers the 'boto' SDK for Python, which facilitates some of these operations in the following way:

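import boto.ec2

connection = boto.ec2.connect_to_region(region, aws_access_key_id=accesskey, aws_secret_access_key=accesssecret)

# get_all_instances() returns reservations, each holding one or more instances
reservations = connection.get_all_instances()
instances = [i for r in reservations for i in r.instances]

# find the instance to back up among instances

# no_reboot=False allows the instance to be shut down and restarted so that the image is consistent
instance.create_image(hostname+"Backup", description=None, no_reboot=False, dry_run=False)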

#codingexercise

Given three points a, b and c, write a function to find what type of triangle they construct or whether a triangle can be made at all.

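First we rule out a triangle if the points are collinear, that is, if the cross product of the vectors from p to q and from p to r is zero.

bool collinear(Point p, Point q, Point r)
{
    // zero cross product means all three points lie on one line (exact for integer coordinates)
    return (q.x - p.x) * (r.y - p.y) - (q.y - p.y) * (r.x - p.x) == 0;
}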
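The exercise also asks for the type of triangle, which the snippet above does not cover. Here is a minimal Python sketch of one way to finish it, assuming integer coordinates so that exact comparisons are safe. The names Point, squared_distance and classify_triangle, and the labels returned, are illustrative choices.

```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])


def collinear(p, q, r):
    """True when the cross product of (q - p) and (r - p) is zero."""
    return (q.x - p.x) * (r.y - p.y) - (q.y - p.y) * (r.x - p.x) == 0


def squared_distance(p, q):
    return (p.x - q.x) ** 2 + (p.y - q.y) ** 2


def classify_triangle(a, b, c):
    """Return None if no triangle exists, else a (sides, angles) pair of labels."""
    if collinear(a, b, c):
        return None
    # Sorted squared side lengths; s3 belongs to the longest side.
    s1, s2, s3 = sorted([squared_distance(a, b),
                         squared_distance(b, c),
                         squared_distance(c, a)])
    if s1 == s2 == s3:
        sides = "equilateral"
    elif s1 == s2 or s2 == s3:
        sides = "isosceles"
    else:
        sides = "scalene"
    # Law of cosines on squared lengths: compare s1 + s2 with s3.
    if s1 + s2 == s3:
        angles = "right"
    elif s1 + s2 > s3:
        angles = "acute"
    else:
        angles = "obtuse"
    return sides, angles
```

Working with squared lengths avoids square roots, so the comparisons stay exact for integer input. With floating-point coordinates a tolerance would be needed instead of ==.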

Monday, May 23, 2016

Virtual Machine snapshots on different cloud platforms.

Cloud platforms such as OpenStack and VMware provide their own snapshot services. The snapshot of a virtual machine is itself a file, so backing up this file is equivalent to taking a snapshot of the virtual machine. Since these platforms are used to create virtual machine instances, they are also the appropriate controllers for taking snapshots of the created instances.

From the VMware documentation:

“On vSphere, backups are usually done by taking a snapshot, to efficiently obtain a static image of the virtual machine. Snapshots are a view of a virtual machine at a certain point in time, and enable quick and clean backup operation. Snapshots also provide an incremental backup mechanism called changed block tracking.”

Let us take a look at how to create snapshots on the different platforms:

OpenStack:

This exposes images, including instance snapshots, through the Image service API:

GET /v2/images

GET /v2/images/{image_id}

And

POST /v2/images

Upload and download of raw image data can be done with:

PUT /v2/images/{image_id}/file

GET /v2/images/{image_id}/file
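As a rough illustration of how these calls might be assembled, the Python sketch below builds request objects with the standard library and does not send them. The endpoint http://controller:9292, the token value and the function names are assumptions made for the example. The X-Auth-Token header and the application/octet-stream content type are what the Image service expects for authentication and raw uploads.

```python
from urllib.request import Request


def list_images_request(endpoint, token):
    """Build GET /v2/images (not sent)."""
    return Request(endpoint.rstrip("/") + "/v2/images",
                   headers={"X-Auth-Token": token}, method="GET")


def download_image_request(endpoint, token, image_id):
    """Build GET /v2/images/{image_id}/file (not sent)."""
    url = endpoint.rstrip("/") + "/v2/images/" + image_id + "/file"
    return Request(url, headers={"X-Auth-Token": token}, method="GET")


def upload_image_request(endpoint, token, image_id, data):
    """Build PUT /v2/images/{image_id}/file carrying raw image bytes (not sent)."""
    url = endpoint.rstrip("/") + "/v2/images/" + image_id + "/file"
    headers = {"X-Auth-Token": token,
               "Content-Type": "application/octet-stream"}
    return Request(url, data=data, headers=headers, method="PUT")


# Hypothetical endpoint, token and image id, for illustration only.
req = upload_image_request("http://controller:9292", "my-token", "abc123", b"raw-bytes")
```

Passing a request to urllib.request.urlopen would actually issue the call against a live endpoint.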

VMware:

This provides workflows that can be used to create snapshots using the vCO (vCenter Orchestrator) client.

Most of them have a generic signature as follows:

/workflows/{workflow-id}

/workflows/{workflow-id}/executions to check the status of the execution

In addition, the VDDK (Virtual Disk Development Kit) offers snapshot capabilities through a different set of API calls.

On one vCenter Server, the moRef uniquely identifies a virtual machine. If we need to track and inventory virtual machine backups across multiple vCenter Servers, we can use moRef together with instanceUuid. We can see the instanceUuid at the following browser path:

https://<vcserver>/mob/?moid=ServiceInstance&doPath=content.about

The following code sample shows how to create a snapshot on a specific virtual machine:

// At this point we assume the virtual machine is identified as ManagedObjectReference vmMoRef.
String SnapshotName = "Backup";
String SnapshotDescription = "Temporary Snapshot for Backup";
boolean memory_files = false;
boolean quiesce_filesystem = true;
ManagedObjectReference taskRef = serviceConnection.getservice().CreateSnapshot_Task(vmMoRef,
    SnapshotName, SnapshotDescription, memory_files, quiesce_filesystem);

The following Java code demonstrates how to delete the snapshot:

ManagedObjectReference removeSnapshotTask;
ManagedObjectReference snapshot; // Already initialized.
removeSnapshotTask = serviceConnection.getservice().removeSnapshot_Task(snapshot, Boolean.FALSE);

AWS:

This provides a variety of ways to interact with the VMs. While OpenStack and VMware above typically serve the private cloud, AWS is a public cloud and hence comes with rich documentation on the steps to create a snapshot. For example, we can do this with a tool, an SDK or the API.

Here is an example:

https://ec2.amazonaws.com/?Action=CreateSnapshot

&VolumeId=vol-1234567890abcdef0

&Description=Daily+Backup

&AUTHPARAMS
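The query string above can be assembled programmatically. The following Python sketch uses only the standard library and deliberately leaves out the authentication parameters that AUTHPARAMS stands for, so the resulting URL is unsigned and would not be accepted by the service as-is. The function name create_snapshot_query is an illustrative choice.

```python
from urllib.parse import urlencode


def create_snapshot_query(volume_id, description):
    """Return the unsigned query URL for the EC2 CreateSnapshot action."""
    params = [
        ("Action", "CreateSnapshot"),
        ("VolumeId", volume_id),
        ("Description", description),
    ]
    # urlencode escapes spaces as '+', matching Description=Daily+Backup above.
    return "https://ec2.amazonaws.com/?" + urlencode(params)


url = create_snapshot_query("vol-1234567890abcdef0", "Daily Backup")
```

In practice an SDK takes care of signing the request, which is the part omitted here.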

Conclusion: Each cloud provider can take snapshots, and this ability is tied directly to its ability to create instances. We leverage these provider APIs so that instances can be snapshotted in a platform-agnostic manner.

#codingexercise

Level order traversal of a tree in spiral form

Same as level order traversal, except that alternate levels are printed in the opposite direction (for example by pushing them on a stack first). The version below instead passes a direction flag to a per-level print.
void printSpiral(node root)
{
    bool drc = false;
    // levels are numbered from 1 up to and including the height of the tree
    for (int i = 1; i <= height(root); i++)
    {
        printGivenLevel(root, i, drc);
        drc = !drc;
    }
}
void printGivenLevel(node root, int level, bool direction)
{
    if (root == null) return;
    if (level == 1) print(root.data);
    else if (level > 1)
    {
        if (direction)
        {
            // left to right
            printGivenLevel(root.left, level - 1, direction);
            printGivenLevel(root.right, level - 1, direction);
        }
        else
        {
            // right to left
            printGivenLevel(root.right, level - 1, direction);
            printGivenLevel(root.left, level - 1, direction);
        }
    }
}
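The recursive version revisits the tree once per level. As an alternative sketch in Python, the traversal can also be done one level at a time with a plain list, reversing every other level. The Node class and the name spiral_order are illustrative, and the direction convention (second level left to right, third level right to left) is chosen to match the flag used above.

```python
class Node:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def spiral_order(root):
    """Return node values level by level, flipping direction on each level."""
    result = []
    level = [root] if root is not None else []
    left_to_right = False  # flips before the second level, which then runs left to right
    while level:
        values = [n.data for n in level]
        result.extend(values if left_to_right else reversed(values))
        # Children are always collected left to right; only the output order flips.
        level = [c for n in level for c in (n.left, n.right) if c is not None]
        left_to_right = not left_to_right
    return result


# Usage: a complete tree with 1 at the root, 2 and 3 below it, and 4..7 as leaves.
tree = Node(1, Node(2, Node(4), Node(5)), Node(3, Node(6), Node(7)))
order = spiral_order(tree)
```

Each node is visited once, so this runs in time linear in the number of nodes.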

Sunday, May 22, 2016

Backup and Recovery – why existing products?

Introduction: Digital data can get erased, lost or corrupted, and virtual machines are no different in that respect. To enable disaster recovery, virtual machines are often backed up. Previously, important data was stored outside the machines on shares, files and repositories. Object storage changed that, because the virtual machines themselves came to be used as stores. Object storage tolerates failures of participating machines by keeping copies of the data, but backups are also copies of data. A Word document can be saved continuously. Why can’t all the virtual machines in a data center be saved periodically for as long as the machines are active?

Why a new product? It’s true that backup and recovery is becoming increasingly difficult to manage. Products have become smarter by keeping agents in the operating systems of the virtual machines that can detect what to back up and when. On the other hand, snapshots, which are not that intelligent and merely require a point-in-time capture at the storage level, have become increasingly popular among cloud providers. Existing products have improved their offerings with deduplication, manageability and maintenance, and some even come with their own appliances. As these businesses compete, seldom is any attention paid to efficiency in terms of disk I/O and network bandwidth in vendor-heterogeneous deployments in the cloud. Consequently, many cloud providers either do not use the products to their full abilities or have to explicitly disable some features to manage the overall operations of the data center.

The Gartner report on backup and recovery mentions fewer than a handful of visionaries who are poised to make a greater impact. Yet their technologies leave a lot to be configured by the administrators of a datacenter. What if we had a dedicated pool of servers that periodically takes snapshots of every virtual machine in a datacenter in round-robin order? At this point, we are planning to offer a service, not a product, to add value for cloud providers where none existed earlier. Moreover, given the large number of virtual machines in a data center, these snapshots can be organized and automated so that neither the users of the virtual machines nor the administrators need to take any additional action. Most backup programs already run a service or a daemon in their client-server solutions anyway. These agents, which run locally on a client or on a central server, target individual flavors of operating systems. But when we run a service outside the virtual machines, on a pool of servers sized to handle the load of a datacenter, we are automating at a much larger scale. We can then consistently provide many more features in this managed service, such as aging policies and cleanups.
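To make the round-robin pool and the aging policy a little more concrete, here is a small Python sketch. It is purely illustrative: the function names, the worker and VM identifiers, and the seven-day retention window are all assumptions, and a real service would call the platform snapshot APIs rather than shuffle lists.

```python
from datetime import datetime, timedelta
from itertools import cycle


def assign_round_robin(vm_ids, workers):
    """Spread VM ids over the worker pool in round-robin order."""
    plan = {w: [] for w in workers}
    for vm, worker in zip(vm_ids, cycle(workers)):
        plan[worker].append(vm)
    return plan


def expired(snapshot_times, now, max_age):
    """Return the snapshot timestamps that an aging policy would clean up."""
    return [t for t in snapshot_times if now - t > max_age]


# Usage with hypothetical names: five VMs shared between two snapshot workers.
plan = assign_round_robin(["vm1", "vm2", "vm3", "vm4", "vm5"], ["w1", "w2"])

now = datetime(2016, 5, 22)
taken = [datetime(2016, 5, 1), datetime(2016, 5, 20)]
stale = expired(taken, now, timedelta(days=7))
```

Here w1 would handle vm1, vm3 and vm5, w2 would handle vm2 and vm4, and only the snapshot from May 1 would be flagged for cleanup.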

#codingexercise

Find the smallest subarray that needs to be sorted to sort the full array

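// requires: using System.Linq;
Tuple<int,int> GetSmallerSort(List<int> nums)
{
    int n = nums.Count;
    // start: first index from the left where the ascending order breaks
    int start = 0;
    while (start < n - 1 && nums[start] <= nums[start + 1]) start++;
    if (start >= n - 1) return Tuple.Create(-1, -1); // already sorted
    // end: first index from the right where the ascending order breaks
    int end = n - 1;
    while (end > 0 && nums[end] >= nums[end - 1]) end--;
    // the min and max inside the window decide how far it must grow
    int min = nums.GetRange(start, end - start + 1).Min();
    int max = nums.GetRange(start, end - start + 1).Max();
    while (start > 0 && nums[start - 1] > min) start--;
    while (end < n - 1 && nums[end + 1] < max) end++;
    return Tuple.Create(start, end);
}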
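For experimenting with this exercise, here is a runnable Python sketch of a common approach: find where the order first breaks from each end, then widen that window until everything outside it is no larger than the window minimum on the left and no smaller than the window maximum on the right. The function name and the choice to return None for an already sorted input are assumptions made for the example.

```python
def smallest_unsorted_window(nums):
    """Return (start, end) of the shortest subarray whose sorting sorts nums,
    or None when nums is already sorted."""
    n = len(nums)
    # First index from the left where the ascending order breaks.
    start = 0
    while start < n - 1 and nums[start] <= nums[start + 1]:
        start += 1
    if start >= n - 1:
        return None
    # First index from the right where the ascending order breaks.
    end = n - 1
    while end > 0 and nums[end] >= nums[end - 1]:
        end -= 1
    lo = min(nums[start:end + 1])
    hi = max(nums[start:end + 1])
    # Grow the window over elements that would be out of place after sorting it.
    while start > 0 and nums[start - 1] > lo:
        start -= 1
    while end < n - 1 and nums[end + 1] < hi:
        end += 1
    return start, end


window = smallest_unsorted_window([10, 12, 20, 30, 25, 40, 32, 31, 35, 50, 60])
```

For this input the window covers indices 3 through 8, that is, the values 30, 25, 40, 32, 31, 35.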