Cluster computing

Wednesday, October 1, 2014

#codingexercise
Check if there's a path from root to leaf in a binary tree such that the sum of the node values equals the given number:
bool hasPathSumRecursive(Node root, int val, int sum = 0)
{
if (root != null)
{
if (sum + root.data > val) return false;
return hasPathSumRecursive(root.left, val, sum+ root.data) || hasPathSumRecursive(root.right, val, sum+root.data);
}
return sum == val;
}
Now list all such paths :
bool hasPathSumRecursive(Node root, int val, int sum = 0, ref List<List<Node>> paths, ref List<Node> candidatePath)
{
if (root != null)
{
candidatePath.Add(root);
if (sum + root.data > val) { candidatePath.Remove(root); return false;}
bool hasPathSum = hasPathSumRecursive(root.left, val, sum+root.data, paths, candidatePath) || hasPathSumRecursive(root.right, val, sum+ root.data, paths, candidatePath);
if (hasPathSum)
paths.Add(candidatePath);
candidatePath.Remove(root);
return hasPathSum;
}
return sum == val;
}

Concatenate two strings by removing the redundancy of the matching tail of one and the head of another

string ConcatenateTailHead(string t, string s)
{
   //parameter validation
   // canonicalization ( trim whitespace etc)
   int index = t.IndexOf(0, s[0]);
while (index != -1 && index < t.Length)
{
    if (s.startwith(t.substring(index))
    {
        return t.substring(0, index) + s;
     }
    index = t.IndexOf(s[0], index + 1);
}
return t + s;
}

check if two trees are identical
bool isSameTree(Node a, node b)
{
if (a == null || b == null)
{
if (a != b) return false;
Else return true;
}
if (a.data != b.data)
{
return false;
}
return isSameTree(a.left, b.left) && isSameTree(a.right, b.right);
}

Tuesday, September 30, 2014

In Today's post, we continue our discussion on OpenStack. In particular, we refer the Architecture for OpenStack on Ubuntu. OpenStack is described on http://docs.openstack.org It is designed to be massively scalable horizontally. To begin with we can have a single cloud controller that hosts the databases, message queue service, authentication and authorization service, image management service, user dashboard, and externally accessible API endpoints for OpenStack services. By starting with a single cloud controller, we favor simplicity over fault tolerance and availability that is possible with a fully redundant cloud controller. Most OpenStack Compute central services communicate with each other using the Message Queue which also has cluster capabilities. Databases save stateful information and they are critical to the central services. The content and delivery services consists of two parts - one that is responsible for the delivery of images and the latter maintains the metadata information associated with the virtual machine images and requires a database.
The OpenStack dashboard is implemented as a Python web application that runs in Apache httpd. The OpenStack Identity service allows identity to be set in the policy.json file. The identity service supports different plugins for storing information which include an in-memory Key-Value store, SQL database, PAM and LDAP. OpenStack implementations usually involve MAAS and Juju. MAAS is a tool that helps manage physical infrastructure with the same ease and flexibility as virtual machines in the cloud. Specially, MAAS allows us to discover, commission and deploy physical servers and dynamically allocate physical resources to match workload requirements and retire servers. Juju is a service orchestration tool which allows the administrator to configure, manage, maintain, deploy and scale cloud services (workloads) quickly and efficiently leveraging MAAS. Generally MAAS and Juju GUI run on separate servers. Similarly the Controller node, Compute services, Object Storage services and Block Storage operate on separate nodes. This is done to provide dedicated hardware for each of these services and to reduce contention. The Controller and compute nodes should have 4 1TB physical drives with RAID5 striping. The storage should have 2 500 GB physical disks locally in addition to access to an array of such disks.

Monday, September 29, 2014

#CodingExercise
I apologize for cluttering up my posts with coding exercises, I know its distracting but cannot resist, and hence I'm tagging them so you can skip:
Successor in a binary search tree:
Node GetSuccessor(Node root)
{
if (root == null) return null;
if (root.right)
{
// return tree-minimum
Node current = root.right;
while(current && current.left)
   current = current.left;
return current;
}
Node parent = root.parent;
while(parent && parent.right == root)
{
root = parent;
   parent =parent.parent;
}
return parent;
}

For string matching between a string T of length n and a pattern P of length m, we mentioned Rabin Karp method earlier that brought two improvements over the brute force, one is that we can skip ahead up to a length m on a mismatch and for any spurious hits, a portion could match. We do this with a special hash function that builds up the resulting hash from the hash of the portions.
// after preprocessing
for (int s = 0; s<= n-m; s++)
     if ( pattern == t.substring)
            // do a full comparision of the substring before calling it a match
     if s < n-m
       then update the next substring with d(ts - T[s+1]h) + T[s + m + 1])mod q

KnuthMorrisPratt improves the preprocessing with the knowledge of how to shift the substring.

Finding the convex hull of a set Q of points can be solved with one of the following methods:
1 Rotational sweep methods such as
a) Graham's scan - Graham's scan solves the convex hull problem by maintaining a stack S of candidate points. While the angle formed by the next-to-top, top and pi makes a non-left turn, the stack is popped and the candidate point is pushed on the stack. At the end, the stack is left with exactly the points on the hull in the anticlockwise manner.
b) Jarvis' march builds a sequence of vertices which has the smallest polar angle with respect to the previous point. Similarly p2 has the smallest polar angle with respect to p1 and so on. When we reach the top we start all over and separate the left and the right sequence. The advantage of the separate chains is that we need not compute the angles but just compare it.
2) Incremental method : In the incremental method, the points are sorted from left to right, yielding a sequence. At the ith stage of the convex hull of I-1 points, the sequence is updated according to the ith point from the left.
3) Divide and Conquer method:
In this method the set of n points is divided into two subsets and the convex hull is computed recursively. Then the hulls are combined. Each recursive step takes as input the set P of points, the set X and Y each of which contains the points in P sorted monotonically by X and Y axis coordinates respectively. All the inputs are divided in half to form corresponding left and right sequences. The recursion is to make two calls - one to find the closest pair of points in the PL and the other for finding the same in PR. The merge step occurs by finding the closest pair either from one of the recursive calls with distance delta or by combining one from the PL and the other from the PR that reside in the two times delta vertical strip. In each recursive call, we form a sorted subset of a sorted array Y.
len(YL) = len(YR) = 0;
for I = 1 to len (Y)
     do if Y[I] belongs to PL:
          then len(YL) = len(YL) + 1
                 YL(len(YL) = y[I]
          else
                 len[YR] = len[YR] + 1
                 YR[len[YR]] = Y[I]
4) The prune and search method: This finds the upper portion of the convex hull by repeatedly throwing out a constant fraction of the remaining points until only the upper chain remains. Then the process is repeated for the lower chain.

Node GetPredecessor(Node root)
{
if (root == null) return null;
if (root.left)
{
// return tree-minimum
Node current = root.left;
while(current && current.right)
   current = current.right;
return current;
}
Node parent = root.parent;
while(parent && parent.left == root)
{
root = parent;
   parent =parent.parent;
}
return parent;
}

void PrintFactors(n)
{
// Pollard Rho
int i = 1;
Random r = new Random();
int x = r.Next(0, n-1);
int y = x;
int k = 2;
for (;;)
{
i = i + 1;
x = (x * x - 1) % n;
d = gcd(y - x, n);
if (d != 1 && d != n)
Console.Writeline("{0}", d);
if (i == k)
{
y = x;
k = 2k;
}
}
}

int gcd(int a, int b)
{
// Euclid
if (b == 0)
{
return a;
}
else
{
return gcd(b, a mod b);
}
}

Kerninghan and Pike string matching code:
    /* match: search for regexp anywhere in text */
int match(char* regexp, char* text)
{
if (regexp[0] == '^')
     return matchhere(regexp+1, text);
do {
   if (matchhere(regexp, text))
    return 1;
}while(*text++ != '/0');
return 0;
}

int matchere(char* regexp, char* text)
{
if (regexp[0] == '/0')
    return 1;
if (regexp[1] == '*')
    return matchstar(regexp[0], regexp+2, text);
if (regexp[0] == '$' && regexp[1] == '/0')
    return *text == '/0';
if (*text != '\0' && (regexp[0] == '.' || regexp[0] == *text))
    return matchhere(regexp+1, text+1);
return 0;
}

int matchstar(int c, char* regexp, char* text)
{
do {
if (matchhere(regexp, text))
return 1;
} while (*text != '/0' && (*text++ == c || c == '.'));
return 0;
}

List<List<T>> GetPowerSet(List<T> elements)
{
    if (elements.Count == 0) return new List<List<T>>(){ new List<T>();};
    var cut = elements.GetRange(0,1);
    var remaining = elements.Except(cut);
    var powerSet = GetPowerSet(remaining) ;
    powerSet.Union(Combine(powerset, cut.First()));
    return powerSet;
}

List<List<T>> Combine (List<List<T>> powerSet, T item)
{
if (item)
{
   foreach (var set in powerSet)
      set.Add(item);
}
return powerSet;
}

void Stack::Push(Item item)
{
this.list.InsertAt(0, Item);
}

Item Stack::Pop()
{
this.list.RemoveAt(0);
}

Cloud computing is universally recognized as having the potential to drive greater agility, control and cost savings by providing on-demand access in a user centric environment. What IT is adding to it is a hybrid delivery model that includes a traditional IT, private and public cloud.
In this context, Cloud Management Platform provides a solution that manages cloud access, simplifies and unifies business operations, implements security and compliance, and provides an open, extensible and scalable infrastructure. The speed of delivery has increased dramatically with this new ability. Costs are reduced and control is passed back to the organization instead of the "shadow IT" and improves accountability.
We review what is meant by the open, extensible and scalable infrastructure. Ideally, the management platform for the cloud should be optimized for heterogeneity, multi-hypervisor, multi-vendor environments and multiple public clouds. The adoption of OpenStack as the underlying technology and enabling optimized workload portability by implementing industry standards like TOSCA define openness. Flexibility for integration and extensibility comes from REST APIs with out-of-box integration and extensibility for custom environments enables an ecosystem to evolve.
Heterogeneity with no - vendor lock-in means the infrastructure is scalable.
When we review the management toolset, we notice the following requirements: 1) it has to be a single set of tools 2) simple enough to use and 3) it should work on both traditional IT and cloud.
Some common functions from this management tools include governance, security, compliance, service assurance, patching, service lifecycle management etc.
The toolset should be able to give a complete view of the IT infrastructure. Complete means the toolset includes automation, orchestration, security, performance management and financial management tools. In addition, the cloud management platform should be consistent in meeting business SLAs. These SLAs are met by providing capabilities such as security with the support for automation, compliance and backup, role based access and identity management. Simple code analysis tools go a long way in assuring this. Further, Information correlation and application analysis and network level defense mechanism can thwart or preempt attacks. Some IT tools already work well in this space but the suggestion is to keep it open enough for any tool to provide the functionality or a platform tool that can integrate with others.
Performance monitoring tools is also part of this toolset. Advanced reporting and analytics tools that involve chargeback and license compliance to be reported also complete this toolset.

One more coding question for today :
A list of objects have colors red, white or blue. we need to sort them by the color red, white or blue.

public static SortedSet<Color> SortColor(List<Color> items)
{
// parameter validation
var set = new SortedSet<Color>(new ByColor());
foreach (var item in items)
{
set.Add(item);
}
return set;
}

public class ByColor : IComparer<Color>
{
public int Compare(Color x , Color y)
{
if (x.color == y.color) return 0;
if (x.color == red) return 2;
if (x.color == white)
{
if (y.color == red) return -2;
else return 1;
}
//x.color == blue;
if (y.color == red) return -2
else return -1;
}
}

Saturday, September 27, 2014

Another problem and solution for LeetCode challenge :
Given n, generate all structurally unique BSTs that store value 1 .. n
The answer is said to be Catalan number = (2n)! / (n+1)! (n)!
How do we arrive at it ? Let's take a look:
Consider all possible Binary Search trees with each number as the root. For each of the n nodes as root, there are n-1 non-root nodes. We can partition these non-root nodes into the lower values than the root and the higher values than the root for the left and the right subtree. If we choose say i in a sorted list as the root node, then there are i-1 nodes in the left subtree and n-i in the right subtree.
Thus if we denote the number of possible BSTs with the function t(n), then we can write it as the
sum of i = 1 to n of the product t(i-1) t(n-1). We multiply because the arrangement of the left subtree is independent of the right subtree and for each such arrangement of the two sub-trees, we get one BST with the root as i. Also note that regardless of the values of the elements in the sorted list, the number of unique BSTs depends on the number of elements n.
Therefore we write the code as follows:
int numUniqueBSTs(int n)
{
int count = 0;
if (n == 0) return 1;
for (int i = 1; i <= n i++)
count += numUniqueBSTs(i-1) * numUniqueBSTs(n-i)
return count;
}
The iterative solution is also equally useful:
int numUniqueBSTs(int n)
{
int[] count = new int[n+1];
count[0] = 1;
for (I = 1; I <= n; I++)
for (int j = 0; j < I; j++)
count[I] += count[j] *count(I-j-1)
return count[n];
}

Okay one more coding exercise:
Given an array of positive integers with every element appearing twice except for one, find that one:
int OccursOnce(int[] num)
{
if (num == null || num.length <= 0) return -1;
int candidate = A[0];
for (int i = 1; i < num.length; i++)
{
candidate = candidate ^ num[i];
}
return candidate.
}

My next few posts will be on managing hybrid clouds, PAAS and OpenStack.

Here's another coding exercise:
Convert a sorted list to binary search tree.
Note that the in order traversal of a binary search tree is a sorted list
The caveat is that we can only interpret a balanced binary search tree from the given list.
123456789

     5
  3      8
2  4    7 9
1       6

Node SortedListToBST(List<int> num, int start, int end, Node root)
{

int mid;

if ( (start + end) %2 == 0)
     mid = (start + end ) / 2 + 1;
else
    mid = (start + end) / 2

var node = new Node();
node.data = num[mid];

if (root != null)
if (num[mid] < root.data)
     root.left = node;
else
    root.right = node

node.left = SortedListToBST(num, start, mid-1, node);

node.right = SortedListToBST(num, mid + 1, end, node);

return node;

};

Today we cover one more programming exercise question from the interview problems list:
Given a binary tree, print the zigzag level order.
The binary tree is given as 3, 9, 20, -1, -1, 15, 7 where -1 means null
and the tree is
3
9 20
15 7

We have to return the output as
[ (3),
(20, 9),
(7, 15),
]
public static List<List<int>> GetZigZag(Node root)
{
if (root == null) return null;
var output = new List<List<int>>();
var level = new List<int>();
var Q = new Queue();
Q.Enqueue(root);
Q.Enqueue(null);
while (Q.Count > 0)
{
var item = Q.Dequeue();
if (item == null)
{
output.Add(level);
if (Q.Count == 0) break;
level = new List<int>();
Q.Enqueue(null);
continue;
}
level.Add(item);
if (item.right) Q.Enqueue(item.right);
if (item.left) Q.Enqueue(item.left);
}
return output;
}