Sunday, August 28, 2016

We talked about graph extensions for relational data here.  We now explore whether such an extension is possible when the entire relational database is in memory.
The challenge with a graph abstraction over relational data is that it increases the size of the data, because the predicates in the RDF description of the framework must be persisted. Graph operations cannot execute unless the entire RDF model has been rendered.
There are two challenges with this:
1)      The entire RDF model is expensive to create, both in time and space
2)      The graphs have to stay warm and available in memory for as long as the graph extension is in use.
Graph databases built on a NoSQL store can make use of the following two techniques:
1)      Batch processing of the graph with parallel and offline analytics
2)      Partitioning of graphs

Instead, if we use a streaming model of the graph and provide graph queries that can operate on a stream of graph updates, then we can provide approximate answers even when operating entirely in memory.

Some graph databases do not have a query language that supports streaming operations; they merely stream results. For example, the Cypher query language does not seem to support streams.

Microsoft StreamInsight supports stream processing.

#algorithm discussion continued
Today we also continue reading about flow networks. We were looking at the max-flow min-cut theorem and the properties associated with it, and we saw how it applies to the Ford-Fulkerson method. Each iteration of the method augments the flow by considering a path in the residual network, until there are no more such paths. For each path considered, the residual capacity of that path is found and added to the flow network.
How the path is chosen is not specified by the method. This is usually done with Breadth First Search (BFS).
When BFS is used to choose the augmenting path as a shortest path from s to t in the residual network, where each edge has unit distance, the algorithm is called the Edmonds-Karp algorithm.
It can be shown that with this algorithm, for all vertices, the shortest-path distance in the residual network increases monotonically with each flow augmentation.
The use of BFS also bounds the number of iterations of the Edmonds-Karp algorithm. This bound is O(VE) for a graph with V vertices and E edges, and it is explained in terms of critical edges. An edge (u,v) is considered critical on an augmenting path p if the residual capacity of (u,v) equals the residual capacity of the path. At least one edge on any augmenting path must be critical. When the flow is augmented along the path, every critical edge disappears from the residual network; (u,v) can reappear only after the flow on it decreases, which happens only when (v,u) later lies on an augmenting path. Because shortest-path distances increase monotonically, between two consecutive times that (u,v) becomes critical, the distance of u from the source increases by at least 2. That distance is at most V-2 while u remains reachable, so each edge can become critical at most (V-2)/2, i.e. fewer than V/2, times. Since there are O(E) edges in the residual network, the total number of augmentations is O(VE).
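The method described above can be sketched directly. A minimal illustration in Python (the dict-of-dicts graph representation and the function name are choices made for this sketch, not from the text):

```python
from collections import deque

def edmonds_karp(capacity, s, t):
    """Max flow from s to t. capacity[u][v] is the capacity of edge (u, v)."""
    # Build residual capacities, adding zero-capacity reverse edges.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        # BFS for a shortest augmenting path in the residual network.
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return flow  # no augmenting path remains
        # Residual capacity of the path = min residual capacity on it.
        path = []
        v = t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        path_flow = min(residual[u][v] for u, v in path)
        # Augment: every critical edge (where the min is attained) drops to 0.
        for u, v in path:
            residual[u][v] -= path_flow
            residual[v][u] += path_flow
        flow += path_flow
```

On a small network such as s→a(3), s→b(2), a→b(1), a→t(2), b→t(3), this returns a max flow of 5.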

#codingexercise
Find minimum difference between any two elements of an array

int GetMinDiff(List<int> nums)
{
    int min = int.MaxValue;
    for (int i = 0; i < nums.Count; i++)
        for (int j = i + 1; j < nums.Count; j++)
            if (Math.Abs(nums[i] - nums[j]) < min)
                min = Math.Abs(nums[i] - nums[j]);
    return min;
}
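The quadratic scan above can be avoided by sorting first, since the minimum difference must occur between neighbors in sorted order. A sketch in Python:

```python
def get_min_diff(nums):
    """Minimum absolute difference between any two elements, O(n log n).
    After sorting, the closest pair is always adjacent."""
    s = sorted(nums)
    return min(b - a for a, b in zip(s, s[1:]))
```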

Find the next greater element to the right of all elements in an array
void PrintNextGreater(List<int> nums)
{
    for (int i = 0; i < nums.Count; i++)
    {
        int next = int.MaxValue;
        for (int j = i + 1; j < nums.Count; j++)
        {
            if (nums[i] < nums[j])
            {
                next = nums[j];
                break;
            }
        }
        Console.Write("{0}-{1} ", nums[i], next);
    }
}
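The nested loops above are O(n^2). The same answers can be computed in O(n) with a stack of pending indices; a sketch in Python (returning a list, with None standing in for the INT_MAX sentinel):

```python
def next_greater(nums):
    """For each element, the next greater element to its right,
    or None if there is none. O(n): each index is pushed and popped once."""
    result = [None] * len(nums)
    stack = []  # indices whose next greater element has not been seen yet
    for i, x in enumerate(nums):
        while stack and nums[stack[-1]] < x:
            result[stack.pop()] = x
        stack.append(i)
    return result
```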
#puzzle
There are 25 horses among which you need to find the fastest 3. You can race at most 5 horses at a time to find their relative speeds. At no point can you find out the actual speed of a horse in a race. How many races are required to get the top 3 horses?
The answer is 7. The solver uses a process of elimination. The first five races give the best horse in each group of five; a sixth race among the five group winners gives the overall number 1. Only five candidates remain for the second and third places: the second and third finishers of the sixth race, the second and third horses from the overall winner's group, and the second horse from the group of the sixth race's runner-up. A seventh race among these five yields the overall number 2 and number 3. Note that the selection of horses spans groups after the first round by excluding the weaker horses.

Saturday, August 27, 2016

#codingexercise
Get the maximum height of a common subtree between two binary trees.
There are a few ways to solve this problem, such as evaluating each node as the root of a common subtree between both binary trees.
However, we know that when a subtree is common to both binary trees, all the elements in the subtree will serialize into the same sequence for both trees, provided the serialization includes markers for missing children. The problem then decomposes to finding the longest common substring between the string representations of the tree serializations.
int GetMaxHeightCommonSubTree(Node tree1, Node tree2)
{
    var str1 = tree1.Serialize();
    var str2 = tree2.Serialize();
    var pattern = GetLCS(str1, str2);
    var subtree = pattern.Deserialize();
    return GetMaxHeight(subtree);
}
string GetLCS(string str1, string str2)
{
    int end = 0;    // index in str1 of the last character of the best substring
    int length = 0;
    var lcs = new int[str1.Length + 1, str2.Length + 1];
    for (int i = 0; i <= str1.Length; i++)
        for (int j = 0; j <= str2.Length; j++)
        {
            if (i == 0 || j == 0) lcs[i, j] = 0;
            else if (str1[i - 1] == str2[j - 1])
            {
                lcs[i, j] = lcs[i - 1, j - 1] + 1;
                if (lcs[i, j] > length)
                {
                    length = lcs[i, j];
                    end = i - 1;
                }
            }
            else lcs[i, j] = 0;
        }
    return str1.Substring(end - length + 1, length);
}

2) Find if there is a path from root to leaf in a binary tree with a given sum

bool hasPathSum(Node root, int sum)
{
    if (root == null) return sum == 0;
    if (sum - root.data == 0 && root.left == null && root.right == null) return true;
    bool ret = false;
    if (root.left != null)
        ret = ret || hasPathSum(root.left, sum - root.data);
    if (root.right != null)
        ret = ret || hasPathSum(root.right, sum - root.data);
    return ret;
}

3) The serialization of a binary tree can be its level order traversal.
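As a sketch of that idea, a level-order serialization with explicit markers for missing children, so that equal serializations imply equal tree shapes. The tuple representation of nodes is an assumption made for this example:

```python
from collections import deque

def serialize_level_order(root):
    """Level-order serialization with '#' for missing children.
    Nodes are (value, left, right) tuples."""
    if root is None:
        return "#"
    out, q = [], deque([root])
    while q:
        node = q.popleft()
        if node is None:
            out.append("#")  # marker for a missing child
            continue
        value, left, right = node
        out.append(str(value))
        q.append(left)
        q.append(right)
    return ",".join(out)
```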

Friday, August 26, 2016

Graph plugin for databases
Many companies view graphs as an abstraction rather than an implementation of the underlying database. There are two reasons for this :
1)      Key-value stores suffice to capture the same information as a graph and can provide flexibility and speed for operations that can be translated into queries on these stores. These stores can then be specialized for the top graph features that an application needs
2)      Different organizations within the company require different stacks built on the same logical data for reasons such as business impact, modular design and ease of maintenance.
Even large companies such as Twitter and Facebook don’t necessarily use graph databases for their social network data. They use graph database concepts, but their implementations are usually highly specialized and optimized deployments of Redis, MySQL, or other such databases.
These companies use a graph query layer that enables them to do things such as graph traversals which are often needed in graph operations but they execute against traditional Relational database management systems.
This abstraction is facilitated by a Resource Description Framework data model which is intrinsically a labeled, directed multi-graph. This kind of modeling is similar to classical data modeling and it makes statements about resources in the form of subject-predicate-object expressions. The subject denotes the resource, and the predicate denotes traits or relationships between the subject and the objects. These expressions are known as triples in RDF terminology and can even be persisted in relational database.
Plugins for conventional relational databases exploit RDF and other data models such as Network data models or a collection of spatial features in their offerings for supporting graph layer as an overlay. A network data model is a property graph model used to model and analyze physical and logical networks. A geospatial feature consists of a geometric representation of the data in some co-ordinate space.
Many companies embrace key-value stores over relational data, arguing that relational data does not scale to arbitrary size without performance degradation. This is not necessarily the fault of the relational database but rather of how it is used. In fact, large relational databases can support recursive queries, allowing native and direct translation of graph operations.
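As an illustration of that point, a recursive common table expression can express a graph traversal directly against a relational engine. The table and data below are made up for the example, shown here through Python's sqlite3 module:

```python
import sqlite3

# Hypothetical edge table modelling a small directed graph.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src TEXT, dst TEXT)")
conn.executemany("INSERT INTO edges VALUES (?, ?)",
                 [("a", "b"), ("b", "c"), ("c", "d"), ("x", "y")])

# Recursive CTE: the set of nodes reachable from 'a' -- a graph
# traversal expressed directly in SQL.
rows = conn.execute("""
    WITH RECURSIVE reachable(node) AS (
        SELECT 'a'
        UNION
        SELECT e.dst FROM edges e JOIN reachable r ON e.src = r.node
    )
    SELECT node FROM reachable
""").fetchall()
print(sorted(r[0] for r in rows))  # nodes reachable from 'a', including itself
```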

Oracle did a major study on its spatial and graph plugin for both relational and NoSQL data. In its approach, it consolidated several data sources or data types such as medical devices, lab information systems, subscription services and legacy records from different data servers such as Lab/Clinical, Research, Content Management, Billings/Claims and Reporting/BI into a central domain vocabulary. This central repository was none other than an integrated graph metadata that was implemented as an RDF Triple Store. They showed that it was scalable to billions of triples with RAC and exadata scalability.
Courtesy : Oracle studies
#codingexercise
Find a subset of elements with sum k in a list of positive integers (a contiguous-subarray version follows)

void Sum(List<int> num, int k, ref List<int> candidate, int start)
{
    if (candidate.Sum() == k) { Console.Write(candidate.ToString()); return; }
    if (candidate.Sum() > k) return;
    if (start == num.Count) return;
    // with num[start]
    candidate.Add(num[start]);
    Sum(num, k, ref candidate, start + 1);
    candidate.RemoveAt(candidate.Count - 1);
    // without num[start]
    Sum(num, k, ref candidate, start + 1);
}

Another method finds a contiguous subarray with the given sum, using a sliding window:
int GetSumK(List<int> num, int k)
{
    if (num.Count == 0) return 0;
    int curr = num[0];
    int start = 0;
    for (int i = 1; i <= num.Count; i++)
    {
        // shrink the window from the left while its sum exceeds k
        while (curr > k && start < i - 1)
        {
            curr = curr - num[start];
            start++;
        }
        if (curr == k)
        {
            Console.Write("sum k between {0} and {1}", start, i - 1);
            return 1;
        }
        if (i < num.Count)
            curr = curr + num[i];
    }
    return 0;
}
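The same sliding-window idea, sketched in Python for clarity (returning the index range instead of printing; the function name is a choice made for this sketch):

```python
def subarray_with_sum(nums, k):
    """Sliding window over positive integers: returns (start, end) indices
    of a contiguous subarray summing to k, or None if there is none."""
    curr, start = 0, 0
    for i, x in enumerate(nums):
        curr += x
        # shrink from the left while the window sum exceeds k
        while curr > k and start < i:
            curr -= nums[start]
            start += 1
        if curr == k:
            return (start, i)
    return None
```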

Thursday, August 25, 2016

Today we continue reading about flow networks. We were looking at the max-flow min-cut theorem and the properties associated with it, and we saw how it applies to the Ford-Fulkerson method. Each iteration of the method augments the flow by considering a path in the residual network, until there are no more such paths. For each path considered, the residual capacity of that path is found and added to the flow network.
How the path is chosen is not specified by the method. This is usually done with Breadth First Search (BFS).
When BFS is used to choose the augmenting path as a shortest path from s to t in the residual network, where each edge has unit distance, the algorithm is called the Edmonds-Karp algorithm.
It can be shown that with this algorithm, for all vertices, the shortest-path distance in the residual network increases monotonically with each flow augmentation.
#puzzle
There are 3 ants sitting on three corners of a triangle. All ants randomly pick a direction and start moving along edge of the triangle. What is the probability that any two ants collide?
There are two choices for each ant, which makes the total number of possibilities 2 ^ 3 = 8.
Out of these, only two (all clockwise and all anticlockwise) avoid a collision. So the answer is 6/8 = 3/4.
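The count can be verified by brute-force enumeration of the 8 outcomes:

```python
from itertools import product

# Each ant picks clockwise (0) or anticlockwise (1); a collision happens
# unless all three ants move the same way.
outcomes = list(product([0, 1], repeat=3))
collisions = [o for o in outcomes if len(set(o)) > 1]
print(len(collisions), "/", len(outcomes))  # 6 / 8, i.e. probability 3/4
```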
#coding interview
How do we sort an array consisting of three elements 0,1, 2 without counting ?
We use the Dutch National Flag Algorithm
This algorithm is described as
Divide the array into four regions:
1 to Low -1 as all zeros
Low to Mid - 1 as all ones
Mid to High as unknown
and High+1 to N as twos
Initially Low is set to 1, Mid is set to 1 and High to N and we shrink the unknown by swapping the mid element with the corresponding region.
while mid <= high
    if A[mid] == 0: Swap(A, low, mid); low++; mid++
    else if A[mid] == 1: mid++
    else if A[mid] == 2: Swap(A, mid, high); high--  (do not advance mid; the swapped-in element has not been examined yet)

void SortThrees(List<int> nums)
{
    Debug.Assert(nums.All(x => x == 0 || x == 1 || x == 2));
    int N = nums.Count;
    int low = 1;
    int mid = 1;
    int high = N;
    while (mid <= high)
    {
        switch (nums[mid - 1])
        {
            case 0: Swap(ref nums, low - 1, mid - 1); low++; mid++; break;
            case 1: mid++; break;
            case 2: Swap(ref nums, mid - 1, high - 1); high--; break;
            default: throw new Exception("Invalid element");
        }
    }
}
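The same algorithm, sketched in Python with 0-based indices for comparison:

```python
def sort_threes(nums):
    """Dutch national flag: one-pass, in-place sort of 0s, 1s and 2s."""
    low, mid, high = 0, 0, len(nums) - 1
    while mid <= high:
        if nums[mid] == 0:
            nums[low], nums[mid] = nums[mid], nums[low]
            low += 1
            mid += 1
        elif nums[mid] == 1:
            mid += 1
        else:  # 2: swap to the back; do not advance mid, because the
               # swapped-in element has not been examined yet
            nums[mid], nums[high] = nums[high], nums[mid]
            high -= 1
    return nums
```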

Wednesday, August 24, 2016

There are other approaches possible to merging the Binary Search Tree
1) Add the elements of one tree to the other one at a time by moving the root until there are no more nodes in the tree 
2) Transform one of the trees into an inorder list and insert the items into the other tree. 
3) Jointly traverse both trees to find the minimum in either tree, delete it and add to a new BST 
  
Deleting a node from a tree 
Node TreeDelete(ref Node root, Node z)
{
    if (z == null) return null;
    if (root == null) return null;
    Node y = null;
    if (z.left == null || z.right == null)
        y = z;
    else
        y = TreeSuccessor(z);
    Node x = null;
    if (y.left != null)
        x = y.left;
    else
        x = y.right;
    if (x != null)
        x.parent = y.parent;
    if (y.parent == null)
        root = x;
    else if (y == y.parent.left)
        y.parent.left = x;
    else
        y.parent.right = x;
    if (y != z)
        z.data = y.data;
    return y;
}
  
void TreeInsert(ref Node tree1, Node z)
{
    Node y = null;
    Node x = tree1;
    while (x != null)
    {
        y = x;
        if (z.data < x.data)
            x = x.left;
        else
            x = x.right;
    }
    z.parent = y;
    if (y == null)
        tree1 = z;
    else if (z.data < y.data)
        y.left = z;
    else
        y.right = z;
}
Node Merge(ref Node tree1, ref Node tree2)
{
    Node current = tree2;
    while (current != null)
    {
        var node = TreeDelete(ref tree2, current);
        node.left = null; node.right = null; node.parent = null; // detach before reinsertion
        TreeInsert(ref tree1, node);
        current = tree2;
    }
    return tree1;
}
Node TreeSuccessor(Node x)
{
    if (x == null) return null;
    if (x.right != null) return TreeMinimum(x.right);
    Node y = x.parent;
    while (y != null && y.right == x)
    {
        x = y;
        y = y.parent;
    }
    return y;
}
Node TreePredecessor(Node x)
{
    if (x == null) return null;
    if (x.left != null) return TreeMaximum(x.left);
    Node y = x.parent;
    while (y != null && y.left == x)
    {
        x = y;
        y = y.parent;
    }
    return y;
}

Node TreeMinimum(Node root)
{
    if (root == null) return null;
    Node current = root;
    while (current.left != null)
        current = current.left;
    return current;
}
Node TreeMaximum(Node root)
{
    if (root == null) return null;
    Node current = root;
    while (current.right != null)
        current = current.right;
    return current;
}

Tuesday, August 23, 2016

#codingexercise
Merge two Binary Search Trees(BST)
Node Merge(Node tree1, Node tree2)
{
    var in1 = new List<int>();
    InOrder(tree1, ref in1);
    var in2 = new List<int>();
    InOrder(tree2, ref in2);
    var result = in1.Merge(in2); // merge the two sorted lists
    return ToBST(result, 0, result.Count - 1);
}

void InOrder(Node root, ref List<int> tree)
{
    if (root == null) return;
    InOrder(root.left, ref tree);
    tree.Add(root.data);
    InOrder(root.right, ref tree);
}

Node ToBST(List<int> nums, int start, int end)
{
    if (start > end) return null;
    int mid = (start + end) / 2;
    var root = new Node();
    root.data = nums[mid];
    root.left = ToBST(nums, start, mid - 1);
    root.right = ToBST(nums, mid + 1, end);
    return root;
}
Another method to do this is to use two auxiliary stacks, one per BST. At each step we print the smaller of the two available elements; the greater one is pushed back onto its stack for the next iteration.
We use the iterative inorder traversal :
-          Create an empty stack S
-          Initialize current node as root
-          Push the current node to stack S and set current to current.left, repeating until current is NULL
-          If the current is null and stack is not empty then
o   Pop the top item from stack
o   Print the popped item
o   Set current = popped_item.right
o   Goto step 3
-          If current is null and the stack is empty, then we are done.
To merge the BST we do the following:
Initialize two empty stacks s1 and s2.
If the first tree is null, do an iterative inorder traversal of the second to print the merge
If the second tree is null, do an iterative inorder traversal of the first to print the merge
Set the current node in tree 1 to cur1 and the current node in tree 2 to cur2
While either cur1 or cur2 is not null, or one of the stacks is not empty:
-          If cur1 or cur2 is not null
o   For each tree whose current is not null, push the current onto its stack and set current to current.left. The while loop repeats this until current is null
-          Else
o   If either of the stacks is empty, then that tree is exhausted; print the remainder of the other tree
o   Otherwise, set the current of each tree to the element popped from its respective stack
o   Compare the two currents and execute an iteration of step 3 of the iterative inorder traversal above: the smaller of the two current nodes is printed, and the larger is pushed back onto its corresponding stack

At the end of the while loop both stacks are empty, both currents point to null, and both trees have been printed.
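The procedure above can be sketched as follows, in Python for brevity, with nodes as (value, left, right) tuples (an assumption made for this example):

```python
def merge_bsts(root1, root2):
    """Merged sorted sequence of two BSTs via two stacks and
    interleaved iterative inorder traversals."""
    out, s1, s2 = [], [], []

    def push_left(node, stack):
        # descend left, stacking ancestors (step 3 of the traversal above)
        while node is not None:
            stack.append(node)
            node = node[1]  # left child

    push_left(root1, s1)
    push_left(root2, s2)
    while s1 or s2:
        # pop whichever stack exposes the smaller current value;
        # an empty stack means that tree is exhausted
        if s1 and (not s2 or s1[-1][0] <= s2[-1][0]):
            node = s1.pop()
            push_left(node[2], s1)  # right subtree
        else:
            node = s2.pop()
            push_left(node[2], s2)
        out.append(node[0])
    return out
```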

There are other approaches possible:

1) Add the elements of one tree to the other one at a time by moving the root until there are no more nodes in the tree
2) Transform one of the trees into an inorder list and insert the items into the other tree.
3) Jointly traverse both trees to find the minimum in either tree, delete it and add to a new BST