Saturday, April 30, 2016

#palindrome lengths in a continuous sequence:
The key observation here is that the lengths of the palindromes found up to a given center are themselves palindromic. Therefore we look for the next center based on the lengths we have already found.
The suggestion is that the lengths array we construct will have values that mirror around centers, similar to the palindromic nature of Pascal's triangle.
Therefore the algorithm is:
Initialize the lengths array to the number of possible centers
Set the current center to the first center
Loop while the current center is valid:
       Expand to the left and right simultaneously until we find the largest palindrome around this center
       Fill in the appropriate entry in the longest palindrome lengths array
       Iterate backwards through the longest palindrome lengths found so far and copy the mirrored values to the right of the entry for the current center, until an unknown value is encountered (unknown because we don't have sufficient elements in the string to decide the length)
       Set the new center to the index of this unknown value
Return the lengths array
Courtesy: Fred Akalin 2007

If we have a continuous stream of letters, we can find the palindromes for the string up to each arriving character and repeat.
In any case, the method above is online in its processing.
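A minimal sketch of just the expansion step (without the mirroring optimization described above) may help as a point of reference; the 2n-1 centers convention - one center per character and one per gap - is an assumption of this sketch:

List<int> GetPalindromeLengths(string s)
{
    // lengths[c] = length of the longest palindrome around center c,
    // where even c is a character and odd c is a gap between characters
    var lengths = new List<int>();
    for (int center = 0; center < 2 * s.Length - 1; center++)
    {
        int left = center / 2;
        int right = left + center % 2;
        int len = 0;
        while (left >= 0 && right < s.Length && s[left] == s[right])
        {
            len = right - left + 1;
            left--;
            right++;
        }
        lengths.Add(len);
    }
    return lengths;
}

For example, GetPalindromeLengths("abacaba") reports its longest palindrome (length 7) at the middle character. The optimization above replaces the full expansion at mirrored centers with a copy from the lengths already found.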

#codingexercise2
There are two sorted arrays A and B of size m and n respectively. Find the median of the two sorted arrays. The overall run time complexity should be O(log (m+n)).
This is equivalent to finding the kth element where k = (m+n)/2

if ((m+n) % 2 != 0)
   return findKth(A, B, (m+n)/2, 0, m-1, 0, n-1);
else
   return (findKth(A, B, (m+n)/2, 0, m-1, 0, n-1)
              + findKth(A, B, (m+n)/2 - 1, 0, m-1, 0, n-1)) / 2.0; // average of the two middle elements


int findKth(int[] A, int[] B, int k, int aStart, int aEnd, int bStart, int bEnd)
{
   int aLen = aEnd - aStart + 1;
   int bLen = bEnd - bStart + 1;

   // special cases
   if (aLen == 0)
        return B[bStart + k];
   if (bLen == 0)
        return A[aStart + k];
   if (k == 0)
        return A[aStart] < B[bStart] ? A[aStart] : B[bStart];

   // split k between the two arrays in proportion to their lengths
   int aMid = (aLen * k) / (aLen + bLen);
   int bMid = k - aMid - 1;

   aMid = aMid + aStart;
   bMid = bMid + bStart;

   if (A[aMid] > B[bMid])
   {
      // the kth element cannot lie in B[bStart..bMid] or in A[aMid+1..aEnd]
      k = k - (bMid - bStart + 1);
      aEnd = aMid;
      bStart = bMid + 1;
   }
   else
   {
      // the kth element cannot lie in A[aStart..aMid] or in B[bMid+1..bEnd]
      k = k - (aMid - aStart + 1);
      bEnd = bMid;
      aStart = aMid + 1;
   }
   return findKth(A, B, k, aStart, aEnd, bStart, bEnd);
}

Or we can simply merge the two arrays and take the middle element, at the cost of O(m+n) time.
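A sketch of that simpler alternative, reusing the Merge routine from the next post below:

double GetMedian(int[] A, int[] B)
{
    var merged = Merge(A, B); // linear-time merge of the two sorted arrays
    int total = merged.Length;
    if (total % 2 != 0)
        return merged[total / 2];
    return (merged[total / 2] + merged[total / 2 - 1]) / 2.0;
}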

Friday, April 29, 2016

#coding questions

#1) Merge two sorted arrays into one
int[] Merge(int[] first, int[] second)
{
int i = 0;  // first index
int j = 0;  // second index
int k = 0; // merged index
var result = new int[first.Length + second.Length];
while ( i < first.Length  && j < second.Length ){
if ( first[i] < second[j]){
        result[k] = first[i];
        i++;
        k++;
}else{
        result[k] = second[j];
        j++;
        k++;
}
}
while (i < first.Length )
{
        result[k] = first[i];
        i++;
        k++;
}
while (j < second.Length)
{
result[k] = second[j];
j++;
k++;
}
return result;
}

#2) Find the weight of each subtree in a weighted tree
- Traverse each subtree and accumulate (level * weight) for each node in that subtree.

int GetSubTreeWeights(Node root, int level)
{
assert(level > 0);
if (root == null) return 0;
int wt = root.weight * level;

for (int i = 0; i < root.siblings.Count; i++)
   wt += GetSubTreeWeights(root.siblings[i], level + 1);
return wt;
}

for (int i = 0; i < root.siblings.Count; i++)
    Console.WriteLine("Weight of subtree with root at {0} is {1}",
                                root.siblings[i].toString(),
                                GetSubTreeWeights(root.siblings[i], 2));

Alternatively, the tree can be traversed level-wise and the sum computed linearly, as sketched below.
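A sketch of that level-wise alternative, assuming the same Node shape with a weight field and a siblings list for the children:

int GetTreeWeightLevelWise(Node root)
{
    if (root == null) return 0;
    int total = 0;
    var q = new Queue<Tuple<Node, int>>(); // each node paired with its level
    q.Enqueue(Tuple.Create(root, 1));
    while (q.Count > 0)
    {
        var current = q.Dequeue();
        total += current.Item1.weight * current.Item2; // accumulate level * weight
        foreach (var child in current.Item1.siblings)
            q.Enqueue(Tuple.Create(child, current.Item2 + 1));
    }
    return total;
}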

#3) Edit Distance:
Given a starting word and an ending word, find the edit distance.
Edit distances are computed using the corresponding operations for insertion, deletion and substitution. If we let f(i, j) denote the edit distance between the first i characters of the first word and the first j characters of the second word,
then
f(i, 0) = i
f(0, j) = j
and f(i, j) = f(i-1, j-1) when the ith and jth characters are the same
and f(i, j) = f(i-1, j-1) + 1 for a substitution
and f(i, j) = f(i, j-1) + 1 for inserting the jth character of the second word into the first
and f(i, j) = f(i-1, j) + 1 for deleting the ith character of the first word
We can summarize that by saying the edit distance is the minimum of f(i-1, j) + 1, f(i, j-1) + 1 and f(i-1, j-1) + cost, where cost is 0 on a match and 1 otherwise. It is max(i, j) if min(i, j) = 0.

We create a table with the characters of the first string as rows and those of the second string as columns and use the prefixes to compute the edit distances of any pair of substrings.
The first row and first column hold the distances against an empty string, i.e. 0, 1, 2, ... up to the length of the other string.
At each additional cell, we use the values from the previous row and column and take the minimum over the three operations - insertion, deletion and substitution. The last cell gives the edit distance of the first string with the second string.

int GetEditDistance(List<char> s, List<char> t)
{
var d = new int[s.Count+1, t.Count+1]; // initialized with zero
for (int i = 1; i <= s.Count; i++)
  d[i,0] = i;
for (int j = 1; j <= t.Count; j++)
  d[0,j] = j;
for (int i = 1; i <= s.Count; i++)
   for (int j = 1; j <= t.Count; j++)
      {
         int sub = 1;
         if (s[i-1] == t[j-1]) sub = 0; // table index i corresponds to character i-1
         d[i,j] = Math.Min(d[i-1,j-1] + sub,  // substitution (or match)
                  Math.Min(d[i-1,j] + 1,      // deletion
                           d[i,j-1] + 1));    // insertion
      }

return d[s.Count, t.Count];
}
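As a quick check of the method, the classic pair below needs three edits (substitute k with s, substitute e with i, and insert g):

var distance = GetEditDistance("kitten".ToList(), "sitting".ToList());
Console.WriteLine(distance); // prints 3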


Wednesday, April 27, 2016

Find missing numbers in a contiguous sequence and express them as a string of ranges.
For example,
1,2,4,5,6,9 should output 3,7-8
String GetMissingNumbers(List<int> numbers)
{
string ret = string.Empty;
numbers.Sort();
if (numbers.Count() <= 1) return ret;
for (int i = 1; i < numbers.Count(); i++)
{
if (numbers[i] - numbers[i-1] == 2)
   ret += (numbers[i-1] + 1).ToString() + ",";
if (numbers[i] - numbers[i-1] > 2)
  ret += (numbers[i-1] + 1).ToString() + "-" + (numbers[i]-1).ToString() + ",";
}
return ret.TrimEnd(','); // drop the trailing comma
}
The above is O(n).
We can do better than linear when the gaps are few, by skipping over halves that are already contiguous, as below:

void GetMissing(List<int> numbers, ref List<string> result, int start, int end)
{
if (start < end)
{
int mid = (start + end)/2;
GetMissing(numbers, ref result, start, mid);
GetMissing(numbers, ref result, mid+1, end);
Merge(numbers, ref result, start, mid, end);
}
}
void Merge(List<int> numbers, ref List<string> result, int start, int mid, int end)
{
// constant time operation
if (numbers[mid] - numbers[start] == mid - start &&
     numbers[end] - numbers[mid+1] == end-(mid+1) &&
     numbers[mid+1] - numbers[mid] <= 1) return; // assumes the sequence is already sorted
if (numbers[mid+1] - numbers[mid] == 2)
   result.Add((numbers[mid] + 1).ToString());
if (numbers[mid+1] - numbers[mid] > 2)
  result.Add((numbers[mid] + 1).ToString() + "-" + (numbers[mid+1]-1).ToString());
}

The final answer is then assembled with string.Join(",", result).
Today we continue the set of coding exercises with different orders of traversal. This time we include level-wise traversal with inorder. Again we fall back on our argument that the left and right subtrees need to be separated by the root node in at least one sequence to make it easier for the tree to be rebuilt.

                    A

             B            C

        D      E          F     G

H    I    J    K         L   M    N    O
Pre:     A  B  D  H  I  E  J  K  C  F  L  M  G  N  O
Post:    H   I   D  J   K E B  L  M F N   O  G  C   A
In:       H   D   I  B   J  E K A  L  F M   C  N  G   O

Level  NULL A  NULL B  C  D  NULL  E F G  H  NULL I   J  K  L M  N   O NULL

We iterate the nodes level-wise, so we know that the parents are traversed before the children.
For any node we try to rebuild, the inorder gives us left-subtree + root + right-subtree going forward. Each subtree extends up to a parent node. For example, E has subtrees in the inorder up to B on one side and up to A on the other side. A has subtrees extending to the extremes because it is the root. The siblings are found in their respective subtrees as they occur in the next level.


Node Rebuild(List<Node> level, List<Node> inorder) // all nodes in the lists initialized with left = right = null and set only with data
{
assert(level != null && inorder != null);
assert(level.Count() == inorder.Count());

for (int i = level.Count() - 1; i >= 0; i--)
{
     Node root = level[i];
     int inIndex = Array.IndexOf(inorder, level[i]);
     // the nearest elements in inorder that occur earlier/later in level order bound the subtree
     Node leftBound = inorder.First(x => Array.IndexOf(level, x) < i);
     Node rightBound = inorder.First(x => Array.IndexOf(level, x) > i);
     int left = Array.IndexOf(inorder, leftBound);
     int right = Array.IndexOf(inorder, rightBound);
     int leftTreeLen = inIndex - left - 1;
     int rightTreeLen = right - inIndex - 1;
     Node leftSibling = findNextLevelOccurrenceFromSubtree(level, inorder, i, left, -1, inIndex);
     Node rightSibling = findNextLevelOccurrenceFromSubtree(level, inorder, i, -1, right, inIndex);
     root.left = leftSibling;
     root.right = rightSibling;
}
return level[0];
}

Node findNextLevelOccurrenceFromSubtree(
           List<Node> level, // the level-order list, with null delimiters between levels
           List<Node> inorder, // the inorder list
           int current, // the index in level of the current element
           int left, // leftmost position of the subtree in inorder, or -1
           int right, // rightmost position of the subtree in inorder, or -1
           int inIndex) // position of the current element in inorder
{
while (level[current] != null) current++; // skip to the end of the current level
current++; // move past the delimiter to the start of the next level
if (left > 0){
for (int i = inIndex; i >= left; i--) // left subtree only
      for (int j = current; level[j] != null; j++)  // next level only
     if (inorder[i] == level[j])
         return inorder[i];
}
if (right > 0)
{
for (int i = inIndex; i <= right; i++) // right subtree only
      for (int j = current; level[j] != null; j++)  // next level only
     if (inorder[i] == level[j])
         return inorder[i];
}
 return null;
}
       

Tuesday, April 26, 2016

Yesterday we were discussing rebuilding a tree from post order and in order traversals.
Today we see if we can rebuild a tree from preorder and post order:
We make the following observations:
1) the roots are found top down by walking the post order list in the reverse manner or the preorder in the forward manner
2) the length of the left subtree for any root in preorder is given by the number of nodes between its position in the postorder and the start for the subtree of the parent.
3) the position of the start of the subtree of the parent is given by the start of the post order or the start of the sibling in the post order depending on whether it is the left or the right subtree
4) In general, we can rebuild the tree if we know the sizes of either left or right subtree consistently. We don't need both but if we do that's great.
5)  the length of the right subtree is the distance between the position of the left sibling and the root in the post order.
6) The subtrees may be null and we just need to be mindful of the size but not need to check it upfront.
7) Since items 2) and 5) indicate that we need the position of a sibling for at least the root of the whole tree to begin with, it is important to identify one of the siblings first.
8) if the element next to the root in pre is the same as the element previous to the same root in post, then one of the subtrees is null.
9) Since we cannot say which of the subtree is null from 8), this problem is not possible to solve.

Consider:
Pre: AB
Post BA
and
Pre: AC
Post: CA

                    A

             B            C

        D      E          F     G

H    I    J    K         L   M    N    O
Pre:   A  B  D  H  I  E  J  K  C  F  L  M  G  N  O
Post: H   I   D  J   K E B  L  M F N   O  G  C   A
In:    H   D   I  B   J  E K A  L  F M   C  N  G   O

Here too we see that unless the left subtree and the right subtree are listed on either side of the root in at least one sequence, the solution becomes difficult. Therefore an inorder traversal is needed.

#coding exercise
String GetSmallestNumberListed(string number, int n) // assume unique digits
{
if (String.IsNullOrEmpty(number) || n <= 0) return null;
for (int i = 0; i < number.Length; i++)
{
     if (Char.IsDigit(number[i]) == false)
     {
          return null;
     }
}
string sorted = new string(number.OrderBy(c => c).ToArray());
int returnLength = number.Length - n;
if (returnLength <= 0) return null;
sorted = sorted.Substring(0, returnLength); // the smallest returnLength digits
String result = null;
for (int i = 0; i < number.Length; i++)
{
  var candidate = string.Empty;
  if (sorted.Contains(number[i]))
  {
      candidate += number[i];
      for (int j = i+1; j < number.Length; j++)
      {
          if (sorted.Contains(number[j]))
          {
              if (candidate.Length >= returnLength) break;
              candidate += number[j];
          }
      }
      if (result == null)
      {
              result = candidate;
      }
      if (candidate.Length == returnLength && String.CompareOrdinal(candidate, result) <= 0)
      {
              result = candidate;
      }
  }
}
return result;
}

#codingexercise
int GetMissingSerialNumber(List<int> num) // serial numbers 1..n with exactly one missing
{
int n = num.Count() + 1; // the count including the missing number
int sum_of_n = n * (n + 1) / 2;
return sum_of_n - num.Sum();
}

The sorting would help narrow down the range with the missing numbers logarithmically if there were more than one.
int binarysearch(List<int> num, int target, int start, int end)
{
if (start > end) return -1;
int mid = (start + end)/2;
if (num[mid] == target) return mid;
if (num[mid] < target)
       return binarysearch(num, target, mid + 1, end);
else
       return binarysearch(num, target, start, mid - 1);
}


Monday, April 25, 2016

Yesterday we were discussing rebuilding a tree from post order and in order traversals.
We made the following observations:
1) the roots are found top down by walking the post order list in the reverse manner
2) the length of the right subtree for any root is given by the number of nodes between its position in the inorder and that of its parent.
3) the position of the left sibling in post order is at an offset equal to the length of the right subtree to the left of the root in post order. The position of the right sibling is immediately before the root.
4)  the length of the left subtree is the distance between the position of the right sibling and the root in the in order.
5) The subtrees may be null and we just need to be mindful of the size but need not check it upfront.

                    A

             B            C

        D      E          F     G

H    I    J    K         L   M    N    O
Post: H I D J K E B L  M F N O G C A
In:    H D I B J E K A  L  F M C N G O

Node Rebuild(List<Node> post, List<Node> inorder) // all nodes in the lists initialized with left = right = null and set only with data
{
assert(post != null && inorder != null);
assert(post.Count() == inorder.Count());

for (int i = post.Count() - 1; i >= 0; i--)
{
     Node root = post[i];
     int inIndex = Array.IndexOf(inorder, post[i]);
     if (i > 0)
     {
     int parent = findParentIndex(post, i); // the parent occurs to the right of the root in post with left, right already set
     int parentIn = inorder.Count(); // defaults to inorder.Count()
     if (parent < post.Count())
          parentIn = Array.IndexOf(inorder, post[parent]);
     int rightTreeLen = parentIn - inIndex - 1;
     if (i - rightTreeLen - 1 > 0)
          root.left = post[i - rightTreeLen - 1];
     int right = i - 1;
     if (right > 0)
         root.right = post[right];
     }
}
return post[post.Count() - 1];
}

int findParentIndex(List<Node> post, int current)
{
int target = post.Count();
for (int i = current+1; i< post.Count(); i++)
{
if ( post[i].left == post[current] ||
      post[i].right == post[current]){
     target  =  i;
     break;
     }
}
return target;
}

For performance gain, we also made the observation that the length of the right subtree needs to be calculated only when we know that the right subtree is not null. We already know it is null when the previous element in post order has an inorder index less than that of the root (that element then belongs to the left subtree).

Sunday, April 24, 2016

Today we look at another coding problem along the same lines as the one we have seen in the previous post and the one before.
This time we have

// Postorder   C D B F E A
// Inorder       C B D A E F
// Build tree

We use relative comparisons, which have worked for us before. Let us take the example of our elemental subtree of a root with two siblings.
   J
K  L

1) All three nodes
    Post: K L J
    InOrder: K J L
2) Only left-subtree
    Post: K J
    In: K J    // L is missing because the previous element in post order has an inIndex less than that of J
3) Only right-subtree
    Post: L J
    In: J L    // K is missing when no element appears before J whose inIndex is less than that of J
4) only root
Post:     Prev root Next
In :        Prev some-others root yet-others Next
4) is not differentiable from 2)

In general, with arbitrary subtrees for siblings:
// left-subtree missing when
the previous node in post order appears to the right of the inIndex of the root in inorder
// right-subtree missing when
the previous element in post order belongs to the left subtree.

The sizes of the left and right subtrees can be kept track of because we use the same predecessor technique.

                    A

             B            C

        D      E          F     G

H    I    J    K         L   M    N    O
Post: H I D J K E B L  M F N O G C A
In:    H D I B J E K A  L  F M C N G O

Node Rebuild(List<Node> post, List<Node> inorder) // all nodes in the lists initialized with left = right = null and set only with data
{
assert(post != null && inorder != null);
assert(post.Count() == inorder.Count());

for (int i = post.Count() - 1; i >= 0; i--)
{
Node root = post[i];
int inIndex = Array.IndexOf(inorder, post[i]);
if (i > 0)
{
// keep track of the right subtree length as the difference in inorder indexes
     int parent = findParentIndex(post, i); // the parent occurs to the right of the root in post with left, right already set
     int parentIn = inorder.Count();
     if (parent < post.Count())
          parentIn = Array.IndexOf(inorder, post[parent]);
     int rightTreeLen = parentIn - inIndex - 1;
     if (i - rightTreeLen - 1 > 0)
          root.left = post[i - rightTreeLen - 1];
     int right = i - 1;
     if (right > 0)
         root.right = post[right];
}
}
return post[post.Count() - 1];
}


The recursive solution online is as follows:
public TreeNode buildTree(int[] inorder, int inStart, int inEnd,
int[] postorder, int postStart, int postEnd) {
if (inStart > inEnd || postStart > postEnd)
return null;

int rootValue = postorder[postEnd];
TreeNode root = new TreeNode(rootValue);

int k = 0;
for (int i = 0; i < inorder.length; i++) {
if (inorder[i] == rootValue) {
k = i;
break;
}
}

root.left = buildTree(inorder, inStart, k - 1, postorder, postStart,
postStart + k - (inStart + 1));
// Because k is an index and not a length, it needs -(inStart+1) to get the length
root.right = buildTree(inorder, k + 1, inEnd, postorder, postStart + k- inStart, postEnd - 1);
// postStart+k-inStart = postStart+k-(inStart+1) +1

return root;
}



Saturday, April 23, 2016

Today we improve the performance of the previous code sample shown to reconstruct the tree from the preorder and inorder traversals.
The key to improving performance is to take only as many steps as needed to rebuild the tree, and as cheaply as possible. While this might sound cliched, it translates to our problem this way: are there any redundant operations in the previous code shown? The answer is yes - we make repeated calls to Array.IndexOf. How do we avoid this?
Because we deal with a tree, we know all test cases can be covered by the elementary tree of a single root with two siblings, where one or more nodes can be null.
Consequently, we take node J as the root with left K and right L.
we have

1) Both siblings:
Pre: JKL
In: KJL
2) Right sibling only:
Pre: JL
In: JL
3) Left Sibling only
Pre: JK
In: KJ

When the siblings are trees
4) No left child:
Pre: J some-other-node
In: J  intermediary-nodes some-other-node
5) No right child:
Pre:  J some-other-node
In: some-other-node  intermediary-nodes J


Therefore we can use relative positions and indexes instead of absolute indexes to optimize the solution. We make the following observations:
If the next element to a node in preorder has a lesser index in the inorder than the node's own, then the next element is a left child.
If the next element to a node also occupies the immediately next index in the inorder, then the next element is a right child.
If the next element to a node has an inorder index greater than the node's own + 1, then the node has no right child.
If the previous element to a node in preorder occurs as the next element in the inorder, the node has no left child.

Node Rebuild(List<Node> pre, List<Node> inorder) // all nodes in the lists initialized with left = right = null and set only with data
{
assert(pre != null && inorder != null);
assert(pre.Count() == inorder.Count());

for (int i = 0; i < pre.Count(); i++)
{
Node root = pre[i];
int inIndex = Array.IndexOf(inorder, pre[i]);
if (i+1 < pre.Count() && inIndex+1 < inorder.Count())
{
// set root.left: pre[i+1] is the left child unless it appears at or after the root in inorder
bool left = true;
for (int j = inIndex; j < inorder.Count(); j++)
     if (inorder[j] == pre[i+1])
       left = false;
if (left != false)
  root.left = pre[i+1];
// set root.right: the inorder successor of the root is the right child unless ruled out above
bool right = true;
for (int j = inIndex; j >= 0; j--)
     if (inorder[j] == pre[i+1])
       right = false;
if (right != false){
   Node target = inorder[inIndex+1];
   for (int j = i+1; j < pre.Count(); j++)
    if (pre[j] == target)
        root.right = pre[j];
}
}
}

return pre[0];
}

Some can claim that yesterday's approach was development-oriented and today's approach is test-oriented, and there may be some truth in that there are entrenched mindsets along those lines, but the reality is that today's approach is the one leaned on for real-world data.

Friday, April 22, 2016

Today we look at using the Binary tree traversals to reconstruct the tree. We mentioned in yesterday's post that the preorder traversal and the inorder traversal can be used to determine the length of the left and right subtree of any node at the same time. Let us take a look at the solution now.

// Preorder A B C D E F
// Inorder   C B D A E F
// Build tree

Node BuildTree(char[] Preorder, char[] InOrder, int index = 0)
{
 Node root = new Node();
 root.data = Preorder[index];
 root.left = null;
 root.right = null;

 int inIndex = Array.IndexOf(InOrder, Preorder[index]);

 if ( index+1 < Preorder.Length &&
       isLeftSubtree(Preorder[index+1], InOrder, inIndex) == true)
       root.left = BuildTree(Preorder, InOrder, index + 1);

 if ( inIndex+1 < InOrder.Length &&
       isPredecessor(InOrder[inIndex+1], Preorder, index) == false)
       root.right = BuildTree(Preorder, InOrder, Array.IndexOf(Preorder, InOrder[inIndex + 1]));

 return root;
}

bool isPredecessor(char c, char[] Preorder, int index)
{
return Array.IndexOf(Preorder, c) < index;
}

bool isLeftSubtree(char c, char[] InOrder, int index)
{
return Array.IndexOf(InOrder, c) < index;
}

or iteratively as follows:


Node BuildTree(List<Node> pre, List<Node> inorder) // Node.left = null; Node.right = null;
{
for (int i = 0; i < pre.Count(); i++)
{
int index = i;
int inIndex = Array.IndexOf(inorder, pre[index]);
if ( index + 1 < pre.Count() &&
     Array.IndexOf(inorder, pre[index+1]) < inIndex)
   pre[i].left = pre[index+1];
if (inIndex+1 < inorder.Count() &&
    Array.IndexOf(pre, inorder[inIndex+1]) >= index)
    pre[i].right = pre[Array.IndexOf(pre, inorder[inIndex+1])];
}
return pre[0];
}

Thursday, April 21, 2016

Today we look at a few more coding problems. Yesterday we said the traversal orders can be used to serialize the tree. If we are given a preorder serialization of a tree, can we verify that it is valid without reconstructing the tree? The check can then be linear in time complexity.
Let us take an example:
    9
 3    2
4 1 # 6
is serialized as 9,3,4,#,#,1,#,#,2,#,6,#,#

Note that we can remove the rightmost vertex if it has two null leaves and substitute it with null.
If this is not the case, then the parent has a left subtree and we start to do the same with the left subtree until finally there is only one node remaining and it is null.

bool IsValidBT(List<Node> nodes)
{
bool isValid = false;
  int i = nodes.Count - 1;
  while (i >= 3)
  {
      if (nodes[i] == null && nodes[i-1] == null && nodes[i-2] != null){
         nodes.RemoveRange(i-2, 3); // collapse a vertex with two null leaves into a single null
         nodes.Insert(i-2, null);
        i = i - 3;
        continue;
      }
      if (nodes[i] != null){
          if (i + 2 < nodes.Count && nodes[i+1] == null && nodes[i+2] == null){
           nodes.RemoveRange(i, 3);
           nodes.Insert(i, null);
          }
          i = i - 1;
        continue;
      }
     i = i - 1;
  }
  if (i == 0 && nodes[i] == null) isValid = true;
  return isValid;
}

If we have a binary search tree and we have an inorder and preorder traversals,
then the two traversals can be used to simultaneously find the left and right subtree length of each node.

Wednesday, April 20, 2016

Today we continue more coding puzzles reviews. One of the problems is about serializing and deserializing a binary tree.
There are many ways to do it. Usually two different orders of traversal of a binary tree can give all the information needed to deserialize it. They are also efficient because each node is visited once during a traversal. Another way to do this would be to print the tree level-wise with null nodes for the missing children. Traversing level-wise ensures a single serialization result.
void serialize(Node root, Node delimiter, Node empty)
{
var q = new Queue<Node>();
q.Enqueue(root);
q.Enqueue(delimiter);

while (q.Count > 0)
{
var node = q.Dequeue();
Console.Write(node.toString());

if (node == delimiter)
{
      if (q.Count > 0) q.Enqueue(delimiter); // close this level and open the next
      continue;
}
if (node == empty) continue; // empty markers have no children

if (node.left != null)
      q.Enqueue(node.left);
else
      q.Enqueue(empty); // placeholder for a missing left child

if (node.right != null)
      q.Enqueue(node.right);
else
      q.Enqueue(empty); // placeholder for a missing right child
}
}

The deserialization now will find the left and right child nodes for every parent node, even if they are null.

void deserialize(List<Node> nodes, Node delimiter)
{
int level = nodes.IndexOf(delimiter); // the first delimiter follows the root
level++;
for (int i = 0; i < nodes.Count() && level < nodes.Count(); i++)
{
   if (nodes[i] != delimiter)
   {
      nodes[i].left = nodes[level]; level++;
      nodes[i].right = nodes[level]; level++;
   }
   else
   {
        assert(nodes[level] == delimiter);
        level++;
   }
}
}

Let's take an example:
   1
2   3
    4  5
1D23DEE45   (with D for the level delimiter and E for an empty node)




Tuesday, April 19, 2016

Today we continue to review a coding problem.
The problem statement:
There are n bulbs that are initially off. You first turn on all the bulbs. Then, you turn off every second bulb. On the third round, you toggle every third bulb (turning it on if it's off or turning it off if it's on). For the ith round, you toggle every ith bulb. For the nth round, you only toggle the last bulb. Find how many bulbs are on after n rounds.
for example if there are five bulbs:
Initially  [ 0, 0, 0, 0, 0]
Round 1 [ 1, 1, 1, 1, 1]
Round 2 [ 1, 0, 1, 0, 1]
Round 3 [ 1, 0, 0, 0, 1]
Round 4 [ 1, 0, 0, 1, 1]
Round 5 [ 1, 0, 0, 1, 0]

The solution:

int bulbSwitch(int n){
      return (int)Math.Sqrt(n);
}

The reasoning:
A bulb ends up on only if it is switched an odd number of times.
Among bulbs 1 to n, bulb i is switched in round d only if d divides i.
Bulb i is in a switched-on state only if it has an odd number of divisors.
But divisors come in pairs except for perfect squares, and paired toggles return a bulb to its original state,
so only the squares are left toggled, and the initial state is all off.
Therefore we count only the squares up to and including the number.
Thus the answer is Math.Sqrt(n).

courtesy: Stefan Pochmann
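The reasoning can be sanity-checked against a brute-force simulation; a small sketch:

int BulbSwitchBruteForce(int n)
{
    var on = new bool[n + 1]; // index 0 unused
    for (int round = 1; round <= n; round++)
        for (int i = round; i <= n; i += round) // toggle every round-th bulb
            on[i] = !on[i];
    int count = 0;
    for (int i = 1; i <= n; i++)
        if (on[i]) count++;
    return count; // matches (int)Math.Sqrt(n)
}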

We implement square root with Newton's method (the Babylonian method): if y ^ 2 = x then y = x / y,
and we iteratively improve the approximation by averaging.
double SqRt(int n)
{
double g = n / 2.0;
double y = n / g;
while (Math.Abs(y - g) > 0.001)
{
   g = (g + n / g) / 2;
   y = n / g;
}
return y;
}

Monday, April 18, 2016

Today we continue to review a few more dynamic programming problems. We discussed the coin counting problem. Let us look at a variation that asks for the number of ways in which coins can make a sum. We assume an infinite supply of coins, and we could do this by backtracking.
void CoinCount(ref List<int> coins, ref List<int> change, int amount, int start, int level)
{
for (int i = start; i < coins.Count(); i++)
{
for (int j = 1; j <= amount / coins[i]; j++)
{
for (int k = 0; k < j; k++){
       change[level+k] = coins[i]; // place j copies of the current coin
}

if (change.Sum() == amount) Console.WriteLine("{0} with {1} elements", change.ToString(), change.Count());

if (change.Sum() < amount)
      CoinCount(ref coins, ref change, amount, i + 1, level + j);

for (int k = 0; k < j; k++){
      change[level+k] = 0; // undo the placement before trying the next count
}
}
}
}
From all the change sets printed, we can find the total count as the answer.
Note that we cannot apply the knapsack solution to the earlier coin counting problem: a top-down greedy choice method does not apply because we don't have a criterion to decide upfront how to generate a sequence that satisfies the sum.

The dynamic programming approach builds the solution in a bottom-up manner. For each of the given denominations, either the denomination is part of the solution or it isn't. If the denomination is part of the solution, then the subproblem shrinks the sum; otherwise it shrinks the denominations. If the denomination is part of the solution, then it shrinks the sum by any one of 1 to amount/denomination occurrences. Since there are overlapping subproblems, we do well to memoize the computations for reuse, as sketched below.
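A minimal bottom-up sketch of that formulation; ways[sum] counts the combinations, and taking the denominations one at a time in the outer loop avoids double-counting orderings:

int CountWays(List<int> coins, int amount)
{
    var ways = new int[amount + 1];
    ways[0] = 1; // one way to make zero: use no coins
    foreach (var coin in coins)
        for (int sum = coin; sum <= amount; sum++)
            ways[sum] += ways[sum - coin]; // either skip this coin or include one more of it
    return ways[amount];
}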

We now look at a different problem, which is to reconstruct an itinerary. Itineraries consist of source-destination pairs. They belong to the same traveler, so they all connect. Build the itinerary starting from JFK and in the smallest lexical order.

List<string> GetItinerary(List<Tuple<string, string>> segments)
{
 segments.Sort(); // based on a source-airport comparator, for the smallest lexical order

var startItem = segments.Find(x => x.Item1 == "JFK");
segments.Remove(startItem);

var ret = new List<string>(){ startItem.Item1, startItem.Item2 };
while (segments.Count > 0)
{
    var next = segments.Find(x => x.Item1 == startItem.Item2);
    ret.Add(next.Item2);
    segments.Remove(next);
    startItem = next;
}
return ret;
}
This greedy walk assumes at most one departure per airport; when an airport has several outgoing segments, a backtracking or Eulerian-path approach is needed to guarantee the smallest lexical itinerary.

Allocate minimum number of pages to students
N books
Pi pages i = 1 to N
M students
int GetMinPages(int N, int M, List<int> Pi)
{
assert(Pi.Count() == N);
if (N == 0) return 0;
if (M == 0) return INT_MAX;
int min = INT_MAX;
for (int i = 0; i < N; i++)
{
   // the first student reads books 0..i; the remaining books go to M-1 students;
   // the cost of a split is the larger of the two loads, and we minimize over splits
   int val = max(Pi.Range(0, i+1).Sum(),
                 GetMinPages(N-(i+1), M-1, Pi.Range(i+1, N-(i+1))));
   if (val < min)
      min = val;
}
return min;
}

Sunday, April 17, 2016

Today we review the code for the unit and fractional knapsack problems. In these problems, a thief is trying to fill his bag with as much value in items as possible while staying within the bag's weight capacity. The solution looks like this for the unit (0-1) case:
A recursive solution:
// Input:
// Values (stored in array val)
// Weights (stored in array w)
// Number of distinct items (n)
// Knapsack capacity (W)
int knapsack( int W, int[] wt, int[] val, int n)
{
if (n== 0 || W == 0)
    return 0;
if (wt[n-1] > W)
   return knapsack(W, wt, val, n-1);

return max(
                    val[n-1] + knapsack(W-wt[n-1], wt, val, n-1),
                    knapsack(W, wt, val, n -1)
                   );
}
In the fractional case, an item may be taken whole, taken in part to fill the remaining capacity, or skipped:
int fractional-knapsack( int W, int[] wt, int[] val, int n) // wt is sorted based on val[i]/wt[i]
{
if (n == 0 || W == 0)
    return 0;

return max(
                    val[n-1] + fractional-knapsack(W - wt[n-1], wt, val, n-1), // take item n-1 whole
                    (val[n-1] * W) / wt[n-1], // take only the fraction of item n-1 that fills the bag
                    fractional-knapsack(W, wt, val, n-1) // skip item n-1
                   );
}
which we can further improve by leaving the dynamic programming approach and applying the top down greedy choice property
wt.sort() // based on val[i]/wt[i]
n = wt.Count()
int fractional-knapsack-greedy-recursive(int W, int[] wt, int[] val, int n)
{
if (n== 0 || W == 0)
    return 0;

// first the selection
if (wt[n-1] > W)
{
return val[n-1]/wt[n-1]*W;
}
else
{
    // then the recursion
    return val[n-1] + fractional-knapsack-greedy-recursive(W-wt[n-1], wt, val, n-1);
}
}

Saturday, April 16, 2016

Today we review the longest common subsequence instead of the longest increasing subsequence.
The LCS is based on the following optimal substructure, where X, Y are sequences and Z is the LCS:
1. if xm = yn, then zk = xm = yn and Zk-1 is the LCS of Xm-1 and Yn-1
2. if xm <> yn, then zk <> xm implies that Z is an LCS of Xm-1 and Y.
3. if xm <> yn, then zk <> yn, implies that Z is an LCS of X and Yn-1

The length of the longest common subsequence c[i,j] is :
1. 0 if i = 0 or j = 0
2. c[i-1, j-1] + 1 if i,j > 0 and xi = yj
3. max(c[i,j-1], c[i-1, j]) if i, j  > 0 and xi != yj

LCS-Length(x, y)
m <- length(x)
n<-length(y)
for i = 1 to m
     c[i, 0] = 0
for j = 0 to n
     c[0,j] = 0
for i = 1 to m
      for j = 1 to n
            if xi == yj
               c[i,j] = c[i -1, j-1] + 1
               b[i,j] = up-left
            elseif c[i-1,j]  >= c[i, j-1]
               c[i,j] = c[i-1, j]
               b[i,j] = up
            else c[i,j] = c[i,j-1]
               b[i,j] = left
return c and b
Now the LCS can be printed with :
PRINT-LCS( b, X, i, j) // PRINT-LCS(b, X, length(X), length(Y))
if i = 0 or j = 0
     then return
if b[i,j] == up-left
     then PRINT-LCS (b, X, i-1, j-1)
             print xi
else if b[i,j] == up
           then PRINT-LCS(b,X,i-1,j)
else PRINT-LCS(b,X, i, j-1)

It can be hard to choose between the longest common subsequence and the longest increasing subsequence as the solution to a problem. For example, consider the box stacking problem, where boxes of different l x b x h exist and we have to stack them up to form the tallest tower.

Previously in an earlier blog post we incorrectly referred to this as a longest common subsequence problem, but it is in fact a longest increasing subsequence that governs the formation of the tower.

The solution remains the same; msh[i] is the maximum stack height with box i on top:
var msh = new List<int>(Enumerable.Repeat(1, n)); // each box alone is a stack of height 1
for (int i = 1; i < n; i++)
   for (int j = 0; j < i; j++)
    {
                if (w[j]*l[j] > w[i]*l[i] && msh[j] + 1 > msh[i])
                    msh[i] = msh[j] + 1;
    }
return msh.Max();

While dynamic programming considers choices repeatedly, backtracking never considers an invalid option and thus reduces the number of options from the full set of permutations.

Friday, April 15, 2016

Today we review another coding problem
Problem: There are a lot of cuboid boxes of arbitrary sizes.  Find the largest subset of cubes that can fit into each other.
Solution: This problem can be simplified if we think about cubes. Cubes can fit into each other based on their sorted edges. This reminds us of the longest increasing subsequence (LIS) problem.
We recall the solution of the LIS problem as :
x can be a single-element increasing sequence or
x can be appended at the end of the longest increasing subsequence ending at some y provided that y precedes x and y < x
For example, 
Edges:                    11   1  10   4   3   2   8   7  12   6   9   5
Length of subsequence:     1   1   2   2   2   2   3   3   4   3   4   3
This we can now code as :
int getLIS(List<int> cubeedges)
{
var lis = Enumerable.Repeat(1, cubeedges.Count()).ToList(); // every element alone is an increasing subsequence of length 1
for (int i = 0; i < cubeedges.Count(); i++)
{
 for (int j = 0; j < i; j++)
  {
      if (cubeedges[j] < cubeedges[i] && lis[j] + 1 > lis[i])
                     lis[i] = lis[j] + 1;
   }
}
return lis.Max();
}
The above is O(N^2). We can have  a O( N LogN ) solution also
Formally,
let X represent the input sequence.
Let M[j] store the index k of the smallest value X[k] such that there is an increasing subsequence of length j ending at X[k], with k <= i; here j represents the length of the increasing subsequence and k represents the index of its termination.
And let P[k] store the index of the predecessor of X[k].
The longest increasing subsequence at any given point is given by X[M[1]], X[M[2]], ..., X[M[L]]; if there is an increasing subsequence of length i then there is one of length i-1 ending at X[P[M[i]]], and between these bounds we can do a logarithmic search for the right length.
Therefore the steps are:
P = array of length N      # all initialized to zero
M = array of length N + 1  # pad on the left so that we can use one-based lengths
L = 0
for i in range 0 to N-1:
      # binary search for the largest positive j <= L such that X[M[j]] < X[i]
      lo = 1
      hi = L
      # the binary search between lo and hi adjusts lo and hi;
      # after searching, lo is 1 greater than the length of the longest prefix
      newL = lo
      # the predecessor of X[i] is the last index of the subsequence of length newL-1
      P[i] = M[newL - 1]
      M[newL] = i
      if newL > L:
          # we found a subsequence longer than L, so update L
          L = newL

# reconstruct the longest increasing subsequence
T = array of length L
k = M[L]
for i in range L-1 to 0:
    T[i] = X[k]
    k = P[k]

return T
The only difference between cubes and cuboids is that we compare not one dimension but all three.

This means that all the edges of one must be smaller than the corresponding edges of the other.

Thursday, April 14, 2016

Today we review a few more coding questions.

We can compare the coin counting puzzle with the classical knapsack problem. The knapsack problem is defined this way:
A thief robbing a store finds n items.
The ith item is worth vi dollars and weighs wi pounds, where vi and wi are
integers. The thief wants to take as valuable a load as possible, but he can carry at
most W pounds in his knapsack, for some integer W . Which items should he take?
(We call this the 0-1 knapsack problem because for each item, the thief must either
take it or leave it behind; he cannot take a fractional amount of an item or take an
item more than once.)

If the thief removes an item j from this load, the remaining load must be the most valuable load, weighing at most W-wj that the thief can take from the n-1 original items excluding j .
Therefore the knapsack problem can be written as follows:
case 1: with item j included, find the maximum value among the n-1 remaining items to fill a bag that can hold W-wj
case 2: with item j excluded, find the maximum value among the n-1 remaining items to fill a bag that can hold W

// Input:
// Values (stored in array v)
// Weights (stored in array w)
// Number of distinct items (n)
// Knapsack capacity (W)

for j from 0 to W do:
     m[0, j] := 0

for i from 1 to n do:
     for j from 0 to W do:
         if w[i-1] > j then:
             m[i, j] := m[i-1, j]
         else:
             m[i, j] := max(m[i-1, j], m[i-1, j-w[i-1]] + v[i-1])



The knapsack and the coin change problems are similar because we have to select the fewest high-denomination
coins to reach a sum:

we substitute the capacity W with the sum to reach, keep the number of distinct items as n, use the denominations as the weights, and give each item a value of 1. We then find the minimum instead of the maximum value as the count of coins.
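Spelled out, this substitution gives a table indexed by the running sum; a sketch, with an int.MaxValue sentinel marking sums that cannot be reached:

int MinCoins(int[] denominations, int sum)
{
    var m = new int[sum + 1];
    for (int j = 1; j <= sum; j++) m[j] = int.MaxValue; // unreachable until proven otherwise
    for (int j = 1; j <= sum; j++)
        foreach (var d in denominations)
            if (d <= j && m[j - d] != int.MaxValue && m[j - d] + 1 < m[j])
                m[j] = m[j - d] + 1; // take one coin of denomination d
    return m[sum]; // int.MaxValue when the sum cannot be made
}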

Wednesday, April 13, 2016

We can solve the previous problem by using combinations as shown below:
void CoinCount(ref List<int> coins, ref List<int> change, int amount, int start, int level)
{
for (int i = start; i < coins.Count(); i++)
{
for (int j = 1; j <= amount / coins[i]; j++)
{
for (int k = 0; k < j; k++){
       change[level+k] = coins[i]; // place j copies of the current coin
}

if (change.Sum() == amount) Console.WriteLine("{0} with {1} elements", change.ToString(), change.Count());

if (change.Sum() < amount)
      CoinCount(ref coins, ref change, amount, i + 1, level + j);

for (int k = 0; k < j; k++){
      change[level+k] = 0; // undo the placement before trying the next count
}
}
}
}


The dynamic programming approach is as follows:

//
// returns the maximum sum that cannot be met by the given coins
//
// start is for the type of coin and end is for the number of the same coin; all coins
// appear in sorted, duplicated manner and [start, end] marks a contiguous range,
// with start = 0 and end = change.Count initially
public int CoinSumDP(List<int> coins, ref List<int> change, int amount, int start, int end)
{
// pseudo code
if (coins.Count() == 1) return coins[0];
int sum = 0;
for (int i = 0; i < coins.Count(); i++)
{
    var q = CoinSumDP(coins.ChooseK(), ref change, amount, start, end) +
                  CoinSumDP(coins[start].Repeat(end - start), ref change, amount, start, end);
    if (q > sum)
        sum = q;
}
return sum;
}

#leetcode codingexercise
You are given coins of different denominations and a total amount of money amount. Write a function to compute the fewest number of coins that you need to make up that amount.

Example 1:
coins = [1, 2, 5], amount = 11
return 3 (11 = 5 + 5 + 1)

Example 2:
coins = [2], amount = 3
return -1.

Note:
You may assume that you have an infinite number of each kind of coin.

Solution:
Let lookup[n] be the fewest coins needed to make the sum n, with lookup[0] = 0. For each coin of value c and each reachable sum n >= c, taking one such coin leaves the subproblem lookup[n - c]; so lookup[n] is the minimum of lookup[n - c] + 1 over all usable coins, and the answer is lookup[amount], or -1 if that sum is never reached.

public int CoinChange(List<int> coins, int amount) {
    var lookup = new int[amount + 1]; // lookup[n] = fewest coins to make n; 0 means not yet reachable

    for (int i = 0; i < coins.Count; i++)
    {
        for (int n = 1; n <= amount; n++)
        {
            if (n >= coins[i] && (n == coins[i] || lookup[n - coins[i]] > 0))
            {
                int countWithNewCoin = lookup[n - coins[i]] + 1;
                if (countWithNewCoin < lookup[n] || lookup[n] == 0)
                    lookup[n] = countWithNewCoin;
            }
        }
    }
    if (amount > 0 && lookup[amount] == 0) return -1; // the amount cannot be made
    return lookup[amount];
}
Coins: 1, 3; amount: 5
lookup index:     0  1  2  3  4  5
after coin 1:         1  2  3  4  5
after coin 3:         1  2  1  2  3
Courtesy: internet posts, Prismo Skills

Tuesday, April 12, 2016

Given a positive integer n, find the least number of perfect square numbers (for example, 1, 4, 9, 16, ...) which sum to n.

For example, given n = 12, return 3 because 12 = 4 + 4 + 4; given n = 13, return 2 because 13 = 4 + 9.

Any combination of perfect square numbers, with or without repetitions, may yield the sum.

We enumerate them all and find the combination with the fewest numbers.

Note that the greedy approach does not work: for example, to get 12 we should not take 9 + 1 + 1 + 1 (four numbers) when 4 + 4 + 4 (three numbers) suffices.

void SquareSums(ref List<int> perfectsquares, ref List<int> candidate, int n, int start, int level)
{
for (int i = start; i < perfectsquares.Count(); i++)
{
for (int j = 1; j <= n / perfectsquares[i]; j++)
{
for (int k = 0; k < j; k++){
       candidate[level+k] = perfectsquares[i]; // place j copies of the current square
}

if (candidate.Sum() == n) Console.WriteLine("{0} with {1} elements", candidate.ToString(), candidate.Count());

if (candidate.Sum() < n)
      SquareSums(ref perfectsquares, ref candidate, n, i + 1, level + j);

for (int k = 0; k < j; k++){
      candidate[level+k] = 0; // undo the placement before trying the next count
}
}
}
}

Monday, April 11, 2016

Given an array with n objects colored red, white or blue, sort them so that objects of the same color are adjacent, with the colors in the order red, white and blue.
Here, we will use the integers 0, 1, and 2 to represent the color red, white, and blue respectively.

We make two passes

First pass put all the reds together by exchanging contiguous positions with red only

Second pass with all  whites together

The rest automatically follow.

void SortColors(ref List<int> colors)
{
for (int color = 0; color < 2; color++)
for (int src = 0; src < colors.Count(); src++)
{
if (color > 0 && colors[src] <= color) continue; // already placed in an earlier pass
 int d = src + 1;
if (colors[src] != color){
   while (d < colors.Count() && colors[d] != color) d++;
    if (d != colors.Count()) { int tmp = colors[src]; colors[src] = colors[d]; colors[d] = tmp; } // swap
}
}
}
#coding exercise
Write a program to find the nth ugly number. An ugly number is one whose only prime factors are 2, 3 or 5.
Solution: an ugly number must be formed by multiplying a smaller ugly number by 2, 3 or 5.

List<int> getUgly(int n)
{
var ret = new List<int>() {1};
// indexes of the last candidates used for the corresponding multiples
int i2 = 0;
int i3 = 0;
int i5 = 0;
while (ret.Count() < n) // generate until we have n ugly numbers; the nth is ret.Last()
{
   int min = Math.Min(GetProduct(ref ret, 2, i2),
             Math.Min(GetProduct(ref ret, 3, i3),
                      GetProduct(ref ret, 5, i5)));
   if (min == GetProduct(ref ret, 2, i2))
      i2 = i2 + 1;
   if (min == GetProduct(ref ret, 3, i3))
      i3 = i3 + 1;
   if (min == GetProduct(ref ret, 5, i5))
      i5 = i5 + 1;
   if (ret.Contains(min) == false) // this check can be commented out because equal candidates advance together
        ret.Add(min);
}
return ret;
}
int GetProduct(ref List<int> ret, int multiplier, int index)
{
    return ret[index] * multiplier;
}
For example, the first 11 ugly numbers are:
ret: 1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 15
with the indexes left at i2 = 6, i3 = 5, i5 = 3.

Sunday, April 10, 2016

Enterprise scale message queues

Public clouds offer message queue services as a background task processor. Messages may be sent to queues that are set up with a processor to handle the messages. Since the messages are consumed offline, they are ideal for background tasks that do not otherwise fall under the scope of the online services or their Application Programming Interface (API).
This post describes how message queues may be made available in a private cloud for end-user consumption. If we choose full-service message queue software, we will inevitably find a distributed cluster as the topology to deploy. The distributed cluster works even on the same physical machine and is already geared towards performance and scalability even on a limited resource. However, the choice of software and hardware and their configurations is important to study so that a suitable arrangement can then be made to handle different sizes of workloads. As we will shortly see, there are many options for rolling out message queue services. Moreover, the message queue services are considered here to be deployed in the cloud. If they were personal to each cloud user, then we could have treated them as an application and rolled them out with each server that is issued to customers.
Let us therefore see how the enterprise wide message queue services needs to be laid out to the customers of a private cloud.
First we choose the number of instances of the message queue cluster and the cluster size. We have already discussed that being as granular as one instance for every server does not offer the power that cloud computing users expect. On the other end of the spectrum, having a large cluster as a single dedicated instance for private cloud users has the benefit that it can be made organization-specific and at the same time be centrally managed and handled for all users. Moreover, this logical instance can be internally represented by a syndication of clusters across different regions, hiding the complexity from the user. After all, the user expects an endpoint and access to the cluster regardless of how it is set up.
Second the cluster size can vary depending on the loads because most message queue software allow the addition of nodes to increase the computing power.  This means that if we go with a single instance then we can arbitrarily scale out the cluster to meet the traffic. This is also more efficient when it is offered as a managed service
Lastly, we consider the nature of this software and provision accordingly. Customers are likely to use the queues in one of the following ways each of which implies a different layout for the same instance and cluster size.
1.       The queues may be mirrored across multiple nodes – This results in a choice of using clusters
2.       The queues may be chained where one feeds into the other – This results in a choice of using federation
3.       The queues may be arbitrary depending on application needs – This results in a build your own or the shovel work
Conclusion: A private cloud message queue offering has to be carefully planned and could begin with a single managed organization wide distributed instance.
#codingexercise
Given a non-negative integer num, for every number i in the range 0 ≤ i ≤ num, calculate the number of 1's in its binary representation and return them as an array.
List<int> GetBits(int num)
{
var counts = new List<int>();
for (int i = 0; i <= num; i++)
{
   List<Bit> bits = i.toBits();
   counts.Add(bits.Count(x => x == 0x1)); // the number of set bits in i
}
return counts;
}


Saturday, April 9, 2016

In yesterday's problem solving, we did not really do dynamic programming so much as generate paths.
Since the overlapping subproblems can also be expressed as a relationship between the longest paths of the neighbors and that of the current cell, we could also solve the problem without redundancy.
For example, we can say that longest_path_at(current) = max(longest_path_of_neighbors) + 1, and the longest path with each cell of the matrix as the origin can be computed and memoized so that we don't repeat the calculations there again, as sketched below.
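A sketch of that memoized formulation; memo[i, j] caches the longest increasing path starting at cell (i, j):

int LongestPathFrom(int[,] matrix, int rows, int cols, int i, int j, int[,] memo)
{
    if (memo[i, j] != 0) return memo[i, j]; // already computed
    int best = 1; // the cell by itself
    int[] di = { 0, 0, -1, 1 };
    int[] dj = { -1, 1, 0, 0 };
    for (int k = 0; k < 4; k++)
    {
        int ni = i + di[k], nj = j + dj[k];
        if (ni >= 0 && ni < rows && nj >= 0 && nj < cols && matrix[ni, nj] > matrix[i, j])
            best = Math.Max(best, 1 + LongestPathFrom(matrix, rows, cols, ni, nj, memo));
    }
    memo[i, j] = best;
    return best;
}

The answer for the whole matrix is the maximum of LongestPathFrom over all cells, and each cell is expanded only once.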
Another coding interview question:
Given a list of unique words, find all pairs of distinct indices (i, j) in the given list such that the concatenation of the two words, i.e. words[i] + words[j], is a palindrome.
The naive approach would be to iterate the list to enumerate the pairs, which is O(N^2), and then concatenate each pair and test it for being a palindrome.
something like
List<Tuple<int,int>> GetPalindromeIndices(List<string> words)
{
var ret = new List<Tuple<int,int>>();
for (int i = 0; i < words.Count(); i++)
{
for (int j = 0; j < words.Count(); j++)
{
if (i != j && IsPalindrome(words[i] + words[j]))
    {
       ret.Add(Tuple.Create(i, j));
    }
}
}
return ret;
}
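The IsPalindrome helper assumed above can be a simple two-pointer check:

bool IsPalindrome(string s)
{
    int left = 0, right = s.Length - 1;
    while (left < right)
    {
        if (s[left] != s[right]) return false;
        left++;
        right--;
    }
    return true;
}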
An improvement would be to sort the words in the list; then, if a word has a midpoint for a reversal within it, find the complementary word, or else find the reverse of the whole current word in the list. This is a binary search since we know what we are looking for. Note that the word pair lengths need not be the same, hence the midpoint may be contained in one of the words. After we have found all the pairs of words, we iterate the list once to look up their corresponding indices.

Detecting the midpoint of a word is only possible when the letters start to appear reversed; therefore we know what the remainder of the word should be. Otherwise, if the midpoint is not in the word, the other word is a complete reverse of this one.

Friday, April 8, 2016

Yesterday we were solving a puzzle as below.
You are given an array x of n positive numbers. You start at point (0,0) and moves x[0] metres to the north, then x[1] metres to the west, x[2] metres to the south,x[3] metres to the east and so on. In other words, after each move your direction changes counter-clockwise. Find out if the paths cross.
Today we discuss the space and time complexity of the solution. We use a set of start and end points for both the current and the previous axis segments. We update them as we linearly traverse the list of displacements until either we cross the path or we reach the end of the given list. Thus the space complexity is O(1) and the time complexity is O(n).
We also review another coding problem.
This is the longest increasing path problem.
Given an integer matrix we have to find the length of the longest increasing path. From any cell, the path is formed by traversing left right up or down but not diagonally or outside the boundary. Since the path consists of strictly increasing sequence of selected elements from the matrix, it is a sorted sequence. We are only interested in the number of elements.
If we start with a cell, we will have one or more choices to pick the next element for a path, because only a few of the elements, if any, will be bigger than the current element. For each of those elements we have to repeat the same problem again.
There are three key aspects to recognize:
1) the problem has a termination condition: we stop when we cannot add another element to the path, because every element adjacent to the current element is of lesser or equal value, or the current element is at the boundary, or both.
2) each element participates as the origin of a path or as an interim element of some other element's path. These are called overlapping sub-problems.
3) for all possible choices of next steps from the current element, we have the same problem all over again. This condition is for making progress in our path and we do it with recursion. Recursion means we form a solution that takes care of termination and progress to the next step, at which time we re-enter the solution with the next element as the current.
Therefore the solution looks like the following:

void GetLongestPath(int[,] matrix, int rows, int cols, int i, int j, ref List<int> path)
{
path.Add(matrix[i,j]);
bool progress = false;
if (j-1 >= 0 && matrix[i,j-1] > matrix[i,j]) { progress = true; GetLongestPath(matrix, rows, cols, i, j-1, ref path); }
if (i-1 >= 0 && matrix[i-1,j] > matrix[i,j]) { progress = true; GetLongestPath(matrix, rows, cols, i-1, j, ref path); }
if (j+1 < cols && matrix[i,j+1] > matrix[i,j]) { progress = true; GetLongestPath(matrix, rows, cols, i, j+1, ref path); }
if (i+1 < rows && matrix[i+1,j] > matrix[i,j]) { progress = true; GetLongestPath(matrix, rows, cols, i+1, j, ref path); }
if (progress == false) Console.WriteLine("{0} items in {1}", path.Count(), path.toString());
path.RemoveAt(path.Count() - 1); // backtrack
}

Thursday, April 7, 2016

#puzzle
You are given an array x of n positive numbers. You start at point (0,0) and moves x[0] metres to the north, then x[1] metres to the west, x[2] metres to the south,x[3] metres to the east and so on. In other words, after each move your direction changes counter-clockwise. Find out if the paths cross.
Solution
Keep track of the axis segment from four steps behind:
if the current axis segment is perpendicular to the one four steps behind (the directions repeat in a pattern) and a point of the tracked segment lies on the current segment (either the x or the y is the same, and we know which because they alternate), then the paths cross.
bool axis_segments_cross( int x1, int y1, int x2, int y2,
                               int x3, int y3, int x4, int y4)
{
        if (x1 == x2)
        {
            assert(y3 == y4);
            if (y1 <= y3 && y3<= y2) return true;
            else return false;
        }
        if (y1 == y2)
       {
           assert(x3 == x4);
           if (x1 <= x3 && x3 <= x2) return true;
           else return false;
       }
        if (x3 == x4)
        {
            assert(y1 == y2);
            if (y3 <= y2 && y2<= y4) return true;
            else return false;
        }
        if (y3 == y4)
       {
           assert(x1 == x2);
           if (x3 <= x1 && x1 <= x4) return true;
           else return false;
       }
       return false;
}

bool update_axis_segments(List<int> x)
{
   Point curr_start, curr_end;  // current displacement
   Point prev_start, prev_end;  // the displacement from 4 steps back
   for (int i = 0; i < x.Count; i++)
   {
       // update: snapshot the segment from four steps back, then extend in the current direction
       switch (i % 4)
       {
         case 0: curr_end.y += x[i]; break; // North; also snapshot the previous segment here
         case 1: curr_end.x -= x[i]; break; // West
         case 2: curr_end.y -= x[i]; break; // South
         case 3: curr_end.x += x[i]; break; // East
       }
       // check
       if (axis_segments_cross(curr_start.x, curr_start.y, curr_end.x, curr_end.y,
                               prev_start.x, prev_start.y, prev_end.x, prev_end.y) == true) return true;
   }
   return false;
}

Wednesday, April 6, 2016

Today we start reading the paper Pelican: a building block for exascale cold data storage by Balakrishnan et al. Cold data is data that is rarely accessed. Pelican is a rack-scale hard-disk based storage unit designed as the basic building block for Exabyte scale storage for cold data. A rack-scale set of computers is a class of hardware with pooled storage, compute and network such that thousands of cores are interconnected at high speed. A rack is increasingly replacing the individual server as the basic building block of modern data centers. At Microsoft, the Rack-scale Computing project aims to redesign hardware, storage, networking and OS stacks in a cross-layer co-design manner so as to improve efficiencies. Pelican aims to be a building block of this size for cold data workloads. The provisioning is considered just right when the total cost of ownership reduces as compared to traditional disk-based storage. For example, only 8% of the drives can be concurrently spinning in this building block. This introduces complex resource management to be handled by Pelican.
 This is met in the following way.  Resource restrictions are expressed as constraints over the harddrive. The data layer and IO scheduling ensure that these constraints are not violated. Pelican and  conventional storage are compared side by side using a cross validated simulator. Pelican demonstrates high throughput with acceptable latency for cold workloads.
The cost of storing data appears as a sliding scale. The storage in Amazon S3 for example is near the high end. But cold data is also growing phenomenally. Amazon Glacier and Facebook cold data storage are just a few examples where S3 doesn't suffice and cloud scale tiers need to be provisioned. Moreover these systems have to minimize up front capital costs of buying the storage as well as the costs of running the storage. By co-designing hardware and software stacks, Pelican aims to allow the entire rack's resources to be carefully balanced to support cold data workloads.
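As a rough illustration (not from the paper), the spinning constraint could be modeled as a budget check that an IO scheduler consults before servicing a request; the names and the modeling here are assumptions for the sketch, only the 8% figure comes from the paper:

// hypothetical sketch of a spin-up budget: at most 8% of drives spinning at once
bool TrySpinUp(HashSet<int> spinning, int totalDrives, int drive)
{
    int budget = totalDrives * 8 / 100; // concurrency constraint
    if (spinning.Contains(drive))
        return true; // already spinning
    if (spinning.Count >= budget)
        return false; // the IO must be queued until another drive spins down
    spinning.Add(drive);
    return true;
}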
#coding exercise
Given a sorted positive integer array nums and an integer n, add/patch elements to the array such that any number in range [1, n] inclusive can be formed by the sum of some elements in the array. Return the minimum number of patches required.
This has a greedy approach.
If the range of sums that can be generated using the first k numbers in nums is from 0 to r, and the next number has value x, then either:
• x <= r+1 and it is possible to generate any sum from 0 to r+x using the first k+1 numbers, or
• x > r + 1 and it is not possible to generate the sum r + 1, since x and all the following numbers have greater values.
We keep adding x=r+1 and counting such additions until r meets or exceeds n.

int GetMinPatches(List<int> nums, int n)
{
    int count = 0;
    int index = 0;
    long r = 0; // sums in [1..r] can already be formed
    while (r < n)
    {
        if (index < nums.Count && nums[index] <= r + 1)
        {
            // extend the reachable range with an existing number
            r += nums[index];
            index++;
        }
        else
        {
            // patch with x = r + 1, which extends the reachable range to [1..2r+1]
            r += r + 1;
            count++;
        }
    }
    return count;
}
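For example, with nums = {1, 3} and n = 6: taking 1 makes [1, 1] reachable; 3 exceeds r + 1 = 2, so we patch with 2 to reach [1, 3]; taking 3 then extends the range to [1, 6]. One patch suffices.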

Tuesday, April 5, 2016

Today we continue to discuss software testing effectiveness measures - these are the P-Measure, E-Measure and F-Measure. We had read that the E-Measure helps make the P-Measure more useful. Their relative values change similarly. Moreover, the E-Measure can differentiate the detection of more than one failure, while the P-Measure regards one testing strategy as good as another if both can detect at least one failure. The E-Measure assumes homogeneous subdomains, whereas to maximize the P-Measure it is enough for a single subdomain to contain all the failure-causing inputs.
Moreover, it has been difficult to generalize such conditions for overlapping subdomains. But in this case too, the E-Measure is able to identify some characteristics of overlapping subdomain testing. For example, T.Y. Chen and Y.T. Yu found that the relative performance of subdomain testing against random testing depends on the aggregate of the differences between the subdomain failure rates and the overall failure rate, where each difference is weighted by the number of test cases selected from that subdomain. In the overlapping case, the subdomain failure rates can be higher than the overall rate, which tips the score in favor of subdomain testing as opposed to random testing.
Lastly, if the E-Measure says that a subdomain testing strategy is better than random testing, then it must be the case with the P-Measure also. This leads to characterizations using the P-Measure. It does not mean that the same program gets the same ranking from both measures. That said, if the failure rates are small, the E-Measure serves as a good approximation to the P-Measure. More analysis with the E-Measure can also be helpful if the subdomain testing strategies are feasible.

Monday, April 4, 2016

Today we continue to discuss software testing effectiveness measures - these are the P-Measure, E-Measure and F-Measure. We were discussing random testing. The P-Measure in this case is equal to 1 minus the success rate raised to the power of the inverse of the failure rate p. The E-Measure is the product of the number of test cases and the failure rate. The F-Measure is the ratio of the size of the input domain to the size of the failure region. Random black-box sampling of the input domain gives the worst case. That said, randomness may still be required because the failure patterns are not always known. This leads to a technique called adaptive random testing, which distributes test cases evenly within the input space without losing all randomness. The fixed-size candidate set variant of this technique involves the following steps: first, randomly generate a candidate set of test cases in each iteration of test case generation. For each candidate, calculate its distance from the set of previously executed tests. Lastly, execute the candidate with the greatest distance and add it to the set of executed tests. This technique has achieved results near the upper bound of effectiveness.
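As a rough sketch of the fixed-size candidate set selection over a one-dimensional integer input domain (the candidate set size of 10, the domain bound and the absolute-difference distance are assumptions made for illustration):

int NextAdaptiveRandomTest(List<int> executed, Random rng, int domainMax)
{
    const int candidateSetSize = 10; // assumed candidate set size
    int best = 0;
    int bestDistance = -1;
    for (int c = 0; c < candidateSetSize; c++)
    {
        int candidate = rng.Next(domainMax);
        // distance to the nearest previously executed test
        int nearest = int.MaxValue;
        foreach (var e in executed)
            nearest = Math.Min(nearest, Math.Abs(candidate - e));
        if (nearest > bestDistance)
        {
            bestDistance = nearest;
            best = candidate;
        }
    }
    return best;
}

The chosen test is then executed and appended to executed before the next call.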
In subdomain testing, effectiveness is measured in terms of the failures detected, and maximizing the E-Measure pursues exactly that. In contrast, to maximize the P-Measure it is only necessary to have one subdomain full of failure-causing inputs, even though all other subdomains may have low failure rates. Therefore the E-Measure has more merit for this purpose. Besides, it can distinguish the capability of detecting more than one failure, while the P-Measure treats strategies the same if they reveal at least one failure. Furthermore, it has been difficult to extend the formal analysis to the more general and more common case of overlapping subdomains. By using the E-Measure, T.Y. Chen and Y.T. Yu say that new properties of the P-Measure can be found that cannot be found with the P-Measure alone. For example, when comparing subdomain testing with random testing, a greater or equal value of the E-Measure assures a greater value of the P-Measure. Thus the relative value of the E-Measure also gives a good indication of the P-Measure.
There is no single measure that serves as the best measure.

Sunday, April 3, 2016

Today we discuss software testing effectiveness measures - these are the P-Measure, E-Measure and F-Measure. The E-Measure is the expected number of failures detected by the set of test cases. The P-Measure is the probability of detecting at least one failure with the set of test cases. Both the P-Measure and the E-Measure have been used widely. The F-Measure is the number of tests required to detect the first failure. Also, faults are different from failures. A failure is any manifestation of deviation from the correctness of a program execution. One or more faults may be the reason for a failure. Until a failure occurs, a fault remains unknown. A fault is typically found by debugging; a failure is seen by observation. The F-Measure therefore concerns the first failure. It leads to fewer test cases and a reduced cost of overall testing. Even with the same number of test cases, the order of test cases matters.
The detection of more failures does not always help us identify more faults. That said, it is still better to reveal as many failures as possible. Intuitively, the more failures the testing reveals, the more information there is for debugging faults and the higher the chance of detecting more faults. In this sense, a good measure is one that rewards detecting as many failures as possible. This is what we attempt with the maximum value of the E-Measure (courtesy T.Y. Chen and Y.T. Yu).
Failures occur for a subset of the possible input domain of the program. Generally, inputs that produce failures will cluster in regions of  the input domain. Contiguous regions form patterns in the input domain. For two dimensions, there are (a) block pattern, (b) strip pattern and (c) point pattern.
The assumptions made in finding patterns include the following:
(a) the failure region's size, shape and orientation are known
and (b) the region's location in the input domain is not known. (courtesy: cs.umd.edu CMSC737)
#codingexercise
double GetPMeasure(double p)
{
    // probability of detecting at least one failure in 1/p random tests
    return 1 - Math.Pow(1 - p, 1 / p);
}
double GetEMeasure(int n, double p)
{
    // expected number of failures in n tests
    return n * p;
}
int GetFMeasure(int sizeOfInputDomain, int sizeOfFailureRegion)
{
    // expected number of tests to the first failure
    return sizeOfInputDomain / sizeOfFailureRegion;
}
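For example, with a failure rate p = 0.25, the F-Measure predicts 1/p = 4 tests to the first failure; running n = 4 random tests gives an E-Measure of 1.0 expected failures and a P-Measure of 1 - 0.75^4, or about 0.68.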

Saturday, April 2, 2016


Background Tasks



Problem statement:


We have a self-service file share provisioning portal which creates and maintains file shares on remote storage clusters that are named by their location. Each cluster has a driver that we call the connector, and it exposes a set of SSH commands for the said operations. An API layer translates the user portal request into a command for a target connector. The API layer is reached through a gateway. Frequently we have seen timeouts from different sources: sometimes the gateway times out, sometimes the connector takes too long, sometimes the SSH connection to a remote cluster takes too long. So far we have been working around this with an upper bound on the timeout. In this post, we try to find an alternative solution using an asynchronous mechanism.


Current Interaction diagram looks like this:


In this interaction, the API is on hold until the resource is created.

Instead, if a promise could be made and fulfilled later, then the API could be relieved right away.

This introduces two challenges:

1)      Earlier, the object was created synchronously; its properties cannot be changed and operations cannot be made on it until it has been created and marked active. Moreover, we send emails to the customer when the share has been created.

2)      Earlier, the object creation was scoped within a transaction, so we had a clean state to begin or end with.

We now overcome these by introducing an interim state before the object is marked active. We call this state pending. Since each resource is identified by its location and file share name, the API can detect duplicate requests for the same object by checking for an existing record in the database and whether it is pending. If the object is pending, the API can return an error such as bad request or method not allowed. If the object is already created, the bad request error is returned with a different error message. No other operations are permitted unless the object is active. Therefore the create call is idempotent.
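As a minimal sketch of this idempotent check (the helpers FindShare, InsertShare and EnqueueCreateMessage and the ShareState enum are hypothetical names, not our actual API):

enum ShareState { Pending, Active }

HttpStatusCode CreateShare(string location, string shareName)
{
    // lookup by the (location, file share name) identity
    var record = FindShare(location, shareName);
    if (record != null)
    {
        if (record.State == ShareState.Pending)
            return HttpStatusCode.MethodNotAllowed; // creation already in progress
        return HttpStatusCode.BadRequest; // share already exists
    }
    InsertShare(location, shareName, ShareState.Pending); // mark pending
    EnqueueCreateMessage(location, shareName); // hand off to the background processor
    return HttpStatusCode.Accepted; // the promise will be fulfilled later
}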

In order to receive a notification, the API layer can spawn workers to separate the send and receive operations onto different tasks. This lets the send operation relieve the caller immediately, and the receive operation can be made a background task. On completion of this task, the user can be emailed that the share is ready. If the share creation fails, the share entry is removed from the database so that we return to a clean state and the create operation can be retried.

This background task happens as the processing of a message queue. During the send operation, the database record is marked pending and a message is placed on the message queue. The message queue processor reads the message and issues a command to the connector, waiting on the response. When the connector responds, the processor suitably marks the record as active or deletes the entry. Only the processor can delete the entry. It notifies the API, the portal or the user via http or email. Both the processor and the API check the entries in the database. There is an understanding between the API and the processor that redundant create messages may appear, but there can only be one file share and one corresponding entry in the database.
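The processor side can be sketched as follows (again with hypothetical names: Dequeue, Connector.CreateShare, MarkActive, DeleteEntry and NotifyUser stand in for the real queue, connector and database calls):

void ProcessMessages()
{
    while (true)
    {
        var msg = Dequeue(); // blocks until a create message arrives
        // issue the SSH command to the target cluster and wait on the response
        bool created = Connector.CreateShare(msg.Location, msg.ShareName);
        if (created)
            MarkActive(msg.Location, msg.ShareName); // share is now usable
        else
            DeleteEntry(msg.Location, msg.ShareName); // clean state so create can be retried
        NotifyUser(msg, created); // email or http notification
    }
}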

The background processor or spool:




The interaction diagram now looks like this:

Choice of a background task processor versus concurrent APIs:

In the above description, a background task processor has been proposed. The same task separation could have been achieved with the concurrency features of the language alone. However, the background task processor separates the caller from the recipient of the results, and it can handle different types of tasks. A message queue additionally allows retries on failures and can scale out as a cluster.

Conclusion:


APIs can become more responsive with a background task processor and a new state for the resource.