Cluster computing

Saturday, May 10, 2014

In this post, I want to include a puzzle I came across online.

Write a function to generate random numbers .

The problem talks about pseudo-random number generation since true random number can be difficult. This can be done in many ways.

It is generally best to use built-in pseudo random number generators such as Math.Random

However, I want to describe the problem this way. Generate random numbers between 0 to 3 and print them. The only test I have is that calling the function three times doesn't result in same sequence more than half the number of times the function is called.

By restricting the size of the array, I'm going to use the index and the contents of the array.

for( int i = 0; i <= N; i++)

{

Swap(h[i], h[Math.Random(0,N-1)]);

}

return h[n];
we could probably optimize this to just iterate over half the array.

Some languages implement random number as follows:

int next(int n)

{

seed = (seed * 0x5DEECE66DL + 0xBL) & ((1L << 48) - 1))

return (int)(seed >>> (48 - n));

}
or Mersenne Twister based such as in C documentation:
int rand(void)
{
next = next * 1103515245 + 12345;
return (unsigned int )(next/2 * (RAND_MAX + 1L) % (RAND_MAX + 1L));
}

Friday, May 9, 2014

In this post, we briefly review some of the compiler flags available from the MS IDE.
/Od disables optimization. If you see code that doesn't show locals or functions under a debugger, chances are the compiler has decided to optimize the code. This flag helps with the debugging in that it disables optimizations so that you have access to all code artifacts. This flag has also been used to troubleshoot the compiler. Compiler can give bad output as well say in one of the following circumstances:
1) a strange combination of compiler flags was used and
2) compiler was asked to build against the incorrect target
In the cases, where the compiler does something weird with the switches specified, the /Od comes to the rescue in providing a single expected version of the executable.
Another thing to mention is that the compiler also provides a /errorReport switch that comes to the use when there are error information report to be generated. This flags is described on the MSDN as one that allows you to provide internal compiler error directly to the Visual C++ team.
The /favor switch helps you to produce code that is optimized for a specific architecture or for micro architectures in both AMD64 and EM64T.
The /GA switch optimizes code for Windows application.
The /GA speeds access to data declared with the __declspec(thread) in a Windows based program.
The __declspec usually denotes the Microsoft specific storage class attributes such as for example align. What align specifies is that the alignment of user defined data be precisely controlled. This can be used with a struct, union or class or variables.
The other type of storage attributes include app domain, dllimport, dllexport, noalias, noinline, noreturn, nothrow, novtable, safebuffers, thread etc.
The /Ge switch activates stack probes. This is helpful for stack overflow The same functionality is now available via /Gs and is preferred over the now deprecated /Ge switch.
The /GF switch enables string pooling that can create smaller programs. /GF is in effect when /O1 or /O2 is used.
The /GL switch enables whole program optimization. Much of these optimizations are also described by the /LTCG option LTCG stands for Link time code generation and is responsible for optimizations such as
cross module inlining
inter procedural register allocation on 64bit
custom calling convention (on x86 only)
Small TLS displacement (on x86 only)
Stack double alignment(x86 only)
and disambiguation between global variables and input parameters.
/Wp64 compiler option detects portability problems
It does this by testing the usages of int, long and pointer in the program. If the 64bit compiler is regularly used, this flag can be disabled because the compiler will detect all such issues.
By default, this is off in 32 bit compilers and on in the 64 bit compilers.
Lastly, we discuss the /Zi option that generates complete debugging information. This option implies the /DEBUG (generate debug information) Note that this only affects the program database PDB where the compiler puts the debugging information.
/GT option supports fiber safety for data allocated using static thread-local storage.
The opposite of disabled optimization is full optimization which is specified by the /Ox maximum optimization (/Ob2gity /Gs)

Here are some previous coding interview questions:
Write the method that gives the predecessor of a node in a binary search tree.
Write the method that builds a tree given the in order and post order traversal of a tree.
Write the method that removes duplicates from an integer array.
Write the method that detects if two arrays are rotations of the same sequence.
Write the method that reverses a string
Write the method that inserts a node at the tail of a circular linked list in constant time.
Write the method that sorts entries into a phonebook.
Write the method that finds the common ancestor to two nodes in a binary search tree.
Write the method that searches for a word in a given string. This could be with Knuth Morris Pratt algorithm:
KMP algorithm evaluates states based on the following
m position within a string which is a prospective match
and i index in the pattern denoting the character under consideration.
First we pre-process to build the automata of distinct characters in a table T
Next, we do the match as follows:
while (m + i < length (S))
if W[i] = S[m +i]
if i == length(W) - 1
return m;
i = i + 1
else
if T[i] > -1
i = T[i], m = m + i - T[i]
else
i = 0, m = m + 1
return the length of s
The table building algorithm is as follows:
T[0] = -1;
while (i < patternLength) {
T[i + 1] = T[i] + 1;
while ( T[i+1] > 0 &&
pattern[i] != pattern[T[i + 1] - 1])
T[i + 1] = T[T[i + 1] - 1] + 1;
i++;
}

Thursday, May 8, 2014

In this post, we will quickly review the design patterns:
First is the singleton pattern. Some of the core system functions are commonly accessed via a singleton pattern. Remember to use static, remember to hide the constructor, remember to count instances and remember to make it thread safe.
Builder is another pattern. The builder creates an object on your behalf in a step wise manner without knowing how it is created. This is very common to see in object relational mapping where we use a construct like
new Table().AddColumn().AddColumn().AddColumn(). .. etc
instead of Table(col1, col2, col3);
Another advantage is that it can create a hierarchy of builders.
Factory method pattern applies a concept of a factory method which is one that creates and returns an object.
An Abstract factory is one which creates other objects. This helps use many factories to create objects usually when they are of the same kind. There is a separation between the concrete classes and the callers of the factory.
An Observer pattern is one that is also called a Publish-Subscribe pattern or pub-sub pattern for short. Observers register themselves with the subject using an interface for update notifications. The model view controller is an example of this pattern because changes to the model automatically update the views.
One caveat in the observer pattern is that when there are frequent updates, the object may have to spend a lot of time notifying its observers. This is particularly true when there are multiple properties getting updated. It would then be better to consolidate the updates. Also the subject could pass the information on what has changed.
Decorator patterns are often applied in wrappers. The behavior of the underlying object is changed by the decorator.
A decorator is very different from an interface. While provide methods for certain behavior from the underlying objects. In one the object is wrapped and presented as another object while in the latter the object itself provides the desired behavior. Typically a decorator is used when the behavior has to be radically changed.
Using the decorator also helps avoid class explosion when there are many different behaviors added to an object.

In today's post we look at a few more coding questions:
How do we convert a binary tree to a heap ?
There is a shortcut answer for this.
Remember that both can be serialized into arrays.
So we simply sort the array
Then we can deserialize them into the respective data structures.
In the case of the heap, the left and right nodes are at 2i and 2i+1
For the binary trees, the deserialization depends on whether the elements are listed in pre-order, inorder and postorder traversal.
Also, when we sort the array, remember that we can used different comparators.
We discuss the pancake problem next.
The pancake problem has a stack of n pancakes each with a different diameter. When you flip the pancakes, you can flip more than one by inserting your pancake flipper at any level into the stack and reversing it. The question is how many flips will it take to sort all the pancakes by size with the bottom pancake being the largest. We are looking for something that will work for in most cases of different orders of pancakes.
The first thing to observe here is that this is way different from sorting. In sorting we can exchange elements in a collection. This is not the same for this pancake stack where we must lift all the pancakes above the point where we insert the pancake flipper.
Consider three cases for the largest pancake.
If the largest pancake is already at the bottom, we don't need to do anything.
If the largest pancake is at the top, we flip the entire stack so the largest one lands at the bottom.
If the largest pancake is in the middle, we split the stack at that point. Put the pancake at the top of one stack and then invert it.
Because the pancakes at the bottom are unaffected by the flips above, we proceed from bottom to top in this way.
Since it takes two flips for the bottom pancake, we can assume that this takes the same number of flips for each pancake resulting in 2n-2 flips in worst case.
The pancakes problem has an interesting history. Bill Gates published a paper that was the most efficient for almost 30 years [Courtesy programming interviews exposed book]

Wednesday, May 7, 2014

We look at a few more coding questions today. First we look at breadth first traversal and depth First traversal. Depth first traversal we can write recursively as follows with no additional data structure.
DFS-Visit(Root)
{
mark root as partially visited
for each vertex v adjacent to Root
if color of v = unvisited
set parent of v
DFS-visit(v)
mark root as visited
}
Next we look at breadth first search
In BFS, we maintain an additional data structure to store the siblings at any level.
BFS(G, s)
{
// initialize
ENQUEUE(Q,s)
While Q is not null
u = DEQUEUE(Q)
for each v belonging to Adj[u]
do if color of v = unvisited
then color it partially visited and
ENQUEUE(Q)
mark u as visited
}

I'm going to cover a few more questions but first I wanted to take a minute. In particular I want to cover graph based coding questions and revisit to distributed computing.

I will take a small example for edit distance first :
Find the minimum edit distance between two input strings. Edit distance is the minimal number of edit operations to modify a string from one state to the other.
For eg : Saturday can be changed to Sunday using the following transformations:
Saturday -> Sturday
Sturday -> Surday
Surday -> Sunday
if we denote the edit distance by a function f(i,j) between the substring of the first string ending with the j'th character and the substring of the second string ending with the i'th character. Then we know that f(i,0) = i since the because we need to delete that many characters to get an empty string. Similarly f(0,j) = j because the empty string is at the beginning of that same substring ending with the 0th character.
If the jth character of the first string is different from the ith character of the second string, then
there are three different techniques involved in the transformations and finding the edit distances. These are insertion, deletion and substitution.
We note that we can make the lengths equal by deleting the characters from the longer string or inserting the characters into the shorter string. We will shortly see how.
If the jth character of the first string is the same as the ith character of the second string, then the edit distance f(i,j) is the same as f(i-1,j-1)
If there is substitution involved then the edit distance f(i,j) = f(i-1,j-1) + 1
if we insert the ith character of the second string into the first, then edit distance f(i,j) = f(i - 1, j) + 1
If we delete the jth character of the first string, then the edit distance f(i,j) = f(i, j-1) + 1
Thus if the i != 0 and j != 0 and string [i] != string [j], then the edit distance is the minimum of
f (i-1,j) + 1 , f(i, j - 1) + 1 and f(i - 1, j - 1) + 1
With the above formula comprising of case i = 0 case j = 0 case string[i] == string[j] and string[i] != string[i,j] we can create a table of edit distances by enumerating the characters of the first string as columns and those of the second string as rows and using the substrings to compute edit distances for any pair of substrings. Note that we introduce a column at the beginning of all rows and we introduce a row at the beginning of all columns to hold the position for the empty string and we initialize the value of this first cell as zero. we can computer the edit distance for other pairs of substrings starting from here. At each additional row or column, we use the edit distance from the previous row or column and we find the minumum value from the three operations listed above .At the diagonal end of the first cell which is the last cell we will compute the edit distance of the entire first string and the entire second string.

Algorithm: The function to implement the above has the following loops
first loop to set the value of the first row ( corresponding to empty string )
second loop to set the value of the first column ( corresponding to the empty string)
then for each pair of characters from the first and the second string,
if they are equal we set it to the previous diagonal value
else (they are different and) we find the costs for insertion, deletion and substitution and choose the minimum with which we update the corresponding position in the table

We quickly review some graph algorithms next:

Minimum spanning trees:
Kruskal's algorithm:

// Make-set of all vertices
sort the edges of E into non-decreasing order by weight w
for each edge (u,v) belonging to E taken in that order
connect the edge if they belong to different sets

Prim's algorithm:
Put a vertex in the min-priority Q sorted by the weights
While Q is not empty,
extract the minimum
for each vertex v belonging to adjacents of u
if v belongs to Q and w(u,v) < key(v)
connect v to u and set its key

Shortest paths are found by relaxing the edges

Djikstra's shortest path:
Initialize-single-source(G,s)
Initialize Q with V[G]
While Q is not empty
u = extract-min(Q)
add u to shortest path vertices
for each vertex adjacent to u
relax (u,v,w)

In this post we look at a few more coding questions,. In particular we look at a technique called backtracking.
Backtracking helps over brute force method.
Take the alphabet matrix as follows:
a b c e
s f c s
a d e e
There is a path for the string 'bcced' but not for path abcb. Re-entering is not possible
This technique is called backtracking because it returns back to the previous state and uses pop operations from a stack. That is when it finds a character it pushes it on to the stack and for all the rejection where the character is not the same as the corresponding character, it backtracks and pops from the stack.
Since all entries can be entered only once, we keep a boolean flag pertaining to each entry.
bool hasPath(char* matrix, int rows, int cols, char* str)
{
// parameter validation
// bool* visited for the matrix of used flags
int pathLength = 0;
for (int row = 0; row < rows; row++){
for (int col = 0; col < cols; col++){
if (hasPathStack(matrix, rows, cols, row, col, str, pathLength, visited))
return true;
}
}
//cleanup
return false;
}

bool hasPathStack(char*matrix, int rows, int cols, char* str, int& pathLength, bool* visited)
{
// if the boundary conditions are satisfied &&
// if the path has not been found yet &&
// if the element has not been visited already
++ pathLength;
// visited set to true
// visit one of the four neighboring cells
bool hasPath = hasPathStack(matrix, rows, cols, row, col - 1, str, pathLength, visited) ||
hasPathStack(matrix, rows, cols, row - 1, col, str, pathLength, visited) ||
hasPathStack(matrix, rows, cols row, col + 1, str, pathLength, visited) ||
hasPathStack(matrix, rows, cols row + 1, col, str, pathLength, visited) ;
if (!hasPathStack) {
--pathLength; // backtracking
// visited set to false
}
}
[from the coding interviews book]