Cluster computing

Wednesday, April 10, 2013

interview question answers continued

Q: Given an integer array find if there exists three integers that add up to zero.
A: This is another integer array problem where we are interested in a particular sum. Since we are looking for three integers, the naive solution is O(n^3), but we can do better. First sort the elements. Then, for each element, the complement that makes the sum zero is known: the other two integers must add up to its negation. Because the array is sorted, that pair can be found in a single pass with two indexes moving inward from both ends of the remaining elements, which brings the total cost down to O(n^2).
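
A minimal sketch of the sort-then-scan idea, assuming the three integers must come from three distinct positions in the array. The function name has_zero_triple is made up for illustration.

```python
def has_zero_triple(nums):
    """Return True if three elements at distinct positions sum to zero."""
    a = sorted(nums)
    n = len(a)
    for i in range(n - 2):
        # Look for two elements to the right of i that sum to -a[i].
        lo, hi = i + 1, n - 1
        while lo < hi:
            total = a[i] + a[lo] + a[hi]
            if total == 0:
                return True
            if total < 0:
                lo += 1   # sum too small, move to a larger element
            else:
                hi -= 1   # sum too large, move to a smaller element
    return False
```

Sorting costs O(n log n) and the nested scan costs O(n^2), so the scan dominates.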

Q: Given a set of login and logout pairs of users, find the number of users online at any given point of time.
A: Walk the list of logins and logouts from the start and populate a dictionary of users and their active logins, incrementing a user's counter for every login and decrementing it for every logout, up to the point of time of interest (or the last entry). This gives the number of active logins for each user; the number of users online is the number of users whose counter is greater than zero.
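
The counting can be sketched as follows, assuming the events arrive in time order as (user, action) pairs where action is the string "login" or "logout". That event format and the name users_online are assumptions made for this example.

```python
from collections import defaultdict

def users_online(events):
    """Count users with at least one active login after replaying events."""
    active = defaultdict(int)  # user -> number of active logins
    for user, action in events:
        if action == "login":
            active[user] += 1
        elif action == "logout":
            active[user] -= 1
    # A user is online if any login has not yet been matched by a logout.
    return sum(1 for count in active.values() if count > 0)
```

To answer the question for an earlier point of time, replay only the events up to that time.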

Q: Another integer array is given where there are repeating zeros. Compact the array by erasing the zeros.
A: The problem hints at reusing the same storage for the resulting array, so we can lean on multiple passes over the array if we want. We could make one pass to find the indexes of all occurrences of zero, then, for each occurrence, shift all the elements following it. We could shift by more than one position at a time if the occurrences are consecutive. Another improvement is to shift only the elements up to the next occurrence, so that the zeros stay consecutive at that next occurrence and move toward the end of the array together.
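
Taking the shifting idea to its limit gives a single pass with a write index, which is a slightly different technique from the per-occurrence shifts described above. This sketch assumes that "erasing" means the zeros are dropped and the array shrinks; the name compact_zeros is made up for illustration.

```python
def compact_zeros(arr):
    """Remove all zeros from arr in place, keeping the other elements in order."""
    write = 0  # next position to fill with a non-zero value
    for value in arr:
        if value != 0:
            arr[write] = value  # write never runs ahead of the read position
            write += 1
    del arr[write:]  # drop the leftover tail
    return arr
```

If the array must keep its length, the tail could be filled with zeros instead of being deleted.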

Q: Given a set of non-overlapping integer ranges and an input integer, design a data structure to store them that supports insert, delete and search.
A: A two-dimensional (jagged) array seems sufficient for these operations, where an integer range could span one or more of the jagged array's elements. For each element we keep track of the end of one range and the start of the next.
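
One alternative worth comparing, sketched here under assumptions, is to keep the ranges in two parallel lists sorted by start and use binary search. This is not the jagged array described above. The class name RangeSet, the closed-range convention and the method signatures are all made up for this example.

```python
import bisect

class RangeSet:
    """Non-overlapping closed integer ranges [start, end], sorted by start."""

    def __init__(self):
        self.starts = []
        self.ends = []

    def _index(self, value):
        # Index of the last range whose start <= value, or -1 if none.
        return bisect.bisect_right(self.starts, value) - 1

    def search(self, value):
        """Return the (start, end) range containing value, or None."""
        i = self._index(value)
        if i >= 0 and value <= self.ends[i]:
            return (self.starts[i], self.ends[i])
        return None

    def insert(self, start, end):
        """Insert [start, end]; return False if it would overlap a stored range."""
        if start > end:
            raise ValueError("start must not exceed end")
        i = self._index(end)  # last stored range starting at or before end
        if i >= 0 and self.ends[i] >= start:
            return False
        self.starts.insert(i + 1, start)
        self.ends.insert(i + 1, end)
        return True

    def delete(self, start, end):
        """Delete the exact range [start, end]; return False if it is not stored."""
        i = self._index(start)
        if i >= 0 and self.starts[i] == start and self.ends[i] == end:
            del self.starts[i]
            del self.ends[i]
            return True
        return False
```

Search is O(log n); insert and delete are O(n) because of the list shifts, which a balanced tree would avoid.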

Q: How can you tell if a given string is of the form aZb, where a is the reverse of b?
A: If a and b can be empty strings, almost any string satisfies that form. Otherwise, compare the elements at the start and end indexes and, on a match, move both indexes inward.

Monday, April 8, 2013

interview question answers continued

Q: In Java, differentiate between final, finally and finalize.
A: The keyword final marks something that cannot be changed further: a final class cannot be inherited from, a final method cannot be overridden, and a final variable cannot be reassigned. A finally block holds code that runs after try/catch exception handling whether or not an exception was thrown. finalize is a method that the runtime may call on an object before garbage collecting it.

Q. Given two arrays, how will you find the intersection ?
A. If the intersection is defined as the set of elements that appear in both arrays, in no particular order, and both arrays can be arbitrarily large or have elements from an arbitrary range, then one approach is to sort both arrays, walk down them together element by element, and put the common elements in a bag until one of the arrays is exhausted.
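
A minimal sketch of the sort-and-walk approach. It treats the result as a bag, so a value that appears twice in both arrays is reported twice; that choice and the name intersect_sorted are assumptions made for this example.

```python
def intersect_sorted(a, b):
    """Return the elements common to a and b, as a sorted list with duplicates."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    common = []
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1            # a[i] cannot appear in the rest of b
        elif a[i] > b[j]:
            j += 1            # b[j] cannot appear in the rest of a
        else:
            common.append(a[i])
            i += 1
            j += 1
    return common
```

If the arrays fit in memory and the values are hashable, a hash set of the smaller array is a common alternative that avoids the sort.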

Q. Given two squares by their four corners coordinates, find the coordinates of their intersection.
A. Take the minimum and maximum of all the x coordinates, and the same for the y coordinates. If these extremes all belong to the same square, the other square is contained in it. Otherwise the squares are either overlapping or disjoint, which you can tell from the diagonally opposite corners that give the min and max. If they do overlap, we already have the coordinates of two diagonally opposite corners of the intersection, and the other two corners are points common to both squares.
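
A sketch for the simplest case, assuming both squares are axis-aligned (the question does not say so, and rotated squares need a general polygon clipping routine instead). Each square is given as four (x, y) corners; the name square_intersection is made up for illustration.

```python
def square_intersection(sq1, sq2):
    """Return the four corners of the overlap of two axis-aligned squares, or None."""
    def bounds(square):
        xs = [x for x, _ in square]
        ys = [y for _, y in square]
        return min(xs), min(ys), max(xs), max(ys)

    ax1, ay1, ax2, ay2 = bounds(sq1)
    bx1, by1, bx2, by2 = bounds(sq2)
    # The overlap starts at the larger of the minimums
    # and ends at the smaller of the maximums.
    left, bottom = max(ax1, bx1), max(ay1, by1)
    right, top = min(ax2, bx2), min(ay2, by2)
    if left > right or bottom > top:
        return None  # disjoint
    return [(left, bottom), (right, bottom), (right, top), (left, top)]
```

When one square contains the other, the result is simply the inner square. Squares that only touch along an edge or at a corner give a degenerate rectangle with zero area.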

Q. What is the difference between a hash table and a hash map.
A. A hash table stores values based on the hash computed from the value. A hash map stores key-value pairs based on the hash of the key. In C++ a hash map is used where < is not defined or is unsuitable for the intended key. A good hash function is a prerequisite for both; otherwise many entries will collide. For example:
template<class T> size_t Hash<T>::operator() (const T& key) const
{
    size_t res = 0;
    size_t len = sizeof(T);
    // Hash the raw bytes of the key, one byte at a time.
    const char* p = reinterpret_cast<const char*>(&key);
    while (len--) res = (res << 1) ^ *p++;
    return res;
}

Q: What is the significance of the term "dead beef"
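A: 0xDEADBEEF ("dead beef") is a hexadecimal magic number that debugging tools and runtimes write into memory to mark it as uninitialized or freed, so that any accidental use of that memory is easy to recognize. Similarly, the Microsoft debug runtime fills uninitialized stack memory with 0xCC and fills the guard bytes around heap allocations (the "no man's land") with 0xFD.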

Q: Write a C code to find the depth of a binary tree.
A: /* Node is assumed to be a struct with left and right child pointers. */
     int GetDepth(Node* root)
     {
       if (root == NULL) return 0;
       int depth = 1;
       if (root->left == NULL && root->right == NULL) return depth;
       /* The depth is one more than the deeper of the two subtrees. */
       int subtree = GetDepth(root->left);
       if (subtree >= depth) depth = subtree + 1;
       subtree = GetDepth(root->right);
       if (subtree >= depth) depth = subtree + 1;
       return depth;
     }

Sunday, April 7, 2013

interview question answers continued

Q: Given an array of integers ( both positive and negative in random order ), find the subarray with the largest sum.
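A: This has a surprisingly simple solution that keeps track of a local (running) sum and a global (best) sum. If adding the next number to the running sum gives less than that number on its own, we reset the running sum to that number; otherwise we extend it. We update the global sum whenever the running sum exceeds it, and proceed for the length of the array.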
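
This running-sum technique is commonly known as Kadane's algorithm. A minimal sketch, assuming a non-empty input; the name max_subarray and the (sum, start, end) return shape are choices made for this example.

```python
def max_subarray(nums):
    """Return (best_sum, start, end) of the largest-sum subarray; end is inclusive."""
    best_sum = current = nums[0]
    best_start = best_end = start = 0
    for i in range(1, len(nums)):
        if current + nums[i] < nums[i]:
            # Extending the run does not help, so restart it at i.
            current = nums[i]
            start = i
        else:
            current += nums[i]
        if current > best_sum:
            best_sum, best_start, best_end = current, start, i
    return best_sum, best_start, best_end

print(max_subarray([-2, 1, -3, 4, -1, 2, 1, -5, 4]))  # prints (6, 3, 6)
```

The array is read once, so the cost is O(n) time and O(1) extra space.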

Q: How are cookies passed in the HTTP protocol ?
A: The server sends a Set-Cookie: name=value header in its response, and the client then sends a Cookie: name=value header in its subsequent requests to that server.

Q: How will you search for a word in a very large database ?
A: There are built-in operators that let you search for text, such as CONTAINS, LIKE and full-text search. If we don't know which table we need to look in, we can use full-text search.

Q: Explain the functioning or working of a search engine like Google :
A: As explained in the anatomy of a search engine paper, here are some of the steps:
1. A URL server sends lists of URLs to be fetched to the crawlers.
2. Several distributed crawlers download web pages based on this list.
3. The web pages that are fetched are then sent to the store server.
4. The store server then compresses and stores the web pages in a repository.
5. Every web page has an associated ID number called a docID, which is assigned whenever a new URL is parsed out of a web page.
6. The indexing is performed by an indexer and a sorter. The indexer reads the repository, uncompresses the documents and parses them into word occurrences called hits. The indexer distributes these hits into a set of barrels. It also stores the links in an anchors file.
7. The URL resolver reads the anchors file and converts relative URLs into absolute URLs and, in turn, into docIDs. It also generates a database of URL pairs used to compute page ranks.
8. The sorter takes the barrels, which are sorted by docID, and re-sorts them by wordID to generate the inverted index. The sorter also produces a list of wordIDs and offsets into the inverted index.
9. A program called DumpLexicon takes this list together with the lexicon produced by the indexer and generates a new lexicon to be used by the searcher.
10. The searcher is run by a web server and uses the lexicon built by the DumpLexicon together with the inverted index and page ranks to answer queries.
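
The inverted index at the center of steps 6 to 10 can be illustrated with a toy sketch. It is nothing like the barrels and lexicon of the real system: it keys on the words themselves rather than wordIDs, ignores ranking, and the names build_inverted_index and search are made up for this example.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """docs maps docID -> text. Return word -> list of (docID, position) hits."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        for position, word in enumerate(docs[doc_id].lower().split()):
            index[word].append((doc_id, position))
    return dict(index)

def search(index, query):
    """Return the sorted docIDs of documents containing every word in the query."""
    words = query.lower().split()
    if not words:
        return []
    result = None
    for word in words:
        doc_ids = {doc_id for doc_id, _ in index.get(word, [])}
        result = doc_ids if result is None else result & doc_ids
    return sorted(result)

docs = {1: "the quick brown fox", 2: "the lazy dog", 3: "quick brown dog"}
index = build_inverted_index(docs)
print(search(index, "quick brown"))  # prints [1, 3]
```

A real searcher would also use the hit positions and page ranks to order the results.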

Saturday, April 6, 2013

interview questions answers continued

Tree Flattening
Question: How do you flatten a binary tree?
A binary tree can be flattened into an array by storing the left and right children of the node at index i at index positions 2i and 2i+1 (with the root at index 1), as in the case of a heap.
A binary tree can also be flattened into a linked list, breadth first, by using a queue to hold the children.
Enqueue the root; then, as you dequeue each node, append it to the list and enqueue its children. You can have pointers to the next sibling at the same level. Or you can mark the leftmost node of each level in your linked list, so that all the nodes at the same level of the tree occur between two such marked nodes.
There are some other solutions as well, using recursion or storage that keeps track of positions in the tree.
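
The breadth-first variant can be sketched as below. For brevity it returns a Python list of values instead of building a linked list, and the Node class and the name flatten_breadth_first are assumptions made for this example.

```python
from collections import deque

class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def flatten_breadth_first(root):
    """Return the node values level by level, left to right."""
    order = []
    queue = deque()
    if root is not None:
        queue.append(root)
    while queue:
        node = queue.popleft()
        order.append(node.value)
        # Children go to the back of the queue, so a whole level
        # is emitted before the next level starts.
        if node.left is not None:
            queue.append(node.left)
        if node.right is not None:
            queue.append(node.right)
    return order

tree = Node(1, Node(2, Node(4), Node(5)), Node(3))
print(flatten_breadth_first(tree))  # prints [1, 2, 3, 4, 5]
```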

Design a class library for writing card games.
A deck of game cards is a collection of distinct cards that lend themselves to various groupings and operations. Use a collection that works with LINQ-like methods.

How will you write a find and replace tool ?
A find and replace tool takes two input strings: one that is searched for, and another that replaces each occurrence. Finding and replacing are independent operations. When the text is to be deleted, the replacement string is empty; in that case the rest of the string after the occurrence is moved or copied down. The search for the next occurrence can resume from the current location, or all occurrences can be found up front.

There is a very large array in which every number is repeated once except for one number. How will you find that number?
We can maintain a candidate list and an exclusion list. A number seen for the first time goes on the candidate list; when it is seen again, it is moved to the exclusion list and skipped from then on. The number left on the candidate list at the end is the answer.
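
If the values are integers and every other value appears exactly twice, a well-known alternative to the two lists is to XOR everything together, since pairs cancel out. This is a different technique from the one described above; the name find_unpaired is made up for illustration.

```python
from functools import reduce
from operator import xor

def find_unpaired(nums):
    """Return the value that appears once when every other value appears twice."""
    # x ^ x == 0 and x ^ 0 == x, so the paired values vanish.
    return reduce(xor, nums, 0)

print(find_unpaired([4, 1, 2, 1, 2]))  # prints 4
```

This needs a single pass and constant extra memory, which matters for a very large array.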

Friday, April 5, 2013

interview questions answers continued

Question : How do you find the edit distance in strings ?
Answer : You calculate the Levenshtein distance between strings with memoization. This is how it works:
If one of the input strings is empty, return the length of the other string as the distance D.
D(i,0) = i
D(0,j) = j
Otherwise, with zero-based indexes, D(i,j) is the minimum of:
1) D(i-1, j) + 1 ( i.e. consider with a new left string letter )
2) D(i, j-1) + 1 ( i.e. consider with a new right string letter )
3) D(i-1, j-1) + cost, where cost = 1 if S1[i] != S2[j], else cost = 0, with S1 the left string and S2 the right string

Recursion could also be used with the same three cases, again with a cost of 1 if the elements don't match and 0 otherwise.

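For sample data such as "cat" and "bat", the distance matrix is as follows, where the entry at row i and column j is the distance between the suffix of "cat" starting at index i and the suffix of "bat" starting at index j:

2  T    2   1   0
1  A    1   0   1
0  C    1   1   2
        B   A   T
        0   1   2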

# The memo is keyed by indexes only, so clear it before comparing a different pair of strings.
memo = {}

# i is the start index into str1 and j is the start index into str2;
# len1 and len2 are the lengths of the remaining suffixes.
def LevenshteinDistance(str1, i, len1, str2, j, len2):
    key = (i, len1, j, len2)
    if key in memo:
        return memo[key]
    if len1 == 0:
        return len2
    if len2 == 0:
        return len1
    cost = 0
    if str1[i] != str2[j]:
        cost = 1
    dele = LevenshteinDistance(str1, i + 1, len1 - 1, str2, j, len2) + 1
    inse = LevenshteinDistance(str1, i, len1, str2, j + 1, len2 - 1) + 1
    upda = LevenshteinDistance(str1, i + 1, len1 - 1, str2, j + 1, len2 - 1) + cost
    dist = min(dele, inse, upda)  # the built-in min accepts three arguments
    memo[key] = dist
    return dist

print(LevenshteinDistance("cat", 0, 3, "bat", 0, 3))

prints 1

C++ version:
#include <map>
#include <sstream>
#include <string>

// Memo keyed by "i,len1,j,len2"; clear it before comparing a different pair of strings.
std::map<std::string, int> m;

int min(int a, int b, int c)
{
    int d = a;
    if (d > b)
        d = b;
    if (d > c)
        d = c;
    return d;
}

int getLD(const char* str1, int i, int len1, const char* str2, int j, int len2)
{
    std::stringstream sout;
    sout << i << "," << len1 << "," << j << "," << len2;
    std::string key = sout.str();
    if (m.find(key) != m.end())
        return m[key];
    if (len1 == 0)
        return len2;
    if (len2 == 0)
        return len1;
    int cost = 0;
    if (str1[i] != str2[j])
        cost = 1;
    int dele = getLD(str1, i + 1, len1 - 1, str2, j, len2) + 1;
    int inse = getLD(str1, i, len1, str2, j + 1, len2 - 1) + 1;
    int upda = getLD(str1, i + 1, len1 - 1, str2, j + 1, len2 - 1) + cost;
    int dist = min(dele, inse, upda);
    m[key] = dist;
    return dist;
}

Thursday, April 4, 2013

interview question answers continued

Another interview question: given a circularly sorted (rotated) array of integers, how will you find a given integer?
In this case there are two factors working for us: the integer collection is sorted, and the array is circular, which we could choose to ignore if we knew the length of the array, since a circular array index is bounded by modulo the length. (For a linked list, we can confirm that it is circular if a link followed one step at a time and a skip-level link followed two steps at a time meet at the same node.)
Starting from any position, the integers in one direction are monotonically increasing or decreasing, and in the other direction they may or may not be. So you start with the first and last index positions, find the midpoint, and in each iteration decide which half is sorted and whether the value lies in it, updating start or end to select a half until the value is found or start and end cross over.

public int FindSorted(int[] input, int val)
{
    if (input == null || input.Length == 0) return -1;
    int start = 0;
    int end = input.Length - 1;
    while (start < end)
    {
        int mid = start + (end - start) / 2;
        if (input[mid] == val) return mid;
        if (input[start] <= input[mid])
        {
            // The left half is sorted; go left only if val lies inside it.
            if (input[start] <= val && val < input[mid])
                end = mid - 1;
            else
                start = mid + 1;
        }
        else
        {
            // The right half is sorted; go right only if val lies inside it.
            if (input[mid] < val && val <= input[end])
                start = mid + 1;
            else
                end = mid - 1;
        }
    }

    if (input[start] == val) return start;
    return -1;
}

Wednesday, April 3, 2013

interview questions answers continued

This follows my earlier post on interview questions and answers:
Question: If you are given a string of length N and M strings each of length L, how will you find whether each of the M strings occurs in the larger string?
Answer: There are several approaches to this problem.
The first approach is to iterate for each of the M strings to see if it occurs in the original string.
For each pattern that we match, we can do one of the following:
1) Brute force method: Iterate through each character of the original string to see if it is the start of the pattern. For the length of the pattern, match the subsequent characters to see if the pattern exists.
2) Finite automata: This is an improvement on the pattern match part. We build a string-matching automaton that lets us complete the pattern match in linear time.
3) All match at once: This is an improvement on the iteration over all the patterns. Instead of iterating over them one by one, the match state and automaton progress of every pattern can be stored and advanced as the characters of the original string are iterated.
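
Because all M patterns have the same length L, one simple way to match them all at once is to slide a window of length L over the text and look each window up in a hash table of the patterns. This is a different technique from the automaton in approach 3, shown only as a sketch; the name find_patterns is made up for illustration.

```python
def find_patterns(text, patterns):
    """Return a dict mapping each pattern to the list of offsets where it occurs."""
    patterns = list(patterns)
    if not patterns:
        return {}
    length = len(patterns[0])
    if any(len(p) != length for p in patterns):
        raise ValueError("all patterns must have the same length")
    found = {p: [] for p in patterns}
    for start in range(len(text) - length + 1):
        window = text[start:start + length]
        if window in found:  # hash lookup instead of M separate comparisons
            found[window].append(start)
    return found

print(find_patterns("abracadabra", ["abr", "cad", "xyz"]))
# prints {'abr': [0, 7], 'cad': [4], 'xyz': []}
```

Each window costs O(L) to slice and hash, so the total is about O(N * L) regardless of M; a rolling hash would bring the per-window cost down further.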