Cluster computing

Monday, April 8, 2013

interview question answers continued

Q: In Java differentiate between final, finally and finalize ?
A: The keyword final in Java are used to denote that which cannot be inherited further. The finally is used for mandatory invocation after try catch exception handling and the finalize is used by the runtime for garbage collection of objects.

Q. Given two arrays, how will you find the intersection ?
A. If intersections is defined as the set of elements that appear in both arrays in no particular order, and both the arrays can be arbitrarily large or have elements from an arbitrary range, then one approach could be to sort both arrays and walk down both arrays element by element and put the common ones in a bag until one of the arrays gets over.

Q. Given two squares by their four corners coordinates, find the coordinates of their intersection.
A. Take the minimum and maximum of all x coordinates and the same for y coordinates. If these are of the same square, one is contained in the other. Otherwise they are either overlapping or disjoint, which you can tell from the diagnoally opposite ends of the min and max. If they do overlap, we already have the co-ordinates for two diaognally opposite corners and other two are common points to both squares.

Q. What is the difference between a hash table and a hash map.
A. A hash table stores values based on the hash computed from the value. A hash map stores key value pairs based on the hash of the key. In C++ a hash map is used where < is not defined or unsuitable for the intended key.   A good hash function is a pre-requisite for both or there can be collisions for some entries.   For example :
template<class T> size_t Hash<T>::operator() (const T& key) const
{
size_t ers = 0;
size_t len = sizeof(T);
const char* p = reinterpret_case<const char*> (&key);
while (len--) res = (res << 1) ^*p++;
return res;
}

Q: What is the significance of the term "dead beef"
A: Initialized memory is usually set to 0xEF bit pattern to indicate the object at this memory has not been used. Similarly 0xCC is used to denote uninitialized memory or no man's land.

Q: Write a C code to find the depth of a binary tree.
A: int GetDepth(Node* root)
     {
       if (root == null) return 0;
       int depth = 1;
       if (root->left == null && root->right == null) return depth;
       int subtree = GetDepth(root->left);
       if (subtree >= depth) depth = subtree+ 1;
       subtree = GetDepth(root->right);
       if(subtree >= depth) depth = subtree + 1;
       return depth;
     }

Sunday, April 7, 2013

interview question answers continued

Q: Given an array of integers ( both positive and negative in random order ), find the subarray with the largest sum.
A: This has a surprisingly simple solution which keeps track of a local and global sum. If the inclusion of the next number does not improve our sum so far, we reset our sum to this number and proceed for the length of the array.

Q: How are cookies passed in the HTTP protocol ?
A: The server sends a set-cookie:name=value to the client, in response to which the client sends a cookie: name=value in its request header.

Q: How will you search for a word in a very large database ?
A: There are builtin operators that lets you search for text such as Contains, like, and fulltext. If we don't know the table we need to look in, we can use full text search.

Q: Explain the functioning or working of a search engine like Google :
A: As explained in the anatomy of a search engine paper, here are some of the steps:
1. A URL server sends lists of URLs to be fetched to the crawlers.
2. Several distributed crawlers download web pages based on this list.
3. The web pages that are fetched are then sent to the store server.
4. The store server then compresses and stores the web pages in a repository.
5. Every web page has an associated ID number called a docID which is parsed whenever a new URL is parsed out of a web page
6. An indexer and sorter, reads the repository, uncompresses the documents and parses them to index the words occurrences called hits.The indexer distributes these words into a set of barrels. It also stores the links in an anchor file
7. The URL resolver reads the anchors file and converts relative URL into absolute URL and into docID. It also generates a database of URL pairs to compute page ranks.
8. The sorter takes the barrels sorted by docID and resorts them by wordID to generate the inverted index.The sorter also produces a list of wordIDs and offsets into the inverted index.
9. A program called DumpLexicon takes this list together with the lexicon produced by the indexer and generates a new lexicon to be used by the searcher.
10. The searcher is run by a web server and uses the lexicon built by the DumpLexicon together with the inverted index and page ranks to answer queries.

Saturday, April 6, 2013

interview questions answers continued

Tree Flattening
Question : How do you flatten a binary tree ?
A binary tree can be flattened into an array by storing the left and the right siblings at index positions 2i and 2i+1 as in the case of a heap.
A binary tree can be flattened into a linked list, breadth first by keeping a queue to keep the siblings.
So as you encounter nodes, enqueue it and as you dequeue the nodes, enqueue the siblings. You can have pointers to the next sibling at the same level. Or you can mark the leftmost nodes in the tree in your linked list so that all the siblings breadth wise at the same level in the tree occur between two such marked nodes.
There are some other solutions as well using recursion or storage by keeping track of positions in the tree.

Design a class library to writing game cards.
Game cards is a collection of distinct cards that lend themselves to various groupings and operations. Use a collection that works with LINQ like methods

How will you write a find and replace tool ?
A find and replace tool takes two input strings - one that is used for searching and the other that is used to replace the occurance. These two are independent operations. When the text is to be deleted, the replacement string is empty. In such cases, rest of the string after the occurance is moved or copied. Search for the next occurance can resume from the current location or all occurances can be found initially.

There is a very large array, in which all the numbers are repeated once except one number. Tell how will you find that number
We can maintain a candidate list and an exclusion list. Numbers that we have seen before are moved to the exclusion list and skipped.

Friday, April 5, 2013

interview questions answers continued

Question : How do you find the edit distance in strings ?
Answer : You calculate the Levenshtein distance between strings with memoization. This is how it works:
If one of the input strings is empty, return the length of the other string as the distance D.
D(i,0) = i
D(0,j) = j
In a zero based index, d(i,j):
1) D(i-1, j) + 1 ( i.e. consider with a new left string letter )
2) D(i, j-1) + 1 ( i.e. consider with a new right string letter )
3) D(i-1, j-1) + cost where cost = 1 if S[i] != S[j] else cost = 0;

Tail recursion could also be considered with the same three types of breakouts. Cost can be 1 if the elements don't match and 0 otherwise.

If we took a sample data like the following, we would have a distance matrix as follows:

2 T 2 1 0
1 A 1 0 1
0 C 1 1 2
       B A T
       0   1   2

memo = {}
#i is the start index of str1, j is the start index of str2
def LevenshteinDistance(str1, i, len1, str2, j, len2):
    l = [i,len1,j,len2]
    key = (',').join(str(l))
    if(memo.has_key(key)):
        return memo[key]
    if(len1 == 0):
        return len2
    if(len2 == 0):
        return len1
    cost = 0
    if(str1[i] != str2[j]):
        cost = 1;
    dele = LevenshteinDistance(str1, i+1,len1-1, str2,j,len2)+1
    inse = LevenshteinDistance(str1,i,len1,str2,j+1,len2-1)+1
    upda = LevenshteinDistance(str1,i+1,len1-1,str2,j+1,len2-1)+cost
    dist = min(dele, inse, upda)
    memo[key] = dist
    return dist
print LevenshteinDistance("cat", 0, 3, "bat", 0 ,3)
def min(a, b, c):
    d = a;
    if (d > b):
        d = b
    if (d > c):
        d = c
    return d

prints 1

C++ version:
map<string, int> m;

int getLD(char* str1,int i,int len1, char* str2, int j, int len2)
{
stringstream sout;
sout << i << "," << len1<< "," << j << "," << len2;
    key = sout.str();
    if(m.find(key) != m.end())
        return m[key];
    if(len1 == 0)
return len2;
    if(len2 == 0)
        return len1;
int cost = 0;
    if(str1[i] != str2[j])
        cost = 1;
    dele = GetLD(str1, i+1,len1-1, str2,j,len2)+1;
    inse = GetLD(str1,i,len1,str2,j+1,len2-1)+1;
    upda = GetLD(str1,i+1,len1-1,str2,j+1,len2-1)+cost;
int dist = min(dele, inse, upda);
    m[key] = dist;
return dist;
}

int min(int a, int b, int c):
{
int d = a;
    if (d > b)
        d = b;
    if (d > c)
        d = c;
    return d;
}

Thursday, April 4, 2013

interview question answers continued

Another Interview question is how do you find an integer in a circularly sorted integers, then tell how will you find an integer.
In this case there is one factor working for us - which is that the integer collection is sorted and the another that we could choose to ignore if we knew the length of the array. A circular array index is bounded by the modulo length. We can confirm if the array is circular if the link and skip level link meet at the same node.
Starting with any position, the integers in one direction are monotonically increasing or decreasing in nature and in the other direction may or may not be so. So you start with the first and last index positions, find the mid point and look in either half in each iteration, update the start and end to select a half until a find or if start and end cross over.

public int FindSorted(int[] input, int val)
{
int start = 0;
int end = input.Length - 1;
while (start < end)
{
int mid = start + (end - start) / 2;
if (input[mid] == val) return mid;
if (input[start] <= input [mid])
{
if ( input[start] <= val && val < input[mid])
end = mid - 1;
else
start = mid + 1;
}
else
{
if (input[mid] < val && val <= input[end])
start = mid + 1;
else
end = mid - 1;
}
}

if (input[start] == val) return start;

if (input[end] == val) return end;
return -1;
}

Wednesday, April 3, 2013

interview questions answers continued

This follows my earlier post on interview questions and answers:
Question: If you are given a string of length N and M strings each of length L, How will you find that each of the M strings occurred in the larger string ?
Answer: There are several approaches to this problem.
The first approach is to iterate for each of the M strings to see if it occurs in the original string.
For each pattern that we match, we can do one of the following:
1) Brute force method : Iterate through each character of the original string to see if its the start of the pattern. For the length of the pattern, match the subsequent characters to see if the pattern exists.
2) Finite - automata: This is an improvement on the pattern match part. We build a string matching automaton that lets us complete the pattern match in linear time.
3) All match at once: This is an improvement on the iteration of all the patterns. Instead of iterating over each of them one by one, each pattern match and automaton progress can be stored as the characters of the original string are iterated.

Tuesday, April 2, 2013

stored procedures for semantic or fulltext search

Semantic search stored procedures in SQL Server enable you to register and unregister a pre-populated Semantic Language Statistics database. Fulltext has to be enabled for this to work. To index office 2010 documents, filter packs are available.
User interface with a clean search page that lets users search for entities based on full-text can be built with a full text stored procedure. When full-text is enabled, the queries can specify contains or full-text to return the results. Full text indexing services runs separately and can be expected to be up to date with all changes to the domain objects. Hence the full text stored procedure merely needs to execute a full-text query. The reason we want to store the query in a procedure is because it will give better performance and security. This is especially relevant when the search page is vulnerable to SQL injection attacks.
When the search page uses form to query against one or more fields of domain entities, then there are patterns we can use with service calls or ORM queries or database queries. Some pattern are for example when searching for a name and a SSN, the UI could build the predicate in the context of the domain entity selected on the screen. Another pattern is to query against one or more entities based on common attributes. The goal with the patterns is to classify and streamline the different access patterns to the data. Sometimes a materialized view helps, other times, the ORM may have methods to lookup entities via some top level domain entity.
Search page presentation can be dedicated page or inset into various other pages. Search Results should be accordingly presented.