Cluster computing

Friday, April 19, 2013

Building a search page for a web application.

Let us assume that we have a three tier stack to create the search page for a web application. Next let us abstract the store for the data and focus on the controller and views instead. We will require a view model and some WPF controls. We could consider an editable control that filters the list of entities from the data store for each character input. This control merely has a textbox for search. The ViewModels handle the bindings to the Name attribute of entities. Autocomplete feature could also be added subsequently as a separate handler on each character input. This is possible because we can check against IsEditing and IsEditingAllowed flags on what the controls should do.
Changes will be made not only to the view model but also to the views. Initially, this search could appear in a separate page but this is something that could be shared across different views. Moreover, this view could merely hold a single custom control. Since the results of the search and the search control can be a template for different domain entities.
The control for displaying the search text box will be static and it can have a default text ink visible. The text control will have a dependencyproperty and setup for bi-directional flow. Each time the text changes in the control, the handlers evaluate against the entities enumerated from the datastore. The execute button is not required in this case.
Initially, the search results can all be similar and dependent on the type of the results to be returned. This is simpler and allows for users to reuse the search across different domain entities.
However, the search results need not be homogoneous and can have hierarchical data or grouping. The results view is independent of the search performed and can display a list of items in a tree format based on some pre-determined format.
If the data store provider were to be expanded to include full text search that returns results across domain entities, then the entities could be described in a grid but they won't be double clickable as in the simpler version above. The list items shown in the type based search mentioned earlier will each map to the datastore item and can be double-clicked to open a details view.
Many of the steps mentioned above may alread be implemented in the web application for various other purposes. Since the search results filter the enumeration of datastore item based on their names against the text input, the functionality is simple and moreover, the form of the search pane and results can follow the same layout as the rest of the application.

Thursday, April 18, 2013

Interview question answers continued

Q: design a file system object model
A: a file system exposes files and folders in a tree form. Both these are file system objects that are manageable in terms of resolving the location, securing their access and iterating over their contents. Internally, their is a layer of file manager, file, block and disk based access.

class File : private FileSystemObject
{
public:
// data operations
byte[] ReadAllBytes();
string[] ReadAllLines();
fstream Read(FileMode m);
void Encrypt();
void Decrypt();

// metadata operations
void SetAccessControl(FileSecurity s)
void SetCreationTime(DateTime t)
}

class Directory : private FileSystemObject
{
public:
// access operations
void GetFiles();
void GetDirectories();
// metadata operations
void SetAccessControl(DirectorySecurity s);
void SetCreationTime(DateTime t)
}

Wednesday, April 17, 2013

interview question answers continued

Q: How do you determine if a binary tree is binary search tree ?
A: bool IsBST( node* root, int* prev)
{
bool ret = true;
if (root == null) return ret;

ret = IsBST(root->left, prev);
if (ret==false) return ret;

if (*prev >= root->val) return false;
*prev = root->val;

ret = IsBST(root->right, prev);
if (ret==false) return ret;

return ret;
}

Tuesday, April 16, 2013

Chapter 24 DB systems textbook

To build a decision tree:
Input : node n, parition D, split selection method S
Output: decision tree for D rooted at node n

Top-Down decision tree induction schema:
BuildTree(Node n, data partition D, split selection method S)
   Apply S to D to find the splitting criterion
   if ( a good splitting criterion is found)
       Create two children nodes n1 and n2 of n
       Partition D into D1 and D2
       BuildTree(n1, D1, S)
       BuildTree(n2, D2, S)
   endif

Clustering:
Input : node n, partition D, split selection method S
Output: decision tree for D rooted at node n

Top-Down decision Tree induction Schema:
BuildTree(Node n, data partition D, split selection method S)
(1a) Make a scan over D and construct the AttributeValue Classlabels (AVC) group of n in-memory.
(1b) Apply S to the AVC group to find the splitting criterion

A partitioning clustering algorithm partitions the data into k groups such that some criterion that evaluates the clustering quality is optimized. k is specified by the user. A hierarchical clustering algorithm generates a sequence of partitions of the records. Starting with a partition in which each cluster consists of one single record, the algorithm merges two partitions in each step until one single partition remains in the end.

Monday, April 15, 2013

keyword clustering

Wartena 08 describes topic detection by clustering methods. The paper suggests that the topic for the whole text can be found based on a probabilistic latent semantic analysis. Keywords are clustered based on statistical similarity measure. The conditional probability distributions that indicate the latent topic are the clusters that this approach finds. Clustering is based on a hierarchical top down method which requires that the center is recomputed in each step. Two elements are selected that have the largest distance and are used as seeds for the clustering. Next all other items are assigned to the center closest to one of the two seeds. After all the items are assigned, the centers are recomputed. The new centers serve as new seeds and the process is repeated until two centers are converged. If the clusters are larger than a threshold, the whole procedure is repeated on that cluster and thus we find a binary tree of clusters. The choice of distance functions used in this paper for two terms t and s are the cosine similarity of the document distribution, the cosine similarity of vector tf, idf values of keywords, the Jensen Shannon distributions between the document distributions and the Jensen Shannon distributions between the term distributions pt and ps.
When clustering using any distance function, the cluster centers could be chosen among the data points and then centers recomputed in each step.

Saturday, April 13, 2013

interview question answers continued

Q: How do you update the game of a black and white coin game such as Othello ?
A: In the update mode, find the longest sequence of the coin opposite of the color to be played. If this sequence has options to be bounded, place the coins at one of the ends. Choice of ends can be based on the side that has odd number of open spaces. The set of sequence can be based on the state of the game. and can generally be identified by expanding out from the center.

Q: Given the root of a binary search tree along with a node inside it, find the successor node for the given node.
A: The structure of a binary search tree lets us determine the successor of a node without ever comparing keys. If the right subtree of node x is non-empty, then the successor of x is the leftmost node of the right subtree.If the right subtree of node x is empty and x has a successor y, then y is the lowest ancestor of x whose left child is also an ancestor of x.

Q: Given a lot of intervals [ai, bi], you have to find a point with the intersection of most number of intervals.
A: We can maintain a count of inclusions in intervals for each point as we traverse the numberline from the minimum ai to the maximum bi.

Friday, April 12, 2013

interview question answers continued

Q: Give the four basics of object oriented Programming.
A: These are some of the tenets of object oriented programming
1. Encapsulation - define a concept by keeping the data and its operations together
2. Inheritance - defines an "is-a" relationship between derived and base class.
3. Composition - defines a "has-a" relationship between a container and contained class.
4. Polymorphism - defines the same method for different contexts

Q: Given two integers m, n, find the first integer r that is divisible by both m and n.
A: Find the factors of m and n. From the factors choose two or more factors such that the product is bigger than both m and n and less than or equal to m * n.

Q: List and compare the various sorting algorithms
A: 1. Bubble sort : make multiple pass and fix the order of elements. It has an average O(n^2) complexity/
2. Selection sort : finds the minimum and swaps with the first and repeat for consecutives. Another O(n^2) algorithm
3. Insertion sort : played like the cards in a deck, take elements one by one and insert them in position in the sorted list. Average O(n^2)
4. Heap sort : Use the array as a heap and use the heap algorithm to sort. O(nlogn) complexity
5. Merge sort : bottom up merge between pairs O(nlogn) complexity
6. Quick sort : divide and conquer based on partitioning on a pivot value. O(nlogn) complexity