Cluster computing

Wednesday, May 7, 2014

We look at a few more coding questions today. First we look at breadth first traversal and depth First traversal. Depth first traversal we can write recursively as follows with no additional data structure.
DFS-Visit(Root)
{
mark root as partially visited
for each vertex v adjacent to Root
if color of v = unvisited
set parent of v
DFS-visit(v)
mark root as visited
}
Next we look at breadth first search
In BFS, we maintain an additional data structure to store the siblings at any level.
BFS(G, s)
{
// initialize
ENQUEUE(Q,s)
While Q is not null
u = DEQUEUE(Q)
for each v belonging to Adj[u]
do if color of v = unvisited
then color it partially visited and
ENQUEUE(Q)
mark u as visited
}

I'm going to cover a few more questions but first I wanted to take a minute. In particular I want to cover graph based coding questions and revisit to distributed computing.

I will take a small example for edit distance first :
Find the minimum edit distance between two input strings. Edit distance is the minimal number of edit operations to modify a string from one state to the other.
For eg : Saturday can be changed to Sunday using the following transformations:
Saturday -> Sturday
Sturday -> Surday
Surday -> Sunday
if we denote the edit distance by a function f(i,j) between the substring of the first string ending with the j'th character and the substring of the second string ending with the i'th character. Then we know that f(i,0) = i since the because we need to delete that many characters to get an empty string. Similarly f(0,j) = j because the empty string is at the beginning of that same substring ending with the 0th character.
If the jth character of the first string is different from the ith character of the second string, then
there are three different techniques involved in the transformations and finding the edit distances. These are insertion, deletion and substitution.
We note that we can make the lengths equal by deleting the characters from the longer string or inserting the characters into the shorter string. We will shortly see how.
If the jth character of the first string is the same as the ith character of the second string, then the edit distance f(i,j) is the same as f(i-1,j-1)
If there is substitution involved then the edit distance f(i,j) = f(i-1,j-1) + 1
if we insert the ith character of the second string into the first, then edit distance f(i,j) = f(i - 1, j) + 1
If we delete the jth character of the first string, then the edit distance f(i,j) = f(i, j-1) + 1
Thus if the i != 0 and j != 0 and string [i] != string [j], then the edit distance is the minimum of
f (i-1,j) + 1 , f(i, j - 1) + 1 and f(i - 1, j - 1) + 1
With the above formula comprising of case i = 0 case j = 0 case string[i] == string[j] and string[i] != string[i,j] we can create a table of edit distances by enumerating the characters of the first string as columns and those of the second string as rows and using the substrings to compute edit distances for any pair of substrings. Note that we introduce a column at the beginning of all rows and we introduce a row at the beginning of all columns to hold the position for the empty string and we initialize the value of this first cell as zero. we can computer the edit distance for other pairs of substrings starting from here. At each additional row or column, we use the edit distance from the previous row or column and we find the minumum value from the three operations listed above .At the diagonal end of the first cell which is the last cell we will compute the edit distance of the entire first string and the entire second string.

Algorithm: The function to implement the above has the following loops
first loop to set the value of the first row ( corresponding to empty string )
second loop to set the value of the first column ( corresponding to the empty string)
then for each pair of characters from the first and the second string,
if they are equal we set it to the previous diagonal value
else (they are different and) we find the costs for insertion, deletion and substitution and choose the minimum with which we update the corresponding position in the table

We quickly review some graph algorithms next:

Minimum spanning trees:
Kruskal's algorithm:

// Make-set of all vertices
sort the edges of E into non-decreasing order by weight w
for each edge (u,v) belonging to E taken in that order
connect the edge if they belong to different sets

Prim's algorithm:
Put a vertex in the min-priority Q sorted by the weights
While Q is not empty,
extract the minimum
for each vertex v belonging to adjacents of u
if v belongs to Q and w(u,v) < key(v)
connect v to u and set its key

Shortest paths are found by relaxing the edges

Djikstra's shortest path:
Initialize-single-source(G,s)
Initialize Q with V[G]
While Q is not empty
u = extract-min(Q)
add u to shortest path vertices
for each vertex adjacent to u
relax (u,v,w)

Cluster computing

Wednesday, May 7, 2014

No comments:

Post a Comment