Cluster computing

Sunday, October 12, 2014

#codingexercise

TwoSum is a method that finds the indexes, in ascending order, of two numbers in an array that add up to a target. The numbers are positive integers.

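List<int> TwoSum(List<int> numbers, int target)
{
    // Positive integers can never add up to zero or less, so there is nothing to find.
    if (numbers == null || numbers.Count == 0 || target <= 0) return null;
    var ret = new List<int>();
    for (int i = 0; i < numbers.Count; i++)
    {
        // List<T>.IndexOf(item, startIndex): look for the complement only to the right of i,
        // so the two indexes come out in ascending order.
        int index = numbers.IndexOf(target - numbers[i], i + 1);
        if (index != -1)
        {
            ret.Add(i);
            ret.Add(index);
            break; // stop at the first pair found
        }
    }
    return ret;
}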
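The C# method rescans the list with IndexOf for every i, which is O(n^2) overall. A common alternative, swapped in here rather than taken from the exercise, is a single pass with a dictionary from value to index. A minimal Python sketch; two_sum is a hypothetical name:

```python
def two_sum(numbers, target):
    """Return [i, j] with i < j and numbers[i] + numbers[j] == target, or [] if none."""
    seen = {}  # value -> index of its first occurrence
    for j, value in enumerate(numbers):
        i = seen.get(target - value)
        if i is not None:
            return [i, j]  # the complement appeared earlier, so i < j
        seen.setdefault(value, j)
    return []
```

This runs in O(n) time at the cost of O(n) extra memory. When several pairs qualify, it reports the pair whose second index is smallest, which may differ from the pair the C# loop finds first.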

DBExplorer investigates the use of keyword search in relational databases. It describes several interesting problems that arise in keyword search as opposed to structured queries. First, SQL applications require knowledge of the schema, whereas keyword search does not. Second, given a set of keywords, a match is found by joining several tables on the fly. Third, indexes need to be leveraged for efficient keyword search. Fourth, the inverted lists commonly used for document collections are replaced by a symbol table. For each keyword, the symbol table keeps a list of the rows that contain it; alternatively, it can keep a list of the columns that contain it. Search across multiple tables uses a graph whose nodes are the tables and whose edges are the foreign-key relationships. When looking for a co-occurrence of keywords, the tables are joined on the fly by exploiting both the schema and the content.
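To make the symbol table and the schema graph concrete, here is a small Python sketch. It is not DBExplorer's implementation: in-memory dictionaries stand in for the database tables and the catalog, the symbol table is kept at column granularity (the second option above), and build_symbol_table, join_path and the sample tables are hypothetical.

```python
from collections import defaultdict, deque


def build_symbol_table(tables):
    """Map each keyword to the set of (table, column) pairs whose cells contain it.

    `tables` is a stand-in for the database: {table: {column: [cell values]}}.
    """
    symbols = defaultdict(set)
    for table, columns in tables.items():
        for column, cells in columns.items():
            for cell in cells:
                for word in str(cell).lower().split():
                    symbols[word].add((table, column))
    return symbols


def join_path(fk_edges, start, goal):
    """Breadth-first search over the schema graph (tables as nodes, foreign keys as edges).

    Returns the shortest list of tables to join from start to goal, or None.
    """
    graph = defaultdict(set)
    for a, b in fk_edges:
        graph[a].add(b)
        graph[b].add(a)
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back to the start table
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(graph[node]):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical sample schema: orders references both customers and products.
tables = {
    "customers": {"name": ["Ada Lovelace", "Alan Turing"]},
    "orders": {"status": ["shipped", "pending"]},
    "products": {"title": ["analytical engine", "tabulator"]},
}
fk_edges = [("orders", "customers"), ("orders", "products")]
symbols = build_symbol_table(tables)
path = join_path(fk_edges, "customers", "products")
```

For the query "ada engine", the symbol table places the two keywords in customers and products, and the search finds that they co-occur only through a join via orders. Breadth-first search picks a shortest chain of foreign-key joins, which is one reasonable choice when several join paths connect the tables.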

Saturday, October 11, 2014

#codingexercise
Text justification
This problem requires breaking text into lines so that every line, except possibly the last, has exactly L characters. A word that does not fit on the current line moves to the next one, and the padding is distributed evenly between the words that remain on the line. A word longer than a whole line is hyphenated across lines.
Let X and Y denote sentences whose character count is at most L and more than L, respectively.
The test cases are:
X
Y
XX
XY
YX
YY

List<string> Justify(string text, int L)
{
    if (string.IsNullOrEmpty(text) || L < 1) return null;

    var ret = new List<string>();
    var line = new List<string>();   // words placed on the current line
    int used = 0;                    // their characters plus one space between each pair

    var words = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    foreach (var w in words)
    {
        string word = w;

        // add word if it fits after a single separating space
        int needed = line.Count == 0 ? word.Length : used + 1 + word.Length;
        if (needed <= L)
        {
            line.Add(word);
            used = needed;
            continue;
        }

        // add padding: the word does not fit, so close the current line
        if (line.Count > 0)
        {
            ret.Add(Pad(line, used, L));
            line.Clear();
            used = 0;
        }

        // hyphenate a word longer than a whole line (needs L >= 2 to fit the '-')
        while (word.Length > L && L > 1)
        {
            ret.Add(word.Substring(0, L - 1) + "-");
            word = word.Substring(L - 1);
        }
        line.Add(word);
        used = word.Length;
    }

    // the last line is left-justified without extra padding
    if (line.Count > 0) ret.Add(string.Join(" ", line));
    return ret;
}

// Spread the L - used spare columns over the gaps between words, leftmost gaps first.
string Pad(List<string> line, int used, int L)
{
    if (line.Count == 1) return line[0].PadRight(L);
    int gaps = line.Count - 1;
    int extra = L - used;
    var sb = new System.Text.StringBuilder(line[0]);
    for (int k = 1; k < line.Count; k++)
    {
        int spaces = 1 + extra / gaps + (k <= extra % gaps ? 1 : 0);
        sb.Append(' ', spaces);
        sb.Append(line[k]);
    }
    return sb.ToString();
}
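To check the test cases listed above, here is a Python sketch of the same greedy layout. It assumes X and Y can be taken as single words (of at most L and more than L characters), that the last line stays left-justified, and that spare spaces go to the leftmost gaps first; justify and width are hypothetical names.

```python
def justify(text, width):
    """Greedy full justification; words longer than `width` are hyphenated."""
    if not text or width < 1:
        return None
    lines, line, used = [], [], 0

    def pad(words, used_len):
        # Spread the spare columns over the gaps, leftmost gaps first.
        if len(words) == 1:
            return words[0].ljust(width)
        gaps = len(words) - 1
        extra = width - used_len
        out = words[0]
        for k, word in enumerate(words[1:], start=1):
            out += " " * (1 + extra // gaps + (1 if k <= extra % gaps else 0)) + word
        return out

    for word in text.split():
        needed = len(word) if not line else used + 1 + len(word)
        if needed <= width:
            line.append(word)
            used = needed
            continue
        if line:
            lines.append(pad(line, used))
        # Hyphenate a word longer than a whole line (needs width >= 2 for the '-').
        while len(word) > width > 1:
            lines.append(word[: width - 1] + "-")
            word = word[width - 1:]
        line, used = [word], len(word)
    if line:
        lines.append(" ".join(line))  # the last line stays left-justified
    return lines
```

With width 4, "ab cd" (XX) gives ["ab  ", "cd"] and "abcde fghij" (YY) gives ["abc-", "de  ", "fgh-", "ij"].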

Friday, October 10, 2014

I mentioned pexpect, a Python library (with an SSH module, pxssh) that lets you drive a program's stdin and stdout. Similarly, sshpass can be invoked from a bash script:

#!/bin/bash
# CRUD.sh of resource over SSH
#
E_NOARGS=85   # exit code when no arguments are given

if [ -z "$1" ]
then
  echo "Usage: $(basename "$0") [create|update|delete|list] [resource] [rule]"
  exit $E_NOARGS
fi

share_="$2"
rule_="$3"

create_Command="create resource"
update_Command="update resource"
delete_Command="delete resource"
list_Command="list resource"

case $1 in
  "create" ) Command="$create_Command" ;;
  "update" ) Command="$update_Command" ;;
  "delete" ) Command="$delete_Command" ;;
  "list"   ) Command="$list_Command" ;;
  *        ) echo "Usage: $(basename "$0") [create|update|delete|list] [resource] [rule]"; exit $E_NOARGS ;;
esac

export SSHPASS='password'   # placeholder: sshpass -e reads the password from SSHPASS; use keys instead
sshpass -e ssh -o BatchMode=no root@xyz.com "$Command"

exit $?

OK now where did I leave the change from the bus ride today?

#codingexercise
We implement yet another way to find the area of the largest rectangle in a histogram whose bars have unit width. In addition to the simple approaches described in previous posts, we look at a recursive approach, because there are overlapping subproblems, as well as iterative approaches that use data structures.
One approach is to use divide and conquer.
We can exploit overlapping subproblems in terms of the areas computed over ranges of indices. If we keep track of these ranges and their computed areas, we don't have to redo them; this technique is called memoization. Here we don't recompute the max area for a range whose value we have already computed and stored. So we keep the ranges, ordered by increasing start and end, together with the max area computed for each. Each time we update a range, we update its start, its end, or both, along with the corresponding max value. How we choose the ranges of indices is up to us; the goal is to cover every index in the histogram so that no portion is left out. One idea, as it appears on the GeeksforGeeks website, is to find the minimum height among the bars by dividing and conquering the entire range.
Given this minimum, the max area is the maximum of the following:
1) the maximum area to the left of the minimum bar
2) the maximum area to the right of the minimum bar
3) the number of bars in the range multiplied by the minimum height
Note that the minimum over one range does not by itself give the maximum area; the same step must be applied recursively to every subrange, down to single bars, since one very tall bar may hold the largest area on its own.
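A minimal Python sketch of the split-at-the-minimum recursion with the three candidates above. A linear scan stands in for a range-minimum structure, so the worst case is O(n^2); a segment tree for range-minimum queries would bring it to O(n log n). The function name is hypothetical.

```python
def max_area_split_at_min(h, lo=0, hi=None):
    """Largest rectangle in h[lo..hi], recursing on either side of the lowest bar."""
    if hi is None:
        hi = len(h) - 1
    if lo > hi:
        return 0  # empty range
    m = min(range(lo, hi + 1), key=h.__getitem__)  # index of the lowest bar
    return max(
        max_area_split_at_min(h, lo, m - 1),  # 1) best area left of the minimum
        max_area_split_at_min(h, m + 1, hi),  # 2) best area right of the minimum
        h[m] * (hi - lo + 1),                 # 3) all bars times the minimum height
    )
```

For the heights [2, 1, 5, 6, 2, 3] the lowest bar splits off [2] and [5, 6, 2, 3], and the answer 10 comes from the bars of height 5 and 6.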
Another way of choosing the indexes is to extend the range progressively to the right as we traverse from the start to the end. In this method, every bar in the range already covered has been fully considered as a contributor to the final answer.
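One well-known way to realize such a left-to-right sweep, and the kind of "iterative approach using data structures" mentioned earlier, is the stack-based algorithm. This is a swapped-in technique and not necessarily what the post had in mind; max_area_stack is a hypothetical name.

```python
def max_area_stack(h):
    """Largest rectangle in one left-to-right sweep over the bars.

    The stack holds indices of bars whose heights never decrease from bottom
    to top. When a lower bar arrives, the taller bars on top can extend no
    further right, so their rectangles are measured and they are popped;
    the sweep never revisits a bar.
    """
    stack, best = [], 0
    heights = list(h) + [0]  # a final zero-height bar empties the stack
    for i, height in enumerate(heights):
        while stack and heights[stack[-1]] > height:
            top = stack.pop()
            left = stack[-1] + 1 if stack else 0  # leftmost index the rectangle reaches
            best = max(best, heights[top] * (i - left))
        stack.append(i)
    return best
```

Each index is pushed and popped once, so the sweep is O(n).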
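Yet another approach is to divide the indexes into halves, solve each half, and combine the results, so that the max area of a range is Max(a, b, c).
Specifically, given two adjacent ranges, the area is the maximum of
1) the largest rectangle that crosses the boundary between the two ranges (the minimum common height times the number of bars in the combined range is only one candidate, since the best crossing rectangle may cover just part of each range)
2) the maximum area of one range
3) the maximum area of the other range.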
If we choose the latter approach, we keep track of the computed areas in the data structure discussed under memoization.
First, without memoization, the solution is:
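int MaxArea(int[] h, int start, int end)
{
    if (start > end) return 0;
    if (start == end) return h[start];          // a single bar of unit width

    int mid = (start + end) / 2;
    int left = MaxArea(h, start, mid);          // best area entirely in the left half
    int right = MaxArea(h, mid + 1, end);       // best area entirely in the right half

    // Best area of a rectangle crossing the boundary: start with bars mid and mid+1,
    // then grow outward one bar at a time toward the taller neighbor.
    int i = mid, j = mid + 1;
    int height = Math.Min(h[i], h[j]);
    int cross = height * 2;
    while (i > start || j < end)
    {
        if (j < end && (i == start || h[j + 1] >= h[i - 1]))
            j++;
        else
            i--;
        height = Math.Min(height, Math.Min(h[i], h[j]));
        cross = Math.Max(cross, height * (j - i + 1));
    }
    return Math.Max(Math.Max(left, right), cross);
}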
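A Python sketch of the halving approach, where the combine step measures the best rectangle crossing the boundary between the halves rather than only the full-width one. The memo table is a dictionary keyed by (start, end), as in the memoization idea; within one top-level call each range is visited once, so the cache only pays off if ranges are queried again. max_area_halves is a hypothetical name.

```python
def max_area_halves(h):
    """Largest rectangle by halving the index range and combining three candidates."""
    memo = {}  # (start, end) -> best area found for that range

    def solve(start, end):
        if start > end:
            return 0
        if (start, end) in memo:
            return memo[(start, end)]
        if start == end:
            best = h[start]
        else:
            mid = (start + end) // 2
            # Grow a rectangle outward from bars mid and mid+1,
            # always stepping toward the taller neighbour.
            i, j = mid, mid + 1
            height = min(h[i], h[j])
            cross = 2 * height
            while i > start or j < end:
                if j < end and (i == start or h[j + 1] >= h[i - 1]):
                    j += 1
                else:
                    i -= 1
                height = min(height, h[i], h[j])
                cross = max(cross, height * (j - i + 1))
            best = max(solve(start, mid), solve(mid + 1, end), cross)
        memo[(start, end)] = best
        return best

    return solve(0, len(h) - 1)
```

The outward growth costs O(n) per level and there are O(log n) levels, so the whole computation is O(n log n).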

Thursday, October 9, 2014

I came across a method in a library that provides a programmatic way to establish SSH sessions (the Python pexpect library, pxssh.py). The library has a method that computes the Levenshtein distance as follows:

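# written here as a standalone function; in pxssh.py it is a method of the pxssh class
def levenshtein_distance(a, b):
    '''This calculates the Levenshtein distance between a and b.
    '''
    n, m = len(a), len(b)
    if n > m:
        # make a the shorter string so each row has only n+1 entries
        a, b = b, a
        n, m = m, n
    current = list(range(n + 1))
    for i in range(1, m + 1):
        previous, current = current, [i] + [0] * n
        for j in range(1, n + 1):
            add, delete = previous[j] + 1, current[j - 1] + 1
            change = previous[j - 1]
            if a[j - 1] != b[i - 1]:
                change = change + 1
            current[j] = min(add, delete, change)
    return current[n]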

As we may already know, the Levenshtein distance is a metric for measuring the difference between two sequences. It is the minimum number of single-character edits (insertions, deletions and substitutions) needed to turn one sequence into the other. The method first swaps the arguments so that a is the shorter string; each row then has only n+1 entries, and one row is computed for each character of the longer string b. In the inner loop, delete comes from the cell to the left in the current row (current[j-1] + 1), add comes from the same position in the previous row (previous[j] + 1), and change comes from the diagonal, one position to the left in the previous row (previous[j-1]). A match adds nothing to the change cost and a mismatch adds 1. Only two rows are kept at any time, so the memory used is proportional to the length of the shorter string.
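To see how the rows evolve, here is a variant that keeps every row instead of only the last two. It uses the same recurrence as the pxssh method; levenshtein_rows is a hypothetical name.

```python
def levenshtein_rows(a, b):
    """Return every DP row for comparing a and b, with the shorter string as a."""
    if len(a) > len(b):
        a, b = b, a
    n = len(a)
    rows = [list(range(n + 1))]  # row 0: each prefix of a against the empty prefix of b
    for i in range(1, len(b) + 1):
        previous, current = rows[-1], [i] + [0] * n
        for j in range(1, n + 1):
            add = previous[j] + 1          # same position, previous row
            delete = current[j - 1] + 1    # position to the left, current row
            change = previous[j - 1] + (0 if a[j - 1] == b[i - 1] else 1)  # diagonal
            current[j] = min(add, delete, change)
        rows.append(current)
    return rows


for row in levenshtein_rows("cat", "cart"):
    print(row)
# [0, 1, 2, 3]
# [1, 0, 1, 2]
# [2, 1, 0, 1]
# [3, 2, 1, 1]
# [4, 3, 2, 1]
```

The last entry of the last row is the distance: "cat" becomes "cart" with a single insertion.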