Cluster computing

Sunday, December 4, 2016

Find the maximum rectangle under a histogram

Initialize the return value

For each histogram top as the top left of the rectangle,

Find the span it has across the histogram

If this area is greater than max, update max

Return max

Sort n colored objects which are red, white and blue

First pass put all the reds together by exchanging contiguous positions with red only

Second pass put all the whites together.

Largest subarray sum - Kadane’s algorithm

Initialize

max_so_far = 0

max_ending_here = 0

For each element of the array

max_ending_here = max(0, max_ending_here+ A[i])

max_so_far = max(max_so_far, max_ending_here)

Alternatively

Initialize and maintain the following variables

x = sum of elements

y = minimum of sum encountered

z = maximum of sum encountered

For each element of the array

y = min(x, y)

z = max(z, x-y)

Get CRC

Pad a message buffer to length n

Initialize last to the index of the first element of the excess from given polynomial

For each character at index from 0 to last

Xor the message with the polynomial at this char if the message has value

Return the excess portion of the xor result.

More here : https://1drv.ms/w/s!Ashlm-Nw-wnWljrS9T48GT29eGu0

Saturday, December 3, 2016

Some programming questions revisited:

Longest Common Subsequence:

For two sequences X and Y, we determine Z as the LCS by iterating over each element in X as i and Y as j. The length of LCS is

0 if i = 0 or j = 0

C[i-1, j-1] + 1 if i,j > 0 and xi = yj

Max(c[I, j-1], c[i-1,j) if i,j > 0 and xi <> yj

Find longest increasing subsequence

For element I from 0 to length of the sequence

For element j from 0 to I

If criteria and best[j] + 1 > best[I]
Update best[I] with best[j]+1

Merge Intervals
Using a stack initialized with the first interval

If the top of the stack is not overlapping with the next interval, push it on the stack

Update the end time of the top of the stack
Given a string and a dictionary of words, determine if the string can be segmented into one or more dictionary words

Assuming limited dictionary, maintain an array T of one more than string length such that

T[i] = true implies 0 - i - 1 can be treated as a word boundary and segmented using dictionary, t[0] = true

For each position in the string

We evaluate only if there is a match

Foreach word in the dictionary

If the length + position is already a known word boundary or exceeds the string length, we continue

We set this position as a word boundary

We can simplify the code with

Test both the presence of the substring before the position in the dictionary

And recurse for the position beyond the substring

Find the minimum number of one letter transformations from

Each word could be associated with a number of steps. This will help during BFS

Starting with a word

If this is the destination word, return its step number

For each of its positions

Switch the character with one from a to z skipping itself

If the dictionary contains this word

add it to the queue with incremented step number

remove the word from the dictionary

Median of two sorted arrays

Sequentially merge from both arrays by taking the minimum from either

If one gets over, take from the other

Stop after the desired number of elements in merged

Find the maximum rectangle under a histogram

1. Initialize the return value

2. For each histogram top as the top left of the rectangle,

a. Find the span it has across the histogram

b. If this area is greater than max, update max

3. Return max

Friday, December 2, 2016

We continue discussing the paper "Nested sampling for general Bayesian computation" by Skilling
With this technique, nested sampling estimates directly how the likelihood function relates to previous value of the discrete random variable. And it can be used to compare two data models for the same set by comparing the conditional probability that is assigned after the relevant evidence is taken into account.
This method directly computes the evidence. This technique simplifies the evidence calculation by not summing over the parameters directly but instead performing it on the cumulated prior mass that cover likelihood values greater than say lambda. As Lambda increases, the enclosed mass X decreases from one to zero. We were looking at the transformation of computing the evidence based on the prior mass instead of the parameters. We looked at the integration performed. This paper essentially says don't navigate the parameter space. It is sufficient to explore a likelihood weighted space
The method replaces the point of the lowest likelihood by new one drawn from within the likelihood
It increments the evidence by filling in the missing band with weight w = Xj/N for each surviving point. The random values in the range 0,1 suffice.to draw the samples within the constrained likelihood contour. This method also calculates the information H during the iterations as a by product.
There are j iteartive steps. At each step, the procedure has points theta1, theta2, ... thetaN, with corresponding likelihoods L(theta1), L(theta2) .. L(thetaN). The lowest such value Li is the likelihood L associated with step i. Initially the evidence is zero and prior mass is one.
In each iteration, we set the prior mass as exp(-i/N) crude and weight Xi-1 - Xi
The evidence is then incremented by the quantity Li.wi
The point of lowest likelihood is then replaced by the new one drawn from within L(theta) > Li, in proportion to the prior pi(theta)
After all the iterations, we increment the evidence by the sum of the corresponding likelihoods L(theta1), L(theta2) .. L(thetaN) with the weight N^-1 Xj which fills in the missing band of the integral with the chosen weight for each surviving point now that the domain is between 0 and Xj. The weighted summation for evidence is therefore performed for j iterations and N points (j+N).
The paper also shows an example with j = 5 and N = 3 points. Initially the three points are taken from the entire parameter space which is equivalent to taking randomly over X from 0 to 1. Their labels are not known. The worst of these is taken and replaced so that all three are then uniform in a reduced range X1. In step 2, the new point happened to be inner most and is identified as number 5. After five such iterations there are five such discarded points and three survivors. It is over these points that the weighted sum is then evaluated.
Coding exercises updated : https://1drv.ms/w/s!Ashlm-Nw-wnWljZZqmcHRUZNx9PF

Node GetParent(Node root, Node child)
{
If (child == null) return null;
Node parent = null;
While ( root )
{
If (child.val == root.val) return parent;
parent = root;
If (child.val < root.val)
Root = root.left;
Else
Root = root.right;
}
Return null;
}
With this the GetPredecessor changes to
Get predecessor of binary node
Node GetPredecessor(Node root)
{
if (root == NULL) return root;
if (root.left )
{
Node current = root.left;
While (current && current.right)
Current = current.right;
Return current;
}
Node parent = GetParent(root);
While (parent && parent.left == root)
{
root = parent;
parent = GetParent(root, parent);
}
return parent;
}

Thursday, December 1, 2016

Stateless versus Stateful APIs:

Stateless API:

def cmd_execute(cmds):

import shlex, subprocess

from threading import Timer

from subprocess import Popen, PIPE

args = shlex.split(cmds[0])

print(args)

process = subprocess.Popen(args, stdin = subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)

kill_proc = lambda p: p.kill()

timer = Timer(10, kill_proc, [process])

output = ''

try:

timer.start()

for i in range (1,len(cmds)):

process.stdin.write(str.encode(cmds[i]))

output, stderr = process.communicate()

output = output.decode('utf-8')

finally:

timer.cancel()

return output

Stateful API:

import pexpect
global c
c = None
def cmd_execute(cmds):
global c
if c == None:
c = pexpect.spawn ('/usr/bin/python')
c.expect('>>>')
count = 0
output = ''
for cmd in cmds:
#print('cmd='+cmd)
c.sendline(cmd)
c.expect ('.+')
print(c.after)
output += c.after
c.sendline()
c.expect('.+')
print(c.after)
return output
output = cmd_execute(["a='hello'", "a"])
#print(output)
print('c='+repr(c))
output = cmd_execute(["b='world'", "b"])
#print(output)
print('c='+repr(c))
output = cmd_execute(["c=a+b", "c"])
#print(output)
print('c='+repr(c))
"""

a='hello'
>>>
a
'hello'
>>>
c=<pexpect.pty_spawn.spawn object at 0x7f0728dbbcd0>

>>>
b='world'
>>>
b
'world'
>>>

c=<pexpect.pty_spawn.spawn object at 0x7f0728dbbcd0>
>>> c=a+b

>>> c

'helloworld'
>>>

c=<pexpect.pty_spawn.spawn object at 0x7f0728dbbcd0>

"""

#codingexercise

Find the lexicographic minimum rotation of a string
The trick is that for most rotations we just concatenate the string with itself to find all possible rotations as substrings Then we can linearly search for the minimum.
String MinRotation(String input)
{
String min = input;
var concatenated = input + input;
for (int i = 0; i < input.Length; i++)
var select = input.substring(i,n);
if (select < min)
min = select;
return min;
}
Finding the maximum is similarly straightforwardly linear with greater than comparision
if we were to group different rotations by their occurence then we can use a hashtable
Also a few more exercises: https://1drv.ms/w/s!Ashlm-Nw-wnWljZZqmcHRUZNx9PF
int fib(int n)
{
if (n <= 0) return 0;
if (n == 1) return 1;
if (n == 2) return 1;
return fib(n-1) +fib(n-2);
}