Thursday, October 23, 2014

In our previous post on the use of optimization for flight searches from the book Programming Collective Intelligence, we discussed optimizing by cost and then by preferences. We now explore other optimization techniques, such as the matrix methods described in a paper by Gill, Murray, Saunders, and Wright. Here the optimization can be written in the form of matrices.
In particular, we will look at sparse matrix methods in optimization. As with most optimization techniques, we solve a set of linear equations: B_k y_k = b_k. Both direct and iterative equation solvers are used.
While originally a single system of equations was solved, soon there was a need for different systems of equations to be solved together, with the single-system solution applied repeatedly to a sequence of modified systems. The speed of factorizing a matrix then becomes less important than the speed of solving the many subsequent systems. Furthermore, each B_k is related to B_{k-1}.
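The trade-off above, factoring once and then solving many related systems cheaply, can be sketched as follows. This is my own minimal illustration using SciPy (not from the paper): the LU factorization of B costs O(n^3), but each subsequent solve against the cached factors is only O(n^2).

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

# Factor B once; reuse the factors for a sequence of right-hand sides.
B = np.array([[4.0, 1.0],
              [1.0, 3.0]])
lu, piv = lu_factor(B)

# Solve B y = b for several right-hand sides without refactorizing.
rhs = [np.array([1.0, 2.0]), np.array([0.0, 1.0])]
solutions = [lu_solve((lu, piv), b) for b in rhs]

for b, y in zip(rhs, solutions):
    assert np.allclose(B @ y, b)
```

When B_k changes only slightly from B_{k-1}, specialized update schemes go further still by modifying the factors directly instead of refactorizing.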
The simplest example of the use of matrices for such problems is linear programming, which finds the optimal solution over a feasible region bounded by a set of linear constraints, a convex polytope. It has applications in routing, scheduling, assignment, and design.
Linear programming has been applied to problems of all sizes; however, the relative costs of the steps of the optimization methods change when the problems become large. For example, when there are no constraints, the cost of a method is measured by the number of function evaluations. But subsequent recomputations are more expensive because more data structures and computation are involved. Instead we express the iteration so that the (k+1)th iterate is defined in terms of the previous one as x_{k+1} = x_k + alpha_k * p_k, where p_k is a search direction and alpha_k a step length.
We will discuss Newton and conjugate gradient methods in this regard.
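The iteration x_{k+1} = x_k + alpha_k * p_k can be made concrete with a small sketch (my own illustration, not from the paper): steepest descent with an exact line search on the quadratic f(x) = 1/2 x^T A x - b^T x, whose minimizer solves A x = b.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])   # symmetric positive definite
b = np.array([1.0, 1.0])

x = np.zeros(2)
for k in range(100):
    r = b - A @ x                   # residual = negative gradient
    if np.linalg.norm(r) < 1e-10:
        break
    p = r                           # steepest-descent direction p_k
    alpha = (r @ r) / (p @ A @ p)   # exact line-search step alpha_k
    x = x + alpha * p               # x_{k+1} = x_k + alpha_k * p_k

# x now approximates the solution of A x = b
```

The conjugate gradient method follows the same template but chooses each p_k to be A-conjugate to the previous directions, which gives much faster convergence on large sparse systems.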

Wednesday, October 22, 2014

#codingexercise
int MaxR(List<int> items, int start, int end)
{
    if (items == null || items.Count == 0 || start < 0 || end >= items.Count || start > end)
        throw new ArgumentException();
    if (start == end) return items[start];
    int mid = checked((start + end) / 2);
    int left = MaxR(items, start, mid);
    int right = MaxR(items, mid + 1, end);
    return left > right ? left : right;
}

In continuation of our posts on the applications of machine learning from the book Programming Collective Intelligence, we will now see an interesting application of optimization to real flight searches. The Kayak API is a popular vertical search engine for travel. After registering for developer access to the Kayak package, which gives us a developer key, we get a new Kayak session from the Kayak API server. With this session, we can do different flight searches. Note that the need for a session, the use of XML-based responses, and the extra-long URLs with query strings comprising several different search parameters are all reminiscent of the APIs of those times. With just the query parameters for session id, destination, and departure date, the API returns a search id with no results, which can then be used for polling. With the search id and the session id, we can write the polling function as one that requests results until there are no more. There is a tag called morepending which has the value true until the search is complete. The prices, departures, and arrivals lists retrieved from the query are zipped into a list of tuples. The flights are returned primarily in order of price and secondarily in order of time. When we try to create the schedule for the arrivals and departures of the Glass family, where the family members visit from different places and leave after the holiday, this flight search is simply performed over and over again for the outbound and return flights of each individual. Then we can optimize the flights using the actual flight data in the costing function. Since Kayak searches can take a while, we consider this search for only the first two passengers. The costing function now takes the domain as a range multiplied by the count of the outbound and return flights of these two passengers, and we run the same optimization as discussed earlier over this domain. The actual flight data gives us more interesting ways to expand our optimization.
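The polling loop described above can be sketched roughly as follows. The real Kayak API is long retired, so get_results here is a stand-in for the HTTP call that takes a session id and a search id and returns the morepending flag plus a batch of (price, departure, arrival) tuples; the fake factory below only simulates that behavior.

```python
import time

def poll_flights(session_id, search_id, get_results, delay=0.0):
    # Keep requesting results until the morepending flag clears.
    flights = []
    while True:
        morepending, batch = get_results(session_id, search_id)
        flights.extend(batch)
        if not morepending:
            break
        time.sleep(delay)
    # Sort primarily by price, secondarily by departure time.
    return sorted(flights, key=lambda f: (f[0], f[1]))

def fake_results_factory():
    # A stand-in "server" that reports morepending once before finishing.
    batches = [(True,  [(450, '09:00', '12:00')]),
               (False, [(300, '14:00', '17:00'), (300, '08:00', '11:00')])]
    it = iter(batches)
    return lambda session_id, search_id: next(it)

flights = poll_flights('session', 'search', fake_results_factory())
```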
So far we have discussed optimization based on costs. However, there are also preferences, which may or may not be directly tied to costs. Preferences are user-attributed, and the goal is a ranking that makes people as happy, or as little annoyed, as possible. The information is taken from the users and combined to produce the optimal result. For example, everyone may want a window seat on the same flight, but that may not be possible. If we make the cost function return very high values for invalid outcomes, we can resolve this conflict.
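A rough sketch of such a penalty-based preference cost (my own illustration): each person states a desired seat, mild annoyance accrues for unmet preferences, and any invalid outcome, such as the same seat assigned twice, is priced so high the optimizer will never keep it.

```python
def preference_cost(assignment, preferences):
    # assignment: seat label per person; preferences: desired seat per person
    cost = 0
    for seat, wanted in zip(assignment, preferences):
        if seat != wanted:
            cost += 1          # mild annoyance for a missed preference
    if len(set(assignment)) != len(assignment):
        cost += 10000          # invalid outcome: a seat assigned twice
    return cost

# Two people both want the single window seat.
valid = preference_cost(['window', 'aisle'], ['window', 'window'])      # cost 1
invalid = preference_cost(['window', 'window'], ['window', 'window'])   # cost 10000
```

Any of the optimizers discussed earlier (annealing, genetic algorithms) can then be run against this function unchanged; the penalty simply steers them away from invalid regions of the domain.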

Tuesday, October 21, 2014

#codingexercise
Convert numbers to Roman
public string GetRoman(int t)
{
    var numbers = new List<int> { 1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1 };
    var numerals = new List<string> { "M", "CM", "D", "CD", "C", "XC", "L", "XL", "X", "IX", "V", "IV", "I" };
    string ret = string.Empty;
    int i = 0;
    while (t > 0)
    {
        int digit = t / numbers[i];
        for (int j = 0; j < digit; j++)
        {
            ret += numerals[i];
        }
        t -= digit * numbers[i];
        i = i + 1;
    }
    return ret;
}

#codingexercise
public int GetInteger(string numeral, Dictionary<string, int> dict)
{
    // dict maps single numerals ("I", "V", ...) and subtractive pairs ("IV", "IX", ...) to values
    int i = 0;
    int number = 0;

    while (i < numeral.Length)
    {
        if (i + 1 < numeral.Length && dict[numeral.Substring(i + 1, 1)] > dict[numeral.Substring(i, 1)])
        {
            number += dict[numeral.Substring(i, 2)];
            i = i + 2;
        }
        else
        {
            number += dict[numeral.Substring(i, 1)];
            i = i + 1;
        }
    }
    return number;
}

Monday, October 20, 2014

#codingexercise
int RomanToInt(string s)
{
    if (String.IsNullOrEmpty(s)) return 0;
    // optionally validate s with a regex match against ^[IVXLCDM]+$
    var dict = new Dictionary<char, int>();
    dict['I'] = 1;
    dict['V'] = 5;
    dict['X'] = 10;
    dict['L'] = 50;
    dict['C'] = 100;
    dict['D'] = 500;
    dict['M'] = 1000;
    int ret = 0;

    var st = new Stack<char>();
    st.Push(s[0]);

    for (int i = 1; i < s.Length; i++)
    {
        if (st.Count > 0 && dict[s[i]] > dict[st.Peek()])
        {
            // a smaller numeral precedes a larger one, so it was subtractive
            char temp = st.Pop();
            ret = ret - dict[temp];
            st.Push(s[i]);
        }
        else
        {
            st.Push(s[i]);
        }
    }

    while (st.Count > 0)
    {
        ret += dict[st.Pop()];
    }
    return ret;
}
// there are other ways of doing this but I'm just listing the earliest one.

int RomanToInt(string s, int index, Dictionary<char, int> dict)
{
    // validate s as not empty, not null, and roman numerals only
    if (index == s.Length) return 0;
    if (index - 1 >= 0 && dict[s[index]] > dict[s[index - 1]])
        // the previous numeral was already added, so subtract it twice
        return (0 - 2 * dict[s[index - 1]]) + dict[s[index]] + RomanToInt(s, index + 1, dict);
    else
        return dict[s[index]] + RomanToInt(s, index + 1, dict);
}

Sunday, October 19, 2014

Non-negative matrix factorization: This is a technique used for extracting the important features of the data. It involves matrix multiplication, where a matrix with r columns is multiplied by another with r rows, and a cell in the resulting matrix is the scalar product of the corresponding row of the first matrix with the corresponding column of the second. It also involves transposition, where the rows of a matrix are rewritten as its columns.
Given a set of articles containing words, we can write the usual article matrix where the articles will be the rows and the words appearing in them will be the columns. The words are the properties for the articles and we take words that are common but not too common :
wordvec = []
for w,c in allw.items():
   if c > 3 and c < len(articlew) * 0.6:
    wordvec.append(w)
This technique is about factorizing the article matrix into two matrices: a features matrix and a weights matrix. The features matrix has a row for each feature and a column for each word. The weights matrix tells us how much each feature applies to each article: each row is an article and each column is a feature.
The articles matrix may be large. The purpose of this technique is to reduce the matrix to a smaller set of common features. This smaller set of features could be combined with different weights to get back the article matrix. Although it is unlikely to get back the original, this technique attempts to get back as closely as possible.
It's called non-negative because it returns features and weights with no negative values. Since we count the inclusions or occurrences of a word, the technique does not consider exclusions of words. This approach doesn't look for the best factorization, but its results are often easier to interpret.
The algorithm we use in this approach would have a utility function to determine how close the reconstruction is. This is done with a function that sums the squares of the differences between two equal sized matrices. Then we gradually update the matrix to reduce the cost function.
Here we could use a number of the optimization techniques such as annealing but we discuss multiplicative update rules.
Given the original matrix, which this optimization technique calls the data matrix, we compute four update matrices:
hn = the transposed weight matrix multiplied by the data matrix.
hd = the transposed weights matrix multiplied by the weights matrix multiplied by the features matrix.
wn = the data matrix multiplied by the transposed features matrix.
wd = the weights matrix multiplied by the features matrix multiplied by the transposed features matrix.
The weights and features matrix are initialized with random values.
To update the features and weights matrices, all these matrices are converted to arrays.
Every value in the features matrix is multiplied by the corresponding value in hn and divided by the corresponding value in hd. Likewise, every value in the weights matrix is multiplied by the corresponding value in wn and divided by the value in wd.
This is repeated until the number of features are found or the number of iterations is exceeded.
from numpy import matrix, array, transpose, shape, random

def difcost(a, b):
    # sum of the squares of the differences between two equal-sized matrices
    dif = 0
    for i in range(shape(a)[0]):
        for j in range(shape(a)[1]):
            dif += pow(a[i, j] - b[i, j], 2)
    return dif

def factorize(v, pc=10, iter=50):
    ic = shape(v)[0]
    fc = shape(v)[1]
    # Initialize the weight and feature matrices with random values
    w = matrix([[random.random() for j in range(pc)] for i in range(ic)])
    h = matrix([[random.random() for j in range(fc)] for i in range(pc)])
    # Perform operation a maximum of iter times
    for i in range(iter):
        wh = w * h
        # Calculate the current difference
        cost = difcost(v, wh)
        if i % 10 == 0: print(cost)
        # Terminate if the matrix has been fully factorized
        if cost == 0: break
        # Update feature matrix
        hn = (transpose(w) * v)
        hd = (transpose(w) * w * h)
        h = matrix(array(h) * array(hn) / array(hd))
        # Update weights matrix
        wn = (v * transpose(h))
        wd = (w * h * transpose(h))
        w = matrix(array(w) * array(wn) / array(wd))
    return w, h
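As a quick self-contained check of the multiplicative update rules (my own NumPy illustration, independent of the listing above), the reconstruction error should drop as the features and weights are updated in place:

```python
import numpy as np

np.random.seed(0)
v = np.abs(np.random.rand(6, 8))   # a small non-negative "article" matrix
pc = 3                             # number of features to extract
w = np.random.rand(6, pc)          # weights: article x feature
h = np.random.rand(pc, 8)          # features: feature x word

def difcost(a, b):
    return np.sum((a - b) ** 2)

before = difcost(v, w @ h)
for _ in range(200):
    h *= (w.T @ v) / (w.T @ w @ h)   # feature update: hn / hd
    w *= (v @ h.T) / (w @ h @ h.T)   # weight update: wn / wd
after = difcost(v, w @ h)
# after should be substantially smaller than before, with w and h still non-negative
```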

Recall that an eigenvector and its corresponding eigenvalue give another way to factor a matrix into vectors; since det(A - lambda*I) = 0, we can solve for the eigenvalues and eigenvectors. This may be useful for finding a single feature.
Another way to do the optimization could be to use updates of the matrix via the Laplace transform of a matrix-valued function, as mentioned here (http://see.stanford.edu/materials/lsoeldsee263/10-expm.pdf). This is useful when we don't want to find the eigenvalues and can use the resolvents instead. Moreover, the state transitions are linear.
For example, the harmonic oscillator function can be written as
x-dot = [  0   1 ] x
        [ -1   0 ]
and the state transitions are
x(t) = [  cos t   sin t ] x(0)
       [ -sin t   cos t ]

Similarly the matrix exponential solution is x(t) = e ^(tA) x(0)
where the matrix exponential is e ^ M = I + M + M^2 / 2! + ...

for t = 1 and A = [ 0  1 ]
                  [ 0  0 ], the power series has M^2 = M^3 = ... = 0
and e^A = I + A = [ 1  1 ]
                  [ 0  1 ]

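Both claims above are easy to verify numerically with a truncated power series (a small NumPy sketch of my own):

```python
import numpy as np

def expm_series(M, terms=20):
    # e^M = I + M + M^2/2! + ... truncated after `terms` terms
    result = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        result = result + term
    return result

# Nilpotent case: A^2 = 0, so the series truncates to I + A.
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
E = expm_series(A)            # equals [[1, 1], [0, 1]]

# Harmonic oscillator case: e^(tA) is the rotation-like matrix above.
t = 0.5
R = expm_series(t * np.array([[0.0, 1.0],
                              [-1.0, 0.0]]))
# R approximates [[cos t, sin t], [-sin t, cos t]]
```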
We can also explore the weighted Laplacian matrix for partitioning.
• Larry Hardesty, Short algorithm, long-range consequences, MIT News, 1 March 2013 is an interesting read on the subject.