Saturday, October 25, 2014

Today I want to review a chapter from the book Programming Collective Intelligence. In particular, let us look at a few numerical methods:
There are a number of numerical functions available:
1) Trigonometric functions like sine, cosine, and tangent.
2) Other mathematical functions like power, square root, and absolute value.
3) Statistical distributions, such as a Gaussian.
4) Distance metrics, like Euclidean and Tanimoto distances.
5) A three-parameter function that returns -1 if the first parameter is between the second and the third:
function (a, b, c) {
  if (b <= a && a <= c) return -1;
  return 0;
}
6) A three-parameter function that returns 1 if the difference between the first two parameters is less than the third:
function (a, b, c) {
  if (Math.abs(a - b) < c) return 1;
  return 0;
}
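The distance metrics in (4) can be sketched as small helpers. Here Tanimoto is written as a similarity between sets (one minus it gives a distance); the function names and the set-based representation are my own assumptions, not the book's code:

```python
def euclidean(v1, v2):
    # straight-line distance between two equal-length numeric vectors
    return sum((a - b) ** 2 for a, b in zip(v1, v2)) ** 0.5

def tanimoto(s1, s2):
    # ratio of the shared items to the size of the union of the two sets
    a, b = set(s1), set(s2)
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)
```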

Numerical methods are reactive. They are dependent on the input. This is the right approach for mathematical functions; however, for efficiency, programs have to store information for the next round.
One way to do this is to create tree nodes that can store and retrieve values from predefined slots. A store node has a single child and an index of a memory slot; it gets the result from its child, stores it in the memory slot, and then passes it along to its parent. A recall node has no children and simply returns the value in the appropriate slot. If a store node is at the top of a tree, the final result is available to any part of the tree that has the appropriate recall node.
In addition to this individual memory, there can be a shared memory set up where programs share a set of slots that all have access to read and write. This is reminiscent of parallel computing, with higher levels of cooperation and competition.
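The store and recall nodes above can be sketched minimally; the class names, the evaluate method, and the use of a plain list as the memory are my own illustrative choices, not the book's implementation:

```python
class ConstNode:
    # a leaf that just returns a fixed value
    def __init__(self, value):
        self.value = value
    def evaluate(self):
        return self.value

class StoreNode:
    # single child plus a memory-slot index: evaluate the child,
    # save the result in the slot, and pass it along to the parent
    def __init__(self, child, slot, memory):
        self.child, self.slot, self.memory = child, slot, memory
    def evaluate(self):
        result = self.child.evaluate()
        self.memory[self.slot] = result
        return result

class RecallNode:
    # no children; simply return the value in the appropriate slot
    def __init__(self, slot, memory):
        self.slot, self.memory = slot, memory
    def evaluate(self):
        return self.memory[self.slot]
```

With a shared memory list, several trees can read and write the same slots.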

Some other functions include:
Gini impurity: This is a measure of how impure a set is. If you have a set of items such as [A, A, B, B, B, C, C], then the Gini impurity tells you the probability that you would be wrong if you picked one item and randomly guessed its label. If the set were all A instead and your guess was A, then you would never be wrong, and you would have a pure set.
I_G(i) = 1 - Σ_{j=1..m} f(i,j)² = Σ_{j≠k} f(i,j) f(i,k)
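This impurity can be computed with a small helper over a list of labels (my own sketch, not the book's code):

```python
def gini_impurity(items):
    # count the occurrences of each label
    total = len(items)
    counts = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    # 1 minus the sum of squared label probabilities
    return 1 - sum((c / total) ** 2 for c in counts.values())
```

A pure set such as ['A', 'A'] gives an impurity of 0.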

Entropy is another way to see how mixed a set is:
It's a measure of how surprising a randomly selected item from the set is. For a pure set, the entropy would be zero.
H(X) = Σ_{i=1..n} p(x_i) log₂(1 / p(x_i)) = -Σ_{i=1..n} p(x_i) log₂ p(x_i)
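Entropy over a list of labels can be sketched the same way (again my own helper, not the book's code):

```python
import math

def entropy(items):
    # count the occurrences of each label
    total = len(items)
    counts = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    # -sum of p * log2(p) over the label probabilities
    return -sum((c / total) * math.log(c / total, 2) for c in counts.values())
```

A pure set gives zero; a 50/50 split of two labels gives one bit.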

Variance is how much a list of numbers varies from the average value. It is found by averaging the squared difference of every number from the mean:

σ² = (1/N) Σ_{i=1..N} (x_i - x̄)²

def variance(vals):
  mean = float(sum(vals))/len(vals)
  s = sum([(v-mean)**2 for v in vals])
  return s/len(vals)

The Gaussian function is the probability density function of the normal curve:
G(x) = (1 / (σ √(2π))) exp(-(x - μ)² / (2σ²))
import math
def Gaussian(dist, sigma=10.0):
  exp = math.e**(-dist**2/(2*sigma**2))
  return (1/(sigma*(2*math.pi)**.5))*exp

We discussed the non-negative matrix factorization method earlier, which simply requires you to call the factorize function with a list of observations and the number of underlying features, and gives you the weights and the features. It performs the factorization based on the following four update matrices:
hn: the transposed weights matrix multiplied by the data matrix.
hd: the transposed weights matrix multiplied by the weights matrix multiplied by the features matrix.
wn: the data matrix multiplied by the transposed features matrix.
wd: the weights matrix multiplied by the features matrix multiplied by the transposed features matrix.
The weights and the features computed may not be the same every time for small data, but for larger data they are usually consistent.
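The four update matrices can be sketched with multiplicative updates in plain Python. This is a minimal illustration of the technique, not the book's factorize function; the helper names, the fixed seed, and the small epsilon guarding against division by zero are my own choices:

```python
import random

def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def factorize(v, pc=2, iters=50):
    # v: list of observation rows; pc: number of underlying features
    rows, cols = len(v), len(v[0])
    random.seed(0)  # fixed seed so runs are repeatable
    w = [[random.random() for _ in range(pc)] for _ in range(rows)]
    h = [[random.random() for _ in range(cols)] for _ in range(pc)]
    for _ in range(iters):
        # hn: transposed weights matrix times the data matrix
        hn = matmul(transpose(w), v)
        # hd: transposed weights times weights times features
        hd = matmul(matmul(transpose(w), w), h)
        h = [[h[i][j] * hn[i][j] / (hd[i][j] + 1e-9)
              for j in range(cols)] for i in range(pc)]
        # wn: data matrix times transposed features
        wn = matmul(v, transpose(h))
        # wd: weights times features times transposed features
        wd = matmul(matmul(w, h), transpose(h))
        w = [[w[i][j] * wn[i][j] / (wd[i][j] + 1e-9)
              for j in range(pc)] for i in range(rows)]
    return w, h
```

Each pass scales every entry of the features and weights by the ratio of its numerator matrix to its denominator matrix, so the reconstruction w·h drifts toward the data.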

Friday, October 24, 2014

Today we continue our discussion on matrix methods of optimization. We mentioned linear and second-order methods. Newton's method is a good example of the latter. It finds a solution to an equation in one unknown. It is an iterative method that seeks a critical point of the cost function f (where the gradient of f tends to zero) by selecting an update vector at xk along which the directional derivative of grad f is equal to -grad f(xk). The second-order information on the cost function is incorporated through the directional derivative of the gradient.
The iteration of this method now follows:
From an initial point x0 for the function F on R^N, it constructs a sequence of iterates according to xk+1 = xk - F(xk) / F'(xk), where F' denotes the derivative of F.
xk+1 corresponds to the intersection of the tangent to the graph of F at xk with the horizontal axis.
xk+1 is the solution to the first-order Taylor expansion of F around xk, which is written in linear form as F(xk) + F'(xk)(xk+1 - xk) = 0.
Thus this method is a linear approximation to a non-linear equation, and it converges rapidly near the roots of the equation.
This method is then applied to finding a critical point of a cost function f on R^n simply by taking F(x) to be grad f(x), the transpose of the vector of partial derivatives of f with respect to x1 through xn. On a plane this is the Euclidean gradient of f. Instead of finding the difference xk+1 - xk, we find xk+1 from the tangent vector in the tangent space at xk. The step control is either a line search or a trust region.
Newton's method is used as the default method by Mathematica's FindRoot.
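The one-variable iteration above can be sketched directly; the function names and tolerance are my own choices:

```python
def newton(F, Fprime, x0, tol=1e-10, max_iter=50):
    # iterate x_{k+1} = x_k - F(x_k) / F'(x_k) until F(x) is near zero
    x = x0
    for _ in range(max_iter):
        fx = F(x)
        if abs(fx) < tol:
            break
        x = x - fx / Fprime(x)
    return x
```

For example, the root of F(x) = x² - 2 starting from x0 = 1 converges to √2 in a handful of iterations.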

#codingexercise
Write an RBAC open source program
along the lines of Permission.Demand()
and the declarative [PermissionAttribute(SecurityAction.Demand, Role="User")]

function checkAccess(role, permission)
{
   switch (role)
   {
          case Role.User:
                  if (has_permission(permission)) // lookup table
                     return Operation.Allowed;
                  return Operation.Disallowed;
        case Role.Admin:
                if (for_trustedbase(permission))
                    return Operation.Disallowed;
                return Operation.Allowed;
       default:
                 throw new Error("unknown role");
   }
}

Thursday, October 23, 2014

#codingexercise

Putting optimization technique to applications

Let us write a program to lay out users and their resources. There may be many users sharing many resources. Given the nature of the problem, we could have users and resources as two parallel rows and connect them with crossing lines. But let us also add a constraint: each user is matched with one share. Let us label them both with the same name. However, their order is not necessarily the same. First, let us connect the users to their resources with little or no intersecting lines. Then we overlay the many-to-many problem.

When connecting the users to the resources on a one-to-one basis and keeping the lines from crossing each other, one technique is to ensure that the pairings between users and resources progress incrementally, skipping over those that don't. To do so, we keep one of them ordered and then compare the other. Let us do this by sorting the set by user first.
Let us say there are n users. We initialize the longest increasing subsequence (aka lis) count to one for each of these users.
Then, starting from the second user, for each one we iterate over all the previous users to see which one will increase our count of incremental subsequences, and choose the maximum.
for (int i = 1; i < n; i++) {
   for (int j = 0; j < i; j++) {
      if (users[j].resource < users[i].resource  // we have an incremental pair
          && lis[i] < lis[j] + 1)
          lis[i] = lis[j] + 1;
   }
}
return lis.Max();

When connecting many users and resources, we take the users and resources to be the nodes of the graph with existing edges that we have drawn and then draw the remaining edges based on the simulated annealing approach discussed earlier.

In our previous post on the use of optimization for flight searches from the Programming Collective Intelligence book, we discussed optimizing for costs and then by preferences. We now explore other optimization techniques, such as matrix methods, from a paper by Gill, Murray, Saunders, and Wright. Here the optimization can be written in the form of matrices.
In particular we will look at the sparse matrix methods in optimization. As with most optimization techniques we solve a set of linear equations : Bkyk = bk.  Both direct and iterative equation solvers are used.
While a single equation system was predominantly being solved, soon there was a need for different systems of equations to be solved together. The single-system solution was then applied repeatedly to a sequence of modified systems. The speed of factorizing a matrix then becomes less important than solving the subsequent many systems. Furthermore, the solution Bk is related to Bk-1.
The simplest example of the use of matrices for such problems is linear programming, which tries to find the optimal solution over a feasible region bounded by a set of linear constraints, a convex polytope. It has applications in routing, scheduling, assignment, and design.
Linear programming has been applied to problems of all sizes; however, the relative costs of the steps of the optimization methods change when the problems become large. For example, when there is no constraint, the cost function is measured based on the number of evaluations. But subsequent recomputations are more expensive because there are more data structures and more computation involved. Instead we express the evaluation in a way where the (k+1)th iterate is defined as x_{k+1} = x_k + α_k p_k, dependent on the previous one.
We will discuss Newton and conjugate gradient methods in this regard.
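The iterate update x_{k+1} = x_k + α_k p_k with a line-search step control can be sketched in one dimension; this is a generic backtracking (Armijo) line search of my own, not the paper's algorithm, and the constants alpha, beta, and c are conventional defaults:

```python
def backtracking_step(f, grad, x, p, alpha=1.0, beta=0.5, c=1e-4):
    # shrink alpha until the Armijo sufficient-decrease condition holds,
    # then take the step x + alpha * p
    fx = f(x)
    slope = grad(x) * p  # directional derivative along p
    while f(x + alpha * p) > fx + c * alpha * slope:
        alpha *= beta
    return x + alpha * p
```

With f(x) = x², starting at x = 1 along the descent direction p = -grad(1) = -2, the search halves alpha once and lands at 0.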

Wednesday, October 22, 2014

#codingexercise
int MaxR(List<int> items, int start, int end)
{
if (items == null || items.Count == 0 || start < 0 || end >= items.Count || start > end) throw new ArgumentException();
if (start == end) return items[start];
int mid = checked((start + end) / 2);
int left = MaxR(items, start, mid);
int right = MaxR(items, mid + 1, end);
return left > right ? left : right;
}

In continuation of our posts on the applications of machine learning from the book Programming Collective Intelligence, we will now see an interesting application of optimization in real flight searches. The Kayak API is a popular vertical search engine for travel. After registering for developer access to the Kayak package, which gives us a developer key, we get a new Kayak session from the Kayak API server. With this session, we can do different flight searches. Note that the need for a session, the use of XML-based responses, and the extra-long URLs with query strings comprising several different search parameters are all reminiscent of the APIs of those times.

With just the query parameters for session id, destination, and depart date, it returns a search id with no results that can be used for polling. With the search id and the session id, we can write the polling function as one that requests the results until there are no more. There is a tag called morepending which has the value true until it's all done. The prices, departures, and arrivals retrieved from the query are zipped into a tuple list. The flights are returned primarily in order of price and secondarily in order of time.

When we try to create the schedule for the arrivals and departures of the Glass family, where the family members visit from different places and leave after the holiday, this flight search is simply performed over and over again for the outbound and return flights of each individual. Then we can optimize the flights using the actual flight data in the costing function. Since the Kayak searches can take a while, we merely consider this search for the first two passengers. The costing function now takes as its domain a range multiplied by the count of the outbound and return flights of these two passengers, and we run the same optimization as discussed earlier using this domain. The actual flight data gives us more interesting ways to expand our optimization.
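The polling loop described above can be sketched generically. The fetch_batch callable stands in for the actual Kayak request; its signature and return shape are my own assumptions, since the real API returned XML that would need parsing first:

```python
def poll_flights(fetch_batch):
    # fetch_batch() -> (morepending, prices, departures, arrivals); assumed shape
    prices, departs, arrives = [], [], []
    while True:
        more, p, d, a = fetch_batch()
        prices += p
        departs += d
        arrives += a
        if not more:  # the morepending tag has turned false: all done
            break
    # zip into (price, departure, arrival) tuples, ordered by price then time
    return sorted(zip(prices, departs, arrives))
```

Injecting the fetch function this way also makes the loop easy to test against canned batches.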
Thus far we have discussed optimization based on costs. However, there is such a thing as preferences, which may or may not be directly tied to costs. Preferences are user-attributed, and this is a ranking that makes people as happy, or as little annoyed, as possible. The information is taken from the users and combined to produce the optimal result. For example, everyone may want a window seat on the same flight, but that may not be possible. If we make the cost function return very high values for invalid outcomes, we can resolve this conflict.

Tuesday, October 21, 2014

#codingexercise
Convert numbers to Roman
public string GetRoman(int t)
{
List<int> numbers = new List<int> { 1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1 };
List<string> numerals = new List<string> { "M", "CM", "D", "CD", "C", "XC", "L", "XL", "X", "IX", "V", "IV", "I" };
string ret = string.Empty;
int i = 0;
while (t > 0)
{
  int digit = t / numbers[i];
  for (int j = 0; j < digit; j++)
  {
   ret += numerals[i];
  }
  t -= digit * numbers[i];
  i = i + 1;
}
return ret;
}

#codingexercise
public int GetInteger(string numeral, Dictionary<string, int> dict)
{
int i = 0;
int number = 0;

while (i < numeral.Length)
{
if (i + 1 < numeral.Length && dict[numeral[i + 1].ToString()] > dict[numeral[i].ToString()])
{
number += dict[numeral.Substring(i, 2)];
i = i + 2;
}
else
{
number += dict[numeral[i].ToString()];
i = i + 1;
}
}
return number;
}