Friday, November 7, 2014

We now look at Gram-Schmidt conjugation technique. This helps us generate a set of A-orthogonal search directions say di. This technique provides a simple way to generate them. We begin with a set of n-linearly independent vectors ui which are say co-ordinate axes or something better. To construct di, take ui and subtract out any components that are not A-orthogonal to the previous vectors. We begin with two linearly independent vectors u0 and u1. Set d0 = u0. The vector u1 is composed of two components one u+ that is along the direction of u0 and another it's A-orthogonal u*. After conjugation, only the A-orthogonal portion remains and d1 = u*. In the next step we will have two A-orthogonal vectors d0 and d1 and we construct the new one based on the previous two. We find the components along each of the previous A-orthogonal vectors so far. The components are found by taking some length in the direction of the previously found vectors. This length is determined the same way as we mentioned earlier with a pair of A-orthogonal vector. After conjugation, only the A-orthogonal portion remains to be included to our set. We repeat until we have generated a full set of n orthogonal vectors.

def gram-schmidt(set_ui, A):
      set_di = []
      for i in range(len(set_ui)):
           components = 0
           for k in range(i):
               components += component(ui,dk) # - (utranspose A dk) / (dk-transpose A dk)
           di = ui + components
           set_di = di.append(component)
      return set_di


Let us take a closer look at the conjugate directions method. We see that if the search vector is constructed by using the above method with axial unit vectors, it becomes equivalent to performing Gaussian elimination. If we take the orthogonal directions method with the same axial vectors and stretched it, it would look very much like the one with the conjugate directions.

def gaussian-elimination(A,n,b):
     # augment the matrix A with b as the final column
     # decomposition to upper triangular matrix by repeatedly multiplying the first row and subtracting from that row.
     # back-substitution
     for j in range(0:n-2):  # start from the first column (we have to go column order first)
     for i in range(n-1:j+1): # start from last row, repeat for each row above except for first
             # pick a row to calculate the ratio
             rows = [ r in range(0:i-1) if A[r,j] != 0 ]
             if len(rows) == 0:
                      raise ('bad input')
             row = rows[0]
              
             if A[i,j] != 0:
                 ratio = A[i,j]  / A[row,j]
                 for k in range(0:n-1): # set first column to zero, repeat
                                                 # and don't forget the augmented column                 
                   if  ratio > 0:
                       A[i,k] = A[i,k] - ratio * A[row,k]
                       b[i,0] = b[i,0] - ratio * b[row,0]
                   else
                       A[i,k] = A[i,k] + ratio*A[row,k]
                       b[i,0] = b[i,0] + ratio * b[row,0]
      # we now have a matrix with lower triangular zero
      # substitute the last value in the last row to the row above to solve each line
         
                 


Thursday, November 6, 2014

#coding exercise
Bool memcmp (char* src, char* dest, size_t n)
{
If ( !src || !dest || n <= 0) return false;
While (n)
{
If (*dest != *src) return false;
Dest++;
Src++;
n--;
}
Return true;
}

We will continue our discussion on steepest descent method and conjugate directions. The steepest descent method often finds itself taking steps in the same direction. It would have been better if the steps were taken right the first time. How do we correct this ? We take two orthogonal search directions and take steps with just the right length along these directions such that after a certain number of steps we are done. This is what we will discuss in conjugate direction method.

As an example, we can use the co-ordinate axes as search directions. The first step leads to the correct x-coordinate and the second step is the vertical step to reach the center. For each step we choose a point such that it is a step length in the direction di. We use the fact that ei is orthogonal to di, so that we should never step in the direction of di again. Using this we want to determine the step length but we have a catch-22 situation. We don't know the step length without knowing ei and if we knew ei, we wouldn't have to compute at all.  To overcome this, we don't use search directions but instead use A-orthogonal or conjugate directions To picture these conjugate directions we imagine ellipsis that can be expanded or stretched on a bubble to form concentric circles. Our new requirement is that ei+1 is A-orthogonal to di. The benefit of this new orthogonality condition is that it equivalent to finding the minimum point along the search direction di as in the steepest descent. We can prove this by setting the directional derivative to zero. but we look at finding the step length when the search directions are A-orthogonal. The step length can now be calculated just the same way we derived that for the steepest descent and we see that it is expressed in terms of the residual. In fact, if the search vector were the residual, this step length would be the same as in the steepest descent.
While the method of orthogonal directions works only when we know the final destination, the conjugate directions method works in n-iterations. The method of conjugate directions converges in n-steps. The first step is taken along some direction d(0). The minimum point x is chosen by some constraint that e(1) must be A-orthogonal to d(0) The initial error can be expressed as a sum of A-orthogonal components. Each step of the conjugate directions eliminates one of these components. 

Wednesday, November 5, 2014

Today we discuss a new idea for an application. This application provides a personal box for you to use online. You can store digital content or share.it ages and collects the items so that you don't have to bother about cleaning up. Moreover, you can mark items to get burned after some time. You can also safely and securely share items with others using tiny urls or a variety of data formats knowing that it will be temporary. The idea is to bring the garbage collector to you. However it is something that provides very specific functionality and may need to be assessed  for appeal and business value. More on this perhaps later.

Tuesday, November 4, 2014

#codingexercise matrix addition
Int [, ] addition ( int [,] left, int [,] right)
{
If (left == null || right == null) return null;
Int lc  = left.GetLength (0);
Int lr   = left.GetLength (1);
Int rc  = right.GetLength (0);
Int rr  = right.GetLength (1);
If (lc != rc || lr != rr) return null;
Var ret = new int [lr, lc]();
For (int r = 0; r < lr;  r++)
{
   For (int  c = 0; c < lc; c++)
     {
           Ret [r,c] += left [r,c] + right [r,c];
      }
}
Return ret;
}
#codingexercise matrix subtraction

Int [, ] Subtraction ( int [,] left, int [,] right)

{

If (left == null || right == null) return null;

Int lc  = left.GetLength (0);

Int lr   = left.GetLength (1);

Int rc  = right.GetLength (0);

Int rr  = right.GetLength (1);

If (lc != rc || lr != rr) return null;

Var ret = new int [lr, lc]();

For (int r = 0; r < lr;  r++)

{

   For (int  c = 0; c < lc; c++)

     {

           Ret [r,c] += left [r,c] - right [r,c];

      }

}

Return ret;

}

In continuation of our discussion on the convergence of steepest descent method, we will see that there is instant convergence even with a set of eigenvectors. For a symmetric matrix, there exists a set of n orthogonal eigenvectors of A. As we can scale eigenvectors arbitrarily, each of them is chosen as unit-length and the error term is expressed as a linear combination  of this eigenvector.  Then we see that the residual can be expressed as the sum of the eigenvector components. We saw that when the set has only one eigenvector, the convergence is achieved in one step by choosing the inverse of the eigenvalue. Here all vectors have a common eigenvalue and therefore leads again to a onestep convergence.

The lowest value of the function is at the minimum of the paraboloid.

Monday, November 3, 2014

#codingexercise
int[,] Transpose (int[,] matrix, int row, int col)
{
   var ret = new int[col, row]
   for (int i = 0; i < row; i++)
     for (int j = 0; j < col; j++)
     {
           ret[j, i] = matrix[i,j]
     }
    return ret;
}

Assert (matrix = Transpose(Transpose(matrix));

We will continue to look at the convergence of Steepest Descent method. Let's first consider the case where ei is an eigenvector with eigenvalue Lambda-e. Then the residual which is A applied to the negative eigenvector ei, is also an eigenvector.  Since we discussed earlier that the position in the next step can be determined from the position in the current step based on step length and direction, we see that when we express the residuals as eigen vectors, the next residual turns out to be zero. This implies that it takes only one step to converge to the exact solution.  The current position lies on one of the axes of the ellipsoid, and so the residual points directly to the center of the ellipsoid. Choosing gradient as  negative eignenvalue gives us instant convergence.

#codingexercise
Matrix multiplication
Int [, ] multiply ( int [,] left, int [,] right)
{
If (left == null || right == null) return null;
Int lc  = left.GetLength (0);
Int lr   = left.GetLength (1);
Int rc  = right.GetLength (0);
Int rr  = right.GetLength (1);
If (lc != rr) return null;
Var ret = new int [lr, rc]();
For int r = 0; r < lr;  r++;
   For int  c = 0; c < rr; c++
     {
         For ( int k = 0; k < lc; k++)
                     Ret [r,c] += left [r,k] × right [k, r];
}
Return ret;

}

Sunday, November 2, 2014

We resume our discussion on matrix operations. In this post, we will pick up Jacobii iteration. The Jacobii method also solves the same linear equation as before. The matrix A is split into two parts: D whose diagonal elements are identical to those of A, and whose off-diagonal elements are zero and E whose diagonal elements are zero and whose off diagonal elements are identical to those of A. A is the sum of D and E. The usefulness of D is that it is a diagonal matrix and so it can be inverted.
As compared to the steepest descent method, this does not have to look for a value that converges to zero.Using the linear equation and the split of the matrix into the diagonal specific matrices, we can now write equation as an iteration where the next step is calculated based on a constant times the previous step taken together with another constant. Each iteration leads closer to the final solution which is called a stationary point because the next iteration at the solution will be the same as the previous step. Let us take a closer look at what each iteration does. If we express each iterate (current step) as the sum of the exact solution and the error term, then the equation becomes a simple translation in terms of previous step. In other words, each iteration does not affect the correct part of the iterate but only the error term as the iterations converge to the stationary point. Hence the initial vector x0 has no effect on the inevitable outcome. And it also does not affect the number of iterations required to converge to a given tolerance.
We will next discuss spectral radius which determines the speed of convergence in this method.Suppose that vj is the eigenvector of B with the largest eigenvalue. The spectral radius of B  is this largest eigenvalue. Now the error can be expressed as a linear combination of eigenvectors, one of which will be in the direction of vj. and this one will be the slowest to converge because the eigenvalue is proportional to the steepness of the corresponding slope.  This follows from the eigenvectors being aligned along the axes of the paraboloid describing the quadratic form of the function. The rate of convergences depends on this spectral radius and it is in turn dependent on A. Unfortunately, this technique does not apply to all A. However, this method illustrates the usefulness of eigenvectors. The eigenvector paths of each successive error term determine the path of convergence.They converge normally at the rate defined by their eigenvalues.
#coding exercise
Given the matrix in the coding exercise described earlier, print the elements in  a single dimensional sorted order
List<int> GetSortedElements(int[,] matrix, int row, int col)
{
 // assuming parameter validation
  var ret = new List<int>();
  int i = 0;
  int j = 0;
  var front = new List<int>(); // column corresponding to each row of the front
  for (int i = 0; i < row; i++) { front.add(0); } // initialize to first column index for each row
  while (i < row && j < col)
 {
  // extract min of allfront
  var candidate = new List<int>();
  for (int k = 0; k< row; k++)
        candidate.add((front[k] < col ? matrix[k, front[k]] :  INT_MAX);
  int min = candidate.min();
  i = candidate.IndexOf(min);
  front[i] = front[i] + 1;
  j = front[i];
  ret.Add(min);
}
return ret; 
}
23 34 45 46 68 89
27 35 47 69 97 110
32 46 48 65 98 112

Saturday, November 1, 2014

To continue on the coding exercise, there are several other ways to find a number in the matrix like we have described. The one we described had a binary chop method to search for the number in each row of the matrix. The binary chop operation on a single row takes log N time and if there are n rows we are repeating to this to a total complexity of O(N log N) times. We make no use of the sorted order of numbers in the column in this approach and the convergence is linear to the number of rows.
We could go binary chopping across rows as well because the elements are sorted there too and iterate down the columns. That would however put us at no better benefit over the one we have already. It would be a pure chance if one were to work better than the other unless we knew which one of the two - row or column is smaller.
Another technique that does not recognize the sorted order of the numbers and instead uses the divide and conquer approach - is to partition the matrix. One example of this would be to split the matrix into two or more sub-matrices and then recursively apply the solution to each of the sub-matrices. We could easily eliminate a matrix if the number we are looking for lies outside the right bottom entry in the matrix since this is the largest per the description of the matrix.
If we choose for example a point in the center of the matrix, say mid = (row /2, col/2) and then get four sub matrices of different sizes, then we can easily exclude regions of matrices pretty quickly. By excluding submatrices we attempt to converge onto the number in the bigger matrix logarithmically. This is way better but it involves parallel and recursive computation and consequently more resources. The approach is still log N. However note the termination condition of this matrix. It is not that we can say that the bottom right number is greater than the search number and the top left number is smaller than the search number, the number can be checked along the diagonal. To the contrary, if we take the example [ 53 78
                                                       87  92 ] and looking for a number 62, we see that this is not sufficient to go diagonally. We have to exhaust the rows one after another anyways. While it is easier to exclude the matrix by looking at the top left and bottom right numbers in the matrix, the same cannot be said for when a number is bound within them. In such a case there is additional time complexity of going linearly from row to row which may take O(N) time anyways.
This leads us to a potentially faster solution by enumerating the elements of the array in sorted order until we reach the search number. In the example above, we traverse row order or read column order which ever gives us the next higher number and compare it with the previous position. For example,
if we took a matrix
23 34 45 46 68 89
27 35 47 69 97 110
32 46 48 65 98 112
then we start from the top left and progress down the next higher number as if we were snaking our way up the progression. In the above example, we progress down the number from the top left and compare the row order as well as the column order and keeping track of the starting point we return to the number adjacent to the starting point. For example we progress down from 23, 27, 32 and come back to 34 after we have exhausted column order. Similarly we start from 34 and note the next number 45 and traverse down the column to read 34, 35 until we encounter a number that is higher than 45. Notice that the front for the propagation is a diagonal which in this case is  46, 47 and 46. If we keep track of the elements coming out of the front, we can easily sort all the entries in the matrix in a linear order. Each row will contribute one number to the front and we update the front on this row as we include the numbers in our sorted list.. We therefore burn the matrix by the contents across  the front and this is another way to solve the problem in O(N) time. Note that if all the numbers in the front are higher than the desired number, we stop right then instead of burning through the rest of the matrix. Thus there is some optimization possible as well
Lastly, I want to add that there is perhaps modifications possible where instead of keeping track of the front, we simply update the start and the end for each row based on the findings from the preceding row. Thus we optimize our original solution to not repeat the entirety of each row but to take triangularly smaller portions as we go down the rows one after another.

bool Exists ( int[,] matrix, int number)
{
  // assuming parameter validation
  // we go row wise and do a binary chop

  int start = 0;
  int end = matrix.GetLength(0);
  for (int row = 0; row < matrix.GetLength(0); row++)
  {
       start  = 0;
       If ( BinaryChop(matrix, row, number,  start, ref end))
        Return true;
  }
 return false;
}


bool BinaryChop(int[,] matrix, int row, int number, int start, ref int end)
{
  while (start < end)
  {
       Int mid = checked ( (start + end) / 2);
       if (matrix[row, mid] == number) return true;
       if (matrix[row, mid] < number)
             start = mid + 1
       else
             end = mid  - 1
  }
  if (matrix[row, start] == number) return true;
  if (matrix[row, end] == number) return true;
 return false;
}