Sunday, September 21, 2014

We will start reviewing a book titled Programming Collective Intelligence from O'Reilly Media in the subsequent blog posts. This book includes such things as making recommendations, discovering groups, searching and ranking, optimization, document filtering, modeling with decision trees, building price models, advanced classification - Kernel methods and SVMs, finding independent features, evolving intelligence and a summary of algorithms. The sample code is presented in Python for readability.
As an example, a Bayesian classifier can help classify a sentence as language for a use of the word python or as a snake for the use of the same word. It works based on the probability P(category | Document) = P(Document | Category) * P(category)
where P(Document | Category) = P(word1 | category) * P(word2 | category) * ....
All we need is a feature extraction that turns the data we're using for training or classification into a list of features basically something that takes an object and returns a list such as a sentence converted into words. This classifier is trained on the strings. Then it can be used to predict the test data.
 docclass.getwords('python is a dynamic language')
{'python' : 1, 'dynamic':1,  'language':1}
c1 = docclass.naivebayes(docclass.getwords)
c1.setdb('test.db')
c1.train('python has dynamic types', 'language')
c1.train('pythons are constrictors', 'snake')
c1.classify(''boa constrictors')
u'snake'
c1.classify('dynamic programming')
u'language'

I have to take a short break for a coding problem and including it in this post before continuing.
How many different distinct subsequences exist for a string T in string S and what is the most efficient way to find it. For example rabbit appears in rabbbit in three different ways. For every match until the length of the given pattern, we explore the remaining of the given input to see if there's a full match by skipping non-matching and repetititive characters. When the full match is found, we increment a counter. Another way to do this would be to use an automata or Rabin Karp method or KMP method.


void KMP(string pattern, string text, vector<int> *positions) {
    int patternLength = pattern.length();
    int textLength = text.length();
    int* next =  PreProcess(pattern);
 if (next == 0) return;
    int i = 0;
    int j = 0;
    while ( j < textLength )
 {
  while(true)
   if (text[j] == pattern[i]) //matches
   {
    i++;   // yes, move on to the next state
    if (i == patternLength)  // maybe that was the last state
    {
     // found a match;
     positions->push_back(j-(i-1));
     i = next[i];
    }
    break;
   }
   else if (i == 0) break; // no match in state j = 0, give up
   else i = next[i];
  j++;
 }
}

int* PreProcess( string pattern) {
 int patternLength = pattern.length();
 if (patternLength == 0) return 0;
    int * next = new int[patternLength + 1];
    if (next == 0) return 0;
    next[0] = -1;  // set up for loop below; unused by KMP

    int i = 0;
    int j = -1;
    // next[0] = -1;
    // int len = pattern.length();
    while (i < patternLength) {
  next[i + 1] = next[i] + 1;
  while ( next[i+1] > 0 &&
    pattern[i] != pattern[next[i + 1] - 1])
   next[i + 1] = next[next[i + 1] - 1] + 1;
  i++;
 }
    return next;

}
 

Saturday, September 20, 2014

Clojure is a general purpose language that targets the Java Virtual Machine, CLR and JavaScript. We briefly mentioned that it compiles to Java Byte Code. The ClojureCLR port which is written in C# enables it to target the CLR as well. Clojure is a dialect of Lisp which enables query language. It offers a transactional memory system and a reactive Agent system that ensures correct multithreaded designs. What sets Clojure apart is the immutable data and the first class functions. It helps to see Clojure language as a platform.
Features include
1) dynamic development - the Read Eval Print Loop (REPL) provides a console interface to try out Clojure commands. Clojure programs need not be compiled and run as in this case.
2) Functional Programming - fn and defn provide ways to define first class functions. Clojure provides a set of immutable lists, vectors, sets and maps where the older version is persisted and the newer versions are not a full copy.
3) Lisp Clojure is a dialect of Lisp for example 'and' is a macro
4) Runtime polymorphism 'proxy helps support implementation of interfaces
5) Concurrent programming shared objects are changed threadsafe with 'dosync', 'ref', 'set' and 'alter'. 'def' and 'binding' support isolating changing state within threads.
6) Hosted on the JVM - compiles to JVM bytecode and supports Java interfaces and classes using 'reify' and 'proxy'.
 Stackoverflow posts gives the following Clojure code for factorial:
(defn fact [x]
   (loop [n x f 1]
         ( if ( = n 1)
                  f
                  (recur (dec n) (* f n)))))
 
// HttpHeaderReader.c : Reads HTTP header and counts them
//
#include "stdafx.h"
#include "stdio.h"
#include "string.h"
static int Hash (const char * pSrc, size_t len);
int main(int argc, char* argv[])
{
 static const char filename[] = "foo.txt";
 static const int MAX_COUNT = 255;
 static int frequency[MAX_COUNT] = {0};
 FILE* fd = fopen(filename, "r");
 if ( fd != NULL )
 {
  char line[MAX_COUNT + 1];
  while ( fgets(line, sizeof line, fd) != NULL )
  {
   char* p = strchr(line, ':');
   if (p != NULL)
   {
    frequency[Hash(_strlwr(line), (size_t)(p-line))%MAX_COUNT] += 1;
   }
  }
  fclose(fd);
  char header[MAX_COUNT + 1] = {0};
  printf( "Enter the header:\n " );
  scanf("%255s", header);
  header[MAX_COUNT] = '\0';
  printf( "%s occurs %d times.\n", header, frequency[Hash(_strlwr(header), strlen(_strlwr(header)))%MAX_COUNT] );
 }
 return 0;
}
static int Hash (const char * pSrc, size_t len)
{
  size_t res = 0;
  while (len--) res = (res << 1) ^ *pSrc++; 
  return (int)res;
}

content-length: 10
 len = 14, hash = 177
user-agent: test
 len = 10, hash = 107
content-length: 14
 len = 14, hash = 177
accept: comedy
 len = 6, hash = 16
content-length: 100
 len = 14, hash = 177
content-encoding: gzip
 len = 16, hash = 134
connection: close
 len = 10, hash = 50
user-agent: test
 len = 10, hash = 107
accept: flash
 len = 6, hash = 16
user-agent: test1
 len = 10, hash = 107
content-length: 20
 len = 14, hash = 177
user-agent: test2
 len = 10, hash = 107
user-agent: test3
 len = 10, hash = 107
accept: gzip
 len = 6, hash = 16
Enter the header:
 user-aGENT
user-agent occurs 5 times.
 

Friday, September 19, 2014

Today we will review Node.Js. We talked about Node.js, Connect and Express in earlier discussions. We saw that the calls were asynchronous and that we could add Backbone for MVC on  the web server for the user interface. Express supports both production and development settings. Changes made require restart to the application and a node supervisor helps in this regard.
Sample server code to look up a phonebook works like this:
Server.js :
var express = require('express');
var app = express();
app.set('view engine', 'ejs');
app.set('view options', { layout : false });
app.use(express.bodyParser());
app.use(app.router);
app.post('/search', function (req, res) {
    var result = words.search(req.body.pattern).result;
    res.render('result', {words: result, pattern: req.body.pattern });
});
app.listen(process.env.PORT || config.port);

function search(word) {
// parameter validation and sanitization
var toSearch = word;
var result = _.filter(dictionary, function(w) {
var index = binarySearch(dictionary,w);
return index == -1 ? null ; dictionary[index];
});
return {result :result};
}
}

function binarySearch(dictionary, value)
{
   int start = 0;
   int end = dictionary.length - 1;
 
   while (start <= end)
   {
      int mid = (start + end) >>> 1
      var midword = dictionary[mid];

      if (midword < value)
           start = mid + 1
      else if (midword > value)
          end = mid -1
      else
          return mid;
    }
   return -(start+1);
}

We will next cover Clojure in this post which provides easy access to Java framework.  Closure compiles to Java byte code.

Wednesday, September 17, 2014

int memcmp(char* pSrc, char* pDest, size_ len)
{
  // assuming parameter validation
  while (len--)
  {
      if (*pSrc != *pDest)
      {
          return -1;
       } 
      pSrc++;
      pDest++;
  }
  return 0;
}
------------------------------------------------------------------------------------------------------------------
template<class C>
typename C::reverse_iterator find_last(const C& c, typename C::value_type v)
{
     typename C::reverse_iterator p = c.rbegin();
     while ( p != c.rend())
     {
         if (*p == v) return p;
         ++p;
      }
      return p;
}
------------------------------------------------------------------------------------------------------------------
using System;
using System.Collections.Generic;

class Program
{
    static void Main()
    {
List<Tuple<string,int, int>> list = new List<Tuple<string, int, int>>();
list.Add(new Tuple<string, int, int>("a", 1, 5));
list.Add(new Tuple<string, int, int>("b", 2, 4));
list.Add(new Tuple<string, int, int>("c", 3, 6));

List<Tuple<string, int>> pairs = new List<Tuple<string, int>>();
list.ForEach(x =>
{
pairs.Add(new Tuple<string, int> (x.Item1, x.Item2));
pairs.Add(new Tuple<string, int> (x.Item1, x.Item3));
});
pairs.Sort((a, b) => a.Item2.CompareTo(b.Item2));

foreach (var element in pairs)
{
   Console.WriteLine(element);
}
    }
}
------------------------------------------------------------------------------------------------------------------
// This solution was taken from another blog.
public static void place8Queens(int row, int ld, int rd, int lim, ref int ans)
{
  if (row == lim)
{
  ans++;
  return;
}
  int pos = lim & (~(row | ld | rd));
while (pos != 0)
{
  int p = pos & (-pos);
  pos -= p;
  place8Queens(row + p, (ld + p) << 1, (rd + p) >> 1, lim, ref ans);

}
}
  public static int solve(int n)
{
  int ans = 0;
int lim = (1 << 8) - 1;
place8Queens(0, 0, 0, lim, ref ans);
return ans;
}

Another approach and same answer of 92 possible solutions (needs to be verified):
        public static int place8QueensRecursive(int rstart, int rend, int cstart, int cend, ref int[,] Board)
        {
            if (cstart > cend)
            {
                if (CheckBoardState(ref Board))
                {
                    //PrintBoard(ref Board);
                    return 1; // success
                }
                else
                    return 0;
            }
         
            int countOfPossibilities = 0;
            for (int initialr = rstart; initialr < 8; initialr++)
            {
                int initialc = cstart;
                if (Board[initialr, initialc] != 0) continue;
         
                // each Queen has to be in a row by herself and a column by herself
                Board[initialr, initialc] = 2; // marks the position of the queen
             
                // MarkUnavailable(ref Board) or inline it here ;
                for (int i = 0; i < 8; i++)
                    if (Board[i, initialc] == 0) Board[i, initialc] = 1; // unavailable
                for (int j = cstart; j < 8; j++)
                    if (Board[initialr, j] == 0) Board[initialr, j] = 1; // unavailable
                for (int k = -8; k < 8; k++)
                    if ((initialr + k) >= 0 && (initialc + k) >= cstart &&
                        (initialr + k) < 8 && (initialc + k) < 8 &&
                        Board[initialr + k, initialc + k] == 0) Board[initialr + k, initialc + k] = 1;
                for (int k = -8; k < 8; k++)
                    if ((initialr + k) >= 0 && (initialc - k) >= cstart &&
                        (initialr + k) < 8 && (initialc - k) < 8 &&
                        Board[initialr + k, initialc - k] == 0) Board[initialr + k, initialc - k] = 1;
             
             
                countOfPossibilities += place8QueensRecursive(rstart, rend, cstart + 1, cend, ref Board);

                // MarkAvailable or inline it here
                Board[initialr, initialc] = 0;
                var queenOccupiedRows = new List<int>();
                var queenOccupiedCols = new List<int>();
                for (int l = 0; l < 8; l++)
                    for (int m = 0; m < cstart; m++)
                    {
                        if (Board[m, l] == 2) { queenOccupiedRows.Add(m); queenOccupiedCols.Add(l); }
                    }
                for (int i = 0; i < 8; i++)
                {
                    if (Board[i, initialc] == 1
                        && queenOccupiedRows.Any(x => x == i) == false
                        ) Board[i, initialc] = 0; // available
                }
                for (int j = cstart; j < 8; j++)
                    if (Board[initialr, j] == 1
                        && queenOccupiedCols.Any(x => x == j) == false) Board[initialr, j] = 0; // available
                for (int k = -8; k < 8; k++)
                    if ((initialr + k) >= 0 && (initialc + k) >= cstart &&
                        (initialr + k) < 8 && (initialc + k) < 8 &&
                        Board[initialr + k, initialc + k] == 1
                        && queenOccupiedRows.Any(x => x == (initialr + k)) == false
                        && queenOccupiedCols.Any(x => x == (initialc + k)) == false
                        ) Board[initialr + k, initialc + k] = 0;
                for (int k = -8; k < 8; k++)
                    if ((initialr + k) >= 0 && (initialc - k) >= cstart &&
                        (initialr + k) < 8 && (initialc - k) < 8 &&
                        Board[initialr + k, initialc - k] == 1
                        && queenOccupiedRows.Any(x => x == (initialr + k)) == false
                        && queenOccupiedCols.Any(x => x == (initialc - k)) == false                      
                        ) Board[initialr + k, initialc - k] = 0;
            }

            return countOfPossibilities;
         
        }
        public static void Reset(int rstart, int rend, int cstart, int cend, ref int[,] Board)
        {
            for (int i = rstart; i < rend + 1; i++)
                for (int j = cstart; j < cend + 1; j++)
                    Board[i,j] = 0;
        }

        public static bool CheckBoardState(ref int[,] Board)
        {
            int numQueens = 0;
            for (int i = 0; i < 8; i++)
                for (int j = 0; j < 8; j++)
                {
                    switch(Board[i,j])
                    {
                        case 0: throw new InvalidOperationException();
                        case 1: break;
                        case 2: numQueens++;
                            break;
                        default:
                            throw new InvalidOperationException();
                    }
                }

            if (numQueens != 8)
                return false;
         
            // no row has two queens
            for (int i = 0; i < 8; i++)
            {
                int queensInARow = 0;
                for (int j = 0; j < 8; j++)
                {
                    if (Board[i, j] == 2)
                    {
                        queensInARow++;
                        if (queensInARow > 1)
                            return false;
                    }
                }
            }

            // no column has two queens
            for (int j = 0; j < 8; j++)
            {
                int queensInACol = 0;
                for (int i = 0; i < 8; i++)
                {
                    if (Board[i, j] == 2)
                    {
                        queensInACol++;
                        if (queensInACol > 1)
                            return false;
                    }
                }
            }

            // no topleft-to-rightbottom diagonal has two queens
            for (int i = 0; i < 8; i++)
            {
                int j = 0;
                int queensInLRDDiagonal = 0;
                for (int k = -8; k < 8; k++)
                {
                    if (i + k >= 0 && j + k >= 0 && i + k < 8 && j + k < 8 && Board[i + k, j + k] == 2)
                    {
                        queensInLRDDiagonal++;
                        if (queensInLRDDiagonal > 1)
                            return false;
                    }
                }
            }

            for (int j = 0; j < 8; j++)
            {
                int i = 0;
                int queensInLRDDiagonal = 0;
                for (int k = -8; k < 8; k++)
                {
                    if (i + k >= 0 && j + k >= 0 && i + k < 8 && j + k < 8 && Board[i + k, j + k] == 2)
                    {
                        queensInLRDDiagonal++;
                        if (queensInLRDDiagonal > 1)
                            return false;
                    }
                }
            }

            // no topright-to-bottomleft diagonal has two queens
            for (int j = 0; j < 8; j++)
            {
                int i = 0;
                int queensInRLDiagonal = 0;
                for (int k = -8; k < 8; k++)
                {
                    if (i + k >= 0 && j - k >= 0 && i + k < 8 && j - k < 8 && Board[i + k, j - k] == 2)
                    {
                        queensInRLDiagonal++;
                        if (queensInRLDiagonal > 1)
                            return false;
                    }
                }
            }

            for (int i = 0; i < 8; i++)
            {
                int j = 7;
                int queensInRLDiagonal = 0;
                for (int k = -8; k < 8; k++)
                {
                    if (i + k >= 0 && j - k >= 0 && i + k < 8 && j - k < 8 && Board[i + k, j - k] == 2)
                    {
                        queensInRLDiagonal++;
                        if (queensInRLDiagonal > 1)
                            return false;
                    }
                }
            }
         
            return true;
        }

        public static void PrintBoard(ref int[,] Board)
        {
            for (int i = 0; i < 8; i++)
            {
                for (int j = 0; j < 8; j++)
                {
                    Console.Write("{0} ", Board[i, j]);
                }
                Console.WriteLine();
            }
            Console.WriteLine();
        }

gives output as :
Count=92 and sample as
1 1 2 1 1 1 1 1
1 1 1 1 1 2 1 1
1 1 1 2 1 1 1 1
1 2 1 1 1 1 1 1
1 1 1 1 1 1 1 2
1 1 1 1 2 1 1 1
1 1 1 1 1 1 2 1
2 1 1 1 1 1 1 1

------------------------------------------------------------------------------------------------------------------

Given n points on a 2D plane, find the maximum number of points that lie on the same straight line
Approach 1) // Least square method ?
m = (Sum(xy) - (Sum(x)Sum(y))/n) / (Sum(x^2) - Sum(x^2)/n)
b = Avg(y) - mAvg(x)
or
Approach 2)
pair-wise slope in n ^ 2 iterations.

int pointsInALine(List<Point> points)
{
double [,] slopes = new double[points.Length, points.Length] {};
for (int i = 0; i < points.Length; i++)
{
  for (int j = 0; j < points.Length; j++)
  {
      if ( i != j)
      {
        var ydiff = (points[i].y - points[j].y);
        var xdiff = (points[i].x - points[j].x);
         var slope = xdiff != 0 ? ydiff/xdiff : infinity;
         points[i,j] = slope;
       }
  }
}
var  slopeFrequency = new  Dictionary<double, int>();
for (int i = 0; i < points.Length; i++)
{
  for (int j = 0; j < points.Length; j++)
  {
      if ( i != j)
      {
           if (slopeFrequency.Keys.Contains(slopes[i,j]))
             slopeFrequency[slopes[i,j]]++;
          else
             slopeFrequency.Add(slopes[i,j], 1);
       }
  }
}
var maxSlopeFrequency = slopeFrequency.Values.max();
int maxPoints = 0;
var pointsInALine = new List<Point>();
for (int i =0; i < Points.Length; i++)
   for (int j = 0; j < Points.Length; j++)
       if (i != j && slopes[i,j] == maxSlopeFrequency)
       {
            if(pointsInALine.Contains(Points[i]) == false) pointsInALine.Add(Points[i]);
            if(pointsInALine.Contains(Points[j]) == false) pointsInALine.Add(Points[j]);
        }
return pointsInALine.Count;
}
------------------------------------------------------------------------------------------------------------------
LRU

public class LRUCache{
private List<int> keys {get; set;}
private Hashtable<int,int> keyValues {get;set;}
    public LRUCache(int capacity) {
        entries = new List<int>(capacity);
        keyValues = new List<int, int>(capacity);
    }
   
    int get(int key) {
        int index = keys.IndexOf(key);
        if (index != -1)
        {
               keys.MoveToFront(index); // extension method
               return keyValues[key];
        }
        else
              throw NotFoundException();
    }
   
    void set(int key, int value) {
        int index = keys.IndexOf(key);
        if (index == -1)
        {
             keyValues.Add(key, value);
             if (keys.Count == keys.Capacity){
                 var item = keys.RemoveLast();
                 keyValues.Remove(item);
             }
             keys.Insert(0,key); // add front
        }
        else
              throw InvalidOperationException();
    }
};

void memmove(char* pSrc, char* pDest, size_t len)
{
 if (pDest < pSrc || pSrc + len < pDest)
{
 while (len--)
       *pDest++ = *pSrc++;
}
}

A list of graph algorithms :

Minimum spanning trees:  The generic minimum spanning tree algorithm grows a minimum spanning tree  from an undirected graph G= (V,E) with a weight function w by using a greedy approach to the problem. It grows the MST by adding one edge at a time that is safe for the MST. At each step of the iteration the the set of edges maintained, say A, is a subset of the minimum spanning tree. If (S, V-S) is a cut in the graph respecting A and there is a light edge (u,v) crossing the cut then that light edge is safe for A.

Kruskal and Prim algorithm: These algorithms are elaborations of the greedy generic algorithm mentioned above. In Kruskal’s algorithm the set A is a forest. An edge that connects any two trees in the forest and has least weight is added as a safe edge.  Kruskal’s algorithm sorts the edges of E into a non-decreasing order by weight w. For each edge (u,v) belonging to E, taken in non-decreasing order by weight, if the set that u belongs to is different from the set v belongs to then the edge connecting (u,v) is added to the graph.

In Prim’s algorithm, the set A forms a single tree. A light edge crossing the cut that is safe for the tree is added. The algorithm maintains a min priority queue Q to contain all the vertices. For each vertex extracted from this queue, it adds the edge connecting vertices adjacent to the extracted vertex whose weight is minimum and updates the parent and the key.

Bellman Ford algorithm: This algorithm solves the single source shortest path problem even for edges with negative weights. It checks whether there is a negative weight cycle that is reachable from the source and returns false otherwise it returns the shortest path and their weights. The algorithm uses relaxation progressively decreasing an estimate d[v] on the weight of the shortest path s to each vertex v until it achieves the actual shortest path weight.

Dijkstra’s algorithm: This algorithm solves the single source shortest path problem for edges that have non-negative weights. It maintains a min priority queue of vertices, keyed by their d-values and it extracts each vertex from the queue, it relaxes the edges connecting to the adjacent vertices.

Floyd-Warshall algorithm: This algorithm computes the shortest path weights in a bottom-up manner. It exploits the relationship between a pair of intermediary vertices and the shortest paths that pass through them. If there is no intermediary vertex, then such a path has at most one edge and the weight of the edge is the minimum. Otherwise, the minimum weight is the minimum of the path from I to j or the path from I to k and k to j. Thus this algorithm iterates for each of the intermediary vertices for each of the given input of an N*N matrix to compute the shortest path weight.

Johnson’s algorithm: This algorithm finds the shortest path between all pairs in shortest and it has better time complexity than Floyd Warshall even for sparse graphs. It uses both subroutines from both Bellman-Ford algorithm  and Dijkstra’s algorithms to report that there is a negative weight cycle or a matrix of the shortest paths weights for all pairs of vertices. It uses the technique of reweighting which works as follows: If all the edges are non-negative, it uses Dijkstra’s algorithm iteratively from each vertex. If the edges have negative weights but no negative weight cycle, it transforms the graph into a one with non-negative weights and then applies the previous step.


Ford Fulkerson method: This method is based on residual networks (one that can admit more flows), augmenting paths, and cuts. The method is iterative : It starts with f(u,v) = 0 for all (u,v) belongs to V giving an initial flow of value 0 from the source s to the sink t along which we can send more flow, and then augmenting the flow along this path.

Tuesday, September 16, 2014

In today's post we quickly review the Longest Common Subsequence and a hashing function
Let c[i, j] be the length of the longest common subsequence of the prefix  of X[1..i] and Y[1..j]. Recursion:
c[i, j] = { 0 if i =  0 or j = 0
          = { c[i-1, j-1] + 1 if i,j > 0 and xi = yj
          = { max( c[i,j-1], c[i-1,j] ) otherwise
This way we utilize the solutions we computed. We compute the table c[i,j] bottom-up and also store the value b[i,j] that captures whether c[i,j-1], c[i-1,j] or c[i-1,j-1] + 1is optimum.  We therefore find the longest common subsequence by walking backwards on the path from the last cell of b[i,j] to the first.

Hashing function
template<class T> size_t Hash<T>::operator()(const T& obj)
{
  size_t len = sizeof(obj);
  size_t res = 0;
  const char* ptr = reinterpret_cast<const char*>(&obj);
  while (len--) res = (res << 1) ^ *ptr++;
  return res;
}

Void memcpy ( char* src, char* dest, size_t len)
{
While (len--){
      *dest++ = *src++;
}
}

void GaussianAverage(char* pSrc, char* pDest, int i, int j, int row, int col)
{
  // parameter validation
  int val = pSrc[i * col + j];
  int numNeighbors = 0;
  for (int m = i - 2; m <= i+2; m++)
        for (int n = j- 2; n <= j+2; n++)
           if ( m >= 0 && n >= 0 && m < row && n < col )
           {
                     val +=  pSrc[m * col + n]; 
                     numNeighbors++;
            }
 val = val / (numNeighbors + 1);
 pDest[m * col + n]   = val;
}