Tuesday, September 30, 2014

In today's post, we continue our discussion on OpenStack, and in particular the architecture for OpenStack on Ubuntu. OpenStack is documented at http://docs.openstack.org and is designed to be massively scalable horizontally. To begin with, we can have a single cloud controller that hosts the databases, message queue service, authentication and authorization service, image management service, user dashboard, and externally accessible API endpoints for the OpenStack services. By starting with a single cloud controller, we favor simplicity over the fault tolerance and availability that a fully redundant cloud controller makes possible. Most OpenStack Compute central services communicate with each other using the message queue, which also has clustering capabilities. Databases save stateful information and are critical to the central services. The image content and delivery service consists of two parts: one is responsible for the delivery of images, while the other maintains the metadata associated with the virtual machine images and requires a database.
The OpenStack dashboard is implemented as a Python web application that runs in Apache httpd. The OpenStack Identity service allows access policy to be set in the policy.json file, and it supports different back-end plugins for storing identity information, including an in-memory key-value store, a SQL database, PAM, and LDAP. OpenStack deployments on Ubuntu usually involve MAAS and Juju. MAAS (Metal as a Service) is a tool that helps manage physical infrastructure with the same ease and flexibility as virtual machines in the cloud. Specifically, MAAS allows us to discover, commission, and deploy physical servers, dynamically allocate physical resources to match workload requirements, and retire servers. Juju is a service orchestration tool that allows the administrator to configure, manage, maintain, deploy, and scale cloud services (workloads) quickly and efficiently on top of MAAS. Generally MAAS and the Juju GUI run on separate servers. Similarly, the controller node, Compute services, Object Storage services, and Block Storage services operate on separate nodes. This provides dedicated hardware for each of these services and reduces contention. The controller and compute nodes should each have four 1 TB physical drives in a RAID 5 configuration. The storage nodes should have two 500 GB physical disks locally, in addition to access to an array of such disks.

Monday, September 29, 2014

#CodingExercise
I apologize for cluttering up my posts with coding exercises. I know it's distracting, but I cannot resist, so I'm tagging them so you can skip them:
Successor in a binary search tree:
Node GetSuccessor(Node root)
{
    if (root == null) return null;
    if (root.right != null)
    {
        // return the tree minimum of the right subtree
        Node current = root.right;
        while (current != null && current.left != null)
            current = current.left;
        return current;
    }
    // otherwise climb until we come up from a left child
    Node parent = root.parent;
    while (parent != null && parent.right == root)
    {
        root = parent;
        parent = parent.parent;
    }
    return parent;
}

For string matching between a text T of length n and a pattern P of length m, we mentioned the Rabin-Karp method earlier. It brings two improvements over brute force: on a mismatch we can skip ahead by up to m characters, and spurious hits are cheap to detect because only a portion can match. It does this with a rolling hash function that computes each window's hash from the hash of the previous window.
// after preprocessing: p = hash(P), t(0) = hash(T[1..m])
for (int s = 0; s <= n - m; s++)
{
    if (p == t(s))
        // do a full comparison of the substring before calling it a match
    if (s < n - m)
        t(s+1) = (d * (t(s) - T[s+1] * h) + T[s+m+1]) mod q   // rolling hash
}
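To make the rolling-hash step concrete, here is a minimal Python sketch of Rabin-Karp; the radix d and the prime modulus q are illustrative choices, not prescribed by the text:

```python
def rabin_karp(text, pattern, d=256, q=101):
    """Return starting indices of pattern in text (a sketch of Rabin-Karp)."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    h = pow(d, m - 1, q)              # d^(m-1) mod q, used to drop the leading digit
    p = t = 0
    for i in range(m):                # preprocessing: hash pattern and first window
        p = (d * p + ord(pattern[i])) % q
        t = (d * t + ord(text[i])) % q
    matches = []
    for s in range(n - m + 1):
        if p == t and text[s:s + m] == pattern:   # verify to rule out spurious hits
            matches.append(s)
        if s < n - m:                 # roll the hash to the next window
            t = (d * (t - ord(text[s]) * h) + ord(text[s + m])) % q
    return matches
```

The explicit substring comparison on a hash hit is what guards against the spurious matches mentioned above.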

Knuth-Morris-Pratt improves the preprocessing by computing, for each prefix of the pattern, how far the pattern can be shifted after a mismatch.
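A hedged sketch of that preprocessing, the prefix (failure) function, together with the search that uses it:

```python
def prefix_function(pattern):
    """pi[i] is the length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it (a minimal sketch)."""
    pi = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[k] != pattern[i]:
            k = pi[k - 1]              # fall back to the next shorter border
        if pattern[k] == pattern[i]:
            k += 1
        pi[i] = k
    return pi

def kmp_search(text, pattern):
    """Indices where pattern occurs in text, scanning text only once."""
    pi, k, out = prefix_function(pattern), 0, []
    for i, ch in enumerate(text):
        while k > 0 and pattern[k] != ch:
            k = pi[k - 1]
        if pattern[k] == ch:
            k += 1
        if k == len(pattern):          # full match ending at position i
            out.append(i - k + 1)
            k = pi[k - 1]
    return out
```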

The convex hull of a set Q of points can be found with one of the following methods:
1) Rotational sweep methods, such as:
 a) Graham's scan - Graham's scan solves the convex hull problem by maintaining a stack S of candidate points. While the angle formed by the next-to-top point, the top point, and the candidate point pi makes a non-left turn, the stack is popped; then pi is pushed onto the stack. At the end, the stack contains exactly the points of the hull, in anticlockwise order.
 b) Jarvis's march builds a sequence of vertices in which each vertex has the smallest polar angle with respect to the previous one: p1 has the smallest polar angle with respect to p0, p2 with respect to p1, and so on. When we reach the topmost point we start over from the other side, building the left and right chains separately. The advantage of the separate chains is that we need not compute the angles but only compare them.
2) Incremental method: the points are sorted from left to right, yielding a sequence. At the ith stage, the convex hull of the leftmost i-1 points is updated with the ith point from the left.
3) Divide and conquer method:
The set of n points is divided into two halves, the convex hull of each half is computed recursively, and the two hulls are then combined by finding the upper and lower tangents that join them. To make the division cheap, each recursive step takes as input the set P of points together with arrays X and Y, which hold the points of P sorted monotonically by x and y coordinates respectively; all the inputs are divided in half to form corresponding left and right sequences PL and PR. (The same presorting technique drives the closest-pair divide-and-conquer algorithm, where the two recursive calls find the closest pair within PL and within PR, and the merge step finds the closest pair either from one of the recursive calls, at distance delta, or as a pair straddling PL and PR inside the vertical strip of width two delta.) In each recursive call, we form the sorted subsets YL and YR of the sorted array Y in linear time:
len(YL) = 0; len(YR) = 0;
for i = 1 to len(Y)
    do if Y[i] belongs to PL
        then len(YL) = len(YL) + 1
             YL[len(YL)] = Y[i]
        else len(YR) = len(YR) + 1
             YR[len(YR)] = Y[i]
4) The prune-and-search method: this finds the upper portion of the convex hull by repeatedly throwing out a constant fraction of the remaining points until only the upper chain remains. The process is then repeated for the lower chain.
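As a concrete illustration of these ideas, here is a sketch of the closely related monotone-chain algorithm (Andrew's variant of Graham's scan): it sorts the points by x coordinate and builds the lower and upper chains with a stack, popping while the turn is not a left turn:

```python
def cross(o, a, b):
    """Positive if the turn o -> a -> b is a left (counterclockwise) turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Monotone-chain convex hull: sort by x, then build two chains."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def half(seq):
        chain = []
        for p in seq:
            # pop while the last two stacked points and p make a non-left turn
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain

    lower, upper = half(pts), half(reversed(pts))
    return lower[:-1] + upper[:-1]    # counterclockwise hull, no duplicate ends
```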

Node GetPredecessor(Node root)
{
    if (root == null) return null;
    if (root.left != null)
    {
        // return the tree maximum of the left subtree
        Node current = root.left;
        while (current != null && current.right != null)
            current = current.right;
        return current;
    }
    // otherwise climb until we come up from a right child
    Node parent = root.parent;
    while (parent != null && parent.left == root)
    {
        root = parent;
        parent = parent.parent;
    }
    return parent;
}


void PrintFactors(int n)
{
    // Pollard's rho heuristic: prints nontrivial factors as it finds them.
    // Like the textbook version it loops forever, so bound it in practice.
    int i = 1;
    Random r = new Random();
    int x = r.Next(0, n - 1);
    int y = x;
    int k = 2;
    for (;;)
    {
        i = i + 1;
        x = (x * x - 1) % n;
        int d = gcd(Math.Abs(y - x), n);
        if (d != 1 && d != n)
            Console.WriteLine("{0}", d);
        if (i == k)
        {
            y = x;
            k = 2 * k;
        }
    }
}

int gcd(int a, int b)
{
    // Euclid's algorithm
    if (b == 0)
        return a;
    return gcd(b, a % b);
}

Kernighan and Pike's string matching code:
    /* match: search for regexp anywhere in text */
int match(char *regexp, char *text)
{
    if (regexp[0] == '^')
        return matchhere(regexp + 1, text);
    do {    /* must look even if string is empty */
        if (matchhere(regexp, text))
            return 1;
    } while (*text++ != '\0');
    return 0;
}

/* matchhere: search for regexp at beginning of text */
int matchhere(char *regexp, char *text)
{
    if (regexp[0] == '\0')
        return 1;
    if (regexp[1] == '*')
        return matchstar(regexp[0], regexp + 2, text);
    if (regexp[0] == '$' && regexp[1] == '\0')
        return *text == '\0';
    if (*text != '\0' && (regexp[0] == '.' || regexp[0] == *text))
        return matchhere(regexp + 1, text + 1);
    return 0;
}

/* matchstar: search for c*regexp at beginning of text */
int matchstar(int c, char *regexp, char *text)
{
    do {    /* a * matches zero or more instances */
        if (matchhere(regexp, text))
            return 1;
    } while (*text != '\0' && (*text++ == c || c == '.'));
    return 0;
}

List<List<T>> GetPowerSet(List<T> elements)
{
    if (elements.Count == 0)
        return new List<List<T>>() { new List<T>() };
    var first = elements.First();
    var remaining = elements.Skip(1).ToList();
    var powerSet = GetPowerSet(remaining);
    // copy each subset, then add the first element to the copies
    var withFirst = powerSet.Select(s => new List<T>(s)).ToList();
    powerSet.AddRange(Combine(withFirst, first));
    return powerSet;
}

List<List<T>> Combine(List<List<T>> powerSet, T item)
{
    // adds the item to every subset in place and returns the list
    foreach (var set in powerSet)
        set.Add(item);
    return powerSet;
}

void Stack::Push(Item item)
{
    this->list.InsertAt(0, item);
}

Item Stack::Pop()
{
    Item item = this->list.First();
    this->list.RemoveAt(0);
    return item;
}
Cloud computing is universally recognized as having the potential to drive greater agility, control, and cost savings by providing on-demand access in a user-centric environment. IT adds to this a hybrid delivery model that spans traditional IT, private clouds, and public clouds.
In this context, a cloud management platform provides a solution that manages cloud access, simplifies and unifies business operations, implements security and compliance, and provides an open, extensible, and scalable infrastructure. The speed of delivery increases dramatically with this ability, costs are reduced, and control passes back to the organization instead of "shadow IT", which improves accountability.
Let us review what is meant by an open, extensible, and scalable infrastructure. Ideally, the management platform for the cloud should be optimized for heterogeneity: multi-hypervisor, multi-vendor environments and multiple public clouds. Openness is defined by the adoption of OpenStack as the underlying technology and by enabling optimized workload portability through industry standards such as TOSCA. Flexibility comes from REST APIs for integration; out-of-box integration plus extensibility for custom environments enables an ecosystem to evolve.
Heterogeneity with no vendor lock-in is what makes the infrastructure scalable.
When we review the management toolset, we notice the following requirements: 1) it has to be a single set of tools, 2) it must be simple to use, and 3) it should work on both traditional IT and the cloud.
Common functions of these management tools include governance, security, compliance, service assurance, patching, and service lifecycle management.
The toolset should give a complete view of the IT infrastructure; complete means it includes automation, orchestration, security, performance management, and financial management tools. In addition, the cloud management platform should be consistent in meeting business SLAs. These SLAs are met by providing capabilities such as security with support for automation, compliance and backup, role-based access, and identity management. Simple code analysis tools go a long way toward assuring this. Further, information correlation, application analysis, and network-level defense mechanisms can thwart or preempt attacks. Some IT tools already work well in this space, but the suggestion is to keep the platform open enough for any tool to provide the functionality, or for a platform tool to integrate with others.
Performance monitoring tools are also part of this toolset, as are advanced reporting and analytics tools that report on chargeback and license compliance.

One more coding question for today:
A list of objects have colors red, white, or blue. We need to sort them in the order red, white, blue.

public static List<Color> SortColors(List<Color> items)
{
    // parameter validation elided; note that a SortedSet would drop
    // duplicate items, so we sort the list in place instead
    items.Sort(new ByColor());
    return items;
}

public class ByColor : IComparer<Color>
{
    // red orders before white, which orders before blue
    private static int Rank(Color c)
    {
        if (c.color == red) return 0;
        if (c.color == white) return 1;
        return 2; // blue
    }

    public int Compare(Color x, Color y)
    {
        return Rank(x).CompareTo(Rank(y));
    }
}
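Since there are only three fixed color values, a comparator-based sort is more machinery than the problem needs; a single-pass, in-place Dutch national flag partition also works. A sketch in Python, assuming the colors are encoded as 0 (red), 1 (white), and 2 (blue):

```python
def sort_colors(a):
    """Dutch national flag: single in-place pass over 0/1/2 keys."""
    low, mid, high = 0, 0, len(a) - 1
    while mid <= high:
        if a[mid] == 0:                    # red: swap into the low region
            a[low], a[mid] = a[mid], a[low]
            low += 1
            mid += 1
        elif a[mid] == 1:                  # white: already in place
            mid += 1
        else:                              # blue: swap to the end, re-examine a[mid]
            a[mid], a[high] = a[high], a[mid]
            high -= 1
    return a
```

Unlike a sorted set, this keeps duplicates, and it runs in O(n) time with O(1) extra space.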

Saturday, September 27, 2014

Another problem and solution for LeetCode challenge :
Given n, generate all structurally unique BSTs that store value 1 .. n
 The answer is the Catalan number (2n)! / ((n+1)! n!).
How do we arrive at it ? Let's take a look:
Consider all possible Binary Search trees with each number as the root. For each of the n nodes as root, there are n-1 non-root nodes. We can partition these non-root nodes into the lower values than the root and the higher values than the root for the left and the right subtree. If we choose say i  in a sorted list as the root node, then there are i-1 nodes in the left subtree and n-i in the right subtree.
Thus if we denote the number of possible BSTs by t(n), we can write it as the
sum over i = 1 to n of the product t(i-1) * t(n-i). We multiply because the arrangement of the left subtree is independent of that of the right subtree, and for each such pair of arrangements we get one BST with i as the root. Also note that regardless of the values of the elements in the sorted list, the number of unique BSTs depends only on the number of elements n.
Therefore we write the code as follows:
int numUniqueBSTs(int n)
{
    if (n == 0) return 1;
    int count = 0;
    for (int i = 1; i <= n; i++)
        count += numUniqueBSTs(i - 1) * numUniqueBSTs(n - i);
    return count;
}
The iterative solution is also equally useful:
int numUniqueBSTs(int n)
{
    int[] count = new int[n + 1];
    count[0] = 1;
    for (int i = 1; i <= n; i++)
        for (int j = 0; j < i; j++)
            count[i] += count[j] * count[i - j - 1];
    return count[n];
}

Okay one more coding exercise:
Given an array of positive integers with every element appearing twice except for one, find that one:
int OccursOnce(int[] num)
{
    if (num == null || num.Length <= 0) return -1;
    int candidate = num[0];
    for (int i = 1; i < num.Length; i++)
        candidate = candidate ^ num[i];  // paired values cancel out
    return candidate;
}

My next few posts will be on managing hybrid clouds, PAAS and OpenStack.
Here's another coding exercise:
Convert a sorted list to binary search tree.
Note that the in-order traversal of a binary search tree is a sorted list.
The caveat is that we can only construct a balanced binary search tree from the given list.
For the sorted list 1 2 3 4 5 6 7 8 9:

     5
  3       8
2  4    7  9
1       6

Node SortedListToBST(List<int> num, int start, int end)
{
    if (start > end) return null;   // base case

    int mid = start + (end - start + 1) / 2;   // bias to the upper middle

    var node = new Node();
    node.data = num[mid];

    node.left = SortedListToBST(num, start, mid - 1);
    node.right = SortedListToBST(num, mid + 1, end);

    return node;
}


 
Today we cover one more programming exercise question from the interview problems list:
Given a binary tree, print the zigzag level order.
The binary tree is given as  3, 9, 20, -1, -1, 15, 7 where -1 means null
and the tree is
    3
   9  20
      15   7

We have to return the output as
[ (3),
  (20, 9),
  (15, 7),
]
public static List<List<int>> GetZigZag(Node root)
{
    if (root == null) return null;
    var output = new List<List<int>>();
    var level = new List<int>();
    bool leftToRight = true;
    var Q = new Queue<Node>();
    Q.Enqueue(root);
    Q.Enqueue(null);   // level marker
    while (Q.Count > 0)
    {
        var item = Q.Dequeue();
        if (item == null)
        {
            // reverse every other level to get the zigzag order
            if (leftToRight == false) level.Reverse();
            output.Add(level);
            leftToRight = !leftToRight;
            if (Q.Count == 0) break;
            level = new List<int>();
            Q.Enqueue(null);
            continue;
        }
        level.Add(item.data);
        if (item.left != null) Q.Enqueue(item.left);
        if (item.right != null) Q.Enqueue(item.right);
    }
    return output;
}

Friday, September 26, 2014


In the book "Programming Collective Intelligence" from O'Reilly Media, having covered "Collaborative Filtering", we now review the chapter on discovering groups. Data clustering is introduced in this chapter as a method for discovering and visualizing groups of things, people, or ideas that are all closely related. The neural networks, decision trees, support vector machines, and Bayesian filtering reviewed earlier were supervised learning methods, where inputs and outputs are used to make predictions. Clustering, on the other hand, is an unsupervised learning method. Unlike a neural network or a decision tree, unsupervised learning algorithms are not trained with examples of correct answers; instead, we try to find the structure in the data and make use of it.

As an example, blogs can be clustered based on word frequencies, which could be useful in searching, cataloging, and discovering the huge number of blogs that are currently online. A feedparser module in Python makes it easy to get the titles, links, and entries from any RSS or Atom feed. With this module, it's possible to extract a list of words from the title and the description and use that to count the words. With the word count (wc) and the number of blogs each word appeared in (apcount), we proceed to create a matrix with a column of counts for each word and a row for each blog. These rows are called vectors.

With this sample data, there are two clustering methods discussed in the book: hierarchical clustering and K-means clustering. The first yields a dendrogram and the second yields k partitions. The former works by starting with many unit clusters and building up a hierarchy by repeatedly merging the two closest clusters. When clusters are merged, the result shows which items are closer together and hence how tight the cluster is. To determine closeness, a couple of mathematical formulas help: we can use the Euclidean distance or the Pearson coefficient. The Euclidean distance finds the distance between two points in a multidimensional space by taking the sum of the squares of the differences between the coordinates of the points and then calculating the square root of the result. The Pearson correlation coefficient is a measure of how highly correlated two variables are. It's generally a value between -1 and 1, where -1 means a perfect inverse correlation, 1 a perfect correlation, and 0 no correlation. It is computed with a numerator equal to the sum of the products of the two variables minus the product of their individual sums divided by the sample size, and a denominator equal to the square root of the product of, for each variable, the sum of its squares minus the square of its sum divided by the sample size. As an aside, another way to measure similarity between two overlapping sets is the Tanimoto coefficient, defined as the number of items in the intersection divided by the number of items in one set plus the number of items in the other set minus the number of items in the intersection.
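A small Python sketch of the two similarity measures just described (the variable names are illustrative):

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences, per the formula above."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx, syy = sum(x * x for x in xs), sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    num = sxy - sx * sy / n
    den = sqrt((sxx - sx * sx / n) * (syy - sy * sy / n))
    return num / den if den else 0.0

def tanimoto(a, b):
    """Tanimoto coefficient of two sets: |intersection| / (|a| + |b| - |intersection|)."""
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)
```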

The K-means clustering, on the other hand, breaks the data into distinct groups instead of uncovering structure as hierarchical clustering does. Here the number of clusters k is specified in advance; the algorithm starts with k randomly placed centroids, updates each centroid to the average location of all the items assigned to it, and reassigns items in each iteration, continuing until the assignments no longer change.
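A minimal sketch of that loop for 2-D points, assuming plain tuples and squared Euclidean distance:

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Plain k-means on 2-D points: pick k random initial centroids, then
    alternate nearest-centroid assignment and centroid update (a sketch)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign each point to its nearest centroid (squared distance)
            i = min(range(k),
                    key=lambda c: (p[0] - centroids[c][0]) ** 2
                                + (p[1] - centroids[c][1]) ** 2)
            clusters[i].append(p)
        # move each centroid to the average location of its assigned points
        new = [(sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
               if c else centroids[i]
               for i, c in enumerate(clusters)]
        if new == centroids:           # assignments stopped changing
            break
        centroids = new
    return centroids, clusters
```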

When the clustered data is visualized, we use a technique called multidimensional scaling which will be used to find a two dimensional representation of the dataset. The algorithm takes the difference between every pair of items and makes a chart in which the distance between the items match those differences.
As I was walking out of a building today and stood at the curb waiting for a cab I had requested,  a man pulled up and said I should not be waiting at the curb near the crosswalk because it delayed him. Really ?  Just then a Platt driver crossed the road ...
Anyway, it reminded me to take a short break for a coding exercise:
Given a sequence of integers, find the max product of a contiguous subsequence:

int MaxProduct(int[] input)
{
    if (input == null || input.Length <= 0) throw new ArgumentException();
    // track both the largest and smallest products ending at each position,
    // since multiplying by a negative number swaps them
    int maxHere = input[0], minHere = input[0], max = input[0];
    for (int i = 1; i < input.Length; i++)
    {
        int a = maxHere * input[i];
        int b = minHere * input[i];
        maxHere = Math.Max(input[i], Math.Max(a, b));
        minHere = Math.Min(input[i], Math.Min(a, b));
        max = Math.Max(max, maxHere);
    }
    return max;
}


Thursday, September 25, 2014

Making recommendations is yet another application of collective intelligence, as described in the book mentioned in the previous posts. Recommendations include suggestions for online shopping, interesting web sites, or items such as movies or music. Generally, to make a recommendation, first a group sharing similar tastes is found, and then the preferences of that group are used to make a ranked list of suggestions. This technique is called collaborative filtering. A common data structure that helps with keeping track of people and their preferences is a nested dictionary. This dictionary could use a quantitative ranking, say on a scale of 1 to 5, to denote the preferences of the people in the selected group. To find people similar enough to form a group, we use some form of similarity score. One way to calculate this score is to plot the items that the people have ranked in common, use them as axes in a chart, and group the people who are close together on the chart. This is called the Euclidean distance score. Another way to calculate the similarity score is to find a correlation coefficient, which is a measure of how well two sets of data fit on a straight line. By finding a best linear fit, we adjust for what is called grade inflation, which is to say that some people may consistently grade higher than others while sharing the same trends. Hence the latter scoring technique gives better results than the former.
Searching and ranking is another set of operations ubiquitous in collective intelligence programming. Searching is usually done with crawling, whether it's over a fixed set of documents or the corporate intranet. After the documents are collected, they are indexed.
Crawling or spidering involves starting with a small set of pages called the seed, from which links are followed; each page visited is indexed while discovering more links to follow.
Adding to the index means that we separate the words in the text and, for each entry, associate it with the URL. We also remove the stop words from the list, check whether the page has already been indexed, and get an id for each entry. Finally, we keep track of the word locations.
Having looked at searching, we now look at ranking. We retrieve the pages that match the queries.
The pages are not scored when crawling; instead we compute the score at query time. Scoring can be based on word frequency, document location, and word distance. As we encounter the words, we put them in a hash table to update the frequency and the location. Then, as we walk the hash table, we can build a sorted dictionary where pages are scored by frequency.
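A small sketch of that frequency-and-location bookkeeping; the function name and the choice of recording each word's first location are illustrative assumptions:

```python
from collections import defaultdict

def frequency_score(words, query_terms):
    """Score a page by how often the query terms occur, and record the
    first location of each word as we walk the text once."""
    counts = defaultdict(int)
    locations = {}
    for pos, w in enumerate(words):
        counts[w] += 1
        locations.setdefault(w, pos)   # keep only the first location
    score = sum(counts[t] for t in query_terms)
    return score, {t: locations.get(t) for t in query_terms}
```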

If we want to crawl all the URL links starting from a given page and back to the same page:
public static void GetAllPaths(Link current, Link target, ref List<Link> candidate, ref List<List<Link>> results) // plus other suitable parameters
{
    if (numberOfIterations > someThreshold) return;
    var links = GetAllTheLinksOnTheCurrentPage(current);
    if (links.Contains(target))
    {
        // add the target to the candidate path
        // add a copy of the candidate path to the results
        // remove the target from the candidate path
    }

    foreach (var link in links)
    {
        // add the link to the candidate path
        GetAllPaths(link, target, ref candidate, ref results);
        // remove the link from the candidate path
    }
}

Wednesday, September 24, 2014

We mentioned that the book on programming collective intelligence covers making recommendations, discovering groups, searching and ranking, optimization, document filtering, modeling with decision trees, building price models, advanced classification, and finding independent features. In today's post, I will briefly outline some of these.
Evolving intelligence is, for example, genetic programming, which is a machine learning technique inspired by the theory of biological evolution. It starts with a large set of programs that are somewhat good solutions, which are then made to compete on a user-defined task. The programs are ranked from best to worst; some parts of a program are altered slightly (mutation) to try to make it better, and some parts are replaced by those from other programs (crossover, or breeding). The new population is called a generation; a fitness function measures the quality of the programs, and the procedure iterates.
The worst programs are eliminated in each round. The iterations terminate when the perfect solution has been reached, a good enough solution has been found, the solution has not improved for several generations, or the number of generations has reached a specified limit.
These programs are represented as trees. Most programming languages are compiled into a parse tree, and these programs are similar; LISP programs are essentially a way to express the parse tree directly. As an example, consider:
def func(x,y):
   if (x > 3):
         return y + 5
   else:
         return y + 3
In .NET, the equivalent would be an expression tree using System.Linq.Expressions:
Expression<Func<int, bool>> lambda = x => x > 3;
or, built explicitly:
Expression left = Expression.Parameter(typeof(int), "x");
Expression right = Expression.Constant(3, typeof(int));
Expression e2 = Expression.GreaterThan(left, right);

I'm going to try out a coding exercise:
Given a mapping of characters as A = 1, Z = 26 and given a number, how many different ways can we decode it ? For eg: 12 can be AB or L.
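A dynamic-programming sketch of one way to count the decodings, treating a leading or stranded '0' as undecodable (an assumption the problem statement leaves open):

```python
def num_decodings(digits):
    """ways for the first i digits: a one-digit step extends the count for
    i-1 digits, and a valid two-digit step (10-26) extends the count for i-2."""
    if not digits or digits[0] == '0':
        return 0
    prev, cur = 1, 1                   # counts for length 0 and length 1
    for i in range(1, len(digits)):
        nxt = 0
        if digits[i] != '0':           # single digit maps to A-I / 1-9
            nxt += cur
        if 10 <= int(digits[i - 1:i + 1]) <= 26:   # two digits map to J-Z
            nxt += prev
        if nxt == 0:
            return 0                   # e.g. "30" cannot be decoded
        prev, cur = cur, nxt
    return cur
```

For the example in the text, "12" decodes as AB or L, giving 2.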


NoSQL Databases and scalability.

Traditional databases such as SQL Server have a rich history of scalability and performance features for user data storage, such as horizontal and vertical table and index partitioning, data compression, resource governor, IO resource governance, partitioned table parallelism, multiple filestream containers, NUMA-aware large page memory and buffer array allocation, buffer pool extension, in-memory OLTP, delayed durability, etc. In NoSQL databases, the unit of storage is the key-value collection, and each row can have a different number of columns from a column family. The options for performance and scalability have been sharding and partitioning. Key-value pairs are organized according to the key, and keys in turn are assigned to a partition. Once a key is assigned to a partition, it cannot be moved to a different partition; since this is not configurable, the number of partitions in a store is decided upfront. The number of storage nodes in use by the store can, however, be changed: when this happens the store undergoes reconfiguration, the partitions are balanced between new and old shards, and partitions are redistributed from one shard to another. The more partitions there are, the finer the granularity available for this reconfiguration; it is typical to have ten to twenty partitions per shard. Since the number of partitions cannot be changed, it is often a design consideration.
The number of nodes belonging to a shard, called its replication factor, improves throughput. The higher the replication factor, the faster the reads because of the added availability, but the same is not true for writes since more copying is involved. Once the replication factor is set, the database takes care of creating the appropriate number of replication nodes for each shard.
A topology is a collection of storage nodes, replication nodes, and the associated services. At any point of time, a deployed store has one topology. The initial topology is such that it minimizes the possibility of a single point of failure for any given shard. If the storage node hosts more than one replication node, those replication nodes will not be from the same shard. If the host machine goes down, the shard can continue for reads and writes.
Performance tuning on the NoSQL databases is an iterative process. Topologies, replication factor, shards and partitions can be changed to improve performance or to adjust for the storage  nodes.
Courtesy:

Microsoft and Oracle documentation.

Monday, September 22, 2014

A decision tree classifier builds a model that predicts a classification by following yes/no branches based on each node's criteria. It checks each item in the list against the model and predicts a category. Decision trees are notable for being easy to read and interpret. Classifying with a decision tree is generally easy, but training it is usually trickier. The divisions are usually based on entropy, which is the amount of disorder in a set and is measured as follows:
p(i) = frequency(outcome) = count(outcome) / count(total rows)
Entropy = -sum over all outcomes of p(i) * log2(p(i))
A low entropy tells us that the set is homogeneous. To determine the dividing variable, we find the information gain, which is computed from the entropy as follows:
Weight1 = size of subset1 / size of original set
Weight2 = size of subset2 / size of original set
Gain = entropy(original) - weight1*entropy(subset1) - weight2*entropy(subset2). The dividing variable with the highest gain is used to create the first node, which can then be further divided to form a decision tree.
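The two formulas can be sketched directly in Python (using log base 2, so entropy is measured in bits):

```python
from math import log2
from collections import Counter

def entropy(rows):
    """Shannon entropy of the outcome labels, per the formula above."""
    total = len(rows)
    return -sum((c / total) * log2(c / total) for c in Counter(rows).values())

def information_gain(parent, subset1, subset2):
    """Entropy reduction from splitting parent into the two subsets."""
    w1 = len(subset1) / len(parent)
    w2 = len(subset2) / len(parent)
    return entropy(parent) - w1 * entropy(subset1) - w2 * entropy(subset2)
```

A perfectly mixed two-label set has entropy 1 bit, and a split that separates the labels completely recovers all of it as gain.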
Neural Networks can also be used as a classifier. For example, a simple neural network can be used for altering the ranking of search results based on what link the users have clicked in the past. It works on the basis of giving a number to every link predicting that the link with the highest number would be the one that the user would click. The numbers are thus used to change the rankings. There are many different kinds of neural networks.  As an example, there is a multilayer perceptron network that is named because it has a layer of input neurons that feed into one or more layers of hidden neurons. The output of one layer is fed as input to the next layer. If there are three nodes in the first layer where the output of first is negative and the last is positive, and the middle influences the most for a given input, the classification will result from the middle. Usually a combination of the different layers will determine the end result.

Support Vector Machines are sophisticated classifiers. They build a predictive model by finding the dividing line between two categories: among the lines that separate the data, the one from which the data is most distant is chosen as the best. The points that are closest to the line are the ones that determine it, and they are called support vectors. Once the line is found, classifying new data is just a matter of placing it on the correct side of the line.

I take a brief break to discuss a coding exercise for Gray code.
Gray code is also known as reflected binary code since the 0 and 1 sequence in a bit position is reflected during single bit changes between numbers leading up to the given number.
To convert to Gray code, we write the number in its binary notation first; say 9 is 1001,
with digits d1, d2, ..., dn. Working from dn back toward d2: if the preceding digit d(i-1) is 1, substitute di with 1-di; otherwise leave it unchanged and move on. The resulting number is the binary reflected Gray code; 9's Gray code is 1101.
The reverse conversion from a Gray code g1, g2, ..., gn to the binary notation for a number is done by calculating, for each position n,
Sum(n) = (sum of gi for 1 <= i <= n-1) mod 2
If this computes as 1, then replace gn with 1-gn; otherwise leave it unchanged.
        public static string GrayCode(string number)
        {
            char[] binary = number.ToCharArray();
            for (int i = binary.Length - 1; i > 0; i--)
            {
                if (binary[i - 1] == '1')
                {
                    binary[i] = binary[i] == '1'  ? '0' : '1'; // set 1-x
                }
            }
            return new String(binary);
        }
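The same conversion can be written arithmetically: the Gray code of n is n XOR (n >> 1), and the reverse conversion accumulates the prefix parity described by the Sum(n) rule. A quick sketch for cross-checking the string version:

```python
def to_gray(n):
    """Binary-reflected Gray code of n."""
    return n ^ (n >> 1)

def from_gray(g):
    """Invert by XOR-ing in ever-shorter suffixes, i.e. prefix parity."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n
```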

Another coding question concerns buying and selling stock for maximum profit. The trick is that a stock must be bought before it can be sold, and the transaction can be repeated any number of times; each purchase only needs a look-ahead to later days, so we can simply collect every day-over-day rise.
        public static int Profit(int[] prices)
        {
            if (prices == null || prices.Length <= 0) return 0;
            int globalProfit = 0;
            int profit = 0;
            for (int i = 1; i < prices.Length; i++)
            {
                if (prices[i] > prices[i - 1])
                    profit += prices[i] - prices[i - 1];
                else
                {
                    globalProfit += profit;
                    profit = 0;
                }
            }
            globalProfit += profit;
            return globalProfit;
                 
        }

            var prices1 = new int [] { 3, 6, 9 };
            var prices2 = new int [] { 9, 6, 3 };
            var prices3 = new int [] { 3, 9, 6 };
            var prices4 = new int [] { 6, 9, 3 };
            var prices5 = new int [] { 6, 3, 9 };
            var prices6 = new int [] { 9, 3, 6 };
            var prices7 = new int [] { 9, 9, 9 };
           
            Console.WriteLine("expected = {0}, actual = {1}", 6, Profit(prices1));
            Console.WriteLine("expected = {0}, actual = {1}", 0, Profit(prices2));
            Console.WriteLine("expected = {0}, actual = {1}", 6, Profit(prices3));
            Console.WriteLine("expected = {0}, actual = {1}", 3, Profit(prices4));
            Console.WriteLine("expected = {0}, actual = {1}", 6, Profit(prices5));
            Console.WriteLine("expected = {0}, actual = {1}", 3, Profit(prices6));
            Console.WriteLine("expected = {0}, actual = {1}", 0, Profit(prices7));
            var prices = new int[] { 5, 4, 3, 2, 1, 8, 5, 9, 11 };
            Console.WriteLine("{0}", Profit(prices));
expected = 6, actual = 6
expected = 0, actual = 0
expected = 6, actual = 6
expected = 3, actual = 3
expected = 6, actual = 6
expected = 3, actual = 3
expected = 0, actual = 0
13


Combination of string:
        public static void Combine(ref List<char> input, ref List<char> candidate, ref List<List<char>> sequences, int start)
        {
            for (int i = start; i < input.Count; i++)
            {
                candidate.Add(input[i]);
                sequences.Add(new List<char>(candidate)); // add a copy, not the shared reference
                Console.WriteLine("{0}", new string(candidate.ToArray()));
                Combine(ref input, ref candidate, ref sequences, i + 1); // recurse past i, not start + 1
                candidate.RemoveAt(candidate.Count - 1); // backtrack
            }
        }


Note that permutations can be generated with a similar recursion; the difference is that each level scans the whole input for unused characters rather than only the positions after start.
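That note can be sketched in Python: at each level a permutation picks any character not already used, backtracking after the recursive call.

```python
def permute(input, candidate=None, results=None):
    # at each level, try every character not already used in the candidate
    candidate = candidate if candidate is not None else []
    results = results if results is not None else []
    if len(candidate) == len(input):
        results.append(''.join(candidate))
        return results
    for ch in input:
        if ch not in candidate:
            candidate.append(ch)
            permute(input, candidate, results)
            candidate.pop()  # backtrack
    return results

print(permute('abc'))  # all 6 orderings of 'abc'
```

This sketch assumes the characters are distinct (the membership test stands in for a visited flag).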

Here is another interview question.
static void PrintPrime(int countOfPrimes)
{
    int i = 2;
    int count = 0;
    while (count < countOfPrimes)
    {
        if (isPrime(i))
        {
            Console.WriteLine("{0}", i);
            count++;
        }
        i++;
        if (i == int.MaxValue) break;
    }
}

static bool isPrime(int x) // must return bool, not void
{
    for (int i = 2; i * i <= x; i++) // divisors come in pairs, so stop at sqrt(x)
        if (x % i == 0) return false;
    return true;
}
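The same pair of routines in Python, with the square-root cutoff for trial division:

```python
def is_prime(x):
    if x < 2:
        return False
    i = 2
    while i * i <= x:  # divisors come in pairs, so stop at sqrt(x)
        if x % i == 0:
            return False
        i += 1
    return True

def first_primes(count):
    # print/collect the first `count` primes by trial division
    primes, n = [], 2
    while len(primes) < count:
        if is_prime(n):
            primes.append(n)
        n += 1
    return primes

print(first_primes(10))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```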

Sunday, September 21, 2014

We will start reviewing a book titled Programming Collective Intelligence from O'Reilly Media in the subsequent blog posts. This book includes such things as making recommendations, discovering groups, searching and ranking, optimization, document filtering, modeling with decision trees, building price models, advanced classification - Kernel methods and SVMs, finding independent features, evolving intelligence and a summary of algorithms. The sample code is presented in Python for readability.
As an example, a Bayesian classifier can help classify a document that uses the word python as being about the programming language or about the snake. It works based on the probability P(category | Document) = P(Document | Category) * P(category) / P(Document), where P(Document) can be ignored when only comparing categories, and
where P(Document | Category) = P(word1 | category) * P(word2 | category) * ....
All we need is a feature extraction that turns the data we're using for training or classification into a list of features basically something that takes an object and returns a list such as a sentence converted into words. This classifier is trained on the strings. Then it can be used to predict the test data.
 docclass.getwords('python is a dynamic language')
{'python' : 1, 'dynamic':1,  'language':1}
c1 = docclass.naivebayes(docclass.getwords)
c1.setdb('test.db')
c1.train('python has dynamic types', 'language')
c1.train('pythons are constrictors', 'snake')
c1.classify('boa constrictors')
u'snake'
c1.classify('dynamic programming')
u'language'
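The docclass module belongs to the book's sample code. As a rough illustration of the underlying arithmetic only (not the book's implementation; all names here are my own), a word-count naive Bayes classifier can be sketched as:

```python
from collections import defaultdict

def getwords(doc):
    # the feature extractor: a document becomes a set of lowercase words
    return {w.lower() for w in doc.split() if len(w) > 2}

class NaiveBayes:
    def __init__(self, getfeatures):
        self.getfeatures = getfeatures
        self.fc = defaultdict(lambda: defaultdict(int))  # feature -> category -> count
        self.cc = defaultdict(int)                       # category -> document count

    def train(self, doc, cat):
        for f in self.getfeatures(doc):
            self.fc[f][cat] += 1
        self.cc[cat] += 1

    def classify(self, doc):
        # score(cat) = P(cat) * product of P(word | cat), with add-one smoothing
        best, bestscore = None, 0.0
        total = sum(self.cc.values())
        for cat in self.cc:
            score = self.cc[cat] / total
            for f in self.getfeatures(doc):
                score *= (self.fc[f][cat] + 1) / (self.cc[cat] + 2)
            if score > bestscore:
                best, bestscore = cat, score
        return best

c1 = NaiveBayes(getwords)
c1.train('python has dynamic types', 'language')
c1.train('pythons are constrictors', 'snake')
print(c1.classify('dynamic programming'))  # 'language'
```

With these two training sentences the scores work out the same way as the session above: 'dynamic programming' leans to 'language' and 'boa constrictors' to 'snake'.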

I take a short break for a coding problem, including it in this post before continuing.
How many distinct subsequences of string S equal string T, and what is the most efficient way to count them? For example, rabbit appears in rabbbit in three different ways. For every match of a prefix of the pattern, we explore the remainder of the given input to see if there's a full match, skipping non-matching and repetitive characters. When a full match is found, we increment a counter. Another way to do this would be to use an automaton, the Rabin-Karp method, or the KMP method.
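The count can also be obtained with a dynamic program over prefixes; a Python sketch, where dp[j] holds the number of ways to form the first j characters of the pattern from the scanned portion of the input:

```python
def num_distinct(s, t):
    # dp[j] = number of distinct subsequences of the scanned prefix of s
    # equal to t[:j]; iterate j downward so each s character is used at
    # most once per existing way
    dp = [0] * (len(t) + 1)
    dp[0] = 1  # the empty pattern matches in exactly one way
    for ch in s:
        for j in range(len(t), 0, -1):
            if ch == t[j - 1]:
                dp[j] += dp[j - 1]
    return dp[len(t)]

print(num_distinct('rabbbit', 'rabbit'))  # 3, as in the example above
```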


void KMP(string pattern, string text, vector<int> *positions) {
    int patternLength = pattern.length();
    int textLength = text.length();
    int* next = PreProcess(pattern);
    if (next == 0) return;
    int i = 0;
    int j = 0;
    while (j < textLength)
    {
        while (true)
            if (text[j] == pattern[i]) // matches
            {
                i++;                    // yes, move on to the next state
                if (i == patternLength) // maybe that was the last state
                {
                    // found a match
                    positions->push_back(j - (i - 1));
                    i = next[i];
                }
                break;
            }
            else if (i == 0) break; // no match in state i = 0, give up
            else i = next[i];
        j++;
    }
    delete[] next; // PreProcess allocated this
}

int* PreProcess(string pattern) {
    int patternLength = pattern.length();
    if (patternLength == 0) return 0;
    int* next = new int[patternLength + 1];
    if (next == 0) return 0;
    next[0] = -1;  // set up for loop below; unused by KMP

    int i = 0;
    while (i < patternLength) {
        next[i + 1] = next[i] + 1;
        while (next[i + 1] > 0 &&
               pattern[i] != pattern[next[i + 1] - 1])
            next[i + 1] = next[next[i + 1] - 1] + 1;
        i++;
    }
    return next;
}
 

Saturday, September 20, 2014

Clojure is a general purpose language that targets the Java Virtual Machine, the CLR and JavaScript. We briefly mentioned that it compiles to Java byte code. The ClojureCLR port, written in C#, enables it to target the CLR as well. Clojure is a dialect of Lisp and inherits Lisp's code-as-data philosophy and macro system. It offers a software transactional memory system and a reactive Agent system that ensure correct multithreaded designs. What sets Clojure apart is its immutable data and first class functions. It helps to see the Clojure language as a platform.
Features include
1) dynamic development - the Read Eval Print Loop (REPL) provides a console interface to try out Clojure commands. Clojure code can be evaluated interactively, without a separate compile-and-run step.
2) Functional Programming - fn and defn provide ways to define first class functions. Clojure provides a set of immutable lists, vectors, sets and maps where the older version is persisted and the newer versions are not a full copy.
3) Lisp Clojure is a dialect of Lisp for example 'and' is a macro
4) Runtime polymorphism 'proxy helps support implementation of interfaces
5) Concurrent programming - shared objects are changed in a thread-safe way with 'dosync', 'ref', 'ref-set' and 'alter'. 'def' and 'binding' support isolating changing state within threads.
6) Hosted on the JVM - compiles to JVM bytecode and supports Java interfaces and classes using 'reify' and 'proxy'.
A Stack Overflow post gives the following Clojure code for factorial:
(defn fact [x]
   (loop [n x f 1]
         ( if ( = n 1)
                  f
                  (recur (dec n) (* f n)))))
 
// HttpHeaderReader.c : Reads HTTP header and counts them
//
#include "stdafx.h"
#include "stdio.h"
#include "string.h"

#define MAX_COUNT 255 /* a #define (not a const int) so it can legally size the static arrays below */

static int Hash (const char * pSrc, size_t len);
int main(int argc, char* argv[])
{
 static const char filename[] = "foo.txt";
 static int frequency[MAX_COUNT] = {0};
 FILE* fd = fopen(filename, "r");
 if ( fd != NULL )
 {
  char line[MAX_COUNT + 1];
  while ( fgets(line, sizeof line, fd) != NULL )
  {
   char* p = strchr(line, ':');
   if (p != NULL)
   {
    frequency[Hash(_strlwr(line), (size_t)(p-line))%MAX_COUNT] += 1;
   }
  }
  fclose(fd);
  char header[MAX_COUNT + 1] = {0};
  printf( "Enter the header:\n " );
  scanf("%255s", header);
  header[MAX_COUNT] = '\0';
  printf( "%s occurs %d times.\n", header, frequency[Hash(_strlwr(header), strlen(_strlwr(header)))%MAX_COUNT] );
 }
 return 0;
}
static int Hash (const char * pSrc, size_t len)
{
  size_t res = 0;
  while (len--) res = (res << 1) ^ *pSrc++; 
  return (int)res;
}

content-length: 10
 len = 14, hash = 177
user-agent: test
 len = 10, hash = 107
content-length: 14
 len = 14, hash = 177
accept: comedy
 len = 6, hash = 16
content-length: 100
 len = 14, hash = 177
content-encoding: gzip
 len = 16, hash = 134
connection: close
 len = 10, hash = 50
user-agent: test
 len = 10, hash = 107
accept: flash
 len = 6, hash = 16
user-agent: test1
 len = 10, hash = 107
content-length: 20
 len = 14, hash = 177
user-agent: test2
 len = 10, hash = 107
user-agent: test3
 len = 10, hash = 107
accept: gzip
 len = 6, hash = 16
Enter the header:
 user-aGENT
user-agent occurs 5 times.
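The shift-and-XOR hash above is easy to replay in Python to sanity-check the printed bucket values; only the header name (the bytes before the colon), lowercased, is hashed:

```python
def header_hash(name):
    # same scheme as the C Hash(): res = (res << 1) ^ next byte,
    # then take the bucket modulo MAX_COUNT (255)
    res = 0
    for ch in name.lower():
        res = (res << 1) ^ ord(ch)
    return res % 255

print(header_hash('content-length'))  # 177, matching the run above
```

Lowercasing first also reproduces the case-insensitive lookup in the sample run ('user-aGENT' hashes to the same bucket as 'user-agent').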
 

Friday, September 19, 2014

Today we will review Node.js. We talked about Node.js, Connect and Express in earlier discussions. We saw that the calls were asynchronous and that we could add Backbone for MVC on the web server for the user interface. Express supports both production and development settings. Code changes normally require an application restart; the node supervisor utility helps by restarting it automatically.
Sample server code to look up a phonebook works like this:
Server.js :
var express = require('express');
var app = express();
var config = require('./config'); // assumed local module exposing a port setting
var words = require('./words');   // assumed local module exposing search()
app.set('view engine', 'ejs');
app.set('view options', { layout : false });
app.use(express.bodyParser());
app.use(app.router);
app.post('/search', function (req, res) {
    var result = words.search(req.body.pattern).result;
    res.render('result', {words: result, pattern: req.body.pattern });
});
app.listen(process.env.PORT || config.port);

function search(word) {
    // parameter validation and sanitization
    var toSearch = word;
    var result = _.filter(dictionary, function(w) {
        var index = binarySearch(dictionary, w);
        return index == -1 ? null : dictionary[index];
    });
    return {result: result};
}

function binarySearch(dictionary, value)
{
   var start = 0;
   var end = dictionary.length - 1;

   while (start <= end)
   {
      var mid = (start + end) >>> 1;
      var midword = dictionary[mid];

      if (midword < value)
          start = mid + 1;
      else if (midword > value)
          end = mid - 1;
      else
          return mid;
   }
   return -(start + 1);
}

We will next cover Clojure in this post, which provides easy access to Java frameworks. Clojure compiles to Java byte code.

Wednesday, September 17, 2014

int memcmp(const char* pSrc, const char* pDest, size_t len)
{
  // assuming parameter validation
  while (len--)
  {
      if (*pSrc != *pDest)
      {
          // compare as unsigned bytes, like the library memcmp
          return (*(const unsigned char*)pSrc < *(const unsigned char*)pDest) ? -1 : 1;
      }
      pSrc++;
      pDest++;
  }
  return 0;
}
------------------------------------------------------------------------------------------------------------------
template<class C>
typename C::reverse_iterator find_last(C& c, typename C::value_type v)
{
     // note: the container cannot be const here, since rbegin() on a
     // const container yields const_reverse_iterator, not reverse_iterator
     typename C::reverse_iterator p = c.rbegin();
     while ( p != c.rend())
     {
         if (*p == v) return p;
         ++p;
      }
      return p;
}
------------------------------------------------------------------------------------------------------------------
using System;
using System.Collections.Generic;

class Program
{
    static void Main()
    {
        List<Tuple<string, int, int>> list = new List<Tuple<string, int, int>>();
        list.Add(new Tuple<string, int, int>("a", 1, 5));
        list.Add(new Tuple<string, int, int>("b", 2, 4));
        list.Add(new Tuple<string, int, int>("c", 3, 6));

        List<Tuple<string, int>> pairs = new List<Tuple<string, int>>();
        list.ForEach(x =>
        {
            pairs.Add(new Tuple<string, int>(x.Item1, x.Item2));
            pairs.Add(new Tuple<string, int>(x.Item1, x.Item3));
        });
        pairs.Sort((a, b) => a.Item2.CompareTo(b.Item2));

        foreach (var element in pairs)
        {
            Console.WriteLine(element);
        }
    }
}
------------------------------------------------------------------------------------------------------------------
// This solution was taken from another blog.
// Despite its name, the 'row' parameter holds the bitmask of occupied columns.
public static void place8Queens(int row, int ld, int rd, int lim, ref int ans)
{
    if (row == lim)
    {
        ans++;
        return;
    }
    int pos = lim & (~(row | ld | rd));
    while (pos != 0)
    {
        int p = pos & (-pos);
        pos -= p;
        place8Queens(row + p, (ld + p) << 1, (rd + p) >> 1, lim, ref ans);
    }
}

public static int solve(int n)
{
    int ans = 0;
    int lim = (1 << n) - 1; // was hardcoded to 8; use the n parameter
    place8Queens(0, 0, 0, lim, ref ans);
    return ans;
}
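The same bit trick translates almost directly into Python (a sketch; as noted, the variable the blog calls row is really the mask of occupied columns):

```python
def solve(n):
    lim = (1 << n) - 1  # n low bits set: all columns occupied means done

    def place(cols, ld, rd):
        # cols: columns taken; ld/rd: squares attacked along the two diagonals
        if cols == lim:
            return 1
        count = 0
        pos = lim & ~(cols | ld | rd)  # squares still available in this row
        while pos:
            p = pos & -pos  # lowest available square
            pos -= p
            count += place(cols | p, ((ld | p) << 1) & lim, (rd | p) >> 1)
        return count

    return place(0, 0, 0)

print(solve(8))  # 92
```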

Another approach that arrives at the same answer of 92 possible solutions (needs to be verified):
        public static int place8QueensRecursive(int rstart, int rend, int cstart, int cend, ref int[,] Board)
        {
            if (cstart > cend)
            {
                if (CheckBoardState(ref Board))
                {
                    //PrintBoard(ref Board);
                    return 1; // success
                }
                else
                    return 0;
            }
         
            int countOfPossibilities = 0;
            for (int initialr = rstart; initialr < 8; initialr++)
            {
                int initialc = cstart;
                if (Board[initialr, initialc] != 0) continue;
         
                // each Queen has to be in a row by herself and a column by herself
                Board[initialr, initialc] = 2; // marks the position of the queen
             
                // MarkUnavailable(ref Board) or inline it here ;
                for (int i = 0; i < 8; i++)
                    if (Board[i, initialc] == 0) Board[i, initialc] = 1; // unavailable
                for (int j = cstart; j < 8; j++)
                    if (Board[initialr, j] == 0) Board[initialr, j] = 1; // unavailable
                for (int k = -8; k < 8; k++)
                    if ((initialr + k) >= 0 && (initialc + k) >= cstart &&
                        (initialr + k) < 8 && (initialc + k) < 8 &&
                        Board[initialr + k, initialc + k] == 0) Board[initialr + k, initialc + k] = 1;
                for (int k = -8; k < 8; k++)
                    if ((initialr + k) >= 0 && (initialc - k) >= cstart &&
                        (initialr + k) < 8 && (initialc - k) < 8 &&
                        Board[initialr + k, initialc - k] == 0) Board[initialr + k, initialc - k] = 1;
             
             
                countOfPossibilities += place8QueensRecursive(rstart, rend, cstart + 1, cend, ref Board);

                // MarkAvailable or inline it here
                Board[initialr, initialc] = 0;
                var queenOccupiedRows = new List<int>();
                var queenOccupiedCols = new List<int>();
                for (int l = 0; l < 8; l++)
                    for (int m = 0; m < cstart; m++)
                    {
                        if (Board[m, l] == 2) { queenOccupiedRows.Add(m); queenOccupiedCols.Add(l); }
                    }
                for (int i = 0; i < 8; i++)
                {
                    if (Board[i, initialc] == 1
                        && queenOccupiedRows.Any(x => x == i) == false
                        ) Board[i, initialc] = 0; // available
                }
                for (int j = cstart; j < 8; j++)
                    if (Board[initialr, j] == 1
                        && queenOccupiedCols.Any(x => x == j) == false) Board[initialr, j] = 0; // available
                for (int k = -8; k < 8; k++)
                    if ((initialr + k) >= 0 && (initialc + k) >= cstart &&
                        (initialr + k) < 8 && (initialc + k) < 8 &&
                        Board[initialr + k, initialc + k] == 1
                        && queenOccupiedRows.Any(x => x == (initialr + k)) == false
                        && queenOccupiedCols.Any(x => x == (initialc + k)) == false
                        ) Board[initialr + k, initialc + k] = 0;
                for (int k = -8; k < 8; k++)
                    if ((initialr + k) >= 0 && (initialc - k) >= cstart &&
                        (initialr + k) < 8 && (initialc - k) < 8 &&
                        Board[initialr + k, initialc - k] == 1
                        && queenOccupiedRows.Any(x => x == (initialr + k)) == false
                        && queenOccupiedCols.Any(x => x == (initialc - k)) == false                      
                        ) Board[initialr + k, initialc - k] = 0;
            }

            return countOfPossibilities;
         
        }
        public static void Reset(int rstart, int rend, int cstart, int cend, ref int[,] Board)
        {
            for (int i = rstart; i < rend + 1; i++)
                for (int j = cstart; j < cend + 1; j++)
                    Board[i,j] = 0;
        }

        public static bool CheckBoardState(ref int[,] Board)
        {
            int numQueens = 0;
            for (int i = 0; i < 8; i++)
                for (int j = 0; j < 8; j++)
                {
                    switch(Board[i,j])
                    {
                        case 0: throw new InvalidOperationException();
                        case 1: break;
                        case 2: numQueens++;
                            break;
                        default:
                            throw new InvalidOperationException();
                    }
                }

            if (numQueens != 8)
                return false;
         
            // no row has two queens
            for (int i = 0; i < 8; i++)
            {
                int queensInARow = 0;
                for (int j = 0; j < 8; j++)
                {
                    if (Board[i, j] == 2)
                    {
                        queensInARow++;
                        if (queensInARow > 1)
                            return false;
                    }
                }
            }

            // no column has two queens
            for (int j = 0; j < 8; j++)
            {
                int queensInACol = 0;
                for (int i = 0; i < 8; i++)
                {
                    if (Board[i, j] == 2)
                    {
                        queensInACol++;
                        if (queensInACol > 1)
                            return false;
                    }
                }
            }

            // no topleft-to-rightbottom diagonal has two queens
            for (int i = 0; i < 8; i++)
            {
                int j = 0;
                int queensInLRDDiagonal = 0;
                for (int k = -8; k < 8; k++)
                {
                    if (i + k >= 0 && j + k >= 0 && i + k < 8 && j + k < 8 && Board[i + k, j + k] == 2)
                    {
                        queensInLRDDiagonal++;
                        if (queensInLRDDiagonal > 1)
                            return false;
                    }
                }
            }

            for (int j = 0; j < 8; j++)
            {
                int i = 0;
                int queensInLRDDiagonal = 0;
                for (int k = -8; k < 8; k++)
                {
                    if (i + k >= 0 && j + k >= 0 && i + k < 8 && j + k < 8 && Board[i + k, j + k] == 2)
                    {
                        queensInLRDDiagonal++;
                        if (queensInLRDDiagonal > 1)
                            return false;
                    }
                }
            }

            // no topright-to-bottomleft diagonal has two queens
            for (int j = 0; j < 8; j++)
            {
                int i = 0;
                int queensInRLDiagonal = 0;
                for (int k = -8; k < 8; k++)
                {
                    if (i + k >= 0 && j - k >= 0 && i + k < 8 && j - k < 8 && Board[i + k, j - k] == 2)
                    {
                        queensInRLDiagonal++;
                        if (queensInRLDiagonal > 1)
                            return false;
                    }
                }
            }

            for (int i = 0; i < 8; i++)
            {
                int j = 7;
                int queensInRLDiagonal = 0;
                for (int k = -8; k < 8; k++)
                {
                    if (i + k >= 0 && j - k >= 0 && i + k < 8 && j - k < 8 && Board[i + k, j - k] == 2)
                    {
                        queensInRLDiagonal++;
                        if (queensInRLDiagonal > 1)
                            return false;
                    }
                }
            }
         
            return true;
        }

        public static void PrintBoard(ref int[,] Board)
        {
            for (int i = 0; i < 8; i++)
            {
                for (int j = 0; j < 8; j++)
                {
                    Console.Write("{0} ", Board[i, j]);
                }
                Console.WriteLine();
            }
            Console.WriteLine();
        }

gives output as:
Count = 92 and a sample board (2 marks a queen, 1 an attacked square):
1 1 2 1 1 1 1 1
1 1 1 1 1 2 1 1
1 1 1 2 1 1 1 1
1 2 1 1 1 1 1 1
1 1 1 1 1 1 1 2
1 1 1 1 2 1 1 1
1 1 1 1 1 1 2 1
2 1 1 1 1 1 1 1

------------------------------------------------------------------------------------------------------------------

Given n points on a 2D plane, find the maximum number of points that lie on the same straight line
Approach 1) // Least square method ?
m = (Sum(xy) - (Sum(x)Sum(y))/n) / (Sum(x^2) - Sum(x^2)/n)
b = Avg(y) - mAvg(x)
or
Approach 2)
pair-wise slope in n ^ 2 iterations.

int pointsInALine(List<Point> points)
{
    double[,] slopes = new double[points.Count, points.Count];
    for (int i = 0; i < points.Count; i++)
    {
        for (int j = 0; j < points.Count; j++)
        {
            if (i != j)
            {
                var ydiff = (double)(points[i].y - points[j].y);
                var xdiff = (double)(points[i].x - points[j].x);
                var slope = xdiff != 0 ? ydiff / xdiff : double.PositiveInfinity;
                slopes[i, j] = slope; // was points[i,j], a typo
            }
        }
    }
    var slopeFrequency = new Dictionary<double, int>();
    for (int i = 0; i < points.Count; i++)
    {
        for (int j = 0; j < points.Count; j++)
        {
            if (i != j)
            {
                if (slopeFrequency.Keys.Contains(slopes[i, j]))
                    slopeFrequency[slopes[i, j]]++;
                else
                    slopeFrequency.Add(slopes[i, j], 1);
            }
        }
    }
    // find the slope with the highest frequency, not the frequency itself
    var maxSlope = slopeFrequency.OrderByDescending(kv => kv.Value).First().Key;
    var pointsInALine = new List<Point>();
    for (int i = 0; i < points.Count; i++)
        for (int j = 0; j < points.Count; j++)
            if (i != j && slopes[i, j] == maxSlope)
            {
                if (pointsInALine.Contains(points[i]) == false) pointsInALine.Add(points[i]);
                if (pointsInALine.Contains(points[j]) == false) pointsInALine.Add(points[j]);
            }
    // caveat: equal slopes can come from parallel lines, so this can overcount;
    // grouping slopes per anchor point is needed for a fully correct answer
    return pointsInALine.Count;
}
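Because two parallel lines share a slope, a single global slope-frequency table can overcount. Grouping slopes per anchor point fixes this; a Python sketch using exact rational slopes (assumes the input points are distinct):

```python
from collections import defaultdict
from fractions import Fraction

def max_points(points):
    # for each anchor point, bucket every other point by exact slope;
    # points sharing a slope with the same anchor are collinear with it
    best = 1 if points else 0
    for i, (x1, y1) in enumerate(points):
        slopes = defaultdict(int)
        for j, (x2, y2) in enumerate(points):
            if i == j:
                continue
            slope = Fraction(y2 - y1, x2 - x1) if x2 != x1 else None  # None marks a vertical line
            slopes[slope] += 1
        if slopes:
            best = max(best, max(slopes.values()) + 1)  # +1 counts the anchor
    return best

print(max_points([(0, 0), (1, 1), (2, 2), (3, 1)]))  # 3
```

Fraction avoids the floating-point equality problems a double-keyed dictionary would have.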
------------------------------------------------------------------------------------------------------------------
LRU

public class LRUCache {
    private List<int> keys { get; set; }
    private Dictionary<int, int> keyValues { get; set; }
    public LRUCache(int capacity) {
        keys = new List<int>(capacity);
        keyValues = new Dictionary<int, int>(capacity);
    }

    int get(int key) {
        int index = keys.IndexOf(key);
        if (index != -1)
        {
            keys.MoveToFront(index); // extension method
            return keyValues[key];
        }
        else
            throw new KeyNotFoundException();
    }

    void set(int key, int value) {
        int index = keys.IndexOf(key);
        if (index == -1)
        {
            keyValues.Add(key, value);
            if (keys.Count == keys.Capacity) {
                var item = keys.RemoveLast(); // extension method: drop least recently used
                keyValues.Remove(item);
            }
            keys.Insert(0, key); // add to front
        }
        else
            throw new InvalidOperationException();
    }
};
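In Python the same cache is a few lines with OrderedDict; a sketch that returns -1 on a miss instead of throwing, keeping the most recently used entry at the front:

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # first entry = most recently used

    def get(self, key):
        if key not in self.entries:
            return -1
        self.entries.move_to_end(key, last=False)  # touch: move to front
        return self.entries[key]

    def set(self, key, value):
        if key not in self.entries and len(self.entries) == self.capacity:
            self.entries.popitem(last=True)  # evict the least recently used
        self.entries[key] = value
        self.entries.move_to_end(key, last=False)  # newest goes to the front
```

The ordered dictionary plays the role of both the keys list and the key-value table in the C# version.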

void memmove(char* pSrc, char* pDest, size_t len)
{
 if (pDest < pSrc || pSrc + len <= pDest)
 {
  while (len--)
        *pDest++ = *pSrc++;
 }
 else // overlapping with pDest after pSrc: copy backwards to avoid clobbering
 {
  pSrc += len;
  pDest += len;
  while (len--)
        *--pDest = *--pSrc;
 }
}

A list of graph algorithms :

Minimum spanning trees:  The generic minimum spanning tree algorithm grows a minimum spanning tree from an undirected graph G = (V,E) with a weight function w by using a greedy approach to the problem. It grows the MST by adding one edge at a time that is safe for the MST. At each step of the iteration, the set of edges maintained, say A, is a subset of a minimum spanning tree. If (S, V-S) is a cut in the graph respecting A and there is a light edge (u,v) crossing the cut, then that light edge is safe for A.

Kruskal and Prim algorithm: These algorithms are elaborations of the greedy generic algorithm mentioned above. In Kruskal’s algorithm the set A is a forest. An edge that connects any two trees in the forest and has least weight is added as a safe edge.  Kruskal’s algorithm sorts the edges of E into a non-decreasing order by weight w. For each edge (u,v) belonging to E, taken in non-decreasing order by weight, if the set that u belongs to is different from the set v belongs to then the edge connecting (u,v) is added to the graph.

In Prim’s algorithm, the set A forms a single tree. A light edge crossing the cut that is safe for the tree is added. The algorithm maintains a min priority queue Q to contain all the vertices. For each vertex extracted from this queue, it adds the edge connecting vertices adjacent to the extracted vertex whose weight is minimum and updates the parent and the key.

Bellman Ford algorithm: This algorithm solves the single source shortest path problem even for edges with negative weights. It checks whether there is a negative weight cycle that is reachable from the source and returns false otherwise it returns the shortest path and their weights. The algorithm uses relaxation progressively decreasing an estimate d[v] on the weight of the shortest path s to each vertex v until it achieves the actual shortest path weight.

Dijkstra’s algorithm: This algorithm solves the single source shortest path problem for edges that have non-negative weights. It maintains a min priority queue of vertices, keyed by their d-values and it extracts each vertex from the queue, it relaxes the edges connecting to the adjacent vertices.
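A compact Python sketch of Dijkstra's algorithm, with a binary heap standing in for the min priority queue (the heap may hold stale entries, which are skipped on extraction):

```python
import heapq

def dijkstra(graph, source):
    # graph: {vertex: [(neighbor, non-negative weight), ...]}
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float('inf')):
            continue  # stale entry; a shorter path to u was already found
        for v, w in graph.get(u, []):
            nd = d + w  # relax the edge (u, v)
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

g = {'s': [('a', 1), ('b', 4)], 'a': [('b', 2), ('t', 6)], 'b': [('t', 3)]}
print(dijkstra(g, 's'))  # {'s': 0, 'a': 1, 'b': 3, 't': 6}
```

Pushing duplicates and skipping stale pops avoids needing a decrease-key operation, which heapq does not provide.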

Floyd-Warshall algorithm: This algorithm computes the shortest path weights in a bottom-up manner. It exploits the relationship between a pair of intermediary vertices and the shortest paths that pass through them. If there is no intermediary vertex, then such a path has at most one edge and the weight of the edge is the minimum. Otherwise, the minimum weight is the minimum of the path from I to j or the path from I to k and k to j. Thus this algorithm iterates for each of the intermediary vertices for each of the given input of an N*N matrix to compute the shortest path weight.

Johnson’s algorithm: This algorithm finds the shortest paths between all pairs of vertices and has better time complexity than Floyd-Warshall on sparse graphs. It uses subroutines from both the Bellman-Ford and Dijkstra algorithms to either report that there is a negative weight cycle or produce a matrix of the shortest path weights for all pairs of vertices. It uses the technique of reweighting, which works as follows: if all the edges are non-negative, it uses Dijkstra’s algorithm iteratively from each vertex. If the edges have negative weights but no negative weight cycle, it transforms the graph into one with non-negative weights and then applies the previous step.


Ford Fulkerson method: This method is based on residual networks (ones that can admit more flow), augmenting paths, and cuts. The method is iterative: it starts with f(u,v) = 0 for all (u,v) in E, giving an initial flow of value 0. At each iteration it finds an augmenting path from the source s to the sink t along which we can send more flow, and then augments the flow along this path.

Tuesday, September 16, 2014

In today's post we quickly review the Longest Common Subsequence and a hashing function.
Let c[i, j] be the length of the longest common subsequence of the prefix  of X[1..i] and Y[1..j]. Recursion:
c[i, j] = { 0 if i =  0 or j = 0
          = { c[i-1, j-1] + 1 if i,j > 0 and xi = yj
          = { max( c[i,j-1], c[i-1,j] ) otherwise
This way we reuse the solutions we have already computed. We compute the table c[i,j] bottom-up and also store b[i,j], which captures whether c[i,j-1], c[i-1,j] or c[i-1,j-1] + 1 is optimum. We then find the longest common subsequence by walking backwards along the path from the last cell of b[i,j] to the first.
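A Python sketch of the bottom-up table, with the walk-back done over c itself rather than a separate b table (ties broken toward c[i-1,j]):

```python
def lcs(x, y):
    m, n = len(x), len(y)
    # c[i][j] = length of an LCS of x[:i] and y[:j], built bottom-up
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i][j - 1], c[i - 1][j])
    # walk backwards from the last cell to recover one LCS
    out, i, j = [], m, n
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1]); i -= 1; j -= 1
        elif c[i - 1][j] >= c[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return ''.join(reversed(out))

print(len(lcs('ABCBDAB', 'BDCABA')))  # 4, the classic textbook example
```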

Hashing function
template<class T> size_t Hash<T>::operator()(const T& obj)
{
  size_t len = sizeof(obj);
  size_t res = 0;
  const char* ptr = reinterpret_cast<const char*>(&obj);
  while (len--) res = (res << 1) ^ *ptr++;
  return res;
}

void memcpy(char* src, char* dest, size_t len)
{
    while (len--) {
        *dest++ = *src++;
    }
}

void GaussianAverage(char* pSrc, char* pDest, int i, int j, int row, int col)
{
  // parameter validation
  // note: with uniform weights over the 5x5 window this is really a mean
  // (box) filter; a true Gaussian would weight nearer neighbors more
  int val = 0; // was seeded with the center pixel, double-counting it
  int numNeighbors = 0;
  for (int m = i - 2; m <= i + 2; m++)
        for (int n = j - 2; n <= j + 2; n++)
           if ( m >= 0 && n >= 0 && m < row && n < col )
           {
                     val += pSrc[m * col + n];
                     numNeighbors++;
           }
 val = val / numNeighbors; // the window includes the center pixel already
 pDest[i * col + j] = val; // m, n are out of scope here; write the center pixel
}

Monday, September 15, 2014

In today's post, we will be reviewing OpenStack. This is an opensource software stack for the cloud that runs on Linux distributions such as Ubuntu and Debian. It has the following components:
Nova for the computing fabric over commodity hardware. It's an IaaS offering which can work with virtual servers, servers, storage, load balancers, and other infrastructure. It facilitates concurrent programming, database access and messaging.
Swift for object storage. It promotes redundancy by storing objects and files on different computers.
This is a replacement for CloudFiles.
Cinder is one layer below. It is a block storage system that can work with different storage platform from vendors.
Neutron is a networking stack that features network configurations by user and app groups.
Horizon provides a GUI dashboard to automate cloud based resources.
Keystone provides an identity service for authentication and integrates with LDAP. AD integration is provided by some storage platform vendors.
Ceilometer provides all the instrumentation or telemetry for billing.
Heat provides an app management framework using REST and Query API.
Trove works with traditional databases.
Sahara provides an elastic map reduce framework that can work with Hadoop.
The OpenStack shared services consist of the Compute, Networking and Storage stacks, and these can operate on the Hypervisor and Standardized Hardware. One of the benefits of OpenStack is that you need not know what each stack is comprised of or which vendor is providing it. The APIs abstract those away and provide a reliable set to work with. They consist of the following:
Block Storage Service API In the OpenStack Swift architecture, the storage consists of three components: the object storage service, the container storage service and the account storage service. The account server process handles requests regarding metadata and the individual account. The container server process handles requests regarding container metadata or the object list. The object server process is responsible for the actual storage of objects on the drives of its node. A proxy is used to compute a hash of the storage location. An object ring is a modified consistent hashing ring that enables an account/container/object path to be mapped to partitions.
Compute API A compute worker manages computing instances on host machines and includes commands to run, terminate and reboot instances, attach/detach volumes, etc.
Identity Service API: Enables provisioning certificates for PKI, integrating with LDAP, token binding, and user CRUD with logging and monitoring.
Image Service API: Images are created small and left untouched; an image together with runtime state is used to create an instance. cinder-volume is mapped separately.
Networking API: Networking services can run on different hosts; a deployment includes a cloud controller host, a network gateway host, and a number of hypervisors hosting virtual machines.
Object Storage API: Objects are addressed by URLs of the form https://swift.example.com/v1/account/container/object
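Given that URL pattern, fetching an object boils down to a GET with an auth token. The sketch below only assembles the request; the host and token are placeholders:

```python
def object_url(endpoint, account, container, obj):
    """Compose a Swift-style object URL: endpoint/v1/account/container/object."""
    return "/".join([endpoint.rstrip("/"), "v1", account, container, obj])

url = object_url("https://swift.example.com", "account", "container", "object")
headers = {"X-Auth-Token": "<token from the identity service>"}  # placeholder token
# A GET on this URL with these headers would return the object's bytes;
# a PUT with a request body would store a new version of the object.
```

The account segment scopes the request to a tenant, which is how the proxy ties the object namespace back to the identity service.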


Sunday, September 14, 2014

    using System;
    using System.Collections.Generic;
    using System.Linq;

    class Program
    {
        static void Main(string[] args)
        {
            string numbers = "1,2,3,4,5,6,7,8,9";
            Console.WriteLine("Sum = {0}", ToNumbers(numbers, -1).Sum());
            Reverse(numbers); // prints the reversed sequence
            Console.WriteLine(ReverseString(numbers));
        }

        // Parse a comma-separated string into ints, skipping the first (start - 1) entries
        // when start points inside the list; unparseable entries become 0.
        public static List<int> ToNumbers(string commaSeparatedNumbers, int start)
        {
            if (commaSeparatedNumbers == null) return null;
            var candidates = commaSeparatedNumbers.Split(',').ToList();
            candidates.RemoveRange(0, start - 1 > 0 && start < candidates.Count ? start - 1 : 0);
            Converter<string, int> converter = s => { int result; return int.TryParse(s, out result) ? result : 0; };
            return candidates.ConvertAll(converter);
        }

        // Join the numbers into a comma-separated string; the start parameter is currently unused.
        public static string ToString(List<int> numbers, int start)
        {
            string result = String.Empty;
            numbers.ForEach(x => result += x.ToString() + ", ");
            result = result.TrimEnd(new char[] { ',', ' ' });
            return result;
        }

        // Reverse the parsed numbers in place, printing the result.
        public static List<int> Reverse(string commaSeparated)
        {
            var numbers = ToNumbers(commaSeparated, 0);
            numbers.Reverse();
            Console.WriteLine(ToString(numbers, 0));
            return numbers;
        }

        // Reverse the order of the comma-separated tokens without parsing them as numbers.
        public static string ReverseString(string commaSeparated)
        {
            var words = commaSeparated.Split(new char[] { ',', ' ' });
            var reversed = words.Aggregate((sent, next) => next + ", " + sent);
            return reversed;
        }
    }

Sum = 45
9, 8, 7, 6, 5, 4, 3, 2, 1
9, 8, 7, 6, 5, 4, 3, 2, 1