Cluster computing

A decision tree classifier builds a model that predicts a classification by following a series of yes/no branches, each based on a node's criterion. Each item is checked against the model, starting at the root, until it reaches a leaf that gives the predicted category. Decision trees are notable for being easy to read and interpret. Classifying with a decision tree is generally easy, but training one is usually trickier. The divisions are usually based on entropy, which is the amount of disorder in a set and is measured as follows:

p(i) = frequency(outcome) = count(outcome) / count(total rows)

Entropy = -(sum of p(i) * log2(p(i)) for all outcomes)

A low entropy tells us that the set is mostly homogeneous; an entropy of 0 means every row has the same outcome. To choose the dividing variable, we compute the information gain of each candidate split from the entropy as follows:

Weight1 = size of subset1 / size of original set

Weight2 = size of subset2 / size of original set

Gain = entropy(original) - Weight1 * entropy(subset1) - Weight2 * entropy(subset2)

The variable that gives the highest gain is chosen as the dividing variable, and the first node is created from it. Each resulting subset can then be divided further in the same way to form the decision tree.
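
The entropy and gain formulas above can be sketched in a few lines of C#. This is only an illustration under some assumptions: the class name DecisionTreeMath and its method names are made up for this example, outcomes are represented as plain strings, and the logarithm is taken base 2.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class, named only for this illustration.
public static class DecisionTreeMath
{
    // Entropy of a set of outcome labels: -(sum of p(i) * log2(p(i))).
    public static double Entropy(IReadOnlyCollection<string> outcomes)
    {
        if (outcomes.Count == 0) return 0.0;
        double entropy = 0.0;
        foreach (var group in outcomes.GroupBy(o => o))
        {
            // p(i) = count(outcome) / count(total rows)
            double p = (double)group.Count() / outcomes.Count;
            entropy -= p * Math.Log(p, 2);
        }
        return entropy;
    }

    // Information gain of splitting 'original' into 'subset1' and 'subset2'.
    // Assumes 'original' is not empty.
    public static double Gain(IReadOnlyCollection<string> original,
                              IReadOnlyCollection<string> subset1,
                              IReadOnlyCollection<string> subset2)
    {
        double weight1 = (double)subset1.Count / original.Count;
        double weight2 = (double)subset2.Count / original.Count;
        return Entropy(original)
             - weight1 * Entropy(subset1)
             - weight2 * Entropy(subset2);
    }
}
```

For the set { yes, yes, no, no } the entropy is 1. Splitting it into { yes, yes } and { no, no } gives a gain of 1, while splitting it into { yes, no } and { yes, no } gives a gain of 0, so the first split would be preferred.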

Neural networks can also be used as classifiers. For example, a simple neural network can be used to alter the ranking of search results based on which links users have clicked in the past. It works by assigning a number to every link, predicting that the link with the highest number is the one the user will click. The numbers are then used to change the rankings. There are many different kinds of neural networks. One example is the multilayer perceptron network, so named because it has a layer of input neurons that feeds into one or more layers of hidden neurons. The output of one layer is fed as input to the next layer. Suppose there are three nodes in a layer: if, for a given input, the first has a negative output, the last has a positive output, and the middle one has the strongest influence, the classification will follow the middle node. Usually a combination of the different layers determines the end result.
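
To make the layer-to-layer flow concrete, here is a minimal feed-forward sketch in C#. It makes several assumptions that are not in the text above: the class and method names are hypothetical, tanh is used as the activation function, the weights are supplied by the caller, and training is not shown at all.

```csharp
using System;

// Hypothetical class, named only for this illustration.
public static class TinyPerceptron
{
    // Feeds 'inputs' through one layer. weights[i][j] links input i to
    // output neuron j; each output is tanh of its weighted input sum.
    public static double[] FeedLayer(double[] inputs, double[][] weights)
    {
        int outputCount = weights[0].Length;
        var outputs = new double[outputCount];
        for (int j = 0; j < outputCount; j++)
        {
            double sum = 0.0;
            for (int i = 0; i < inputs.Length; i++)
                sum += inputs[i] * weights[i][j];
            outputs[j] = Math.Tanh(sum); // assumed activation function
        }
        return outputs;
    }

    // The output of the hidden layer is fed as input to the output layer.
    // Returns one score per link.
    public static double[] Score(double[] inputs,
                                 double[][] inputToHidden,
                                 double[][] hiddenToOutput)
    {
        double[] hidden = FeedLayer(inputs, inputToHidden);
        return FeedLayer(hidden, hiddenToOutput);
    }

    // Index of the highest score: the link predicted to be clicked.
    public static int BestLink(double[] scores)
    {
        int best = 0;
        for (int k = 1; k < scores.Length; k++)
            if (scores[k] > scores[best]) best = k;
        return best;
    }
}
```

The scores returned by Score play the role of the numbers given to each link, and BestLink picks the link with the highest number.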

Support vector machines (SVMs) are sophisticated classifiers. They build a predictive model by finding the dividing line between two categories. Several lines may separate the data, and the one usually chosen as the best is the line that is most distant from the data points nearest to it in each category. The points closest to the line are the ones that determine it and are called support vectors. Once the line is found, classifying a new item is just a matter of checking which side of the line it falls on.
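
The classification step can be sketched as follows, assuming the dividing line has already been found and is described by a weight vector w and an offset b. Finding the line (the training step) is not shown, and the class and method names here are hypothetical.

```csharp
using System;

// Hypothetical class, named only for this illustration.
public static class LinearSeparator
{
    // Value of w . x + b: positive on one side of the line, negative on the other.
    public static double Decision(double[] w, double b, double[] x)
    {
        double sum = b;
        for (int i = 0; i < w.Length; i++)
            sum += w[i] * x[i];
        return sum;
    }

    // Category +1 or -1, depending on which side of the line x falls.
    public static int Classify(double[] w, double b, double[] x)
    {
        return Decision(w, b, x) >= 0 ? 1 : -1;
    }

    // Perpendicular distance from x to the line. Among the training points,
    // those with the smallest distance are the support vectors.
    public static double DistanceToLine(double[] w, double b, double[] x)
    {
        double norm = 0.0;
        foreach (double wi in w) norm += wi * wi;
        return Math.Abs(Decision(w, b, x)) / Math.Sqrt(norm);
    }
}
```

With w = (1, -1) and b = 0 the dividing line is y = x, so the point (3, 1) falls in category +1 and the point (1, 3) in category -1.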

I take a brief break to discuss a coding exercise for Gray code.
Gray code is also known as reflected binary code: consecutive numbers differ by a single bit, and the sequence of 0s and 1s in each bit position is reflected as the count proceeds up to the given number.
To convert to Gray code, we first write the number in binary notation. Say 9 is 1001.
Label the digits d1, d2, ... dn. Start at the last digit dn: if dn-1 is 1, substitute dn with 1-dn; otherwise leave it unchanged. Then proceed to dn-1 and repeat, up to d1, which is left unchanged. The resulting number is the binary reflected Gray code. The Gray code of 9 is 1101.
The reverse conversion from a Gray code (g1, g2, ... gn) to the binary notation for a number is done by calculating
Sum(n) = (sum of gi for 1 <= i <= n-1) (mod 2)
If this computes as 1, then replace gn by 1-gn; otherwise leave it unchanged. Then compute Sum(n-1) the same way for gn-1, and so on down to g1.
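public static string GrayCode(string number)
{
    // 'number' is a string of binary digits, for example "1001".
    char[] binary = number.ToCharArray();
    // Walk from the last digit to the second; the first digit never changes.
    for (int i = binary.Length - 1; i > 0; i--)
    {
        // The digit to the left has not been modified yet, so this tests the original bit.
        if (binary[i - 1] == '1')
        {
            binary[i] = binary[i] == '1' ? '0' : '1'; // set 1-x
        }
    }
    return new string(binary);
}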
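
The reverse conversion described above can be sketched in the same style. The class and method names are made up for this example, and the input is assumed to contain only the characters '0' and '1'. Instead of recomputing Sum(n) for every digit, the sketch keeps a running parity of the Gray digits seen so far, which gives the same value.

```csharp
using System;

// Hypothetical class, named only for this illustration.
public static class GrayCodeDecoder
{
    // Converts a reflected binary Gray code string back to plain binary.
    public static string FromGrayCode(string gray)
    {
        char[] binary = gray.ToCharArray();
        int parity = 0; // (g1 + ... + g(i-1)) mod 2 for the current position
        for (int i = 0; i < binary.Length; i++)
        {
            int g = gray[i] - '0';
            // If the parity of the preceding Gray digits is 1, flip this digit.
            binary[i] = (char)('0' + (g ^ parity));
            // Fold the original Gray digit into the running parity.
            parity ^= g;
        }
        return new string(binary);
    }
}
```

For example, FromGrayCode("1101") returns "1001", which is 9 again.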

Another coding question is about buying and selling a stock for maximum profit. The trick here is that the stock must be bought before it is sold, and that this can be repeated as many times as we like; the lookahead is only over the days after the buying date.
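        public static int Profit(int[] prices)
        {
            if (prices == null || prices.Length == 0) return 0;
            int globalProfit = 0; // profit from all completed buy/sell runs
            int profit = 0;       // profit of the current rising run
            for (int i = 1; i < prices.Length; i++)
            {
                if (prices[i] > prices[i - 1])
                    profit += prices[i] - prices[i - 1]; // price still rising: keep holding
                else
                {
                    globalProfit += profit; // price fell or stayed flat: sell at the previous price
                    profit = 0;
                }
            }
            globalProfit += profit; // close out a run that lasts to the final day
            return globalProfit;
        }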

            var prices1 = new int [] { 3, 6, 9 };
            var prices2 = new int [] { 9, 6, 3 };
            var prices3 = new int [] { 3, 9, 6 };
            var prices4 = new int [] { 6, 9, 3 };
            var prices5 = new int [] { 6, 3, 9 };
            var prices6 = new int [] { 9, 3, 6 };
            var prices7 = new int [] { 9, 9, 9 };

            Console.WriteLine("expected = {0}, actual = {1}", 6, Profit(prices1));
            Console.WriteLine("expected = {0}, actual = {1}", 0, Profit(prices2));
            Console.WriteLine("expected = {0}, actual = {1}", 6, Profit(prices3));
            Console.WriteLine("expected = {0}, actual = {1}", 3, Profit(prices4));
            Console.WriteLine("expected = {0}, actual = {1}", 6, Profit(prices5));
            Console.WriteLine("expected = {0}, actual = {1}", 3, Profit(prices6));
            Console.WriteLine("expected = {0}, actual = {1}", 0, Profit(prices7));
            var prices = new int[] { 5, 4, 3, 2, 1, 8, 5, 9, 11 };
            Console.WriteLine("{0}", Profit(prices));

Output:
expected = 6, actual = 6
expected = 0, actual = 0
expected = 6, actual = 6
expected = 3, actual = 3
expected = 6, actual = 6
expected = 3, actual = 3
expected = 0, actual = 0
13

Combinations of the characters in a string:
public static void Combine(List<char> input, List<char> candidate, List<List<char>> sequences, int start)
{
    for (int i = start; i < input.Count; i++)
    {
        // Extend the current combination with the next available character.
        candidate.Add(input[i]);
        // Store a copy, since candidate keeps changing as the recursion unwinds.
        sequences.Add(new List<char>(candidate));
        Console.WriteLine("{0}", new string(candidate.ToArray()));
        // Only characters after position i may follow, so each combination appears once.
        Combine(input, candidate, sequences, i + 1);
        // Backtrack before trying the next character at this position.
        candidate.RemoveAt(candidate.Count - 1);
    }
}

Note that generating permutations is also recursive.
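
As an illustration of the recursive approach to permutations, here is a minimal sketch. The class and method names are hypothetical, and a used-flag array is just one of several ways to track which characters are already placed. If the input contains repeated characters, the same ordering will appear more than once.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class, named only for this illustration.
public static class StringPermutations
{
    // Returns every ordering of the characters in 'input'.
    public static List<string> Permute(string input)
    {
        var results = new List<string>();
        Build(input.ToCharArray(), new bool[input.Length], new List<char>(), results);
        return results;
    }

    private static void Build(char[] input, bool[] used, List<char> candidate, List<string> results)
    {
        // Every position is filled: record the finished permutation.
        if (candidate.Count == input.Length)
        {
            results.Add(new string(candidate.ToArray()));
            return;
        }
        for (int i = 0; i < input.Length; i++)
        {
            if (used[i]) continue; // this position is already in the candidate
            used[i] = true;
            candidate.Add(input[i]);
            Build(input, used, candidate, results);
            // Backtrack so the next character can take this slot.
            candidate.RemoveAt(candidate.Count - 1);
            used[i] = false;
        }
    }
}
```

Calling Permute("abc") returns the six orderings abc, acb, bac, bca, cab and cba.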

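Here is another interview question: print the first countOfPrimes prime numbers.
void PrintPrime(int countOfPrimes)
{
    int i = 2;
    int count = 0;
    while (count < countOfPrimes)
    {
        if (IsPrime(i))
        {
            Console.WriteLine("{0}", i);
            count++;
        }
        if (i == int.MaxValue) break; // stop before i++ would overflow
        i++;
    }
}

bool IsPrime(int x)
{
    if (x < 2) return false;
    // Trial division by every candidate up to the square root of x.
    for (int i = 2; i <= x / i; i++)
        if (x % i == 0) return false;
    return true;
}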

Monday, September 22, 2014
