Tuesday, November 24, 2015


Modified Principal Component Analysis:

PCA can be viewed as fitting an n-dimensional ellipsoid to the data, where each axis of the ellipsoid represents a principal component. If some axis of the ellipsoid is small, then the variance of the data along that axis is also small, and omitting that axis loses only a correspondingly small amount of information.

To find the axes of the ellipsoid, we first subtract the mean of each variable to center the data around the origin. We then compute the covariance matrix of the centered data and calculate its eigenvalues and the corresponding eigenvectors. Next we orthogonalize the set of eigenvectors and normalize each one to unit length. The resulting eigenvectors become the axes of the ellipsoid fitted to the data. The proportion of the variance that each eigenvector represents can be calculated by dividing the eigenvalue corresponding to that eigenvector by the sum of all the eigenvalues. The eigenvectors are ordered so that the greatest variance under some projection of the data lies along the first coordinate, the second greatest variance along the second coordinate, and so on.
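As a rough illustration of this step, here is a small C# sketch, written for this note rather than taken from any library, that approximates the leading eigenvector of a given covariance matrix by power iteration and reports the proportion of variance it explains (its eigenvalue divided by the trace).

using System;

static class LeadingComponent
{
    // Power iteration: repeatedly multiply a unit vector by the covariance
    // matrix. For a covariance matrix (symmetric, nonnegative eigenvalues)
    // the vector converges to the eigenvector with the largest eigenvalue,
    // i.e. the first principal component, and the norm of C*v converges to
    // that eigenvalue.
    public static (double[] Axis, double Eigenvalue, double VarianceShare)
        Find(double[,] cov, int iterations = 200)
    {
        int n = cov.GetLength(0);
        var v = new double[n];
        v[0] = 1.0;
        double lambda = 0.0;
        for (int t = 0; t < iterations; t++)
        {
            var w = new double[n];
            for (int j = 0; j < n; j++)
                for (int k = 0; k < n; k++)
                    w[j] += cov[j, k] * v[k];
            double norm = 0.0;
            for (int j = 0; j < n; j++) norm += w[j] * w[j];
            norm = Math.Sqrt(norm);
            for (int j = 0; j < n; j++) v[j] = w[j] / norm;
            lambda = norm;
        }
        // The trace of the covariance matrix equals the sum of all eigenvalues,
        // so lambda / trace is the proportion of variance this axis explains.
        double trace = 0.0;
        for (int j = 0; j < n; j++) trace += cov[j, j];
        return (v, lambda, lambda / trace);
    }
}

The variance share returned here is exactly the ratio described above; the remaining components could be found the same way after deflating the matrix, or with a standard eigen-solver.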

When we parallelize this algorithm over chunks of data, each mapper contributes to the covariance matrix in summation form: the covariance equals 1/m times the sum, from i = 1 to m, of the data vector x(i) times its transpose, minus a second term formed from the mean times the mean transposed. The latter term, consisting only of means, can also be expressed as a summation (the mean is 1/m times the sum of the x(i)). Therefore the covariance matrix is entirely in summation form.

The modified algorithm adjusts these summation forms in the same way we adjusted the mean in the earlier examples: take a weighted average of the old and new summations, where the weights are the numbers of data points behind each (a sketch follows the steps below).

In other words:

For the newly available data:

   Compute the first component and the second component of the covariance matrix.

For older data:

  Adjust the covariance matrix by taking the weighted average of the components.
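A minimal C# sketch of this update, with names invented here for illustration: each chunk is reduced to raw sufficient statistics (a count, a sum of vectors, and a sum of outer products), and folding the statistics of a new chunk into the old ones is exactly the weighted-average adjustment expressed on the raw totals.

using System;

class CovarianceAccumulator
{
    public long Count;          // number of data points seen so far
    public double[] Sum;        // running sum of x
    public double[,] OuterSum;  // running sum of x * x-transposed

    public CovarianceAccumulator(int dims)
    {
        Sum = new double[dims];
        OuterSum = new double[dims, dims];
    }

    // "For the newly available data": fold one data vector into the running sums.
    public void Add(double[] x)
    {
        Count++;
        for (int j = 0; j < x.Length; j++)
        {
            Sum[j] += x[j];
            for (int k = 0; k < x.Length; k++) OuterSum[j, k] += x[j] * x[k];
        }
    }

    // "For older data": merging two accumulators is just adding their totals,
    // which is the weighted-average adjustment on the raw sums.
    public void Merge(CovarianceAccumulator other)
    {
        Count += other.Count;
        for (int j = 0; j < Sum.Length; j++)
        {
            Sum[j] += other.Sum[j];
            for (int k = 0; k < Sum.Length; k++) OuterSum[j, k] += other.OuterSum[j, k];
        }
    }

    // Covariance = (1/m) * sum(x x^T) - mean * mean^T.
    public double[,] Covariance()
    {
        int n = Sum.Length;
        var cov = new double[n, n];
        for (int j = 0; j < n; j++)
            for (int k = 0; k < n; k++)
                cov[j, k] = OuterSum[j, k] / Count - (Sum[j] / Count) * (Sum[k] / Count);
        return cov;
    }
}

Because only these totals are kept, old data never needs to be revisited; the covariance matrix can be re-derived after every chunk.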

Monday, November 23, 2015

In Naïve Bayes, the classic algorithm is as follows:
Estimate the probability of a training vector given that a condition exists (the likelihood).
Estimate the probability of a training vector given that the condition does not exist.
We also calculate the probability that the condition exists (the prior) and the probability that it does not. These act as weights for the likelihoods above.
This is applied to the entire data.

Now we modify the previous algorithm for incremental data as follows:

For new data:
Calculate as above.

For old data:
Sum the previously calculated totals for each target directly with the new ones.

This works because the denominator remains the same and the weights remain the same.
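Here is a small C# sketch of this idea for boolean features, with names invented for this note: everything is kept as raw counts, so folding in new data is plain addition, and the likelihoods and priors can be recomputed at any point.

using System;
using System.Collections.Generic;

class IncrementalNaiveBayes
{
    long positives, negatives;              // counts for condition / no condition
    Dictionary<int, long> featureGivenPos = new Dictionary<int, long>();
    Dictionary<int, long> featureGivenNeg = new Dictionary<int, long>();

    // "For new data": update the counts from one training vector.
    public void Observe(bool[] features, bool conditionExists)
    {
        if (conditionExists) positives++; else negatives++;
        var table = conditionExists ? featureGivenPos : featureGivenNeg;
        for (int j = 0; j < features.Length; j++)
            if (features[j])
                table[j] = table.TryGetValue(j, out var c) ? c + 1 : 1;
    }

    // Likelihood P(feature j | condition), with add-one smoothing.
    public double Likelihood(int j, bool conditionExists)
    {
        var table = conditionExists ? featureGivenPos : featureGivenNeg;
        long total = conditionExists ? positives : negatives;
        long count = table.TryGetValue(j, out var c) ? c : 0;
        return (count + 1.0) / (total + 2.0);
    }

    // Prior P(condition), the weight applied to the likelihoods above.
    public double Prior(bool conditionExists)
    {
        long total = positives + negatives;
        return (conditionExists ? positives : negatives) / (double)total;
    }
}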
We now discuss discriminant analysis, which uses concepts similar to Bayes. In Bayes we expressed the conditional probabilities; in discriminant analysis we use the conditional probability and the prior for each target variable. This method applies to normally distributed data only, so we calculate the mean and the standard deviation. For computing the mean incrementally as data becomes available, we can find the mean of the new data and take the weighted average of the previous and current means. For computing the standard deviation incrementally, we use the variance from the previous data and the variance from the new data, take their weighted mean, and then take the square root of the result to obtain the new standard deviation. In other words, we do the same as we did for finding the mean incrementally, because the support variables and their squares are both in summation form. We argued earlier that the summation form lends itself to parallelization as well as to incremental updates.
Let us now see how to build discriminant analysis incrementally. Thus far we have said that the mean, the standard deviation and the probability can each be calculated incrementally. Discriminant analysis can be expressed in a form with two components: the first is based on the squared difference from the mean as the numerator and the variance as the denominator, and the second is based on the natural logarithm of the prior probability. Both components rely only on parameters that can be found incrementally, so the discriminant can be recalculated at every stage.
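A compact C# sketch of the idea, under the simplifying assumption of a single feature per class and with names invented for this note: each class keeps only running sums, so its mean, variance and discriminant score are available after every chunk.

using System;

class ClassStats
{
    public long Count;
    public double Sum;        // running sum of x
    public double SumSquares; // running sum of x squared

    // Fold a new chunk of values for this class into the running sums.
    public void AddChunk(double[] values)
    {
        foreach (var v in values) { Count++; Sum += v; SumSquares += v * v; }
    }

    public double Mean => Sum / Count;
    public double Variance => SumSquares / Count - Mean * Mean;

    // The two components described above: squared difference over variance,
    // and the log of the prior, plus the usual Gaussian normalization term.
    public double Score(double x, double prior)
    {
        double variance = Variance;
        return -0.5 * Math.Log(2 * Math.PI * variance)
               - (x - Mean) * (x - Mean) / (2 * variance)
               + Math.Log(prior);
    }
}

A new data point is assigned to whichever class gives it the highest score, and the statistics behind the score are exactly the incrementally maintained sums.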







Sunday, November 22, 2015

Today we review a book called "Strategy Rules" by Yoffie and Cusumano.
Yoffie is a Harvard Business School professor and Cusumano is an MIT professor.
They offer a new take on three giants of entrepreneurship and technology - Bill Gates, Andy Grove and Steve Jobs - by bringing them together into one how-to guide on strategy.
According to the authors, the three men differ widely in their personalities but followed the same five rules for strategy and execution.
1. Look forward, reason back. The first rule was to look forward into the future and reason back to the actions required today. They imagined what the world could be and made a tremendous impact on it. All three had the ability to determine in detail what needed to happen immediately to turn vision into reality.
2. Make big bets, without betting the company. Gates, Grove and Jobs were all bold leaders but not reckless ones. They knew how to time or diversify their big bets so that even a huge strategic bet gone wrong would not sink the company.
3. Build platforms AND ecosystems.  Another important rule, the authors write was to build platforms and ecosystems as opposed to pursuing a product strategy. This we know from our review of "The Responsible Entrepreneur" is not just game changing but self-sustaining. As an example,  Gates chose not to sell his product DOS to IBM that requested it but retained the right to license the system to other companies. The rest is history.
4. Exploit leverage and power - According to the authors, all three men could play both judo and sumo. Judo means using an opponent's strength and turning it into a weakness. As an example, Jobs successfully negotiated with the music companies for a license to their music. Apple was tiny and its market share was only about 2%, yet the music companies made the agreement with Apple. This laid the foundation of the iTunes revolution. At the same time, the three did not hesitate to use their power.
5. Shape the company around your personal anchor - Each had his own tastes, strengths and interests - Gates the coding genius, Grove the precise engineer, and Jobs the wizard of design. The companies they built reflected these strengths. At their peak, Microsoft, Apple and Intel were together worth about $1.5 trillion, and these companies remain the talk of the world.












Friday, November 20, 2015

Continuing our discussion of applying the parallelization technique to more machine learning algorithms, let us look at a few more. The technique enables incremental processing of more and more data as it becomes available.
The summation form lends itself to this technique in the examples we saw because the computations of the algorithm do not change: even when the data cannot all be divided and computed in parallel at once, it is effectively parallelized and computed the same way as the data becomes available one chunk after another.
In these cases we reuse the previous computation over the old data and never have to revisit that data. We already maintain summarized information, which we simply adjust based on the new data; the summary of the first data set is just the result of the computation on that first data set.
Now we can generalize this technique to apply to all summation forms.
In the summation form, the data is already chunked into subgroups and the computation is performed per chunk; the reducer merely aggregates the results. Because the chunks do not overlap, aggregation is straightforward. In a streaming mode the same computations can be done as and when each non-overlapping chunk of data arrives. The summaries from the chunks then need to be combined, and since they represent non-overlapping data, each summary contributes independently to the overall picture; the reducer merely smooths the aggregated results.
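A tiny C# sketch of the general pattern, with names invented here: each chunk is reduced to a small summary (just a count and a sum), and non-overlapping summaries combine by simple addition, after which the statistic of interest (here the mean) can be read off at any time.

using System;
using System.Collections.Generic;

static class SummationForm
{
    // The per-chunk summary for a mean: a count and a running sum.
    public struct Summary
    {
        public long Count;
        public double Sum;
    }

    // "Mapper": summarize one chunk of data.
    public static Summary Summarize(IEnumerable<double> chunk)
    {
        var s = new Summary();
        foreach (var x in chunk) { s.Count++; s.Sum += x; }
        return s;
    }

    // "Reducer": non-overlapping summaries combine by addition, whether the
    // chunks were processed in parallel or arrived one after another.
    public static Summary Combine(Summary a, Summary b)
        => new Summary { Count = a.Count + b.Count, Sum = a.Sum + b.Sum };

    public static double Mean(Summary s) => s.Sum / s.Count;
}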
Now let us take another example: k-means. Here the Euclidean distance between the data points and a centroid can be calculated on every new dataset as it arrives, in just the same way as for any subgroup or for the entire dataset. Moreover, combining centroids is also possible, because the data points that are close to either centroid will be roughly equidistant from the new midpoint between the previous and the current centroid of the same cluster. Thus updating the previous centroid with the new data is also straightforward. Recall that the original k-means has an iteration step that assigns the data points to the nearest cluster. With the technique above we are already prepared for iteration, and in fact we do it incrementally: while the original iterations refined the centroids by reassigning points, we refine them with newer data, getting better answers at each stage and eliminating reprocessing over the entire data.
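As a sketch in C#, with names invented here: the centroid update is a weighted average of the old centroid and the mean of the newly assigned points, weighted by the number of points behind each, so old points never need to be revisited.

using System;

static class KMeansUpdate
{
    // Update one centroid from a chunk of new points assigned to its cluster.
    public static double[] UpdateCentroid(double[] centroid, long oldCount,
                                          double[][] newPoints)
    {
        var updated = new double[centroid.Length];
        long newCount = newPoints.Length;
        for (int j = 0; j < centroid.Length; j++)
        {
            double newSum = 0;
            foreach (var p in newPoints) newSum += p[j];
            // Weighted average of the old centroid and the new chunk,
            // weighted by the number of points behind each.
            updated[j] = (centroid[j] * oldCount + newSum) / (oldCount + newCount);
        }
        return updated;
    }

    // Squared Euclidean distance used to assign a point to its nearest centroid.
    public static double Distance(double[] a, double[] b)
    {
        double d = 0;
        for (int j = 0; j < a.Length; j++) d += (a[j] - b[j]) * (a[j] - b[j]);
        return d;
    }
}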

Thursday, November 19, 2015

#coding interview question
1) Write a program for run-length encoding
e.g. AAABBCCCCC
is encoded as 3A2B5C
using System.Text;

static string RLE(string input)
{
    if (string.IsNullOrEmpty(input)) return input;
    int count = 0;
    char prevChar = '\0';
    var output = new StringBuilder();
    for (int i = 0; i < input.Length; i++)
    {
        if (input[i] != prevChar)
        {
            // A new run begins: flush the previous run's count and character.
            if (prevChar != '\0')
            {
                output.Append(count);
                output.Append(prevChar);
            }
            count = 1;
            prevChar = input[i];
        }
        else
        {
            count += 1;
        }
    }
    // Flush the final run.
    output.Append(count);
    output.Append(prevChar);
    return output.ToString();
}

2) An in-place encoding is as follows:
<work-in-progress>
// In-place version: the buffer is encoded by overwriting it from the front.
// Each run is read in full before it is overwritten, and when an encoded
// token (e.g. "1A") is longer than the run it replaces, padding is inserted
// so that unread characters are never clobbered.
static void RLE(StringBuilder input)
{
    if (input == null || input.Length == 0) return;
    int r = 0; // index of the next unread character
    int w = 0; // index where the next encoded token is written
    while (r < input.Length)
    {
        // Read the current run: its character and length.
        char c = input[r];
        int count = 1;
        while (r + count < input.Length && input[r + count] == c) count++;
        string token = count.ToString() + c;   // e.g. "3A"
        int unreadStart = r + count;           // first index not yet consumed
        if (w + token.Length > unreadStart)
        {
            // The token is longer than the run it replaces (short runs):
            // insert padding so the write does not overrun unread input.
            int pad = w + token.Length - unreadStart;
            input.Insert(unreadStart, new string(' ', pad));
            unreadStart += pad;
        }
        for (int j = 0; j < token.Length; j++)
            input[w + j] = token[j];
        w += token.Length;
        r = unreadStart;
    }
    input.Length = w; // trim any leftover characters from the original content
}