Tuesday, November 24, 2015

#codingexercise
// Requires using System; and using System.Linq;
double GetStandardDeviation(List<int> numbers)
{
    double count = numbers.Count;
    if (count <= 0) return 0;
    double sum = numbers.Sum();
    double mean = sum / count;
    double variance = 0;
    foreach (var number in numbers)
        variance += (number - mean) * (number - mean);
    // Population standard deviation: square root of the average squared deviation
    double stddev = Math.Sqrt(variance / count);
    return stddev;
}

Parallel and Concurrent programming of Independent Component Analysis:

A famous example of Independent Component Analysis is the “cocktail party problem”, where a recording of people talking simultaneously in a room is studied to separate out the underlying speech signals from the sample data. This is based on two assumptions: first, that the source signals are independent of each other, and second, that the values of each source signal have non-Gaussian distributions.

The data is represented by a random vector x = (x1, x2, …, xm)^T and the components by a random vector s = (s1, s2, …, sn)^T. The task is to transform the observed data x, using a linear static transformation W, as s = Wx, into maximally independent components s, as measured by some objective function of independence on (s1, s2, …, sn).

The data represented by x above is considered to be a sum of the independent components sk weighted by the mixing weights ai,k for each xi. This is written as x = A.s, and therefore W = A^-1.
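To make the mixing model concrete, here is a small C# sketch with a hypothetical 2x2 mixing matrix A (the matrix and sample values are made up for illustration only): the sources s are mixed as x = A.s and then recovered with W = A^-1.

using System;

class MixingDemo
{
    static void Main()
    {
        // Hypothetical 2x2 mixing matrix A (chosen for illustration only).
        double[,] A = { { 0.8, 0.3 }, { 0.2, 0.7 } };
        double[] s = { 1.0, -2.0 };                    // one sample of the independent sources

        // Observed data: x = A.s
        double[] x = { A[0, 0] * s[0] + A[0, 1] * s[1],
                       A[1, 0] * s[0] + A[1, 1] * s[1] };

        // Unmixing matrix: W = A^-1 (closed form for a 2x2 matrix).
        double det = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0];
        double[,] W = { {  A[1, 1] / det, -A[0, 1] / det },
                        { -A[1, 0] / det,  A[0, 0] / det } };

        // Recovered sources: s = W.x (prints roughly 1 and -2).
        double s0 = W[0, 0] * x[0] + W[0, 1] * x[1];
        double s1 = W[1, 0] * x[0] + W[1, 1] * x[1];
        Console.WriteLine($"recovered: {s0}, {s1}");
    }
}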

In ICA, the main goal is to compute the unmixing matrix W. The linear equation s = Wx can be solved by batch gradient descent to optimize the likelihood of W. Recall that likelihood is another term for the conditional probability of the training vector, from the Bayes definition mentioned earlier. The per-sample contribution to the update of the components can then be written as

the column vector [ 1 - 2g(wi^T xi) ] multiplied by xi^T

Here wi^T xi is the data weighted by the i-th row of the unmixing matrix, i.e., the current estimate of the i-th component.

The leading term of unity comes from the derivative of the log-likelihood, and g is the sigmoid nonlinearity used to model the distribution of the sources. The components are updated by adjusting their current values with this quantity at each step. This batch gradient descent is performed until the components stabilize, at which point they can be used as predictors.

The original algorithm calls for the processing of the full training set before each update of the components.
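As a minimal sketch of one such batch update in C# (assuming the sigmoid g(z) = 1/(1 + e^-z) as the nonlinearity and a made-up learning rate alpha; the full maximum-likelihood gradient also adds a (W^T)^-1 term per update, which is omitted here for brevity), the per-sample expression above is accumulated over the whole training set before W is adjusted:

using System;
using System.Collections.Generic;

static class IcaBatch
{
    // Sigmoid assumed to model the cumulative distribution of the sources.
    static double G(double z) => 1.0 / (1.0 + Math.Exp(-z));

    // Sum over all samples of the column vector [ 1 - 2g(wi^T x) ] times x^T.
    public static double[,] BatchSum(double[,] W, IList<double[]> samples)
    {
        int n = W.GetLength(0), m = W.GetLength(1);
        var sum = new double[n, m];
        foreach (var x in samples)
        {
            for (int i = 0; i < n; i++)
            {
                double wx = 0.0;                              // wi^T x
                for (int j = 0; j < m; j++) wx += W[i, j] * x[j];
                double c = 1.0 - 2.0 * G(wx);                 // scalar coefficient for this row
                for (int j = 0; j < m; j++) sum[i, j] += c * x[j];
            }
        }
        return sum;
    }

    // One batch update: W := W + alpha * batchSum.
    // (The full maximum-likelihood gradient also adds (W^T)^-1; not shown here.)
    public static void Update(double[,] W, double[,] batchSum, double alpha)
    {
        for (int i = 0; i < W.GetLength(0); i++)
            for (int j = 0; j < W.GetLength(1); j++)
                W[i, j] += alpha * batchSum[i, j];
    }
}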

When parallelizing over subgroups of the data, this sum is computed by each mapper and then aggregated by the reducer. The expression above can be calculated on each subgroup separately, but the reducer performs the update to the components after aggregating the partial sums.
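A sketch of that split, reusing the hypothetical IcaBatch helpers above: each mapper computes the partial sum over its own subgroup, and the reducer aggregates those partial sums before applying a single update to W.

using System.Collections.Generic;
using System.Linq;

static class IcaMapReduce
{
    public static void ParallelUpdate(double[,] W, IList<IList<double[]>> subgroups, double alpha)
    {
        // Map: each subgroup (mapper) computes its partial sum independently.
        var partials = subgroups
            .AsParallel()
            .Select(group => IcaBatch.BatchSum(W, group))
            .ToList();

        // Reduce: aggregate the partial sums into a single batch sum.
        int n = W.GetLength(0), m = W.GetLength(1);
        var total = new double[n, m];
        foreach (var p in partials)
            for (int i = 0; i < n; i++)
                for (int j = 0; j < m; j++)
                    total[i, j] += p[i, j];

        // A single update of the components, exactly as in the batch case.
        IcaBatch.Update(W, total, alpha);
    }
}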

In the modified parallelization, we retain the same procedure for every batch of new data as it becomes available. For the old data already processed, we have both the components determined so far and the summary, which is the aggregated expression above. We can use these summaries to calculate the components anew instead of recomputing the sum, because there is no notion of overlapping and non-overlapping data: all the data participates in the iterative update of the components. This is a trivial rearrangement of the subgroups of data into old and new as it becomes available, while applying the same parallelization as with partitioned subgroups of data. An alternative could be to merge the components by weights against the summary, because each component is linearly transformed into the next update.
Note that the components are batch updated, so the merging will be across all components.
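A sketch of this incremental variant, again building on the hypothetical IcaBatch helpers: the sum over old data is cached as a summary (computed with earlier values of W, as described above, so it is an approximation rather than a fresh recomputation), and only the new data's contribution is computed and merged in before the update.

using System.Collections.Generic;

class IcaIncremental
{
    private readonly double[,] W;
    private readonly double[,] summary;   // cached sum over the old data
    private readonly double alpha;

    public IcaIncremental(double[,] w, double learningRate)
    {
        W = w;
        alpha = learningRate;
        summary = new double[w.GetLength(0), w.GetLength(1)];
    }

    public void OnNewData(IList<double[]> newSamples)
    {
        // Compute the contribution of the newly arrived data only.
        var fresh = IcaBatch.BatchSum(W, newSamples);

        // Merge it into the cached summary instead of recomputing the sum
        // over the old data (the old part was computed with earlier values of W).
        for (int i = 0; i < summary.GetLength(0); i++)
            for (int j = 0; j < summary.GetLength(1); j++)
                summary[i, j] += fresh[i, j];

        // Update the components using the combined summary.
        IcaBatch.Update(W, summary, alpha);
    }
}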
