Cluster computing

Tuesday, January 2, 2018

We resume our discussion of Monte Carlo simulation of synthetic data sets. This is a powerful technique: it generates samples that resemble the actual data set, and the more samples we generate, the closer the estimate gets to the true value. For example, when estimating the value of pi, a large number of samples brings the computed value very close to the actual value. The method assumes that if the fitted parameter set a0, obtained by minimizing chi-square, is a reasonable estimate of the true parameters, then the distribution of the differences between parameters fitted to synthetic data and a0 should be similar to the distribution of the differences between a0 and the true parameters.
The fitted parameter set a0, together with noise, can be used to generate new data sets at the same values of the independent variable as the actual data set. The synthetic data therefore lines up with the actual data at these reference points. The interesting part is that each new data set has the same relationship to a0 as the actual data has to the true parameter set.
We can generate many such data sets; they are called synthetic data sets. For each one, we fit the model and obtain the corresponding parameter set. This yields one data point: the difference between the parameter set fitted to the synthetic data and the initial parameter set a0. The simulation can be run thousands of times to generate enough data points. Together they form a probability distribution that is M-dimensional, where M is the number of parameters. We can even calculate the standard deviation of the original fitted parameters from the accumulated differences.
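The procedure described above can be sketched for the simplest case, a straight-line model y = a[0] + a[1] * x. This is only an illustration under stated assumptions: the line model, the noise level sigma, and the helper names Uniform, ApproxGaussian, FitLine and SimulateParameterSpread are all hypothetical and do not come from the original discussion. The noise is an approximate Gaussian built from 12 uniform samples so that the sketch needs nothing beyond stdlib.h. The function reports the mean squared difference from a0 for each parameter; its square root is the standard deviation mentioned above.

```c
#include <stdlib.h>

/* Uniform sample in [0, 1). */
static double Uniform(void)
{
    return (double)rand() / ((double)RAND_MAX + 1.0);
}

/* Approximately standard normal noise: the sum of 12 uniform samples has
   mean 6 and variance 1. This is an approximation, not an exact Gaussian. */
static double ApproxGaussian(void)
{
    double sum = 0.0;
    for (int i = 0; i < 12; i++)
        sum += Uniform();
    return sum - 6.0;
}

/* Ordinary least-squares fit of y = a[0] + a[1] * x.
   Assumes n >= 2 and at least two distinct x values. */
static void FitLine(const double *x, const double *y, int n, double a[2])
{
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    for (int i = 0; i < n; i++) {
        sx += x[i];
        sy += y[i];
        sxx += x[i] * x[i];
        sxy += x[i] * y[i];
    }
    a[1] = (n * sxy - sx * sy) / (n * sxx - sx * sx);
    a[0] = (sy - a[1] * sx) / n;
}

/* Generate `trials` synthetic data sets from a0 plus noise at the same x
   values, refit each one, and accumulate the squared differences from a0.
   On return, msd[k] holds the mean squared difference for parameter k. */
void SimulateParameterSpread(const double *x, int n, const double a0[2],
                             double sigma, int trials, unsigned int seed,
                             double msd[2])
{
    double sum_sq[2] = {0.0, 0.0};
    double *y = malloc((size_t)n * sizeof *y);

    msd[0] = msd[1] = 0.0;
    if (y == NULL || trials <= 0) {
        free(y);
        return;
    }

    srand(seed);
    for (int t = 0; t < trials; t++) {
        double a[2];
        /* Synthetic data set: model at a0 plus noise, same x as the real data. */
        for (int i = 0; i < n; i++)
            y[i] = a0[0] + a0[1] * x[i] + sigma * ApproxGaussian();
        FitLine(x, y, n, a);
        /* One data point per trial: the difference from a0. */
        sum_sq[0] += (a[0] - a0[0]) * (a[0] - a0[0]);
        sum_sq[1] += (a[1] - a0[1]) * (a[1] - a0[1]);
    }
    msd[0] = sum_sq[0] / trials;
    msd[1] = sum_sq[1] / trials;
    free(y);
}
```

A caller would pass the x values of the actual data set, the fitted a0, and an estimate of the measurement noise. With a few thousand trials the two results settle near the variances that least-squares theory predicts for the intercept and the slope.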
Historically, simulations were used to test a previously understood deterministic problem, and sampling was used to estimate the uncertainties in the simulations. Monte Carlo simulation inverts this approach: it tries to solve deterministic problems using a probabilistic interpretation, drawing samples from a probability distribution. Simulated annealing is a special case of Monte Carlo, which we saw in an earlier post.
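Estimating pi is a small instance of solving a deterministic problem by drawing samples. The sketch below, with the hypothetical name EstimatePi, samples points uniformly in the unit square and counts the fraction that fall inside the quarter circle of radius 1; that fraction approaches pi / 4 as the number of samples grows. It uses the C library's rand(), which is adequate for illustration but not a high-quality generator.

```c
#include <stdlib.h>

/* Estimate pi from `samples` random points in the unit square.
   Returns 0.0 if samples is not positive. */
double EstimatePi(long samples, unsigned int seed)
{
    long inside = 0;

    if (samples <= 0)
        return 0.0;

    srand(seed);
    for (long i = 0; i < samples; i++) {
        /* Uniform coordinates in [0, 1). */
        double x = (double)rand() / ((double)RAND_MAX + 1.0);
        double y = (double)rand() / ((double)RAND_MAX + 1.0);
        /* Count the points inside the quarter circle x^2 + y^2 <= 1. */
        if (x * x + y * y <= 1.0)
            inside++;
    }
    /* The quarter circle has area pi / 4, so scale the fraction by 4. */
    return 4.0 * (double)inside / (double)samples;
}
```

The error shrinks roughly with the square root of the sample count, so each extra digit of accuracy costs about a hundred times more samples.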
#codingexercise
Count the number of ways to reach the n'th stair when each move climbs 1, 2 or 3 stairs.
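int GetCount(int n)
{
    // Guard: a negative n would otherwise recurse without end.
    if (n < 0) return 0;
    switch (n)
    {
        // Base cases for stairs 0 through 3.
        case 0:
            return 0;
        case 1:
            return 1;
        case 2:
            return 2;
        case 3:
            return 4;
        default:
            // The last move climbed 3, 2 or 1 stairs.
            return GetCount(n - 3) +
                   GetCount(n - 2) +
                   GetCount(n - 1);
    }
}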
The switch above can equally be written as an if-else chain.
f(4) = f(1) + f(2) + f(3) = 1 + 2 + 4 = 7
enumerated as
1 1 1 1
1 2 1
1 1 2
2 1 1
2 2
1 3
3 1
Similarly, f(5) = f(2) + f(3) + f(4) = 2 + 4 + 7 = 13.
Note that we could also write this as a dynamic program, using either memoization or a bottom-up computation.
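A minimal sketch of the bottom-up variant follows. The name GetCountBottomUp is hypothetical, and the function keeps the same base cases as GetCount. Instead of a full table it carries only the last three values, since each new value depends on just those three. It returns long long because the counts grow quickly; even so, the result overflows for large n (somewhere past n = 70).

```c
/* Bottom-up count of the ways to reach the n'th stair with moves of
   1, 2 or 3 stairs. Runs in O(n) time and O(1) space. */
long long GetCountBottomUp(int n)
{
    if (n <= 0) return 0;
    if (n == 1) return 1;
    if (n == 2) return 2;

    /* a, b, c hold f(i-3), f(i-2), f(i-1); they start as f(1), f(2), f(3). */
    long long a = 1, b = 2, c = 4;
    for (int i = 4; i <= n; i++) {
        long long next = a + b + c;  /* f(i) = f(i-3) + f(i-2) + f(i-1) */
        a = b;
        b = c;
        c = next;
    }
    return c;
}
```

The recursive version recomputes the same subproblems many times and takes exponential time, whereas this loop visits each stair once.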
