Cluster computing

Saturday, November 25, 2017

Today we start talking about correlation instead of regression. We saw that one of the best advantage of a linear regression is the prediction with regard to time as in independent variable. When the data point have many factors contributing to their occurrence, a linear regression gives an immediate ability to predict where the next occurrence may happen. This is far easier to do than come with up a model that behaves as good fit for all the data points. It gives an indication of the trend which is generally more helpful than the data points themselves. Also a scatter plot s only changing in one dependent variable in conjunction with the independent variable. Thus lets us pick the dimension we consider to fit the linear regression independent of others. Lastly, the linear regression also gives an indication of how much the data is adhering to the trend via the estimation of errors.
We also saw how model parameters for linear regressions are computed. We saw how the best values for the model parameters can be determined from the
The correlation coefficient describes the strength of the association between two variables. If the two variables, the correlation coefficient tends to +1. If one decreases as another increases, the correlation coefficient tends to -1. If they are not related to one another, the correlation coefficient stays at zero. In addition, the correlation coefficient can be related to the results of the regression. This is helpful because we now find a correlation not between parameters but between our notions of cause and effect. This also leads us to use correlation between any x and y which are not necessarily independent and dependent variables. This follows from the fact that the correlation coefficient (denoted by r) is symmetric in x and y. This differentiates the coefficient from the regression.
Non-linear equations can also be "linearized" by selecting a suitable change of variables. This is quite popular because it makes the analysis simpler. But reducing the dimensions is prone to distortion of the error structure. It is a oversimplification of the model. It violates key assumptions and impacts the resulting parameter values. All of this contributes toward incorrect predictions and are best avoided. Non-linear squares analysis has well defined techniques that are not too difficult with computing. Therefore it is better to do non-linear square analysis when dealing with non-linear inverse models.
#codingexercise
There are 4 persons (A, B, C and D) who want to cross a bridge in night.
A takes 1 minute to cross the bridge.
B takes 2 minutes to cross the bridge.
C takes 5 minutes to cross the bridge.
D takes 8 minutes to cross the bridge.
There is only one torch with them and the bridge cannot be crossed without the torch. There cannot be more than two persons on the bridge at any time, and when two people cross the bridge together, they must move at the slower person’s pace.
Can they all cross the bridge in 15 minutes ?
Solution: A and B cross the bridge. A comes back. Time taken 3 minutes. Now B is on the other side.
C and D cross the bridge. B comes back. Time taken 8 + 2 minutes. Now C and D are on the other side.
A and B cross the bridge. Time taken is 2 minutes. All are on the other side.
Total time spent is 3 + 10 + 2 = 15 minutes.
The combination chosen works for this example by observing the tradeoff between having at least one fast member available on the right to come back and the pairing of slow folks on the left to go to the right so that they are not counted individually.
However, for a general purpose we need to try out all cases with overlapping subproblems.

Friday, November 24, 2017

We were talking about linear regression. One of the best advantage of a linear regression is the prediction with regard to time as in independent variable. When the data point have many factors contributing to their occurrence, a linear regression gives an immediate ability to predict where the next occurrence may happen. This is far easier to do than come with up a model that behaves as good fit for all the data points. It gives an indication of the trend which is generally more helpful than the data points themselves. Also a scatter plot s only changing in one dependent variable in conjunction with the independent variable. Thus lets us pick the dimension we consider to fit the linear regression independent of others. Lastly, the linear regression also gives an indication of how much the data is adhering to the trend via the estimation of errors.
Non-linear equations can also be "linearized" by selecting a suitable change of variables. This is quite popular because it makes the analysis simpler. But reducing the dimensions is prone to distortion of the error structure. It is a oversimplification of the model. It violates key assumptions and impacts the resulting parameter values. All of this contributes toward incorrect predictions and are best avoided. Non-linear squares analysis has well defined techniques that are not too difficult with computing. Therefore it is better to do non-linear square analysis when dealing with non-linear inverse models.
#codingexercise

double maximizeScalarProductWhenSwappingIsAllowed(List<int> A, List<int>B)

{

assert (A.Count == B.Count);

A.Sort();

B.Sort()

A.Reverse();

B.Reverse();

Return GetScalarProduct(A,B);

}

Double GetScalarProduct(List<int>A, List<int>B)

{

Double result = 0;

For (int I = 0; < A.Count; i++)

{

result += A[I] * B[I];

}

Return result;

}

Thursday, November 23, 2017

We continue our discussion on inverse modeling to represent the system. An inverse model is a mathematical model that fits experimental data. It aims to provide a best fit to the data. We use the least squares error minimization to fit the data points. Minimizing Chi Square requires that we evaluate the model based on the parameters. One way to do this is to find where the derivatives with regard to the parameters are zero. This results in a general set of non-linear equations.
We talk about linear regression analysis next. This kind of analysis fits a line to a scatter plot of data points. The same least squares error discussed earlier also helps center the line to the data plot which we call a good fit. The model parameters are adjusted to determine this fit by minimizing error.
To determine the best parameters for the slope and intercept of the line, we calculate the partial derivatives with respect to them and set them to zero. This yields two equations to be solved for two unknowns. The standard error of the estimate quantifies the standard deviation of the data at a given value of the independent variable. Standard error of slope and intercept can be used to place confidence intervals.

One of the best advantage of a linear regression is the prediction with regard to time as in independent variable. When the data point have many factors contributing to their occurrence, a linear regression gives an immediate ability to predict where the next occurrence may happen. This is far easier to do than come with up a model that behaves as good fit for all the data points. It gives an indication of the trend which is generally more helpful than the data points themselves. Also a scatter plot s only changing in one dependent variable in conjunction with the independent variable. Thus lets us pick the dimension we consider to fit the linear regression independent of others.
Lastly, the linear regression also gives an indication of how much the data is adhering to the trend via the estimation of errors.

A MagicValue X is defined as one that satisfies the inequation KX <= Sum from M to N of Factorial(I) x Fibonacci(I) such that X is maximum
Given, M, N, and K approximate X
static double Fib(int x)
{
if (x <= 1) return 1;
return Fib(x-1) + Fib(x-2);
}
static double Fact(int x)
{
double result = 1;
while (x > 0)
{
result *= x;
x--;
}
return result;
}
static int GetMagicValue(int N, int M, int K)
{
double P = 0;
double prev = P;
for (int k = M; k >= N; k--)
{
double fib = Fib(k);
double fact = Fact(k);
double product = fib * fact;
prev = P;
P += product;
}
double result = P/K;
return Convert.ToInt32(Math.Floor(result));
}

Wednesday, November 22, 2017

We were discussing Sierpinksi triangles earlier. An equilateral white triangle gets split into four equilateral sub-triangles and the one at the center gets colored red. This process is repeated for all available white squares in each iteration. You are given an integer m for the number of lines following and an integer n in each line following that for the number of iterations for each of which we want an answer.

What is the total number of triangles after each iteration

We said the recursive solution is as follows:

static double GetRecursive(int n)

{

if (n == 0) return 1;

return GetRecursive(n-1) + GetRecursive(n-1) + GetRecursive(n-1)

+ 1 // for the different colored sub-triangle

+ 1; // for the starting triangle

}

The same problem can be visualized as one where the previous step triangle becomes one of the three sub-triangles in the next step.
In this case, we have

double GetCountRepeated(int n)

{

double result = 1;

for (int i = 0; i < n; i++)

{

result = 3 * result

+ 1 // for inner triangle

+ 1; // for outer triangle

}

return result;

}

Tuesday, November 21, 2017

We continue our discussion on inverse modeling to represent the system. An inverse model is a mathematical model that fits experimental data. It aims to provide a best fit to the data.
There are two ways by which we can select the appropriate model. The first is by observing trend lines which correspond to some well known mathematical formula. The second is on the observation of underlying physical processes which contribute towards the system. These physical interpretations contribute to model parameters.
In order to fit the data points, a model may use least squares of errors. the errors called residuals may be both positive or negative which result in inaccurate measure. Instead the squares of the errors can be minimized to give a better fit.
We used the least squares error minimization to fit the data points. Another way to do this is using Maximum likelihood estimation. The maximum likelihood estimation attempts to resolve this concern by asking "Given my set of model parameters, what is the probability that this data set occurred ?" This translates as likelihood for the parameters given the data.
The maximum likelihood estimation and the least squares error are related. For a Gaussian distribution, it is easy to see that the probability of the data set coming from the model parameters involves minimizing the negative natural log of probability which is the chi-square function of weighted residuals.
Minimizing Chi Square requires that we evaluate the model based on the parameters. One way to do this is to find where the derivatives with regard to the parameters are zero. This results in a general set of non-linear equations. The derivatives can be computed deterministically otherwise they can be approximated numerically using finite differences.
#codingexercise
Yesterday's problem on Sierpinski triangles can also be viewed to progress by a different pattern. The result of each step becomes one of the three subpatterns in the next triangle. Consequently, we can rewrite the method to count the number of triangles after each step as:

double GetCountRepeated(int n)

{

double result = 1;

For (int i = 0; i < n; i++)

{

result = 3 * result + 1 + 1;

}

Return result;

}

Monday, November 20, 2017

An equilateral white triangle gets split into four equilateral sub-triangles and the one at the center gets colored red. This process is repeated for all available white squares in each iteration. You are given an integer m for the number of lines following and an integer n in each line following that for the number of iterations for each of which we want an answer.

What is the total number of triangles after each iteration

For example n = 1 Answer = 5

n = 2, Answer = 17

n = 3, Answer = 53 (?)

namespace TriangleCounter

{

class Program

{

static void Main(string[] args)

{

int n;

Int32.TryParse(Console.ReadLine(), out n);

for (int i = 0; i < n; i++)

{

int m;

Int32.TryParse(Console.ReadLine(), out m);

Console.WriteLine("{0}", GetTriangleCount(m));

}

static double GetTriangleCount(int m)

{

double white = 1;

double red = 0;

double result = 1;

for (int i = 0; i < m; i++)

{

red = white;

white = white * 3;

result = result + white + red;

}

return result;

}

We could also do this recursively as GetTriangleCountRecursive (n) = 3 * GetTriangleCountRecursive (n-1) + 1 + 1 and the terminating condition of n ==0 => 1

Sunday, November 19, 2017

We continue our discussion on determining a model to represent the system.The example to summarize the text made use of a neural net model. A model articulates how a system behaves quantitatively.
An inverse model is a mathematical model that fits experimental data. It aims to provide a best fit to the data.
There are two ways by which we can select the appropriate model. The first is by observing trend lines which correspond to some well known mathematical formula. The second is on the observation of underlying physical processes which contribute towards the system. These physical interpretations contribute to model parameters.
In order to fit the data points, a model may use least squares of errors. the errors called residuals may be both positive or negative which result in inaccurate measure. Instead the squares of the errors can be minimized to give a better fit.
We used the least squares error minimization to fit the data points. Another way to do this is using Maximum likelihood estimation. This derives inspiration from the irony that we seem to be asking whether our model parameters are correct. In reality, as the authors of the biomedical modeling explained, there is only one correct parameter set. This is Mother Nature since the experimental data is the only corroboration of any assumption. The experimentation results are also not the source of truth if we are subjecting it to our mode of questioning but if we let the data come in without assumptions, then they are at least one correct parameter set. A model is our imagination which we tether to data. While there may be great reward for believing our model is correct, the community specifically cautions against this. If we go by the data we know that the model will be more reliable. the other way around of fitting the model has the problem that we could be self-gratifying ourselves. There has to be a trade-off in what is right and the community and peer review of the model helps validate this effort. But in order to bring it up to them, we have to make our model speak for itself and support it with our own validations.
The maximum likelihood estimation attempts to resolve this concern by asking "Given my set of model parameters, what is the probability that this data set occurred ?" This is translates as likelihood for the parameters given the data.
With the help of this measure, a model for the system can be accepted as representative of the assumptions. The inverse modeling is therefore also called "maximum likelihood estimation". The chi-square error measure and maximum likelihood estimation have a relation between the two. For a Gaussian distribution, it is easy to see that the probability of the data set coming from the model parameters involves minimizing the negative natural log of probability which is the chi-square function of weighted residuals. Furthermore, if the variance is uniform, then the chi-square function yields the sum of squared residuals defined earlier.

#codingexercise
yesterday we talked about a helper method to evaluate expressions involving single digit numbers and without brackets.
int eval (List <char> exp)
{
int result = 0;
bool add = true;
for (int I =0; i < exp.count; i++)
{
if (exp [i] == '-') {
add = false;
continue;
} else if (exp [i] == '+'){
add = true;
continue;
} else {
if (add) result += Convert.toInt (exp [i]);
else result -= Converter.toInt (exp [i]);
add = true;
}
return result;
}