Cluster computing

Thursday, November 23, 2017

We continue our discussion on inverse modeling to represent the system. An inverse model is a mathematical model that fits experimental data. It aims to provide a best fit to the data. We use the least squares error minimization to fit the data points. Minimizing Chi Square requires that we evaluate the model based on the parameters. One way to do this is to find where the derivatives with regard to the parameters are zero. This results in a general set of non-linear equations.
We talk about linear regression analysis next. This kind of analysis fits a line to a scatter plot of data points. The same least squares error discussed earlier also helps center the line to the data plot which we call a good fit. The model parameters are adjusted to determine this fit by minimizing error.
To determine the best parameters for the slope and intercept of the line, we calculate the partial derivatives with respect to them and set them to zero. This yields two equations to be solved for two unknowns. The standard error of the estimate quantifies the standard deviation of the data at a given value of the independent variable. Standard error of slope and intercept can be used to place confidence intervals.

One of the best advantage of a linear regression is the prediction with regard to time as in independent variable. When the data point have many factors contributing to their occurrence, a linear regression gives an immediate ability to predict where the next occurrence may happen. This is far easier to do than come with up a model that behaves as good fit for all the data points. It gives an indication of the trend which is generally more helpful than the data points themselves. Also a scatter plot s only changing in one dependent variable in conjunction with the independent variable. Thus lets us pick the dimension we consider to fit the linear regression independent of others.
Lastly, the linear regression also gives an indication of how much the data is adhering to the trend via the estimation of errors.

A MagicValue X is defined as one that satisfies the inequation KX <= Sum from M to N of Factorial(I) x Fibonacci(I) such that X is maximum
Given, M, N, and K approximate X
static double Fib(int x)
{
if (x <= 1) return 1;
return Fib(x-1) + Fib(x-2);
}
static double Fact(int x)
{
double result = 1;
while (x > 0)
{
result *= x;
x--;
}
return result;
}
static int GetMagicValue(int N, int M, int K)
{
double P = 0;
double prev = P;
for (int k = M; k >= N; k--)
{
double fib = Fib(k);
double fact = Fact(k);
double product = fib * fact;
prev = P;
P += product;
}
double result = P/K;
return Convert.ToInt32(Math.Floor(result));
}

Wednesday, November 22, 2017

We were discussing Sierpinksi triangles earlier. An equilateral white triangle gets split into four equilateral sub-triangles and the one at the center gets colored red. This process is repeated for all available white squares in each iteration. You are given an integer m for the number of lines following and an integer n in each line following that for the number of iterations for each of which we want an answer.

What is the total number of triangles after each iteration

We said the recursive solution is as follows:

static double GetRecursive(int n)

{

if (n == 0) return 1;

return GetRecursive(n-1) + GetRecursive(n-1) + GetRecursive(n-1)

+ 1 // for the different colored sub-triangle

+ 1; // for the starting triangle

}

The same problem can be visualized as one where the previous step triangle becomes one of the three sub-triangles in the next step.
In this case, we have

double GetCountRepeated(int n)

{

double result = 1;

for (int i = 0; i < n; i++)

{

result = 3 * result

+ 1 // for inner triangle

+ 1; // for outer triangle

}

return result;

}

Tuesday, November 21, 2017

We continue our discussion on inverse modeling to represent the system. An inverse model is a mathematical model that fits experimental data. It aims to provide a best fit to the data.
There are two ways by which we can select the appropriate model. The first is by observing trend lines which correspond to some well known mathematical formula. The second is on the observation of underlying physical processes which contribute towards the system. These physical interpretations contribute to model parameters.
In order to fit the data points, a model may use least squares of errors. the errors called residuals may be both positive or negative which result in inaccurate measure. Instead the squares of the errors can be minimized to give a better fit.
We used the least squares error minimization to fit the data points. Another way to do this is using Maximum likelihood estimation. The maximum likelihood estimation attempts to resolve this concern by asking "Given my set of model parameters, what is the probability that this data set occurred ?" This translates as likelihood for the parameters given the data.
The maximum likelihood estimation and the least squares error are related. For a Gaussian distribution, it is easy to see that the probability of the data set coming from the model parameters involves minimizing the negative natural log of probability which is the chi-square function of weighted residuals.
Minimizing Chi Square requires that we evaluate the model based on the parameters. One way to do this is to find where the derivatives with regard to the parameters are zero. This results in a general set of non-linear equations. The derivatives can be computed deterministically otherwise they can be approximated numerically using finite differences.
#codingexercise
Yesterday's problem on Sierpinski triangles can also be viewed to progress by a different pattern. The result of each step becomes one of the three subpatterns in the next triangle. Consequently, we can rewrite the method to count the number of triangles after each step as:

double GetCountRepeated(int n)

{

double result = 1;

For (int i = 0; i < n; i++)

{

result = 3 * result + 1 + 1;

}

Return result;

}

Monday, November 20, 2017

An equilateral white triangle gets split into four equilateral sub-triangles and the one at the center gets colored red. This process is repeated for all available white squares in each iteration. You are given an integer m for the number of lines following and an integer n in each line following that for the number of iterations for each of which we want an answer.

What is the total number of triangles after each iteration

For example n = 1 Answer = 5

n = 2, Answer = 17

n = 3, Answer = 53 (?)

namespace TriangleCounter

{

class Program

{

static void Main(string[] args)

{

int n;

Int32.TryParse(Console.ReadLine(), out n);

for (int i = 0; i < n; i++)

{

int m;

Int32.TryParse(Console.ReadLine(), out m);

Console.WriteLine("{0}", GetTriangleCount(m));

}

static double GetTriangleCount(int m)

{

double white = 1;

double red = 0;

double result = 1;

for (int i = 0; i < m; i++)

{

red = white;

white = white * 3;

result = result + white + red;

}

return result;

}

We could also do this recursively as GetTriangleCountRecursive (n) = 3 * GetTriangleCountRecursive (n-1) + 1 + 1 and the terminating condition of n ==0 => 1

Sunday, November 19, 2017

We continue our discussion on determining a model to represent the system.The example to summarize the text made use of a neural net model. A model articulates how a system behaves quantitatively.
An inverse model is a mathematical model that fits experimental data. It aims to provide a best fit to the data.
There are two ways by which we can select the appropriate model. The first is by observing trend lines which correspond to some well known mathematical formula. The second is on the observation of underlying physical processes which contribute towards the system. These physical interpretations contribute to model parameters.
In order to fit the data points, a model may use least squares of errors. the errors called residuals may be both positive or negative which result in inaccurate measure. Instead the squares of the errors can be minimized to give a better fit.
We used the least squares error minimization to fit the data points. Another way to do this is using Maximum likelihood estimation. This derives inspiration from the irony that we seem to be asking whether our model parameters are correct. In reality, as the authors of the biomedical modeling explained, there is only one correct parameter set. This is Mother Nature since the experimental data is the only corroboration of any assumption. The experimentation results are also not the source of truth if we are subjecting it to our mode of questioning but if we let the data come in without assumptions, then they are at least one correct parameter set. A model is our imagination which we tether to data. While there may be great reward for believing our model is correct, the community specifically cautions against this. If we go by the data we know that the model will be more reliable. the other way around of fitting the model has the problem that we could be self-gratifying ourselves. There has to be a trade-off in what is right and the community and peer review of the model helps validate this effort. But in order to bring it up to them, we have to make our model speak for itself and support it with our own validations.
The maximum likelihood estimation attempts to resolve this concern by asking "Given my set of model parameters, what is the probability that this data set occurred ?" This is translates as likelihood for the parameters given the data.
With the help of this measure, a model for the system can be accepted as representative of the assumptions. The inverse modeling is therefore also called "maximum likelihood estimation". The chi-square error measure and maximum likelihood estimation have a relation between the two. For a Gaussian distribution, it is easy to see that the probability of the data set coming from the model parameters involves minimizing the negative natural log of probability which is the chi-square function of weighted residuals. Furthermore, if the variance is uniform, then the chi-square function yields the sum of squared residuals defined earlier.

#codingexercise
yesterday we talked about a helper method to evaluate expressions involving single digit numbers and without brackets.
int eval (List <char> exp)
{
int result = 0;
bool add = true;
for (int I =0; i < exp.count; i++)
{
if (exp [i] == '-') {
add = false;
continue;
} else if (exp [i] == '+'){
add = true;
continue;
} else {
if (add) result += Convert.toInt (exp [i]);
else result -= Converter.toInt (exp [i]);
add = true;
}
return result;
}

Saturday, November 18, 2017

A unix-shell command to summarize text:

Programs in UNIX are founded on a principle – to do one thing and do that thing very well. The idea is that complex tasks can be easily composed from granular, well-defined and easy to use building blocks. Tools that process text on UNIX systems are limited to search, slice and dice, find and replace or differencing commands. These scan the text and perform their operations in stream like manner so that the output of a processor can result in the input of another processor. These don’t modify the original text and merely produce an output text or stream.

While usage of command line options is less popular than a user interface say from a browser in general, there is still a lot of value in the commands because they come in helpful to groups such as system administrators. Consequently, writing a program to do text processing jobs as tokenizing, stemming or summarizing text can be just as helpful. Imagine being able to read a text file with the contents equivalent to a novel and print a summary by extracting the sentences that together bring out the most of it.

Programs in UNIX are generally written in C language and are portable on almost all flavors of such systems. Text processing commands can also be written in C. There are more packages and helpers available in higher level languages such as Python which result in smaller or succinct code but it became possible to write such program on Unix using Word2Net. A text summarization command for Unix is made possible with the following steps:

Convert text to word vectors using word2net command

Generate k-means classification of the word vectors and find their centroids :

https://github.com/ravibeta/cexamples/blob/master/classifier.c

https://github.com/ravibeta/cexamples/blob/master/classifiertest.c

Generate summary using the centroids.

This is optional but it would be helpful to keep track of the position of the occurrence of the centroid as the collocation data is used to generate the word vectors so that the most contributing positions of the determined vector can be used to select sentences for the summary. Previously the selection of sentences was heavier in logic and occurred in step 3 as the words closest to the centroid were used to determine the location of the sentences to be selected but here we propose that the best occurrence of the centroid alone can be used to pick the sentences.

Packaging the program involves using conventional means to package shell scripts. A variety of utilities such as bpkg, jean or sparrow can be used. We can also make it available to install from repositories as we usually do on ubuntu by publishing the repository. These commands however require a tremendous of desktop diskspace because they require to download a corpus of text data that is usually very large in itself and available separately. A processed word vector file from a known corpus may alternatively be shipped with the script and while this may take disk space it usually bootstraps and deploys the command for immediate use.

Conclusion: Writing a text processor such as for text summarization is a viable option on UNIX flavor systems.

#codingexercise

Check if two expressions are same:

For example:

3 - (2-1) and 3-2+1

The expressions only differ in the presence of brackets
solution: one approach is to evaluate the expression using a stack. When we encounter a closing paranthesis, we pop all elements iand their operators and the paranthesis, evaluate it and push the result back on the stack we use a helper method eval that can evaluate expressions without brackets by performing the operations in sequence
bool AreEqual(List<char> exp1, List<char> exp2)
{
var s1 = new Stack<char>();
for(int i = 0; i < exp1.Count; i++)
{
if (exp1[i] == ")"){
var ret = new List<int>();
while (s1.empty() == false && s1.Peek() != "(")
ret.Add(s1.pop());
if (s1.empty()) throw new InvalidExpressionException();
s1.Pop();
s1.Push(eval(ret.reverse());
} else {
s1.Push(exp1[i]);
}
}
if (eval(s1.ToList()) == eval(exp2))
return true;
return false;
}

Friday, November 17, 2017

We continue our discussion on determining a model to represent the system. A model articulates how a system behaves quantitatively.
An inverse model is a mathematical model that fits experimental data. It aims to provide a best fit to the data.
There are two ways by which we can select the appropriate model. The first is by observing trend lines which correspond to some well known mathematical formula. The second is on the observation of underlying physical processes which contribute towards the system. These physical interpretations contribute to model parameters.
In order to fit the data points, a model may use least squares of errors. the errors called residuals may be both positive or negative which result in inaccurate measure. Instead the squares of the errors can be minimized to give a better fit.
#codingexercise
Find next greater number using the same digits as the given number. If no other number is possible return the original
This is an alternative to the technique discussed earlier.
Int GetNextHigher(int n)
{
Var digits = Integer.ToDigits(n);
For (int I = n+1; I < INT_MAX; I++)
{
Var newdigits = Integer.ToDigits(I);
If (digits.ToHashTable().Equals(newdigits.ToHashTable()))
Return I;
}
Return –1;
}
There are two things to observe here
1) we don't need to compare the hash table for each increment. if we have a slot array for digits 0 to 9, we only consider numbers, we quickly discard numbers that increment digits not in the original number
2) the range from the number to its next higher is very small.