Cluster computing

Tuesday, November 21, 2017

We continue our discussion on inverse modeling to represent the system. An inverse model is a mathematical model that fits experimental data. It aims to provide a best fit to the data.
There are two ways to select an appropriate model. The first is to look for trend lines in the data that correspond to some well-known mathematical formula. The second is to observe the underlying physical processes that contribute to the system. These physical interpretations then supply the model parameters.
To fit the data points, a model may use the least squares of errors. The errors, called residuals, may be positive or negative, so simply adding them up lets them cancel and gives an inaccurate measure of the fit. Instead, the squares of the errors can be minimized to give a better fit.
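To make the cancellation concrete, here is a minimal sketch for the simplest case, a straight line. The class name LineFit and its method names are illustrative assumptions, not something from the original notes.

```csharp
using System;

public static class LineFit
{
    // Closed-form least-squares fit of the straight line y = slope * x + intercept.
    public static (double Slope, double Intercept) Fit(double[] x, double[] y)
    {
        if (x.Length != y.Length || x.Length < 2)
            throw new ArgumentException("need at least two (x, y) pairs of equal length");
        int n = x.Length;
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (int i = 0; i < n; i++)
        {
            sx += x[i];
            sy += y[i];
            sxx += x[i] * x[i];
            sxy += x[i] * y[i];
        }
        double denom = n * sxx - sx * sx;
        if (denom == 0)
            throw new ArgumentException("x values must not all be equal");
        double slope = (n * sxy - sx * sy) / denom;
        double intercept = (sy - slope * sx) / n;
        return (slope, intercept);
    }

    // Sum of signed residuals: positive and negative errors can cancel.
    public static double SumOfResiduals(double[] x, double[] y, double slope, double intercept)
    {
        double sum = 0;
        for (int i = 0; i < x.Length; i++)
            sum += y[i] - (slope * x[i] + intercept);
        return sum;
    }

    // Sum of squared residuals: every error contributes a non-negative amount.
    public static double SumOfSquaredResiduals(double[] x, double[] y, double slope, double intercept)
    {
        double sum = 0;
        for (int i = 0; i < x.Length; i++)
        {
            double r = y[i] - (slope * x[i] + intercept);
            sum += r * r;
        }
        return sum;
    }
}
```

For the points (0,0), (1,2), (2,4), (3,6), the flat line y = 3 has residuals -3, -1, 1 and 3, which sum to zero even though the line is a poor fit. Its sum of squared residuals is 20, while Fit recovers slope 2 and intercept 0 with a sum of squares of zero.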
We used least squares error minimization to fit the data points. Another way to do this is maximum likelihood estimation. Maximum likelihood estimation asks, "Given my set of model parameters, what is the probability that this data set occurred?" This translates to the likelihood of the parameters given the data.
Maximum likelihood estimation and least squares error are related. For Gaussian-distributed measurement errors, maximizing the probability of the data set given the model parameters is the same as minimizing the negative natural log of that probability, which, up to a constant, is half the chi-square function of weighted residuals.
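The relation can be written out in a few lines. The symbols here are assumptions chosen for illustration: y_i are the measurements, f(x_i; \theta) is the model prediction for parameters \theta, and \sigma_i is the standard deviation of the i-th measurement, with independent Gaussian errors.

```latex
P(D \mid \theta) = \prod_{i=1}^{N} \frac{1}{\sigma_i\sqrt{2\pi}}
    \exp\!\left(-\frac{\bigl(y_i - f(x_i;\theta)\bigr)^2}{2\sigma_i^2}\right)

-\ln P(D \mid \theta)
    = \frac{1}{2}\sum_{i=1}^{N}\left(\frac{y_i - f(x_i;\theta)}{\sigma_i}\right)^2
      + \sum_{i=1}^{N}\ln\!\bigl(\sigma_i\sqrt{2\pi}\bigr)
    = \frac{1}{2}\,\chi^2(\theta) + \text{const}

\sigma_i = \sigma \ \text{for all } i
    \quad\Longrightarrow\quad
    \chi^2(\theta) = \frac{1}{\sigma^2}\sum_{i=1}^{N}\bigl(y_i - f(x_i;\theta)\bigr)^2
```

The constant does not depend on \theta, so the parameters that maximize the likelihood are the ones that minimize chi-square, and with uniform variance they are also the ones that minimize the plain sum of squared residuals.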
Minimizing chi-square requires that we evaluate the model for trial values of the parameters. One way to find the minimum is to find where the derivatives of chi-square with respect to the parameters are zero. This results in a set of equations that is, in general, non-linear. The derivatives can be computed analytically; otherwise, they can be approximated numerically using finite differences.
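As a rough sketch of the numerical route, the following approximates the derivatives with central differences. The class name FiniteDifference, the step size h and the straight-line ChiSquare helper with unit weights are all assumptions made for the example.

```csharp
using System;

public static class FiniteDifference
{
    // Central-difference approximation of the gradient of f at the point p.
    public static double[] Gradient(Func<double[], double> f, double[] p, double h = 1e-6)
    {
        var grad = new double[p.Length];
        var work = (double[])p.Clone();
        for (int k = 0; k < p.Length; k++)
        {
            double saved = work[k];
            work[k] = saved + h;
            double up = f(work);
            work[k] = saved - h;
            double down = f(work);
            work[k] = saved; // restore before moving to the next parameter
            grad[k] = (up - down) / (2 * h);
        }
        return grad;
    }

    // Chi-square with unit weights for the line y = p[0] * x + p[1].
    public static double ChiSquare(double[] p, double[] x, double[] y)
    {
        double sum = 0;
        for (int i = 0; i < x.Length; i++)
        {
            double r = y[i] - (p[0] * x[i] + p[1]);
            sum += r * r;
        }
        return sum;
    }
}
```

For the points (0,0), (1,2), (2,4), (3,6), the gradient is close to zero at slope 2 and intercept 0, which is the minimum. At slope 0 and intercept 3 it is close to (-20, 0), which says the slope should be increased.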
#codingexercise
Yesterday's problem on Sierpinski triangles can also be viewed as progressing by a different pattern. The result of each step becomes one of the three sub-patterns in the next triangle, to which we add the new red center triangle and the enclosing outer triangle. Consequently, we can rewrite the method to count the number of triangles after each step as:

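double GetCountRepeated(int n)
{
    double result = 1; // the single triangle before any iteration
    for (int i = 0; i < n; i++)
    {
        // three copies of the previous pattern, plus the red center, plus the outer triangle
        result = 3 * result + 1 + 1;
    }
    return result;
}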
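The recurrence used in the loop also has a closed form, which gives a quick way to check the answers. Writing T_n for the count after n iterations (a symbol introduced here for illustration), adding 1 to both sides turns the recurrence into a geometric sequence:

```latex
T_0 = 1, \qquad T_n = 3\,T_{n-1} + 2

T_n + 1 = 3\,(T_{n-1} + 1) = 3^n\,(T_0 + 1) = 2 \cdot 3^n

T_n = 2 \cdot 3^n - 1
```

This gives 5, 17 and 53 for n = 1, 2 and 3.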

Monday, November 20, 2017

An equilateral white triangle gets split into four equilateral sub-triangles and the one at the center gets colored red. This process is repeated for all available white triangles in each iteration. You are given an integer m for the number of lines that follow, and each of those lines holds an integer n, the number of iterations for which we want an answer.

What is the total number of triangles after each iteration?

For example: n = 1, Answer = 5

n = 2, Answer = 17

n = 3, Answer = 53

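using System;

namespace TriangleCounter
{
    class Program
    {
        static void Main(string[] args)
        {
            // first line: how many queries follow
            int n;
            Int32.TryParse(Console.ReadLine(), out n);
            for (int i = 0; i < n; i++)
            {
                // each following line: the number of iterations for that query
                int m;
                Int32.TryParse(Console.ReadLine(), out m);
                Console.WriteLine("{0}", GetTriangleCount(m));
            }
        }

        static double GetTriangleCount(int m)
        {
            double white = 1;  // white triangles available for splitting
            double red = 0;    // red triangles added in the current iteration
            double result = 1; // running total, starting with the original triangle
            for (int i = 0; i < m; i++)
            {
                red = white;       // every white triangle gets one red center
                white = white * 3; // and three new white corners
                result = result + white + red;
            }
            return result;
        }
    }
}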

We could also do this recursively as GetTriangleCountRecursive(n) = 3 * GetTriangleCountRecursive(n-1) + 1 + 1, with the terminating condition that n == 0 returns 1.
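A minimal sketch of that recursive form follows. The wrapper class SierpinskiCount and the long return type are choices made here for the example, so counts stay exact for moderate n.

```csharp
using System;

public static class SierpinskiCount
{
    // Total number of triangles after n iterations, following the recurrence
    // count(n) = 3 * count(n - 1) + 1 + 1 with count(0) = 1.
    public static long GetTriangleCountRecursive(int n)
    {
        if (n < 0)
            throw new ArgumentOutOfRangeException(nameof(n));
        if (n == 0)
            return 1; // only the original triangle
        return 3 * GetTriangleCountRecursive(n - 1) + 1 + 1;
    }
}
```

Calling GetTriangleCountRecursive with 1, 2 and 3 returns 5, 17 and 53, matching the iterative version. The recursion depth equals n, so this is only meant for small inputs.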

Sunday, November 19, 2017

We continue our discussion on determining a model to represent the system. The example to summarize the text made use of a neural net model. A model articulates how a system behaves quantitatively.
An inverse model is a mathematical model that fits experimental data. It aims to provide a best fit to the data.
There are two ways to select an appropriate model. The first is to look for trend lines in the data that correspond to some well-known mathematical formula. The second is to observe the underlying physical processes that contribute to the system. These physical interpretations then supply the model parameters.
To fit the data points, a model may use the least squares of errors. The errors, called residuals, may be positive or negative, so simply adding them up lets them cancel and gives an inaccurate measure of the fit. Instead, the squares of the errors can be minimized to give a better fit.
We used least squares error minimization to fit the data points. Another way to do this is maximum likelihood estimation. It takes its inspiration from the irony that we seem to be asking whether our model parameters are correct. In reality, as the authors of the biomedical modeling explained, there is only one correct parameter set. This is Mother Nature, since the experimental data is the only corroboration of any assumption. The experimental results are not the source of truth either if we subject them to our own mode of questioning, but if we let the data come in without assumptions, then they represent at least one correct parameter set. A model is our imagination, which we tether to data. While there may be great reward in believing our model is correct, the community specifically cautions against this. If we go by the data, we know that the model will be more reliable. Going the other way, bending the data to fit the model, has the problem that we could merely be gratifying ourselves. There has to be a trade-off in deciding what is right, and community and peer review of the model help validate this effort. But in order to bring the model to them, we have to make it speak for itself and support it with our own validations.
Maximum likelihood estimation attempts to resolve this concern by asking, "Given my set of model parameters, what is the probability that this data set occurred?" This translates to the likelihood of the parameters given the data.
With the help of this measure, a model for the system can be accepted as representative of the assumptions. Inverse modeling is therefore also called "maximum likelihood estimation". The chi-square error measure and maximum likelihood estimation are related. For Gaussian-distributed measurement errors, maximizing the probability of the data set given the model parameters is the same as minimizing the negative natural log of that probability, which, up to a constant, is half the chi-square function of weighted residuals. Furthermore, if the variance is uniform, then the chi-square function reduces, up to a constant factor, to the sum of squared residuals defined earlier.

#codingexercise
Yesterday we talked about a helper method to evaluate expressions involving single-digit numbers and no brackets.
int eval(List<char> exp)
{
    int result = 0;
    bool add = true; // sign to apply to the next digit
    for (int i = 0; i < exp.Count; i++)
    {
        if (exp[i] == '-') {
            add = false;
            continue;
        } else if (exp[i] == '+') {
            add = true;
            continue;
        } else {
            int digit = exp[i] - '0'; // single-digit operand
            if (add) result += digit;
            else result -= digit;
            add = true;
        }
    }
    return result;
}

Saturday, November 18, 2017

A unix-shell command to summarize text:

Programs in UNIX are founded on a principle: do one thing and do it very well. The idea is that complex tasks can be easily composed from granular, well-defined and easy-to-use building blocks. Tools that process text on UNIX systems are limited to search, slice and dice, find and replace, or differencing commands. These scan the text and perform their operations in a stream-like manner so that the output of one processor can become the input of another. They don’t modify the original text and merely produce an output text or stream.

While command-line tools are generally less popular than a user interface, say in a browser, there is still a lot of value in the commands because they are helpful to groups such as system administrators. Consequently, writing a program to do text processing jobs such as tokenizing, stemming or summarizing text can be just as helpful. Imagine being able to read a text file with contents equivalent to a novel and print a summary by extracting the sentences that together bring out the most of it.

Programs in UNIX are generally written in the C language and are portable to almost all flavors of such systems. Text processing commands can also be written in C. There are more packages and helpers available in higher-level languages such as Python, which result in smaller, more succinct code, but it has become possible to write such a program on Unix using Word2Net. A text summarization command for Unix is made possible with the following steps:

1) Convert text to word vectors using the word2net command.

2) Generate a k-means classification of the word vectors and find their centroids:

https://github.com/ravibeta/cexamples/blob/master/classifier.c

https://github.com/ravibeta/cexamples/blob/master/classifiertest.c

3) Generate the summary using the centroids.

This is optional, but it would be helpful to keep track of the positions where the centroid occurs, since collocation data is used to generate the word vectors, so that the most contributing positions of the determined vector can be used to select sentences for the summary. Previously, the selection of sentences was heavier in logic and occurred in step 3, as the words closest to the centroid were used to determine the location of the sentences to be selected, but here we propose that the best occurrence of the centroid alone can be used to pick the sentences.

Packaging the program involves using conventional means to package shell scripts. A variety of utilities such as bpkg, jean or sparrow can be used. We can also make it available to install from repositories, as we usually do on Ubuntu, by publishing the repository. These commands, however, require a tremendous amount of disk space because they need to download a corpus of text data that is usually very large in itself and available separately. A processed word vector file from a known corpus may alternatively be shipped with the script, and while this also takes disk space, it bootstraps the command and deploys it for immediate use.

Conclusion: Writing a text processor such as for text summarization is a viable option on UNIX flavor systems.

#codingexercise

Check if two expressions are the same:

For example:

3 - (2-1) and 3-2+1

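The expressions differ only in the presence of brackets.
Solution: one approach is to evaluate the expression using a stack. When we encounter a closing parenthesis, we pop all elements and their operators up to the opening parenthesis, evaluate them, and push the result back on the stack. We use a helper method eval that can evaluate expressions without brackets by performing the operations in sequence.
// requires: using System; using System.Collections.Generic; using System.Data; using System.Linq;
bool AreEqual(List<char> exp1, List<char> exp2)
{
    var s1 = new Stack<char>();
    for (int i = 0; i < exp1.Count; i++)
    {
        if (exp1[i] == ')') {
            var ret = new List<char>();
            while (s1.Count > 0 && s1.Peek() != '(')
                ret.Add(s1.Pop());
            if (s1.Count == 0) throw new InvalidExpressionException();
            s1.Pop(); // discard the '('
            ret.Reverse(); // elements were popped in reverse order
            int value = eval(ret);
            // eval reads one char per operand, so only a single-digit result can be pushed back
            if (value < 0 || value > 9) throw new NotSupportedException();
            s1.Push((char)('0' + value));
        } else {
            s1.Push(exp1[i]);
        }
    }
    // Stack<T> enumerates from the top, so reverse it to restore reading order
    return eval(s1.Reverse().ToList()) == eval(exp2);
}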

Friday, November 17, 2017

We continue our discussion on determining a model to represent the system. A model articulates how a system behaves quantitatively.
An inverse model is a mathematical model that fits experimental data. It aims to provide a best fit to the data.
There are two ways to select an appropriate model. The first is to look for trend lines in the data that correspond to some well-known mathematical formula. The second is to observe the underlying physical processes that contribute to the system. These physical interpretations then supply the model parameters.
To fit the data points, a model may use the least squares of errors. The errors, called residuals, may be positive or negative, so simply adding them up lets them cancel and gives an inaccurate measure of the fit. Instead, the squares of the errors can be minimized to give a better fit.
#codingexercise
Find next greater number using the same digits as the given number. If no other number is possible return the original
This is an alternative to the technique discussed earlier.
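// requires: using System;
int GetNextHigher(int n)
{
    string digits = SortedDigits(n);
    // a number with more digits than n cannot be a rearrangement of n
    for (long i = (long)n + 1; i <= int.MaxValue && i.ToString().Length == digits.Length; i++)
    {
        if (SortedDigits((int)i) == digits)
            return (int)i;
    }
    return n; // no higher rearrangement exists, so return the original
}

// the digits of n in ascending order, used as a signature for comparison (assumes n >= 0)
string SortedDigits(int n)
{
    char[] d = n.ToString().ToCharArray();
    Array.Sort(d);
    return new string(d);
}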
There are two things to observe here:
1) We don't need to compare a hash table for each increment. If we keep a slot array of counts for the digits 0 to 9, we can quickly discard any candidate that contains a digit not present in the original number.
2) The range from the number to its next higher number is usually very small.
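The slot-array idea from the first observation could look like the sketch below. The class name DigitSlots, its method names and the upper bound on the scan (the digits arranged in descending order) are assumptions added for the example.

```csharp
using System;

public static class DigitSlots
{
    // Count how often each digit 0..9 occurs in n (assumes n >= 0).
    public static int[] Count(long n)
    {
        var slots = new int[10];
        do
        {
            slots[(int)(n % 10)]++;
            n /= 10;
        } while (n > 0);
        return slots;
    }

    // True when candidate uses exactly the digits counted in target.
    // Stops at the first digit that the original number cannot supply.
    public static bool Matches(long candidate, int[] target)
    {
        var remaining = (int[])target.Clone();
        do
        {
            int d = (int)(candidate % 10);
            if (--remaining[d] < 0)
                return false;
            candidate /= 10;
        } while (candidate > 0);
        foreach (int r in remaining)
            if (r != 0)
                return false;
        return true;
    }

    public static int NextHigher(int n)
    {
        if (n < 0)
            throw new ArgumentOutOfRangeException(nameof(n));
        int[] target = Count(n);
        // The largest rearrangement (digits in descending order) bounds the scan.
        long max = 0;
        for (int d = 9; d >= 0; d--)
            for (int k = 0; k < target[d]; k++)
                max = max * 10 + d;
        for (long c = (long)n + 1; c <= max && c <= int.MaxValue; c++)
            if (Matches(c, target))
                return (int)c;
        return n; // no higher rearrangement fits, so return the original
    }
}
```

For example, NextHigher(1234) returns 1243 after a short scan, while NextHigher(321) returns 321 immediately because 321 is already its own largest rearrangement.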

Thursday, November 16, 2017

We continue our discussion on modeling. A model articulates how a system behaves quantitatively. Models use numerical methods to examine complex situations and come up with predictions. The most common techniques for coming up with a model include statistical techniques, numerical methods, matrix factorization and optimization.
An inverse model is a mathematical model that fits experimental data. It aims to provide a best fit to the data. Values for the parameters are obtained from estimation techniques. It generally involves an iterative process to minimize the average difference. The quality of the inverse model is evaluated using well known mathematical techniques as well as intuition.
The steps for inverse modeling of data include:
1) selecting an appropriate mathematical model using, say, polynomial or other functions
2) defining an objective function that measures the agreement between the data and the model
3) adjusting model parameters to get a best fit, usually by minimizing the objective function
4) evaluating goodness of fit to the data, which is never perfect because of measurement noise
5) estimating the accuracy of the best-fit parameter values
6) determining whether a much better fit is possible, which might be necessary if the search has stopped at a local minimum rather than the global minimum

There are two ways to select an appropriate model. The first is to look for trend lines in the data that correspond to some well-known mathematical formula. The second is to observe the underlying physical processes that contribute to the system. These physical interpretations then supply the model parameters. To fit the data points, a model may use the least squares of errors. The errors, called residuals, may be positive or negative, so simply adding them up lets them cancel and gives an inaccurate measure of the fit. Instead, the squares of the errors can be minimized to give a better fit.
#codingexercise
Find next greater number using the same digits as the given number. If no other number is possible return the original
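// requires: using System;
uint GetNextGreater(uint n)
{
    char[] digits = n.ToString().ToCharArray();
    // find the start for change in digits: the rightmost i with digits[i] > digits[i-1]
    int i;
    for (i = digits.Length - 1; i > 0; i--)
    {
        if (digits[i] > digits[i - 1])
            break;
    }
    if (i == 0) return n; // digits never increase left to right, so n is already the largest
    // find the substitute: the smallest digit right of i-1 that is still greater than digits[i-1]
    int min = i;
    for (int j = i + 1; j < digits.Length; j++)
        if (digits[j] > digits[i - 1] && digits[j] < digits[min])
            min = j;
    char tmp = digits[min]; digits[min] = digits[i - 1]; digits[i - 1] = tmp;
    // sort the digits from position i so that the tail is as small as possible
    Array.Sort(digits, i, digits.Length - i);
    // note: uint.Parse throws OverflowException if the rearrangement exceeds uint.MaxValue
    return uint.Parse(new string(digits));
}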

There is an alternative to getting the number as above. It simply rolls the number forward until it reaches a number that has the same count of each digit as the original.

Wednesday, November 15, 2017

We continue our discussion on modeling. A model articulates how a system behaves quantitatively. Models use numerical methods to examine complex situations and come up with predictions. The most common techniques for coming up with a model include statistical techniques, numerical methods, matrix factorization and optimization.
A forward model is a mathematical model that is detailed enough to include the desired level of real-world behaviour or features. It is used for simulating realistic experimental data which, under the right constraints, can be used to test hypotheses. While it may be too complicated to fit to experimental data, it can be used to generate synthetic data sets for evaluating parameters.
An inverse model is a mathematical model that fits experimental data. It aims to provide a best fit to the data. Values for the parameters are obtained from estimation techniques. It generally involves an iterative process to minimize the average difference. The quality of the inverse model is evaluated using well known mathematical techniques as well as intuition.
Forward-inverse modeling is a process that combines data simulation with model fitting so that all parameters can be sufficiently evaluated for robustness, uniqueness and sensitivity. This is very powerful for improving data analysis and understanding its limitations.
A good inverse model should fit well and describe the data adequately so that some insights may follow. Its parameters should be unique, with values that are consistent with the hypothesis and with the changes seen in the experimental data when the system is altered.
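The forward-inverse loop described above can be sketched with a toy example: a forward model generates synthetic noisy data from known parameters, and an inverse model then tries to recover them. The straight-line model, the class name ForwardInverse, the uniform noise and the seed are all assumptions chosen to keep the example small.

```csharp
using System;

public static class ForwardInverse
{
    // Forward model: a straight line plus bounded uniform noise in [-amplitude, +amplitude].
    public static double[] Simulate(double[] x, double slope, double intercept,
                                    double noiseAmplitude, int seed)
    {
        var rng = new Random(seed);
        var y = new double[x.Length];
        for (int i = 0; i < x.Length; i++)
        {
            double noise = noiseAmplitude * (2 * rng.NextDouble() - 1);
            y[i] = slope * x[i] + intercept + noise;
        }
        return y;
    }

    // Inverse model: closed-form least-squares fit of y = slope * x + intercept.
    public static (double Slope, double Intercept) Fit(double[] x, double[] y)
    {
        if (x.Length != y.Length || x.Length < 2)
            throw new ArgumentException("need at least two (x, y) pairs of equal length");
        int n = x.Length;
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (int i = 0; i < n; i++)
        {
            sx += x[i];
            sy += y[i];
            sxx += x[i] * x[i];
            sxy += x[i] * y[i];
        }
        double denom = n * sxx - sx * sx;
        if (denom == 0)
            throw new ArgumentException("x values must not all be equal");
        double slope = (n * sxy - sx * sy) / denom;
        double intercept = (sy - slope * sx) / n;
        return (slope, intercept);
    }
}
```

Simulating 50 points with slope 2, intercept 1 and a noise amplitude of 0.1, then fitting them, should return values close to the ones that generated the data. Repeating this with different noise levels or fewer points gives a feel for how robust and sensitive each parameter is.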
The steps for inverse modeling of data include:
1) selecting an appropriate mathematical model using, say, polynomial or other functions
2) defining an objective function that measures the agreement between the data and the model
3) adjusting model parameters to get a best fit, usually by minimizing the objective function
4) evaluating goodness of fit to the data, which is never perfect because of measurement noise
5) estimating the accuracy of the best-fit parameter values
6) determining whether a much better fit is possible, which might be necessary if the search has stopped at a local minimum rather than the global minimum.
#codingexercise
Given an array and an integer k, find the maximum for each and every contiguous subarray of size k.
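// requires: using System.Collections.Generic;
List<int> GetMaxInSubArrayOfSizeK(List<int> A, int k)
{
    var ret = new List<int>();
    if (k <= 0 || A.Count < k) return ret; // no window of size k exists
    // LinkedList<int> serves as the deque of indexes; the values at these indexes stay in decreasing order
    var q = new LinkedList<int>();
    for (int i = 0; i < k; i++)
    {
        while (q.Count > 0 && A[i] >= A[q.Last.Value])
            q.RemoveLast();

        q.AddLast(i);
    }

    for (int i = k; i < A.Count; i++)
    {
        // the front of the deque is the maximum of the previous window
        ret.Add(A[q.First.Value]);

        // drop indexes that have slid out of the current window
        while (q.Count > 0 && q.First.Value <= i - k)
            q.RemoveFirst();

        // drop indexes whose values can no longer be a maximum
        while (q.Count > 0 && A[i] >= A[q.Last.Value])
            q.RemoveLast();

        q.AddLast(i);
    }

    // maximum of the last window
    if (q.Count > 0)
        ret.Add(A[q.First.Value]);
    return ret;
}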