Friday, March 7, 2014

I read an MSDN article written a while back titled Don't let memory allocation failures crash your legacy STL application. It points out that developers who use the STL with the Visual C++ 6.0 compiler run a high risk of crashes under low-memory conditions, because the new operator does not fail in a standard way: it may either throw an exception or return NULL. In 6.0 no exception was thrown on failure, whereas 7.0 and later do throw. Furthermore, MFC adds its own behavior to this operator, and Visual C++ and MFC are separate products, so the exceptions thrown by the C++ runtime and by MFC differ. The 6.0 library was not standard-conformant; the Visual C++ .NET versions were more conformant than the earlier ones.
So a project written in 6.0 and using STL has some challenges.
Generally the result of new is not checked, either because it is assumed never to fail or because it is assumed to throw an exception on failure. Both assumptions are flawed: memory can run low, and no exception may be thrown.
The STL string-copy implementation at the time of 6.0 simply allocated and advanced the pointer. That code was correct only under the assumption that new would throw on failure and that all exceptions were caught, so exception handlers were added around it.
It is also common to call the new operator with the nothrow specification. What is not common or expected, however, is that new may still throw even with that specification: the constructor might throw, or, in the 6.0 case, the release build of the runtime throws where the debug build does not. When this happens the pointer is never set to NULL, and in the release build the exception raised is bad_alloc.
This is fixed as follows: first we declare that the allocation helper itself does not throw, then we wrap the call to new in a try/catch that catches the bad_alloc exception, and in the catch block we set the pointer to NULL.
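A minimal sketch of such a helper (the function name, buffer type, and old-style empty exception specification are my own choices for illustration):

#include <new>       // std::nothrow, std::bad_alloc
#include <cstddef>   // NULL, std::size_t

// Sketch: even new(std::nothrow) can surface a bad_alloc on some older
// runtimes, so we catch it and normalize every failure to a NULL pointer.
char* AllocateBuffer(std::size_t count) throw()   // the helper itself never throws
{
    char* p = NULL;
    try
    {
        p = new (std::nothrow) char[count];
    }
    catch (const std::bad_alloc&)
    {
        p = NULL;    // treat a throwing failure the same as a NULL return
    }
    return p;
}

Callers then only have to test the returned pointer for NULL, whichever way the allocation failed.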
If new were changed to always throw on failure, as the standard requires, then the fix is to wrap each use of std::nothrow in this way. That works for code that is compiled with these handlers, but existing third-party libraries that use std::nothrow still pose a challenge: the code paths using that nothrow specification have to be modified, avoided, or recompiled against the static library.
One more issue in this context is that the STL itself may use this form of nothrow allocation internally, so those code paths may need to be avoided, or an STL implementation that does not use nothrow can be used instead.



 

Thursday, March 6, 2014

In today's post we will look at some more challenges with algorithms for numerical computations.
The second type of problem is round-off error. Here we use bounded representations, i.e. we specify a value by a sign, a mantissa, an implicit base, and an exponent. For example, we represent 6.02 x 10 ^ 23 as + .602 x 10 ^ 24.
However, this cannot represent numbers that do not have a finite number of digits, such as pi, which are then truncated.
Binary representations use a p-bit mantissa with base 2.
More precision implies more accuracy but it also implies slower computations.
Single precision gives 6-7 decimal digits, double precision gives 13-14 decimal digits, and extended precision gives 19-20 decimal digits.
This rounding off error has to be measured.
We do it this way.
If we denote the actual value of a numeric variable as alpha* and its representation as alpha, then the absolute error is the magnitude |alpha* - alpha|, and the relative error is that magnitude divided by |alpha*|.
There are other errors to representation such as
1. Overflow. This means the value is too large to represent.
We handle this by reordering operations. For example, a large number multiplied by a second large number and divided by a third large number can be calculated by doing the division first and the multiplication afterwards.
Another approach is to use alternate representations, where we pre-compute or simplify the expression, for instance avoiding 100!/(2!98!) and computing 100 * 99 / 2 instead.
The third way to solve overflow is to use logarithms, since they reduce the numbers considerably.
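A small sketch of these three tactics (the sample values, and the use of lgamma from a C++11 standard library for the log-based binomial, are my own choices for illustration):

#include <cmath>    // std::exp, std::lgamma (C++11)

// 1. Reorder: divide before multiplying so the intermediate stays small.
double scaled(double a, double b, double c)
{
    return (a / c) * b;          // instead of (a * b) / c
}

// 2. Alternate form: 100!/(2! 98!) collapses to 100 * 99 / 2.
double chooseTwo(double n)
{
    return n * (n - 1) / 2;      // chooseTwo(100) == 4950
}

// 3. Logarithms: n choose k via log-gamma, never forming a factorial.
double chooseViaLogs(double n, double k)
{
    return std::exp(std::lgamma(n + 1) - std::lgamma(k + 1) - std::lgamma(n - k + 1));
}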
The second problem is underflow.
This manifests itself when a value is too close to zero to represent and the computation proceeds with the value zero; instead, it should be flagged to indicate underflow. It differs from overflow in that the computation can often proceed as if the value were zero.
The third problem is  that arithmetic operations are not exact.
i.e. if we take the value of pi and drop its seventh decimal digit, which happens to be 6, we expect the difference between the old and new values to be 6 * 10 ^ (-7); but in limited precision the computed difference can come out quite different, or even zero, because the relative error of such a subtraction is large.
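A small sketch of this effect in single precision (the two literals are just pi and pi with its seventh decimal digit dropped):

#include <cstdio>

int main()
{
    float full      = 3.1415926f;   // pi to seven decimal places
    float truncated = 3.1415920f;   // seventh decimal digit dropped
    // The true difference is 6e-07, but float keeps only about seven
    // significant digits, so the computed difference lands on a nearby
    // multiple of the float spacing and its relative error is large.
    printf("difference = %g (expected 6e-07)\n", full - truncated);
    return 0;
}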
Errors also cascade in numerical computations: errors made early are propagated to later stages, and slightly changed numbers can produce a completely different answer.

Wednesday, March 5, 2014

We look at the challenges of numerical algorithms, i.e. we show that there are limitations to algorithmic power.

Numerical algorithms are those that compute solutions to problems posed in a continuous mathematical world.
Therefore the first obstacle is that we have finite approximations of what are really infinite objects.
This results in truncation errors.
The second class is round-off errors: real numbers cannot be represented exactly, so we settle for a value within a small range of the true answer.
To overcome the first limitation, we use what is called the Taylor's expansion.

e ^ x = 1 + x + x ^ 2 / 2! + x ^ 3 / 3! + ... x ^ n / n! + ...

we stop the expansion at a definite n. Anything beyond that fixed n is the truncation error.

The second class is evident when we take the graph of a curve and want to compute the area under it. We slice the area into geometric shapes, trapezoids to be precise, and this is called the trapezoidal rule, by which we approximately compute the area under the curve.

One way to get an upper bound on the truncation error is
to use the following inequality:
e ^ x - ( 1 + x + x ^ 2 / 2! + ... + x ^ n / n! )  <=  ( M / (n+1)! ) * |x| ^ (n+1),   where M = max e ^ y for 0 < y < x
i.e by choosing an upper bound M for any y that can be between 0 and x, we are able to establish an upper bound on the truncation error.

We can take an example to show how we come up with M
Let's say we target e ^ 0.5 with an error of 0.0001
Since 0 < y < x and e ^ y is increasing, e ^ y has to be less than e ^ 0.5. Since e is between 2 and 3, e ^ 0.5 < 4 ^ 0.5 = 2, so we can take M = 2.
Now the truncation error is bounded by M / (n+1)! * |x| ^ (n+1), which with our chosen values becomes 2 / (n+1)! * 0.5 ^ (n+1). Requiring this to be less than 0.0001 lets us solve for n, and we find that n = 5 suffices.
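A quick sketch that checks this worked example by summing the series up to n = 5 and comparing the actual error against the bound:

#include <cstdio>
#include <cmath>

int main()
{
    const double x = 0.5, M = 2.0;
    const int n = 5;
    double sum = 1.0, term = 1.0, factorial = 1.0;
    for (int k = 1; k <= n; ++k)
    {
        term *= x / k;           // term is now x^k / k!
        sum  += term;
        factorial *= k;          // factorial is now k!
    }
    double bound = M / (factorial * (n + 1)) * std::pow(x, n + 1);   // M/(n+1)! * |x|^(n+1)
    printf("approximation = %.8f, e^0.5 = %.8f\n", sum, std::exp(x));
    printf("actual error = %.2e, bound = %.2e\n", std::exp(x) - sum, bound);
    return 0;
}

With these values the actual error comes out around 2.3e-05 and the bound around 4.3e-05, both below 0.0001.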


We can similarly write a formula for the upper bound on the error of the trapezoidal rule. Here the question is how thin, and therefore how many, trapezoids to place between the lower and upper limits of the curve.
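For reference, a minimal sketch of the trapezoidal rule itself (the integrand e^x over [0, 1] and the count of 100 trapezoids are arbitrary illustrative choices):

#include <cmath>
#include <cstdio>

// Approximate the area under f between a and b with k equal-width trapezoids.
double trapezoid(double (*f)(double), double a, double b, int k)
{
    double h = (b - a) / k;
    double area = 0.5 * (f(a) + f(b));
    for (int i = 1; i < k; ++i)
        area += f(a + i * h);
    return area * h;
}

static double integrand(double x) { return std::exp(x); }

int main()
{
    // The exact integral of e^x over [0, 1] is e - 1, about 1.718282.
    printf("%.6f\n", trapezoid(integrand, 0.0, 1.0, 100));
    return 0;
}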
In today's post we look at the limitations of algorithmic power, specifically lower-bound arguments.
A lower bound for a problem P is a time t(n) below which no algorithm can solve P.
Knowing the lower bound is advantageous. For example, we know that sorting algorithms take O(n^2) in some cases and O(nlogn) in others. Searching a sorted array by brute force takes O(n), while binary search takes O(log n), which is an improvement. Minimum spanning tree on a connected graph is found by Prim's algorithm in O(mlogn), and single-source shortest paths by Dijkstra's algorithm in O(mlogn). How far can we go in each of these cases? That is what a lower bound answers.
Some problems can be really hard to solve. For example, factoring a number into its prime factors is hard. Others require a lot of processing. For example, the RSA cryptographic scheme, as used by, say, a bank to secure its clients' messages, generates two large prime numbers p and q and takes their product p.q as n. Then it creates an encryption/decryption pair with d.e = 1 mod (p-1)(q-1), where (e, n) is taken as the public key and d as the private key. When a client sends a message m, it encrypts it as m^e mod n. When the bank receives the message, it decrypts it by computing ((m^e)^d) mod n, which equals m. Since factoring n into p.q is hard, this is a good security scheme.
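A toy sketch of that round trip, with deliberately tiny primes (p = 61, q = 53, e = 17, d = 2753 are illustrative values only; real keys are hundreds of digits long):

#include <cstdio>

typedef unsigned long long u64;

// Modular exponentiation by repeated squaring; fits in 64 bits for tiny moduli.
u64 powmod(u64 base, u64 exp, u64 mod)
{
    u64 result = 1;
    base %= mod;
    while (exp > 0)
    {
        if (exp & 1) result = result * base % mod;
        base = base * base % mod;
        exp >>= 1;
    }
    return result;
}

int main()
{
    u64 p = 61, q = 53, n = p * q;   // n = 3233, (p-1)(q-1) = 3120
    u64 e = 17, d = 2753;            // e * d = 46801 = 1 mod 3120
    u64 m = 65;                      // the message
    u64 c = powmod(m, e, n);         // client encrypts with the public key (e, n)
    u64 r = powmod(c, d, n);         // bank decrypts with the private key d
    printf("cipher = %llu, recovered = %llu\n", c, r);   // recovers 65
    return 0;
}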
A lower bound can be trivially established by counting the output that has to be generated. This is the case for generating the permutations of n elements, since there are n! of them. Similarly, a search in an unsorted array has to examine all n elements, so O(n) becomes the lower bound.
Another way to establish a lower bound is information-theoretic: we count the number of yes/no questions that must be answered before the problem can be solved, as in binary search.
Another way to establish a lower bound is by using an adversary argument. In this case the full input is not shown; an adversary answers, on behalf of the input, the questions the algorithm asks.
As an example, let us take the problem of merging two sorted lists of n elements each. An adversary can force any algorithm to ask 2n-1 questions of the input to come up with the merged list, by answering the comparisons as if the elements of the two lists interleave: the first element of the first list, then the first element of the second list, then the second element of the first list, and so on. If the algorithm skipped one of these 2n-1 comparisons, the adversary could present a different input that is consistent with every answer given but has a different correct merged output, so the algorithm would produce a wrong answer on one of the two inputs. Thus this is another way of establishing a lower bound.
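As an aside, the 2n-1 figure is easy to observe in an ordinary merge when the lists interleave; here is a small sketch that counts the comparisons (the names and the sample lists are my own):

#include <vector>
#include <cstdio>

// Merge two sorted lists and count how many comparisons were made.
int mergeCount(const std::vector<int>& a, const std::vector<int>& b, std::vector<int>& out)
{
    int comparisons = 0;
    size_t i = 0, j = 0;
    while (i < a.size() && j < b.size())
    {
        ++comparisons;
        if (a[i] <= b[j]) out.push_back(a[i++]);
        else              out.push_back(b[j++]);
    }
    while (i < a.size()) out.push_back(a[i++]);
    while (j < b.size()) out.push_back(b[j++]);
    return comparisons;
}

int main()
{
    const int n = 5;
    std::vector<int> a, b, out;
    for (int k = 0; k < n; ++k) { a.push_back(2 * k); b.push_back(2 * k + 1); }   // interleaving lists
    printf("comparisons = %d (2n - 1 = %d)\n", mergeCount(a, b, out), 2 * n - 1);
    return 0;
}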
The final way of establishing a lower bound is by a class of techniques called Reduction. Here if P reduces to Q in constant time, then any lower bound for P will hold for Q as well.
If Q were solvable any faster, then P could be solved faster too, which contradicts the lower bound we started with. Hence reduction can also be used to establish a lower bound.
Courtesy: @mecr.org

Tuesday, March 4, 2014

In today's post we discuss the Rabin-Karp algorithm for string matching, based on the mecr lectures. We match a string pattern P of length m in a target string T of length n. Checking the pattern character by character at every position of the target takes O(mn) time. This can be improved with two good ideas:
One is that on a mismatch we try to skip ahead, and we can skip up to m characters. This costs O(m) extra space and gives O(m + n) worst-case time, which is already an improvement.
The second is that we do approximate matches: when the approximate check reports a mismatch we know it for sure, but when it reports a match it may or may not be real. This way we can cheaply skip past the definite mismatches.
We implement the approximation with a hash: for each index i from 0 to n - m we hash the m characters starting at i and see whether the hash matches that of the pattern.
If a window of the text matches the pattern, then its hash must match the pattern's hash. We use this the opposite way: if the hashes differ, we know the window is not the same as the pattern and we can skip ahead. We do a little more work in the hashing, and consequently our worst case is worse than the first approach, but in most cases the hash function works very well. We just need a hash function that meets our needs, which are:
1) we should pre-compute a hash of the pattern
2) we compute the hash of the substring at index i to i+m-1 for different i
3) the hash function should be such that
h(T[i, i+m-1]) = g(T[i]) "+" g(T[i+1]) "+" .... g(T[i+m-1])
where + allows us to compute
h(T[i+1, i+m]) = g(T[i+1]) + g(T[i+2]) + .... g(T[i+m])
with these properties it takes a constant time to compute
h(T[i+1, i+m]) from h(T[i, i+m-1])
In addition if this hash function has very few collisions, then we have a fast running time of O(n+m)
As an example, we can take a hash function defined by h(string followed by letter) = 4h(string) + h(letter)
if a string is made up of DNA letters ACGT
h(ACT) = 4h(AC) + h(T) and so on.
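A sketch of the whole scheme over the DNA alphabet, using this rolling hash with A, C, G, T mapped to 0-3 (identifiers are my own; for simplicity the hash is kept in a 64-bit integer rather than reduced modulo a prime, which assumes a short pattern):

#include <string>
#include <cstdio>

// Per-letter code g: A, C, G, T -> 0, 1, 2, 3.
static unsigned long long code(char c)
{
    switch (c) { case 'A': return 0; case 'C': return 1; case 'G': return 2; default: return 3; }
}

// Return the index of the first match of pattern in text, or -1.
int rabinKarp(const std::string& text, const std::string& pattern)
{
    size_t n = text.size(), m = pattern.size();
    if (m == 0 || m > n) return -1;

    unsigned long long high = 1;                    // 4^(m-1), weight of the outgoing character
    for (size_t k = 1; k < m; ++k) high *= 4;

    unsigned long long hp = 0, ht = 0;              // hashes of pattern and first window
    for (size_t k = 0; k < m; ++k)
    {
        hp = 4 * hp + code(pattern[k]);
        ht = 4 * ht + code(text[k]);
    }

    for (size_t i = 0; i + m <= n; ++i)
    {
        if (ht == hp && text.compare(i, m, pattern) == 0)    // confirm on a hash match
            return (int)i;
        if (i + m < n)                                        // roll: drop text[i], add text[i+m]
            ht = 4 * (ht - high * code(text[i])) + code(text[i + m]);
    }
    return -1;
}

int main()
{
    printf("%d\n", rabinKarp("ACGTACGGT", "ACGG"));   // prints 4
    return 0;
}

Each hash match is confirmed with a direct comparison, so a rare collision costs a little time but never a wrong answer.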


Monday, March 3, 2014

Today I looked at a few course materials available on mecr.org and these are available for reading. One of the topics I looked at was evaluating complexity in terms of big O, Omega and Theta.  I want to briefly mention them here.
If we take two functions f and g where f is O(g(n)), we can show that the constants c and n0 that witness this are not unique, i.e. many values with c > 1 and n0 > 1 work, and different forms of polynomials can satisfy the relationship between f and g.
In calculating complexity, only the highest-order term matters. This becomes important for polynomials as it simplifies the discussion of complexity.
While O describes a single algorithm, Omega describes a class of algorithms, that is, a problem. For example, we say that Omega for comparison-based sorting is nlogn, and we see the best sorting algorithms match it with a complexity of O(nlogn).
Theta, on the other hand, is used for a two-sided bound. For example, if f(n) lies between c1.g(n) and c2.g(n), i.e.
c1.g(n) <= f(n) <= c2.g(n) for some constants c1, c2 and all n > n0, then we say f(n) is Theta(g(n)).
These two functions are generally of the same order.
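As a concrete illustration (the numbers here are my own), take f(n) = 3n^2 + 5n and g(n) = n^2. For all n >= 5 we have 5n <= n^2, so 3.n^2 <= 3.n^2 + 5n <= 4.n^2, and with c1 = 3, c2 = 4 and n0 = 5 this makes f(n) Theta(n^2).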
big O, Omega, Theta are all related.
For example, f(n) is O(g(n)) if and only if g(n) is Omega(f(n)): from f(n) <= c.g(n) we get (1/c).f(n) <= g(n), so the pair of constants c and 1/c witnesses both statements.
And f(n) is Theta(g(n)) exactly when f(n) is O(g(n)) and f(n) is also Omega(g(n)).
There is also an interesting observation that
if f1 is O(g1(n))
and f2 is O(g2(n))
then f1 + f2 is O( max(g1, g2))
This is true even if we have more than two such functions. It's easiest to visualize these as a sequence of code blocks in a program, where the overall program complexity is the maximum over the blocks of code.
Also, we can show this with a choice of coefficients for the functions above: if n3 is the max of n1, n2 and c3 is the max of c1, c2, then for n > n3 we have f1 + f2 <= c3.g1(n) + c3.g2(n) <= 2.c3.(max (g1,g2)), and that leads to
f1 + f2 is O(max(g1, g2))

Sunday, March 2, 2014

In today's post I want to cover a topic on writing fast low-level logs. There are many logging providers and I've covered some of them earlier; examples include log4net and log4cpp. Kernel-mode tracing is often considered an example of low-level logging. Database systems also have their own system of logging.
There are a few salient features of most logging methods.
First, they enable formatting of the text that is logged: the calls take variable-length arguments and allow the different format specifiers.
Second, they enable different levels of logging. By this we mean we control how much or how little of the log is written. The components that provide the log strings can keep logging at their different levels, while the end user decides how much or how little of the output to see. This switch is independent of all the components, and each log provider is expected to honor these levels.
Components are expected to separate their logging with labels or identifying strings, which they register at initialization and release at teardown.
Another feature is that these loggers are based on file appenders, and the logs are written at the end of the file.
Aside from the debug levels, there may be other metadata for the log.
But if we look at the implementation of a logger, it uses neither an event loop nor overlapped IO.
In principle, both could be used; in practice, neither is.
A logger has many clients that send messages. The messages are serialized onto the end of the file.
The offset into the file is incremented in a thread-safe manner, as opposed to locking the entire data structure
or putting the entire log-append code inside a critical section. The process of writing the log is that of filling out buffers which are then flushed to disk. Typically this buffering is left to the file system instead of maintaining a separate one, although some applications choose to implement a round-robin circular list of buffers which are then flushed to disk.
No matter what the buffering is, there can be many producers, one consumer, and a serialized queue; that is why this sounds similar to overlapped IO.
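A minimal sketch of that idea, with many producers reserving space by bumping a shared offset atomically and a single consumer flushing (the class name, buffer size, and flush policy are my own simplifications; a real logger would hand full buffers off to the consumer and write them to the log file):

#include <atomic>
#include <cstring>
#include <cstdio>

class Logger
{
public:
    Logger() : offset_(0) {}

    // Producers reserve a region by advancing the offset atomically,
    // then copy their message in without holding a lock for the whole append.
    bool append(const char* msg)
    {
        size_t len = std::strlen(msg);
        size_t pos = offset_.fetch_add(len + 1);           // reserve [pos, pos + len]
        if (pos + len + 1 > sizeof(buffer_)) return false; // buffer full; a real logger would flush here
        std::memcpy(buffer_ + pos, msg, len);
        buffer_[pos + len] = '\n';
        return true;
    }

    // Single consumer writes the serialized messages out in one go.
    void flush(FILE* out)
    {
        fwrite(buffer_, 1, offset_.load(), out);
    }

private:
    char buffer_[1 << 16];
    std::atomic<size_t> offset_;
};

int main()
{
    Logger log;
    log.append("component A: initialized");
    log.append("component B: teardown");
    log.flush(stdout);
    return 0;
}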