Monday, March 10, 2014

Today we take a break to look at the N-queens problem. This problem asks us to place N queens on an NxN board so that no queen can attack another. Before we look at the solution, let's first note that when N = 2 and when N = 3 there are no solutions. For larger N, a solution is possible, so how do we arrive at one? A queen attacks any other piece on its row, column, or diagonals, so each queen must occupy a row of its own. If we simply pick any N cells of the N x N board, we have C(N^2, N) candidate placements, which is very large for large N. We can improve on that by placing exactly one queen per row, leaving N choices for each row, or N^N candidate placements in all. This is much smaller than the previous count.
Next we look at a recursive method: we place a queen on the board, eliminate the cells that the queens placed so far can attack, and then place the remaining queens in a depth-first manner, backing up when a row has no safe cell left. This strategy is called branch and bound. We set up a loop where we eliminate the attacked cells and place the next queen. In each recursion, if we can pick the most promising subproblem first and eliminate the subproblems that cannot lead to a solution, we have a faster path to the solution.
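As a sketch, the depth-first strategy above might look like the following backtracking routine. The names (countNQueens, the per-row column array) are illustrative, not from any particular source:

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Place one queen per row, pruning any column or diagonal already
// attacked by a queen in an earlier row; count complete placements.
int placeFromRow(int row, int n, std::vector<int>& col) {
    if (row == n) return 1;              // all N queens placed safely
    int count = 0;
    for (int c = 0; c < n; ++c) {
        bool safe = true;
        for (int r = 0; r < row; ++r) {
            // same column, or same diagonal (|row diff| == |column diff|)
            if (col[r] == c || std::abs(col[r] - c) == row - r) {
                safe = false;
                break;
            }
        }
        if (safe) {
            col[row] = c;                // place queen, recurse on next row
            count += placeFromRow(row + 1, n, col);
        }
    }
    return count;
}

int countNQueens(int n) {
    std::vector<int> col(n, -1);         // col[r] = column of queen in row r
    return placeFromRow(0, n, col);
}
```

Running this confirms the observations above: N = 2 and N = 3 yield zero solutions, while N = 4 already has solutions.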

Sunday, March 9, 2014

Splunk modular input for ELMAH.

For many web developers, particularly those who work with the Microsoft .Net stack, ELMAH is a well-known logging module and handler for ASP.Net. What distinguishes it from the Apache log4net module is that it is "completely pluggable and can be dynamically added to a running ASP.Net application without any change to its code", i.e. there are no log write instructions added to or removed from the existing binary of a running application. When ELMAH is dropped into a running application, it can log nearly all unhandled exceptions, provide a web page to remotely view the entire log of recorded exceptions or the full details of a single exception, write to different back-end providers, and send out e-mail notifications.

While discussing ways to send the output of ELMAH to Splunk via a connector, several ideas come up, such as:
1) enable Splunk to be a first-class back-end provider for loggers such as ELMAH, Enterprise Library, and log4net,
2) write an HTTP handler for Splunk on Windows that can transparently send web messages to Splunk,
3) write a Splunk NuGet package that provides the Splunk SDK to .Net developers,
and
4) write a Splunk connector that directly reads the static and dynamic content of the different web pages a web application registers.

We will explore these options in upcoming posts and discuss ELMAH for now. When a request arrives at an IIS web server, IIS examines the extension of the request to see how to proceed. In addition to processing the extension, IIS provides the ability to write filters that handle various events it raises, such as when a request comes in, when a request is authenticated, and when content is rendered back. When the requested resource is an ASP.Net resource, it is forwarded to the ASP.Net engine. The ASP.Net engine behaves like IIS in that it knows how to respond to the request and it raises events; but while IIS uses unmanaged code, the ASP.Net engine uses managed classes called handlers and modules.
An HTTP module can tap into the various events raised as the request passes through the HTTP pipeline. One such event is the Error event, and this is the event ELMAH registers to listen to; many modules can subscribe to many different events. An HTTP handler returns the markup for the requested resource. The ASP.Net engine passes the rendered content to IIS, which forwards it to the client.

Since application servers and database servers are two common production deployments, a Splunk handler in the style of ELMAH would make it easier to collect data than configuring each source for input individually.

We look at establishing lower bounds via decision trees. In particular we look at comparison algorithms, which restrict the algorithm to what can be learned by comparing elements of the input. For example, given an array of integers, how does the algorithm behave if we change the input?
In the case of sorting, we can show that any comparison-based algorithm must take Omega(n log n) comparisons in the worst case, and in the case of searching for an element in a sorted array, Omega(log n).
Now we look at decision trees. In a decision tree we ask questions that have a yes-or-no answer. In comparison algorithms we don't modify the input: we compare the data, and the algorithm's output is determined entirely by the results of those comparisons. That is, the relative ordering matters but not the values themselves. For example, [5,7,2] will be treated the same as [6,9,1]. Each output can be seen as a permutation of [1..n].
For example, Selection sort compares based on the values of the elements of the array, and when it swaps, it swaps only in a local copy of the array and not the original input.
For an input a1, a2, a3 with n = 3, selection sort can be seen as a decision tree with the following comparisons:
Level 0: a1 < a2
Level 1: a1 < a3, a2 < a3
Level 2: a2 < a3, a2 < a1, a1 < a3, a2 < a1
yielding the possible outcomes
a1a2a3, a1a3a2, a3a1a2, a2a1a3, a2a3a1, a3a2a1
In general we can say that for a fixed input of n elements with f(n) possible outputs, the height of the decision tree, and hence the number of comparisons, is >= log(f(n)).
The lower bound for sorting in the comparison model is based on the number of outputs. For n elements, the number of outputs is n!, and these are the leaves of the tree: #leaves >= n!, so f(n) = n!. Any algorithm A for sorting in the comparison model must therefore make at least log(n!) comparisons. We know that n! > (n/e)^n, so
log n! = Omega(n log n)
The worst-case number of comparisons is thus lower bounded by Omega(n log n).
The theorem is that any comparison-based sorting algorithm requires Omega(n log n) comparisons in the worst case. Additionally, the average-case complexity of such sorting algorithms is also Omega(n log n).
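The bound can be checked numerically. The sketch below (the function name is illustrative) computes log2(n!) by summing logarithms, which by the argument above is the minimum worst-case number of comparisons any sorting algorithm can achieve:

```cpp
#include <cassert>
#include <cmath>

// The decision tree for sorting n elements has at least n! leaves,
// so its height (worst-case comparisons) is at least ceil(log2(n!)).
// Summing log2(k) avoids overflowing n! for even moderate n.
double log2Factorial(int n) {
    double s = 0.0;
    for (int k = 2; k <= n; ++k)
        s += std::log2(static_cast<double>(k));
    return s;
}
```

For n = 4, log2(4!) = log2(24) is about 4.58, so at least 5 comparisons are needed in the worst case; for n = 10 the bound is already about 21.8, close to n log2 n.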

Saturday, March 8, 2014

In today's post we revisit P, NP and NP completeness in algorithms and with examples.
There is a very celebrated open problem, whether P = NP, and it has not been answered.
We find polynomial-time solutions for searching, sorting, minimum spanning trees, etc.
P is the class of all problems that can be solved by some algorithm that takes polynomial time.
There are certain problems that are not decidable. But among the decidable problems, some may or may not have polynomial-time solutions. For example, deciding whether two regular expressions describe the same language is decidable, yet no polynomial-time algorithm for it is known. So the space of decidable problems includes the class P and is a superset of it. Within this decidable space is a subset NP, the class of all problems that can be solved by some non-deterministic algorithm in polynomial time. In other words, NP is a superset of P, but it also includes problems for which we only have non-deterministic polynomial-time algorithms. Non-determinism is an unintuitive notion.
Deterministic choice is when we branch on a condition, as in if (i == 5). Non-deterministic choice is when we guess the value and make the choice, as in if (*). Programming languages are deterministic: they look at the available data before making a choice. A non-deterministic algorithm "magically" picks the right branch, so it does not have to exhaust the search space and is much faster.
There is actually a large class of very interesting problems in various domains that have a non-deterministic polynomial-time algorithm but for which we don't yet have a deterministic one.
We know that P is a subset of NP, but we don't know whether P = NP. Most researchers tend to believe that the equality does not hold.
Further, there is a subclass of problems in NP called NP-complete (NP-C), the set of the hardest problems in NP. They are hardest in the sense that if you solve one of them in polynomial time, you can show that the entire class can be solved in polynomial time as well.
One interesting observation is this: if A is NP-C, then A belongs to P iff P = NP.
These are again from various domains.
In practice, if you cannot find a polynomial-time algorithm for a problem, proving it NP-complete is strong evidence that solving it fast is unlikely, i.e. the problem is beyond what any known polynomial-time technique can address.

Friday, March 7, 2014

In today's post we look at some more difficulties with numerical algorithms and their computations.
We know the roots of a quadratic from the well-known formula.
The first problem is how to compute the square root. There is a very neat algorithm by Newton which is iterative and converges rapidly:
x0 = (1 + D) / 2 and x_{n+1} = (x_n + D / x_n) / 2
where D > 1. Within 4 iterations, the error is within 10^-15.
We stop when successive values are close enough.
For example, for sqrt(2):
we compute x0 = (1 + 2) / 2 = 1.5
x1 = (1.5 + 2 / 1.5) / 2 = 1.416667
x2 = (x1 + 2 / x1) / 2 = 1.414216
and by the 4th iteration there is little or no change.
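The iteration above can be written as a short routine. The starting point, update rule, and stopping test follow the description; the iteration cap and tolerance are assumptions of this sketch:

```cpp
#include <cassert>
#include <cmath>

// Newton's iteration for sqrt(D), D > 1:
// start at x0 = (1 + D) / 2, then repeat x_{n+1} = (x_n + D / x_n) / 2,
// stopping when successive values no longer change meaningfully.
double newtonSqrt(double D) {
    double x = (1.0 + D) / 2.0;
    for (int i = 0; i < 60; ++i) {       // cap is a safety net; converges far sooner
        double next = (x + D / x) / 2.0;
        if (std::fabs(next - x) < 1e-15) // converged to machine precision
            return next;
        x = next;
    }
    return x;
}
```

For D = 2 the successive values match those in the worked example: 1.5, 1.416667, 1.414216, then essentially no change.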
The roots of a quadratic are given by (-b ± sqrt(b^2 - 4ac)) / (2a).
Here another problem arises from the fact that one numerator subtracts two nearly equal quantities, so cancellation is possible. This is subtractive cancellation in the numerator of x2.
Take for example x^2 - 10^6 x + 1 = 0
x1* = 999999.999999
x2* = 0.000001000000000001
Computing x2 directly with the formula loses nearly all of its significant digits.

In this case, we work around it by multiplying the formula by (-b - sqrt(b^2 - 4ac)) / (-b - sqrt(b^2 - 4ac)): the numerator becomes (-b)^2 - (b^2 - 4ac) = 4ac, and after cancelling with the 2a in the denominator we get x2 = 2c / (-b - sqrt(b^2 - 4ac)). We have eliminated the subtractive cancellation in favor of an addition.

If b < 0, we take x1 as before and x2 = 2c / (-b + sqrt(b^2 - 4ac)).
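Putting both cases together, a sketch of the stable computation might look like this. The large-magnitude root is computed with the non-cancelling sign; the other root is then recovered from the product of roots c/a, which is algebraically the same 2c/(-b ∓ sqrt(...)) rearrangement described above. The function name is illustrative:

```cpp
#include <cassert>
#include <cmath>
#include <utility>

// Stable quadratic roots for a*x^2 + b*x + c = 0 with real roots.
// Choose the sign of the discriminant term so magnitudes add rather
// than cancel; recover the small root from x1 * x2 = c / a.
std::pair<double, double> stableRoots(double a, double b, double c) {
    double disc = std::sqrt(b * b - 4.0 * a * c);
    // q = -(b + sign(b) * disc) / 2: no subtractive cancellation here
    double q = (b >= 0.0) ? -(b + disc) / 2.0 : (-b + disc) / 2.0;
    return std::make_pair(q / a,   // large-magnitude root
                          c / q);  // small root, via the product of roots
}
```

On the example x^2 - 10^6 x + 1 = 0 this recovers x2 near 10^-6 to full precision, where the naive formula would return a badly cancelled value.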
To summarize, numerical algorithms may suffer from:
truncation errors (we use a finite approach),
round-off errors (we use a bounded representation),
overflow and underflow,
subtractive cancellation,
and ill-conditioned problems.
Therefore they require careful analysis of errors.

I read an MSDN article written a while back titled "Don't let memory allocation failures crash your legacy STL application". It mentions that application developers who use the STL for C++ with the Visual C++ 6.0 compiler run a high risk of crashes under low-memory conditions. This is because the new operator does not respond to failure in a standard manner: the response could be to throw an exception or to return NULL. In 6.0 no exception was thrown, while in 7.0 and later an exception is thrown. Furthermore, MFC adds its own behavior to this operator, and Visual C++ and MFC are different products: 6.0 was not STL compliant, the C++ .Net version was more compliant than the earlier versions, and the exceptions thrown by the C++ compiler and by MFC are different.
So a project written in 6.0 and using STL has some challenges.
Generally the result of new is not checked, either because it is assumed never to fail or because it is assumed to throw an exception on failure. Both assumptions are flawed: low memory is possible, and so is failure without an exception.
The STL implementation at the time of 6.0, for a string copy for example, would simply allocate and increment the pointer. That code was correct only under the assumption that new would throw when it failed and that all exceptions were caught, so handlers were added accordingly.
Sometimes it's preferred to use the new operator with the nothrow specification, and this is also common. What is not commonly expected, however, is that new may still throw even with that specification. This can happen because the constructor might fail, or because the 6.0 release build throws where the debug build doesn't. When this happens the pointer is not NULL, and in the release build the exception is bad_alloc.
This is fixed as follows: first we specify that the handler itself should not throw, then we wrap the new expression in a try/catch that catches bad_alloc, and in the catch block we set the pointer to NULL.
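A minimal sketch of that fix, assuming a legacy runtime where new (std::nothrow) might still throw (safeAllocate is an illustrative name, not from the article):

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Wrap the allocation so the caller always gets either a valid pointer
// or null: even with std::nothrow, legacy (VC++ 6.0-era) runtimes could
// still throw, e.g. from release-build behavior or a failing constructor.
char* safeAllocate(std::size_t n) {
    char* p = 0;
    try {
        p = new (std::nothrow) char[n];
    } catch (...) {        // e.g. std::bad_alloc despite the nothrow request
        p = 0;             // force the documented "null on failure" contract
    }
    return p;
}
```

The caller then only ever has to test the pointer against NULL, which is exactly the check legacy code was already written to perform.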
If new were changed to throw an exception, as the standard STL mandates, then the fix would be to wrap allocations with std::nothrow. This works for code that is compiled together with these handlers, but existing third-party libraries that use std::nothrow still pose a challenge: the code paths using the nothrow specification have to be modified, avoided, or recompiled into the static library.
One more issue in this context is that the STL itself may use this form of nothrow allocator, so those uses may need to be avoided, or an STL implementation that doesn't use nothrow can be used instead.
Thursday, March 6, 2014

In today's post we will look at some more challenges with algorithms for numerical computations.
The second type of problem is round-off error. Here we use bounded representations, i.e. we specify values in the format: sign, mantissa, an implicit base, and an exponent. For example, we represent 6.02 x 10^23 as + .602 (10) 24.
However, this cannot exactly represent numbers that don't have a finite number of digits, such as pi, which get truncated.
Binary representations use a mantissa of p bits with base 2.
More precision implies more accuracy, but it also implies slower computation.
Single precision carries about 6-7 significant decimal digits, double precision about 15-16, and extended precision about 19-20.
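On a typical IEEE-754 platform these precision figures can be checked with the standard <cfloat> macros, which report the guaranteed significant decimal digits per type (the exact values are platform-dependent, so treat the comments as typical rather than universal):

```cpp
#include <cassert>
#include <cfloat>

// The *_DIG macros give the number of decimal digits that survive a
// round trip through each floating type on this platform.
int floatDigits()      { return FLT_DIG;  }  // typically 6
int doubleDigits()     { return DBL_DIG;  }  // typically 15
int longDoubleDigits() { return LDBL_DIG; }  // 18 with x86 extended precision
```

The C standard only guarantees minimums (6 for float, 10 for double and long double); IEEE-754 hardware gives the larger typical values noted in the comments.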
This round-off error has to be measured. We do it this way: if we denote the actual value of a numeric variable as alpha* and its representation as alpha, then the absolute error is |alpha* - alpha|, and the relative error is |alpha* - alpha| / |alpha*|.
There are other representation errors, such as:
1. Overflow. The value is too large to represent. We handle this by reordering operations: for example, (large1 x large2) / large3 can be computed by doing the division first and then the multiplication.
Another approach is to use alternate representations, where we pre-compute or simplify: instead of computing 100!/(2! 98!) outright, we cancel first and compute (100 x 99)/2.
The third way to avoid overflow is to use logarithms, since they reduce the magnitude of the numbers considerably.
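The logarithm approach can be sketched with the standard lgamma function, where lgamma(n + 1) = ln(n!); binomialViaLogs is an illustrative name:

```cpp
#include <cassert>
#include <cmath>

// Computing C(n, k) = n! / (k! (n-k)!) directly overflows a double for
// n as small as 171, but the log-factorials stay tiny: sum and subtract
// them, then exponentiate only the final, modest result.
double binomialViaLogs(int n, int k) {
    double logResult = std::lgamma(n + 1.0)
                     - std::lgamma(k + 1.0)
                     - std::lgamma(n - k + 1.0);
    return std::exp(logResult);
}
```

For C(100, 2) this agrees with the hand simplification above, (100 x 99)/2 = 4950, without ever forming 100!.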
The second problem is underflow. This manifests when values are too close to zero and the computation proceeds with the value zero; instead it should be marked with a flag to indicate underflow. It differs from overflow in that the computation can often legitimately proceed as if the value were zero.
The third problem is that arithmetic operations are not exact. For example, if we take the value of pi and drop the digit in the seventh decimal place (which happens to be 6), we expect the difference between the old and new values to be about 6 x 10^-7; but the computed difference may come out as zero, because subtracting two nearly equal numbers leaves a result whose relative error is large.
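Subtractive cancellation can be demonstrated directly: each operand below is accurate to machine precision, yet the relative error of their difference is large (cancellationError is an illustrative name):

```cpp
#include <cassert>
#include <cmath>

// Subtract two nearly equal doubles: the leading digits cancel, so only
// the (already rounded) trailing digits of the operands survive, and the
// relative error of the result is far larger than machine epsilon.
double cancellationError() {
    double a = 1.0 + 1e-15;   // 1e-15 is below 1 ulp of 1.0, so a is rounded
    double b = 1.0;
    double diff = a - b;      // mathematically 1e-15; the subtraction itself is exact
    return std::fabs(diff - 1e-15) / 1e-15;  // relative error of the computed difference
}
```

The relative error here is around 10 percent: the subtraction is exact, but it exposes the rounding that happened when 1 + 1e-15 was stored.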
Errors also cascade through numerical computations: errors made early are propagated to later stages, and slightly changed intermediate numbers can produce a completely different answer.