Friday, April 25, 2014

We sometimes need to bound the size of a binomial coefficient. For 1 <= k <= n, we have the lower bound
(n choose k) = n(n-1)...(n-k+1) / (k(k-1)...1)
which, since each factor (n-i)/(k-i) is at least n/k, we can rearrange to get (n choose k) >= (n/k)^k.
Using the inequality k! >= (k/e)^k, derived from Stirling's approximation, we obtain the upper bounds
(n choose k) = n(n-1)...(n-k+1) / (k(k-1)...1) <= n^k / k! <= (en/k)^k
For all 0 <= k <= n, we can use mathematical induction to prove the bound
(n choose k) <= n^n / (k^k (n-k)^(n-k))
where, for convenience, we assume that 0^0 = 1.
For the trivial case k = 0, the right-hand side equals n^n / (1 · n^n) = 1, so
(n choose 0) = 1 <= 1
and the bound holds.
For the inductive step, assume the bound holds for k and consider k+1. Since (n choose k+1) = (n choose k) · (n-k)/(k+1), it suffices to check that the bound's denominator satisfies (k+1)^(k+1) (n-k-1)^(n-k-1) <= k^k (n-k)^(n-k) · (k+1)/(n-k), which can be verified whenever k+1 <= n-k, that is, for k < n/2. This establishes the bound for all k <= n/2; for k > n/2 it follows from the symmetry (n choose k) = (n choose n-k), because the right-hand side of the bound is symmetric in k and n-k.
For k = lambda·n, where 0 <= lambda <= 1, this bound can be rewritten as
(n choose lambda·n) <= n^n / ((lambda·n)^(lambda·n) · ((1-lambda)·n)^((1-lambda)·n))
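As a sanity check, these bounds can be verified numerically. The following is a minimal sketch in Python (illustrative only; the helper name check_binomial_bounds is my own) that compares (n choose k) against the lower bound (n/k)^k and the upper bounds (en/k)^k and n^n / (k^k (n-k)^(n-k)) for a few values of n and k, using math.comb:

from math import comb, e

def check_binomial_bounds(n, k):
    # compare C(n, k) with the three bounds above; assumes 0 < k < n
    c = comb(n, k)
    lower = (n / k) ** k
    upper_stirling = (e * n / k) ** k
    upper_symmetric = n ** n / (k ** k * (n - k) ** (n - k))
    assert lower <= c <= upper_stirling
    assert c <= upper_symmetric
    return c, lower, upper_stirling, upper_symmetric

for n in (10, 20, 40):
    for k in (1, n // 4, n // 2):
        print(n, k, check_binomial_bounds(n, k))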

Thursday, April 24, 2014

We look at approximation by integrals today. When a summation adds up f(k) for a monotonically increasing function f, we can approximate it by integrals, because the integral measures the area under the curve while the summation measures the total area of unit-width rectangles placed along the curve. Comparing the rectangles with the area under the curve gives good bounds on the summation, and the same idea applies to monotonically decreasing functions.
For a monotonically increasing function, the unit-width rectangle just to the left of a chosen point has area at most the value of the function at that point, and the rectangle just to the right has area at least that value. Shifting the rectangles one unit to the left or right in this way is what produces the lower and upper bounds below.
The integral approximation gives a tight estimate for the nth harmonic number. For a lower bound, we obtain
Sum k = 1 to n of (1/k) >= Integral from 1 to n+1 of (1/x) dx = ln(n + 1),
because each unit-width rectangle of height 1/k lies above the curve on the interval [k, k+1].
For the upper bound we derive the inequality
Sum k = 2 to n of (1/k) <= Integral from 1 to n of (1/x) dx = ln(n),
again using unit-width rectangles, which this time lie below the curve.
Adding back the k = 1 term, this yields the bound on the harmonic series: Sum k = 1 to n of (1/k) <= ln(n) + 1.
We note that the total area of the rectangles is the value of the summation, while the shaded area under the curve is the integral. Comparing the two, the rectangles lie slightly below the curve in the case of the lower bound and slightly above the curve in the case of the upper bound. In general, for a monotonically increasing f, we have
Integral from m-1 to n of f(x) dx <= Sum k = m to n of f(k), and by shifting the rectangles one unit to the right we establish Sum k = m to n of f(k) <= Integral from m to n+1 of f(x) dx.
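A minimal numerical sketch (illustrative Python; the function name harmonic is my own) comparing the nth harmonic number with the two bounds just derived:

from math import log

def harmonic(n):
    # nth harmonic number, summed directly
    return sum(1.0 / k for k in range(1, n + 1))

for n in (10, 100, 1000, 10000):
    h = harmonic(n)
    assert log(n + 1) <= h <= log(n) + 1      # ln(n+1) <= H_n <= ln(n) + 1
    print(n, round(log(n + 1), 4), round(h, 4), round(log(n) + 1, 4))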

Wednesday, April 23, 2014

We will now consider some other techniques in addition to those in the previous post. One way to obtain bounds on a difficult summation is to express the series as the sum of two or more series by partitioning the range of the index and then to bound each of the resulting series. For example, suppose we want a lower bound on the arithmetic series Sum k = 1 to n of k that we saw earlier. If the number of terms is even, we can split the range into two halves. For a lower bound, the lower half can be ignored entirely, because each of the n/2 terms in the upper half is at least n/2, so the upper half alone sums to at least (n/2)·(n/2) = (n/2)^2. This gives us the lower bound Omega(n^2), which is an asymptotically tight bound because the arithmetic sum of those n numbers is also O(n^2).
The caveat here is that if we had instead bounded each term by the smallest term, which happens to be 1, we would have obtained a lower bound of n·1 = n, which is nowhere near the better bound we just found.
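A minimal sketch (illustrative Python; the helper name arithmetic_lower_bounds is my own) comparing the exact arithmetic sum with the (n/2)^2 bound from the upper half and the much weaker smallest-term bound:

def arithmetic_lower_bounds(n):
    # n is assumed even so the index range splits cleanly into two halves
    total = sum(range(1, n + 1))
    upper_half = sum(range(n // 2 + 1, n + 1))    # drop the lower half entirely
    weak = n * 1                                  # bound every term by the smallest term, 1
    return total, upper_half, (n // 2) ** 2, weak

for n in (10, 100, 1000):
    total, upper_half, half_squared, weak = arithmetic_lower_bounds(n)
    assert total >= upper_half >= half_squared >= weak
    print(n, total, upper_half, half_squared, weak)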
This technique tells us that when performing an analysis of an algorithm, we can split the summation and ignore a constant number of initial terms; it is generally applicable when each term in the summation is independent of n.
Another way this technique can help us is with infinite series.
For example, to find an asymptotic upper bound on Sum k = 0 to infinity of (k^2 / 2^k), we observe that the ratio of consecutive terms, ((k+1)^2 / 2^(k+1)) / (k^2 / 2^k) = (k+1)^2 / (2 k^2), is at most 8/9 once k >= 3.
The summation can therefore be split into a fixed set of initial terms (k = 0 up to 2) and the rest (k = 3 onwards to infinity).
The latter is at most (9/8) · Sum i = 0 to infinity of ((8/9)^i), since the first tail term is 9/8 and each subsequent term is at most 8/9 times the one before it.
In other words, the total of this infinite series is a constant, so Sum k = 0 to infinity of (k^2 / 2^k) = O(1).
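The following minimal sketch (illustrative Python) computes a long prefix of the series and compares it against the bound obtained by splitting off the first three terms and bounding the tail with the geometric series:

series_value = sum(k * k / 2 ** k for k in range(200))    # terms shrink fast; 200 terms is plenty
head = sum(k * k / 2 ** k for k in range(3))               # the initial terms k = 0, 1, 2
tail_bound = (9 / 8) * (1 / (1 - 8 / 9))                   # (9/8) * Sum (8/9)^i = (9/8) * 9
print(series_value, head + tail_bound)
assert series_value <= head + tail_bound                   # the series total stays below the bound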
This technique can also work for more difficult series such as harmonic series.
In the harmonic series Sum k = 1 to n of (1/k), we split the range into lg n + 1 pieces and upper bound the contribution of each piece by 1. Piece i consists of the terms starting at 1/(2^i) and going down to, but not including, 1/(2^(i+1)); each of these at most 2^i terms is at most 1/(2^i), so each piece contributes at most 1. With lg n + 1 pieces each contributing at most 1, the harmonic series is upper bounded by lg n + 1.
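Here is a minimal sketch (illustrative Python; the function name harmonic_by_pieces is my own) that groups the terms into these pieces, where piece i runs over k = 2^i up to 2^(i+1) - 1:

from math import floor, log2

def harmonic_by_pieces(n):
    # return the contribution of each power-of-two piece of Sum 1/k for k = 1..n
    pieces = []
    i = 0
    while 2 ** i <= n:
        start, stop = 2 ** i, min(2 ** (i + 1) - 1, n)
        pieces.append(sum(1.0 / k for k in range(start, stop + 1)))
        i += 1
    return pieces

for n in (15, 100, 1000):
    pieces = harmonic_by_pieces(n)
    assert all(p <= 1 for p in pieces)                # each piece contributes at most 1
    assert sum(pieces) <= floor(log2(n)) + 1          # lg n + 1 pieces overall
    print(n, len(pieces), round(sum(pieces), 4))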
We will next look at approximation by integrals.
Integrals are another technique for algorithm analysis.
We also continue the discussion of summation properties today:
There are many techniques for bounding the summations that describe the running times of algorithms. We will outline a few methods:
1) Mathematical induction:
This is a technique where we first show that a particular condition holds for a trivial base case. Then we assume it holds for the nth case and prove that it then holds for the (n+1)th case. As an example, we show that the arithmetic series Sum k = 1 to n of k evaluates to 1/2 n(n+1). We can easily verify this for n = 1. Now, assuming it holds for n, we prove that it works for n+1 as follows:
Sum k  = 1 to n + 1 of (k)  =  Sum k = 1 to n of (k) + (n+1)
                                           = 1/2 n (n+1) + (n+1)
                                           = 1/2 (n+1)(n+2)
A caveat with proving bounds by mathematical induction is that the constant hidden by the "big-oh" could grow with n and thus may not be constant. 
2) Bounding the terms:
Sometimes a good upper bound on a series can be obtained by bounding each term of the series, and it often suffices to use the largest term to bound the others. For example, a quick upper bound on the arithmetic series (A.1) is
Sum k = 1 to n of (k) <= Sum k = 1 to n of (n) = n^2
In general, for a series Sum k = 1 to n of ak, if we let a-max = max over 1 <= k <= n of ak,
then Sum k = 1 to n of (ak) <= n·a-max
This technique of bounding each term by the largest term is a weak method when the series can be bounded by a geometric series.
For example, given a series Sum k = 0 to n of ak, suppose we can assume ak+1/ak <= r for all k >= 0, where 0 < r < 1 is a constant. The sum can then be bounded by an infinite decreasing geometric series, since ak <= a0·r^k:
Sum k = 0 to n ak <= sum k = 0 to infinity of (a0 r ^k)
                               = a0 Sum k = 0 to infinity of r ^k
                               = a0 · (1/(1 - r))
When it applies, this is much better than the quick n·a-max bound above.
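As a small illustration (my own toy example, not from the text), take a series with ratio r = 1/3 between consecutive terms; the geometric bound a0·(1/(1-r)) is far tighter than the bound that uses the largest term:

# toy series a_k = 5 * (1/3)^k, so the ratio between consecutive terms is r = 1/3
a0, r, n = 5.0, 1.0 / 3.0, 10
terms = [a0 * r ** k for k in range(n + 1)]
exact = sum(terms)
geometric_bound = a0 * (1 / (1 - r))          # a0 . (1/(1-r))
largest_term_bound = (n + 1) * max(terms)     # bound every term by the largest term
print(exact, geometric_bound, largest_term_bound)
assert exact <= geometric_bound <= largest_term_bound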
There is a caveat here: a common bug in applying this method is to show that the ratio of consecutive terms is less than 1 and then to assume that the summation is bounded by a geometric series. The method needs a constant r < 1 that bounds the ratio of every pair of consecutive terms; if no such constant exists, because the ratios become arbitrarily close to 1, the series may even diverge. This is the case for the harmonic series, where the ratio k/(k+1) gets arbitrarily close to 1.

Tuesday, April 22, 2014

Today we take a short break to discuss summation properties from the book Introduction to Algorithms by Cormen et al.
Algorithms are often analyzed using mathematical tools. The following are methods for evaluating and bounding the summations that occur frequently in the analysis of algorithms.
We denote summation with the symbol Sum.
First, we describe Linearity as
Sum(c.ak + bk) = c Sum(ak) + Sum(bk)
Next, we describe the arithmetic series: Sum k = 1 to n of k = 1/2 n(n+1) = Theta(n^2)
The sum of the squares is Sum k = 1 to n of k^2 = n(n+1)(2n+1)/6
The sum of the cubes is Sum k = 1 to n of k^3 = (n^2)(n+1)^2 / 4
The geometric series is defined as
1 + x + x^2 + x^3 + ... + x^n = (x^(n+1) - 1) / (x - 1), for x != 1
When the summation is infinite and |x| < 1, then this sum is 1 / ( 1 - x)
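A quick numerical check (illustrative Python) of the finite formula and its infinite-sum limit for |x| < 1:

x, n = 0.5, 20
finite = sum(x ** k for k in range(n + 1))                  # 1 + x + x^2 + ... + x^n
assert abs(finite - (x ** (n + 1) - 1) / (x - 1)) < 1e-12
infinite_approx = sum(x ** k for k in range(1000))          # long prefix of the infinite series
assert abs(infinite_approx - 1 / (1 - x)) < 1e-12
print(finite, infinite_approx)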
For the harmonic series:
the nth Harmonic number is
Hn = 1 + 1/2 + 1/3 + ... + 1/n  = ln n + O(1)
Telescoping series are very useful for finding patterns in seemingly different sums.
Each of the terms is added in exactly once and subtracted out exactly once.
Sum k = 1 to n of (ak - ak-1) = an - a0
For example, the telescoping sum Sum k = 1 to n-1 of 1/(k(k+1)) = 1 - 1/n, since 1/(k(k+1)) = 1/k - 1/(k+1).
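A quick check (illustrative Python) of this telescoping example:

n = 50
telescoped = sum(1.0 / (k * (k + 1)) for k in range(1, n))   # k = 1 .. n-1
assert abs(telescoped - (1 - 1.0 / n)) < 1e-12
print(telescoped, 1 - 1.0 / n)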
Integrating and differentiating series are also commonly encountered. These are handled in the following manner:
For example, if we differentiate both sides of the infinite geometric series (A.6) and multiply by x, we get
Sum k = 0 to infinity of  k. (x^k) = x / (1-x)^2
for |x| < 1
Additional formulas can be obtained by integrating or differentiating the formulas above.
This makes integrating and differentiating series useful.
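As an illustrative check, summing a long prefix of k·x^k for a small |x| matches x/(1-x)^2 closely:

x = 0.3
approx = sum(k * x ** k for k in range(200))   # long prefix of Sum k.(x^k)
assert abs(approx - x / (1 - x) ** 2) < 1e-12
print(approx, x / (1 - x) ** 2)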
Products can be expressed as summations.
For example, given the finite product a1·a2·...·an,
we can convert a formula with the product into a formula with a summation by using the identity
lg(product(a1..an)) = sum of k = 1 to n (lg ak)
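A small sketch (illustrative Python, using math.prod from the standard library) confirming the identity for an arbitrary list of positive factors:

from math import log2, prod

a = [3, 7, 2, 9, 5]                        # arbitrary positive factors
lhs = log2(prod(a))                        # lg of the product
rhs = sum(log2(ak) for ak in a)            # sum of the lgs
assert abs(lhs - rhs) < 1e-12
print(lhs, rhs)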

Monday, April 21, 2014

In this post, we summarize the readings on WinPcap. WinPcap is a packet-capturing framework built for Windows that is similar to libpcap but can be considered an improvement over it. The three primary components of a BSD capture implementation are: a BPF filter, a network tap, and a user-mode library for applications.
The BPF filter had a pair of swapping buffers. This has been improved upon with a ring buffer in the equivalent NPF and fewer context switches, and it includes a delayed-write capability. The buffers are in kernel mode and are able to use more memory than before.
The network tap talks to NDIS and has high performance. It originally did not have access to lower-level protocol packets, but that has since changed. The WinPcap stack has a packet module for user-mode programmability of the kernel-mode NPF. The libpcap interface exposed to users talks almost exclusively to this packet module. Packet filters can be specified in a similar syntax and are evaluated by the NPF. The number of packets delivered to the application depends on the user-mode buffer made available, and as such the stack can support real-time operation. Tests that exercised the overall code path and that dumped packets to a file to measure packet loss validated the design choices.
In our posts we will investigate packet capture input on Windows for Splunk.
We continue with our discussion on WinPcap today. We previously talked about how packets flow all the way to the user application; we now summarize the overall results. In the tests discussed, a packet-generation process is able to load the network. The capturing process has an excellent implementation and outperforms the original BPF/libpcap implementations. The overall performance of WinPcap was evaluated with an end-to-end flow; however, the test that dumps all packets to a file is more interesting to the user. The tests also confirm that other parts of the OS may have an importance far larger than the packet-capture components. FreeBSD seems to perform worse than Windows when comparing the packet-capture process. In the studies, WinPcap was used with the standard kernel buffer in the presence of heavy traffic; the size of this buffer can be increased by the application through a simple function call, noticeably improving the overall performance of the system.
The tests overall validate the architectural choices, such as the use of a circular kernel buffer instead of the original buffering, the delayed-write implementation, which looks at a few bytes of the packet and copies the entire packet during a single call, reducing the number of context switches, and lastly the update-space-during-copy operation in the kernel buffer.
Among the platforms supported during the study, Windows 2000 was found to be the best one for high-performance network analyzers, while FreeBSD did not perform as well. A large kernel buffer, by itself, does not seem to influence the performance of the capture process.
WinPcap has proved to be an excellent choice for the many applications that rely on high-performance packet capture.