Cluster computing

Wednesday, April 23, 2014

Today also we continue the discussion on the summation properties:
There are many techniques for bounding the summations that describe the running times of algorithms. We will outline a few methods:
1) Mathematical induction :
This is a technique where we first show that a particular condition holds for a trivial case. Then we assume it holds true for nth case and prove that it then holds for n+1 th case. As an example, we show that the arithmetic series sum of k from 1 to n evaluates to 1/2 n (n+1). We can easily verify this for n = 1. Now for the nth case, the assumption holds and we prove that it works for n+1 as follows:
Sum k = 1 to n + 1 of (k) = Sum k = 1 to n of (k) + (n+1)
                                           = 1/2 n (n+1) + (n+1)
                                           = 1/2 (n+1)(n+2)
A caveat with proving bounds by mathematical induction is that the constant hidden by the "big-oh" could grow with n and thus may not be constant.
2) Bounding the terms:
Sometimes a good upperbound in a series can be obtained by bounding each term of the series, and it often suffices to use the largest term to bound the others. For example, a quick upper bound on the Arithmetic series (A.1) is
Sum k = 1 to n of (k) is <= Sum k = 1 to n of (n) = n ^ 2
In general for a series Sum k = 1 to n of ak , if we let a-max = max 1<=k<=n ak,
then sum k = 1 to n (ak) <= n a-max
This technique of bounding each term by the largest term is a weak method when the series can be bounded by a geometric series.
For example given the arithmetic series as above, we can assume ak+1/ak <= r for all k > =0, where 0 < r < 1 is a constant. The sum can be bounded by an infinite decreasing geometric series, since ak <= a0 r^k
Sum k = 0 to n ak <= sum k = 0 to infinity of (a0 r ^k)
                               = a0 Sum k = 0 to infinity of r ^k
                               = a0.(1/1-r)
This is better than bounding with the quick upper bound on the arithmetic series.
There is a caveat here: a common bug in applying this method is to show that the ratio of consecutive terms is less than 1 and then to assume that the summation is bounded by the geometric series because the series could diverge i.e. the r is not less than 1 or it is not constant or the ratio of some or all pairs of consecutive terms could be may become arbitrarily close to 1. This is the case in the harmonic series for example where the r is arbitrarily close to 1.

Tuesday, April 22, 2014

Today we take a short break to discuss summation properties from the book on Algorithms by Cormen et al.
Algorithms are analyzed often on mathematical tools. The following are the methods for evaluating and bounding summations which occur frequently in the analysis of algorithms.
we denote the summation with the symbol SUM,
First, we describe Linearity as
Sum(c.ak + bk) = c Sum(ak) + Sum(bk)
Next, we describe arithmetic series as 1/2n(n+1) = Theta(n^2)
Sum of the squares is defined as n(n+1)(2n+1)/6
Sum of the cubes is defined as (n ^ 2 )(n +1) ^2 / 4
The geometric series is defined as
1 + x ^ 2 + x ^ 3 + ... x ^ n = (x ^ (n+1) - 1) / (x - 1)
When the summation is infinite and |x| < 1, then this sum is 1 / ( 1 - x)
For Harmonic series :
the nth Harmonic number is
Hn = 1 + 1/2 + 1/3 + ... + 1/n = ln n + O(1)
Telescopic series are very useful to find patterns in seemingly different numbers.
Each of the terms is added in exactly once and subtracted out exactly once.
Sum k = 0 to n - 1 of (ak - ak-1) = an - a0
For example telescopic Sum k = 1 to n-1 of 1/(k(k+1)) = 1 - 1/n
Integrating and differentiating series are alse commonly encountered. These are handled in the following manner:
For example, if we differentiate both sides of the infinite geometric series(A.6) and multiplying by x, we get
Sum k = 0 to infinity of k. (x^k) = x / (1-x)^2
for |x| < 1
Additional formulas can be obtained by integrating or differentiating the formulas above.
This makes integrating and differentiating series useful.
Products can be expressed in summations.
For example, if we take the finite product a1.a2....an
we convert a formula with the product to a formula with a summation by using the identity
lg(product(a1..an)) = sum of k = 1 to n (lg ak)

Monday, April 21, 2014

In this post, we summarize the readings on winpcap. First winpcap is a packet capturing framework built for windows that is similar to libpcap but can be considered an improvement The three primary components of a BSD implementation : a BPF filter, a Network tap and a user mode library for applications.
The BPF filter had a pair of swapping buffers. This has been improved on with a ring buffer in the equivalent NPF and fewer context switches. This includes the delayed write capability.The buffers are in kernel mode and are able to access more memory than ever before.
The network tap talks to NDIS and has high performance. It didn't have access to lower level protocols packets but that has changed since. The winpcap stack has a packet module for user mode programmability of kernel mode NPF. The libpcap Interface to users talks almost exclusively to this packet module. Packet filtering can be specified in similar syntax and these are evaluated by the NPF. The packets delivered to the application depends on the user mode buffer made available and as such can support real time operations. Tests to exercise the overall code path and to dump packets to file to measure packet loss validated the design choices.
In our posts we will investigate packet capture input on Windows for Splunk.

We continue with our discussion on WinPCap today. We talked about how the application behaved in the data flow all the way to the user application. We will now talk about the overall summary. In the tests discussed, we have the packet generation process which is able to load the network. The capturing process has an excellent implementation and it outperforms the original BPF /libpcap implementations. The overall performance of the winpcap was evaluated with an end to end flow however the test that dumps all packets to a file is more interesting to the user. The test confirms also that other parts of the OS may have an importance that is far larger than the packet capture components. FreeBSD seems to work poorly than Windows when comparing the packet capture process. In the studies, WinPCap has been used with the standard kernel buffer in the presence of heavy traffic and the size of this buffer can be increased by the application through a simple function, improving noticeably the overall performance of the system. This can improve the performance even more.
The tests overall validate the architectural choices such as the use of circular kernel buffer instead of the original buffering, the delayed write implementation which looks at a few bytes of the packet and copies the entire packet during a single call reducing the number of context switches and lastly, the update-space-during-copy operation in the kernel buffer.
Among the supported platforms during the study, Windows 2000 was found the best one for high-performance network analyzers while FreeBSD did not perform as well. A large size for the kernel buffer does not seem to be able to influence the performance of the capture process.
WinPCap has been proved being an excellent choice for the several applications that are based on high-performance packet capture.

Sunday, April 20, 2014

We will continue to look at winpcap today. The sending process, the network tap and filtering process are all comparable between BPF and NPF but the windows operating system is faster at handling hardware interrupts and in all the operations made by the NIC driver and NDIS code. In order to test the overall flow through the system, an application that calls the packet capture mechanism but discards all the packets was used. This test evaluated the entire WinPCap architecture, including copy process from the interface driver to the kernel buffer and then to the user buffer. There were no filters in this test and all packets received by the tap were delivered to the application. There was no packet loss. The delayed write capability that lets the kernel to wait for a minimum amount of data and to copy a large block of data to user space in a single system call tremendously helped with this test.
Another test aimed to dump all the packets to a file. When the network is overloaded, the systems suffer noticeable losses when the cpu time is not available i.e a new packet arrives when the tap was processing an earlier one. or when there is no space in the kernel buffer. In this test, there was a non-negligible worsening when the whole packet is dumped to file.
An adhoc program was used to test the monitoring capabilities of WinPCap. The test confirmed that the CPU load is considerably low and that the results match the ones earlier. The additional cost of the monitoring code degrades the user level application results since it requires a non-negligible amount of memory.

We will next look at some portability considerations and performance concerns with WinPCap. Porting of libpcap to winpcap was made easier because BPF and NPF have similar interfaces. There are some system calls in Unix that don't have a direct mapping to the winsock library and so these were written with windows dependent code. The porting resides mostly in the WPCap module. This as we discussed uses the packet module methods instead of the NPF. WinPCap has some differences from libPCap because of Windows dependent code. For example, Win32 applications cannot use the select function on NPF device in order to know if there are packets that needs to be read so winpcap implements a new event.
To test the performance, two machines were used one as a sender and another as a receiver. The sender generates the traffic and the receiver captures the packets. The packets were made such that it would generate the maximum amount of packets per second since this is the worst case for a network analyzer. Packet size of 88 bytes showed the maximum number of packets. The minimum packet size on Ethernet is 64 bytes. Ethernet load was full at packet size of 400 bytes.
Tests evaluated the performance of both sending process as well as filtering process. Packets are received by the network tap and checked by the filter. No packets are expected to match the filter so this utilizes the NPF as much as possible. Results show that windows flavors have similar behavior and almost all the packets are received and filtered by NPF.

Saturday, April 19, 2014

We continue to review the WinPCap today. We talked about the statistics mode. Next we talk about the packets injection mode. Both BPF and NPF have write capabilities that allow the user sending raw packets to the network. However, libpcap did not support this, thus BPF was never used for this purpose. Unix applications uses raw sockets instead. Win32 has very limited support for raw sockets. This is where WinPCap provides a standard and consistent set of functions for packet injection. The packet data is copied to the kernel buffer and then sent to the network over the NDIS.
There is a well-known libnet packet assembly library that was used to add a layer for packet construction and injection on top of WinPCap.
We now talk about the packet.dll which is one of the modules that comes with WinPCap and provides a common system independent API to enable programs to capture packets on a variety of windows operating system flavors, architectures, editions and versions. Packet.dll includes several additional functionalities such as low level operations on the network adapters and the dynamic loading of the drivers and some hardware counters.
Note that packet.dll interfaces only with the Network packet filter and does not partiticpate in statistics mode directly off the NDIS.
Besides the NPF and the packet module, the third module that is not OS dependent is the WPcap.dll. This has high level functions such as filter generation and user level buffering, plus advanced features such as statistics and packet injection. This set of API is more granular than the low level APIs exposed by the packet module where there was almost a one to one mapping between the APIs and the kernel mode calls. The higher level functions of WPCap module are more user-friendly and a single call can translate to several packet module calls.
Win32 network architecture relies on NDIS which is the Network Driver Interface Specification which handles the interaction between the NIC drivers and protocol drivers. Internally, NDIS creates an artificial world of its own abstracting the protocol drivers from the complexity of the hardware. This implies that the same protocol stack can now work with a variety of technologies.
While BPF requires NIC device drivers, NPF does not have the same luxury as a BPF driver specification. Instead NPF locates the network tap as protocol driver on top of the NDIS structures. This pseudo device works the same way as virtual private networking protocol drivers. Due to its location, NPF has the restriction that it cannot tap the lower level packets that do not reach the NDIS upper layer as for example point to point protocol including link control protocols, network control protocols including auth and encryption etc. As a comparision, the Packet Socket in a Linux kernel also suffers from a similar problem.
Note that NPF system calls are all blocking in nature. Besides, NDIS is performant in that it doesn't copy the entire packet if no protocol drivers require it. Thus NPF provides a clean and isolated plugin for an efficient packet capture on commercial systems.