Wednesday, April 30, 2014

Today I'm going to elaborate on a noninvasive method of reading MSMQ to trigger alerts. Let us say we keep a round robin of buffers to read from different queues - one round robin per queue or one per rule/application. We will discuss the consumers of these round-robin buffers shortly, but first we describe what they are. Essentially, we read the queues the same way TCP maintains a sliding window: the queues are read in the order in which the messages are processed. As each message arrives, it is evaluated against the rules to invoke the corresponding action. The same could have been done by the queue processor; the only difference is that this is now handled externally to it. The user of the queue alerts module could directly subscribe to the events and provide the delegates as necessary, in which case there is no need for a singleton delegate. However, the queue alerts module facilitates the subscription to the events by letting the user focus exclusively on the rules. In addition, the queue alerts module provides more functionality. First, it translates all the registered rules so that messages can be filtered efficiently. Second, it manages the lifetime of the messages and the actions performed on them. Third, it makes the packets available to the applications in a streaming mode.
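As a rough sketch of the registration surface such a module might expose - the QueueAlertsModule type, the Register method, and the delegate names here are hypothetical, not an existing API:

    using System;
    using System.Messaging;

    // Hypothetical shapes: a rule pairs a condition over a message with an action.
    delegate bool AlertCondition(Message message);
    delegate void AlertAction(Message message);

    class Rule
    {
        public AlertCondition Condition { get; set; }
        public AlertAction Action { get; set; }
    }

    class QueueAlertsModule
    {
        // Hypothetical registration: the user supplies only rules; the module
        // wires them to the queue events and buffers internally.
        public void Register(string queuePath, Rule rule) { /* ... */ }
    }

    class Example
    {
        static void Main()
        {
            var alerts = new QueueAlertsModule();
            alerts.Register(@".\private$\orders", new Rule
            {
                Condition = m => m.Label.StartsWith("urgent"),
                Action = m => Console.WriteLine("Alert: " + m.Id)
            });
        }
    }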
The rules-to-events mapping is avoided by having the rules evaluate against each message. This means that all the rules registered by a single application are evaluated at once on the arrival of a message. If the rules evaluate positively, the message is copied to the buffer for the application to read. The messages are copied only as fast as the applications are reading. Messages are not copied to the applications if they are smaller than a configurable size threshold; it is better to provide a delayed write in such a case. If the application provides a small buffer, the messages can be copied more often, as if in real time. There can also be a timeout value, which handles the case when messages are not available. The flow of data is unidirectional from the source to the application. The queue alerts module focuses on the buffers and the operations on them. If the buffers are kept per queue, the module can handle bursts in traffic. As each message is pulled from the buffers, it is evaluated and acted upon either by the alerts module or by the application.
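A minimal sketch of the delayed write, assuming a configurable byte threshold and timeout (all names here are illustrative):

    using System;
    using System.Collections.Generic;

    // Messages accumulate and are flushed to the application's buffer
    // either when enough bytes are pending or when the timeout expires.
    class DelayedWriter
    {
        private readonly int minBytes;       // flush when this many bytes are pending
        private readonly TimeSpan timeout;   // ... or when this much time has passed
        private readonly List<byte[]> pending = new List<byte[]>();
        private int pendingBytes;
        private DateTime lastFlush = DateTime.UtcNow;

        public DelayedWriter(int minBytes, TimeSpan timeout)
        {
            this.minBytes = minBytes;
            this.timeout = timeout;
        }

        public void Write(byte[] message)
        {
            pending.Add(message);
            pendingBytes += message.Length;
            if (pendingBytes >= minBytes || DateTime.UtcNow - lastFlush >= timeout)
                Flush();
        }

        private void Flush()
        {
            foreach (var m in pending) { /* copy m to the application's buffer */ }
            pending.Clear();
            pendingBytes = 0;
            lastFlush = DateTime.UtcNow;
        }
    }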
In both the invasive mode (the queue processor calls the delegates) and the non-invasive mode (the queue alerts module calls the delegates), alerts are raised for the registered rules. Additionally, the queue alerts module may mix and match delegates from different applications against each of the queues. As applications update the delegates or the conditions, the queue alerts module reassigns the delegates across the queue buffers; otherwise it would have to evaluate the conditions of every application's delegates for every message.

Tuesday, April 29, 2014

Today we will refine the solution to the scalable queue alert design. If we take the approach that we want to subscribe to the events generated by the queue processor, then we must connect the events to the delegates. The event handling mechanism works by subscribing to the events with the Observer design pattern. Each observer of an event provides a callback called Notify(), and the subject has a method called NotifyObservers() that calls Notify() on each of the observers. Delegates are these callbacks. When the queue processor finds an event to raise at any point during its execution, the subscribers to the event know that the state changed because the raise method notifies all the registered subscribers. This is a behavior, or interface, that the events implement.
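In C#, events and delegates give this pattern directly; a minimal sketch with illustrative names:

    using System;

    class QueueProcessor
    {
        // The subject: raising this event is the NotifyObservers() call.
        public event EventHandler MessageArrived;

        public void RaiseMessageArrived()
        {
            var handler = MessageArrived;
            if (handler != null)
                handler(this, EventArgs.Empty);   // Notify() on every subscriber
        }
    }

    class Example
    {
        static void Main()
        {
            var processor = new QueueProcessor();
            processor.MessageArrived += (s, e) => Console.WriteLine("observer 1 notified");
            processor.MessageArrived += (s, e) => Console.WriteLine("observer 2 notified");
            processor.RaiseMessageArrived();
        }
    }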
The events are not hierarchical. They are discrete. The events are encapsulated in the queues so that they are raised when the state of the queue or the message changes.
The delegates at this level of handling can then invoke the user-level delegates that are registered for certain events. Events are generic, but the queues that they belong to are specific. When the user specifies a filter, it may apply to one or more queues, and delegates may need to be added to all of these queues. If the mapping between delegates and queues is not clear from the filter, such as when the filter is based on a message attribute, the delegate is added to all the queues, and the delegates then decide based on the messages whether to take any action. In other words, the user-level delegates may subscribe to as many or all events and then take the appropriate action given the queue, the message, and the state. This means there can be a single user-level delegate that takes different actions based on different rules. In such a delegate, there would be several successive conditional checks.
We say that the rules are encapsulated in a single user-level delegate, and this delegate is wired to all the events raised. When an event is raised, we have the queue information, the message it was acting on, and the state, such as arrival, process begin, process complete, or depart.
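A minimal sketch of such a single user-level delegate with successive conditional checks (the state enumeration and handler signature are illustrative):

    using System;
    using System.Messaging;

    enum QueueEventState { Arrival, ProcessBegin, ProcessComplete, Depart }

    class UserRules
    {
        // One delegate wired to all events; each rule is a successive conditional check.
        public static void Handle(string queuePath, Message message, QueueEventState state)
        {
            if (state == QueueEventState.Arrival && message.Label.Contains("error"))
                Console.WriteLine("rule 1: log " + message.Id);

            if (state == QueueEventState.Depart && queuePath.EndsWith("orders"))
                Console.WriteLine("rule 2: launch script for " + message.Id);
        }
    }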
In the queue alerts module, if we take the approach that we select the messages and the queues and store the individual rules to map against, we have a different data structure altogether. Here we get, in a delta sweep, all the messages that are of interest to the rule evaluation and their corresponding actions, so we store a copy of the messages and queues outside of the queue processor. The mapping between different sets of messages for different rules is the purpose of this data structure. As such, we could use a different list for each of the rules.

The design criteria for the queue alert module mentioned in the previous post include:
Support for retries by the queue processor: The design of a queue alert module must consider the retries by the queue processor for the same message. All exception paths including dead letter queues should be considered for the same treatment.
Non-invasive: When possible, we should consider a non-invasive approach that doesn’t require instrumentation of the queues. In other words, it can work for any version of the queues and doesn’t affect the queue processing. It could work by sniffing the data changes or the logs.
Polling: Any polling approach must be robust and must avoid high CPU usage during its processing.
Support for transactional as well as non-transactional messages: The alerts module must work for both kinds of messages so that the user can specify the criteria and not be limited to only one set of messages. Concurrent processing of both kinds of messages must be supported.
Support for distributed transactions: When transactions involve messages across queues, this alert module should enable evaluating those messages as part of the transaction or at least log the transaction and the actions taken with the transactions so that the final state can be determined by the alerts module.
Support for clusters: The queues may not all be local to a machine and could be distributed on different nodes in a cluster or they may all be in a failover cluster. Alert module should target the messages and the queues and even the machines.
Scoping of alerts: Alerts need not be registered at the message level. They could be registered at the queue level or at the machine level. Whichever level of the hierarchy is chosen, the outer scope takes care of all the alerts at the inner scopes. This means that CRUD operations on the alerts at queue scope automatically perform the same at message scope.
Changes to the rules or registration of alerts: Alerts registered with the alerts module will not take effect until the system reconfigures. This enables the changes to the alerts to be picked up for processing by the module and gives time for setup and cleanup operations by the module.
Deployment: The alerts module should come in a standalone service or executable so that it can be an add-on to existing queue processing. The module itself could be deployable by copying or via an installer.
Automatic administration of rules, actions, messages and queues could be performed where possible.
The use of message format: When interacting with the queues to read the messages, the alerts module will evaluate the fields of the messages against the criteria specified in the rules by the user. The message format should not be opaque and as in the case of MSMQ should expose known fields for evaluation against the rules.
Control of concurrency: The alerts module could make the invocation of actions registered with the rules concurrent, so that the evaluation of an action for a message does not block the others.
Full-text or key-value search over message body: The expressions to search over the text of the messages could be resource intensive and optionally enabled. Rules to perform such search could be outside the alerts mechanism and done with the help of an indexer. As such this may not be in scope for the alerts module.
Text messages versus binary messages: The alerts module should support both formats. The module should rely on the fields versus the contents. Subsequent processing of say JSON vs. XML text could be offloaded to other systems.
Asynchronous communication mechanism: This could be enabled between producers and consumers so that they don’t block each other.

Performance: Volumes of hundred thousand transactions per submission that reach millions of transactions per day and involve several messages across different queues should be targeted. Working on a set of few messages or queues or rules or alerts at a time could enable this.

Monday, April 28, 2014

Today we will look at scalable queue alert design. I describe a method to evaluate multiple queues for processing. Queues can have multiple messages. Messages can have different attributes. A user would like to author rules for actions on queues based on attributes of both queues and messages. When a message arrives in a queue, it is evaluated against all the rules authored by the users so that the corresponding actions can be taken. Rules comprise conditions and actions. Conditions are expressions based on attributes and logical operators. An action can be any one of a set of predetermined actions, such as running a script or logging. The rules are specified in a user-defined function, which helps the user manage the rules. The rules are all evaluated against each message of each queue. This means that the attributes have to be deterministic, free of side effects, and easy to look up.
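For illustration only, a condition might combine queue and message attributes with logical operators, and an action might be one of the predetermined ones (the attribute choices here are made up):

    using System;
    using System.Messaging;

    class RuleParts
    {
        // Condition: an expression over attributes and logical operators;
        // deterministic, no side effects, cheap to look up.
        static bool Condition(string queuePath, Message m)
        {
            return queuePath.Contains("billing") && m.Priority == MessagePriority.High;
        }

        // Action: one of a set of predetermined actions, here logging.
        static void LogAction(Message m)
        {
            Console.WriteLine("matched: " + m.Id);
        }
    }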
When we scale out the queues, we are going to evaluate these rules based on each of the queues. When we process the messages we may do one after the other across queues. This means the user defined rules can work the same across queues and messages. 
The rules evaluation for any message in any queue will evaluate to one or more of the actions. The default action is a no-op, which is not specified explicitly. No-op here means that no additional actions will be triggered other than the default message processing by the queue. The alerting mechanism is independent of the queue processing and is checked right after the message is processed. This could be done right before the message processing, but it is only when the message is processed that we know the current message has been handled.
The queue alert mechanism can live outside the queue processing service. This implies that the queue alert mechanism can be used for journaling in a non-invasive manner. The action corresponding to the queue processing could be to log the messages.
Another use of the queue alert mechanism is to enable different actions to be specified for these messages. For example, action could be to launch a script for selected messages instead  of all messages. Scripts could trigger additional workflows
The trigger mechanism needs to be kept up to date with the queues. If queues are added or deleted, then the rules may need to be redefined. Evaluation of stale rules should default to a no-op, which ensures that message processing continues.

Sunday, April 27, 2014

Today we look at some more usages of random variables. We mentioned so far that random variables can be combined. We know that random variables can be independent, and for the different values a random variable can take, the average value gives a good indication of the summary. Let us take an interesting application of this technique in a hiring problem. Let us say you wanted to hire an office assistant. We can use indicator random variables with this. Let us first describe the problem. When you hire an office assistant, you may have to interview some candidates. You want to hire a suitable candidate, but actually hiring somebody has additional costs: you have to fire the current assistant, and you must pay a large hiring fee to the employment agency that is sending the candidates. You are interested in estimating this price. This is the hiring problem.
This is written as follows:
int HireAssistant(int[] candidates)
{
    int best = -1;   // score of a least-qualified dummy candidate
    int hired = -1;  // index of the candidate currently employed
    for (int i = 0; i < candidates.Length; i++)
    {
        int score = interview(candidates[i]);
        if (score > best)
        {
            best = score;
            hired = i;
            hire(candidates[i]);  // fire the current assistant and hire this one
        }
    }
    return hired;
}
We now use probability to analyze this problem. In order to do that, we must use the assumptions about the distribution of inputs. Then we analyze our algorithm and compute an expected run-time.
Since we take the distribution over the inputs, we are averaging the running time over all possible inputs.
We use probabilistic analysis when we can make assumptions about the distribution of inputs, i.e., we can assume something about the set of all possible inputs, both for designing an efficient algorithm and as a means of gaining insight into the hiring problem. For the hiring problem, we can assume that the candidates come in random order. This means we can compare any two candidates and decide which one is better; in fact, we can use this fact to establish a distinct ranking of the candidates.
An indicator random variable associated with an event A is defined as 1 if the event occurs and 0 otherwise.
Let us determine the expected number of successes for the interviews. Our sample space has two outcomes, Success and Failure, and we define a random variable that takes one of these two values with equal probability. We can then define an indicator random variable Y for the event that the outcome is a Success. The expected number of successes obtained in one interview is simply the expected value of our indicator variable.
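Written out for an indicator Y of a Success that occurs with probability 1/2:

$E[Y] = 1 \cdot \Pr[\text{Success}] + 0 \cdot \Pr[\text{Failure}] = 1 \cdot \frac{1}{2} + 0 \cdot \frac{1}{2} = \frac{1}{2}$

In general, for an indicator random variable X_A of an event A, E[X_A] = Pr[A].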

Saturday, April 26, 2014

In today's post we discuss discrete random variables from the textbook we have been referring. A random variable X is a function from a finite or countably infinite sample space S to the real numbers. It associates a real number with each possible outcome of an experiment, which allows us to work on probability distribution induced on the resulting set of numbers. These variables can also be defined for uncountably infinite sample spaces but we will only look at random variables that are discrete.
For a random variable X and a real number x, we define the event X = x to be {s in S : X(s) = x}; thus Pr[X = x] = Sum over {s in S : X(s) = x} of Pr[s].
The function f(x)  = Pr[X = x] is the probability density function of the random variable X
Per the definitions of probabilities we know that Pr[X = x] >= 0
and  that the sum of the individual probabilities is equal to 1.
If we take the example of a pair of dice with six possible outcomes each, and we define a random variable X to be the maximum of the two values showing on the dice, then we have
Pr[X = 3] = 5/36
because there are 36 possible outcomes when we take the values in pairs, and the event X = 3 contains the 5 outcomes (1,3), (2,3), (3,3), (3,2), (3,1).
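A quick enumeration confirms the count; a small sketch that tabulates the distribution of the maximum of two dice:

    using System;

    class DiceMax
    {
        static void Main()
        {
            var counts = new int[7];                 // counts[v] = outcomes with max = v
            for (int d1 = 1; d1 <= 6; d1++)
                for (int d2 = 1; d2 <= 6; d2++)
                    counts[Math.Max(d1, d2)]++;
            for (int v = 1; v <= 6; v++)
                Console.WriteLine("Pr[X = " + v + "] = " + counts[v] + "/36");
            // Prints Pr[X = 3] = 5/36, matching the hand count above.
        }
    }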
It is common for several random variables to be defined on the same sample space.
If there are two random variables defined on the same sample space, say X and Y
then their co-occurrence has a probability distribution function that is
Pr [ X = x and Y = y] which is the joint probability distribution.
If we fix one of the values, we can vary the other and this can be summed.
For a fixed value y, Pr[Y = y] = Sum of all x Pr[X=x and Y = y]
The same goes for a fixed value of x, where we can vary y.
We can extend this to conditional probabilities as well. For example,
Pr[X = x | Y = y]  = Pr [ X = x and Y = y] / Pr [Y = y]
We can say that two random variables x and y are independent if for all x and y
the events X = x and Y = y are independent which we can express as
Pr[ X = x and Y = y] = Pr [X = x].Pr[Y = y]
The simplest summary of the distribution of a random variable is the average of the values it takes.
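That average is the expected value, the probability-weighted sum of the values:

$E[X] = \sum_{x} x \cdot \Pr[X = x]$

For the two-dice maximum above, E[X] = (1*1 + 2*3 + 3*5 + 4*7 + 5*9 + 6*11)/36 = 161/36, or about 4.47.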

Friday, April 25, 2014

Today we look at some counting theories:
Counting theories explain how many without actually enumerating how many. This is very helpful when it is not only daunting to count a set of items but also when it is difficult to make the set.
Consider for example how many different ways can we arrange n distinct elements ?
We review some of the elements of counting theory.
A set of items that we wish to count can sometimes be expressed as a union of disjoint sets or as a Cartesian product of sets.
The rule of sum says that the number of ways to choose an element from one of two disjoints sets is the sum of the cardinalities of the sets.
The rule of product says that the number of ways to choose an ordered pair is the number of ways to choose the first element times the number of ways to choose the second element.
We look at them in detail now.
If A and B are two finite sets with no members in common, then the number of ways to choose an item from one of the sets is the sum of the counts of items in the two sets. For example, each position on a license plate may hold either a letter or a digit. Since there are 26 letters and 10 digits, there are 26 + 10 = 36 ways to fill each position. We can extend this to multisets, and the answer does not change because it depends only on the cardinalities.
If we use the same sets A and B, the number of ways to choose an ordered pair is the number of ways to choose the first element times the number of ways to choose the second. For example, an ice cream shop with 28 flavors and 4 toppings can mix and match to give 28 * 4 = 112 different ice creams.

We sometimes need to bound the size of a binomial coefficient. For 1 <= k <= n, we have the lower bound
(n choose k) = n(n-1)...(n-k+1) / (k(k-1)...1) >= (n/k)^k
Using the inequality k! >= (k/e)^k, derived from Stirling's approximation, we obtain the upper bounds
(n choose k) = n(n-1)...(n-k+1) / (k(k-1)...1) <= n^k / k! <= (en/k)^k
For all 0 <= k <= n, we can use mathematical induction to prove the bound
(n choose k) <= n^n / (k^k (n-k)^(n-k))
where for convenience we assume that 0^0 = 1. For the trivial case k = 0, we have
(n choose 0) = 1 <= n^n / n^n = 1
Assuming the bound holds for k, we look at k+1:
(n choose k+1) <= n^n / ((k+1)^(k+1) (n-k-1)^(n-k-1))
which we compare against n^n / (k^k (n-k)^(n-k)) to verify that the bound continues to hold for k+1.
For k = lambda*n, where 0 <= lambda <= 1, this bound can be rewritten as
(n choose lambda*n) <= n^n / ((lambda*n)^(lambda*n) * ((1-lambda)*n)^((1-lambda)*n))

Thursday, April 24, 2014

We look at approximation by integrals today. When a summation is expressed in terms of a function f(k) that is monotonically increasing, we can approximate the summation by integrals, because the integral measures the area under the curve in slices. By comparing the summation of rectangles to the area under the curve, we can get a pretty good idea of the bounds of the function. Moreover, this applies to both monotonically increasing and monotonically decreasing curves.
In the case of the monotonically increasing function on a graph from left to right, the area of the slice on the left of a chosen slice will be lesser or equal and the slice on the right of the chosen slice will be higher or equal. That is we have a generic three partitions of the ranges and we can show this for any of the repeating three slices.
The integral approximation gives a tight estimate for the nth harmonic number. For a lower bound, we obtain
Sum of k = 1 to n of (1/k) >= Integral from 1 to n+1 of (1/x) dx = ln(n + 1)
because each slice can be of unit width. For the upper bound, we derive the inequality
Sum of k = 2 to n of (1/k) <= Integral from 1 to n of (1/x) dx = ln(n)
again based on unit-width slices. Together these yield the bounds on the harmonic series: ln(n + 1) <= Sum of k = 1 to n of (1/k) <= ln(n) + 1.
We note that the total area of the rectangles approximates the value of the summation, while the integral is represented by the area under the curve. Comparing the areas for the lower and upper bounds shows the rectangles bracketing the curve from one side in each case. For a monotonically increasing f, we can write the comparison as
Integral from m-1 to n of f(x) dx <= Sum k = m to n of f(k)
and, by shifting the rectangles one position to the right, we establish
Sum k = m to n of f(k) <= Integral from m to n+1 of f(x) dx.
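A quick numerical check of the harmonic bounds above, as a sketch:

    using System;

    class HarmonicBounds
    {
        static void Main()
        {
            int n = 1000;
            double h = 0;
            for (int k = 1; k <= n; k++) h += 1.0 / k;   // nth harmonic number
            Console.WriteLine(Math.Log(n + 1) + " <= " + h + " <= " + (Math.Log(n) + 1));
            // For n = 1000: 6.909... <= 7.485... <= 7.907...
        }
    }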

Wednesday, April 23, 2014

We will now consider some other techniques in addition to the previous post. One way to obtain bounds on a difficult summation is to express the series as the sum of two or more series by partitioning the range of the index and then to bound each of the resulting series. For example, suppose we try to find a lower bound on the arithmetic series we had seen earlier. If the number of terms is even, the series can be split into two partitions. The lower bound has to be greater than or equal to the bound obtained from the upper half alone; the lower half can be ignored because each term in the upper half is at least as large as the term at the corresponding position in the lower half. In other words, the initial terms of the summation can be treated as constants. The sum of the upper half, computed as an arithmetic series, is at least (n/2)^2. This gives us a lower bound of Omega(n^2), which is an asymptotically tight bound because the arithmetic sum of those n numbers is big-oh(n^2).
The caveat here is that if we had chosen to bound each term by the smallest term, because that term happens to be 1, we may have got a lower bound of n which would not be near the better bound we found.
This technique tells us that when performing an analysis of an algorithm, we can split the summation and ignore a constant number of initial terms and is generally applicable when each term is independent.
Another way this technique can help us is with infinite series.
For example, to find an asymptotic upper bound on Sum k = 0 to infinity of ((k^2) / 2^k), we observe that the ratio of consecutive terms is at most 8/9 once k >= 3.
The summation can therefore be split into a fixed set of initial terms (here, the terms up to k = 2) and the rest (k = 3 onwards to infinity).
The latter can be bounded above by (9/8) * Sum k = 0 to infinity of ((8/9)^k), since the first of those terms is 9/8 and each successive term shrinks by a factor of at least 8/9.
In other words, the total of such an infinite series is a constant.
This technique can also work for more difficult series such as harmonic series.
In the harmonic series Sum k = 1 to n of (1/k), we split the range into lg n pieces and upper-bound the contribution of each piece by 1. Each piece consists of the terms starting at 1/(2^i) and going up to, but not including, 1/(2^(i+1)); there are 2^i such terms, each at most 1/(2^i), so each piece sums to at most 1. With each of the roughly lg n + 1 pieces contributing at most 1, we get the upper bound lg n + 1 for the harmonic series.
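Written out, the splitting argument is:

$\sum_{k=1}^{n} \frac{1}{k} \le \sum_{i=0}^{\lfloor \lg n \rfloor} \sum_{j=0}^{2^i - 1} \frac{1}{2^i + j} \le \sum_{i=0}^{\lfloor \lg n \rfloor} \sum_{j=0}^{2^i - 1} \frac{1}{2^i} = \sum_{i=0}^{\lfloor \lg n \rfloor} 1 \le \lg n + 1$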
We will next look at approximation by integrals.
Integrals are another technique for algorithm analysis
Today we also continue the discussion on the summation properties:
There are many techniques for bounding the summations that describe the running times of algorithms. We will outline a few methods:
1) Mathematical induction :
This is a technique where we first show that a particular condition holds for a trivial case. Then we assume it holds for the nth case and prove that it then holds for the (n+1)th case. As an example, we show that the arithmetic series Sum of k from 1 to n evaluates to 1/2 n (n+1). We can easily verify this for n = 1. Assuming it holds for n, we prove that it holds for n+1 as follows:
Sum k  = 1 to n + 1 of (k)  =  Sum k = 1 to n of (k) + (n+1)
                                           = 1/2 n (n+1) + (n+1)
                                           = 1/2 (n+1)(n+2)
A caveat with proving bounds by mathematical induction is that the constant hidden by the "big-oh" could grow with n and thus may not be constant. 
2) Bounding the terms:
Sometimes a good upper bound on a series can be obtained by bounding each term of the series, and it often suffices to use the largest term to bound the others. For example, a quick upper bound on the arithmetic series (A.1) is
Sum k = 1 to n of (k) <= Sum k = 1 to n of (n) = n^2
In general for a series Sum k = 1 to n of ak , if we let a-max = max 1<=k<=n ak,
then sum k = 1 to n (ak) <= n a-max
This technique of bounding each term by the largest term is a weak method when the series can be bounded by a geometric series.
For example, given a series Sum k = 0 to n of (ak) in which the ratio of consecutive terms is bounded, i.e., ak+1/ak <= r for all k >= 0 where 0 < r < 1 is a constant, the sum can be bounded by an infinite decreasing geometric series, since ak <= a0 r^k
Sum k = 0 to n ak <= sum k = 0 to infinity of (a0 r ^k)
                               = a0 Sum k = 0 to infinity of r ^k
                               = a0 (1 / (1 - r))
This is better than bounding with the quick upper bound on the arithmetic series.
There is a caveat here: a common bug in applying this method is to show that the ratio of consecutive terms is less than 1 and then to assume that the summation is bounded by a geometric series. The series could still diverge if r is not bounded below 1 by a constant, i.e., if the ratio of some or all pairs of consecutive terms becomes arbitrarily close to 1. This is the case in the harmonic series, for example, where the ratio k/(k+1) becomes arbitrarily close to 1.

Tuesday, April 22, 2014

Today we take a short break to discuss summation properties from the book on Algorithms by Cormen et al.
Algorithms are analyzed often on mathematical tools. The following are the methods for evaluating  and bounding summations which occur frequently in the analysis of algorithms.
We denote the summation with the symbol Sum.
First, we describe Linearity as
Sum(c.ak + bk) = c Sum(ak) + Sum(bk)
Next, we describe the arithmetic series: Sum k = 1 to n of (k) = 1/2 n (n+1) = Theta(n^2)
The sum of the squares: Sum k = 1 to n of (k^2) = n(n+1)(2n+1)/6
The sum of the cubes: Sum k = 1 to n of (k^3) = (n^2)(n+1)^2 / 4
The geometric series is defined as
1 + x + x^2 + x^3 + ... + x^n = (x^(n+1) - 1) / (x - 1)
When the summation is infinite and |x| < 1, then this sum is 1 / ( 1 - x)
 For Harmonic series :
the nth Harmonic number is
Hn = 1 + 1/2 + 1/3 + ... + 1/n  = ln n + O(1)
Telescoping series are very useful for finding closed forms for seemingly complicated sums.
Each of the terms is added in exactly once and subtracted out exactly once:
Sum k = 1 to n of (ak - ak-1) = an - a0
For example, the telescoping Sum k = 1 to n-1 of 1/(k(k+1)) = 1 - 1/n, since 1/(k(k+1)) = 1/k - 1/(k+1).
Integrating and differentiating series are also commonly encountered. These are handled in the following manner:
For example, if we differentiate both sides of the infinite geometric series (A.6) and multiply by x, we get
Sum k = 0 to infinity of  k. (x^k) = x / (1-x)^2
for |x| < 1
Additional formulas can be obtained by integrating or differentiating the formulas above.
This makes integrating and differentiating series useful.
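As a sanity check, a small sketch that numerically compares a partial sum of Sum k.(x^k) against the closed form x/(1-x)^2 for |x| < 1:

    using System;

    class SeriesCheck
    {
        static void Main()
        {
            double x = 0.5, sum = 0;
            for (int k = 0; k <= 100; k++)
                sum += k * Math.Pow(x, k);              // partial sum of Sum k.(x^k)
            Console.WriteLine(sum);                     // ~2.0
            Console.WriteLine(x / Math.Pow(1 - x, 2));  // closed form: 0.5/0.25 = 2.0
        }
    }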
Products can be expressed in summations.
For example, if we take the finite product a1.a2....an
we convert a formula with the product to a formula with a summation by using the identity
lg(product(a1..an)) = sum of k = 1 to n (lg ak)

Monday, April 21, 2014

In this post, we summarize the readings on WinPcap. First, WinPcap is a packet capturing framework built for Windows that is similar to libpcap but can be considered an improvement on it. It has the three primary components of a BSD implementation: a BPF filter, a network tap, and a user mode library for applications.
The BPF filter had a pair of swapping buffers. This has been improved upon with a ring buffer in the equivalent NPF and fewer context switches. This includes the delayed write capability. The buffers are in kernel mode and are able to access more memory than before.
The network tap talks to NDIS and has high performance. It didn't originally have access to lower level protocol packets, but that has changed since. The WinPcap stack has a packet module for user mode programmability of the kernel mode NPF. The libpcap interface to users talks almost exclusively to this packet module. Packet filters can be specified in a similar syntax, and these are evaluated by the NPF. The packets delivered to the application depend on the user mode buffer made available, and as such the stack can support real time operation. Tests that exercised the overall code path and that dumped packets to file to measure packet loss validated the design choices.
In our posts we will investigate packet capture input on Windows for Splunk.
We continue with our discussion on WinPcap today. We talked about how the application behaved in the data flow all the way to the user application. We will now talk about the overall summary. In the tests discussed, we have a packet generation process that is able to load the network. The capturing process has an excellent implementation, and it outperforms the original BPF/libpcap implementations. The overall performance of WinPcap was evaluated with an end to end flow; however, the test that dumps all packets to a file is more interesting to the user. The tests also confirm that other parts of the OS may have an importance far larger than the packet capture components. FreeBSD seems to perform more poorly than Windows when comparing the packet capture process. In the studies, WinPcap was used with the standard kernel buffer in the presence of heavy traffic; the size of this buffer can be increased by the application through a simple function call, noticeably improving the overall performance of the system.
The tests overall validate the architectural choices such as the use of circular kernel buffer instead of the original buffering, the delayed write implementation which looks at a few bytes of the packet and copies the entire packet during a single call reducing the number of context switches and lastly, the update-space-during-copy operation in the kernel buffer.
Among the supported platforms in the study, Windows 2000 was found to be the best one for high-performance network analyzers, while FreeBSD did not perform as well. A large kernel buffer does not seem to influence the performance of the capture process.
WinPcap has proved to be an excellent choice for the several applications that are based on high-performance packet capture.

Sunday, April 20, 2014

We will continue to look at WinPcap today. The sending process, the network tap, and the filtering process are all comparable between BPF and NPF, but the Windows operating system is faster at handling hardware interrupts and in all the operations made by the NIC driver and the NDIS code. In order to test the overall flow through the system, an application that calls the packet capture mechanism but discards all the packets was used. This test evaluated the entire WinPcap architecture, including the copy process from the interface driver to the kernel buffer and then to the user buffer. There were no filters in this test, and all packets received by the tap were delivered to the application. There was no packet loss. The delayed write capability, which lets the kernel wait for a minimum amount of data and copy a large block of data to user space in a single system call, helped tremendously with this test.
Another test aimed to dump all the packets to a file. When the network is overloaded, the systems suffer noticeable losses when CPU time is not available, i.e., when a new packet arrives while the tap is processing an earlier one, or when there is no space in the kernel buffer. In this test, there was a non-negligible worsening when the whole packet was dumped to file.
An ad hoc program was used to test the monitoring capabilities of WinPcap. The test confirmed that the CPU load is considerably low and that the results match the earlier ones. The additional cost of the monitoring code degrades the user level application results, since it requires a non-negligible amount of memory.
We will next look at some portability considerations and performance concerns with WinPcap. Porting libpcap to WinPcap was made easier because BPF and NPF have similar interfaces. There are some system calls in Unix that don't have a direct mapping in the Winsock library, so these were written with Windows dependent code. The porting resides mostly in the WPcap module. This, as we discussed, uses the packet module methods instead of calling the NPF directly. WinPcap has some differences from libpcap because of the Windows dependent code. For example, Win32 applications cannot use the select function on an NPF device to know whether there are packets that need to be read, so WinPcap implements a new event.
To test the performance, two machines were used: one as a sender and another as a receiver. The sender generates the traffic and the receiver captures the packets. The packets were crafted to generate the maximum number of packets per second, since this is the worst case for a network analyzer. A packet size of 88 bytes yielded the maximum number of packets; the minimum packet size on Ethernet is 64 bytes. The Ethernet load was full at a packet size of 400 bytes.
Tests evaluated the performance of both the sending process and the filtering process. Packets are received by the network tap and checked by the filter. No packets are expected to match the filter, so this exercises the NPF as much as possible. Results show that the Windows flavors have similar behavior and that almost all the packets are received and filtered by NPF.


Saturday, April 19, 2014

We continue to review WinPcap today. We talked about the statistics mode; next we talk about the packet injection mode. Both BPF and NPF have write capabilities that allow the user to send raw packets to the network. However, libpcap did not support this, so BPF was never used for this purpose; Unix applications use raw sockets instead. Win32 has very limited support for raw sockets. This is where WinPcap provides a standard and consistent set of functions for packet injection. The packet data is copied to the kernel buffer and then sent to the network through NDIS.
There is a well-known libnet packet assembly library that was used to add a layer for packet construction and injection on top of WinPCap.
We now talk about the packet.dll which is one of the modules that comes with WinPCap and provides a common system independent API to enable programs to capture packets  on a variety of windows operating system flavors, architectures, editions and versions. Packet.dll includes several additional functionalities such as low level operations on the network adapters and the dynamic loading of the drivers and some hardware counters.
Note that packet.dll interfaces only with the NPF driver and does not participate in statistics mode directly off the NDIS.
Besides the NPF and the packet module, the third module, which is not OS dependent, is wpcap.dll. This has high level functions such as filter generation and user level buffering, plus advanced features such as statistics and packet injection. These APIs are at a higher level than the low level APIs exposed by the packet module, where there was almost a one to one mapping between the APIs and the kernel mode calls. The higher level functions of the WPCap module are more user-friendly, and a single call can translate to several packet module calls.
Win32 network architecture relies on NDIS which is the Network Driver Interface Specification which handles the interaction between the NIC drivers and protocol drivers. Internally, NDIS creates an artificial world of its own abstracting the protocol drivers from the complexity of the hardware. This implies that the same protocol stack can now work with a variety of technologies.
While BPF hooks in alongside the NIC device drivers, NPF does not have that luxury under the driver specification. Instead, NPF implements the network tap as a protocol driver on top of the NDIS structures. This pseudo device works the same way as virtual private networking protocol drivers. Due to its location, NPF has the restriction that it cannot tap the lower level packets that do not reach the upper NDIS layer, for example point to point protocol traffic such as link control protocols and network control protocols, including auth and encryption. As a comparison, the Packet Socket in the Linux kernel suffers from a similar problem.
Note that NPF system calls are all blocking in nature.  Besides, NDIS is performant in that it doesn't copy the entire packet if no protocol drivers require it. Thus NPF provides a clean and isolated plugin for an efficient packet capture on commercial systems.
The other significant difference between the WinPcap NPF and the BSD BPF is that the former has a ring buffer for every user application that copies data from kernel mode to user mode. The copy operation to the user mode buffer happens via a single read call, thereby reducing the transitions between user mode and kernel mode. This kind of buffer allows the storing of network bursts because it makes more memory available than the BPF.
The kernel buffer is also larger than in BPF. If the application is not able to read as fast  as the driver captures for a limited time interval, the capturing process is penalized. The size of the user buffer is important because it determines the maximum amount of data that can be copied from the kernel space in a single system call.  A smaller buffer is generally suitable for real-time applications since it guarantees that the kernel will copy the memory as soon as the application makes it available. Thus NPF is more configurable in that it allows users to choose between efficiency and responsiveness.
Another configuration parameter is the timeout between read values. By default the timeout is 1 sec and the minimum amount of data is 16K. This is referred to as delayed write.
One of the core issues of any network analysis and packet capturing is that this is a very CPU intensive task, and network packets can overwhelm the CPU. This situation is obviously even worse on faster networks. The typical approaches to improving speed include filtering engines and zero-copy architectures, which avoid copying packets between kernel space and user space by mapping the kernel buffer into the application's memory. The advantages of this shared buffer may be limited if a user still makes one system call for every packet, which results in a high number of context switches.
WinPCap introduces the notion that the monitoring not only needs no copying but also pushes it down to the kernel avoiding both data transfer and processing at user mode.
Applications need not call the libpcap APIs to get the data; they can also use the statistics mode of the NPF. Statistics mode avoids packet copies: the statistic is computed while the packet is still in the NIC driver's memory, and then the packet is discarded. Moreover, the number of context switches is kept to a minimum because the results are returned to the user by a single system call. The syntax for requesting these statistics is the same as in libpcap, but the request doesn't have to go through libpcap. These are some of the differences.

Friday, April 18, 2014

We will continue to review WinPCap architecture today. We mentioned that the WinPCap has three main components similar to the BSD capturing components. These are:
 the libpcap library based programmability component of WinPCap.
 a Berkeley Packet Filter with its kernel level buffer that keeps all the data.
 a Network Tap that snoops all packets flowing through the network.

We will review these components in detail.
A packet satisfying the filter is copied to the kernel buffer. The kernel buffer is subdivided into two small buffers, the store buffer and the hold buffer, that are used to keep the captured packets. These are runtime allocations of blocks of memory. The store buffer is used to keep the data coming from the network adapter, and the hold buffer is used to copy the packets to the user buffer. If the store buffer is full and the hold buffer is empty, BPF swaps them. In this way, user level applications do not interfere with the adapter's device driver.

Since BPF has proved to be a powerful and stable architecture, the basic structure of WinPCap has these three components as well. However, WinPCap has significant differences in the structure and in the behavior of the capture stack. It can be seen as the evolution of BPF.
The filtering process starts with the libpcap compatible component, which accepts a user-defined filter and compiles it into a set of pseudo instructions. The kernel module then executes these instructions against all incoming packets.

The WinPcap NPF, as opposed to BPF, uses a circular ring buffer as the kernel buffer. It follows a sliding window protocol and is a bit more challenging to maintain than the BPF buffers, because data copies can have varying sizes and the count of bytes copied is updated while the transfer to user mode happens, not after. The kernel has higher priority and can pre-empt the user mode copying. This implementation allows more memory to be used by the kernel, compared to only half of the memory being usable with the BPF swapping buffers. The entire kernel buffer is copied by means of a single read, reducing the number of system calls and therefore the number of context switches between user and kernel mode.
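As a rough illustration of the idea (not the actual NPF code), a minimal circular buffer in which a writer appends variable-size packets and a reader drains everything available in one call:

    using System;

    // Minimal sketch of a ring buffer: one writer appends, one reader drains
    // all available bytes in a single call, the way NPF copies the kernel
    // buffer to user space with one read.
    class RingBuffer
    {
        private readonly byte[] buf;
        private int head, tail, count;   // head = write position, tail = read position

        public RingBuffer(int capacity) { buf = new byte[capacity]; }

        public bool Write(byte[] packet)
        {
            if (packet.Length > buf.Length - count) return false; // full: drop (packet loss)
            foreach (var b in packet)
            {
                buf[head] = b;
                head = (head + 1) % buf.Length;
            }
            count += packet.Length;
            return true;
        }

        public byte[] ReadAll()   // drain everything in one call
        {
            var result = new byte[count];
            for (int i = 0; i < result.Length; i++)
            {
                result[i] = buf[tail];
                tail = (tail + 1) % buf.Length;
            }
            count = 0;
            return result;
        }
    }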

We are reading from the paper on an architecture for high performance network analysis by Risso and Degioanni.


Thursday, April 17, 2014

Next we look at the WinPcap architecture. WinPcap adds functionality to Windows similar to what libpcap and tcpdump provide for the flavors of Unix. There have been other capture modules on Windows, some with available APIs and each with a kernel mode driver, but they suffer from severe limitations. The Netmon API, for example, is not freely available, its extensibility is limited, and it does not support sending packets. In this architecture we review some of these functionalities as well.
WinPcap was the first open system for packet capture on Win32, and it fills an important gap between Unix and Windows. Furthermore, WinPcap puts performance in first place. WinPcap consists of a kernel mode component to select packets and a user mode library to deliver them to the applications. The library also provides low level network access and allows programmers to avoid kernel level programming. WinPcap includes an optimized kernel mode driver called the Netgroup Packet Filter (NPF) and a set of user-level libraries that are libpcap compatible. From the outset, libpcap compatibility was important to WinPcap so that Unix applications could be ported over.
We now look at the BSD capturing components. Getting and sending data over the low level network interfaces was an important objective of BSD. There are three components. The first, the Berkeley Packet Filter, is the kernel level component with a buffer used to store packets coming from the network. The Network Tap is designed to snoop all packets flowing through the network, and it reads them through the interface driver; it is followed by the filter, which analyzes incoming packets. The libpcap library is the third component. A packet satisfying the filter of the Network Tap is copied to the kernel buffer in BSD. The user has access to each of these three layers: the user accesses the Network Interface Card driver alongside other protocol stacks to send and receive data, user code can directly access the BPF, and lastly applications can make user mode calls to libpcap.
We now look at the optimization techniques for the Berkeley Packet Filter in tcpdump. Optimizing the code generation was required because users could specify compound queries, for which the generated code was redundant and highly inefficient. As a comparison, if we had taken a simple query such as tcp src port 100, then the resulting code would have been a linear decision tree with steps evaluating down from IP, frag, TCP, sport, 100 to true, with progress downward only on true.
If we wanted the same for both directions, we would have two such trees with an OR in between, evaluating one below the other. If the query covered both directions, such as with tcp port 100 and 200, then we would have pairs of such trees, with the combination evaluating one below the other. In this case, we would have significant code bloat: the result is highly redundant and inefficient. This is where optimization techniques, which have their roots in compiler design, are used. One such technique is the dominator technique, which eliminates common subexpressions: if the exit from one node is the same as the entry to another, we can replace that sub-expression with a third. While this traditional technique does optimize the code, it does not address the branching, because data could flow into a sub-expression from both sides. If we look at the edge relationships instead of the node relationships, we can do even more optimization. When we know that a particular sub-expression has already been evaluated by the expression above, we can bypass that sub-expression and move directly to its outcome. This creates new opportunities, and when we repeat the cycle for all sub-expressions, we can eliminate redundancies at each stage. An interesting observation here is that this exercise in optimizing code can also help us detect unreachable code and simplify the graph. Now we take the example above, remove the redundant opcode nodes, and instead redirect the edges to move directly to the outcomes of the sub-expressions above. In this case, we had the linear decision tree of IP, frag, TCP in common, so we remove three sets of those copies and add edges directly from them to the outcome. We also add edges from src port 100 to dest port 100 and dest port 200, as well as an edge from dest port 100 to src port 200, completing the branches from the remaining nodes to the outcomes. There are only two outcomes, true or false, at the leaf level, and all the nodes are connected to them via edges. This covers the optimization in the Berkeley Packet Filter.


Wednesday, April 16, 2014

We now look at the compiler/optimizer in TcpDump. It initially introduced two layers of logic
- a lower layer that would handle predicates with multiple values
- an upper layer that would handle the combinations of the lower layer expressions.
The lower layer sees key value pairs, i.e., an atomic predicate. For example, it sees ip host x or y. It could also see the predicate tcp port 80 or 1024.
The upper layer sees it as ip host x or y and (tcp port 80 or 1024)
But this was not working well. Introducing parentheses for grouping was tried, but this was still harder on the user.
The solution instead was to have a single level of logic. i.e. the predicate or values can both be part of the expression. The expression could be either predicate or val or both.
This made the grammar easy but it made code generation tricky.
BPF parser maintained a stack of symbol, field and code.
The expression was a predicate or an expression operator a predicate or a unary expression.
The predicate was the field value.
The code generation now takes the field value and updates its stack as it goes deeper through the expression evaluation. At each step it generates the corresponding code.
To evaluate an expression "ip src host x or y and tcp dst port z",
it would push ip one level down into the stack, followed by src, followed by host.
When it comes to the value x, it pushes a protocol selector as the field, followed by a wrapper for the value. These two are then popped and replaced with a predicate carrying the field and the generated code.
Since we have an 'or', that gets pushed on top of this existing expression, followed by a wrapper for the value y.
These three levels are then replaced by a predicate, field, and code corresponding to the protocol selector with a different value.
Having parsed the expression for the ip address and its values, we push the 'and' onto the stack and parse 'tcp dst port z' similarly.
Finally, we have the following items on the stack: 1) expr as sym, ISH as fld, and C2 as code, followed by 2) AND as sym, followed by 3) field as sym and TDP as fld, and lastly 4) val(z) as sym.

Tuesday, April 15, 2014

Today we look at how libpcap/tcpdump works. Libpcap is a system independent, user mode packet capturing library. Libpcap is open source and can be used within other applications. Note that this library captures traffic unicast to an interface, as is the case with TCP; all such traffic to and from the computer can be monitored. That is why, if the interface is plugged into a switch, it may not capture all traffic. Even if the machine is connected to a hub, the hub could be part of a switched network, and capture may not work. When switches replicate all the traffic on all ports to a single mirror port, an analyzer can capture packets on that port to sniff all traffic.
What distinguishes tcpdump is that it "filters" packets before they come up the stack.
Tcpdump compiles a high-level filter specification into low-level code that filters packets at the driver level. The kernel module used is called the Berkeley Packet Filter. The Berkeley Packet Filter sits right between the NIC (network interface card) and the TCP stack in the kernel. This packet filter copies packets to tcpdump; the filter also blocks traffic that would otherwise appear as noise to tcpdump. The BPF can be considered a virtual machine. It has an architecture with an accumulator (A) and an index register (X), a packet based memory model, and arithmetic and conditional logic. For a packet capture of a TCP flow, the filter works something like this:
Is the ethernet packet type IP ? (Load ether into A)
Is IP src address 10.0.0.20 ? (Load IP src address into A)
Is the IP dest address 10.0.0.20 ? (Load IP dest address into A)
Is the IP Protocol TCP ? (Load the protocol into A)
Is it first or only frag ? (Load the frag num into A)
Is TCP src port FTP ? (Load the port 20 into index register X)
(Load the port from packet to A)
Is TCP dest port FTP ? (Load the port 20 into index register X)
(Load the dest port into A)
This virtual model is flexible but we don't want to write low-level filters. So a higher level filter language is available.
We specify rules such as src ip, src port, dest ip, and dest port, and let the compiler and optimizer translate them to the code.
The BPF filter language starts from a basic predicate which is true if and only if the specified packet field equals the indicated value.
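For instance, the FTP-flow checks stepped through above collapse into a one-line high-level filter; a representative tcpdump invocation (the interface name and address are illustrative):

    tcpdump -i eth0 'ip host 10.0.0.20 and tcp port 20'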
Courtesy : libpcap : an architecture by Steve McCanne

Sunday, April 13, 2014

We read the topics on TCP monitoring from the Splunk docs today. Splunk monitors the TCP port specified. It listens for packets from one or all machines on the specified port; we use the host restriction field to specify this. The host can be specified using an IP address, a DNS name, or a custom label. On a Unix system, Splunk requires root access to listen on ports under 1024. SourceType is a default field added to events, and so is the index. SourceType is used to determine processing characteristics such as timestamps and event boundaries. The index is where the events are stored.
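As an illustration, such a TCP input is typically configured with an inputs.conf stanza along these lines (the port, sourcetype, and index values are placeholders):

    [tcp://:9999]
    sourcetype = syslog
    index = main
    connection_host = dns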
Note that when Splunk starts listening on a port, it establishes a connection in both directions. It spawns two workers: the first is the forward data receiver thread and the second is the replication data receiver thread. Both workers act on TCP input and hence share similar functionality.
The forward data receiver thread creates the input pipeline for data. It therefore manages routines such as setting up the queue names, updating bookkeeping, maintaining stats, and cleaning up all associated data structures and file descriptors. The replication data receiver thread is responsible for creating an acceptor port, scheduling memory reclamation, and handling shutdowns.
Note that in a cluster configuration there may be multiple peers or forwarders, and all of their data must be handled. There is only one endpoint at which the data arrives, and the streams are consolidated there.
The interface that deals with the TCP channel is the one that registers a data callback, consumes the data, and sends acknowledgements.
The data callback function does the following:
It processes forwarder info and it sends acknowledgements.
The way TCP monitoring works is very similar to how tcpdump works. Raw packets read are dumped to a file. Tcpdump is written on top of the libpcap packet capture library. Libpcap works on different operating systems. It works on the principle that a TCP flow between a source IP address and port and a destination IP address and port is available to read just like any other file. Tcpdump can log to a file that can be parsed with the tcptrace tool. The use of such tools makes it unnecessary for Splunk to monitor all TCP connections to a computer directly.
As we are discussing proxies, we will evaluate both the relay behavior and most of the other functionality a proxy can support. In terms of monitoring and filtering, a web proxy server can do content filtering. It is used in both industrial and educational institutions, primarily to prevent traffic to sites that don't conform to acceptable use. Such a proxy can also do user authentication and provide detailed logs of all the websites visited. The logs generated by a content filtering proxy are an example of how a proxy can produce the kind of information we can use for indexing with Splunk.
This means that if we have existing  proxies, they already produce logs. The second implication is that there could be a chain of proxies involved in accessing the internet and each of these provides a level of abstraction and anonymity in accessing the internet. This is often seen in the case of connection laundering where the government investigators find it hard to follow the trail of where the connections originated unless they go hop over hop in the chain of proxies.
One of the things that this kind of filtering supports is the use of whitelists and blacklists.  As we may see from the configuration specification that Splunk provides, there are several entries that can be mentioned in both lists. A whitelist is one that allows access to those mentioned in the list. A blacklist is one that denies access to those mentioned in the list. Together, we can use these two lists to fully express what traffic is permitted and what isn't because they are mutually exclusive.
One caveat that goes with the use of proxy for content filtering is that, if the rules of the filtering are based on the origin server, another proxy could bypass these rules. Therefore, these rules are effective only when the origin server is not spoofed. At the same time, rules based on destination are more effective to write.
Proxies can also be used to enhance performance. A caching proxy server accelerates service requests by retrieving content saved from a previous request. Since these requests may have been made earlier from the same or other client, the time it takes to serve these resources is reduced thereby increasing performance. When there is high volume of resource requests with duplicate resources, these can be served with higher efficiency. Finally, a proxy can also be used in translation.
I've read up on this from Wikipedia and StackExchange; however, I will cover some differences with Splunk monitoring later.
 In the Splunk TCP monitoring, we are able to monitor on a single address:port endpoint.

Friday, April 11, 2014

In today's post we continue our discussion of a Fiddler-like application with a modular input to Splunk. One of the things we have to consider is that on production machines the type of network traffic may be very different from a desktop. So the first thing to do is determine the use cases. There is more inbound traffic on a production system than there is outbound. While there is a lot of information to gather on inbound traffic, such services are already being outsourced to third party proxy providers. What this app does is put a monitoring tool in the hands of the individual users or administrators that they can run on a desktop. Note that even the lightweight forwarder is only deployed to a handful of machines for each instance of an Enterprise-class Splunk server. What we are talking about can scale to several thousands, with one instance on each machine, and at the same time be ubiquitous, as in, they can be on mobile devices as well.
Furthermore, we could argue that packet capture tools could be turned on by admins on all desktops and logged to a common location from which Splunk can read and populate events. In practice, we seldom enable such applications by default without user opt-in, on grounds of privacy and security, even for employees of an organization. Besides, it leads to more maintenance overhead with very little benefit for governance or security. It's more common, on the other hand, to selectively control the intranet and internet zones, proxies, etc. across the organization and not go after individual computers, with the exception of software updates and publishing. That said, central log appending from multiple sources is also not common, because it introduces a central point of failure and possibly slower responsiveness on the individual user's computer. That is why it's better to separate this overall workflow into two: one for pushing an app onto the individual user's computer through software updates and push mechanisms, and another to collect the events/logs gathered by this app into a common file or database index. The former is where our app or Fiddler comes in useful. The latter is what Splunk is very good at with its own collect-and-index mechanism. Our app goes a step beyond Fiddler in that it collects the packet captures and forwards them to Splunk. This way we have no problem utilizing the best of both for any number of individual user computers in an organization.

We will now look at all the things a proxy does, considering not only the relay behavior but also the filtering it can do. Proxies support promiscuous-mode listening. In our case we have a transparent proxy that does not modify the requests or responses. Proxies can also be forward or reverse. A forward proxy helps with anonymity in that it retrieves resources from the web on behalf of the users behind it. A reverse proxy is one that secures the resources of a corporate network from outside access; this is helpful for maintaining quarantine lists to stage access to protected resources. Network address translation is a useful concept in this regard; it's also referred to as fencing when used with virtual machines. A reverse proxy can do several things such as load balancing, authentication, decryption, and caching. Because clients treat the proxy as just another server, they don't know which server actually processed the request, so the reverse proxy can distribute load across several servers, making the configuration not just a failover but a load-balancing one.
SSL acceleration is another option, where the proxy provides hardware acceleration and a central point of SSL connectivity for clients.
The proxy can also choose to serve and cache static content, perform compression, or front the firewall server, since the rules on the firewall server can then be simplified.

Thursday, April 10, 2014

I will be exploring the Netmon option for Splunk in this post and hopefully cover most of the technical details. This application is expected to capture http or https traffic only, hence it is much more similar to Fiddler than to Wireshark or NetMon, which operate at lower layers of the networking stack. The principle in a packet capture tool such as this is to substitute the internet proxy with one from the application, with an address and port of say 127.0.0.1:8888. If we look at the System.Net.WebProxy class, we have the option to set the Address property and specify the web proxy for instances of web requests in an application. Global proxy settings are specified in the machine- and application-level configuration files. In Internet Explorer, when we make proxy settings per-machine rather than per-user, we force all users to use the same proxy settings rather than their own. This works well for packet capture because when we switch the proxy we know we will capture all traffic, and in production environments this is typically not an issue. Since our application sits between WinInet and the corpnet proxy on these machines, it should be able to capture all the http/s traffic. We may have to call the WinInet methods by PInvoke or use the poshHTTP class as available on the Internet. WinInet exposes four methods, InternetOpen, InternetCloseHandle, InternetSetOption, and InternetQueryOption, to set and retrieve Internet settings. With the WinInet option structures, we can push the settings to all applications accessing the internet.
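As a rough sketch of the per-process piece, and of nudging WinInet after machine-wide settings change, something like the following C# could work (the port 8888 and the option constants follow the standard WinInet headers; this is an illustration, not the full app):

using System;
using System.Net;
using System.Runtime.InteropServices;

class CaptureProxySetup
{
    // Standard WinInet option codes for refreshing proxy settings.
    const int INTERNET_OPTION_SETTINGS_CHANGED = 39;
    const int INTERNET_OPTION_REFRESH = 37;

    [DllImport("wininet.dll", SetLastError = true)]
    static extern bool InternetSetOption(IntPtr hInternet, int dwOption,
                                         IntPtr lpBuffer, int dwBufferLength);

    static void Main()
    {
        // Per-process: every WebRequest in this process now goes through
        // our local capture proxy at 127.0.0.1:8888.
        WebRequest.DefaultWebProxy = new WebProxy("127.0.0.1", 8888);

        // Machine-wide: after the registry proxy settings are rewritten,
        // tell WinInet so running applications pick up the change.
        InternetSetOption(IntPtr.Zero, INTERNET_OPTION_SETTINGS_CHANGED, IntPtr.Zero, 0);
        InternetSetOption(IntPtr.Zero, INTERNET_OPTION_REFRESH, IntPtr.Zero, 0);
    }
}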
Having talked about setting and removing our proxy, we now look at the proxy's other role: playing the man in the middle. This is done via forwarding. Note that Splunk has a built-in option to read network traffic at a TCP port on localhost, and the proxy can use that to log packets. However, the proxy we implement also needs to forward the packets outbound from the machine, and this is what we look at next.
Here we have to pay attention to three things (a rough sketch follows the list):
First, we should relay the packets; hence we need to parse the destination and forward to that address and port (typically 80 or 443).
Second, we turn off keep-alive and send the requests and responses as if new, i.e. we use Connection: close on both sides.
Third, our forwarding is in both directions, so the same implementation applies whichever side is currently the source and which the destination.
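Here is a minimal sketch of such a relay in C#, assuming plain HTTP and a proxy listening on 127.0.0.1:8888; it parses the Host header for the destination and pumps bytes both ways. A fuller version would also rewrite keep-alive headers to Connection: close as noted above.

using System;
using System.Net;
using System.Net.Sockets;
using System.Text;
using System.Threading.Tasks;

class MiniRelay
{
    static async Task Main()
    {
        var listener = new TcpListener(IPAddress.Loopback, 8888);
        listener.Start();
        while (true)
        {
            var client = await listener.AcceptTcpClientAsync();
            var ignored = Task.Run(() => HandleAsync(client));
        }
    }

    static async Task HandleAsync(TcpClient client)
    {
        using (client)
        using (var clientStream = client.GetStream())
        {
            // Read the request head; a real implementation would loop
            // until it has seen the blank line that ends the headers.
            var buffer = new byte[8192];
            int read = await clientStream.ReadAsync(buffer, 0, buffer.Length);
            if (read <= 0) return;
            string head = Encoding.ASCII.GetString(buffer, 0, read);

            // First thing: parse the destination from the Host header.
            string host = null;
            int port = 80;
            foreach (var line in head.Split(new[] { "\r\n" }, StringSplitOptions.None))
            {
                if (line.StartsWith("Host:", StringComparison.OrdinalIgnoreCase))
                {
                    var parts = line.Substring(5).Trim().Split(':');
                    host = parts[0];
                    if (parts.Length > 1) port = int.Parse(parts[1]);
                    break;
                }
            }
            if (host == null) return;

            using (var server = new TcpClient())
            {
                await server.ConnectAsync(host, port);
                var serverStream = server.GetStream();
                // Forward what we already read, then pump both directions
                // with the same logic regardless of which side is sending.
                await serverStream.WriteAsync(buffer, 0, read);
                var upstream = clientStream.CopyToAsync(serverStream);
                var downstream = serverStream.CopyToAsync(clientStream);
                await Task.WhenAny(upstream, downstream);
            }
        }
    }
}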
With the above, we have barely scratched the surface but we know this is viable.
One important thing to mention here is that the requests and responses are parseable data because they carry headers. By that we mean that no matter what the source and destination are, the requests and responses can be input to Splunk with a predetermined event schema. The value added over regular packet capture is the Splunk search that we can now run to enable powerful analytics.
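For instance, each captured request/response pair could be logged as a single key=value event, a format Splunk extracts fields from automatically; the field names here are illustrative rather than a fixed schema:

timestamp=2014-04-10T12:00:00Z src=10.0.0.5:52100 dst=93.184.216.34:80 method=GET host=example.com uri=/index.html status=200 bytes=1270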

Another Splunk app could perform a network capture on any machine. This could be a machine-local capture, app-local, or even browser-page-local. In these cases, the functionality is to sniff packets similar to what NetMon, Fiddler, or Wireshark does. We can listen on the network interface and log each packet; there is functionality available on Windows to do that with the WinInet library. We could even filter the packets per process. The application can be written in C# with the Splunk SDK library, and all the packets captured can be a modular input to Splunk. The events written to Splunk can have the same format as the request/response details in any of the network capture tools. Those tools offer options to interact with and replay packets, or to modify and send new ones; in our case, this is a passive listen-and-log mode. The packets may be binary or text, and we may choose to capture only text. We may have to switch the proxy for the duration of this application, and the packets logged can include all the header information even for https.
What this functionality provides is a way to search the packets with powerful queries, as we can on Splunk, in a way no other application offers. It also gives us the ability to transform data before it is indexed. Furthermore, the application could generate a great deal of machine data over time. Where it differentiates itself from Splunk's existing out-of-box ability to log tcp and udp packets is its scope: the scope can be a machine, an app, or a session. It should even be able to work on a headless server or on a device. In Splunk terms, it would work as a single lightweight forwarder that sends events to an indexer, from which search heads can pull results for users' queries.

Wednesday, April 9, 2014

We talk about a Splunk app today to read installation logs on a machine. The app monitors and targets installation logs; optionally, it could try to detect when an application is being installed and turn on verbose logging. The application itself can be packaged with WixSharp. It works on Windows Installer technologies and is to be implemented in C#.
In addition, the application will send all classifications of messages from these files to the Splunk server as events to be indexed, so it could serve as a modular input for the machine.
As with the Octopus tool, this app could read from different machines.
The Octopus tool is known for facilitating MSI installations on datacenter machines because it can do so consistently for any subset of the machines in the data center, repeating the same process it performs on one machine on the others as well.
Octopus also has a web interface, and this is convenient for choosing the machines on which it deploys. For example, we can choose the package we want to deploy and the target machines, and Octopus can deploy to all of them. Should there be a need to change configuration on the machines, the settings can be parameterized and passed to the application. This is very convenient.
The application could follow the same approach as Octopus in the sense that it reads from multiple machines and collects the data together with the hostname. The hostname has to be granular in that it should resolve to the physical machine in, say, a cluster. This kind of granularity is important because we want to associate the logs with the machines.
The application could also look for different levels of detail: whether there were errors in the logs, whether registries were altered, whether files were touched, whether there were any settings changes, whether there were any custom actions, font files, etc. Many installations leave behind log files, and these are handier than the installation information obtained from tools like msiinv and msiinfo. Those tools can be run against any target machine to get the current state of all the applications and their installations, which also provides valuable input for feeding events into Splunk. The tools can be run at periodic intervals or on demand, although the former is recommended, and only the deltas may need to be fed into Splunk.
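As a minimal sketch of the detection piece, assuming verbose MSI logs land in %TEMP% as MSI*.LOG (the event wording is made up for illustration), a watcher could emit one line per log file to stdout, which is how a scripted or modular input typically hands events to Splunk:

using System;
using System.IO;

class MsiLogWatcher
{
    static void Main()
    {
        // Windows Installer writes verbose logs to %TEMP% when the logging
        // policy is enabled; watch that folder for new or growing logs.
        var watcher = new FileSystemWatcher(Path.GetTempPath(), "MSI*.LOG");
        watcher.Created += (s, e) =>
            Console.WriteLine(DateTime.UtcNow.ToString("o") +
                " event=new_install_log path=\"" + e.FullPath + "\"");
        watcher.Changed += (s, e) =>
            Console.WriteLine(DateTime.UtcNow.ToString("o") +
                " event=install_log_updated path=\"" + e.FullPath + "\"");
        watcher.EnableRaisingEvents = true;
        Console.ReadLine(); // keep the process alive while monitoring
    }
}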

Tuesday, April 8, 2014

We continue our discussion of the Ford-Fulkerson algorithm today. We talked about improving the efficiency of the basic Ford-Fulkerson algorithm with a data structure corresponding to the directed graph G' = (V, E'), where (u,v) belongs to E' if (u,v) or (v,u) is an edge of G (so every edge of G is also in G'). Given a flow f on G, the edges of the residual network Gf consist of all edges (u,v) of G' such that c(u,v) - f[u,v] != 0. The time to find a path in the residual network is now O(V + E') = O(E) if we use depth-first or breadth-first search. This is a significant improvement: each iteration of the while loop now takes O(E) time, making the overall complexity O(E |f*|).
Since the algorithm works in incremental units of flow, we should ask whether it scales to large networks. For simple networks it works efficiently, since |f*| is small.
In a large network, however, the maximum flow can be on the order of 10^6: say 10^6 units can traverse the path s->u->t and another 10^6 units the path s->v->t, with a single unit-capacity edge between u and v. If the first augmenting path found by Ford-Fulkerson is s->u->v->t, the flow has value 1 after the first iteration; if the second iteration augments along s->v->u->t, the flow has value 2. Alternating between the two paths, we would perform a total of two million augmentations, increasing the flow value by only 1 unit each time.
We can improve the bound on the Ford-Fulkerson algorithm if we compute the augmenting path p with a breadth-first search, that is, if the augmenting path is a shortest path from s to t in the residual network where each edge has unit distance (weight). This implementation of the Ford-Fulkerson method is called the Edmonds-Karp algorithm, and it runs in O(V E^2) time because of how the distances to vertices in the residual network behave.
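As a compact illustration (not the text's own code), here is an Edmonds-Karp sketch in C# over an adjacency matrix of integral capacities; the BFS finds a shortest augmenting path, and the augmentation cancels flow on reverse edges:

using System;
using System.Collections.Generic;

static class EdmondsKarp
{
    public static int MaxFlow(int[,] capacity, int s, int t)
    {
        int n = capacity.GetLength(0);
        var flow = new int[n, n];
        int total = 0;

        while (true)
        {
            // Breadth-first search for a shortest augmenting path in Gf.
            var parent = new int[n];
            for (int i = 0; i < n; i++) parent[i] = -1;
            parent[s] = s;
            var queue = new Queue<int>();
            queue.Enqueue(s);
            while (queue.Count > 0 && parent[t] == -1)
            {
                int u = queue.Dequeue();
                for (int v = 0; v < n; v++)
                {
                    if (parent[v] == -1 && capacity[u, v] - flow[u, v] > 0)
                    {
                        parent[v] = u;
                        queue.Enqueue(v);
                    }
                }
            }
            if (parent[t] == -1) break; // no augmenting path: f is maximum

            // cf(p) = minimum residual capacity along the path found.
            int cf = int.MaxValue;
            for (int v = t; v != s; v = parent[v])
                cf = Math.Min(cf, capacity[parent[v], v] - flow[parent[v], v]);

            // Augment: push cf forward and cancel it on the reverse edges.
            for (int v = t; v != s; v = parent[v])
            {
                flow[parent[v], v] += cf;
                flow[v, parent[v]] -= cf;
            }
            total += cf;
        }
        return total;
    }
}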
The improvement comes from a property of shortest-path distances: the shortest-path distance in the residual network increases monotonically with each flow augmentation. We prove this lemma as follows, writing δf(s,v) for the shortest-path distance from s to v in Gf:

Let f be the flow just before an augmentation and f' the flow just afterwards, with s the source and t the sink. Suppose, for contradiction, that some vertex v in V - {s,t} has its shortest-path distance from s decreased by the augmentation, i.e. δf'(s,v) < δf(s,v), and among all such vertices choose v with the minimum δf'(s,v).
Let p = s ~> u -> v be a shortest path from s to v in Gf', so that (u,v) belongs to Ef' and δf'(s,u) = δf'(s,v) - 1.
Because of how we chose v, the distance of u did not decrease: δf(s,u) <= δf'(s,u).
The claim now is that the edge (u,v) did not belong to the previous Ef. If it had, the triangle inequality would give
    δf(s,v) <= δf(s,u) + 1
             <= δf'(s,u) + 1
             = δf'(s,v),
contradicting our assumption that δf'(s,v) < δf(s,v).
So (u,v) is in Ef' but not in Ef, which means the augmentation increased the flow from v to u. The Edmonds-Karp algorithm always augments flow along shortest paths, and therefore the shortest path from s to u in Gf has (v,u) as its last edge. Therefore
    δf(s,v) = δf(s,u) - 1
             <= δf'(s,u) - 1
             = δf'(s,v) - 2,
which again contradicts δf'(s,v) < δf(s,v).
We have shown that no such vertex exists; the shortest-path distances never decrease, so they increase monotonically with each flow augmentation.

Sunday, April 6, 2014

The basic Ford Fulkerson algorithm is given by the following:
Ford-Fulkerson(G, s, t)
for each edge (u,v) belonging to E[G]
   do f[u,v] = 0
        f[v,u] = 0
while there exists a path p from s to t in the residual network Gf
   do cf(p) = min[cf(u,v) : (u,v) is in p]
        for each edge (u,v) in p
             do f[u,v] = f[u,v] + cf(p)
                  f[v,u] = -f[u,v]


The Ford-Fulkerson algorithm expands on the Ford-Fulkerson method mentioned in the previous post. The while loop repeatedly finds an augmenting path p in Gf and augments the flow f along p by the residual capacity cf(p). When no augmenting path exists, the flow f is a maximum flow.
How the augmenting path is chosen determines how well the algorithm performs.
If a breadth-first traversal is chosen, the algorithm runs in polynomial time.
However, let's first take the case where the augmenting path is chosen arbitrarily and the capacities are integral.
The complexity of the initialization steps is O(E).
The complexity of the while loop is O(E |f*|), where f* is the maximum flow found by the algorithm: the loop executes at most |f*| times, and each iteration costs O(E) with a suitable choice of data structure.



Saturday, April 5, 2014

Today we discuss the Ford-Fulkerson method to solve the maximum flow problem. A maximum flow problem is one where we interpret a directed graph as a flow network in which each directed edge has a stated capacity. In the maximum flow problem, we wish to compute the greatest rate at which material can flow through the graph without violating any capacity constraints.
The Ford Fulkerson method is defined as follows:
Ford-Fulkerson-method(G,s,t)
initialize flow f to 0
while there exists an augmenting path p
   do augment flow f along p
return f
As we can see, this method is iterative. At each iteration, we increase the flow by finding an augmenting path, which is simply a path along which we can send more flow, and then augmenting the flow along this path. We repeat this process until no augmenting path can be found. The max-flow min-cut theorem shows that this yields a maximum flow. Since the Ford-Fulkerson approach admits different implementations that attain the maximum flow, it is called a method rather than an algorithm.
In a graph with a flow network and a flow, some edges can admit more flow; the additional flow we can push through an edge is called its residual capacity. The Ford-Fulkerson method repeatedly augments the flow along augmenting paths until a maximum flow is found. The max-flow min-cut theorem tells us that a flow is maximum if and only if its residual network contains no augmenting path. To prove this theorem, a technique involving cuts of the flow network is used.
A cut (S,T) of a flow network is a partition of V into S and T = V - S such that s belongs to S and t belongs to T. If f is the flow, then the net flow across the cut is f(S,T), and the capacity of the cut is c(S,T). A minimum cut of a network is a cut whose capacity is minimum over all cuts of the network.
The max-flow min-cut theorem states that if f is a flow in a flow network G = (V, E) with source s and sink t, then the following conditions are equivalent:
1. f is a maximum flow in G
2. The residual network Gf contains no augmenting paths
3. |f| = c(S,T) for some cut (S,T) of G
The first condition implies the second: we can show this by contradiction, for if f is a maximum flow and there is still an augmenting path p, then the flow sum f + fp is a flow in G with value strictly greater than |f|, contradicting the maximality of f.
The second implies the third: if Gf contains no augmenting path, let S be the set of vertices reachable from s in Gf and T = V - S; every edge crossing this cut must be saturated, for otherwise its residual edge would put its endpoint in S, and so |f| = f(S,T) = c(S,T).
The third condition implies the first because the value of any flow in the network is bounded from above by the capacity of any cut of G. The condition |f| = c(S,T) thus implies that f is a maximum flow.
In the basic Ford-Fulkerson algorithm, each iteration finds some augmenting path p and increases the flow f on each edge of p by the residual capacity cf(p). Given that flow is defined on pairs of vertices connected by an edge (no edge implies no flow and no capacity), we update the flow f[u,v] between each such pair by computing the residual capacity of the path in a temporary variable and augmenting the flow with it. The residual capacity of the path is the minimum residual capacity over its edges, and we keep updating the flow this way until no augmenting path remains. The identity that the net flow across any cut equals the flow value, f(S,T) = |f|, gives a useful check on the computation.
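To tie this back to the Edmonds-Karp sketch given with the April 8 post, here is a hypothetical run on that post's two-path network (s=0, u=1, v=2, t=3); breadth-first search takes the two wide paths directly instead of bouncing two million times across the unit edge:

// Capacities: s->u and s->v carry 1,000,000 each; u->v is the unit edge.
int[,] c = new int[,]
{
    { 0, 1000000, 1000000,       0 },
    { 0,       0,       1, 1000000 },
    { 0,       0,       0, 1000000 },
    { 0,       0,       0,       0 }
};
System.Console.WriteLine(EdmondsKarp.MaxFlow(c, 0, 3)); // prints 2000000
// The minimum cut is the pair of edges leaving s, with capacity 2,000,000,
// matching the maximum flow as the max-flow min-cut theorem promises.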