Monday, March 31, 2014

I want to talk about Splunk saved searches and their visibility, a.k.a. permissions. Splunk has the following permissions:
A saved search can be private (local), visible to the current app, or visible to all applications (global).
The roles that Splunk defines are 'Everyone', 'admin', 'can_delete', 'power', 'splunk-system-role' and 'user'.
Each of these roles can have read or write capability.
Saved searches are searches that can be scheduled for runs. The permissions can also be specified via the configuration file with the following format:
[object_type/object_name]
access = read : [ comma-separated list of roles ], write: [comma-separated list of roles ]
We can also set permissions for all objects of a type:
[eventtypes]
access = read : [ * ], write : [ admin, power ]
Objects can also be made globally available by adding an entry for the object in default.meta, as in:
export = system
As an example, to make all event types in the business analytics app viewable in every app in the Splunk installation, we use the following:
[eventtypes]
access = read : [ * ], write : [admin, power ]
export = system
Every app and its objects are governed by a set of permissions, and these can be specified for every role within Splunk.
Every user has their own user directory. An object's visibility can be promoted from user level to app level, and then to global across all apps. This is called sharing, and Splunk implements it by moving the object from the user directory to the app directory.
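As a concrete illustration (the object name nightly_errors is made up), a saved search shared to every app could have a stanza like this in its app's metadata file, following the same format shown above:

[savedsearches/nightly_errors]
access = read : [ * ], write : [ admin ]
export = system

Removing the export line would keep the search at app level, and an object that was never shared simply stays in the owning user's directory.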




In today's post, we continue our discussion on Splunk. In addition to the earlier discussion on search, I wanted to bring up that search artifacts are maintained in the dispatch folder. We can view all the processing information in these folders, including events, commands, arguments, previews and results.
We now look at alerts. Splunk Enterprise can be configured to send alert messages to anyone when real-time or historical search results meet a condition. The conditions can cover a variety of threshold and trend based scenarios.
There are three Splunk alert categories: per-result alerts, scheduled alerts and rolling-window alerts. Alerts are based on reports that run either on a regular interval over a set historical time range or in real time (if the report is a real-time search). When alerts are triggered, different actions are executed. There are several alert actions, such as e-mail notifications, which we will cover later.
The per-result alerts are real-time searches that trigger every time the base search returns a result.
This is usually authored to be invoked when a matching result comes in and is generally used in workflow oriented applications. These alerts can be throttled to ensure they don't fire too often.
Alerts based on historical searches usually run on a regular schedule. This alert type triggers whenever a scheduled run of a historical search returns results that meet a particular condition. These are generally lower priority alerts and more for monitoring over time. For example, trigger an alert whenever the number of 404 errors in any 1 hour interval exceeds 100.
Real-time searches can also have alerts that monitor events within a rolling time window. These trigger when their conditions are met by events as they pass through the window in real time. For example, trigger an alert when there are three consecutive failed logins.
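For the historical 404 example above, a scheduled alert could be created through the management REST API on port 8089 (more on that API in a later post). The sketch below uses Python's requests module; the index name, recipient and the alerting parameter names (alert_type, alert_comparator, alert_threshold and friends) are written from memory as assumptions and should be checked against the saved search reference rather than taken as definitive.

import requests

# Rough sketch: create a saved search that runs hourly and alerts when the
# number of 404 events in the last hour exceeds 100.
requests.post(
    "https://localhost:8089/services/saved/searches",
    auth=("admin", "changeme"),
    verify=False,                              # default self-signed certificate
    data={
        "name": "Too many 404s",
        "search": "index=web status=404",      # hypothetical index
        "is_scheduled": "1",
        "cron_schedule": "0 * * * *",          # top of every hour
        "dispatch.earliest_time": "-1h",
        "dispatch.latest_time": "now",
        "alert_type": "number of events",
        "alert_comparator": "greater than",
        "alert_threshold": "100",
        "actions": "email",
        "action.email.to": "oncall@example.com",
    },
)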


Sunday, March 30, 2014

We talk about distributed search in Splunk today. As we know from the discussion on distributed deployment, Splunk Enterprise performs three major functions as it moves data through the data pipeline. First, it consumes data from files, the network or elsewhere. Then it indexes the data. Finally, it runs interactive or scheduled searches on the indexed data. Each of these functions can be split into dedicated instances of Splunk that can number from a few to thousands.
The dedicated instances that consume data are called forwarders and they can be lightweight or universal. The indexers do the heavy indexing and they are usually the ones with the higher performance configurations. Typically, there's one indexer for every 150-200 GB of daily indexing and a 12 to 24 core server for each indexer instance. Indexers can also be clustered and configured to replicate each other's data. This process is known as index replication. Replication helps prevent data loss and improves data availability. Clusters also have a built-in distributed search capability, which we will get to shortly.
The other type of dedicated instance is the search head. The search heads coordinate searches across the set of indexers, consolidate their results and present them to the user. For the largest environments, several search heads sharing a single configuration set can be pooled together. With search head pooling, we can coordinate simultaneous searches across a large number of indexers. With these logical definitions, we can now mix and match the components on the same or different physical machines and configurations to suit our needs. Often, if the instances share the same machine, it will be a high end machine with several cores and many GBs of RAM. Deployment decisions are based on the amount of incoming data, the amount of indexed data, the number of concurrent users, the number of saved searches, the types of searches employed and the choice to run Splunk apps.
Note that distributed search is not about deployment. It is built-in functionality for Splunk.
In distributed search we look at use cases such as the following:
As a Splunk administrator, I want to distribute indexing and searching loads across multiple Splunk instances so that it is possible to search and index large quantities of data in reasonable time.
As a Splunk administrator, I want to use distributed search to control access to indexed data, such that admins and security personnel have full access while others have access only to their own data.
As a Splunk administrator, I want to have different Splunk instances in different geographical offices such that local offices have access to their own data while the centralized headquarters has access at the corporate level.
In distributed search, the Splunk Enterprise instance that does the searching is referred to as the search head and those that participate in the search are called search peers. Typically, the search head will run a search across all its search peers unless otherwise limited, for example by the splunk_server field in the query. If we want to run searches against only a subset of peers, we can do so by specifying that field or by grouping peers in the distributed search configuration.
 When a distributed search is initiated, the search head replicates and distributes its knowledge objects to its search peers. Knowledge objects include saved searches, event types and other entities used in searching across indexes. This is packaged in a knowledge bundle and distributed to the peers.
Knowledge bundles typically reside under the $SPLUNK_HOME/var/run folder and have a .bundle extension. Bundles can contain almost all the contents of the search head's apps. Instead of copying the bundles over to different peers, it's sometimes advisable to mount the knowledge bundles. Together with the bundle, user authorization information flows to the peers. The user running the search, the role and the location of the distributed authorize.conf file are included with the bundle.



In today's post, we bring up a C# application with the SDK we talked about, though not the service broker example I gave earlier. We focus on the SDK itself, specifically on the inputs to write an event. Did you know that you can color code the event display by how you specify the events? I didn't until recently. I will go over some of these features and explore them with the SDK.
But first I want to mention that most of Splunk's features are also available directly from the management service REST APIs. For example, we can issue:
curl -k -u admin:changeme https://localhost:8089/services/apps/local/myapp -d description="My application"
Splunk SDKs provide a layer over the REST APIs so we can directly instantiate the objects and prepare them. Knowing the REST APIs is still helpful in cases where a feature is not exposed in a particular SDK, or when the SDK is marshalling parameters to the REST APIs and you are interested in finding out what is set and what isn't. By the way, there is an SDK for most of the languages used for web development, but that also means that not all SDKs have the same capabilities. The REST APIs provide a uniform, consistent layer for all the features and are documented similarly to the SDKs. There's a reference PDF file available that describes most of the endpoints.
The endpoints are categorized as data, admin, etc., and we will see a little more on these shortly, but I want to bring up that the REST APIs are easy to use with tools like curl and Fiddler.
Splunk's REST APIs are available for access control, applications, clusters, configurations, deployment, indexes, inputs, knowledge, licensing and outputs. Indexes, inputs and outputs are what we will cover first.
We will also look at the Search REST API. Searches can be ad hoc or saved. Saved searches can be scheduled for runs. There's also auto-complete functionality.
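As a quick illustration of how approachable this layer is, here is a minimal Python sketch (requests instead of curl) that creates a search job and fetches its results. The credentials and search string are the usual defaults used only for illustration, and the JSON field names are as I recall them from the Atom-style output, so treat this as a sketch rather than a reference.

import time
import requests

BASE = "https://localhost:8089"
AUTH = ("admin", "changeme")

# Create a search job; the response XML contains the job's sid.
resp = requests.post(BASE + "/services/search/jobs",
                     auth=AUTH, verify=False,
                     data={"search": "search index=_internal | head 5"})
sid = resp.text.split("<sid>")[1].split("</sid>")[0]

# Poll until the job is done, then fetch the results as JSON.
while True:
    job = requests.get(BASE + "/services/search/jobs/" + sid,
                       auth=AUTH, verify=False,
                       params={"output_mode": "json"}).json()
    if job["entry"][0]["content"]["isDone"]:
        break
    time.sleep(1)

results = requests.get(BASE + "/services/search/jobs/" + sid + "/results",
                       auth=AUTH, verify=False,
                       params={"output_mode": "json"}).json()
print(results["results"])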
I will return to this post shortly. I want to cover distributed search.

Friday, March 28, 2014

Today I will continue my discussion on thread level software instrumentation.
On Windows there is thread local storage available. Here we create a buffer and keep all the stack based data we collect. To collect CPU ticks, we read the system clock in the constructor and again in the destructor of an object we put on the stack. The difference is the time spent executing that code and can be accumulated into the running total on the thread to see how much time each section takes. Note that it's ideal to keep this counter on the scheduler, since threads could attach and detach from the currently executing run.
Similarly, it is easier to have a base method for allocations and deletions, and they can take the stack trace when the corresponding system calls are made. This then gives us a histogram of stack traces for allocations and we can use that to find the biggest users of memory or the most frequent users.
Having thread local storage is desirable on any platform, especially given the functionality available via language specific provisions such as C++11, which introduces thread_local. Creating a pointer sized variable enables allocation of an arbitrary buffer reserved for this thread, which allows thread specific instrumentation that is not normally covered by routines that place a variable on the stack. Undeniably the latter is required too. For example, crash logs are written this way. But here we specify what else we can add to crash log notifications via thread local storage.
The first thing we mentioned is historical stack traces, which are not captured by an as-of-this-point-in-time stack trace taken from stack local variables.
When do we take this stack trace? For every exception encountered, even if not all of them are crash notifications. The suggestion is that by keeping track of the last few exceptions at any given time, when we get a dump, it gives us another data point of interest.
We add an exception handler to the chain of handlers that each thread maintains. The first of these we will customize to take a stack trace, put it on our thread local ring buffer, and return 'keep looking' so that other handlers, including the most recent frame's own handler, get a chance to handle the exception.
Why is this better than logging? It's complementary to logging. Dumps and logs are both relevant for diagnosis and often every bit of information helps.
One consideration for the ring buffer is that it could be implemented as a skip list for fast traversal.
Creating a ring buffer on each thread is one part. It could be used for different purposes.
Using it to store encountered exceptions is one such purpose. Usually the operating system already meets that purpose, so using it for application specific tasks is preferable. One such purpose could be to list the last user queries or commands.
For now we will just focus on creating a ring buffer for every thread on all platforms.
We do this whenever we start a thread or have a callback for initialization routines. There should be few of those throughout the source code.
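To make the idea concrete, here is a minimal sketch in Python (the real thing would be C++ on top of thread_local, as discussed above); threading.local plays the role of the per-thread buffer and a bounded deque plays the role of the ring buffer. The function names are mine, for illustration only.

import threading
import traceback
from collections import deque

_tls = threading.local()

def _ring():
    # Lazily create a bounded, per-thread ring buffer of recent stack traces.
    if not hasattr(_tls, "ring"):
        _tls.ring = deque(maxlen=50)
    return _tls.ring

def record_stack(tag):
    # Capture the current stack and append it; the oldest entry falls off.
    _ring().append((tag, traceback.format_stack()))

def recent_stacks():
    # What a crash or dump handler would walk to add context to a report.
    return list(_ring())

A call to record_stack("exception") from an exception handler, or from an allocation wrapper, gives each thread its own rolling history without any locking.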

After discussing MSMQ and how to monitor messages (not journaling) on the queues without requiring any additional switches or trigger services on a production instance, I stumbled on the tracing for MSMQ under 'logman query providers' and validated that there doesn't seem to be a trace where each message arrival is recorded. Hence I found another argument in favor of what I would like to build for my application.
In any case, here's some more context on how I arrived at this finding and why I want to write an MSMQ monitor for my application.
First, I'm increasingly studying high memory usage in select scenarios with my application, and in this context I wanted to find non-invasive options for customers' production systems. So I came across the Windows tools, namely logman/tracelog/xperf:
We can turn on private heap tracing on Windows either with logman (an out-of-box tool: we just type 'logman query providers', http://msdn.microsoft.com/en-us/library/ms751538(v=vs.110).aspx) or with xperf from the Windows Performance Analysis Center (http://msdn.microsoft.com/en-us/performance/default.aspx)
tracelog -start heapTrace -f c:\temp\heapTrace.etl -heap -pids 1 <pid_of_splunkd>
tracelog -stop heapTrace
OR xperf -start HeapSession -heap -PidNewProcess "myapp.exe" -BufferSize 1024 -MinBuffers 128 -MaxBuffers 128 -stackwalk HeapAlloc+HeapRealloc
xperf -stop HeapSession -stop -d c:\temp\xperf_heap.etl
We could also explore stackwalking with xperf -help stackwalk and xperfview -> Get Stack counts by type.
Xperf allows us to view heap allocation information in addition to kernel info such as CPU and with stacktraces at the time of allocation! More here : http://msdn.microsoft.com/en-us/library/windows/hardware/hh162920.aspx
And if tracing doesn't work, we can use Procdump to take a full dump of our process on production systems without stopping the process: http://technet.microsoft.com/en-us/sysinternals/dd996900.aspx
If we had the luxury we could use gflags, appverifier, pageheap and umdh

But anyway, getting back to the discussion of my application. Here's what I wanted to try writing next after the MSMQ monitor:
These will be thread based monitors for resource usage and a circular ring buffer of stack trace collections.
A circular ring buffer of stack trace collections per scheduler or node is very helpful in diagnosing a variety of issues. In terms of memory used, we use only a pointer sized value for each frame pointer, say ten of those for each backtrace, and ten to fifty backtraces in the circular ring buffer, which should not add up to much.
The stack trace capture to the circular ring buffer could be triggered by anything - exceptions, memory allocation and deallocation, etc.
In fact, CPU usage is also easy to watch: we read CPU ticks and add the delta between when we go on and come off the stack to a running total.
 

Thursday, March 27, 2014

Today I will cover a software component I'm looking into that consolidates the configuration and administration of the dynamic Atom feeds from the different model generators that I mentioned in the post before last. When a user makes a REST request to the server, the server does some checks to determine the response to return to the user. In our case, we are looking at Atom feeds from different REST handlers, all of which conform to this security and configuration mechanism. We will study this component in some depth and will try to come up with potential improvements.
First, this component is called the AdminManager. This component is responsible for managing the server side model generation that is made available for consumption by various interfaces such as the UI and the command line interface. In a way, this component automates access and configuration of the models that the server generates.
Let us take a closer look at this component. There are several things to mention.
We begin with access control. This component lists some common actions for all the handlers to respond to different REST endpoints. These are listed as create, edit, remove, list, members, reload, new, move, enable, disable and doc. More than one action can be checked together with a bit mask. The access control can be checked with capabilities. There are two capabilities specified: read and write. Actions translate to capabilities that could be demanded or set by handlers. Handlers can be enabled or disabled. Results from the handlers can be filtered. Actions and capabilities/permissions determine the type of response. We don't block execution, we just return an error or results.
Handlers all derive from a base class. The base class has methods for the actions involved. Each derived handler can override the methods corresponding to each action to produce the result it wants. If the overrides are not specified, they default to an error. There are flags for some of the actions to see if these actions are possible, such as if you want to move an item. There is also a custom action that handlers can use to extend their functionality. The actions can return results in JSON or XML format. The applications and users that can utilize these REST handlers can be specified with wildcards. A handler can also take a list of features; these features could be considered a prerequisite for the handler to execute.
Access control itself involves several checks. We discuss these now.
First, supported actions are filtered. Whether a user can read or write is determined based on the capability for the corresponding read or write permission. The user is checked for the permission involved. This is not necessarily RBAC access control but merely a check against the privileges for the user. The system user, for instance, has all the privileges. The capabilities a user has are determined from the license manager, for instance, or from explicitly added capabilities. This is also necessary for audit purposes. If auditing is not necessary, then the check against capabilities could always return true. On the other hand, if we are to audit all denies, we should leave a trail in the audit log.
Actions and capabilities are mapped and determine the access for the most part.
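A minimal sketch of that mapping might look like the following; the names are hypothetical since the real component is internal, but it shows actions as bit flags, each action demanding a capability, and the supported actions being filtered down to what the caller's capabilities allow.

from enum import IntFlag

class Action(IntFlag):
    CREATE = 1
    EDIT = 2
    REMOVE = 4
    LIST = 8
    RELOAD = 16
    MOVE = 32

# Each action translates to a required capability (read or write).
REQUIRED_CAPABILITY = {
    Action.LIST: "read",
    Action.CREATE: "write",
    Action.EDIT: "write",
    Action.REMOVE: "write",
    Action.RELOAD: "write",
    Action.MOVE: "write",
}

def supported_actions(requested, user_capabilities):
    # Filter the requested bit mask down to what the capabilities permit;
    # we don't block execution, the handler just returns an error for the rest.
    allowed = Action(0)
    for action, capability in REQUIRED_CAPABILITY.items():
        if (action & requested) and capability in user_capabilities:
            allowed |= action
    return allowed

# A user with only the read capability asking for list + edit keeps only list.
print(supported_actions(Action.LIST | Action.EDIT, {"read"}))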
Next we talk about ACLs and how they are set and determined. Setting ACLs is itself a custom action and requires the write capability. Owner, sharing and permissions are specified when setting the ACL. The rule of thumb here has been to have just enough checks in place and nothing so complex or flexible that it would be overkill. That shouldn't limit us from making it flexible for extensions in the future. Today we require owner, sharing and permissions for a resource, but tomorrow we could have not just read and write permissions but permissions such as full control, modify, read and execute, read and write, special permissions, etc. Today we have explicit privileges, tomorrow they could be chained or inherited. Today we have auditing for users, tomorrow we could differentiate users, groups, apps, service accounts, etc. Today we have a built-in user for the system, tomorrow we could have many different built-in users and groups. Today when we share we share all permissions, tomorrow we could set the permission levels for whom we share with.
Now coming back to where the lines of authentication, authorization and auditing are drawn in the calls that process REST requests.
First is the initialization step. The capabilities and evaluation rules are added from a configuration file. In this configuration file, for a particular endpoint, the capabilities and evaluation rules are mentioned when the handler is first registered with this mechanism.
The capabilities could be, for example, read, write, search, etc. The rule could be the endpoint qualifier to match.
Note that this configuration file is a great start for re-organization. When there are enough handlers registered to see the kinds of capabilities, matches, actions and customizations specified, a new architecture could be used to make it easier for components to specify what they want.
Furthermore, most of the configuration variations will come in handy to provision the central governance that this handler meets. If there are enough handlers specified in the AdminManager configuration file, this is a low hanging fruit for a rewrite.
At this point we are able to filter the supported actions based on the configuration information, the passed in user context and the lookup of the associated user's capabilities. This filtering of supported actions is where the authorization line is crossed.
After the initialization step and depending on the authorization, we can ACL  resources as we have described earlier.
Next we evaluate the custom actions. Actions can have arguments. These can be used together with the actions to differentiate the kind of response required.
Next we validate the arguments, their presence and values as specified in the configuration.
At this point the authorization is complete and we are ready to execute the handler.







We look at Warshall's algorithm for transitive closure. In a directed or undirected graph, we ask if there is a path from node i to node j. That is transitive closure. As we may recall from previous posts, we represent a graph with an adjacency matrix. For a graph with N nodes, an adjacency matrix answers the question whether there is an edge between node i and node j. For both directed and undirected graphs, a value of 0 in the adjacency matrix indicates there is no edge. A transitive closure finds whether there is a path between node i and node j. Here we could apply a brute force method of enumerating N^2 pairs of vertices and trying to find a path for each pair. Fortunately, Warshall's algorithm answers that better by using dynamic programming, particularly the step of saving the previously found paths so that we can efficiently compute the transitive closure. Here, for nodes i and j, we ask if there is a path through node 1. Next we ask if there's a path through nodes 1 and 2. Next we ask if there's a path through nodes 1, 2 and 3, and so on.
Warshall's algorithm has applications in network routing and reachability problems. It's also used to eliminate states.
The algorithm is elaborated as follows:
Algorithm Warshall(A[1..n, 1..n])
{
   R-0 is initialized with the adjacency matrix ; // R-0 allows no intermediate nodes on a path
   for k = 1  to n do
      for i = 1 to n do
        for j = 1 to n do
            R-k[i, j] = R-(k-1) [i, j] OR
                             (R-(k-1) [i, k] AND  R-(k-1) [k,j])
                             // path from i to k thru 1,2 .. k -1
                            // and path from k to j thru 1,2, .. k-1
  return R-(n)
}
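Here is the same algorithm as a small runnable Python sketch; the adjacency matrix uses 1 for an edge and 0 for no edge, and the example graph is made up for illustration.

def warshall(adj):
    # adj is an n x n 0/1 adjacency matrix; returns the transitive closure.
    n = len(adj)
    reach = [row[:] for row in adj]     # R-0 is the adjacency matrix itself
    for k in range(n):                  # allow vertex k as an intermediate
        for i in range(n):
            for j in range(n):
                reach[i][j] = reach[i][j] or (reach[i][k] and reach[k][j])
    return reach

# Path 0 -> 1 -> 2 shows up as reach[0][2] = 1 in the closure.
print(warshall([[0, 1, 0],
                [0, 0, 1],
                [0, 0, 0]]))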

Wednesday, March 26, 2014

I came across an interesting design pattern for web applications that I hadn't seen before. Basically, the model and the view are decoupled. The model can vary for the same view, so the view is reused and the models are different. The routes and the models are registered together, so they vary depending on the purpose; these are maintained in a configuration file, but the same view can load any of these models.
There is a single dispatcher that takes care of finding the handler for the model. The handler has methods to generate the content of the model. This is then converted to a JSON document that is rendered by the web interface. The web interface is written in JavaScript, so all the rendering of the view and the highlighting is taken care of on the client side.
Instead of the conventional model view controller with explicit routes, this is a fairly short way to get server data to the client without any sacrifice of form or functionality in the web interface. And it's extensible too, given the types of models supported. Data is packaged in an Atom entry and there's one entry for each configuration item.
In terms of operations, the get and post are handled as actions requiring different capabilities. These are managed by an access control mechanism called the admin manager
The access control is exercised based on user and resources. The web interface makes purely REST based calls. In this way the interface is a separate layer and the server does most or all of the processing
Dijkstra's algorithm for single source shortest paths is a greedy algorithm. Here we find the shortest paths from a single source in a weighted, say directed, graph where the weights are positive. In an undirected graph, a single edge stands in for two directed edges. There could be multiple paths with different weights, but the shortest path is the one with the smallest weight. To find the shortest path, we relax the edges; that is, for the edge joining vertices u and v, we take the minimum of v's current cost and the cost of reaching v through u, and set that as v's cost. If the shortest path consists of edges v1, v2, v3, ..., vn, any contiguous subset of these is also a shortest path. We now look at the possibility of a shorter path.
Each new shortest path extends an earlier one. So we build the shortest path in the ascending order of costs. When we add a vertex, we see the cost. If there are no edges to that vertex the cost is infinity.
In the algorithm, we first initialize the costs. When we pick a source vertex, we do the following for each of the other vertices.
We choose a vertex u that is not already in the set and has minimum cost. Then we mark it as included in the shortest path set. For each neighbor w of u, we determine the cost of reaching w from the set S through the edge from u to w, and then relax the edge by picking the minimum of the costs.
In this way we include all vertices one by one in ascending order of weights 
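A compact Python sketch of this idea, using a heap for the 'minimum cost vertex not yet in the set' step; the graph is an adjacency list of (neighbour, weight) pairs with non-negative weights, and the example values are made up.

import heapq

def dijkstra(graph, source):
    # graph: {u: [(v, weight), ...]}; returns shortest distances from source.
    dist = {u: float("inf") for u in graph}
    dist[source] = 0
    heap = [(0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)                      # u's shortest path cost is now final
        for v, w in graph[u]:
            if d + w < dist[v]:          # relax the edge (u, v)
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist

print(dijkstra({"a": [("b", 1), ("c", 4)],
                "b": [("c", 2)],
                "c": []}, "a"))          # {'a': 0, 'b': 1, 'c': 3}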

Tuesday, March 25, 2014

While we have been discussing topological sort in the post before last, I wanted to take a minute to discuss the relative advantages of breadth first traversal and depth first traversal.
As we know, DFS uses the recursion stack as its agenda. It will recurse from the root to the leaves. It also uses less memory. A BFS, on the other hand, is exploratory: it covers all the nodes at a given level before moving deeper.
In the depth first search, the recursion termination is specified first. This takes care of covering the end case. Once the end case is specified, the progress can be specified in terms of a smaller subproblem.
Take the example of finding the factorial of a number. This has a recursive solution,
but let us study its complexity. We can write that as a recurrence as well.
If we take the complexity M(n) as the # of basic operations on input n,
M(n) = 1 + M(n-1), n > 0
and
M(0) = 1  which is the initial condition and base case.
We use backward substitution :
M(n) = M(n-1) + 1 = (M(n-2) + 1) + 1
     = M(n-2) + 2 = (M(n-3) + 1) + 2
     = M(n-3) + 3 = ...
     = M(n-i) + i = ...
     = M(n-n) + n = M(0) + n = 1 + n, i.e. O(n) multiplications
In general, we can study the complexity of recursive algorithms such as DFS by following these steps:
Identify the input size
Identify the basic operations
Find the dependence on the input size (the worst case, the best case)
Write the recurrence relation with initial conditions to compute the total number of basic operations
and then solve the recurrence
In breadth first search, we rely on a collection or container such as a queue.
We determine the complexity based on the size of, and operations on, this container.
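As a small sanity check of the backward substitution above, we can count the basic operations directly in Python; the counter comes out to n + 1, matching M(n) = 1 + n.

def factorial(n, counter):
    counter[0] += 1      # one basic operation per call: M(n) = M(n-1) + 1, M(0) = 1
    if n == 0:
        return 1
    return n * factorial(n - 1, counter)

ops = [0]
factorial(5, ops)
print(ops[0])            # 6, i.e. n + 1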

Monday, March 24, 2014

I want to continue my discussion on MSMQ trigger integration in Splunk monitoring. I've looked at the sample code for the MSMQ Trigger. We use the methods to add a rule and to add triggers. To construct a rule, we initialize it first. Then we specify the condition and action pair. This we do with parameters to the add rule method. The parameters are the name of the rule, the description of the rule, the condition (which evaluates to true if left empty), the action (which is specified with the keyword EXE for an executable and its arguments), as well as a reference to receive the GUID of the resulting rule.
Similarly, the trigger object takes the queue name, the rule object and a flag to denote whether it's a receive or a peek operation.
This trigger invocation can take message properties on both the condition and the action. As we saw earlier, this means we can use the message labels, the message body, the priority, the source machine ID and even any app specific labels.
Note that the matching of the text is handled by the MSMQ trigger thus reducing the onus on the executable that is invoked. I plan to try out this service but even if we specify a single rule to forward all, we get called per message. So we can determine which queue the message belongs to and how to process it including selective actions based on filtering. That is we can move the post peek message processing between the trigger and the executable.
Lastly, I can say that the trigger may or may not be available in all instances, so having the msmq monitor in Splunk be able to read the queues itself is desirable. This means that we can have a layered approach in the monitor where the messages from both the monitor's reader and the executable are processed by a common base layer for modular input to Splunk. 
Also, I wanted to add that the rules in the MSMQ trigger can have a many to one relationship. A host can have many queues, a single queue can have many triggers, a trigger may have more than one rule, and a rule may have more than one condition. The processing of the queues and the rules should scale.
Today we look at another example of a decrease and conquer algorithm. This is topological sort and we will cover it in two posts. This algorithm is based on depth first search. Topological sort works on a directed graph, one where each edge can be traversed in one direction but not the other. As usual we represent these with an adjacency matrix; a bidirectional connection would have explicit edges for both directions. For a directed acyclic graph, a directed edge from u to v will have the exit number of v less than that of u. We justify this claim with the following two cases. In the first case, let's say v was unmarked when DFS explores the edge (u, v). Here v comes off the stack before u, so its exit number is smaller than that of u. In the second case, the depth first search visits an already marked vertex v. In that case v is already off the stack and finished, so its exit number is less than that of u.
This gives us a dfs based algorithm for topological sort.
we run DFS(G) and compute exit numbers.
If DFS(G) discovers a back edge, then G has a cycle and is not a DAG, so we report this as an error and exit.
We list the vertices in decreasing order of exit number. This puts the vertex that finished last at the front and the vertex that finished first at the end.
The worst case complexity of this DFS based algorithm for topological sort  is based on
DFS + Sorting
The DFS takes O(n^2) with an adjacency matrix and O(n+m) with an adjacency list.
Sorting would take O(n log n), but we can list the vertices in reverse order as they finish. So we don't really need to sort; we just need to reverse the list.
We now look at a decrease and conquer algorithm.
A source vertex is one that has no incoming edges.
Every DAG (directed acyclic graph) has a source vertex. If there is a cycle, then it is no longer a DAG.
So we can enumerate a source vertex first.
Then we remove the source vertex, and this implies we should still be left with a DAG. This gives us the construct for a decrease and conquer algorithm. We write it this way:

while graph is not empty {
identify source vertex v
append v to topologically sorted list
remove v from G
}

We do this by computing the indegree of each vertex.
First we set this to zero for all vertices.
Then, for each edge into a vertex, we increment that vertex's indegree by 1.
Then we identify the source vertices, those whose indegree is zero, and append them to a queue.

while (queue is not empty)
{
v is set to head of queue; we enumerate v; delete head of queue;
for each (v, w) in E // as many outbound edges
{
 indegree[w] = indegree[w] - 1;
if (indegree[w] == 0) { append w to queue; }
}
}

Notice that the initialization steps take O(n) and O(m).
The while loop runs O(n) times while the inner for loop takes O(deg(v)) time.
If we use an adjacency list, the complexity comes to O(n+m), down from O(n^2) with an adjacency matrix, because each edge is examined only once.
This topological sorting has a practical application in scheduling tasks while respecting dependencies and we have seen two different algorithms for it.
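Putting the two pseudocode fragments above together, here is a runnable Python version of the indegree-based topological sort; vertices are labelled 0..n-1 and the example edges are made up.

from collections import deque

def topological_sort(n, edges):
    # edges is a list of (v, w) pairs for a directed edge v -> w.
    adj = [[] for _ in range(n)]
    indegree = [0] * n
    for v, w in edges:
        adj[v].append(w)
        indegree[w] += 1                 # one increment per inbound edge
    queue = deque(v for v in range(n) if indegree[v] == 0)   # source vertices
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:                 # removing v releases its outbound edges
            indegree[w] -= 1
            if indegree[w] == 0:
                queue.append(w)
    if len(order) < n:                   # leftover vertices mean a cycle: not a DAG
        raise ValueError("graph has a cycle")
    return order

print(topological_sort(4, [(0, 1), (0, 2), (1, 3), (2, 3)]))   # [0, 1, 2, 3]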

Sunday, March 23, 2014

I want to take a short break to discuss the MSMQ implementation and how to read queues for data input to Splunk. I looked at the MSMQ .NET library implementation to see if there's a way to use the native MSMQ trigger notification from the .NET library and, if so, how to use it. I want to discuss MSMQ Trigger and System.Messaging separately. The first thing here is that System.Messaging does not support MSMQ Trigger, and MSMQ Trigger does not expose any programmable components. The MSMQ trigger can invoke an executable (hint hint) to relay messages to. This is useful for notification based message processing and reduces the effects of polling. When the rules of a trigger evaluate to true, the executable is invoked. I'm thinking of an executable that forwards messages to a modular input for Splunk. The modular input could itself read the messages from the queues directly via the MQReceiveMessage native API, or process the messages forwarded by an executable that the MSMQ trigger invokes. In both cases it can serve messages to Splunk, and in the latter case it would be more performant. Please note that MSMQ Trigger is standalone and must be installed as a service by the administrator.
The System.Messaging library transparently exposes the underlying Windows Message Queuing APIs. For example, it provides a GetPublicQueues method that enumerates the public message queues. It takes message queue criteria as a parameter. These criteria can be specified with parameters such as category and label. It can also take machine name or cluster name, and created and modified times, as filter parameters. The GetPublicQueuesEnumerator is available to provide an enumerator to iterate over the results.
Message queues can have rules for each of their triggers, and we can specify these in ways that Splunk users would like to denote. For example, a rule is written as follows:
a condition that tests the properties of the message in terms of the attributes mentioned, and an action that describes an executable and parameters, which can include message properties.
SampleRule This_is_a_sample  $MSG_LABEL_DOES_NOT_CONTAIN="Redacted"  EXE c:\foo.exe MSMQTriggerObjects.MSMQRuleHandler 0, RuleId
Other rule parameters include
$MSG_LABEL_CONTAINS
$MSG_LABEL_DOES_NOT_CONTAIN
$MSG_BODY_CONTAINS
$MSG_BODY_DOES_NOT_CONTAIN
$MSG_PRIORITY_EQUALS
$MSG_PRIORITY_NOT_EQUAL
$MSG_PRIORITY_GREATER_THAN
$MSG_PRIORITY_LESS_THAN
$MSG_APPSPECIFIC_EQUALS
$MSG_APPSPECIFIC_NOT_EQUAL
$MSG_APPSPECIFIC_GREATER_THAN
$MSG_APPSPECIFIC_LESS_THAN
$MSG_SRCMACHINEID_EQUALS
$MSG_SRCMACHINEID_NOT_EQUAL
Triggers can be created with queue path and peek/receive flags.
This rule consists of name, description, condition, action and a reference parameter. Passing an empty string for condition means true for all cases. For details, we can refer to some examples on code pro
This gives the executable a notification mechanism for invocations and the message from the last time the condition was satisfied.

Saturday, March 22, 2014

Having discussed the Bellman-Ford algorithm, let us quickly revisit the classical depth first search. We can apply this to model computational problems using graphs.
Graph representations usually involve an adjacency matrix where the matrix rows and columns are vertices and there is an entry in the cell corresponding to (u,v) if and only if u and v are connected by a direct edge.
An adjacency list adj(v), on the other hand, comprises the list of neighbours of v.
In a depth first traversal, we start from a vertex v. For each neighbour w of v that is not explored yet, we recursively explore the graph from w. Depth first means we explore one vertex as far as possible before moving to the next one.
That is why this algorithm can be considered a decrease and conquer approach.
In a DFS forest, there are tree edges and back edges. Back edges are the graph edges that are not tree edges.
One other thing we could do in a depth first tree is keep track of when we enter and exit a vertex. We do this with the help of a stack.
Every back edge leads to an ancestor. A question is: in a DFS forest, can there be a back edge to a vertex that is not an ancestor? The answer in this case is no, because if there were an edge between two vertices and one of them was unexplored, it would be added below, as a descendant, not by going up. So all such edges become tree edges, and a back edge to something other than an ancestor is not possible.
If we were to write a pseudo code for marking the entry and exit point, it would look something like this:
DFS(G)
{
for each v in V do { entryno[v] = 0; exitno[v] = 0 }
entrycount = 0; exitcount = 0;
for each v in V do
{
 if (entryno[v] == 0) { dfs(v); }
}
}

dfs(v)
{
entrycount = entrycount + 1; entryno[v] = entrycount;
for each (v,w) in E do
{
if (entryno[w] == 0) { dfs(w); }
}
exitcount = exitcount + 1; exitno[v] = exitcount;
}
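The same bookkeeping as runnable Python, with a dictionary-based adjacency list; entry and exit numbers are recorded exactly as in the pseudocode, and the example graph is made up.

def dfs_numbers(graph):
    # graph: {v: [neighbours]}; returns (entryno, exitno) dictionaries.
    entryno, exitno = {}, {}
    counter = {"entry": 0, "exit": 0}

    def dfs(v):
        counter["entry"] += 1
        entryno[v] = counter["entry"]
        for w in graph[v]:
            if w not in entryno:         # unexplored neighbour
                dfs(w)
        counter["exit"] += 1
        exitno[v] = counter["exit"]

    for v in graph:
        if v not in entryno:
            dfs(v)
    return entryno, exitno

print(dfs_numbers({"a": ["b", "c"], "b": ["c"], "c": []}))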

Friday, March 21, 2014

We look at Bellman-Ford algorithm today.
Dijkstra's algorithm requires us to have non-negative weights.
This algorithm lets us compute shortest paths even when some weights are negative. Dijkstra's algorithm does not work with negative edge weights because it does not recompute the costs between vertices as we discover intermediary nodes.
We are now allowing negative edges, with the caveat that negative cycles are not allowed.
Shortest paths are then well defined.
The idea here is that for increasing lengths k, compute shortest paths with at most k edges.
Our observation is that a shortest path can have at most n-1 edges: a path with n or more edges contains a cycle, and since negative cycles are excluded, the cycle has non-negative weight and can be eliminated without increasing the cost.
The recurrence is described this way. A path of length k from v to u breaks up as a path of length k-1 from v to some neighbor w of u, plus the edge from w to u.
We use the notation dist_k[u] for the cost of the shortest path from v to u using at most k edges.
dist_k[u] = min( dist_(k-1)[u], min over neighbours w of u of ( dist_(k-1)[w] + cost[w,u] ) )
We know that dist_0[v] = 0 for the source vertex v and dist_0[u] = infinity for every other vertex u.
So we create a grid to compute the distances. We grow the path length from the source vertex to all the others, so for a seven vertex graph we have path lengths varying from 0 to 6. For each path length we compute the costs and fill the grid.
The pseudo code is like this:
BellmanFord(v, cost, dist, n) {
for i = 1 to n do
    dist[i]  = infinity;
dist[v] = 0;

for k =1 to n-1 do
 for each vertex  u != v do
     distnew[u] = dist[u];  // upto k-1
     for each edge (i,u) do
      distnew[u] = min(distnew[u], dist[i] + cost(i,u));
 for each vertex u != v do
    dist[u] = distnew[u];   // replace
}

Complexity is O(n^3) if we use an adjacency matrix
and O(nm) if we use an adjacency list.
Optimizations include the following:
At iteration k+1, the only distances that can change are those of vertices whose incoming edges changed in the previous round. So we maintain a queue of the vertices whose distance reduced in iteration k.
We only need to update the neighbours of these vertices.
We can also terminate if no change is seen for some k.
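The pseudocode above translates almost directly to Python; edges are kept as a list of (i, u, cost) triples, which corresponds to the adjacency list variant with O(nm) running time, and the early termination optimization is included. The example graph, with a negative edge but no negative cycle, is made up.

def bellman_ford(vertices, edges, source):
    # edges: list of (i, u, cost), meaning an edge from i to u with that cost.
    dist = {v: float("inf") for v in vertices}
    dist[source] = 0
    for _ in range(len(vertices) - 1):       # paths with at most k edges, k = 1..n-1
        new_dist = dict(dist)
        for i, u, cost in edges:
            if dist[i] + cost < new_dist[u]:
                new_dist[u] = dist[i] + cost
        if new_dist == dist:                 # no change in this round: stop early
            break
        dist = new_dist
    return dist

print(bellman_ford(["v", "a", "b"],
                   [("v", "a", 4), ("v", "b", 5), ("a", "b", -2)],
                   "v"))                     # {'v': 0, 'a': 4, 'b': 2}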
Today we look at some more lectures from mecr.org. We continue our discussion on dynamic programming. We organize the subsets to our advantage: there are 2^N subsets and the optimal solution has N subsets. In dynamic programming we maintain the principle of optimality by breaking the problem into subproblems and combining their solutions to form the overall solution. The subproblems may not be disjoint, so we cannot apply divide and conquer; further, there may not be a single strategy to use, so we cannot apply a greedy algorithm. We choose an appropriate reordering to reorganize the subproblems from exponential to polynomial, we use a memo table to avoid wasteful recomputation, and we iteratively fill the table by analyzing dependencies.

Thursday, March 20, 2014

Today we cover some more discussion on dynamic programming. When we arrange the jobs in terms of their profits and use dynamic programming, we have a binary tree of jobs to traverse - binary because at each job we either include it or exclude it. Some of the nodes will appear in other places in this tree as well. This is what we call overlapping, and it costs us re-evaluation, which we try to avoid. So the strategy is to store each computed value in a table. We look up the table before making a recursive call. This look up table is called a memo table.
So we improve the algorithm we described earlier with one which uses lookup of this table.
The algorithm looks like this:
function MaxProfitMemo(n) {
if ( n == 1) {
     return(Profit(1));
}
assign j to the last job that ends before job n begins
if MaxProfitTable[n-1] is not defined
            MaxProfitTable[n-1] = MaxProfitMemo(n-1)
if MaxProfitTable[j] is not defined
            MaxProfitTable[j] = MaxProfitMemo(j)
finally return the max of either MaxProfitTable[n-1] or
                                                  Profit(n) + MaxProfitTable[j]
}
We can actually do a further improvement. We can analyze the dependency of entries in table.

We iterate over the entries this way:

function MaxProfitIterative(n) {
MaxProfitTable[1] = Profit(1);

for i = 2 to n {
j = last job that ends before job i begins
MaxProfitTable[i] = max(MaxProfitTable[i-1], Profit(i) + MaxProfitTable[j]);
}
return (MaxProfitTable[n]);
}

Note that we don't have to check if i-1 and j are in the table.
And the complexity is O(n); the assignment of j takes only constant time since we keep the jobs in sorted order. However, it takes O(n log n) to sort.
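Here is the iterative formulation as a runnable Python sketch, with jobs given as (start, finish, profit) tuples and sorted by finish time; a simple backwards scan finds j, the last job that finishes no later than job i starts (a binary search would make this O(log n)). The example jobs are made up.

def max_profit(jobs):
    # jobs: list of (start, finish, profit); returns the maximum total profit.
    jobs = sorted(jobs, key=lambda job: job[1])      # sort by finish time
    n = len(jobs)
    table = [0] * (n + 1)                            # table[i]: best profit from jobs 1..i
    for i in range(1, n + 1):
        start, finish, profit = jobs[i - 1]
        j = 0
        for k in range(i - 1, 0, -1):                # last job ending by the time job i starts
            if jobs[k - 1][1] <= start:
                j = k
                break
        table[i] = max(table[i - 1], profit + table[j])
    return table[n]

print(max_profit([(1, 4, 30), (3, 5, 10), (5, 8, 50), (4, 7, 20)]))   # 80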
We now look at an important class of algorithms called dynamic programming. These use the principle of optimality, which says:
there are subproblems possible for the overall problem
these subproblems can be solved efficiently
their solutions can be combined
and this can be done recursively
Note that divide and conquer algorithms were based on the assumption that the subproblems are disjoint.
Also, greedy algorithms differ from the above in that there can be different subproblems possible, but a greedy algorithm chooses one strategy that does not change with the input.
Generally, the subproblems overlap. This is why we need to find the subproblem structure.
Secondly, there is no fixed strategy so we may not be able to find a greedy strategy.
This means we have to exhaust all possibilities.
In dynamic programming, we take these constraints as such and try to do it efficiently.
We do this with the following three steps:
1) we come up with a recursive formulation in terms of subproblems
2) we organize the subproblems to make the search efficient.
3) we solve subproblems and keep track of the answers to avoid recomputation.
We will now see an example with the problem of Interval scheduling.
In this problem, we have tasks with predefined start, finish.
We find the subset of non-overlapping tasks and
try to choose the largest such sub-set.
A twist to the above problem is referred to as the weighted scheduling.
In this problem, each task fetches a profit and we try to maximize not the number of jobs but the profit from those jobs.
As we may see, there is no known greedy strategy for this problem, so we will have to exhaust all combinations and choose the best. That gives us exponentially many combinations.
We will now re-organize the combinations to make the search more efficient.
We will sort the jobs by their finish time, so that each job ends no later than the next.
We search by excluding jobs one at a time. If a job has to be included, we look to see what others must be excluded. If we exclude a job, we look at the remaining subproblem recursively. The recursion terminates when there is one job and the profit is from that job.
Let MaxProfit(n) denote the max profit for jobs 1 .. n
Let Profit(i) denote the actual profit of job i
Our recursive formulation is as follows:
MaxProfit(k) = max (MaxProfit(k-1), Profit (k) + MaxProfit(j))
J is the latest job that ends before k starts.
and the MaxProfit(1)  is the profit(1)
The first component in the recursive formulation above is from the exclusion of job k and the second component is from the inclusion of job k.




Wednesday, March 19, 2014

Today we will look at some more discussion on greedy algorithms for job scheduling. Up to now we had established two facts:
1) For any job a with profit pa that is in SI and not in SJ, pa >= pb, where pb is the profit of any job b in SJ that is not in SI.
2) The common jobs between SI and SJ share the same time slots.
So if we transform SI to SI' and SJ to SJ' so that they have the same time reference, then for a job a in SI' we should find an empty slot in SJ', because if there were another job there it would have had a different profit, which is not possible. We see why this way. Let's say there was a job b in that slot in SJ'. If its profit were the same as that of a, then the job would be the same as a, contradicting that the job with profit pa exists only in SI and not in SJ. If b had a profit more than that of a, it would have had an earlier position than a in SI', since they have the same time reference; this is not possible. Similarly, it cannot have a smaller profit, for the same reason. Thus we have eliminated one of the differing jobs between SI' and SJ' while preserving optimality.
Since we can repeat this step multiple times for the jobs with the highest profit, and we preserve optimality at each step, eventually we can transform J to be the same as I. So J is optimal and I is the same as J; hence I is itself optimal. So the greedy solution is optimal.
To conclude, we have established that
J can be executed if and only if J can be executed in deadline order
and
Greedy solution is optimal.


Monday, March 17, 2014

We look further into some of the algorithms for job scheduling today. We discuss feasibility and optimality of scheduling. Feasibility is defined this way: a schedule of jobs is feasible if and only if the jobs, taken in increasing order of deadlines, can each be completed by their deadline. This is called deadline order. Suppose J = {1,2,...,k} is feasible; we maintain the order 1,2,...,k even though J may have been shown feasible in some order other than that.
Suppose r1, r2, ..., rk is a feasible order. We can show that it can be modified into the schedule above. Let us say at some point the two orders disagree. At that point, we can exchange jobs so that one more position agrees, while keeping the schedule feasible. Eventually the given schedule is shown feasible in deadline order.
We now consider when a schedule of jobs is optimal, such as when it's greedy. We said the greedy solution is the one that considers jobs in descending order of profit.
Suppose we have two job schedules J and K, and there is a job in one that does not belong to the other and vice versa. The claim is that the profit from the non-optimal schedule is less than the profit from the optimal one. We can establish this formally, but intuitively we can see why the greedy solution is considered optimal, since we covered what greedy job scheduling is in the previous post.
To prove that the greedy algorithm is optimal, we take the greedy solution as I and an optimal solution as J.
We begin with I not equal to J, i.e. I and J differ. J being a proper subset of I is not possible, for then J would not be optimal. I being a proper subset of J is not possible either, because any missing job would be compatible and greedy would have added it.
Let's look at the highest profit job in I that is not in J, i.e. a belongs to I, a does not belong to J, and a has the highest such profit. Let b belong to J with b not belonging to I.
The claim is that the profit of a is greater than or equal to the profit of b.
We arrange J also in descending order of profit.

I ----- a    this schedule has job a with Max profit
J -----b     this schedule has job b with Max profit

If the profit of b were more than that of a, b would have been included in I since the jobs are compatible; but this is not the case, because I has listed a as the max.
This is one proof.
Now let's take SI as the schedule for I and SJ as the schedule for J. We want to make SI and SJ agree on the time slots for all common jobs.
If we take a job x and a job z common to both, we show that if x occurs after z in SI and x occurs before z in SJ, then walking the slots from the rightmost end, where we had the least profit, we see that this ordering is a discrepancy. Note that, with regard to absolute time, the same job in both schedules can be aligned to a common time reference, which could be chosen as the original position of either schedule for that job. Having fixed one job this way, we now repeat this for all the common jobs so that they occupy the same time slots.
In the above, we use I as the reference for our assumptions and findings in J.
We establish the following two facts:
If there's a job a with max profit that is in I but not in J,
then the profit of any job b in J that is not in I is less than that of a.
And common jobs occupy the same time slots.


Sunday, March 16, 2014

We look at a job scheduling problem on a single machine. Job i takes unit time.
Each job has to be finished by a deadline di
and each job has a profit pi
we get the profit only if job i is completed before di.
The machine can only do one job at a time.
Select a feasible subset of jobs to maximize the profit.
A feasible set is a subset that can be finished within the deadline.
If we take an example Jobs as follows:
Job        1                2                 3               4
Profit    100              10              15             27
Deadline 2                 1               2                1
When we enumerate the possible solutions, we see that the candidate {1, 4} with order of processing 4, 1 and value 127 is the best solution.
The greedy strategy is to order by decreasing profit, i.e. 1, 4, 3, 2.
The algorithm is as follows
Job GreedyJob(d, J, n)
{
J = {1};
for i = 2 to n do
{
 if all deadlines in J u {i} are feasible then
    J = J U {i}
}
return J;
}

J already has a job that can be scheduled and then we collect jobs in J.
In the algorithm the theorem we use is as follows:
If J = {1,2,..k} with increasing deadlines d1 <= d2 <= .... dk
then J is feasible iff 1,2, ... k is a feasible order.
and we use this fact: to check that J is feasible, sort it by deadlines and check that di >= i, where i is the time slot, for every i.
We extend this J to J U {i}
J is the current solution. j1, j2, ... jk
d1 <= d2 <= ... dk
When we insert i, we check if i is feasible, with di greater than the insert position.
Also, the jobs after the insert position were feasible for their old positions, but now that they are shifted they need to be checked again.
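The whole procedure fits in a few lines of Python; jobs are (profit, deadline) pairs, each taking unit time, and the feasibility check sorts the tentative set by deadline and verifies di >= position, exactly as described above.

def greedy_jobs(jobs):
    # jobs: list of (profit, deadline) pairs, one unit of time per job.
    order = sorted(range(len(jobs)), key=lambda i: -jobs[i][0])   # decreasing profit
    chosen = []
    for i in order:
        candidate = chosen + [i]
        # Feasible iff, in deadline order, the t-th job (1-based) has deadline >= t.
        deadlines = sorted(jobs[j][1] for j in candidate)
        if all(d >= t for t, d in enumerate(deadlines, start=1)):
            chosen = candidate
    return chosen

# The example from the post: profits 100, 10, 15, 27 with deadlines 2, 1, 2, 1.
print(greedy_jobs([(100, 2), (10, 1), (15, 2), (27, 1)]))   # [0, 3]: jobs 1 and 4, profit 127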

Here we looked at a single machine. The comparison between single and multiple machines is that the former uses durations while the latter uses start, end pairs. The former uses 0/1 profit and the latter uses penalty for delays. In general, job scheduling is not necessarily only greedy. It can be dynamic programming as well. Depending on the parameters of the scheduling problem, different algorithms may be required. If we use a greedy solution, we must show a proof of optimality.
Today I will take a short break to discuss another modular input for Splunk. We will look at how to tie RSS feeds into Splunk. This is another input for Splunk, something that makes subscription easier. The same is true for any kind of notification service. The advantage of this kind of input is that the application doesn't have to spin to poll.
This saves on CPU. If you look at the CPU usage of something like the Microsoft Outlook application, it's very low even with high loads. I'm not saying that's enabled merely by a push model, but it certainly contributes to it.
Another example of event notification model is the dot net API for MSMQ. The APIs were designed later than the windows ones and come with a better model to send and receive from the message queues.  The event notification model of the C# language is also very helpful in this regard.
If we look at a problem of reading the current message from multiple queues and we look at the API model that would best suit this need, we would like an event notification for any of the queues. This model helps us very well. Otherwise we wake up and poll on the queues and those that have the messages will need to be serviced. Since the queues can vary in size and in traffic, we want a solution that is both scalable and resource savvy. In this case we could treat all the queues and all the messages to be similar even if they are transactional or non-transactional, binary or xml messages etc.
and the task of reading a message from a queue can be delegated to the different workers.

Friday, March 14, 2014

In today's post we look at some more greedy algorithms, again based on the mecr.org lectures. The knapsack problem is a well known problem for discussing greedy algorithms. This problem is stated this way: there are bags of chemicals. Each bag has a specific weight and price. The goal is to fill the knapsack so as to maximize the price. Weights are measured in kilograms and prices in dollars. If you have more chemicals than can fit in the knapsack, you have to make choices.
We do this by arranging the bags in order of priority. We measure this as the price over weight ratio, and we fill the knapsack by picking items in decreasing order of this price to weight ratio. There may be a point when we cannot add a chemical without splitting it into a smaller portion; all the chemicals beyond that one cannot be considered, since our knapsack is full.
I wanted to finish this post but got sidetracked. I will do this now
We have taken items by unit cost. We go item by item in that order until the knapsack is full.
We sorted our bags by unit cost, i.e. P1/W1, P2/W2, ..., PN/WN in decreasing order. Then we picked up as much of the first bag as possible, typically all of it, and continued down the line. We stop at some bag j and ignore the rest, bags j+1 to N.
That is, we have a fraction of 1 for all the bags we selected whole, some fraction for bag j, and zero for all the bags we ignored.
We now show the proof of optimality. We use a standard technique to transform any optimal solution into the greedy one.
We begin with an optimal solution and show that, step by step, we can make it agree with the greedy choices.
Let y = y1, y2, ..., yN be an optimal solution.
Let x = x1, x2, ..., xN be the greedy choices,
with y not equal to x.
Let k be the smallest index such that yk is not equal to xk.
The claim is yk < xk.
x1, x2, ..., xj-1, xj, xj+1, ..., xN is the greedy sequence of choices, where x1 through xj-1 are 1 (bags taken whole), xj is the fraction of bag j that fills the knapsack, and xj+1 through xN are 0.
If k < j, then xk = 1, so yk != xk implies yk < 1 = xk.
For xj, we know xj is the maximum fraction that still fits given the earlier choices, so yj > xj would be infeasible.
So we have yj < xj.
For xj+1 to xN, we trivially show that yk > xk is not possible: up to j, y and x agree; after that x is 0; but then y has also filled up the knapsack, so y is also zero after that point.
Our goal is to transform y into x.
We have yk < xk. Increase yk to xk. Reduce yk+1, ..., yN to restore feasibility,
so that the total weight comes back to the capacity.
The result is a new feasible solution z = z1, z2, ..., zN
with zi = xi for 1 <= i <= k.
After k, the weight we added at k, wk(xk - yk), equals the weight we removed, the sum over i = k+1 to N of wi(yi - zi); since bag k has at least as high a price to weight ratio as the later bags, the profit of z is no less than that of y, and z agrees with x in one more position.
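The greedy procedure itself is only a few lines of Python; bags are (price, weight) pairs, capacity is the size of the knapsack, and the example numbers are made up.

def fractional_knapsack(bags, capacity):
    # Take bags in decreasing price/weight order, splitting the one that no longer fits whole.
    total = 0.0
    remaining = capacity
    for price, weight in sorted(bags, key=lambda b: b[0] / b[1], reverse=True):
        if remaining <= 0:
            break
        take = min(weight, remaining)        # how much of this bag still fits
        total += price * (take / weight)
        remaining -= take
    return total

print(fractional_knapsack([(60, 10), (100, 20), (120, 30)], 50))   # 240.0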



Thursday, March 13, 2014

Today we look at the greedy method for job scheduling based on deadlines, with an optimization. The complexity so far is: sort by profit, which is O(n log n), then insert each job into the current solution. The insertion is the main concern. We maintain J in sorted order of deadline. The worst case happens when each new job is feasible and has a smaller deadline than all of J.
To check the feasibility of job i,  we begin by inserting in J of size i - 1
This takes i - 1 steps.
Therefore the complexity of insertion is 1 + 2 + ... + n - 1 = O(n ^ 2)
The final complexity is the complexity of sorting plus insertion, and this comes out to be O(n^2). The bottleneck that we have is inserting each job in J.
This can probably be addressed with a once and for all slot for each new job. In this case, the addition of a job i with deadline di takes less time, since we find the rightmost free slot before di. We call this slot alpha-i. We would like to schedule i at slot alpha-i, and since it's the rightmost free slot we don't need to move it afterwards. That is, when a job is scheduled, it is scheduled as late as possible given the current constraints, and we never need to move it later, because no job ever moves to the right and we don't modify our constraints. The time slot i is actually the range [i-1, i]. Let n(i) be the first free slot at or to the left of i. Then n(i) <= i. Two slots i and j are in the same set if n(i) = n(j); for both of them the first free slot to the left is the same, so all the slots i, i+1, i+2, ..., j-1, j between them are in one set. Given a set k in general, F(k) is the value n(i) for its members, the latest slot free to the left of all of k. Initially no jobs need to be scheduled beyond the largest deadline. Time slots are 1, 2, ..., D, where D is the maximum deadline of all jobs. All slots are free, so n(i) = i for all i, and the sets are {1}, {2}, ..., {D} with F({j}) = j. To add a job with deadline d, we compute n(d). To compute n(d), find the set k such that d belongs to k;
then n(d) = F(k), and if this is zero the job is not feasible. Otherwise we can schedule the new job at time F(k). Now we have to update F(k). The next free slot must be further to the left, so the new value must be at most F(k) - 1. We know there is a set containing this value, so we just have to merge the sets: merge set k with the set containing F(k) - 1. So overall we have n job insertions and n disjoint-set operations on this data structure.
We can explain this with an example. We have six jobs with deadlines 2, 1, 2, 6, 1, 6. The initial sets are K = {0}, {1}, {2}, {3}, {4}, {5}, {6} with F values 0, 1, 2, 3, 4, 5, 6 respectively.
We create a line of time slots and add a slot zero to it, so we have 0, 1, 2, 3, 4, 5 and 6.
Job 1 has deadline 2, which lies in its own set {2} with F({2}) = 2. Implicitly this means we assign job 1 to time slot 2. To record that slot 2 is no longer available, we merge {2} with the set containing F({2}) - 1 = 1, which is {1}. This collapses the number of sets by one, giving {0}, {1,2}, {3}, {4}, {5}, {6} with F values 0, 1, 3, 4, 5, 6. Job 2 has deadline 1, which lies in {1,2} with F = 1, so we schedule job 2 at slot 1 and merge {1,2} with the set containing F({1,2}) - 1 = 0, giving {0,1,2} with F = 0. Continuing, d[3] = 2 and 2 belongs to {0,1,2}, but F of this set is 0, so job 3 is not feasible and we discard it. d[4] = 6, so we assign it to time slot 6; 6 belongs to {6} and we merge it with the set containing F({6}) - 1 = 5, which is {5}, giving {5,6}.
We now have sets {0,1,2}, {3}, {4}, {5,6} with F values 0, 3, 4, 5 respectively. d[5] = 1 and 1 belongs to {0,1,2}, whose F is 0, so job 5 is not feasible. Finally, d[6] = 6 and 6 belongs to {5,6} with F = 5, so we implicitly schedule job 6 at slot 5 and merge {5,6} with the set containing 4. There are no more jobs. We have picked jobs 1, 2, 4 and 6, and we have a schedule.
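Here is a small Python sketch of this idea, using a disjoint-set (union-find) structure with path compression to find the rightmost free slot. The function and variable names are illustrative, and for brevity it processes jobs in the given order (the full algorithm would first sort them by profit).

# A sketch of the deadline-scheduling idea above using union-find; slot 0 is
# the sentinel "infeasible" slot.
def schedule_jobs(deadlines):
    D = max(deadlines)
    parent = list(range(D + 1))

    def find(s):                       # F(K): largest free slot at or to the left of s
        while parent[s] != s:
            parent[s] = parent[parent[s]]   # path compression
            s = parent[s]
        return s

    scheduled = []                     # list of (job index, slot)
    for job, d in enumerate(deadlines, start=1):
        slot = find(min(d, D))
        if slot == 0:
            continue                   # no free slot at or before the deadline: discard
        scheduled.append((job, slot))
        parent[slot] = slot - 1        # merge with the set containing F(K) - 1
    return scheduled

# The example above: deadlines 2,1,2,6,1,6 -> jobs 1, 2, 4 and 6 are scheduled
print(schedule_jobs([2, 1, 2, 6, 1, 6]))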

Wednesday, March 12, 2014

We will cover some more discussion of the traveling salesman problem. We have discussed memoization and branch and bound so far. We will consider NP-completeness today. The class P consists of problems with yes/no answers for which a deterministic polynomial time algorithm is known. The class NP consists of problems with yes/no answers for which a non-deterministic polynomial time algorithm is known. The advantage of a non-deterministic algorithm is that it can guess among many different choices for the solution, and if one of the choices is correct, it always has the magical ability to guess that particular choice.
Note that the traveling salesman problem is not normally phrased as a yes/no problem. Instead it is stated as: given a weighted undirected graph, find a tour visiting each vertex exactly once and returning to the start, with minimum total weight. But we can rephrase it as a yes/no problem by asking whether there is such a tour of weight w <= k.
Moreover, we can
guess the order in which to tour the vertices, and
check that the total weight is <= k.
Note that the answer depends on the value of k. If none of the tours over the different choices has total weight <= k, then no matter how we guess, the total weight will be greater than k and we will answer no.
Recall that P is a subset of NP: every problem in P is also in NP. What we have not been able to do is find a deterministic polynomial time algorithm for every problem in NP, which is why the question of whether P = NP remains open. What we do know is that there is a class of problems that are really, really difficult to solve; these are called NP-complete problems and they lie on the perimeter of the NP space.
We will now see whether the TSP decision problem is NP-complete.
The question we answer is: is there a TSP tour of weight <= k in G?
We ask a different, known NP-complete question for comparison:
Given a graph G with n vertices, is there a path in G of length n - 1 (one that visits every vertex)? This is a well known NP-complete problem, referred to as the Hamiltonian path problem.
Suppose there is a deterministic polynomial time algorithm TSP(G, k).
Then we show that there is a deterministic polynomial time algorithm HAMPATH(G). If we can accomplish that, then the TSP decision problem is at least as hard as the Hamiltonian path problem, which is NP-complete; together with the guess-and-check argument above, this makes TSP NP-complete.
How can we use TSP(G, k) to solve the Hamiltonian path problem? We have to answer the question: is there a path of length n - 1 in a given n-vertex graph G?
We add an extra vertex and connect it to every other vertex with a new edge. This does not take long. We call the result G', where every edge (including the new ones) has weight 1 and missing edges have weight infinity.
We now write the HAMPATH(G) steps as
   construct G' as above
   return TSP(G', n + 1)
Now G has a path of length n - 1 if and only if G' has a tour of weight <= n + 1.
In other words, the tour spends two edges reaching the new vertex on the way out and on the way in, and the remaining n - 1 edges of the tour form the path in the graph G.
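A sketch of this reduction in Python, assuming a hypothetical decision procedure tsp_decision(weights, k) that answers whether the weighted complete graph has a tour of weight at most k:

import math

# Sketch of the HAMPATH -> TSP reduction described above. tsp_decision is a
# hypothetical black box for the TSP decision problem.
def hampath(adjacency, tsp_decision):
    n = len(adjacency)
    INF = math.inf
    # Build G' on n + 1 vertices: edges of G and all edges to the new vertex
    # get weight 1, missing edges get weight infinity.
    w = [[INF] * (n + 1) for _ in range(n + 1)]
    for u in range(n):
        for v in range(n):
            if u != v and adjacency[u][v]:
                w[u][v] = 1
        w[u][n] = w[n][u] = 1          # new vertex connected to every original vertex
    # G has a path of length n - 1 iff G' has a tour of weight <= n + 1.
    return tsp_decision(w, n + 1)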

We continue to take a look at the traveling salesman problem today. We can improve the memoization with branch and bound. The idea here is that the recursive call in the memoization step could be very expensive, so we make it faster: we first look up a table to see whether the value was previously computed, and if so we use the stored value, which is fast. If not, instead of immediately making the expensive recursive call, we compute a cheaper lower bound for that subproblem. Only if this lower bound plus the weight of the edge connecting the collected vertices S with the new vertex is less than the best value computed so far do we perform the regular recursive call; otherwise the branch is bounded (pruned).
for all m != 1 in S - {k}
    if M[S - {k}, m] is defined
        Best = min(Best, M[S - {k}, m] + w(m, k))
    else
        temp = LB(S - {k}, m)
        if temp + w(m, k) < Best
            Best = min(Best, Memo(S - {k}, m) + w(m, k))

Tuesday, March 11, 2014

We discuss the traveling salesman problem. In this problem we have an undirected weighted graph where the vertices represent cities and the weights represent the cost to travel between cities; since the graph is undirected, the cost to travel from one city to another is the same in both directions. The goal of the problem is to minimize the cost of a tour that visits each city exactly once: we start from a city, visit every other city, and return to the starting city.
We solve this problem by dynamic programming.

  • For any set S of vertices containing the start vertex 1 and any k in S, we compute a quantity M[S, k], which represents the minimum cost of starting at vertex 1, visiting every vertex in S, and ending at k.
  • M[S, k] = w(1, k) if S = {1, k}; otherwise M[S, k] = min over m in S - {k} with m != 1 of M[S - {k}, m] + w(m, k).

We could also write an equivalent solution using memoization:

minTSP(V):
    Best = infinity
    for k = 2 to N
        Best = min(Best, Memo(V, k) + w(k, 1))
    return Best

Memo(S, k):

    if M[S, k] is defined
        return M[S, k]

    if S == {1, k}
        return M[S, k] = w(1, k)

    Best = infinity
    for all m != 1 in S - {k}
        Best = min(Best, Memo(S - {k}, m) + w(m, k))
    return M[S, k] = Best
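For concreteness, here is a runnable Python sketch of this memoized recurrence, with the set S represented as a bitmask and vertex 1 of the text mapped to index 0; the names and the small example matrix are my own.

import math
from functools import lru_cache

# A runnable sketch of the memoized recurrence above (Held-Karp style).
def min_tsp(w):
    N = len(w)
    FULL = (1 << N) - 1

    @lru_cache(maxsize=None)
    def memo(S, k):
        # Minimum cost of starting at 0, visiting exactly the vertices in S, ending at k.
        if S == (1 << 0) | (1 << k):
            return w[0][k]
        best = math.inf
        for m in range(1, N):
            if m != k and (S >> m) & 1:
                best = min(best, memo(S & ~(1 << k), m) + w[m][k])
        return best

    # Close the tour by returning to vertex 0.
    return min(memo(FULL, k) + w[k][0] for k in range(1, N))

# Small example with 4 cities
w = [[0, 10, 15, 20],
     [10, 0, 35, 25],
     [15, 35, 0, 30],
     [20, 25, 30, 0]]
print(min_tsp(w))   # -> 80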

Monday, March 10, 2014

Today we take a break to look at the N-queens problem. This problem asks us to place N queens on an NxN board so that no two queens can attack each other. Before we look at the solution, let's first note that when N = 2 and when N = 3 there is no solution. Now let us take a larger NxN chessboard; here we find that a solution is possible. So how do we arrive at that solution? A queen can attack any other piece horizontally, vertically and diagonally, so each queen must occupy its own row. If we simply chose N cells out of the N^2 cells of the board, we would have (N^2 choose N) positions to consider, which is huge for large N. We can improve on that by putting exactly one queen in each row, which leaves N^N ways to choose, much smaller than before.
Next we look at a recursive method where we place a queen on the board and eliminate the cells where other queens can no longer be placed, then place the remaining queens in a depth-first manner. This strategy is called branch and bound. We set up a loop where we eliminate the attacked cells and place the next queen. If in each recursion we pick the subproblem that is most promising and discard the subproblems that cannot lead to a solution, we have a faster path to the solution.
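A small Python backtracking sketch along these lines, pruning (bounding) columns and diagonals that are already under attack; the helper names are my own.

# Place one queen per row; prune cells attacked by earlier queens.
def n_queens(n):
    cols, diag1, diag2 = set(), set(), set()
    placement = []                      # placement[row] = column of the queen in that row

    def place(row):
        if row == n:
            return True                 # all queens placed
        for col in range(n):
            if col in cols or (row - col) in diag1 or (row + col) in diag2:
                continue                # this cell is attacked, bound this branch
            cols.add(col); diag1.add(row - col); diag2.add(row + col)
            placement.append(col)
            if place(row + 1):          # branch: go deeper, depth first
                return True
            cols.discard(col); diag1.discard(row - col); diag2.discard(row + col)
            placement.pop()
        return False

    return placement if place(0) else None

print(n_queens(2))   # None: no solution for N = 2
print(n_queens(8))   # one valid placement, e.g. [0, 4, 7, 5, 2, 6, 1, 3]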

Sunday, March 9, 2014

Splunk modular input for ELMAH.

For many web developers, particularly those who work with the Microsoft .NET stack, ELMAH is a well known logging module and handler for ASP.NET. What distinguishes it from modules such as Apache log4net is that it is "completely pluggable and can be dynamically added to a running ASP.NET application without any change to its code", i.e. no log write instructions are added to or removed from the existing binary of a running application. When ELMAH is dropped into a running application, it can log nearly all unhandled exceptions, provide a web page to remotely view the entire log of recorded exceptions or the full details of a single exception, write to different back-end providers, and send out e-mail notifications.

While we discuss ways to send the output of ELMAH to Splunk via a connector, we consider multiple ideas such as:
1) enable Splunk to be a first class back-end provider for loggers such as ELMAH, Enterprise Library and log4net.
2) write an HTTP handler for Splunk on Windows that can transparently send web messages to Splunk.
3) write a Splunk NuGet package that provides the Splunk SDK to .NET developers.
and
4) write a Splunk connector to directly read the static and dynamic content of the different web pages that a web application registers.

We will explore these options in upcoming posts and will discuss ELMAH for now. When a request arrives at an IIS web server, IIS examines the extension of the request to decide how to proceed. In addition to processing the extension, IIS provides the ability to write filters that can handle various events raised by IIS, such as when a request comes in, when a request is authenticated and when content is rendered back. When the requested resource is an ASP.NET resource, it is forwarded to the ASP.NET engine. The ASP.NET engine behaves similarly to IIS in that it knows how to respond to the request and it raises events. While IIS uses unmanaged code, the ASP.NET engine uses managed classes called handlers or modules.
An HTTP module is one that can tap into the various events raised as the request passes through the HTTP pipeline. One such event is the Error event, and this is what ELMAH registers to listen to. Many modules can subscribe to many different events. An HTTP handler returns the markup for the requested resource. The ASP.NET engine passes the rendered content to IIS, which forwards it to the client.

Since application servers and database servers are two common production deployments, writing a handler for Splunk similar to ELMAH would make it easier to collect data instead of configuring each source for input.




We look at establishing lower bounds via decision trees. In particular we look at comparison-based algorithms. These are algorithms restricted to learning about their input only through comparisons. For example, given an array of integers, how does the algorithm behave if we change the input values but not their relative order?
In the case of sorting we can show a lower bound of Omega(n log n) comparisons, and in the case of searching for an element in a sorted array we can show a lower bound of Omega(log n).
Now we look at decision trees. In a decision tree we ask questions that have a yes or no answer. Comparison algorithms don't modify the input; they compare elements of the data and output a result based on those comparisons, so only the relative ordering matters, not the values. For example, [5,7,2] will be treated the same as [6,9,1]. Outputs can be seen as a permutation of [1..n].
For example, if we look at selection sort, we compare based on the values of the elements of the array, and when we swap we swap only in a local copy of the array and not the original input.
For an input a1, a2, and a3 with n=3, a selection sort can be seen as a decision tree with the following comparisons
Level 0: a1 < a2
Level 1: a1 < a3 and a2 < a3
Level 2: a2 < a3 and a2  < a1 and a1 < a3 and a2 < a1
yielding the possible outcomes
a1a2a3, a1a3a2, a3a1a2, a2a1a3, a2a3a1, a3a2a1
In general we can say that, for a fixed input size of n elements, there are f(n) possible outputs,
and the height of the decision tree of comparisons is >= log(f(n)).
The lower bound for sorting in the comparison model is based on the number of outputs. For n elements the number of outputs is n!, which is the number of leaves, so #leaves >= n! and this is our f(n). Any algorithm A for sorting in the comparison model must therefore make log(n!) comparisons. We know that n! > (n/e)^n, so
log n! = Omega(n log n).
The worst case number of comparisons is therefore lower bounded by Omega(n log n).
The theorem is that any comparison based sorting algorithm requires Omega(n log n) comparisons in the worst case. Additionally, the average case complexity of such sorting algorithms is also Omega(n log n).
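The step from n! > (n/e)^n to the Omega(n log n) bound can be written out explicitly (in LaTeX notation):

\log_2 n! \;\ge\; \log_2\left(\frac{n}{e}\right)^{n} \;=\; n\log_2 n - n\log_2 e \;=\; \Omega(n\log n)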

Saturday, March 8, 2014

In today's post we revisit P, NP and NP completeness in algorithms and with examples.
There is a very celebrated open problem, whether P = NP, and it has not been answered.
We know polynomial time solutions for searching, sorting, minimum spanning tree and so on.
P is the class of all problems that can be solved by some algorithm that takes polynomial time.
There are certain problems that are not decidable. But there is a set of problems that are decidable and yet may or may not be solvable in polynomial time (for example, deciding whether a given string can be generated by a given context-free grammar is decidable). So far, then, the space of decidable problems includes the class P and is a superset of it. Within this decidable space is a subset NP, the class of all problems that can be solved by some non-deterministic algorithm that takes polynomial time. In other words NP is a superset of P, but it also includes problems that we only know how to solve non-deterministically. Non-determinism is an unintuitive notion.
Deterministic choice is when we use some conditional branching like if (i == 5), etc. Non-deterministic choice is when we guess the value and make the choice, for example if (*) then ... Programming languages are deterministic: they look at the available data before making a choice. A non-deterministic algorithm magically picks the right choice; it does not try to exhaust the search space and is therefore much faster.
There is a large class of very interesting problems in various domains that have a non-deterministic polynomial time algorithm but for which we don't yet have a deterministic polynomial time algorithm.
We know that P is a subset of NP, but we don't know whether P = NP. Most researchers tend to believe that the equality does not hold.
Further, there is a subclass of problems in NP called NP-complete (NP-C): the set of the hardest problems in NP. They are the hardest because if you could solve one of them in polynomial time, you could show that the entire class can be solved in polynomial time as well.
One interesting observation is that if A is NP-complete, then A belongs to P iff P = NP.
These problems again come from various domains.
In practice, if you can show that a problem is NP-complete, then you can argue that solving it fast (in P) is unlikely - the problem is above that pay grade, so to speak.

Friday, March 7, 2014

In today's post we look at some more difficulties with numerical algorithms and their computations.
We know the roots of a quadratic from the well known formula.
The first problem is how we compute the square root. There is a very neat iterative algorithm due to Newton which converges rapidly:
x0 = (1 + D) / 2 and x_{n+1} = (x_n + D / x_n) / 2,
where D > 1. In about 4 iterations the error is within 10^-15.
We stop when successive values change very little.
For example, to compute sqrt(2):
x0 = (1 + 2) / 2 = 1.5
x1 = (1.5 + 2 / 1.5) / 2 = 1.416667
x2 = (x1 + 2 / x1) / 2 = 1.414216
and by the 4th iteration there is little or no change.
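The same iteration, as a small Python sketch (the function name is illustrative):

# Newton's iteration for square roots, as described above.
def newton_sqrt(D, iterations=4):
    x = (1 + D) / 2.0                 # x0
    for _ in range(iterations):
        x = (x + D / x) / 2.0         # x_{n+1} = (x_n + D/x_n) / 2
    return x

print(newton_sqrt(2))                 # 1.4142135623..., close to sqrt(2)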
The roots of a quadratic are given by (-b +- sqrt(b^2 - 4ac)) / 2a.
Here another problem arises from the fact that the numerator can subtract two nearly equal quantities, so cancellation is possible. This is the subtractive cancellation in the numerator of x2.
Take for example x^2 - 10^5 x + 1 = 0, whose roots are approximately
x1* = 99999.99999
x2* = 0.0000100000000001
Computing x2 directly subtracts two nearly equal numbers, so with limited precision the result can be badly wrong or even zero.

In this case we work around it by multiplying numerator and denominator by (-b - sqrt(b^2 - 4ac)). The numerator becomes b^2 - (b^2 - 4ac) = 4ac and the denominator becomes 2a(-b - sqrt(b^2 - 4ac)), giving x2 = 2c / (-b - sqrt(b^2 - 4ac)). The two terms in this denominator now have the same sign, so the subtractive cancellation has been replaced by an addition.

If b < 0, we take x1 as before and x2 = 2c / (-b + sqrt(b^2 - 4ac)).
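A Python sketch of the rearranged formula, choosing the safe form based on the sign of b (names are illustrative):

import math

# Quadratic roots without subtractive cancellation, as described above.
def quadratic_roots(a, b, c):
    disc = math.sqrt(b * b - 4 * a * c)
    if b >= 0:
        x1 = (-b - disc) / (2 * a)    # no cancellation: both terms negative
        x2 = (2 * c) / (-b - disc)    # rewritten form for the other root
    else:
        x1 = (-b + disc) / (2 * a)    # no cancellation: both terms positive
        x2 = (2 * c) / (-b + disc)
    return x1, x2

# The example above: x^2 - 10^5 x + 1 = 0
print(quadratic_roots(1, -1e5, 1))    # approximately (99999.99999, 1.0000000001e-05)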
To summarize, numerical algorithms may suffer from:
truncation errors (from using a finite approach),
round-off errors (from using a bounded representation),
overflow and underflow,
subtractive cancellation,
and ill-conditioned problems.
Therefore they require careful analysis of errors.







I read an MSDN article written a while back titled "Don't let memory allocation failures crash your legacy STL application." It mentions that application developers who use STL for C++ with the Visual C++ 6.0 compiler have a high risk of crashes under low memory conditions. This is because the new operator does not respond in a standard manner when it fails: the response could be to throw an exception or to return NULL. In 6.0 no exception was thrown, but in 7.0 and later an exception is thrown. Furthermore, MFC adds its own behavior to this operator, and Visual C++ and MFC are different products. 6.0 was not STL compliant; the C++ .NET version was more compliant than earlier versions. The exceptions thrown by the C++ compiler and by MFC are different.
So a project written in 6.0 and using STL has some challenges.
Generally the result of new is not checked, either because it is assumed never to fail or because it is assumed to throw an exception. Both assumptions are flawed: low memory conditions are possible, and so is the no-exception behavior.
The STL implementation at the time of 6.0 for a string copy was to allocate and increment the pointer. It was correct only under the assumption that new would throw when it failed and that all exceptions were caught, so handlers were added.
Sometimes it is preferred to use the new operator with the nothrow specification. This is also common. What is not commonly expected, however, is that the new operator may still throw even with that specification. This could be because the constructor might fail, or because the 6.0 release build throws while the debug build doesn't. When this happens the pointer is not NULL, and in the release build the exception is bad_alloc.
This is fixed as follows: first we specify that the handler should not throw, then we wrap the new operator in a try/catch and catch the bad_alloc exception, and finally we set the pointer to NULL in the catch block.
If new were changed to throw an exception, as the STL mandates, then the fix would be to wrap the std::nothrow form as above. This works for code that is compiled with these handlers. But existing third party libraries that use std::nothrow now pose a challenge; in that case the code path using the nothrow specification has to be modified, avoided, or recompiled with the static library.
One more issue in this context is that the STL itself may use this form of nothrow allocators, so those may need to be avoided, or an STL implementation that doesn't use nothrow can be used instead.



 

Thursday, March 6, 2014

In today's post we will look at some more challenges with algorithms for numerical computations.
The second type of problem is round-off error. Here we use bounded representations, i.e. we specify values with a sign, a mantissa, an implicit base and an exponent. We use this to represent 6.02 x 10^23 as +.602 x 10^24.
However, this cannot exactly represent numbers that don't have a finite number of digits, such as pi, which will then be truncated.
Binary representations use p bits for the mantissa, with base 2.
More precision implies more accuracy, but it also implies slower computations.
Single precision gives 6-7 significant decimal digits, double precision gives 13-14 and extended precision gives 19-20.
This round-off error has to be measured.
We do it this way:
if we denote the actual value of a numeric variable as alpha* and its representation as alpha, then the error is the magnitude |alpha* - alpha|.
There are other representation errors, such as:
1. Overflow. This means the value is too large to represent.
We handle this by reordering operations. For example, a large number 1 multiplied by a large number 2 and divided by a large number 3 could be calculated by first doing the division and then the multiplication.
Another approach is to use alternate representations, where we pre-compute or simplify the expression: for example, avoid computing 100!/(2! 98!) directly and instead use 100*99/2.
The third way to handle overflow is to use logarithms, since they reduce the numbers considerably.
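A small Python illustration of these three workarounds; the specific numbers are only for demonstration.

import math

# 1. Reorder operations: divide first, then multiply.
a, b, c = 1e300, 1e300, 1e300
print((a / c) * b)                    # 1e300, fine
print((a * b) / c)                    # naive order overflows to inf in floating point

# 2. Simplify instead of computing 100!/(2!*98!) directly: it is just 100*99/2.
print(100 * 99 // 2)                  # 4950

# 3. Or use logarithms: log(100!) - log(2!) - log(98!)
log_value = math.lgamma(101) - math.lgamma(3) - math.lgamma(99)
print(math.exp(log_value))            # about 4950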
The second problem is underflow.
This manifests itself when values are too close to zero: computation silently proceeds with the value zero, when instead a flag should be set to indicate underflow. It differs from overflow in that computation can often proceed as if the value were zero.
The third problem is that arithmetic operations are not exact.
For example, if we take the value of pi and the value of pi with the seventh decimal digit (which happens to be 6) removed, we expect the difference between the old and new values to be about 6 x 10^-7, but with limited precision the subtraction may come out as zero, and the relative error of that result is then large.
Errors also cascade through numerical computations: errors made early on are propagated to later stages, and slight changes to the numbers in a computation can produce a completely different answer.

Wednesday, March 5, 2014

We look at the challenges of numerical algorithms, i.e. we show that there are limitations to algorithmic power.

Numerical algorithms are those that compute solutions to problems posed in a continuous mathematical world.
Therefore the first obstacle is that we have finite approximations of what are really infinite objects.
This results in truncation errors.
The second class of errors are round-off errors. Real numbers cannot be represented exactly, so we cannot represent the real answers; we can only bound them within a range.
To overcome the first limitation, we use what is called a Taylor expansion.

e ^ x = 1 + x + x ^ 2 / 2! + x ^ 3 / 3! + ... x ^ n / n! + ...

we stop the expansion at a definite n. Anything beyond that fixed n is the truncation error.

The second class is evident when we take the graph of a curve and want to compute the area under it. In this case we slice the area into geometric shapes, trapezoids to be precise, and this is called the trapezoidal rule, by which we approximately compute the area under the curve.

One way to get an upper bound on the truncation error is
to use the inequality:
e^x - (1 + x + x^2/2! + ... + x^n/n!) <= M/(n+1)! * |x|^(n+1), where M = max e^y for 0 < y < x,
i.e. by choosing an upper bound M for e^y over any y between 0 and x, we are able to establish an upper bound on the truncation error.

We can take an example to show how we come up with M.
Let's say we target e^0.5 with an error of at most 0.0001.
Since 0 < y < x = 0.5 and e^y is increasing, e^y has to be less than e^0.5; and since e is between 2 and 3, we can take M = 4^0.5 = 2.
Now the truncation error bound is M/(n+1)! * |x|^(n+1), which with our chosen values is 2/(n+1)! * 0.5^(n+1); requiring this to be less than 0.0001 lets us find n. In this case we find n = 5.
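A small Python sketch that searches for this n directly (the function name is my own):

import math

# Find the smallest n for which the truncation error bound M/(n+1)! * |x|^(n+1)
# drops below the target, as in the worked example above.
def terms_needed(x, M, target):
    n = 0
    while M / math.factorial(n + 1) * abs(x) ** (n + 1) >= target:
        n += 1
    return n

print(terms_needed(0.5, 2, 0.0001))   # -> 5, as computed above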


We can similarly write a formula for an upper bound on the error of the trapezoidal rule. Here the question is how thin the trapezoids should be and how many of them to place between the lower and upper limits of the curve.
In today's post we look at the limitations of algorithmic power, specifically lower bound arguments.
A lower bound t(n) for a problem P means that no algorithm can solve P in time better than (on the order of) t(n).
Knowing the lower bound is advantageous. For example, we know that sorting algorithms take O(n^2) in some cases and O(n log n) in others. Searching a sorted array with brute force takes O(n) and with binary search takes O(log n), which is an improvement. A minimum spanning tree of a connected graph is found by Prim's algorithm in O(m log n), and single source shortest paths by Dijkstra's algorithm in O(m log n). How much further can we go in each of these cases? That is the question a lower bound answers.
Some problems can be really hard to solve. For example, factoring a number into its primes is hard. Others require a lot of processing. For example, the RSA cryptographic scheme, as used by, say, a bank to secure its clients' messages, generates two large prime numbers p and q and takes their product n = p.q. It then creates an encryption/decryption exponent pair (e, d) with d.e = 1 mod (p-1)(q-1), where (e, n) is the public key and d is the private key. When a client sends a message m, it is encrypted as m^e mod n. When the bank receives the message, it decrypts it by computing (m^e)^d mod n, which equals m. Since factoring n into p.q is hard, this is a good security scheme.
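A toy Python sketch of the scheme with tiny textbook primes, purely for illustration; real deployments use primes hundreds of digits long.

# Toy RSA with small primes (illustration only; requires Python 3.8+ for pow(e, -1, phi)).
p, q = 61, 53
n = p * q                      # 3233
phi = (p - 1) * (q - 1)        # 3120
e = 17                         # public exponent, coprime to phi
d = pow(e, -1, phi)            # private exponent: d*e = 1 mod phi -> 2753

m = 65                         # the message
cipher = pow(m, e, n)          # encrypt with the public key (e, n)
plain = pow(cipher, d, n)      # decrypt with the private key d
print(cipher, plain)           # plain == 65 again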
A lower bound can be trivially established by counting the outputs that need to be generated. This is the case for generating permutations of n elements, since there are n! of them. Similarly, a search in an unsorted array has to examine all n elements, and that O(n) becomes the lower bound.
Another way to establish a lower bound is information theoretic. This is the case in binary search, where we count the number of questions that must be answered before the problem can be solved.
Another way to establish a lower bound is with an adversary argument. In this case we don't fix the full input; an adversary answers the questions the algorithm asks on behalf of the input.
As an example, take the problem of merging two sorted lists of n elements each. Any algorithm can be forced to ask 2n - 1 questions of the input to come up with the merged list: the adversary answers so that the elements of the two lists interleave, which forces comparisons such as the first element of the first list with the first element of the second list, then the second element of the first list with the first element of the second list, and so on. If an algorithm skipped one of these comparisons, there would be two different inputs consistent with all the answers it received that require two different outputs, so the algorithm would have to be wrong on one of them. Thus this is another way of establishing a lower bound.
The final way of establishing a lower bound is a class of techniques called reduction. If P reduces to Q in constant time, then any lower bound for P holds for Q as well.
If Q were solvable any faster, then P could be solved faster too, which contradicts the lower bound we assumed for P. Hence reduction can also be used to establish a lower bound.
Courtesy: @mecr.org

Tuesday, March 4, 2014

In today's post we discuss the Rabin-Karp algorithm for string matching, based on the mecr lectures. We match a pattern P of length m in a target string T of length n. Matching character by character at every position of the target string takes O(mn) time. This can be improved with two good ideas:
one is that on a mismatch we try to skip ahead, and we can skip up to m characters. This gives O(m) extra space and O(m + n) worst case time, which is already an improvement.
second is that we do approximate matches: if there is a mismatch we know it for sure, but if there is a match it may or may not be genuine. This way we can quickly skip past the definite mismatches.
We implement the approximation with a hash match. For each index i from 0 to n - m, we hash the m characters starting at i and compare against the hash of the pattern.
If a portion of the string matches the pattern, then their hashes must match. But we use it the opposite way here: if the hash of the window is different, then the window is not the same as the pattern and we can skip ahead. We do a little more work in the hashing, and consequently our worst case is worse than the first approach, but in most cases the hash function works very well. We just need a hash function that meets our needs:
1) we should be able to pre-compute a hash of the pattern
2) we should be able to compute the hash of the string at index i to i+m-1 for each i
3) the hash function should be such that
h(T[i, i+m-1]) = g(T[i]) "+" g(T[i+1]) "+" ... "+" g(T[i+m-1])
where the "+" operation allows us to compute
h(T[i+1, i+m]) = g(T[i+1]) + g(T[i+2]) + ... + g(T[i+m])
With these properties it takes constant time to compute
h(T[i+1, i+m]) from h(T[i, i+m-1]).
In addition if this hash function has very few collisions, then we have a fast running time of O(n+m)
As an example, we can take a hash function with h(string.letter) = 4*h(string) + h(letter):
if a string is made up of the DNA letters ACGT,
h(ACT) = 4*h(AC) + h(T), and so on.
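Putting this together, here is a small Python sketch of a rolling-hash matcher over the DNA alphabet; the modulus and names are my own choices, and equal hashes are verified to rule out false positives.

# Rolling-hash (Rabin-Karp style) matching over the DNA alphabet, base 4.
def rabin_karp_dna(text, pattern):
    g = {'A': 0, 'C': 1, 'G': 2, 'T': 3}
    m, n = len(pattern), len(text)
    if m > n:
        return []
    mod = (1 << 61) - 1                      # large modulus to keep collisions rare
    high = pow(4, m - 1, mod)                # weight of the character that drops out

    def h(s):                                # h(x . letter) = 4*h(x) + g(letter)
        value = 0
        for ch in s:
            value = (4 * value + g[ch]) % mod
        return value

    target = h(pattern)
    rolling = h(text[:m])
    matches = []
    for i in range(n - m + 1):
        # Equal hashes may be false positives, so verify; unequal hashes are definite mismatches.
        if rolling == target and text[i:i + m] == pattern:
            matches.append(i)
        if i + m < n:                        # slide the window by one character
            rolling = (4 * (rolling - g[text[i]] * high) + g[text[i + m]]) % mod
    return matches

print(rabin_karp_dna("ACGTACGT", "GTA"))     # -> [2]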


Monday, March 3, 2014

Today I looked at a few course materials available on mecr.org, which are available for reading. One of the topics I looked at was evaluating complexity in terms of big O, Omega and Theta. I want to briefly mention them here.
If we take two functions f and g where f is O(g(n)), we can show that the witnessing constants c and n0 are not unique, i.e. many choices with c > 1 and n0 > 1 work, and different forms of polynomials can satisfy the relationship between f and g.
Typically, in calculating complexity only the highest-order term matters. This becomes important for polynomials, as it simplifies the discussion of complexity.
While O is typically applied to a single algorithm, Omega is typically applied to a whole class of algorithms. For example, we say that the lower bound for comparison-based sorting is Omega(n log n), which matches the O(n log n) upper bound achieved by the best sorting algorithms.
Theta, on the other hand, is used for a two-sided bound. For example, if f(n) lies between c1.g(n) and c2.g(n), i.e.
c1.g(n) <= f(n) <= c2.g(n) for all n > n0 and some constants c1, c2, then we say f(n) is Theta(g(n)).
Two such functions are of the same order.
Big O, Omega and Theta are all related.
For example, f(n) is O(g(n)) if and only if g(n) is Omega(f(n)): if f(n) <= c.g(n), then g(n) >= (1/c).f(n).
And if f(n) is both O(g(n)) and Omega(g(n)), then f(n) is Theta(g(n)).
There is also an interesting observation that
if f1 is O(g1(n))
and f2 is O(g2(n))
then f1 + f2 is O(max(g1, g2)).
This is true even if we have more than two such functions. It is easy to visualize these as a sequence of code blocks in a program, where the overall program complexity is the maximum over the blocks of code.
Also, with a suitable choice of constants, say n3 = max(n1, n2) and c3 = max(c1, c2), we have f1 + f2 <= c3.g1(n) + c3.g2(n) <= 2.c3.max(g1(n), g2(n)), and that leads to
f1 + f2 being O(max(g1, g2)).

Sunday, March 2, 2014

In today's post I want to cover a topic on writing fast low level logs. There are many logging providers and I've covered a few at different levels earlier. Some examples included log4net and log4cpp. Kernel mode tracing is often considered an example of low level logging. Database systems also have their own systems of logging.
There are a few salient features of most logging methods.
First, they enable formatting of the text that needs to be logged. These take variable length arguments and allow the different format specifiers to be used.
Second, they enable different levels of logging. By this we mean we control how much or how little of the log is written. The components that provide the log strings can continue to log at different levels, and the end user decides how much or how little of the log to see. This switch is independent of all the components, and each log provider is expected to honor these levels.
Components are expected to separate their logging with labels or strings that identify them, typically registered at initialization and removed at teardown.
Another feature is that these loggers are based on file appenders, and the logs are written at the end of the file.
Aside from the debug levels, there may be other metadata attached to each log entry.
If we look at the implementation of a typical logger, it uses neither an event loop nor overlapped IO.
In principle, both could be used. However, neither is.
A logger has many clients that send messages. The messages are serialized onto the end of the file.
The offset into the file is incremented in a thread safe manner, as opposed to locking the entire data structure or putting the entire log-entry-addition code inside a critical section. The process of writing the log is that of filling out buffers which are then flushed to disk. Typically this buffering is left to the file system instead of maintaining a separate one, though some applications choose to implement a round robin circular list of buffers which are then flushed to disk.
No matter what the buffering scheme is, there can be many producers and one consumer draining a serialized queue. That is why this sounds similar to overlapped IO.
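A minimal Python sketch of this many-producers, one-consumer design, with a single writer thread appending to a file; the file name and messages are illustrative.

import queue, threading

# Producers enqueue messages; a single writer thread serializes the appends.
log_queue = queue.Queue()

def writer(path):
    with open(path, "a") as f:
        while True:
            message = log_queue.get()
            if message is None:        # sentinel: stop the writer
                break
            f.write(message + "\n")    # appends are serialized by the single consumer
            f.flush()                  # or leave buffering to the file system

t = threading.Thread(target=writer, args=("app.log",))   # "app.log" is illustrative
t.start()
log_queue.put("component A: initialized")                # any thread can produce entries
log_queue.put("component B: request handled")
log_queue.put(None)
t.join()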