Cluster computing

Thursday, February 20, 2014

I'm not finding time tonight but I wanted to take a moment to discuss an application for data input to Splunk. We talked about user applications for Splunk and sure they can be written in any language but when we are talking performance reading orders such as for an MSMQ cluster, we want it to be efficient in memory and CPU. What better way to do it than to push it down the way to the bottom of the Splunk stack.This is as close as it can get to the Splunk engine. Besides MSMQ clusters are high volume queues and there can be a large number of such queues. While we could subscribe to notifications at different layers, there is probably nothing better than having something out of the box from the Splunk application.
I've a working prototype but I just need to tighten it. What is missing out of this is the ability to keep the user configuration small. The configuration currently takes one queue at a time but there is possibility to scale that. One of the things I want to do for example is to enable a regular expression for specifying the queues. This way users can specify multiple queues or all queues on a host or cluster with .* like patterns. The ability to enumerate queues on clusters is via name resolution. and adding it to the prefix for the queue names. With an iterator like approach all queues can be enumerated.
One of the things that I want is to do is to enable transactional as well as non-transactional message reading. This will cover all the queues on a variety of deployments. Other than the system reserved queues most other queues including the special queues can be processed by the mechanism above. Making the message queue monitoring as first class citizen of the input specifications for Splunk, we now have the ability to transform and process as part of the different T-shirt size deployments and Splunk roles. This will come in very useful to scale on different sizes from small, medium to enterprise level systems.
I also want to talk about system processing versus app processing of the same queues. There are several comparisons to be drawn here and consequently different merits and de-merits. For example, we talked about different deployments. The other comparisons include such thing as performance, being close to pipelines and processors, shared transformations and obfuscations, indexing of data and no translation to other channels, etc.
Lastly I wanted to add that as opposed to any other channels where there is at least one level of redirection, this directly taps into a source that forms a significant part of enterprise level systems.
Further more, journaling and other forms of input lack the same real time processing of machine data and is generally not turned on in production systems. However Splunk forwarders are commonly available to read machine data.

Cluster computing

Thursday, February 20, 2014

No comments:

Post a Comment