Sunday, February 23, 2014

Splunk 6 has a web framework with documentation on its dev portal that seems super easy to use. Among other things, it can help to gain app intelligence, i.e. by improving semantic logging so that meaning can be associated via simple queries; to integrate and extend Splunk, such as with business systems or customer-facing applications; and to build real-time applications that add a variety of inputs to Splunk.
One such example could be a SQL Server Service Broker queue. Service Broker keeps track of messages based on a "conversation_handle", which is a GUID.
Using a SQL data reader and a SQL query, we can get these messages, which can then be added as input to Splunk. We issue RECEIVE commands like this:
    RECEIVE TOP (@count) conversation_handle, service_name, message_type_name, message_body, message_sequence_number
    FROM <queue_name>
Unless a message has the message type "http://schemas.microsoft.com/SQL/ServiceBroker/EndDialog", the message body can be read.
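As an illustration, here is a minimal C# sketch of that read loop; the queue name TargetQueue and the connection string are hypothetical placeholders, and WAITFOR ... RECEIVE is used so the call blocks briefly instead of spinning.

// Minimal sketch: drain Service Broker messages with WAITFOR ... RECEIVE.
// The queue name "TargetQueue" and the connection string are hypothetical.
using System;
using System.Data.SqlClient;

static class BrokerReader
{
    const string EndDialog = "http://schemas.microsoft.com/SQL/ServiceBroker/EndDialog";

    public static void Drain(string connectionString, int batchSize)
    {
        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();
            var command = new SqlCommand(
                @"WAITFOR (RECEIVE TOP (@count)
                       conversation_handle, service_name, message_type_name,
                       message_body, message_sequence_number
                   FROM TargetQueue), TIMEOUT 5000;", connection);
            command.Parameters.AddWithValue("@count", batchSize);
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    string messageType = reader.GetString(2);
                    if (messageType == EndDialog)
                        continue;                      // EndDialog has no payload to index
                    byte[] body = reader.IsDBNull(3) ? new byte[0] : (byte[])reader[3];
                    Console.WriteLine("{0} bytes from {1}", body.Length, reader.GetString(1));
                }
            }
        }
    }
}

A variant that returns the message bodies, rather than printing them, could act as the fetch step for the listener sketched further below.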
The queue listener that drives this should expose methods to configure the listener and to start and stop it.
Using a simple callback routine, the thread can actively fetch the messages as described and send them to a processor for completion. A transaction scope could be used in this routine.
Inbound and outbound processor queues could be maintained independently and invoked separately. Both should have methods to process messages and to save failed messages.
The processed messages can then be written to a file for input to Splunk, or the framework can be used to index this input directly.
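Putting that together, a rough C# shape for such a listener could look like the following; the names (QueueListener, IMessageProcessor, the fetch delegate, and the output file path) are illustrative only, and the callback runs under a TransactionScope as described.

// Sketch of a queue listener with configure/start/stop and a callback that
// runs under a transaction scope. All names here are illustrative placeholders.
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Transactions;

interface IMessageProcessor
{
    void ProcessMessages(IEnumerable<string> messages);
    void SaveFailedMessages(IEnumerable<string> messages, Exception reason);
}

class QueueListener
{
    Func<IEnumerable<string>> fetch;      // e.g. a wrapper over the RECEIVE loop above
    IMessageProcessor inbound;
    IMessageProcessor outbound;
    string outputPath;
    Thread worker;
    volatile bool running;

    public void Configure(Func<IEnumerable<string>> fetch, IMessageProcessor inbound,
                          IMessageProcessor outbound, string outputPath)
    {
        this.fetch = fetch;
        this.inbound = inbound;
        this.outbound = outbound;
        this.outputPath = outputPath;
    }

    public void Start()
    {
        running = true;
        worker = new Thread(Run) { IsBackground = true };
        worker.Start();
    }

    public void Stop()
    {
        running = false;
        worker.Join();
    }

    void Run()
    {
        while (running)
        {
            var batch = new List<string>(fetch());
            if (batch.Count == 0) { Thread.Sleep(500); continue; }
            try
            {
                using (var scope = new TransactionScope())
                {
                    inbound.ProcessMessages(batch);
                    outbound.ProcessMessages(batch);
                    // Append to a file that Splunk monitors, or hand off to the SDK here.
                    File.AppendAllLines(outputPath, batch);
                    scope.Complete();
                }
            }
            catch (Exception e)
            {
                inbound.SaveFailedMessages(batch, e);   // failed batches are saved for retry
            }
        }
    }
}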
There are several channels for sending data from SQL Server, and this is one that could potentially be served by a Splunk app.
In general, writing such apps in languages such as C# or JavaScript is documented, but it would not be advisable to push this any further down into the Splunk stack. This is because the systems are different and Splunk is not hosted on SQL Server.
If Splunk is hosted on, say, a particular operating system, then certain forms of input that are specific to that operating system could be considered, but in general the Splunk foundation on which the apps are built focuses on generic source types and leaves it to user discretion to send data through one of the established channels.
 
I'm taking a look at Windows IO completion ports today and writing about them. When an IO completion port is created by a process, there is a queue associated with the port that services multiple asynchronous IO requests. This works well with a thread pool. The IO completion port is associated with one or more file handles, and when an asynchronous IO operation on a file completes, an IO completion packet is queued to the completion port in first-in-first-out order.
Note that the file handle can be any overlapped IO endpoint: a file, a socket, a named pipe, a mailslot, and so on.
The threads waiting on the port are released in last-in-first-out order. This is done so that the most recently running thread can keep picking up the next queued completion packet and no time is lost in context switches. That is rarely the whole story, though, since a thread may switch ports, be put to sleep, or terminate, and the other workers then get to service the queue. Threads waiting on a GetQueuedCompletionStatus call can process a completion packet when another running thread enters a wait state. The system also prevents new threads from becoming active until the number of active threads falls below the concurrency value.
In general, the concurrency value is chosen as the number of processors, but this is subject to change, and it is best to use a profiling tool to confirm the benefit before the pool starts thrashing. I have a case where I want to read from multiple mailslots, and these are best serviced by a thread pool. The threads from the pool can read the mailslots and place the data packets directly on the completion port queue. The consumer for the completion port then dequeues them for processing. In this example, the threads are all polling the mailslots directly for messages and placing them on the completion port. This is fast and efficient, but polling can waste time on slots that have the same or no current message. It is also not the same model as a completion port notification for each mailslot, or a callback routine per mailslot. In the latter model there is a notification and subscription pattern, and it is better at utilizing system resources, which can be considerable when the number of mailslots or the number of messages is high. We can make the polling model fast as well by using a timeout value of zero for any calls to read the mailslots and skipping those that don't have actionable messages. However, the notification model helps because little or no time is spent on anything other than servicing the messages in the mailslots as and when they appear. The receive call seems to have a built-in wait that relieves high CPU usage.
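A minimal C# sketch of that polling-producer and completion-port-consumer arrangement follows; the mailslot reads are stubbed out as hypothetical producers, and the Win32 CreateIoCompletionPort, PostQueuedCompletionStatus, and GetQueuedCompletionStatus calls are used via P/Invoke.

// Sketch: pool threads post "message ready" packets to an IO completion port,
// and a consumer thread dequeues them. Mailslot reads are stubbed out here.
using System;
using System.Runtime.InteropServices;
using System.Threading;

static class CompletionPortDemo
{
    [DllImport("kernel32.dll", SetLastError = true)]
    static extern IntPtr CreateIoCompletionPort(IntPtr fileHandle, IntPtr existingPort,
                                                UIntPtr completionKey, uint concurrentThreads);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool PostQueuedCompletionStatus(IntPtr port, uint bytesTransferred,
                                                  UIntPtr completionKey, IntPtr overlapped);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool GetQueuedCompletionStatus(IntPtr port, out uint bytesTransferred,
                                                 out UIntPtr completionKey, out IntPtr overlapped,
                                                 uint timeoutMilliseconds);

    static readonly IntPtr InvalidHandle = new IntPtr(-1);

    static void Main()
    {
        // Concurrency value 0 lets the system default to the number of processors.
        IntPtr port = CreateIoCompletionPort(InvalidHandle, IntPtr.Zero, UIntPtr.Zero, 0);

        // Producers: one poller per (hypothetical) mailslot.
        for (int slot = 0; slot < 3; slot++)
        {
            int slotKey = slot;
            new Thread(() =>
            {
                for (uint seq = 0; seq < 5; seq++)
                {
                    // A real poller would ReadFile the mailslot handle with a zero
                    // timeout and skip slots that have no actionable message.
                    PostQueuedCompletionStatus(port, seq, new UIntPtr((uint)slotKey), IntPtr.Zero);
                    Thread.Sleep(100);
                }
            }) { IsBackground = true }.Start();
        }

        // Consumer: dequeue completion packets as they arrive.
        for (int received = 0; received < 15; received++)
        {
            uint bytes; UIntPtr key; IntPtr overlapped;
            if (GetQueuedCompletionStatus(port, out bytes, out key, out overlapped, 5000))
                Console.WriteLine("slot {0} produced packet {1}", key, bytes);
        }
    }
}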

Friday, February 21, 2014

Yesterday I saw a customer report of a failure in our application, and at first it seemed like a disk space issue. However, file system problems are generally something that applications cannot work around.
Here the file system was an NFS mount even though it carried the label of a GPFS mount. Furthermore, disk space was not an issue, yet the application reported that it could not proceed because its open/read/write calls were failing. mount showed the file system mount point and the remote server it mapped to. Since the mount was for a remote file system, we needed to check both the network connectivity and the file system reads and writes.
A simple test that was suggested was to try writing a file outside the application to the remote server with the dd utility, something like:
dd if=/dev/zero of=/remotefs/testfile bs=<blocksize> count=<count>
And if that succeeds, read it back again as follows:
dd if=/remotefs/testfile of=/dev/null bs=<blocksize>
With a round trip like that, file system problems could be detected.
The same diagnostics can be made part of the application diagnostics.
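As a sketch of building that check into the application, the following C# routine (a hypothetical helper, with a made-up test file name) writes blocks of zeros to the mount, reads them back, and reports failure if either step throws or the sizes disagree.

// Sketch: round-trip write/read check against a mount point, mirroring the dd test.
// The mount path argument and the test file name are hypothetical.
using System;
using System.IO;

static class MountCheck
{
    public static bool CanReadWrite(string mountPoint, int blockSize = 64 * 1024, int blocks = 16)
    {
        string testFile = Path.Combine(mountPoint, "fs_diag_testfile.tmp");
        var buffer = new byte[blockSize];              // zeros, like /dev/zero
        try
        {
            using (var writer = new FileStream(testFile, FileMode.Create, FileAccess.Write))
                for (int i = 0; i < blocks; i++)
                    writer.Write(buffer, 0, buffer.Length);

            long bytesRead = 0;
            using (var reader = new FileStream(testFile, FileMode.Open, FileAccess.Read))
            {
                int n;
                while ((n = reader.Read(buffer, 0, buffer.Length)) > 0)
                    bytesRead += n;                    // discarded, like /dev/null
            }
            return bytesRead == (long)blockSize * blocks;
        }
        catch (IOException) { return false; }
        catch (UnauthorizedAccessException) { return false; }
        finally
        {
            try { File.Delete(testFile); } catch (Exception) { }   // best-effort cleanup
        }
    }
}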


Thursday, February 20, 2014

I'm not finding much time tonight, but I wanted to take a moment to discuss an application for data input to Splunk. We talked about user applications for Splunk, and sure, they can be written in any language, but when we are talking about performance, such as reading orders from an MSMQ cluster, we want it to be efficient in memory and CPU. What better way to do it than to push it all the way down to the bottom of the Splunk stack? This is as close as it can get to the Splunk engine. Besides, MSMQ clusters host high-volume queues, and there can be a large number of such queues. While we could subscribe to notifications at different layers, there is probably nothing better than having something out of the box from the Splunk application.
I have a working prototype, but I just need to tighten it. What is missing is the ability to keep the user configuration small. The configuration currently takes one queue at a time, but there is a possibility to scale that. One of the things I want to do, for example, is to enable a regular expression for specifying the queues. This way users can specify multiple queues, or all queues on a host or cluster, with .*-like patterns. Queues on clusters can be enumerated via name resolution, adding the resolved host name as the prefix for the queue names; with an iterator-like approach all queues can be enumerated, as in the sketch below.
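A minimal sketch of that enumeration, assuming the System.Messaging API and a hypothetical host name and pattern, might look like this; it lists the private queues on a machine and filters their paths with a user-supplied regular expression.

// Sketch: enumerate MSMQ private queues on a host and filter by a regex pattern.
// The host name and pattern in Main are hypothetical examples.
using System;
using System.Linq;
using System.Messaging;
using System.Text.RegularExpressions;

static class QueueEnumerator
{
    public static MessageQueue[] Matching(string host, string pattern)
    {
        var regex = new Regex(pattern, RegexOptions.IgnoreCase);
        return MessageQueue.GetPrivateQueuesByMachine(host)
                           .Where(q => regex.IsMatch(q.Path))
                           .ToArray();
    }

    static void Main()
    {
        // e.g. all private "orders" queues on a (hypothetical) cluster node
        foreach (var queue in Matching("msmq-node-01", @"private\$\\orders.*"))
            Console.WriteLine(queue.Path);
    }
}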
One of the things I want to do is to enable transactional as well as non-transactional message reading. This will cover all the queues on a variety of deployments. Other than the system-reserved queues, most other queues, including the special queues, can be processed by the mechanism above. By making message queue monitoring a first-class citizen of the input specifications for Splunk, we gain the ability to transform and process as part of the different T-shirt-size deployments and Splunk roles. This will come in very useful for scaling across different sizes, from small and medium to enterprise-level systems. A receive sketch that handles both cases follows.
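This is only a rough sketch, again assuming System.Messaging; the queue's Transactional property decides whether the receive is wrapped in a MessageQueueTransaction, and the timeout value is illustrative.

// Sketch: receive one message, transactionally when the queue requires it.
using System;
using System.IO;
using System.Messaging;

static class QueueReader
{
    public static string ReceiveOne(MessageQueue queue)
    {
        var timeout = TimeSpan.FromSeconds(5);          // illustrative timeout
        try
        {
            Message message;
            if (queue.Transactional)
            {
                using (var tx = new MessageQueueTransaction())
                {
                    tx.Begin();
                    message = queue.Receive(timeout, tx);
                    tx.Commit();                        // a failure before Commit rolls the receive back
                }
            }
            else
            {
                message = queue.Receive(timeout);
            }
            using (var body = new StreamReader(message.BodyStream))
                return body.ReadToEnd();
        }
        catch (MessageQueueException e)
        {
            if (e.MessageQueueErrorCode == MessageQueueErrorCode.IOTimeout)
                return null;                            // nothing to read right now
            throw;
        }
    }
}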
I also want to talk about system processing versus app processing of the same queues. There are several comparisons to be drawn here and, consequently, different merits and demerits. For example, we talked about different deployments. The other comparisons include such things as performance, proximity to the pipelines and processors, shared transformations and obfuscations, and indexing of the data without translation to other channels.
Lastly, I wanted to add that, as opposed to other channels where there is at least one level of redirection, this directly taps into a source that forms a significant part of enterprise-level systems.
Furthermore, journaling and other forms of input lack the same real-time processing of machine data and are generally not turned on in production systems. Splunk forwarders, however, are commonly available to read machine data.

Wednesday, February 19, 2014

We will look at advanced Splunk server configuration, starting with modifying data input. This is important because once data is written by Splunk, it will not be changed. Data transformation is handled by the different configuration files indicated earlier: props.conf, inputs.conf, and transforms.conf. There is typically only one props.conf and it applies across the different forwarders. At the input phase, we look only at the data in bulk and put tags around it, such as host, source, and source type, but we don't process it as events; this is what we specify in inputs.conf. In props.conf, we add information to those tags, such as the character set, user-defined stanzas, and so on. A stanza is a group of attribute-value pairs and can be keyed by host, source, or source type, specified within square brackets, which is also where we can override the automatic source type. Note that props.conf affects all stages of processing globally, as opposed to the other configuration files, although its stanzas look similar to the others. Also, specifying more at input time alleviates the processing further down the line.
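For illustration only, stanzas along these lines (the monitored path, source type name, index, and transform name are made-up examples) show how the three files divide the work:

# inputs.conf -- tag the raw data at input time (hypothetical path and sourcetype)
[monitor:///var/log/orders/orders.log]
sourcetype = orders_log
index = main

# props.conf -- per-sourcetype settings such as character set and line handling
[orders_log]
CHARSET = UTF-8
SHOULD_LINEMERGE = false
TRANSFORMS-mask = mask_account_numbers

# transforms.conf -- a transformation referenced from props.conf
[mask_account_numbers]
REGEX = (account=)\d+
FORMAT = $1xxxx
DEST_KEY = _raw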
In the parsing phase, we take these tagged chunks and process them as individual events. We find the start and end of events in this phase and perform other event-level processing. Some processing could be performed in either the input phase or the parsing phase; typically it is done once and not repeated elsewhere. That said, parsing is usually performed on the indexer or the heavy forwarder.
In the indexing phase, the events are indexed and written to disk.
Splunk indexing is read- and write-intensive and consequently requires better disks. The recommended RAID setup is RAID 10, which provides fast reads and writes with the greatest redundancy. RAID 5 is not recommended because of the extra writes it incurs. SAN and NAS storage is not recommended for recently indexed data; it is preferable for older data.
Search heads are far more CPU-bound than indexers.
We will look at Splunk server administration today. Here we talk about best practices and configuration details for Splunk administration in a medium to large deployment environment. A common Splunk topology is a self-contained Splunk instance: it gathers inputs, indexes, and acts as a search interface. If the indexer is separate, then it gathers and/or receives data from forwarders and writes it to disk. It can operate alone or load-balanced with other indexers, and it can also act as a search interface. A search head runs Splunk Web, generally does not index, and connects to indexers with distributed search. It is used in large implementations with high numbers of concurrent users and searches.
A light forwarder is a Splunk agent installed on a non-Splunk system to gather data locally, but it can't parse or index. The purpose is to keep the hardware footprint as small as possible on production systems.
If there are no restrictions and the hardware can support more, a heavy forwarder can be installed that also parses the Splunk data. It writes no data to disk and does not support indexing; that is left to the indexers and the search head. It generally works as a remote collector, intermediate forwarder, and possibly a data filter.
A deployment server acts as a configuration manager for a Splunk install. It can run on an indexer, a search head, or a dedicated machine, depending on the size of the installation.
Key considerations when planning a topology include how much data per day is being indexed, how many concurrent users there are, and how many scheduled searches or alerts run. We want to know about the data, its location, its persistence, its growth, its security, its connectivity, and its redundancy to plan the deployment.
Generally, as the T-shirt size of the deployment increases, the number of indexers, forwarders, and syslog devices increases. A dedicated search head is deployed for handling search requests, but the indexers and the search head are typically kept together and secured as Splunk-internal while everything else feeds into them. An intermediate forwarder may consolidate input from syslog devices, and together with the feed from the other forwarders, the data is sent over a load-balanced feed to the Splunk indexers.
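As a sketch, a forwarder's outputs.conf for such a load-balanced feed might look like the following; the group name and indexer host names are hypothetical.

# outputs.conf on a forwarder -- load-balance across two (hypothetical) indexers
[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
server = indexer01.example.com:9997, indexer02.example.com:9997
autoLBFrequency = 30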

Tuesday, February 18, 2014

The scaling of threads to process a concurrent queue was discussed. In this post we talk about integrating the data and metadata passed over the queue.