Friday, February 21, 2014

Yesterday I saw a customer report of a failure in our application, and at first it seemed to be a disk space issue. However, file system problems are generally something that applications cannot work around.
Here the file system was an NFS mount, even though it was labeled as a GPFS mount. Furthermore, disk space was not an issue. Yet the application reported that it could not proceed because its open/read/write calls were failing. Mount showed the file system mount point and the remote server it mapped to. Since the mount was for a remote file system, we needed to check both the network connectivity and the file system reads and writes.
A simple test that was suggested was to write a file outside the application to the remote server with the dd utility, something like:
dd if=/dev/zero of=/remotefs/testfile bs=<blocksize> count=1
And if that succeeds, read it back again as follows:
dd if=/remotefs/testfile of=/dev/null bs=<blocksize>
With a round trip like that, file system problems can be detected.
The same diagnostics can be made part of the application diagnostics.
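For instance, here is a minimal sketch of how such a round-trip check could be folded into application diagnostics. This is illustrative only; the test file name and block size are placeholders, and a real check would report which specific call failed.

#include <cstdio>
#include <cstring>
#include <string>
#include <vector>

// Round-trip check on a (possibly remote) mount: write a test file,
// read it back, compare, and clean up. Returns true when both
// directions work. Analogous to the dd write/read test above.
bool CheckRemoteFileSystem(const std::string& mountPoint, size_t blockSize = 4096)
{
    const std::string testFile = mountPoint + "/fs_diag_testfile";  // placeholder name
    std::vector<char> writeBuf(blockSize, 0x5A);
    std::vector<char> readBuf(blockSize, 0);

    // Write one block (like: dd if=/dev/zero of=<testfile> bs=<blockSize> count=1)
    std::FILE* out = std::fopen(testFile.c_str(), "wb");
    if (!out) return false;
    bool ok = std::fwrite(writeBuf.data(), 1, blockSize, out) == blockSize;
    ok = (std::fclose(out) == 0) && ok;
    if (!ok) return false;

    // Read it back (like: dd if=<testfile> of=/dev/null bs=<blockSize>)
    std::FILE* in = std::fopen(testFile.c_str(), "rb");
    if (!in) return false;
    ok = std::fread(readBuf.data(), 1, blockSize, in) == blockSize;
    std::fclose(in);
    std::remove(testFile.c_str());

    return ok && std::memcmp(writeBuf.data(), readBuf.data(), blockSize) == 0;
}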


Thursday, February 20, 2014

I'm not finding much time tonight, but I wanted to take a moment to discuss an application for data input to Splunk. We talked about user applications for Splunk, and sure, they can be written in any language, but when we are talking about performance when reading, say, orders from an MSMQ cluster, we want the reader to be efficient in memory and CPU. What better way to do it than to push it all the way down to the bottom of the Splunk stack? This is as close as it can get to the Splunk engine. Besides, MSMQ clusters host high-volume queues and there can be a large number of such queues. While we could subscribe to notifications at different layers, there is probably nothing better than having something out of the box from the Splunk application.
I have a working prototype but I just need to tighten it. What is missing is the ability to keep the user configuration small. The configuration currently takes one queue at a time, but there is the possibility to scale that. One of the things I want to do, for example, is to enable a regular expression for specifying the queues. This way users can specify multiple queues, or all queues on a host or cluster, with .*-like patterns. Queues on a cluster can be enumerated via name resolution, with the resolved host added as the prefix for the queue names. With an iterator-like approach, all the queues can be enumerated.
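As a sketch of just the selection step: assuming a helper that wraps the MSMQ management API to enumerate queue path names on a host or cluster node (stubbed out below), the user-supplied pattern can be applied with std::regex. The queue names and the EnumerateQueueNames helper are hypothetical.

#include <regex>
#include <string>
#include <vector>

// Stub standing in for the MSMQ management API enumeration
// (begin / get-next style); returns path names with the host prefix.
std::vector<std::wstring> EnumerateQueueNames(const std::wstring& hostPrefix)
{
    return { hostPrefix + L"\\private$\\orders_in",
             hostPrefix + L"\\private$\\orders_out",
             hostPrefix + L"\\private$\\audit" };
}

// Filter the enumerated queues down to monitoring candidates using the
// pattern from the user configuration, e.g. L".*orders.*".
std::vector<std::wstring> SelectCandidateQueues(const std::wstring& hostPrefix,
                                                const std::wstring& userPattern)
{
    std::wregex pattern(userPattern, std::regex_constants::icase);
    std::vector<std::wstring> candidates;
    for (const auto& name : EnumerateQueueNames(hostPrefix))
    {
        if (std::regex_match(name, pattern))
            candidates.push_back(name);
    }
    return candidates;
}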
One of the things I want to do is to enable transactional as well as non-transactional message reading. This will cover all the queues on a variety of deployments. Other than the system-reserved queues, most other queues, including the special queues, can be processed by the mechanism above. By making message queue monitoring a first-class citizen of the input specifications for Splunk, we gain the ability to transform and process the data as part of the different T-shirt-size deployments and Splunk roles. This will come in very useful for scaling across deployments from small and medium to enterprise-level systems.
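To illustrate the transactional versus non-transactional distinction with the Win32 MSMQ API: the same MQReceiveMessage call covers both modes through its transaction argument, so the monitor can choose per queue. This is a rough sketch, not the prototype's actual code; the timeout, buffer size and error handling are simplified, and a real reader would request more message properties.

#include <windows.h>
#include <mq.h>   // MSMQ API; link against mqrt.lib

// Receive one message body from the queue identified by formatName,
// either inside a single-message transaction or outside any transaction.
HRESULT ReceiveOneMessage(LPCWSTR formatName, bool transactional)
{
    QUEUEHANDLE hQueue = NULL;
    HRESULT hr = MQOpenQueue(formatName, MQ_RECEIVE_ACCESS, MQ_DENY_NONE, &hQueue);
    if (FAILED(hr)) return hr;

    // Ask only for the message body here; label, body size, arrival time
    // and so on would be added as further properties.
    UCHAR body[4096] = {0};
    MSGPROPID propIds[1] = { PROPID_M_BODY };
    MQPROPVARIANT propVars[1];
    propVars[0].vt = VT_VECTOR | VT_UI1;
    propVars[0].caub.pElems = body;
    propVars[0].caub.cElems = sizeof(body);

    MQMSGPROPS msgProps = {0};
    msgProps.cProp = 1;
    msgProps.aPropID = propIds;
    msgProps.aPropVar = propVars;

    // The only difference between the two modes is the last argument.
    hr = MQReceiveMessage(hQueue, 1000 /* ms timeout */, MQ_ACTION_RECEIVE,
                          &msgProps, NULL, NULL, NULL,
                          transactional ? MQ_SINGLE_MESSAGE : MQ_NO_TRANSACTION);

    MQCloseQueue(hQueue);
    return hr;
}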
I also want to talk about system processing versus app processing of the same queues. There are several comparisons to be drawn here and consequently different merits and demerits. For example, we talked about different deployments. The other comparisons include such things as performance, being close to the pipelines and processors, shared transformations and obfuscations, direct indexing of the data with no translation to other channels, etc.
Lastly, I wanted to add that, as opposed to other channels where there is at least one level of redirection, this taps directly into a source that forms a significant part of enterprise-level systems.
Furthermore, journaling and other forms of input lack the same real-time processing of machine data and are generally not turned on in production systems. Splunk forwarders, however, are commonly available to read machine data.

Wednesday, February 19, 2014

We will look at advanced Splunk server configuration, starting with modifying data input. This is important because once data is written by Splunk, it will not be changed. Data transformation is handled by the different configuration files as indicated earlier: props.conf, inputs.conf and transforms.conf. There is typically only one props.conf, and it applies across the different forwarders. At the input phase, we look only at the data in bulk and put tags around it, such as host, source and source type, but we don't process it as events. This is what we specify in inputs.conf. In props.conf, we add information to the tags such as the character set, user-defined stanzas, etc. A stanza is a group of attribute-value pairs under a header, which can be a host, source or source type specified within square brackets, and it lets us, among other things, override the automatic source type. Note that props.conf affects all stages of processing globally, as opposed to the other configuration files. The stanzas in props.conf are similar to the others. Also, specifying these at input time alleviates the processing further down the line.
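As an illustration (the names and values below are made up, not from any particular deployment), an inputs.conf stanza tags the incoming data in bulk, while a props.conf stanza keyed by the same source type adds attributes such as the character set:

# inputs.conf -- tag the data stream with source type and target index
[monitor:///var/log/billing]
sourcetype = billing_app
index = main

# props.conf -- attributes applied for that source type
[billing_app]
CHARSET = UTF-8
SHOULD_LINEMERGE = false
TIME_FORMAT = %Y-%m-%d %H:%M:%S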
In the parsing phase, we take these tags off and process the data as individual events. We find the start and end of events in this phase and perform other event-level processing. There is processing that could be performed in the input phase as well as the parsing phase; typically it is done once and not repeated elsewhere. That said, parsing is usually performed on the indexer or the heavy forwarder.
In the indexing phase, the events are indexed and written to disk.
Splunk indexing is read/write intensive and consequently requires better disks. The recommended RAID setup is RAID 10, which provides fast reads and writes with the greatest redundancy. RAID 5, with its extra parity writes, is not recommended. SAN and NAS storage are not recommended for recently indexed data; they are preferable for older data.
Search heads are far more CPU-bound than indexers.
We will look at Splunk server administration today. Here we talk about the best practices and the configuration details for Splunk administration in a medium to large deployment environment. A common Splunk topology is a self-contained Splunk instance: it gathers inputs, indexes, and acts as a search interface. If the indexer is separate, then it gathers and/or receives data from forwarders and writes it to disk. It can operate alone or load-balanced with other indexers, and can also act as a search interface. A search head runs Splunk Web, generally does not index, and connects to indexers with distributed search. It is used in large implementations with high numbers of concurrent users and searches.
A light forwarder is a Splunk agent installed on a non-Splunk system to gather data locally but it can't parse or index. The purpose here is to keep the hardware footprint as small as possible on production systems.
If there are no such restrictions and the hardware can support more, a heavy forwarder is installed, which can also parse the data. It writes no data to disk and does not support indexing; that is left to the indexers and search heads. It generally works as a remote collector, intermediate forwarder and possible data filter.
A deployment server acts as a configuration manager for a Splunk install. It can run on an indexer or search head or a dedicated machine depending on the size of the installation.
Key considerations when planning a topology include such things as how much data per day is being indexed, how many concurrent users are there and how many scheduled searches or alerts. We want to know about the data, its location, its persistence, its growth, its security, its connectivity and its redundancy to plan the deployment.
Generally, as the T-shirt size of the deployment increases, the number of indexers, forwarders and syslog devices increases. A dedicated search head is deployed for handling the search requests. The indexers and search head are typically kept together and secured as Splunk-internal, while everything else feeds into them. An intermediate forwarder may consolidate input from the syslog devices and, together with the feed from the other forwarders, load-balance the consolidated feed across the Splunk indexers.
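For instance (host names made up), the forwarders and the intermediate forwarder can load-balance their feed across the indexer tier with an outputs.conf stanza along these lines:

# outputs.conf on the forwarders / intermediate forwarder
[tcpout:primary_indexers]
server = indexer1.example.com:9997, indexer2.example.com:9997
autoLB = true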

Tuesday, February 18, 2014

The scaling of threads to process a concurrent queue was discussed earlier. In this post we talk about integrating the data and metadata passed over the queue.
In terms of storage, we discussed that local storage is preferable for each worker. The resources are scoped to the lifetime of a worker. There is no coordination required between producers and consumers for access to resources. Storage can hold a collection of data structures, and by partitioning the data structures we improve fault tolerance.
In our case we have n queues from a cluster, each with an arbitrary number of messages. To best process these, we could enumerate the queues and partition them across worker threads from a pool. The pool itself can have a configurable number of workers, and the number of queues assigned to any worker can be determined by dividing the total number of queues by the number of threads.
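A minimal sketch of that partitioning, assuming the candidate queue names have already been selected; the worker body itself is passed in as a placeholder, and one thread more than the available processors is used, as discussed below.

#include <algorithm>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Split the candidate queue names into roughly equal chunks, one per
// worker thread, and run the workers to completion.
void PartitionAndRun(const std::vector<std::wstring>& queues,
                     void (*worker)(std::vector<std::wstring>))
{
    // hardware_concurrency() can return 0, in which case this still yields 1.
    const size_t workerCount = std::thread::hardware_concurrency() + 1;
    const size_t perWorker = (queues.size() + workerCount - 1) / workerCount;  // ceiling

    std::vector<std::thread> pool;
    for (size_t start = 0; start < queues.size(); start += perWorker)
    {
        const size_t end = std::min(start + perWorker, queues.size());
        std::vector<std::wstring> slice(queues.begin() + start, queues.begin() + end);
        pool.emplace_back(worker, std::move(slice));
    }
    for (auto& t : pool) t.join();
}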
The queues are identified by their names, so we work off a global list of queue names from which the workers are allotted their shares. This list is further qualified to select only those queues that are candidates for monitoring.
Whether a queue is a candidate for monitoring is determined by a regular expression match between the pattern the user provides and the name of the queue. The regular expression is evaluated against each name, one by one, to select the filtered set of candidate queues.
The queues are enumerated with the Windows API, using the corresponding begin and get-next methods. Each queue retrieved has a name that can be matched against the regex provided.
The queues may have different numbers of messages, but each monitor thread works on only the current message of any queue. Once that message is read, or the read times out, the thread moves on to the current message of the next queue. All candidate queues are treated equally; timeouts on queues with no messages are a fixed cost that we could try to reduce with, say, smaller timeouts.
If we consider this round-robin method of retrieving the current message from each of the queues, there is fair treatment of all queues and a guarantee of progress. What we would be missing is the ability to accelerate past queues where the same message, or no message, is current. If we could do that, we would process the queues with more messages faster. But if we didn't do round robin, we wouldn't be fair to all queues. Therefore we do not prioritize queues based on the number of distinct messages they carry. The method we have will still process the queues with more messages and will scale without additional logic or complexity.
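A sketch of the per-worker loop under those assumptions: ReceiveCurrentMessage is a hypothetical stand-in for a per-queue read such as the MQReceiveMessage call sketched earlier, and the one-second timeout is a placeholder.

#include <atomic>
#include <string>
#include <vector>

// Hypothetical per-queue read: returns true if a message was received
// before the timeout, false on timeout or when the queue is empty.
bool ReceiveCurrentMessage(const std::wstring& queueName, unsigned timeoutMs);

// Round-robin over this worker's share of queues: take at most the
// current message from each queue, then move on, so every queue in the
// slice makes progress on every pass.
void MonitorQueues(const std::vector<std::wstring>& myQueues, std::atomic<bool>& stop)
{
    const unsigned timeoutMs = 1000;  // placeholder; smaller values reduce idle cost
    while (!stop.load())
    {
        for (const auto& q : myQueues)
        {
            if (stop.load()) break;
            ReceiveCurrentMessage(q, timeoutMs);  // read or time out, then next queue
        }
    }
}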
Each set of queues is partitioned to a worker, so there is no contention to resolve and the load per worker is near optimal.
The number of threads could be taken as one more than the number of available processors.

Monday, February 17, 2014

We review command line tools used for support of Splunk here.
The cmd tool can invoke other tools with the required environment variables preset. These variables can be displayed with the splunk envvars command.
The btool command can be used to view or validate the Splunk configuration files. It takes into account configuration file layering and the user/app context, i.e. the configuration data visible to a given user and from a given app, or from an absolute path, optionally with extra debug information.
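For example, to see the merged inputs configuration with the layering applied, and which file each setting came from, something like the following can be run (the app name is illustrative):
splunk btool inputs list --debug
splunk btool inputs list --debug --app=search
splunk btool check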
btprobe queries the fishbucket for file records stored by tailing, given the directory or the crc compute file. Using the given key or file, this tool queries the specified BTree.
The classify command is used for classifying files with types.
fsck diagnoses the health of the buckets and can rebuild search data as necessary.
Hot, warm, thawed or cold buckets can be specified separately or all together.
The locktest command tests the locks.
The locktool command can be used to acquire and release locks.
parsetest command can be used to parse log files
pcregextest command is a simple utility tool for testing modular regular expressions.
searchtest command is another tool to test search functionality of Splunk.
signtool is used for signing and verifying Splunk index buckets.
tsidxprobe will take a look at your time series index (tsidx) files and verify the formatting or identify a problem file. It can look at each of the index files.
tsidx_scan.py is a utility script that searches for tsidx files from a specified starting location, runs tsidxprobe on each one, and outputs the results to a file.
Perhaps one more tool that could be added to this belt is one that helps with monitoring resource utilization, to see whether the number of servers or their settings can be better adjusted.