Wednesday, February 19, 2014

We will look at advanced Splunk server configuration. We look at modifying data input. This is important because once data is written by Splunk, it will not be changed. Data transformation is handled by the different configuration files as indicated earlier. These are props.conf, inputs.conf and transforms.conf. There is typically a single props.conf, and it applies across the different forwarders. At the input phase, we look only at the data in bulk and put tags around it such as host, source and source type, but we don't process it as events. This is what we specify in inputs.conf. In props.conf, we add information to those tags such as character set, user-defined stanzas etc. A stanza is a group of attribute-value pairs under a header in square brackets keyed by host, source or source type; a source stanza, for example, lets us override the automatically assigned source type. Note that props.conf affects all stages of processing globally, as opposed to the other configuration files. The stanzas in props.conf are similar in form to those in the other files. Also, information supplied at input time reduces the processing needed down the line.
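To make this concrete, here is a hedged illustration of what such stanzas might look like; the monitor path, source type name and transform class below are made-up examples, not from any real deployment.

# inputs.conf - tag the data in bulk at the input phase
[monitor:///var/log/myapp]
sourcetype = myapp_log
host = webserver01

# props.conf - add information keyed by source type, host or source
[myapp_log]
CHARSET = UTF-8
SHOULD_LINEMERGE = false
TRANSFORMS-anonymize = myapp_mask_accounts
# (myapp_mask_accounts would be defined in transforms.conf)

# a source:: stanza can override the automatic source type for matching files
[source::/var/log/legacy/...]
sourcetype = legacy_log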
In the parsing phase, we take these tags off and process the data as individual events. We find the start and end of events in this phase and perform other event-level processing. Some processing could be performed in either the input phase or the parsing phase; typically it is done once and not repeated elsewhere. That said, parsing is usually performed on the indexer or the heavy forwarder.
In the indexing phase, the events are indexed and written to disk.
Splunk indexing is read and write intensive and consequently requires better disks. The recommended RAID setup is RAID 10, which provides fast reads and writes with the greatest redundancy. RAID 5, with its parity-write overhead, is not recommended. SAN and NAS storage are not recommended for recently indexed data; they are preferable for older data.
Search heads are far more CPU-bound than indexers.
We will look at Splunk server administration today. Here we talk about the best practices and the configuration details for Splunk administration in a medium to large deployment environment. A common Splunk topology is a self-contained Splunk instance: it gathers inputs, indexes, and acts as a search interface. If the indexer is separate, then it gathers and/or receives data from forwarders and writes it to disk. It can operate alone or load balanced with other indexers, and can also act as a search interface. A search head runs Splunk Web, generally does not index, and connects to indexers with distributed search. It is used in large implementations with high numbers of concurrent users/searches.
A light forwarder is a Splunk agent installed on a non-Splunk system to gather data locally but it can't parse or index. The purpose here is to keep the hardware footprint as small as possible on production systems.
If there are no such restrictions and the hardware can support more, a heavy forwarder is installed that can also parse the Splunk data. It does not write event data to disk and does not index; that is left to the indexers and search heads. It generally works as a remote collector, intermediate forwarder and possible data filter.
A deployment server acts as a configuration manager for a Splunk install. It can run on an indexer or search head or a dedicated machine depending on the size of the installation.
Key considerations when planning a topology include such things as how much data per day is being indexed, how many concurrent users are there and how many scheduled searches or alerts. We want to know about the data, its location, its persistence, its growth, its security, its connectivity and its redundancy to plan the deployment.
Generally, as the T-shirt size of the deployment increases, the number of indexers, forwarders and syslog devices increases. A dedicated search head is deployed for handling the search requests. The indexers and search head are typically kept together and secured as Splunk internal, while everything else feeds into it. An intermediate forwarder may consolidate input from syslog devices, and together with the feed from the other forwarders, everything is forwarded with load balancing to the Splunk indexers.
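As a sketch, the forwarders in such a topology would carry an outputs.conf along the following lines; the group name, host names and port are hypothetical.

# outputs.conf on the forwarders
[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
server = splunkidx1.example.com:9997, splunkidx2.example.com:9997
autoLB = true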

Tuesday, February 18, 2014

The scaling of threads to process a concurrent queue was discussed. In this post we talk about integrating the data and metadata passed over the queue.
In terms of storage, we discussed that local storage is preferable for each worker. The resources are scoped for the lifetime of a worker. There is no co-ordination required between producers and consumers for access to resources. Storage can have a collection of data structures. With the partitioning of data structures, we improve fault tolerance.
In our case we have n queues, with an arbitrary number of messages each, from a cluster. To best process these, we could enumerate the queues and partition them across different worker threads from a pool. The pool itself can have a configurable number of workers, and the number of queues assigned to any worker could be determined by dividing the total number of queues by the number of threads.
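A minimal C++ sketch of this partitioning, assuming the queue names have already been collected into a list; the Worker type and function name are illustrative, not part of any product API.

#include <string>
#include <vector>

// Illustrative per-worker state: each worker owns its slice of queue names,
// so no coordination is needed between workers.
struct Worker {
    std::vector<std::string> queueNames;
};

// Divide the global list of candidate queues across a configurable pool size.
std::vector<Worker> partitionQueues(const std::vector<std::string>& queues,
                                    size_t workerCount) {
    std::vector<Worker> pool(workerCount);
    for (size_t i = 0; i < queues.size(); ++i) {
        // Round-robin assignment keeps every worker within one queue of
        // the total number of queues divided by the number of workers.
        pool[i % workerCount].queueNames.push_back(queues[i]);
    }
    return pool;
}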
The queues are identified by their names, so we work from a global list of queue names that the workers are allotted from. This list is further qualified to select only those queues that are candidates for monitoring.
Whether a queue is a candidate for monitoring is determined by a regular expression match between the pattern the user provides and the name of the queue. The regular expression is evaluated against each name one by one to select the set of candidate queues.
The queues are enumerated with the Windows API, using the corresponding begin and get-next methods. Each queue retrieved has a name that can be matched against the regex provided.
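A small sketch of the candidate selection in C++; it assumes the enumeration (for example, the begin/get-next style calls mentioned above) has already produced the names, and the function name shown is illustrative.

#include <regex>
#include <string>
#include <vector>

// Keep only the queue names that match the user-supplied pattern,
// evaluating the regular expression against each enumerated name in turn.
std::vector<std::string> selectCandidates(const std::vector<std::string>& allQueues,
                                          const std::string& userPattern) {
    std::regex pattern(userPattern, std::regex::icase);
    std::vector<std::string> candidates;
    for (const auto& name : allQueues) {
        if (std::regex_search(name, pattern)) {
            candidates.push_back(name);
        }
    }
    return candidates;
}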
The queues may have different numbers of messages, but each monitor thread works on only the current message of any queue. Once that message is read, or the read times out, the thread moves on to the current message of the next queue. All candidate queues are treated equally; queues with no messages incur only a fixed cost, which we could try to reduce with, say, smaller timeouts.
With this round-robin method of retrieving the current message from each of the queues, there is fair treatment of all queues and a guarantee of progress. What we miss is the ability to accelerate past queues where the same message, or no message, is current. If we could do that, we would process the queues with more messages faster. But if we didn't do round robin, we wouldn't be fair to all queues. Therefore we do not prioritize queues based on the number of distinct messages they carry. The method we have will still work through the queues with more messages and will scale without additional logic or complexity.
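One pass of that round-robin loop might look like the following C++ sketch; ReadCurrentMessage and ProcessMessage are hypothetical stand-ins for the actual queue receive (with timeout) and the message handling, not real API calls.

#include <chrono>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for the platform receive call: returns the current
// message on the named queue, or nothing if the read times out.
std::optional<std::string> ReadCurrentMessage(const std::string& /*queueName*/,
                                              std::chrono::milliseconds /*timeout*/) {
    return std::nullopt;  // placeholder body: pretend the read timed out
}

// Hypothetical stand-in for whatever the monitor does with a message.
void ProcessMessage(const std::string& /*message*/) {}

// Visit each candidate queue once, take at most its current message, then
// move on. Queues with more messages are simply visited again on the next
// pass, so they drain faster without any priority bookkeeping; empty queues
// cost only the timeout.
void roundRobinPass(const std::vector<std::string>& queueNames,
                    std::chrono::milliseconds timeout) {
    for (const auto& name : queueNames) {
        if (auto message = ReadCurrentMessage(name, timeout)) {
            ProcessMessage(*message);
        }
    }
}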
Each set of queues is partitioned to a worker, so there is no contention to resolve and the load per worker is close to optimal.
The number of threads could be taken as one more than the number of available processors.
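In C++ that rule of thumb might be written as:

#include <algorithm>
#include <thread>

// One more worker than the number of available processors, falling back
// to a sane minimum when the count cannot be detected (it may report 0).
size_t workerCount() {
    unsigned int processors = std::thread::hardware_concurrency();
    return std::max(processors, 1u) + 1;
}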

Monday, February 17, 2014

We review command line tools used for support of Splunk here.
The cmd tool can invoke other tools with the required environment variables preset. These can be displayed with the splunk envvars command.
The btool can be used to view or validate the Splunk configuration files. It takes into account configuration file layering and user/app context, i.e. the configuration data visible to the given user and from the given app, or from an absolute path, or with extra debug information.
btprobe queries the fishbucket for file records stored by tailing, by specifying the directory or the crc-compute file. Using the given key or file, this tool queries the specified BTree.
classify cmd is used for classifying files with types.
fsck diagnoses the health of the buckets and can rebuild search data as necessary.
hot, warm, thawed or cold buckets can be specified separately or together with all.
locktest command tests file locking
locktool command can be used to set and unset locks
parsetest command can be used to test the parsing of log files
pcregextest command is a simple utility tool for testing modular regular expressions.
searchtest command is another tool to test search functionality of Splunk.
signtool is used for signing and verifying Splunk index buckets.
tsidxprobe will take a look at your time-series index (tsidx) files and verify the formatting or identify a problem file. It can look at each of the index files.
tsidx_scan.py is a utility script that searches for tsidx files at a specified starting location, runs tsidxprobe on each one, and outputs the results to a file.
Perhaps one more tool that could be added to this belt is one that helps with monitoring and resource utilization, to see whether the number of servers or their settings could be better adjusted.

Saturday, February 15, 2014

What's a good way to scale the concurrent processing of a queue, on both the producer and the consumer side? This seems a textbook question, but think about cross-platform support and high performance. Maybe if we narrow the question down to the Windows platform, that would help. Anyway, it's about the number of concurrent workers we can have for a queue. The general rule of thumb was that you could have one more thread than the number of processors to keep everyone busy. And if it's a lightweight worker without the overhead of TLS storage, we could scale to as many as we want. The virtual workers can use the same pool of physical threads. I'm not talking about fibers, which don't have TLS storage either. Fibers are definitely welcome over OS threads. But I'm looking at a way to parallelize as much as we can in terms of the number of concurrent calls on the same system.
In addition, we consider inter-worker communication in a failsafe, reliable manner. The OS provides mechanisms for waiting on thread completion based on handles returned from CreateThread, and then there is the completion port on Windows that can be used with multiple threads. The threads can then close when the port is closed.
Maybe the best way to do this would be to keep a pool of threads and partitioned tasks instead of timeslicing.
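A bare-bones Windows sketch of that idea, using a single I/O completion port as the hand-off to a small pool of threads; error handling is omitted and the numbered keys stand in for real partitioned tasks.

#include <windows.h>
#include <iostream>
#include <vector>

// Worker loop: block on the completion port, process posted work items,
// and exit when a zero "shutdown" key arrives or the port is closed.
DWORD WINAPI WorkerMain(LPVOID param) {
    HANDLE port = static_cast<HANDLE>(param);
    for (;;) {
        DWORD bytes = 0;
        ULONG_PTR key = 0;
        LPOVERLAPPED overlapped = nullptr;
        if (!GetQueuedCompletionStatus(port, &bytes, &key, &overlapped, INFINITE))
            break;                      // port closed or error
        if (key == 0)
            break;                      // shutdown signal
        std::cout << "processing task " << key << "\n";
    }
    return 0;
}

int main() {
    SYSTEM_INFO info;
    GetSystemInfo(&info);
    DWORD threadCount = info.dwNumberOfProcessors + 1;   // rule of thumb

    // One completion port shared by the whole pool.
    HANDLE port = CreateIoCompletionPort(INVALID_HANDLE_VALUE, nullptr, 0, threadCount);

    std::vector<HANDLE> threads;
    for (DWORD i = 0; i < threadCount; ++i)
        threads.push_back(CreateThread(nullptr, 0, WorkerMain, port, 0, nullptr));

    // Producer side: post the partitioned tasks (here just numbered keys).
    for (ULONG_PTR task = 1; task <= 10; ++task)
        PostQueuedCompletionStatus(port, 0, task, nullptr);

    // Tell each worker to exit, then wait for the pool to drain.
    for (DWORD i = 0; i < threadCount; ++i)
        PostQueuedCompletionStatus(port, 0, 0, nullptr);
    WaitForMultipleObjects(static_cast<DWORD>(threads.size()), threads.data(), TRUE, INFINITE);

    for (HANDLE t : threads) CloseHandle(t);
    CloseHandle(port);
    return 0;
}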
Time-slicing or sharing does not really improve concurrent progress if the tasks are partitioned.
Again, it helps if the tasks are partitioned. The .NET Task Parallel Library enables both declarative parallel queries and imperative parallel algorithms. By declarative we mean we can use notations such as 'AsParallel' to indicate that we want routines to be executed in parallel. By imperative we mean we can explicitly work with partitions, permutations and combinations of linear data.
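AsParallel itself is .NET; as a rough analogue of the same declarative style in C++, here is a sketch using the standard parallel execution policy (my substitution, not anything from the TPL).

#include <algorithm>
#include <execution>
#include <numeric>
#include <vector>

int main() {
    std::vector<int> data(1000000);
    std::iota(data.begin(), data.end(), 0);

    // Declarative: ask the library to run the transform in parallel.
    std::transform(std::execution::par, data.begin(), data.end(), data.begin(),
                   [](int x) { return x * x; });

    // The imperative alternative would partition the data explicitly and
    // hand each range to its own worker.
    return 0;
}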
In general a worker helps with parallelization when it has little or no communication and works on isolated data.
I want to mention diagnostics and boost. Diagnostics of a worker's activity, to detect hangs or to identify a worker among a set of workers, is enabled with such things as logging or tracing and identifiers for workers. Call-level logging and tracing enable detection of activity by a worker. Within a set of workers, IDs can tell one worker apart from the rest. Logging can include this ID to pinpoint the worker with a problem activity.
There can also be a dedicated activity or worker to monitor others.
Boosting a worker's performance is in terms of CPU speed and memory. Both are variables that depend on the hardware. Memory and caches are very helpful in improving the performance of the same activity repeated by a worker.