Wednesday, February 19, 2014

We will look at advanced Splunk server configuration. We look at modifying data input. This is important because once data is written by Splunk, it will not be changed. Data transformation is handled by the different configuration files as indicated earlier.  These are props.conf, inputs. conf and transforms.conf. The props.conf is typically only one and for different forwarders.  At the input phase , we look only at the data in bulk and put tags around it  such as host, source and source type but we don't process them as events. This is what we specify in inputs.conf. In props.conf, we add information to tags such as character set, user-defined stanza etc. Stanza is specified to a group of attribute-value pairs and can be host, source and source type specified within square brackets where we can differentiate between source type for overriding automatic source type. Note that props.conf affects all stages of processing globally as opposed to the other configuration files. The stanzas in a props.conf is similar to the others. Also, user inputs alleviates the processing down the line or afterwards.
In the parsing phase, we take these tags off and process them as individual events. We will find start and stop of events in this phase and perform other event level processing. There are processing that could be performed in input phase as well as parsing phase. Typically they are done once and not repeated elsewhere. That said, parsing is usually performed on the indexer or the heavy forwarder.
In the indexing phase, the events are indexed and written to disk.
 Splunk indexing is read write intensive and consequently requires better disks. The recommended RAID setup is RAID 10 which provides fast read and write with greatest redundancy. RAID 5 duplicate writes is not recommended. SAN and NAS storage is not recommended for recently indexed data. They are preferable for older data.
Search heads are far more cpu bound than indexers.

No comments:

Post a Comment