Cluster computing

Sunday, April 13, 2014

We read the topics TCP monitoring from Splunk docs today. Splunk monitors the TCP port specified. It listens for packets for one or all machines on the specified port. We use the host restriction field to specify this. Host can be specified using IP, DNS name or a custom label. On a unix system, Splunk will require root access to listen for packets from ports under 1024. SourceType is a default field added to events and so is the index. SourceType is used to determine the processing characteristics such as timestamps and event boundaries. Index is where the events are stored.
Note that when Splunk starts listening on a port, it establishes a connection on both directions. There are two workers it spawns. The first worker is the forward data receiver thread and the second worker is the replication data receiver thread. Both workers act on Tcp input and hence share similar functionality.
The forward data receiver thread creates the input pipeline for data. Therefore it manages routines such as setting up the queue names, updating bookkeeping, maintaining stats and cleanup of all associate data structures and file descriptors. The Replication data receiver thread is responsible for creating an acceptor port, scheduling memory reclaiming and handling shutdowns.
Note that in a cluster configuration, there may be multiple peers or forwarders. Therefore all the data must be handled. There is only endpoint to which the data arrives and these are consolidated.
The interface that deals with the TCP channel is the one that registers a data callback, consumes and send acknowledgements
The data callback function does the following:
It processes forwarder info and it sends acknowledgements.
The way TCP monitoring works is very similar to how tcpdump works. Raw packets read are dumped to a file. Tcpdump is written in the form of libpcap packet capture library. Libpcap works on different operating systems. It works on the principle that a TCP flow between a source ip address and port and a destination ip address and port is available to read just like any other file. Tcpdump can log to a file that can be parsed with the tcptrace tool. The use of such tools makes it unnecessary for Splunk to monitor all Tcp connections to a computer directly.

Cluster computing

Sunday, April 13, 2014

No comments:

Post a Comment