Cluster computing

Sunday, June 15, 2014

In addition to the previous post, where we described in some detail how trackers work when they are injected into a pipeline for data in Splunk, let us now look at the changes to the indexer to support adding labels or trackers in the data pipeline When the data flows into an input processor, it is typically initiated by a monitor or a processor in the forwarder. When it arrives in the indexer, it is typically handled by one or more processors that apply the configurations specified by the user. When the raw data is written to the Splunk database, it is received by an Indexed Value Receiver that writes it to the appropriate Index. As an example, if we take the signing processor that encrypts the data before it is written to the database, we see that it takes the data from the forwarder and forwards it to Splunk database. Here the data is changed before it makes its way to the database.
Addition of this tracker is via a keyword that can be added to any existing pipeline data. Adding a keyword does not need to be user visible. This can appear in the transformation or expansion of user queries or inputs. For example at search time, the use of the keyword tracker can be used to display the tracking data that was injected into the data stream. This can be implemented by a corresponding Search Processor. Similarly on the input side, a corresponding processor can inject the trackers into the data pipeline. The data might go to different indexes and its debatable whether data should go only into the internal index or to the one specified for data. The choice of internal index lets us persist the trackers separately from the data so that the handlers for the different processors do not affect the data itself and this requires little code change. On the other hand, if we keep the trackers and the data in the same index, then almost all components reading the data from the index will need to differentiate between the data and the trackers. Associations between the data and the trackers even if kept separately are easy to maintain because we keep track of timestamps and trackers pertaining to an input can show where they are inlined with the data by means of the timestamps.
IndexProcessor has methods to retrieve buckets and events by timestamps. We add a method to IndexProcessor to add the tracker to internal buckets.

Cluster computing

Sunday, June 15, 2014

No comments:

Post a Comment