Cluster computing

Tuesday, July 29, 2014

Let us look at the porting of Splunk Forwarder to SplunkLite.Net ( the app ). This is the app we want to have all three roles forwarder, indexer and searchhead. Our next few posts may continue to be on forwarder because they are all going to scope the Splunk code first. When we prepare a timeline for implementation, we will consider sprints and include time for TDD and feature priority but first let us take a look at the different sections of the input pipeline and what we need to do. First we looked at the PipelineData. We now look at the context data structures. Our goal is to enumerate the classes and data structures needed to enable writing an event from a Modular input. We will look at Metadata too.
Context data structures include configuration information. Configuration is read and merged at the startup of the program. For forwarding, we need some configuration that are expressed in files such as input.conf, props.conf etc. These are key value pairs and not the xml that .Net configuration files are. The format of the configuration does not matter. What matters is their validation and population in runtime data structures called a PropertyMap. Configuration keys also have their metadata and this is important because we actively use it.
PipelineData class manages data that is passed around in the pipeline. Note that we don't need raw storage for the data since we don't need to manage the memory. The same goes for Memory pool because we don't actively manage the memory. The runtime does it for us and this scales well.What we need is the per-event data structure including metadata. Metadata can be for each field. This lets us understand the event better.
We will leave the rest of the discussion on Metadata to indexer time discussion. For now, we have plenty more to cover for Forwarder.
Interestingly one of the modular inputs is UDP and this should facilitate the traffic such as games, realtime traffic and other such applications. However, udp modular input may not be working in Splunk especially if connection_host key is specified.
We will focus exclusively on file based modular input for the prototype we want to build. This will keep it simple and give us the flexibility to work with events small and large, few and several.

Cluster computing

Tuesday, July 29, 2014

No comments:

Post a Comment