Cluster computing

Monday, July 28, 2014

Today onwards I'm going to start a series of posts on Splunk internals. As we know there are three roles for Splunk Server. - forwarding, indexing and searching.
We will cover forwarding section today to see what all components need to be ported to a SplunkLite framework.
First we look at Pipeline Data and associated data structures. It is easy to port some of these to C# and it gives the same definitions to the input that we have in Splunk Server.
Pipeline components and actor threads can be used directly in .Net. Configurations can be maintained via configuration files just the same as in Splunk Server. threads that are dedicated to shutdown or for running the event loop are still required. Note the support for framework items like event loop Can Be Substituted By The equivalent .Net scheduler classes. .Net has a rich support for threads and scheduling via the .Net 4.0 task library.
While on the topic of framework items, we might as well cover logger. Logging is available via packages like log4net or enterprise application block. These are convenient to add to the application and come with multiple destination features. When we iterate down the utilities required for this application, we will see that a lot of the efforts of writing something portable with Splunk Server are done away with because .Net comes with those already.
When writing the forwarder, we can start small with a few set of Input and expand the choices later. Having one forwarder one indexer and one search head will be sufficient for proof of concept. The code can provide end to end functionality and we can then augment each of the processors whether they are input search or index processors. Essentially the processors all conform in the same way for the same role, so how we expand it is up to us.
PersistentStorage may need to be specified to work with the data and this is proprietary. Data and Metadata may require data structures similar to what we have in Splunk. We would look into Hash manager and record file manager. We should budget for things that are specific to Splunk first because they are artifacts that have a history and a purpose.
Items that we can and should deliberately avoid are those for which we have rich and robust .net features such as producer consumer queues etc.
The porting of the application may require a lot of time. An initial estimation for the bare bones and resting is in order. For anything else we can keep it prioritized.

Cluster computing

Monday, July 28, 2014

No comments:

Post a Comment