Cluster computing

Thursday, July 31, 2014

In today's post, we continue our discussion on porting the Splunk forwarder to SplunkLite.Net, a lightweight application that forwards, indexes and searches Splunk data. In the previous posts, we discussed a few of the capabilities we require, such as a data pipeline for input and processors that convert the input into events and save them for later analytics. There are still a few more data structures to look into but, as we have seen, the majority of the framework and utilities we use are conveniently available to us in the .Net libraries. This reduces the code significantly.
Framework helpers such as HTTP request and response handling, HttpStaticDispatcher, Processor, QueryRunningSingleton, SearchResults, SearchEvaluator, ServerConfig, TagManager, etc. are still needed. The ability to secure the REST calls with AdminManager is also needed. The KeyManagers for localhost, search peers and general settings can come in later too. The utilities for FileSystemWatcher, datetime handling, HTTPServer and ProducerConsumerQueue already have support in .Net. Proprietary database helpers such as PersistentHashData, PersistentHashManager, PersistentMapManager and PersistentStorage are still required.
Let us look at the persistent data structures more closely. PersistentMapManager provides a way to look up entries based on keys and tags. It has methods to get all the keys, to get the matching keys, to check whether a key exists and to remove keys. The same holds for tags. The ability to look up the store based on keys and tags has been a key feature of Splunk analytics. PersistentHashManager maintains a hash table and gets all the data that matches a key. The values are maintained as PersistentHashData, and the data on disk is accessed via RecordFileManager, which loads the DB file into memory and has methods to read and write records to disk.
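The key-and-tag lookup described above for PersistentMapManager can be sketched roughly as follows. This is only an in-memory illustration in Python, not the Splunk implementation and not the .Net port: the class name KeyTagStore, every method name, and the choice of glob-style patterns for "matching keys" are assumptions made for the example, and persistence to disk is left out entirely.

```python
from collections import defaultdict
from fnmatch import fnmatchcase


class KeyTagStore:
    """Hypothetical in-memory stand-in for a store looked up by keys and tags."""

    def __init__(self):
        self._values = {}               # key -> stored value
        self._tags = defaultdict(set)   # tag -> set of keys carrying that tag

    def put(self, key, value, tags=()):
        """Store a value under a key and attach the given tags to it."""
        self._values[key] = value
        for tag in tags:
            self._tags[tag].add(key)

    # Key operations: list all, list matching, test existence, remove.
    def all_keys(self):
        return sorted(self._values)

    def matching_keys(self, pattern):
        # Assumption: "matching" means a case-sensitive glob such as "host:*".
        return sorted(k for k in self._values if fnmatchcase(k, pattern))

    def has_key(self, key):
        return key in self._values

    def remove_key(self, key):
        self._values.pop(key, None)
        # Detach the key from every tag and drop tags left without keys.
        for tag in list(self._tags):
            self._tags[tag].discard(key)
            if not self._tags[tag]:
                del self._tags[tag]

    # The same operations for tags.
    def all_tags(self):
        return sorted(self._tags)

    def matching_tags(self, pattern):
        return sorted(t for t in self._tags if fnmatchcase(t, pattern))

    def has_tag(self, tag):
        return tag in self._tags

    def remove_tag(self, tag):
        # Removes the tag only; the keys it pointed to stay in the store.
        self._tags.pop(tag, None)

    def keys_for_tag(self, tag):
        return sorted(self._tags.get(tag, ()))
```

Keeping a separate tag-to-keys index makes a tag lookup as cheap as a key lookup, at the cost of extra bookkeeping when a key is removed.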
Results from a search on the database are available via a data structure called SearchResults, which is a collection of SearchResult objects and maintains a key map. Each SearchResult returns a list of fields, each of which can be multivalued.
Note that SearchResult is internal to Splunk. The export of results in different data formats via online and offline methods is also available. This lets Splunk integrate well in most ecosystems. I introduced a way for Splunk to provide searchable data to LogParser, which has a SQL interface. The ability to use SQL over Splunk makes it friendlier to users who work primarily with databases.
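To make the shape of these results concrete, here is a small Python sketch of a result collection with multivalued fields. Since the real SearchResult is internal to Splunk, everything here is a guess at the shape rather than the actual API: the method names are hypothetical, and the key map is assumed to map each field name to a column index in first-seen order.

```python
class SearchResult:
    """Hypothetical single result: each field name maps to a list of values."""

    def __init__(self):
        self._fields = {}

    def add_value(self, field, value):
        # A field becomes multivalued simply by adding to it more than once.
        self._fields.setdefault(field, []).append(value)

    def values(self, field):
        # Return a copy so callers cannot mutate the stored list.
        return list(self._fields.get(field, []))

    def field_names(self):
        return list(self._fields)


class SearchResults:
    """Hypothetical collection of SearchResult objects plus a key map."""

    def __init__(self):
        self._results = []
        self._key_map = {}   # assumed: field name -> index, in first-seen order

    def add(self, result):
        for name in result.field_names():
            # Only fields not seen before get a new index.
            self._key_map.setdefault(name, len(self._key_map))
        self._results.append(result)

    def field_names(self):
        return sorted(self._key_map, key=self._key_map.get)

    def __iter__(self):
        return iter(self._results)

    def __len__(self):
        return len(self._results)
```

With a key map like this, an exporter can write a stable header row once and then emit each result's values column by column, leaving a column empty when a result lacks that field.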
