With this post, I return to my readings on Splunk from the book Exploring Splunk. Splunk has a server and a client. The Splunk Engine is exposed via REST-based APIs to the CLI, the web interface, and other interfaces.
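As a concrete illustration, here is a minimal Python sketch of talking to the Engine over its REST API, assuming a local Splunk instance on the default management port (8089) and placeholder credentials. The endpoints used (/services/auth/login and /services/search/jobs) are Splunk's documented REST endpoints; everything else here is illustrative.

```python
import re
import requests

BASE = "https://localhost:8089"                       # default management port
AUTH = {"username": "admin", "password": "changeme"}  # placeholder credentials

# Log in and extract the session key from the XML response.
resp = requests.post(f"{BASE}/services/auth/login", data=AUTH, verify=False)
session_key = re.search(r"<sessionKey>(.+?)</sessionKey>", resp.text).group(1)
headers = {"Authorization": f"Splunk {session_key}"}

# Submit a search job; the REST layer hands it off to the Search component.
job = requests.post(
    f"{BASE}/services/search/jobs",
    headers=headers,
    data={"search": "search index=_internal | head 5"},
    verify=False,
)
print(job.status_code, job.text[:200])  # the response carries the job's sid
```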
The Engine has multiple layers of software. At the bottom layer are components that read from different source types such as files, network ports, or scripts. The layer above routes, clones, and load-balances the data feeds; the load is generally distributed for better performance. All the data is then subject to indexing, and an index is built. Note that both the indexing layer and the layer below it (routing, cloning, and load balancing) are deployed and set up with user access controls; this is essentially where what gets indexed, and by whom, is decided. The choice is left to users to avoid sharing or privacy violations, and keeping it configurable makes the Engine independent of how much or how little data is sent its way for processing.
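To make the two behaviours of that middle layer concrete, here is a toy sketch (not Splunk code) of cloning an event to every output group while load-balancing across the servers within each group by round-robin. All the group and server names are made up for illustration.

```python
from itertools import cycle

# Each output group rotates through its own set of receiving servers.
output_groups = {
    "indexers_east": cycle(["east-1:9997", "east-2:9997"]),
    "indexers_west": cycle(["west-1:9997"]),
}

def route(event: str) -> None:
    # Clone: every group receives a copy of the event.
    for group, servers in output_groups.items():
        # Load-balance: pick the next server in this group's rotation.
        target = next(servers)
        print(f"sending {event!r} to {target} via group {group}")

route("127.0.0.1 - GET /index.html 200")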
The layer on top of the Index is Search, which determines the processing involved in retrieving results from the index. The search query language describes this processing, and searches are distributed across workers so that results can be processed in parallel. On top of Search sit Scheduling/Alerting, Reporting, and Knowledge, each a dedicated component in itself. Their results are sent out through the REST-based API.
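The map-and-merge shape of that distributed search can be sketched in a few lines of Python: each worker scans its own slice of the index in parallel, and the partial results are merged at the end. This mimics the fan-out pattern only; it is not Splunk's internal implementation, and the sample events are invented.

```python
from concurrent.futures import ThreadPoolExecutor

# Two slices of an index, each handled by its own worker.
index_slices = [
    ["ERROR disk full", "INFO started"],
    ["ERROR timeout", "INFO stopped"],
]

def search_slice(events, term):
    # Each worker filters its slice independently.
    return [e for e in events if term in e]

with ThreadPoolExecutor() as pool:
    partials = pool.map(lambda s: search_slice(s, "ERROR"), index_slices)

# Merge the per-worker partial results into the final answer.
results = [event for part in partials for event in part]
print(results)  # ['ERROR disk full', 'ERROR timeout']
```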
A pipeline refers to the data transformations applied as the data changes shape, form, and meaning before being indexed; multiple pipelines may be involved before indexing. A processor performs a small but logical unit of work, and processors are logically contained within a pipeline. Queues hold the data between pipelines, with producers and consumers operating on the two ends of a queue.
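A minimal producer/consumer sketch of this model, assuming nothing beyond the description above: two tiny processors each do one small unit of work inside a pipeline, and a queue hands the data to the next stage. All names here are illustrative.

```python
import queue
import threading

q = queue.Queue()

def strip_processor(event: str) -> str:
    return event.strip()           # one small, logical unit of work

def tag_processor(event: str) -> str:
    return f"[parsed] {event}"     # another processor in the same pipeline

def producer():
    for raw in ["  event one \n", "  event two \n"]:
        # The first pipeline: processors applied in order, result queued.
        q.put(tag_processor(strip_processor(raw)))
    q.put(None)                    # sentinel: no more data

def consumer():
    # The downstream pipeline consumes from the other end of the queue.
    while (event := q.get()) is not None:
        print("indexing:", event)

threading.Thread(target=producer).start()
consumer()
```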
File input is monitored in two ways: a file watcher that scans directories and finds files, and a reader that tails files where new data is being appended.
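The second kind of input, the tail reader, can be sketched as follows: seek to the end of a file and keep yielding whatever gets appended, polling when there is nothing new. The path and poll interval are placeholders; this loop runs until interrupted.

```python
import time

def follow(path: str):
    with open(path, "r") as f:
        f.seek(0, 2)               # jump to the end of the file
        while True:
            line = f.readline()
            if not line:           # nothing new yet: wait and retry
                time.sleep(0.5)
                continue
            yield line.rstrip("\n")

for event in follow("/var/log/app.log"):
    print("new event:", event)
```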