Cluster computing

Monday, August 11, 2014

Today we look at the search functionality of Splunk.
Splunk Search as we saw earlier comes in differnent modes :
Cursored or Historical search
Batchmode search
Realtime search and
Indexed realtime search
and Search processors or their equivalent commands generally are of these types :
Streaming/Remote
Stateful Streaming
Events
Stream Reporting
Reporting
With Distributed search, we involve a few other players in these typical configurations:
search head with multiple peers
Multiple SH-s with multiple peers
SHs and clusters
The benefits are scaling, access control and geo-dispersed data
When a search is initiated, the request is transferred by the browser to the app server which in turn sends it to the backend splunkd The backend checks the system/user quota, bundle replication and creates the dispatch directory. The search is parsed into map and reduce At this point it can be distributed to peers. Peers can be a logical local peer or a remote peer. When the search request is received by a peer, it checks which bundle to use and initializes the conf subsystem based on the bundle, check indexes db and finds the buckets that are involved in the time range. With the help of the tsidx files the lexicon and the postings of those lexicons identifying the events where they occur help enumerate the matching events aka search results. The raw data from the compressed journal is read and processed into the search results. The activities performed, the parameters for the search and the results are all written out to the dispatch dir.

Cluster computing

Monday, August 11, 2014

No comments:

Post a Comment