Sunday, July 13, 2014

Today we look at a comparison between Splunk clustering and a Hadoop instance. In Hadoop, MapReduce is a high-performance parallel data processing technique. It does not guarantee ACID properties and supports forward-only parsing. Data is stored in Hadoop in such a way that the column names, column count and column datatypes don't matter. The data is retrieved in two steps, with a Map function and a Reduce function. The Map function selects a key from each line along with the value to hold for it, resulting in a big hashtable. The Reduce function aggregates the results. The database stores these key-values as columns in a column family, and each row can have more than one column family.

Splunk also uses key maps to index the data, but it has a lot more to do in terms of MapReduce and database functionality. Splunk stores events. Its indexing is about events, together with their raw data, their index files and their metadata. These are stored in directories organized by age, called buckets. Splunk clustering is about keeping multiple copies of data to prevent data loss and to improve data availability for searching. Search heads coordinate searches across all the peer nodes.
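To make the Map and Reduce steps described above concrete, here is a minimal single-process sketch in Python. It is not the Hadoop API. The log-line format ("host status bytes") and the function names map_line, shuffle, reduce_values and map_reduce are assumptions made up for illustration.

```python
from collections import defaultdict


def map_line(line):
    """Map step: select a key from the line and the value to hold for it.

    Assumes a hypothetical log format: "<host> <status> <bytes>".
    """
    host, _status, size = line.split()
    return host, int(size)


def shuffle(pairs):
    """Group values by key, giving the 'big hashtable' of key -> values."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_values(key, values):
    """Reduce step: aggregate all values collected for one key."""
    return key, sum(values)


def map_reduce(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    pairs = (map_line(line) for line in lines if line.strip())
    groups = shuffle(pairs)
    return dict(reduce_values(key, values) for key, values in groups.items())


if __name__ == "__main__":
    sample = ["web1 200 512", "web2 404 128", "web1 200 256"]
    print(map_reduce(sample))
```

In a real cluster the map calls run in parallel on different nodes and the grouping happens across the network. The sketch only shows the shape of the two functions, where each line is read once and in order.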
