Thursday, July 10, 2014

Another search processor for Splunk could be type conversion, that is, support for user defined types in the search bar. Today we have fields that we can extract from the data. Fields are like key-value pairs, so users define their queries in terms of keys and values. Splunk also indexes key-value pairs so that look-ups are easier. Key-value pairs are very helpful for associations between different SearchResults and for working with different processors. However, support for user defined types could change the game and become a tremendous benefit to the user, because a user defined type associates not just one field but several fields with the data, and in a way the user defines. This is different from tags. Tags are also helpful to the user for labeling and defining the groups he cares about, but support for types and user defined types goes beyond mere fields. It is quite involved in that it affects the parser, the indexer, the search result retrieval and the display.
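To make the idea concrete, here is a minimal sketch, outside of any Splunk internals, of what a user defined type could look like: a named bundle of fields and a projection of an extracted event onto that bundle. The type name "WebRequest", the field names, and the event shape are all hypothetical illustrations, not an existing Splunk feature.

```python
# Hypothetical sketch of a user defined type: a named group of fields
# plus a projection of an event (dict of extracted fields) onto that group.
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class UserDefinedType:
    name: str          # e.g. "WebRequest" (illustrative name)
    fields: List[str]  # the fields that together make up the type

    def project(self, event: Dict[str, str]) -> Optional[Dict[str, str]]:
        """Return the typed view of an event, or None if any field is missing."""
        if all(f in event for f in self.fields):
            return {f: event[f] for f in self.fields}
        return None


# Usage: group clientip, method and status into one hypothetical type.
web_request = UserDefinedType("WebRequest", ["clientip", "method", "status"])
event = {"clientip": "10.0.0.1", "method": "GET", "status": "200", "bytes": "512"}
print(web_request.project(event))  # {'clientip': '10.0.0.1', 'method': 'GET', 'status': '200'}
```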
But first let us look at a processor that can support extract, transform and load kinds of operations. We support these via search pipeline operators where the search results are piped to different operators that can handle one or more of the said operations. For example, if we wanted to transform the raw data behind the search results into XML, we could have an 'xml' processor that transforms it into a single result with the corresponding XML as the raw data. This lends itself to other data transformations or XML-style querying by downstream systems. XML, as we know, is a different form of data than tabular or relational. Tabular or relational data can have compositions that describe entities and types. We don't have a way to capture the type information today, but that doesn't mean we cannot plug into a system that does. For example, database servers handle types and entities. If Splunk had a connector where it could send XML downstream to a SQLite database and shred the XML into relational data, then Splunk would not even have the onus to implement a type based system. It could then choose to implement just the SQL queries and let the downstream database handle the rest. These SQL queries could even be saved and reused later. Splunk uses SQLite today; however, the indexes that Splunk maintains are different from the indexes that a database maintains. Therefore, extract, transform and load of data to downstream systems could be very helpful. Today Atom feeds may be one way to do that, but search results are even more intrinsic to Splunk.
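As a rough sketch of what such an 'xml' operator might do, the following collapses tabular search results into a single XML document using only the standard library. The element names, the field names and the shape of the results are assumptions for illustration, not Splunk's actual internals.

```python
# Rough sketch of an 'xml'-style operator: serialize a list of
# {field: value} search results into one XML document.
import xml.etree.ElementTree as ET


def results_to_xml(results):
    """Serialize a list of {field: value} search results into one XML string."""
    root = ET.Element("results")
    for row in results:
        result = ET.SubElement(root, "result")
        for field, value in row.items():
            child = ET.SubElement(result, "field", name=field)
            child.text = str(value)
    return ET.tostring(root, encoding="unicode")


# Example: two search results with key-value fields (illustrative data).
rows = [
    {"host": "web01", "status": "200", "bytes": "1024"},
    {"host": "web02", "status": "500", "bytes": "0"},
]
print(results_to_xml(rows))
```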

In this post, I hope to address some of the following objectives; those I don't cover I will try to elaborate on in the next few posts.
Define why we need an xml operator
The idea behind converting tables or CSVs to XML is that it provides another avenue for integration with data systems that rely on such a data format. Why are there special systems using XML data? Because data in XML can be validated independently with XSD, provides hierarchical and well defined tags, enables a very different and useful querying system, and so on. Until now, Splunk relied on offline, file based dumping of XML. Such offline methods did not improve the workflow users have when integrating with systems such as a database. To facilitate the extract, transform and load of search results into databases, one has to have better control over the search results. XML is easy to import and shred in databases for further analysis or archival; a minimal sketch of such shredding follows the list of objectives below. The ability to integrate Splunk with a database does not diminish the value proposition of Splunk. If anything, it improves the usability and customer base of Splunk by adding more customers who rely on databases for analysis.
Define SQL integration
Define user defined type system
Define common type system
Define user defined search operator
Define programmable operator
Define user programming interface for type system
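Here is the shredding sketch referred to above: it takes XML in the shape produced by the earlier results_to_xml sketch and inserts each result as a row in a SQLite table. The table name, columns and file path are hypothetical; a real connector would derive the schema from the field names in the document.

```python
# Minimal sketch of shredding search-result XML into SQLite downstream.
# Table name and columns are hypothetical illustrations.
import sqlite3
import xml.etree.ElementTree as ET


def shred_xml_to_sqlite(xml_text, db_path="search_results.db"):
    """Insert each <result> element of the XML document as a row in SQLite."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS results (host TEXT, status TEXT, bytes TEXT)")
    root = ET.fromstring(xml_text)
    for result in root.findall("result"):
        row = {f.get("name"): f.text for f in result.findall("field")}
        conn.execute(
            "INSERT INTO results (host, status, bytes) VALUES (?, ?, ?)",
            (row.get("host"), row.get("status"), row.get("bytes")),
        )
    conn.commit()
    conn.close()


# Once shredded, saved SQL queries of the kind mentioned earlier can run
# against the table, for example:
#   SELECT host, COUNT(*) FROM results WHERE status = '500' GROUP BY host;
```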
