Monday, September 10, 2018

Object Storage as a query store
Introduction: Users can search and query files in a file system or in unstructured data stores. Object storage is not only a replacement for file storage but also an unstructured data store that promotes enumeration of objects through a simple hierarchy of namespace, bucket and object. This article looks at enabling not just querying over Object Storage but also search and mining techniques.
Description:
1) Object Storage as a SQL store:
This technique utilizes a SQL engine over enumerables:

Object Storage data is searchable as a COM input to Log Parser. A COM input simply implements a few methods for Log Parser and abstracts the data store. These methods are:
OpenInput: opens your data source and sets up any initial environment settings
GetFieldCount: returns the number of fields that the plugin provides
GetFieldName: returns the name of a specified field
GetFieldType: returns the datatype of a specified field
GetValue: returns the value of a specified field
ReadRecord: reads the next record from your data source
CloseInput: closes the data source and cleans up any environment settings
In other words, Object Storage acts as the data store behind a COM input to Log Parser, and that input can then be queried in SQL for the desired output.
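
To make the shape of such an input concrete, here is a minimal sketch in Python. It only mirrors the method names listed above; it is not a registered COM plugin. The in-memory FAKE_BUCKET dictionary, the ObjectStoreInput class, the two fields (Key, Size) and the read_all helper are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a bucket: object key -> object bytes.
FAKE_BUCKET: Dict[str, bytes] = {
    "logs/a.txt": b"hello",
    "logs/b.txt": b"object storage",
}


class ObjectStoreInput:
    """Mirrors the method names listed above; not a real COM plugin."""

    # Assumed schema: one record per object, exposing its key and size.
    FIELDS: List[Tuple[str, str]] = [("Key", "STRING"), ("Size", "INTEGER")]

    def __init__(self, bucket: Dict[str, bytes]) -> None:
        self._bucket = bucket
        self._keys: List[str] = []
        self._pos = -1

    def OpenInput(self, prefix: str) -> None:
        # Enumerate the objects under the prefix once, in key order.
        self._keys = sorted(k for k in self._bucket if k.startswith(prefix))
        self._pos = -1

    def GetFieldCount(self) -> int:
        return len(self.FIELDS)

    def GetFieldName(self, index: int) -> str:
        return self.FIELDS[index][0]

    def GetFieldType(self, index: int) -> str:
        return self.FIELDS[index][1]

    def ReadRecord(self) -> bool:
        # Advance to the next object; False means there are no more records.
        self._pos += 1
        return self._pos < len(self._keys)

    def GetValue(self, index: int):
        # Only valid after ReadRecord() has returned True.
        key = self._keys[self._pos]
        return key if index == 0 else len(self._bucket[key])

    def CloseInput(self) -> None:
        self._keys = []
        self._pos = -1


def read_all(source: ObjectStoreInput, prefix: str) -> list:
    """Drive the input the way a query engine would: open, read, close."""
    source.OpenInput(prefix)
    rows = []
    while source.ReadRecord():
        rows.append(tuple(source.GetValue(i) for i in range(source.GetFieldCount())))
    source.CloseInput()
    return rows
```

A query engine only ever sees the records returned by ReadRecord and GetValue, which is what lets the object store stay hidden behind the input.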
There are two different forms of expression for these queries.
The first is the form of standard query operators such as .Where() and .Sum(), popularized across languages by LINQ. It builds on tried, tested and well-established SQL query language features. It expresses queries succinctly and often covers all the aspects of data manipulation needed to refine and improve result sets.
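
As a rough illustration of this first form, the sketch below asks the same question twice over a made-up object listing: once in operator style (a filter followed by a sum) and once in SQL. The listing, the table name and the column names are hypothetical, and Python's built-in sqlite3 module merely stands in for whatever SQL engine sits over the enumeration.

```python
import sqlite3

# Hypothetical object listing: (name, size in bytes, content type).
objects = [
    ("logs/2018-09-09.log", 1200, "text/plain"),
    ("logs/2018-09-10.log", 800, "text/plain"),
    ("images/cat.png", 50000, "image/png"),
]

# Operator style: a filter (Where) followed by an aggregate (Sum).
log_bytes = sum(size for name, size, ctype in objects if name.startswith("logs/"))

# The same question in SQL over the same enumeration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (name TEXT, size INTEGER, content_type TEXT)")
conn.executemany("INSERT INTO objects VALUES (?, ?, ?)", objects)
(sql_bytes,) = conn.execute(
    "SELECT SUM(size) FROM objects WHERE name LIKE 'logs/%'"
).fetchone()
conn.close()
```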
The second form of expression is the search query language, which has a rich history in shell scripting and log analysis, where the results of one command are piped into another command for transformation and analysis. Although similar in nature to chaining operators, expressions of this form involve more search-like capabilities across heterogeneous data, such as the use of regular expressions for detecting patterns. This form of expression not only involves powerful query operators but also facilitates extract, transform and load of the data as it makes its way through the expressions.
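
The piped style can be sketched with Python generators, where each stage consumes the records of the previous one, much like commands joined by a shell pipe. The bucket contents and the stage names (read_lines, search, extract) are assumptions for illustration only.

```python
import re
from typing import Dict, Iterable, Iterator

# Hypothetical bucket: object key -> text content.
FAKE_BUCKET: Dict[str, str] = {
    "app/web.log": "INFO start\nERROR timeout code=504\nINFO done",
    "app/db.log": "ERROR deadlock code=1205\nWARN slow query",
}


def read_lines(bucket: Dict[str, str], prefix: str) -> Iterator[dict]:
    """Source stage: emit one record per line of every matching object."""
    for key in sorted(bucket):
        if key.startswith(prefix):
            for line in bucket[key].splitlines():
                yield {"key": key, "line": line}


def search(records: Iterable[dict], pattern: str) -> Iterator[dict]:
    """Filter stage: keep records whose line matches the regular expression."""
    rx = re.compile(pattern)
    for record in records:
        if rx.search(record["line"]):
            yield record


def extract(records: Iterable[dict], pattern: str) -> Iterator[dict]:
    """Transform stage: add the named groups of the pattern as new fields."""
    rx = re.compile(pattern)
    for record in records:
        match = rx.search(record["line"])
        if match:
            yield {**record, **match.groupdict()}


# Equivalent in spirit to: read app/* | search ^ERROR | extract code=<digits>
results = list(
    extract(search(read_lines(FAKE_BUCKET, "app/"), r"^ERROR"), r"code=(?P<code>\d+)")
)
```

Because every stage is lazy, records flow through one at a time, so the data is extracted and transformed on its way through the expression rather than in a separate pass.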


2) Object Storage as a search index:


Here we utilize the contents of the objects to build an index. The index may reside in a database, but nothing prevents storing it as objects in the object store if the performance is tolerable.

Sample program available here: https://github.com/ravibeta/csharpexamples/blob/master/SourceSearch/SourceSearch/SourceSearch/Program.cs
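
Independently of the linked sample, the idea can be sketched in a few lines of Python: tokenize each object's content, map every term to the keys that contain it, and write the resulting index back as just another object. The bucket contents, the "_index/terms.json" key and the helper names are hypothetical.

```python
import json
import re
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical bucket: object key -> text content.
FAKE_BUCKET: Dict[str, str] = {
    "docs/a.txt": "Object storage as a query store",
    "docs/b.txt": "Search index over object contents",
}


def build_index(bucket: Dict[str, str]) -> Dict[str, Set[str]]:
    """Inverted index: lower-cased term -> set of object keys containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for key, text in bucket.items():
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            index[token].add(key)
    return index


def lookup(index: Dict[str, Set[str]], *terms: str) -> List[str]:
    """Return the keys that contain every term (AND semantics)."""
    sets = [index.get(term.lower(), set()) for term in terms]
    return sorted(set.intersection(*sets)) if sets else []


index = build_index(FAKE_BUCKET)

# Persist the index as a dedicated object in the same store.
FAKE_BUCKET["_index/terms.json"] = json.dumps(
    {term: sorted(keys) for term, keys in index.items()}
)
```

Keeping the index under its own prefix makes it easy to exclude from later indexing runs.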
3) Object Storage for deep learning:


Here we utilize the tags associated with the objects, which may be assigned once when the content is classified. The operators used here can be expanded to include more involved computations such as grouping, ranking and sorting.
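
A small sketch of such operators over tags follows. The tags are assumed to have been assigned earlier by some classifier; the object keys and tag values are made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical tags, assigned once when the content was classified.
object_tags = {
    "img/001.jpg": ["cat", "indoor"],
    "img/002.jpg": ["dog", "outdoor"],
    "img/003.jpg": ["cat", "outdoor"],
}

# Grouping: tag -> keys of the objects that carry it.
by_tag = defaultdict(list)
for key in sorted(object_tags):
    for tag in object_tags[key]:
        by_tag[tag].append(key)

# Ranking: most frequent tags first, ties broken alphabetically.
tag_counts = Counter(tag for tags in object_tags.values() for tag in tags)
ranked = sorted(tag_counts.items(), key=lambda item: (-item[1], item[0]))
```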


There are three scenarios that showcase the breadth of the query language: a cognitive example, a text analysis example and a JSON processing example.
The cognitive example identifies objects in images. This kind of example shows how the entire processing of image files can be treated as custom logic and used with the query language. As long as we define the objects, the input and the logic to analyze the objects, it can be made part of the query to extract the desired output dataset.
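
One way to picture custom logic inside a query is a user-defined function registered with the SQL engine. The sketch below uses Python's built-in sqlite3 module for that purpose. The detect_object function is a deliberately trivial stand-in: a real recognizer would decode the image and run a model, whereas here the label is simply read from fake bytes. The table, the column names and the data are all hypothetical.

```python
import sqlite3


def detect_object(image_bytes: bytes) -> str:
    """Stub recognizer: the label is the text before ':' in the fake bytes."""
    return image_bytes.decode("ascii").split(":", 1)[0]


# Hypothetical image objects: (name, body).
images = [
    ("img/001.jpg", b"cat:fake-pixels"),
    ("img/002.jpg", b"car:fake-pixels"),
    ("img/003.jpg", b"cat:fake-pixels"),
]

conn = sqlite3.connect(":memory:")
# Make the custom logic callable from SQL under the name detect_object.
conn.create_function("detect_object", 1, detect_object)
conn.execute("CREATE TABLE images (name TEXT, body BLOB)")
conn.executemany("INSERT INTO images VALUES (?, ?)", images)

cats = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM images WHERE detect_object(body) = 'cat' ORDER BY name"
    )
]
conn.close()
```

The query itself stays declarative; swapping the stub for a real model would not change the SQL.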
The text analysis example is similar: we extract the text before performing the analysis. It is interesting to note that the classifier used to tag the text can be written in the R language and does not depend on the query.
JSON processing is another example that is referenced often, probably because extract, transform and load has become important in analytical processing, whether for a cloud data warehouse or for big data operations. This "schema later" approach is popular because it decouples producers and consumers, which saves coordination and time-consuming operations between, say, departments. In all three scenarios, object storage can work effectively as a storage layer.
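
The "schema later" idea can be sketched as follows: producers write JSON documents with whatever fields they have, and the consumer decides at read time which fields to project. The bucket contents, the field names and the project helper are assumptions for illustration.

```python
import json
from typing import Dict, Iterator, List

# Hypothetical bucket: producers wrote documents with differing fields.
FAKE_BUCKET: Dict[str, str] = {
    "events/1.json": '{"user": "ann", "amount": 12.5}',
    "events/2.json": '{"user": "bob", "amount": 7, "coupon": "FALL"}',
    "events/3.json": '{"user": "ann", "items": 3}',
}


def project(bucket: Dict[str, str], prefix: str, fields: List[str]) -> Iterator[dict]:
    """Apply the consumer's schema at read time; absent fields become None."""
    for key in sorted(bucket):
        if key.startswith(prefix):
            doc = json.loads(bucket[key])
            yield {field: doc.get(field) for field in fields}


rows = list(project(FAKE_BUCKET, "events/", ["user", "amount"]))

# Load step: total amount per user, treating a missing amount as zero.
totals: Dict[str, float] = {}
for row in rows:
    totals[row["user"]] = totals.get(row["user"], 0) + (row["amount"] or 0)
```

Neither the extra "coupon" field nor the missing "amount" field requires any change on the producer side, which is the decoupling described above.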
Conclusion:
All aspects of a universal query language may be applicable to object stores just as if the content were available from file or document stores.

Furthermore, the metadata and the indexes may be stored in dedicated objects within the object storage.
