Friday, September 14, 2018

We were discussing the choice of query language for search over object storage.
User-defined operators and computations are a well-known way to perform work on data at query time. Such custom operators allow intensive and involved queries to be written, and they have led to stored logic such as stored procedures, which can be written in a variety of languages. With the advent of machine learning and data mining, query tools have added support for new languages and packages, along with algorithms that ship with the tools and work right out of the box.
While some graph databases still have to catch up on support for streaming operations, Microsoft facilitated it with StreamInsight queries. StreamInsight queries follow a five-step procedure:

1)     Define events in terms of the payload, which holds the data values of the event, and the shape, which is the lifetime of the event along the time axis.

2)     Define the input streams of events as a function of the event payload and shape. For example, this could be a simple enumerable over some time interval.

3)     Based on the event definitions and the input stream, determine the output stream and express it as a query. In a way, this describes a flow chart for the query.

4)     Bind the query to a consumer. This could be a console. For example, the query to bind might count the events in each three-minute tumbling window:

        var query = from win in inputStream.TumblingWindow(TimeSpan.FromMinutes(3)) select win.Count();

5)     Run the query and evaluate it based on time.

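The five steps can also be sketched outside StreamInsight. The following is a minimal Python illustration, not StreamInsight's API: the function name tumbling_window_counts, the sample events and the origin timestamp are all assumptions made for the example. It treats the input stream as a plain enumerable of point events and counts the events that fall into each three-minute tumbling window.

```python
from collections import Counter
from datetime import datetime, timedelta


def tumbling_window_counts(events, window, origin):
    """Count events per fixed, non-overlapping window.

    events: iterable of (timestamp, payload) pairs (hypothetical point events).
    window: timedelta giving the window size.
    origin: datetime at which the first window starts.
    Returns a sorted list of (window_start, count); empty windows are omitted.
    """
    counts = Counter()
    for timestamp, _payload in events:
        index = (timestamp - origin) // window  # which window the event falls in
        counts[origin + index * window] += 1
    return sorted(counts.items())


# Steps 1 and 2: events with a payload, exposed as a simple enumerable.
origin = datetime(2018, 9, 14, 12, 0, 0)
events = [(origin + timedelta(seconds=s), {"value": s}) for s in (10, 70, 170, 190, 400)]

# Steps 3 to 5: express the query, bind it to the console, and run it.
for start, count in tumbling_window_counts(events, timedelta(minutes=3), origin):
    print(start.time(), count)
```

Unlike a real streaming engine, this sketch evaluates a finite list in one pass; it only shows how a tumbling window partitions the time axis.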
The query execution engine is different for large distributed databases. For example, Horton has four components: the graph client library, the graph coordinator, the graph partitions and the graph manager. The graph client library sends queries to the graph coordinator, which prepares an execution plan for the query. Each graph partition manages a set of graph nodes and edges, and Horton is able to scale out mainly because of these partitions. The graph manager provides an administrative interface for chores like loading the graph and adding or removing servers. The queries written for Horton, however, are not necessarily the same as SQL.
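To make the division of labor concrete, here is a rough Python sketch of a coordinator that routes each step of a traversal to the partition owning the node. It is not Horton's code or API: the class names, the modulo placement of integer node ids and the reachable query are assumptions chosen only to illustrate how partitions let a graph scale out.

```python
class GraphPartition:
    """Owns a subset of nodes and their outgoing edges (hypothetical)."""

    def __init__(self):
        self.adjacency = {}

    def add_edge(self, src, dst):
        self.adjacency.setdefault(src, set()).add(dst)

    def neighbors(self, node):
        return self.adjacency.get(node, set())


class GraphCoordinator:
    """Routes each traversal step to the partition that owns the node."""

    def __init__(self, partition_count):
        self.partitions = [GraphPartition() for _ in range(partition_count)]

    def _owner(self, node):
        # Assumption: nodes are integers placed on partitions by modulo.
        return self.partitions[node % len(self.partitions)]

    def add_edge(self, src, dst):
        self._owner(src).add_edge(src, dst)

    def reachable(self, start, max_hops):
        """Breadth-first traversal; returns nodes within max_hops of start."""
        seen = {start}
        frontier = {start}
        for _ in range(max_hops):
            next_frontier = set()
            for node in frontier:
                next_frontier |= self._owner(node).neighbors(node)
            frontier = next_frontier - seen
            seen |= frontier
        return seen - {start}


coordinator = GraphCoordinator(partition_count=3)
for src, dst in [(1, 2), (2, 3), (3, 4), (1, 5)]:
    coordinator.add_edge(src, dst)

print(sorted(coordinator.reachable(1, max_hops=2)))
```

In this sketch the client only ever talks to the coordinator, so partitions could be added without changing the caller.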
While Horton's query language stays closer to SQL, Cypher has deviated from it. Graph databases evolved their own query languages, such as Cypher, to make it easy to work with graphs. They perform better than relational databases on highly interconnected data where a nearly online data warehouse is required. Object storage could offer standard query operators in its query language if the entire data were treated as an enumerable.
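As an illustration of that last point, the sketch below treats a bucket as an enumerable and composes two standard query operators over it. The bucket here is a hypothetical in-memory dictionary of object keys and metadata, and the function names enumerate_objects, where and select are assumptions for the example rather than any object storage API.

```python
def enumerate_objects(bucket):
    """Yield (key, metadata) pairs; stands in for listing a bucket."""
    for key in sorted(bucket):
        yield key, bucket[key]


def where(source, predicate):
    """Filter operator: keep only the items matching the predicate."""
    return (item for item in source if predicate(item))


def select(source, projection):
    """Projection operator: transform each item lazily."""
    return (projection(item) for item in source)


# Hypothetical bucket contents: object key -> metadata.
bucket = {
    "logs/2018-09-13.txt": {"size": 120, "tag": "log"},
    "logs/2018-09-14.txt": {"size": 340, "tag": "log"},
    "images/cat.png": {"size": 2048, "tag": "image"},
}

large_logs = list(
    select(
        where(
            enumerate_objects(bucket),
            lambda kv: kv[1]["tag"] == "log" and kv[1]["size"] > 200,
        ),
        lambda kv: kv[0],
    )
)
print(large_logs)
```

Because the operators are generators, nothing is read until the result is consumed, which is the same deferred evaluation that enumerable-based query operators rely on.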
To avoid a full enumeration, efficient lookup data structures such as the B+ tree are used. These indexes can be saved right in the object storage to enable faster lookups later. Similarly, logs of query engine operations, as well as tags and metadata for objects, may also be persisted in object storage. The storage forms a layer, with the query engine's compute layer stacked over it.
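A small Python sketch can show the idea of an index that lives next to the data. It deliberately swaps the B+ tree for a sorted list searched by bisection, which is much simpler but serves the same purpose of answering range lookups without scanning every object. The SortedIndex class, the in-memory object_store dictionary and the "_index/size.json" key are all assumptions made for the example.

```python
import bisect
import json


class SortedIndex:
    """Sorted (value, key) pairs searched by bisection; a stand-in for a B+ tree."""

    def __init__(self, entries=()):
        self.entries = sorted(tuple(e) for e in entries)

    def add(self, value, object_key):
        bisect.insort(self.entries, (value, object_key))

    def range(self, low, high):
        """Return object keys whose indexed value lies in [low, high]."""
        start = bisect.bisect_left(self.entries, (low,))
        keys = []
        for value, object_key in self.entries[start:]:
            if value > high:
                break
            keys.append(object_key)
        return keys

    def dumps(self):
        return json.dumps(self.entries)

    @classmethod
    def loads(cls, text):
        return cls(json.loads(text))


object_store = {}  # hypothetical in-memory stand-in for a bucket
index = SortedIndex()
for key, size in [("a.bin", 10), ("b.bin", 250), ("c.bin", 90), ("d.bin", 400)]:
    object_store[key] = bytes(size)
    index.add(size, key)

# Persist the index as just another object, then reload it for a later query.
object_store["_index/size.json"] = index.dumps().encode()
restored = SortedIndex.loads(object_store["_index/size.json"].decode())
print(restored.range(50, 300))
```

Storing the index as an ordinary object keeps the storage layer unaware of the query engine, which matches the layering described above.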

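// Requires: using System; using System.Linq;
// Prints the Fibonacci numbers at odd zero-based indices (the 2nd, 4th, ... terms).
// GetFibonacciNumbers() is assumed to return a finite IEnumerable<int>.
void GenerateEvenFibonacci() {
    var fibonacci = GetFibonacciNumbers();
    foreach (var e in fibonacci.Where((e, i) => i % 2 == 1))
        Console.WriteLine(e);
}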
