Cluster computing

Saturday, November 23, 2019

A comparision of Flink SQL execution and Facebook’s Presto continued:

The Flink Application provides the ability to write SQL query expressions. This abstraction works closely with the Table API and SQL queries can be executed over tables. The Table API is a language centered around tables and follows a relational model. Tables have a schema attached and the API provides the relational operators of selection, projection and join. Programs written with Table API go through an optimizer that applies optimization rules before execution.

Presto from Facebook is a distributed SQL query engine can operate on streams from various data source supporting adhoc queries in near real-time.
Microsoft facilitated streaming operators with StreamInsight queries. The Microsoft StreamInsight Queries follow a five-step procedure:
1) define events in terms of payload as the data values of the event and the shape as the lifetime of the event along the time axis
2) define the input streams of the event as a function of the event payload and shape. For example, this could be a simple enumerable over some time interval
3) Based on the events definitions and the input stream, determine the output stream and express it as a query. In a way this describes a flow chart for the query
4) Bind the query to a consumer. This could be to a console. For example
Var query = from win in inputStream.TumblingWindow( TimeSpan.FromMinutes(3)) select win.Count();
5) Run the query and evaluate it based on time.
The StreamInsight queries can be written with .Net and used with standard query operators. The use of ‘WindowedDataStream’ in Flink Applications and TumblingWindow in .Net are constructs that allow operation on stream based on a window at a time.

Just like the standard query operators of .Net the FLink SQL layer is merely a convenience over the table APIs. On the other hand, Presto offers to run over any kind of data source not just Table APIs.

Although Apache Spark query code and Apache Flink query code look very much similar, the former uses stream processing as a special case of batch processing while the latter does just the reverse.
Also Apache Flink provides a SQL abstraction over its Table API

Cluster computing

Saturday, November 23, 2019

No comments:

Post a Comment