Cluster computing

Monday, November 25, 2019

A comparison of Flink SQL execution and Facebook’s Presto, continued:

A Flink application can express its queries in SQL. This abstraction works closely with the Table API, and SQL queries can be executed over tables. The Table API is a language centered on tables and follows the relational model. Tables have a schema attached, and the API provides the relational operators of selection, projection, and join. Programs written with the Table API go through an optimizer that applies optimization rules before execution.
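To make the three relational operators concrete, here is a minimal sketch in plain Python. It is not Flink's Table API: the `Table` class, its `where`, `select`, and `join` methods, and the sample `orders` and `users` data are all hypothetical names made up for illustration. The sketch also evaluates each step eagerly, so it has no optimizer stage.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass
class Table:
    """Hypothetical in-memory table: a schema (column names) plus rows."""
    columns: List[str]
    rows: List[Row]

    def where(self, predicate: Callable[[Row], bool]) -> "Table":
        # Selection: keep only the rows that satisfy the predicate.
        return Table(self.columns, [r for r in self.rows if predicate(r)])

    def select(self, *names: str) -> "Table":
        # Projection: keep only the named columns.
        return Table(list(names), [{n: r[n] for n in names} for r in self.rows])

    def join(self, other: "Table", key: str) -> "Table":
        # Inner equi-join on a column that both tables share.
        merged_columns = self.columns + [c for c in other.columns if c != key]
        merged_rows = [
            {**left, **right}
            for left in self.rows
            for right in other.rows
            if left[key] == right[key]
        ]
        return Table(merged_columns, merged_rows)


# Made-up sample data for illustration only.
orders = Table(
    ["order_id", "user_id", "amount"],
    [
        {"order_id": 1, "user_id": 10, "amount": 25.0},
        {"order_id": 2, "user_id": 11, "amount": 5.0},
        {"order_id": 3, "user_id": 10, "amount": 40.0},
    ],
)
users = Table(
    ["user_id", "name"],
    [{"user_id": 10, "name": "alice"}, {"user_id": 11, "name": "bob"}],
)

# Roughly: SELECT name, amount FROM orders JOIN users USING (user_id) WHERE amount > 10
result = (
    orders.join(users, "user_id")
    .where(lambda r: r["amount"] > 10)
    .select("name", "amount")
)
print(result.rows)
```

The chained calls read much like the SQL in the comment, which is the sense in which a SQL layer can be a thin convenience over a table-oriented API.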

Presto, from Facebook, is a distributed SQL query engine that can query data from a variety of sources and supports ad hoc queries in near real time.

Just like the standard query operators of .NET, the Flink SQL layer is merely a convenience over the Table API. Presto, on the other hand, is designed to run over many kinds of data sources, not just tables exposed through one API.

Although Apache Spark query code and Apache Flink query code look very similar, the former treats stream processing as a special case of batch processing, while the latter does the reverse.
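The difference between the two models can be sketched in plain Python with a running sum. This uses neither Spark nor Flink; every function name below is hypothetical, and the fixed-size micro-batching is a simplifying assumption made for the example.

```python
from typing import Iterable, Iterator, List


def micro_batches(events: Iterable[int], size: int) -> Iterator[List[int]]:
    """Cut a stream into small batches of a fixed size (hypothetical helper)."""
    batch: List[int] = []
    for event in events:
        batch.append(event)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final partial batch


def batch_sum(batch: List[int]) -> int:
    """A batch job: needs its whole input before it produces a result."""
    return sum(batch)


def stream_via_batches(events: Iterable[int], size: int = 2) -> Iterator[int]:
    """Streaming as a special case of batch: run the batch job per micro-batch."""
    total = 0
    for batch in micro_batches(events, size):
        total += batch_sum(batch)
        yield total


def streaming_sum(events: Iterable[int]) -> Iterator[int]:
    """A streaming job: updates its result as each event arrives."""
    total = 0
    for event in events:
        total += event
        yield total


def batch_via_stream(bounded_events: List[int]) -> int:
    """Batch as a special case of streaming: a bounded stream, keep the last result."""
    result = 0
    for result in streaming_sum(bounded_events):
        pass
    return result


events = [3, 1, 4, 1, 5]
print(list(stream_via_batches(events)))  # one update per micro-batch
print(list(streaming_sum(events)))       # one update per event
print(batch_via_stream(events))          # final value of the bounded stream
```

Both approaches end at the same total. What differs is how often intermediate results become visible: once per micro-batch in the first model, once per event in the second.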

Apache Flink also provides a SQL abstraction over its Table API.

While Apache Spark provides ".map(...)" and ".reduce(...)" programmability syntax to support batch-oriented processing, Apache Flink provides the Table API with ".groupBy(...)" and ".orderBy(...)" syntax. It provides a SQL abstraction and supports stream processing as the norm.
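As a rough illustration of the two styles, the following plain-Python sketch contrasts a map-then-reduce computation with a group-then-order one. It calls neither framework; the `sales` records are made up, and the standard-library `map`, `functools.reduce`, and `sorted` only stand in for the corresponding operators.

```python
from collections import defaultdict
from functools import reduce

# Made-up (region, amount) records for illustration only.
sales = [("east", 5), ("west", 7), ("east", 3), ("west", 2), ("north", 4)]

# Functional style: map each record to a value, then reduce to a single total.
amounts = map(lambda record: record[1], sales)
grand_total = reduce(lambda a, b: a + b, amounts)

# Relational style: group by a key, aggregate per group, then order the result.
totals_by_region = defaultdict(int)
for region, amount in sales:
    totals_by_region[region] += amount
ordered = sorted(totals_by_region.items(), key=lambda item: item[1], reverse=True)

print(grand_total)
print(ordered)
```

The relational form maps directly onto a SQL statement with GROUP BY and ORDER BY clauses, which is why a SQL layer sits naturally on top of a table-oriented API.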
