Cluster computing

Monday, November 18, 2019

A comparison of Flink SQL execution and Facebook’s Presto, continued:

A Flink application can express queries in SQL. This abstraction works closely with the Table API, and SQL queries can be executed over tables. The Table API is a language centered around tables and follows a relational model. Tables have a schema attached, and the API provides the relational operators of selection, projection, and join. Programs written with the Table API go through an optimizer that applies optimization rules before execution.

Presto from Facebook is a distributed SQL query engine that can operate on streams from various data sources and supports ad hoc queries in near real time.

The querying of a key-value collection is handled natively by the data store. In SQL over a relational store, the equivalent query is commonly described as a join, where the key-values are treated as a table with a key column and a value column. The keys of interest for the predicate can be put in a separate temporary table that holds just those keys, and a join can then be performed between the two tables on matching keys.

Without the analogy of the join, key-value collections require standard query operators, such as a where clause that tests for a match against a set of keys. This is rather expensive compared to the join because it works over a large list of key-values and may iterate over the entire list repeatedly to find matches against one or more keys in the provided set.
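The difference between the two approaches can be sketched in plain Java. This is only an illustration under stated assumptions, not how Flink or Presto execute such queries: the class name `KeyLookup`, its methods, and the sample data are all hypothetical. `viaWhere` scans every pair and tests it against the list of keys, while `viaJoin` treats the keys of interest as a small table and probes a hash index once per key, which is roughly what a hash join does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class KeyLookup {
    // "Where clause" style: scan every key-value pair and test it against the key list.
    // The work grows with (number of pairs) x (number of wanted keys).
    public static Map<String, String> viaWhere(List<Map.Entry<String, String>> pairs,
                                               List<String> wanted) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> p : pairs) {
            if (wanted.contains(p.getKey())) { // linear search in the key list
                out.put(p.getKey(), p.getValue());
            }
        }
        return out;
    }

    // "Join" style: the wanted keys act as a small temporary table, and each one
    // is matched against a hash index built over the key-value pairs.
    public static Map<String, String> viaJoin(Map<String, String> index,
                                              List<String> wanted) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String k : wanted) {
            String v = index.get(k); // one hash probe per wanted key
            if (v != null) {
                out.put(k, v);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical sample data.
        Map<String, String> index = new LinkedHashMap<>();
        index.put("host", "node-1");
        index.put("region", "us-west");
        index.put("rack", "r42");
        List<String> wanted = Arrays.asList("rack", "host", "missing");

        System.out.println(viaWhere(new ArrayList<>(index.entrySet()), wanted));
        System.out.println(viaJoin(index, wanted));
    }
}
```

Both methods return the same set of matches; they differ only in how many comparisons they make to get there.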

Most key-value collections are scoped. They are not necessarily in one large global list; such key-values are scoped to a document or an object. The document may be in one of two forms: JSON or XML. JSON has a query language called JMESPath, and XML also supports path-based queries. When the key-values are scoped, an application can search them efficiently using standard query operators, without requiring the paths inherent to a document format such as JSON or XML.
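As a rough sketch of searching scoped key-values with standard query operators, the example below assumes each document has already been parsed into its own small map, so no JSON or XML path expression is involved. The class `ScopedSearch`, the helpers `doc` and `where`, and the sample documents are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ScopedSearch {
    // Builds one document's scoped key-value collection from alternating keys and values.
    public static Map<String, String> doc(String... kv) {
        Map<String, String> m = new HashMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) {
            m.put(kv[i], kv[i + 1]);
        }
        return m;
    }

    // Standard filter operator: keep the documents whose scoped key has the given value.
    public static List<Map<String, String>> where(List<Map<String, String>> docs,
                                                  String key, String value) {
        return docs.stream()
                   .filter(d -> value.equals(d.get(key)))
                   .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical sample documents.
        List<Map<String, String>> docs = Arrays.asList(
            doc("id", "1", "status", "active"),
            doc("id", "2", "status", "retired"),
            doc("id", "3", "status", "active"));

        for (Map<String, String> d : where(docs, "status", "active")) {
            System.out.println(d.get("id"));
        }
    }
}
```

Because each lookup touches only the handful of keys inside one document, the filter stays cheap even when there are many documents.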

Presto’s scalability to processing petabytes of data is unparalleled. And the use of a distributed SQL query engine also helps

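// Returns the k-th element (1-based) of an m-row by n-column matrix visited in an
// anticlockwise spiral starting at the top-left corner, or -1 if k is out of range.
static int getKthAntiClockWise(int[][] A, int m, int n, int k)
{
    return getKthAntiClockWise(A, 0, m, n, k);
}

// off is the row and column index of the top-left corner of the current ring.
static int getKthAntiClockWise(int[][] A, int off, int m, int n, int k)
{
    if (n < 1 || m < 1 || k < 1) return -1;
    if (k <= m)                          // down the left column
        return A[off + k - 1][off];
    if (k <= m + n - 1)                  // right along the bottom row
        return A[off + m - 1][off + k - m];
    if (m == 1 || n == 1) return -1;     // a single row or column has no further elements
    if (k <= 2 * m + n - 2)              // up the right column
        return A[off + m - 1 - (k - (m + n - 1))][off + n - 1];
    if (k <= 2 * m + 2 * n - 4)          // left along the top row
        return A[off][off + n - 1 - (k - (2 * m + n - 2))];
    // Recurse into the inner ring by advancing the offset, so no submatrix copy is needed.
    return getKthAntiClockWise(A, off + 1, m - 2, n - 2, k - (2 * m + 2 * n - 4));
}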
