Monday, November 18, 2019

A comparison of Flink SQL execution and Facebook’s Presto, continued:

The Flink Application provides the ability to write SQL query expressions. This abstraction works closely with the Table API, and SQL queries can be executed over tables. The Table API is a language centered around tables and follows a relational model. Tables have a schema attached, and the API provides the relational operators of selection, projection and join. Programs written with the Table API go through an optimizer that applies optimization rules before execution.
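To make the three relational operators concrete, here is a plain-Java sketch over in-memory lists. It is not the Flink Table API; the `Order`, `Customer` and `OrderWithName` records and the `RelationalOps` methods are hypothetical and merely stand in for tables with a schema.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical "tables": each record type plays the role of a schema.
record Order(int id, int customerId, double amount) {}
record Customer(int id, String name) {}
record OrderWithName(int orderId, String customerName, double amount) {}

class RelationalOps {
    // Selection: keep only the rows that satisfy a predicate (SQL WHERE).
    static List<Order> selectLargeOrders(List<Order> orders, double minAmount) {
        return orders.stream()
                .filter(o -> o.amount() >= minAmount)
                .collect(Collectors.toList());
    }

    // Projection: keep only some columns of each row (the SQL SELECT list).
    static List<Integer> projectIds(List<Order> orders) {
        return orders.stream().map(Order::id).collect(Collectors.toList());
    }

    // Join: pair each order with the customer whose id matches its customerId
    // (SQL JOIN ... ON). Done as a hash join: index customers by id, then probe.
    static List<OrderWithName> join(List<Order> orders, List<Customer> customers) {
        Map<Integer, Customer> byId = customers.stream()
                .collect(Collectors.toMap(Customer::id, Function.identity()));
        return orders.stream()
                .filter(o -> byId.containsKey(o.customerId()))
                .map(o -> new OrderWithName(o.id(), byId.get(o.customerId()).name(), o.amount()))
                .collect(Collectors.toList());
    }
}
```

A query optimizer such as the one mentioned above is free to reorder these steps, for example pushing a selection below a join so that fewer rows reach the join.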

Facebook’s Presto is a distributed SQL query engine that can operate on streams from various data sources, supporting ad hoc queries in near real time.

Querying a key-value collection is handled natively by the data store. Over a relational store, the same query is commonly expressed in SQL as a join, where the key-values are treated as a table with two columns: key and value. The desired keys for the predicate can be put in a separate temporary table that holds just the keys of interest, and the two tables can then be joined on matching keys.
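A minimal sketch of that join, assuming the key column of the key-value table can be indexed. Here the index is a `HashMap` built on the fly, whereas an actual store would typically maintain it ahead of time; `KeyValueJoin`, `Entry` and `joinOnKey` are hypothetical names, and a real engine's optimizer would choose its own join strategy.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class KeyValueJoin {
    // One row of the key-value "table": two columns, key and value.
    record Entry(String key, String value) {}

    // Joins the key-value table with a temporary table holding only the keys
    // of interest. The key-value table is indexed on its key column, and each
    // row of the keys table probes that index instead of scanning the list.
    static List<Entry> joinOnKey(List<Entry> keyValues, List<String> keysOfInterest) {
        Map<String, List<Entry>> index = new HashMap<>();
        for (Entry e : keyValues) {
            index.computeIfAbsent(e.key(), k -> new ArrayList<>()).add(e);
        }
        List<Entry> result = new ArrayList<>();
        for (String key : keysOfInterest) {
            // Keys with no matching entry simply contribute no rows (inner join).
            result.addAll(index.getOrDefault(key, List.of()));
        }
        return result;
    }
}
```

Once the index exists, each key of interest costs one lookup, so the work depends on the number of keys of interest rather than on the size of the whole collection.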

Without the join, querying the key-value collection requires standard query operators such as a where clause that tests each entry for a match against a set of keys. This is more expensive than the join because it runs over the full, large list of key-values, possibly iterating over the entire list repeatedly to match one or more keys from the provided set.
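The cost of the repeated iterations can be made visible with a small counter. This is a sketch of the naive where-clause approach, one full pass per key; `WhereClauseScan`, `Entry`, `ScanResult` and the `visited` count are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class WhereClauseScan {
    // One row of the key-value collection.
    record Entry(String key, String value) {}

    // Matches found plus the number of entries examined, to expose the cost.
    record ScanResult(List<Entry> matches, long visited) {}

    // A where-style filter applied one key at a time: every key in the set
    // triggers another full pass over the key-value list.
    static ScanResult whereKeyMatches(List<Entry> keyValues, List<String> keys) {
        List<Entry> matches = new ArrayList<>();
        long visited = 0;
        for (String key : keys) {
            for (Entry e : keyValues) {
                visited++;
                if (e.key().equals(key)) {
                    matches.add(e);
                }
            }
        }
        return new ScanResult(matches, visited);
    }
}
```

With 4 entries and 3 keys, `visited` is 12: the number of entries examined grows as the product of the list size and the number of keys.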

Most key-value collections are scoped rather than held in one large global list: the key-values belong to a document or an object. The document may take one of two forms, JSON or XML. JSON has its own query language, JMESPath, and XML also supports path-based queries. When the key-values are scoped, an application can search them efficiently with standard query operators, without resorting to the path syntax of a document format such as JSON or XML.
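A sketch of a scoped search, assuming each document has already been parsed into a map of its own key-values (parsing JSON or XML is not shown). The document ids, `ScopedLookup` and `findInDocument` are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

class ScopedLookup {
    // Each document owns its key-values, so a search only looks inside one
    // small map chosen by document id instead of a large global list.
    static Map<String, String> findInDocument(Map<String, Map<String, String>> documents,
                                              String documentId, Set<String> keys) {
        Map<String, String> scope = documents.getOrDefault(documentId, Map.of());
        // Standard stream operators filter the scoped entries; no path syntax needed.
        return scope.entrySet().stream()
                .filter(e -> keys.contains(e.getKey()))
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }
}
```

Looking up `"order-1"` with the keys `status` and `owner` only touches that document's entries, however many other documents exist.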

Presto’s scalability to processing petabytes of data is unparalleled. And the use of a distributed SQL query engine also helps

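// Returns the k-th element (1-based) of an m x n matrix A (m rows, n columns)
// visited in an anticlockwise spiral from the top-left corner: down the left
// column, right along the bottom row, up the right column, left along the top row.
// Returns -1 when the arguments are invalid or k is out of range.
static int getKthAntiClockWise(int[][] A, int m, int n, int k)
{
    if (n < 1 || m < 1 || k < 1 || k > m * n) return -1;
    if (k <= m)
        return A[k - 1][0];                              // left column, downward
    if (k <= n + m - 1)
        return A[m - 1][k - m];                          // bottom row, rightward
    if (k <= n + m - 1 + m - 1)
        return A[m - 1 - (k - (n + m - 1))][n - 1];      // right column, upward
    if (k <= n + m - 1 + m - 1 + n - 2)
        return A[0][n - 1 - (k - (n + m - 1 + m - 1))];  // top row, leftward
    // Peel off the outer ring and recurse on the (m-2) x (n-2) inner matrix.
    return getKthAntiClockWise(copy(A, 1, 1, m - 2, n - 2), m - 2, n - 2, k - (2 * n + 2 * m - 4));
}

// Copies the rows x cols submatrix whose top-left corner is (row, col); uses System.arraycopy.
static int[][] copy(int[][] A, int row, int col, int rows, int cols)
{
    int[][] B = new int[rows][cols];
    for (int i = 0; i < rows; i++)
        System.arraycopy(A[row + i], col, B[i], 0, cols);
    return B;
}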

Sunday, November 17, 2019

A comparison of Flink SQL execution and Facebook’s Presto:
The Flink Application provides the ability to write SQL query expressions. This abstraction works closely with the Table API, and SQL queries can be executed over tables. The Table API is a language centered around tables and follows a relational model. Tables have a schema attached, and the API provides the relational operators of selection, projection and join. Programs written with the Table API go through an optimizer that applies optimization rules before execution.
Flink applications do not need to use the Table API and SQL abstractions described above. Instead, they can work directly with the core APIs: DataStream (unbounded data) and DataSet (bounded data). These APIs provide the ability to perform stateful stream processing.
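For example, where env is the StreamExecutionEnvironment and parse, Event, Statistics, MyWindowAggregationFunction and path are defined by the application:
DataStream<String> lines = env.addSource(new FlinkKafkaConsumer<>(…)); // source
DataStream<Event> events = lines.map((line) -> parse(line)); // transformation
DataStream<Statistics> stats = events
        .keyBy("id")
        .timeWindow(Time.seconds(10))
        .apply(new MyWindowAggregationFunction()); // keyed 10-second windows
stats.addSink(new BucketingSink<>(path)); // sink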
Facebook’s Presto is a distributed SQL query engine that can operate on streams from various data sources, supporting ad hoc queries in near real time. It does not rely on MapReduce; instead, it executes queries with a custom SQL execution engine written in Java. Its pipelined data model can run multiple stages at once, passing data between stages as it becomes available. This reduces end-to-end time while maximizing parallelism across stages on large data sets. A coordinator accepts the incoming query from the user and draws up the plan and the assignment of resources. Presto can run over large social-media data sets, on the order of petabytes. It can also run over HDFS for interactive graphics.
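The idea of pipelining data between stages can be illustrated in a single JVM, with threads standing in for stages and bounded queues standing in for the exchanges between them. This is only a sketch of the concept; Presto’s actual stages run on distributed workers, and `PipelinedStages`, `run` and the `END` marker are hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class PipelinedStages {
    // Marks the end of the stream between two stages.
    private static final int END = Integer.MIN_VALUE;

    // Three stages run at once: a scan stage emits rows 1..rows, a filter stage
    // forwards even values as soon as they arrive, and the final stage sums them.
    // No stage waits for the previous one to finish before starting work.
    static long run(int rows) {
        BlockingQueue<Integer> scanned = new LinkedBlockingQueue<>(16);
        BlockingQueue<Integer> filtered = new LinkedBlockingQueue<>(16);

        Thread scan = new Thread(() -> {
            try {
                for (int i = 1; i <= rows; i++) scanned.put(i);
                scanned.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread filter = new Thread(() -> {
            try {
                for (int v = scanned.take(); v != END; v = scanned.take()) {
                    if (v % 2 == 0) filtered.put(v);
                }
                filtered.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        scan.start();
        filter.start();

        try {
            long sum = 0;
            for (int v = filtered.take(); v != END; v = filtered.take()) sum += v;
            scan.join();
            filter.join();
            return sum;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("pipeline interrupted", e);
        }
    }
}
```

The bounded queues also provide back-pressure: a fast upstream stage blocks when the downstream stage falls behind, rather than buffering the whole intermediate result.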
Queries also differ when they match a single key versus many keys. Whether we use the == operator or the IN operator in the query statement, the size of the list of key-values to be iterated does not shrink. What improves with IN is only the efficiency of matching: we don’t have to traverse the entire list once per key, because each entry is matched against the whole set of keys in the IN predicate in a single pass. A join, on the other hand, can significantly reduce the range to be scanned and gives the query engine a chance to optimize the plan.
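A sketch of how an IN predicate can be evaluated in one pass, assuming the keys are held in a hash-based set. `InPredicate`, `Entry`, `ScanResult` and the `visited` count are hypothetical and exist only to show how many entries are examined.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class InPredicate {
    // One row of the key-value collection.
    record Entry(String key, String value) {}

    // Matches found plus the number of entries examined.
    record ScanResult(List<Entry> matches, long visited) {}

    // Evaluates "key IN (...)" in a single pass: each entry is visited once
    // and its key is tested against the whole set with one set lookup.
    static ScanResult whereKeyIn(List<Entry> keyValues, Set<String> keys) {
        List<Entry> matches = new ArrayList<>();
        long visited = 0;
        for (Entry e : keyValues) {
            visited++;
            if (keys.contains(e.key())) {
                matches.add(e);
            }
        }
        return new ScanResult(matches, visited);
    }
}
```

However many keys the set holds, `visited` equals the length of the list: the list is not made smaller, it is just traversed once.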
Just like the standard query operators of .NET, the Flink SQL layer is merely a convenience over the Table API. Presto, on the other hand, can run over many kinds of data sources, not just the tables of a Table API.

Saturday, November 16, 2019

This is a continuation of the earlier posts to enumerate funny aspects of software engineering practice:

380) Build a product with a thought-through dashboard and find the users gravitating to the page
381) Build a product with little or no styling and find the applications not gaining appeal
382) Build a product with specific investment towards stylesheets and see dramatic improvement in perception
383) Build a product with customizable styles and the partners become happy
384) Build a product with styles that can be changed and satisfaction grows among end-users.
385) Build a product with styles that suit groups and membership to the group grows
386) Build a product with a logo that can be made into stickers offered as giveaways and it becomes popular among young professionals
387) Build a product with marketing events that become popular and the awareness increases
388) Build a product with partnerships where partners talk about the product and the fan following grows
389) Build a product with advocacy groups and training and the skilled users grow
390) Build a product where people can learn about it and hear how others are empowered in their work, and the fan following grows.
391) Build a product where users have to wait for orphaned resources to be cleaned up before they can proceed with a re-install.
392) Build a product where the users have to run the installer frequently and it doesn’t complete some of the time
393) Build a product where the software is blamed because the administrator was reluctant to read the manual
394) Build a product where the resources the customer provides for the software do not meet the requirements.
395) Build a product where different parts of the software need to be installed differently and find that deployments are usually haphazard
396) Build a product where the installation on production environment is so elaborate that it requires planning, dry runs and coordination across teams
397) Build a product where every update is used by the staff to justify setting aside a day
398) Build a product where the reality and perception are deliberately kept different at a customer site
399) Build a product where the vision for the product is different from what the customer wants to use it for.
400) Build a product where the quirkiness of the product offers fodder for all kinds of talks, from conferences to boardroom meetings.

Friday, November 15, 2019

This is a continuation of the earlier posts to enumerate funny aspects of software engineering practice:
370) Build a product that gets a thumbs up from production support.
371) Build a product that makes it easy to add storage and find prolific use of storage by users.
372) Build a product that makes it easy for users to complete workflows with fewer checks and find that users tend to experiment rather than let the workflow guide them
373) Build a product that makes workflows too convoluted and users tend to use it only for partial completion
374) Build a product where users can create their own workflows and it becomes popular with audiences who like user-friendly, designer-style tools
375) Build a product with little or no composition to workflows and users tend to write several clones of customized workflows
376) Build a product with the ability to support live debugging and it becomes popular for development environments
377) Build a product with lots of levers and the dashboard looks intimidating
378) Build a product with fewer levers and find the customers unhappy
379) Build a product where you let the users create their own panels for levers and they hardly do it
380) Build a product with a thought-through dashboard and find the users gravitating to the page
381) Build a product with little or no styling and find the applications not gaining appeal
382) Build a product with specific investment towards stylesheets and see dramatic improvement in perception
383) Build a product with customizable styles and the partners become happy
384) Build a product with styles that can be changed and satisfaction grows among end-users.
385) Build a product with styles that suit groups and membership to the group grows
386) Build a product with a logo that can be made into stickers offered as giveaways and it becomes popular among young professionals
387) Build a product with marketing events that become popular and the awareness increases
388) Build a product with partnerships where partners talk about the product and the fan following grows
389) Build a product with advocacy groups and training and the skilled users grow
390) Build a product where people can learn about it and hear how others are empowered in their work, and the fan following grows.

Thursday, November 14, 2019

This is a continuation of the earlier posts to enumerate funny aspects of software engineering practice:
360) Build a product for different flavors of operating systems or cloud computing and find the developers struggle to keep their skills up to date on each flavor.
361) Build a product and realize it has to be modified for each and every platform on which it is run.
362) Build a product which grows significantly and then retrofit adapters to different technologies.
363) Build a product and find a number of clients requiring customizations that end up forming an infrastructure layer facing the clients
364) Build a product that proliferates layers and components as business expands only to have them shrink and adjust afterwards.
365) Build a product that makes it a challenge to reverse engineer
366) Build a product that reduces the surface area for foreign software to operate within its trust boundary
367) Build a product that makes it easy for applications to work inside the most stringent requirements at customer sites
368) Build a product where the product cannot be accessed remotely for troubleshooting due to customer-imposed restrictions
369) Build a product that makes it easy to diagnose issues or make remedies by flipping on or off configurations
370) Build a product that gets a thumbs up from production support.
371) Build a product that makes it easy to add storage and find prolific use of storage by users.
372) Build a product that makes it easy for users to complete workflows with fewer checks and find that users tend to experiment rather than let the workflow guide them
373) Build a product that makes workflows too convoluted and users tend to use it only for partial completion
374) Build a product where users can create their own workflows and it becomes popular with audiences who like user-friendly, designer-style tools
375) Build a product with little or no composition to workflows and users tend to write several clones of customized workflows
376) Build a product with the ability to support live debugging and it becomes popular for development environments
377) Build a product with lots of levers and the dashboard looks intimidating
378) Build a product with fewer levers and find the customers unhappy
379) Build a product where you let the users create their own panels for levers and they hardly do it
380) Build a product with a thought-through dashboard and find the users gravitating to the page

Wednesday, November 13, 2019

This is a continuation of the earlier posts to enumerate funny aspects of software engineering practice:

350) Build a product which becomes a Fortune 500 company and gets to demand that others work with it
351) Build a product which has an easier path to growth due to its celebrity power
352) Build a product which gets accepted in industries because someone/some lobby opens doors
353) Build a product which struggles during its venture capital rounds but makes up for it with an acquisition or by going public
354) Build a product which has enormous appeal generated by word of mouth in academic circles
355) Build a product with high standards where the developers revel in fewer restrictions on resources and timeline and form a tight productive group
356) Build a product where there is proactive program management to keep the growth aligned to timelines and milestones
357) Build a product where the management deals with engineering problems in creative ways while the product is developed at the pace of the software development team
358) Build a product where the architecture team is responsible for ensuring no component is missed from the possibilities for the product and find the specifications become obsolete before they can be revised.
359) Build a product where the growth and versioning of the product follow a snowflake-like pattern while the developers have to jump from one branch to another
360) Build a product for different flavors of operating systems or cloud computing and find the developers struggle to keep their skills up to date on each flavor.
361) Build a product and realize it has to be modified for each and every platform on which it is run.
362) Build a product which grows significantly and then retrofit adapters to different technologies.
363) Build a product and find a number of clients requiring customizations that end up forming an infrastructure layer facing the clients
364) Build a product that proliferates layers and components as business expands only to have them shrink and adjust afterwards.
365) Build a product that makes it a challenge to reverse engineer
366) Build a product that reduces the surface area for foreign software to operate within its trust boundary
367) Build a product that makes it easy for applications to work inside the most stringent requirements at customer sites
368) Build a product where the product cannot be accessed remotely for troubleshooting due to customer-imposed restrictions
369) Build a product that makes it easy to diagnose issues or make remedies by flipping on or off configurations
370) Build a product that gets a thumbs up from production support.