Wednesday, February 17, 2021

User interface and API design for Streaming Queries (continued from previous post ...)

Analytical applications that perform queries on data are best visualized via catchy charts and graphs. Most applications built on storage and analytical products have a user interface that lets customers create containers for their data but offers very little leverage for authoring streaming queries. Even when an option to create a query exists, it amounts to little more than uploading a compiled-language program artifact. The application reuses a template for requests and responses, which is very different from the long-running, streaming responses typical of these queries. A solution is presented in this document. 

Query Interface is not about keywords. 

When it comes to a user interface for querying data, people often associate the Google user interface with search terms. Avoid Google. It is not necessarily the right interface for streaming queries; if anything, it is simplistic and keyword-based rather than operator-based. The Splunk interface is a much cleaner model because it has matured through search over machine data. Streaming queries are not table queries, and they do not operate only on big data in the Hadoop file system. They do have standard operators that translate directly to operator keywords and can be chained as pipe operators on the search bar in the same manner they would be chained in a Java file. This translation of query operator expressions on the search bar into a Java-language Flink streaming query is a patentable design for a streaming user interface, and one that will prove immensely more flexible for direct execution of user queries than packaged queries. 
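The translation described above can be sketched without any Flink dependency. Below, a hypothetical parser turns a pipe expression from the search bar into a chained operator pipeline, with java.util.stream standing in for a Flink DataStream; the operator names ("filter", "upper", "limit") and the class name are illustrative assumptions, not part of any product.

```java
import java.util.*;
import java.util.stream.*;

// Sketch: translate a search-bar pipe expression into a chained stream
// pipeline. java.util.stream stands in for a Flink DataStream here.
public class PipeTranslator {
    public static Stream<String> apply(Stream<String> in, String expr) {
        for (String stage : expr.split("\\|")) {
            String[] parts = stage.trim().split("\\s+", 2);
            switch (parts[0]) {
                case "filter": // keep events containing the argument
                    String needle = parts[1];
                    in = in.filter(s -> s.contains(needle));
                    break;
                case "upper":  // transform each event
                    in = in.map(String::toUpperCase);
                    break;
                case "limit":  // truncate the (possibly unbounded) stream
                    in = in.limit(Long.parseLong(parts[1]));
                    break;
                default:
                    throw new IllegalArgumentException("unknown operator: " + parts[0]);
            }
        }
        return in;
    }

    public static void main(String[] args) {
        List<String> events = List.of("error: disk full", "info: ok", "error: timeout");
        List<String> out = apply(events.stream(), "filter error | upper | limit 1")
                .collect(Collectors.toList());
        System.out.println(out); // [ERROR: DISK FULL]
    }
}
```

Each stage maps one-to-one to an operator call, which is exactly the property that makes a search-bar expression mechanically translatable into a generated query program.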


Use of off-the-shelf JavaScript libraries for data visualization. 

 

There are several off-the-shelf libraries for data visualization, such as D3.js and Recharts, as well as admin-UI libraries for dashboards. They are used on a case-by-case basis. A user interface for a dedicated purpose requires a lot of customization. Sometimes it is easy to work with off-the-shelf libraries; other times it is simpler to rewrite the foundation for the essential, long-standing use cases.  

 

Conclusion: 

User interface development for streaming queries should bring over the best usability practices from object stores and databases while bringing out the event-by-event continuity that is the hallmark of stream processing. Paginated results are the hallmark of querying over big data, where the more relevant results are brought up first even if all the results follow later. Results from streaming queries can also be shown as they arrive, giving users a chance to pause once they have what they need. Users will also benefit from piped operators that can be entered in the search bar just like keywords, which permits the results to be extracted, transformed, and loaded into forms that appeal better to the user. 
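As a minimal sketch of showing results as they arrive, the consumer below drains a result channel event by event and stops as soon as the user has seen enough; the queue, the sentinel value, and the method names are hypothetical stand-ins for a live query's result stream.

```java
import java.util.*;
import java.util.concurrent.*;

// Sketch: surface streaming results as they arrive and let the user stop
// early, instead of waiting for a full page of results.
public class IncrementalResults {
    // Sentinel the producer enqueues when the query completes (illustrative).
    public static final String POISON = "<eof>";

    // Drain results until `wanted` have been shown or the query finishes.
    public static List<String> take(BlockingQueue<String> results, int wanted) {
        List<String> shown = new ArrayList<>();
        try {
            while (shown.size() < wanted) {
                String r = results.take();     // blocks until the next result arrives
                if (POISON.equals(r)) break;   // query finished early
                shown.add(r);                  // render immediately, no pagination wait
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // treat interruption as a user pause
        }
        return shown;                          // remaining results stay unread
    }

    public static void main(String[] args) {
        BlockingQueue<String> q = new LinkedBlockingQueue<>(List.of("r1", "r2", "r3", POISON));
        System.out.println(take(q, 2)); // [r1, r2]
    }
}
```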


Tuesday, February 16, 2021

User interface and API design for Streaming Queries (continued from previous post ...)

 The Proposal  

Query Interface is not about keywords. 

When it comes to a user interface for querying data, people often associate the Google user interface with search terms. Avoid Google. It is not necessarily the right interface for streaming queries; if anything, it is simplistic and keyword-based rather than operator-based. The Splunk interface is a much cleaner model because it has matured through search over machine data. Streaming queries are not table queries, and they do not operate only on big data in the Hadoop file system. They do have standard operators that translate directly to operator keywords and can be chained as pipe operators on the search bar in the same manner they would be chained in a Java file. This translation of query operator expressions on the search bar into a Java-language Flink streaming query is a patentable design for a streaming user interface, and one that will prove immensely more flexible for direct execution of user queries than packaged queries. 

Some truths about user interface development: 

1) User interface development estimates slip too often. This is not necessarily due to user interface test automation frameworks such as Selenium; rather, it is caused by the user interface becoming monolithic and clunky. It takes on a long maintenance tail as different browsers and timing issues must be handled or special-cased. 

2) As the number of people working on the user interface increases, user interface library development follows an efficiency curve of diminishing returns.  

 

 

3) User interface should be more like a library than an all-inclusive customized application. It should switch easily as a layer over one data source, another, or both. 

4) It may be tempting to copy existing solutions from competitors such as 

Splunk 

SQL 

Tableau 

Machine Learning User Interface 

But it is far more important to streamline the translation of search expressions in plain English on the search bar into Java Flink queries that can run in a fast, serverless computing manner. 

5) If a wire diagram becomes more appealing and the go-to resource for all problem solving, the product will likely never materialize. Instead, the comfort and pain of using the existing technologies must be honed to improve the user experience and usability. 

6) A plug-and-play JavaScript plugin framework, such as the one for SharePoint, is a great start for standard query operators. The essence of stream querying, namely the continuous availability of results and their paging, is something that will not be found elsewhere and should be salient in a user interface over streams. 

7) Security is not a functionality of the user interface. It is a non-functional aspect and should not require scattered checks all over the place, but rather a single check at admission control. 

8) A user interface is great to build on a REST interface with thin clients, but the functionality should be composable and inheritable. 

9) Separation of concerns, especially between dashboard elements and the logic pertaining to one or more search queries, will tremendously improve the offering to the customer. 

Monday, February 15, 2021

User Interface and API design for Streaming queries

 Problem statement: 

Analytical applications that perform queries on data are best visualized via catchy charts and graphs. Most applications built on storage and analytical products have a user interface that lets customers create containers for their data but offers very little leverage for authoring streaming queries. Even when an option to create a query exists, it amounts to little more than uploading a compiled-language program artifact. The application reuses a template for requests and responses, which is very different from the long-running, streaming responses typical of these queries. A solution is presented in this document. 

Description 

An application can choose to offer a set of packaged queries that the user picks from a dropdown menu, while internalizing the upload of the corresponding programs, their execution, and the return of results. One of the restrictions that comes with packaged queries exposed via REST APIs is their limited ability to scale, since they consume significant resources on the backend and continue to run for a long time. These restrictions cannot be relaxed without some reduction in their resource usage. The API must provide a way for consumers to launch several queries with trackers, and those queries should complete reliably even if they are executed one by one. This is facilitated with the help of a reference to the query and a progress indicator.

The reference is merely an opaque identifier that only the system issues and uses to look up the status. The indicator could be another API that takes the reference and returns the status. It is relatively easy for the system to separate read-only status information from read-write operations, so the number of times the status indicator is called causes no degradation to the rest of the system. There is a clean separation between the status-information part of the system, which is usually periodically collected or pushed, and the rest of the system. The separation of read-write from read-only also allows the two to be treated differently. For example, it is possible to replace the technology for the read-only path separately from the technology for the read-write path, and even the read-only technology can be swapped out for improvements on that side. 
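The reference-and-progress-indicator pattern above can be sketched as follows. The class, method, and endpoint names are hypothetical, and a thread-safe map stands in for the separated read-only status store that a real system would collect or push periodically.

```java
import java.util.*;
import java.util.concurrent.*;

// Sketch: submitting a query returns only an opaque reference; a separate
// read-only call maps the reference to a status.
public class QueryTracker {
    public enum Status { PENDING, RUNNING, DONE }

    // Read-only status store, kept apart from the read-write submission path
    // so that frequent polling does not degrade query execution.
    private final ConcurrentMap<String, Status> statuses = new ConcurrentHashMap<>();

    // e.g. POST /queries -> an opaque identifier issued only by the system.
    public String submit(String queryText) {
        String ref = UUID.randomUUID().toString();
        statuses.put(ref, Status.PENDING);
        return ref;
    }

    // e.g. GET /queries/{ref}/status -> cheap, repeatable progress indicator.
    public Optional<Status> status(String ref) {
        return Optional.ofNullable(statuses.get(ref));
    }

    // Internal: the execution engine pushes status transitions periodically.
    public void advance(String ref, Status s) { statuses.put(ref, s); }

    public static void main(String[] args) {
        QueryTracker t = new QueryTracker();
        String ref = t.submit("SELECT * FROM events");
        System.out.println(t.status(ref).get()); // PENDING
        t.advance(ref, Status.DONE);
        System.out.println(t.status(ref).get()); // DONE
    }
}
```

Because `status` only reads the map, the store behind it could be swapped (say, for a cache or a different database) without touching the submission path, which is the separation argued for above.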

The design of all REST APIs generally follows a convention. This practice yields well-recognized URI qualifier patterns, query parameters, and methods. Exceptions, errors, and logging are typically handled with the best use of the HTTP protocol, and the 1.1 revision of that protocol helps with long-running responses through features such as persistent connections and chunked transfer encoding. 

The SQL query can also be translated from REST APIs with the help of engines like Presto or LogParser. 

Tools like LogParser allow SQL queries to be executed over enumerables. SQL has supported user-defined operators for a while now. Aggregation operations have the benefit that they can support both batch- and streaming-mode operation. These operations can therefore handle large datasets because they view only a portion at a time. The batch operation can be parallelized across several processors, which has generally been the case with big data and cluster-mode operations. Both batch and stream processing can operate on unstructured data.  
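The point about viewing only a portion at a time can be illustrated with the same aggregation written in both styles. This is a generic Java sketch, not LogParser or SQL syntax; the method names are illustrative.

```java
import java.util.*;

// Sketch: one aggregation, two modes. The batch form assumes the full
// dataset up front and can be partitioned across processors; the streaming
// form folds one arriving event at a time into a running aggregate.
public class Aggregation {
    // Batch style: parallelStream stands in for cluster-mode partitioning.
    public static long batchSum(List<Long> data) {
        return data.parallelStream().mapToLong(Long::longValue).sum();
    }

    // Streaming style: never holds more than the accumulator and one event.
    public static long streamingSum(Iterator<Long> events) {
        long acc = 0;
        while (events.hasNext()) acc += events.next(); // one portion at a time
        return acc;
    }

    public static void main(String[] args) {
        List<Long> data = List.of(1L, 2L, 3L, 4L);
        System.out.println(batchSum(data));                // 10
        System.out.println(streamingSum(data.iterator())); // 10
    }
}
```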

Presto, from Facebook, is a distributed SQL query engine that can operate on streams from various data sources, supporting ad hoc queries in near real time. 
Just like the standard query operators of .NET, the Flink SQL layer is merely a convenience over the Table APIs. Presto, on the other hand, offers to run over any kind of data source, not just Table APIs. 
Although Apache Spark query code and Apache Flink query code look very similar, the former treats stream processing as a special case of batch processing while the latter does just the reverse. 
Also, Apache Flink provides a SQL abstraction over its Table API. 
While Apache Spark provides ".map(...)" and ".reduce(...)" programmability syntax to support batch-oriented processing, Apache Flink provides Table APIs with ".groupBy(...)" and ".orderBy(...)" syntax. It provides a SQL abstraction and supports stream processing as the norm. 

The APIs vary when the engines change, even though together they form an interface through which any virtualized query can be executed, with each API defining its own query-processing syntax. They vary because the requests and responses vary, and because their size and content vary. Some APIs are better suited for stream processing and others for batch processing.