Cluster computing: Support for small to large footprint introspection database and query

Wednesday, July 29, 2020

Support for small to large footprint introspection database and query

The application that uses the introspection store may access it over APIs. In that case, executing a packaged query could mean launching a Flink job on a cluster and processing the results of that job. This is typical of any API implementation that involves asynchronous processing.

The execution of a query can take an arbitrary amount of time. The REST-based API implementation does not have to block, since the call sequences in the underlying analytical and storage systems are also asynchronous. The REST API can provide a handle and a status checker, or a webhook for completion reporting. This kind of mechanism is common at any layer and is also available from REST.
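
The handle-and-status-checker pattern can be sketched in a few lines. This is only an illustration using the Python standard library: `QueryService`, `submit`, `status` and `wait_for` are hypothetical names, and a thread pool stands in for the cluster that would actually run the job.

```python
import time
import uuid
from concurrent.futures import ThreadPoolExecutor


class QueryService:
    """Hypothetical non-blocking facade over a long-running query backend."""

    def __init__(self, max_workers=4):
        # Assumption: a local thread pool stands in for the analytics cluster.
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._jobs = {}

    def submit(self, run_query, *args):
        """Start the query in the background and return a handle immediately."""
        handle = uuid.uuid4().hex
        self._jobs[handle] = self._pool.submit(run_query, *args)
        return handle

    def status(self, handle):
        """Report the state of a job without blocking the caller."""
        future = self._jobs.get(handle)
        if future is None:
            return {"state": "UNKNOWN"}
        if not future.done():
            return {"state": "RUNNING"}
        error = future.exception()
        if error is not None:
            return {"state": "FAILED", "error": str(error)}
        return {"state": "FINISHED", "result": future.result()}


def wait_for(service, handle, poll_seconds=0.05, timeout=5.0):
    """Client-side status checker: poll until the job leaves RUNNING."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        report = service.status(handle)
        if report["state"] != "RUNNING":
            return report
        time.sleep(poll_seconds)
    return {"state": "TIMEOUT"}
```

A REST layer would map `submit` to a POST that returns the handle and `status` to a GET on that handle. A webhook variant would call back the client when the job completes instead of waiting to be polled.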

Whether the results of execution are returned inline in the HTTP response is specific to the API implementation. Even if the results are large, the API can return a file stream so that the caller can process the response at their end. File upload and download are easy to implement in the API with suitable parameters and return values for the corresponding calls.
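
To illustrate how a large result can be handed back as a stream rather than one in-memory payload, here is a minimal sketch. The names `stream_chunks` and `save_stream` are hypothetical, and in-memory byte buffers stand in for the result file and the HTTP response body.

```python
import io


def stream_chunks(source, chunk_size=64 * 1024):
    """Server side: yield the result in fixed-size chunks, never all at once."""
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        yield chunk


def save_stream(chunks, sink):
    """Caller side: consume the stream piece by piece and count the bytes."""
    total = 0
    for chunk in chunks:
        sink.write(chunk)
        total += len(chunk)
    return total


# Usage: copy a simulated 150,000-byte result through the stream.
result_file = io.BytesIO(b"x" * 150_000)
downloaded = io.BytesIO()
bytes_copied = save_stream(stream_chunks(result_file), downloaded)
```

Most web frameworks accept a generator like `stream_chunks` as a response body, so the same shape should carry over to a real endpoint.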

Upload APIs are also useful for uploading analytical queries in the form of Flink artifacts, which the API layer can forward to a cluster and execute. The results of the query can then similarly be passed back to the user. The result of the processing is typically a summary and will generally be smaller than the overall data on which the query is run. This makes it convenient for the API to return the results in one call.
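
One way the API layer might forward an uploaded artifact is through Flink's own REST interface. The sketch below only builds the URLs involved and makes no network calls. The paths (`/jars/upload`, `/jars/:jarid/run`, `/jobs/:jobid`) follow the Flink REST API as I understand it, so check them against the documentation for your Flink version. The helper names and the sample upload response are made up for illustration.

```python
from pathlib import PurePosixPath


def upload_url(base):
    """URL to which the jar is POSTed as a multipart form upload."""
    return f"{base.rstrip('/')}/jars/upload"


def jar_id_from_upload(response):
    """Assumption: the jar id is the last path component of 'filename'."""
    return PurePosixPath(response["filename"]).name


def run_url(base, jar_id):
    """URL that starts a job from a previously uploaded jar."""
    return f"{base.rstrip('/')}/jars/{jar_id}/run"


def job_url(base, job_id):
    """URL that reports the state of a submitted job."""
    return f"{base.rstrip('/')}/jobs/{job_id}"


# Usage with a made-up upload response and a hypothetical JobManager address.
base = "http://jobmanager:8081/"
sample_upload = {"filename": "/tmp/flink-web-upload/abc_query.jar", "status": "success"}
jar_id = jar_id_from_upload(sample_upload)
next_call = run_url(base, jar_id)
```

The API layer would chain these calls: upload the artifact, start the job, then poll the job URL until it finishes and return the summary to the caller.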

The query processing ability of the API will also help automation that relies on scripts run from the command line. Administrators generally prefer command-line invocation whenever possible. A health check that queries the introspection datastore as described above is not only authoritative but also richer, since all the components contribute to the data.
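
A command-line health check along these lines could look like the sketch below. Everything specific in it is an assumption: the `--endpoint` flag, the component names, the "ok" status value, and `fetch_component_status`, which is a placeholder for the real query against the introspection store.

```python
import argparse


def fetch_component_status(endpoint):
    """Placeholder for the real API call; returns canned, made-up data."""
    return {"component-a": "ok", "component-b": "ok"}


def evaluate(statuses):
    """Return (exit_code, unhealthy_names); exit code 0 means all healthy."""
    unhealthy = sorted(name for name, state in statuses.items() if state != "ok")
    return (1 if unhealthy else 0), unhealthy


def main(argv=None, fetch=fetch_component_status):
    parser = argparse.ArgumentParser(
        description="Health check against the introspection store")
    parser.add_argument("--endpoint", default="http://localhost:8080",
                        help="base URL of the query API (hypothetical)")
    args = parser.parse_args(argv)
    code, unhealthy = evaluate(fetch(args.endpoint))
    print("healthy" if code == 0 else "unhealthy: " + ", ".join(unhealthy))
    return code
```

Saved as a script that ends with `sys.exit(main())`, this gives shell automation a conventional exit status to branch on: zero when every component reports healthy, non-zero otherwise.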
