Wednesday, December 4, 2019

We were discussing Flink applications and the use of a stream store such as Pravega. It is not appropriate to encapsulate a Flink connector within the HTTP request handler for data ingestion at the store. The ingestion API is far more generic than any particular upstream software used to send the data, because the consumer of this REST API could be a user interface, a language-specific SDK, or a shell script making curl requests. It is better for the REST API implementation to directly accept the raw message along with the destination and the authorization.
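A minimal sketch of the kind of ingestion endpoint described above is given here: a JAX-RS handler that accepts the raw message body along with the destination stream and an Authorization header, and writes straight to Pravega with the plain Java client, with no Flink connector in the request path. The scope name, controller URI, and the token check are hypothetical placeholders, and in practice the client factory and writers would be shared or pooled rather than created per request.

import java.net.URI;

import javax.ws.rs.HeaderParam;
import javax.ws.rs.POST;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.core.Response;

import io.pravega.client.ClientConfig;
import io.pravega.client.EventStreamClientFactory;
import io.pravega.client.stream.EventStreamWriter;
import io.pravega.client.stream.EventWriterConfig;
import io.pravega.client.stream.impl.UTF8StringSerializer;

@Path("/streams")
public class IngestResource {

    // Shared factory for the scope; "examples" and the controller URI are placeholders.
    private static final EventStreamClientFactory FACTORY = EventStreamClientFactory.withScope(
            "examples",
            ClientConfig.builder().controllerURI(URI.create("tcp://controller:9090")).build());

    @POST
    @Path("/{stream}/events")
    public Response ingest(@PathParam("stream") String stream,
                           @HeaderParam("Authorization") String token,
                           String rawMessage) {
        // Placeholder authorization check; a real service would validate the token.
        if (token == null || token.isEmpty()) {
            return Response.status(Response.Status.UNAUTHORIZED).build();
        }
        try (EventStreamWriter<String> writer = FACTORY.createEventWriter(
                stream, new UTF8StringSerializer(), EventWriterConfig.builder().build())) {
            // Block until Pravega acknowledges the durable write.
            writer.writeEvent(rawMessage).join();
        }
        return Response.accepted().build();
    }
}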
There are two factors I want to discuss further when comparing the analytical applications:
First, the use of Pravega as a stream store should not mandate the use of a Flink application with the Flink connector. Data can be sent to and read from the Pravega store directly, as the sketch below shows. A Flink application is generally used for transformations and queries, which benefit from running in cluster mode so that they can scale to large data sets.
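As an illustration of reading directly from the stream store without a Flink application, here is a minimal sketch using the plain Pravega Java client; the scope, stream, and reader group names are hypothetical, and the client API may differ slightly between Pravega versions.

import java.net.URI;

import io.pravega.client.ClientConfig;
import io.pravega.client.EventStreamClientFactory;
import io.pravega.client.admin.ReaderGroupManager;
import io.pravega.client.stream.EventRead;
import io.pravega.client.stream.EventStreamReader;
import io.pravega.client.stream.ReaderConfig;
import io.pravega.client.stream.ReaderGroupConfig;
import io.pravega.client.stream.Stream;
import io.pravega.client.stream.impl.UTF8StringSerializer;

public class DirectReader {
    public static void main(String[] args) {
        ClientConfig clientConfig = ClientConfig.builder()
                .controllerURI(URI.create("tcp://controller:9090"))
                .build();

        // A reader group tracks the read position in the stream for its readers.
        try (ReaderGroupManager rgm = ReaderGroupManager.withScope("examples", clientConfig)) {
            rgm.createReaderGroup("direct-rg",
                    ReaderGroupConfig.builder().stream(Stream.of("examples", "telemetry")).build());
        }

        try (EventStreamClientFactory factory =
                     EventStreamClientFactory.withScope("examples", clientConfig);
             EventStreamReader<String> reader = factory.createReader(
                     "reader-1", "direct-rg", new UTF8StringSerializer(), ReaderConfig.builder().build())) {
            // readNextEvent blocks up to the timeout (ms); getEvent() is null when nothing arrived.
            EventRead<String> event;
            while ((event = reader.readNextEvent(2000)).getEvent() != null) {
                System.out.println(event.getEvent());
            }
        }
    }
}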
Second, the data path is critical, so it is not necessary to combine collection and analysis.
Although most analysis is useful only after some collection has happened, the collection does not need to be heavy in terms of processing. Collection is a data path and can almost always be streamlined. Analysis can happen as the collection happens, because it is stream processing, but it does not have to happen within the collection. It can execute separately: as long as there is a queue of collected events, analysis can begin with stream processing even when ingestion and analysis proceed at different rates, because they are the write path and the read path respectively, and those are best kept separate.
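To show how the analysis can run as a separate read path against the same stream the collection writes to, here is a minimal sketch of a Flink job using the Pravega Flink connector; the scope and stream names, controller URI, and the trivial transformation are placeholders, and the connector's builder API may vary across versions.

import java.net.URI;

import io.pravega.connectors.flink.FlinkPravegaReader;
import io.pravega.connectors.flink.PravegaConfig;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class TelemetryAnalysisJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        PravegaConfig pravegaConfig = PravegaConfig.fromDefaults()
                .withControllerURI(URI.create("tcp://controller:9090"))
                .withDefaultScope("examples");

        // The analysis job only reads from the stream; the collection path keeps writing independently.
        FlinkPravegaReader<String> source = FlinkPravegaReader.<String>builder()
                .withPravegaConfig(pravegaConfig)
                .forStream("telemetry")
                .withDeserializationSchema(new SimpleStringSchema())
                .build();

        env.addSource(source)
           .filter(line -> !line.isEmpty())   // placeholder transformation
           .print();

        env.execute("telemetry-analysis");
    }
}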
