Saturday, December 21, 2019

Flink considers batch processing to be a special case of stream processing. A batch program is one that operates on a batch of events at a time. It differs from the above mentioned Stateful stream processing in that it uses data structures in memory rather than the embedded key store and therefore can export its state externally and does not require to be kept together with streams. Since the state can  be stored externally, the use if checkpoints mentioned earlier is avoided since their purpose was also similar to states. The iterations are slightly different from that in stream processing, because there is now a superstep level that might involve coordination
When we mentioned the parallelism, we brought up keyBy operator. Let us look at the details of this method. It partitions the stream on the keyBy attribute and the windows are computed per key. A window operates on a finite set of elements a tumbling window has fixed boundaries.
For example,
stream

    .keyBy( (event) -> event.getUser() )

    .timeWindow(Time.hours(1))

    .reduce( (a, b) -> a.add(b) )

    .addSink(...);

 empty window does not trigger a computation

The keyBy is called for keyed streams and the windowAll is called for non-keyed streams
The notion behind keyed streams is that all of the computations of a windowed stream can be executed in parallel. Each logical keyed stream is independent. Elements with the same key goto the same task.

No comments:

Post a Comment