Monday, January 6, 2020

Ordering events in Flink involves two aspects:
First, the events must be timestamped. This can be done either at the source or by Flink's own timestamp-assignment methods.
Second, when events are processed in arrival order rather than by timestamp, their execution must be serialized.

The first aspect is demonstrated by the following code:

DataStream<MyEvent> withTimestampsAndWatermarks = stream
        .filter( event -> event.severity() == WARNING )
        .assignTimestampsAndWatermarks(new MyTimestampsAndWatermarks());
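The MyTimestampsAndWatermarks class is not shown here. As a rough sketch of the logic such an assigner usually carries, the plain-Java class below tracks the highest timestamp seen so far and reports a watermark that trails it by a fixed tolerance. It is an illustration only and does not use Flink's interfaces; the class name BoundedLatenessTracker, its methods and the tolerance value are all assumptions made for this example.

```java
// Plain-Java illustration only; this is NOT Flink's API.
// All names here are hypothetical and chosen for the example.
final class BoundedLatenessTracker {
    // Assumed tolerance: how far behind the newest event a late event may arrive.
    private final long maxOutOfOrdernessMillis;
    private long maxTimestampSeen = Long.MIN_VALUE;

    BoundedLatenessTracker(long maxOutOfOrdernessMillis) {
        this.maxOutOfOrdernessMillis = maxOutOfOrdernessMillis;
    }

    // Called once per event with the timestamp carried by that event.
    long onEvent(long eventTimestampMillis) {
        maxTimestampSeen = Math.max(maxTimestampSeen, eventTimestampMillis);
        return eventTimestampMillis;
    }

    // The watermark trails the highest timestamp seen by the tolerance.
    // Before any event arrives there is no progress to report.
    long currentWatermark() {
        if (maxTimestampSeen == Long.MIN_VALUE) {
            return Long.MIN_VALUE;
        }
        return maxTimestampSeen - maxOutOfOrdernessMillis;
    }
}
```

With a tolerance of 500 ms, events stamped 2000 and then 1200 leave the watermark at 1500: the late event does not move it backwards.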

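The second aspect is achieved by setting the parallelism to 1, so that a single task instance processes every event:
stream.setParallelism(1);
(setParallelism is available on the result of a source or operator, and on the StreamExecutionEnvironment itself.)

Alternatively, synchronized locks can be used within Function objects.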
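The locking approach can be sketched in plain Java as follows. This is a minimal illustration, not Flink code: SerializedHandler and runConcurrently are hypothetical names, and the handler stands in for the body of a Function object. A lock of this kind only coordinates threads inside one JVM, and it serializes events in the order the threads acquire the lock, not in timestamp order.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; names are hypothetical, not Flink API.
final class SerializedHandler {
    private final Object lock = new Object();
    private final List<String> processed = new ArrayList<>();

    // Only one thread at a time may run the guarded section.
    void handle(String event) {
        synchronized (lock) {
            processed.add(event);
        }
    }

    int processedCount() {
        synchronized (lock) {
            return processed.size();
        }
    }

    // Drives the handler from several threads and returns how many events it recorded.
    static int runConcurrently(int threadCount, int eventsPerThread) {
        SerializedHandler handler = new SerializedHandler();
        List<Thread> threads = new ArrayList<>();
        for (int t = 0; t < threadCount; t++) {
            final int id = t;
            Thread thread = new Thread(() -> {
                for (int i = 0; i < eventsPerThread; i++) {
                    handler.handle("thread-" + id + "-event-" + i);
                }
            });
            threads.add(thread);
            thread.start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return handler.processedCount();
    }
}
```

Because every call to handle goes through the same lock, no event is lost even though the underlying ArrayList is not thread-safe.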

Independently of the two methods above, Flink supports three notions of time for timestamp-based processing: processing time, event time and ingestion time.
Of these, only event time guarantees completely consistent and deterministic results. The notion of time is set on the StreamExecutionEnvironment, through its setStreamTimeCharacteristic method, before the job is executed.
Event time also supports watermarks. Watermarks are Flink's mechanism for measuring progress in event time. They flow inline with the events as part of the stream. As an operator advances its event time, it emits a watermark for the downstream operators to process. In a distributed setting, where an operator may receive input from more than one stream, the watermark on its outgoing stream is the minimum of the watermarks of its input streams. As the input streams advance their event times, so does the operator. Flink also provides a way to coalesce events within a window.
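The minimum rule for operators with several inputs can be sketched in a few lines of plain Java. This is only an illustration of the rule described above, not Flink's internal implementation; WatermarkMerger and onWatermark are hypothetical names, and the sketch assumes at least one input.

```java
import java.util.Arrays;

// Plain-Java illustration only; names are hypothetical, not Flink API.
final class WatermarkMerger {
    private final long[] inputWatermarks;
    private long emitted = Long.MIN_VALUE;

    WatermarkMerger(int inputCount) {
        if (inputCount < 1) {
            throw new IllegalArgumentException("at least one input is required");
        }
        inputWatermarks = new long[inputCount];
        // Until an input reports a watermark, it holds the operator back.
        Arrays.fill(inputWatermarks, Long.MIN_VALUE);
    }

    // Records a watermark from one input and returns the operator's
    // outgoing watermark: the minimum across all inputs, never decreasing.
    long onWatermark(int input, long watermark) {
        inputWatermarks[input] = Math.max(inputWatermarks[input], watermark);
        long min = Arrays.stream(inputWatermarks).min().getAsLong();
        emitted = Math.max(emitted, min);
        return emitted;
    }
}
```

With two inputs, a watermark of 10 on the first input alone does not advance the operator. Once the second input reports 7 the operator moves to 7, and when the second input later reports 15 the operator moves to 10, held back by the slower first input.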
