Wednesday, January 8, 2020

Flink-connector has an EventTimeOrderingOperator This uses watermark and managed  state to buffer elements which helps to order elements by event time. This class extends the AbstractStreamOperator and implements the OneInputStreamOperator. The last seen watermark is initialized to min valueIt uses a timer service and mapState stashed in the runtime Context. It processes each stream record one by one. If the event does not have a timestamp it simply forwards. If the event has a timestamp, it buffers all the events between the current and the next watermark. 
When the event Timer fires due to watermark progression, it polls all the event time stamp that are less than or equal to the current watermark. If the timestamps are empty, the queued state is cleared otherwise the next watermark is registered. The sorted list of timestamps from buffered events is maintained In a priority queue. 
AscendingTimestampExtractor is a timestamp assigner and watermark generator for streams where timestamps are monotonously ascending. This is true in the case of log files. The local watermarks are easily assigned because they follow the strictly increasing timestamps which are periodic.  
ProcessFunction allows the use of timers. This allows receiving callback with the OnTimer method.
the TimerService provides methods like:
currentProcessingTime
currentWatermark
registerEventTimeTimer
registerProcessingTimeTimer
which can peg the time when the timer fires. Within the function there can be any criteria to set the timer and because a callback should be received, selective action can be taken for that event.
One of the most common applications of the Timer callback is the use of collector.collect method. This lets the processFunction take action on every event but decides what the collector collects.
There are four aspects of these timers:
1) Timers work with KeyedStreams.
2) Timers allow events to be de-duplicated.
3) Timers are checkpointed
4) Timers can be deleted.

There is at most one timer per key. This results in deduplication of events. Multiple timers are registered for the same key or timestamp then only one will fire which will automatically de-duplicate timers for events. They are also persisted so the process function becomes fault-tolerant. Timers can also be deregistered.

No comments:

Post a Comment