Saturday, January 4, 2020

Apache Flink Applications can use ProcessFunction with streams. It is similar to FlatMap Function but handles all three : events, state and timers. The function applies to each and every event in the input stream. It also gives access to the Flink keyed state via the runtime Context. The timer allows changes in event time and processing time to be handled by the application. Both the timestamps and timerservice are available via the context object. This is only possible with process function on a keyed stream.

The timer service is called for future event processing via registered callbacks. The onTimer method is invoked for this particular processing. Inside this method, states are scoped to the key for which the timer was created.

Joins are possible on low-level operations of two inputs. The ProcessFunction or the KeyedCoProcessFunction is a function that is bound to two different inputs and it gets individual calls to processElement1 and processElement2 for records for respective inputs. One of the inputs is processed first. It’s state is updated. Then the other input is processed. The earlier state is probed and the result is emitted. Since events may correspond to different times including delays, a watermark may be used as a reference and when it has elapsed, the operation may be performed.

The timers are fault-tolerant and checkpointed This helps with recovery if the task gets paused. There does not need to be an ordered set of events with the logic above.

The Flink coalescing may maintain only one timer per key and timestamp. The timer resolution is in milliseconds but the runtime may not permit it at this granularity although the system clock can. Besides, the timer may work upto +/- one-time unit equal to resolution from the time specified. Timers can be coalesced between watermarks. However, there can be only one timer per key per resolution time-unit.

No comments:

Post a Comment