Sunday, December 22, 2019


In this article, we continue the discussion on stream processing with Flink and look into the use of windows. An infinite stream can only be processed if it is split into finite pieces. A window is one such piece, and its size can depend on factors such as elapsed time, element count, or activity in the stream. A programmer defines the split through the windowing API. In the case of keyed streams, she uses the “keyBy” operator followed by “window”, and in the case of non-keyed streams, she uses “windowAll”.
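A keyed stream lets the windowed computation run in parallel, because all elements with the same key are routed to the same parallel task, whereas “windowAll” processes the whole stream in a single task.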
There are four types of windows:
1)      Tumbling window: A tumbling window assigns each element to exactly one window of a fixed, specified size, and consecutive windows do not overlap, as in the following code:
.keyBy(<key selector>)
.window(TumblingEventTimeWindows.of(Time.seconds(5)))
2)      Sliding window: The size of the window is fixed as in the previous definition, but windows are allowed to overlap. An additional window slide parameter dictates how often a new window starts. When the slide is smaller than the size, an element belongs to more than one window.
3)      Session window: This window groups elements by sessions of activity. Session windows do not have a fixed size and do not overlap. The number of elements within a window can vary, and a window closes when no element arrives for a period called the session gap. The gap can be fixed or computed per element:
.window(EventTimeSessionWindows.withDynamicGap((element) -> {
    // determine and return session gap
}))
4)      Global window: This is a single window to which all elements with the same key are assigned. Since it does not have a beginning or an end, no processing is done unless a trigger is specified. A trigger determines when a window is ready to be processed by the window function.
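The assignment rules in the list above can be sketched in plain Java without any Flink dependency. This is only an illustration under stated assumptions: the class and method names are hypothetical, timestamps are non-negative, there is no window offset, and a new session starts when two consecutive elements are strictly more than the gap apart. It is not Flink's own assigner code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; none of these names come from Flink.
public class WindowAssignmentSketch {

    // Start of the tumbling window [start, start + size) that contains the timestamp.
    // Assumes non-negative timestamps and no window offset.
    public static long tumblingStart(long timestamp, long size) {
        return timestamp - (timestamp % size);
    }

    // Starts of every sliding window [start, start + size) that contains the timestamp.
    // With slide < size the list has more than one entry, because the windows overlap.
    public static List<Long> slidingStarts(long timestamp, long size, long slide) {
        List<Long> starts = new ArrayList<>();
        long lastStart = timestamp - (timestamp % slide);
        for (long start = lastStart; start > timestamp - size; start -= slide) {
            starts.add(start);
        }
        return starts;
    }

    // Splits ascending timestamps into sessions. A new session begins when the
    // distance to the previous element exceeds the gap (boundary handling is an
    // assumption of this sketch).
    public static List<List<Long>> sessions(long[] sortedTimestamps, long gap) {
        List<List<Long>> result = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        for (long t : sortedTimestamps) {
            if (!current.isEmpty() && t - current.get(current.size() - 1) > gap) {
                result.add(current);
                current = new ArrayList<>();
            }
            current.add(t);
        }
        if (!current.isEmpty()) {
            result.add(current);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tumblingStart(7, 5));
        System.out.println(slidingStarts(7, 10, 5));
        System.out.println(sessions(new long[] {1, 2, 3, 20, 21}, 5));
    }
}
```

A global window needs no such arithmetic, since every element of a key lands in the same window and only the trigger decides when to process it.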
The computation on a window is performed with the help of a window function, which can be one of ReduceFunction, AggregateFunction, FoldFunction, or ProcessWindowFunction. For the first two window functions, Flink incrementally aggregates the elements of each window as they arrive, so only the running result has to be kept. The ProcessWindowFunction requires buffering all elements of the window, but in return it receives an Iterable over them along with a Context object that carries metadata about the window. A fold function determines how each input value affects the output value and is called once for each element as it is added to the window; it has been deprecated in favor of AggregateFunction.
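The difference between incremental aggregation and buffering can be shown with a small plain-Java sketch. It does not implement the Flink interfaces; the class name is hypothetical, and the methods createAccumulator, add and getResult only mirror the roles of the same-named methods of AggregateFunction, while processBuffered stands in for a function that sees the whole window at once.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not a Flink class.
public class WindowFunctionSketch {

    // Accumulator for a running average: only two numbers are kept per window.
    public static final class AverageAccumulator {
        long sum;
        long count;
    }

    public static AverageAccumulator createAccumulator() {
        return new AverageAccumulator();
    }

    // Called once per arriving element; the element itself is not stored.
    public static AverageAccumulator add(long value, AverageAccumulator acc) {
        acc.sum += value;
        acc.count += 1;
        return acc;
    }

    // Called when the window fires.
    public static double getResult(AverageAccumulator acc) {
        return acc.count == 0 ? 0.0 : (double) acc.sum / acc.count;
    }

    // Buffered style: all elements of the window are retained and
    // processed in one pass when the window fires.
    public static double processBuffered(Iterable<Long> elements) {
        long sum = 0;
        long count = 0;
        for (long v : elements) {
            sum += v;
            count++;
        }
        return count == 0 ? 0.0 : (double) sum / count;
    }

    public static void main(String[] args) {
        long[] arrivals = {4, 8, 6};
        AverageAccumulator acc = createAccumulator();
        List<Long> buffer = new ArrayList<>();
        for (long v : arrivals) {
            add(v, acc);   // incremental: constant state per window
            buffer.add(v); // buffered: state grows with the window
        }
        System.out.println(getResult(acc));
        System.out.println(processBuffered(buffer));
    }
}
```

Both paths produce the same average; the trade-off is that the incremental path keeps constant state per window, while the buffered path can inspect every element and the window metadata.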
The Context object of a ProcessWindowFunction gives access to state of two types:
1)      globalState()
2)      windowState()
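Both allow access to keyed state, but globalState() is not scoped to a window, whereas windowState() is scoped to the window currently being processed. Per-window state is useful when the same window can fire more than once, for example when late elements arrive.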
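The scoping difference can be mimicked with two maps in plain Java. This is a sketch under assumptions, not Flink's state API: the class, its fields and its methods are hypothetical stand-ins, with one map keyed by the stream key alone and the other keyed by the key together with the window start.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the two state scopes; not a Flink class.
public class WindowStateScopeSketch {

    // Per-key state shared by all windows of that key (global scope).
    private final Map<String, Integer> globalState = new HashMap<>();
    // Per-key, per-window state; the map key combines the key and the window start.
    private final Map<String, Integer> windowState = new HashMap<>();

    // Records one firing of the window starting at windowStart for the given key.
    public void onFire(String key, long windowStart) {
        globalState.merge(key, 1, Integer::sum);
        windowState.merge(key + "@" + windowStart, 1, Integer::sum);
    }

    public int firingsForKey(String key) {
        return globalState.getOrDefault(key, 0);
    }

    public int firingsForWindow(String key, long windowStart) {
        return windowState.getOrDefault(key + "@" + windowStart, 0);
    }

    // Per-window state is dropped when the window is purged; global state survives.
    public void clearWindow(String key, long windowStart) {
        windowState.remove(key + "@" + windowStart);
    }

    public static void main(String[] args) {
        WindowStateScopeSketch state = new WindowStateScopeSketch();
        state.onFire("sensor-1", 0);
        state.onFire("sensor-1", 0); // same window fires again, e.g. on a late element
        state.onFire("sensor-1", 5);
        System.out.println(state.firingsForKey("sensor-1"));
        System.out.println(state.firingsForWindow("sensor-1", 0));
        state.clearWindow("sensor-1", 0);
        System.out.println(state.firingsForWindow("sensor-1", 0));
    }
}
```

In Flink itself, a ProcessWindowFunction that uses windowState() is expected to release that state in its clear() method when the window is purged, which is the role clearWindow plays in this sketch.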
