In this article, we continue with the discussion
on stream processing with Flink and look into the use of windows. An infinite
stream can only be processed if it were split into finite size. A window refers
to this split and can have a size dependent on one or the other factors. A
programmer generally refers to the use of windows with the help of syntax to
define the split. In the case of keyed streams, she uses “keyBy” operator
followed by “window” and in the case of non-keyed streams, she uses “windowAll”
The keyed stream helps with parallelizing the stream tasks.
There are four types of windows:
1)
Tumbling window: A tumbling window assigns each
element to a window of a specified window size as with the code shown
.keyBy(<key selector>)
.window(TumblingEventTimeWindows.of(Time.seconds(5)))
2)
Sliding window: this window allows overlapping
with other windows even though the size of the window is fixed as per the
previous definition. An additional window slide parameter dictates how often
the next window comes up.
3)
Session window: this window assigns elements by
group elements by activity. The window does not have a fixed size and do not
overlap The number of elements within a window can vary and the gap between the
windows is called session gap.
.window(EventTimeSessionWindows.withDynamicGap((element)
-> {
// determine and return session gap
}))
4)
Global window: This is a singleton spanning all
events with the same key into the same window. Since it does not have a
beginning or an end, no processing can be done unless there is a trigger
specified. A trigger determines when a window is ready.
The computation on a window is performed with the help of a
window function which can be one of ReduceFunction, AggregateFunction,
FoldFunction, or ProcessingWindowFunction. Flink incrementally aggregates the
elements for each window as they arrive for the first two window functions. The
ProcessingWindowFunction requires buffering but it returns an iterator with
metadata about the window. A fold function determines how each input value
affects the output value and is called once for each element as it is added to
the window.
The context object holds the state which are of two types:
1)
globalState()
2)
windowState()
Both allow access to the keyed state but the former is not
scoped to a window while the latter is scoped to a window.
No comments:
Post a Comment