Today we continue reading the book "Stream processing with Apache Flink" We have read about parallelization, time and state in stream processing. We discussed the jobManager, taskManager, resourceManager and Dispatcher implemented the distributed stream processing. The jobManager is the master process that controls the execution of a single application. Each application is controlled by a different jobManager and is usually represented by a logical dataflow graph called the jobGraph and a jar file. The jobManager converts the job graph into a physical data flow graph and parallelizes as much of the execution as possible.
Each task Manager has a pool of network buffers with size 32KB to send and receive data The sender and receiver exchange data based on permanent tcp connections if they are on different processes A buffer is require for every receiver and there should be enough buffers and is proprotional to the number of tasks of the involved operators. Buffers add latency so a mechanism for flow control is required. Flink implements a credit-based flow control. which reduces latency because the sender can send data as soon as the receiver has enough resources to accept it.The receiver sends credit notification with the number of network buffers it was granted and the sender piggy backs the backlog that is ready. This ensures that there is a continuous flow
Flink features an optimization technique called task chaining where tasks that are usually remote are found locally and executed by linking them with local forward channels.\ All functions are evaluated by an individual task running in a dedicated thread.
Timestamps and watermarks support event-time semantics. Timetamps are sixteen byte long values that is slapped on to the metadata and is used to collect events into a time window. Watermarks are used to derive the current event time at each task in an event application. Watermarks are special records which have a timestamp and are placed together with the usual event records. The watermark must be monotonically increasing and satisfy the property that events between them fall in the time range between their timestamps.
Each task Manager has a pool of network buffers with size 32KB to send and receive data The sender and receiver exchange data based on permanent tcp connections if they are on different processes A buffer is require for every receiver and there should be enough buffers and is proprotional to the number of tasks of the involved operators. Buffers add latency so a mechanism for flow control is required. Flink implements a credit-based flow control. which reduces latency because the sender can send data as soon as the receiver has enough resources to accept it.The receiver sends credit notification with the number of network buffers it was granted and the sender piggy backs the backlog that is ready. This ensures that there is a continuous flow
Flink features an optimization technique called task chaining where tasks that are usually remote are found locally and executed by linking them with local forward channels.\ All functions are evaluated by an individual task running in a dedicated thread.
Timestamps and watermarks support event-time semantics. Timetamps are sixteen byte long values that is slapped on to the metadata and is used to collect events into a time window. Watermarks are used to derive the current event time at each task in an event application. Watermarks are special records which have a timestamp and are placed together with the usual event records. The watermark must be monotonically increasing and satisfy the property that events between them fall in the time range between their timestamps.
No comments:
Post a Comment