Friday, August 16, 2019

We were looking at storage for stream processing. Now we look into the analytical side. Flink is a distributed system for stateful parallel data stream processing. It performs distributed data processing by leveraging its integration with cluster resource managers such as Mesos, YARN, and Kubernetes, but it can also be configured to run as a standalone cluster. Flink relies on ZooKeeper for coordination in highly available setups.
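To make the idea of stateful parallel stream processing concrete, here is a minimal sketch of a Flink streaming job in Java. The socket source on localhost:9999 and the job name are illustrative choices, not part of the original post; the point is that the job keys the stream by word and keeps a running count per key, which is exactly the kind of partitioned state Flink manages across parallel tasks.

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class StreamingWordCount {
    public static void main(String[] args) throws Exception {
        // the execution environment assembles the logical dataflow of the application
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // illustrative source: lines of text read from a socket
        DataStream<String> lines = env.socketTextStream("localhost", 9999);

        lines
            // split each line into (word, 1) pairs
            .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
                @Override
                public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
                    for (String word : line.split("\\s+")) {
                        out.collect(new Tuple2<>(word, 1));
                    }
                }
            })
            .keyBy(0)   // partition the stream by word across parallel tasks
            .sum(1)     // stateful running count maintained per key
            .print();

        // execute() submits the dataflow to the cluster for parallel execution
        env.execute("streaming word count");
    }
}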

The components of a Flink setup include the jobManager, the resourceManager, the taskManager, and the dispatcher. The jobManager is the master process that controls the execution of a single application. Each application is controlled by a different jobManager and is represented by a logical dataflow graph, called the jobGraph, together with a jar file. The jobManager converts the jobGraph into a physical dataflow graph and parallelizes the execution as much as possible.
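As a rough sketch of how the degree of parallelism of that physical graph can be influenced from application code (the numbers are illustrative; the jobManager still produces the final physical layout when it expands the jobGraph):

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ParallelismExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // default parallelism applied to every operator of the jobGraph
        env.setParallelism(4);

        env.fromElements(1, 2, 3, 4, 5)
            .map(new MapFunction<Integer, Integer>() {
                @Override
                public Integer map(Integer value) {
                    return value * 2;
                }
            })
            .setParallelism(8)   // per-operator override: this map runs as 8 parallel subtasks
            .print();

        // on submission, the jobManager expands each operator into its parallel
        // subtasks, yielding the physical dataflow graph
        env.execute("parallelism example");
    }
}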

There are multiple resourceManager implementations, one per cluster environment of choice. The resourceManager is responsible for managing taskManager slots, the unit of processing resources in Flink. It offers idle slots to the jobManager and, when not enough slots are available, can have new taskManager containers launched.

The taskManagers are the worker processes of Flink. There are typically multiple taskManagers running in a Flink setup. Each taskManager provides a certain number of slots, and these slots limit the number of tasks that a taskManager can execute concurrently.
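The number of slots per taskManager is set in the cluster configuration; a minimal flink-conf.yaml sketch with illustrative values:

# flink-conf.yaml (illustrative values)
# each taskManager offers 4 slots, so it can run up to 4 parallel task pipelines
taskmanager.numberOfTaskSlots: 4
# default parallelism used when the application does not set one explicitly
parallelism.default: 4

The total number of slots in the cluster is the number of taskManagers multiplied by the slots per taskManager, which bounds the maximum parallelism an application can use.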

The dispatcher runs across job executions; when an application is submitted via its REST interface, the dispatcher starts a jobManager and hands the application over to it. The REST interface also lets the dispatcher serve as an HTTP entry point to clusters that sit behind a firewall.
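As a sketch of that submission path, an application jar can be uploaded and started through the REST interface with plain HTTP calls. The host, port, jar path, and parallelism below are illustrative, and the jar id comes from the upload response:

# upload the application jar to the REST endpoint (port 8081 by default)
curl -X POST -F "jarfile=@/path/to/my-job.jar" http://jobmanager-host:8081/jars/upload

# run the uploaded jar; replace <jar-id> with the id returned by the upload call
curl -X POST "http://jobmanager-host:8081/jars/<jar-id>/run?parallelism=4"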
