Apache does not publish an official sizing specification for Flink runtime workloads, but common deployment patterns serve as T-shirt-size guidance. They are differentiated into IO-intensive and CPU-intensive allocations and are listed below:
The minimal size is usually for non-production workloads. Its compute-intensive configuration comprises 1 Zookeeper server, 1 job manager, and 4 task managers, while the IO-intensive configuration uses fewer task managers.
The small, medium, and large sizes scale up proportionately from these configurations. The medium compute-intensive configuration involves 2 job managers and 8 task managers, and the Zookeeper cluster can grow to three servers for both medium and large deployments. The large configuration involves 4 job managers and 16 task managers. The corresponding IO-intensive configurations can use smaller counts than those above.
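The doubling pattern above can be sketched as a small helper. This is an illustrative assumption, not official Apache Flink or SDP guidance: the baseline is the minimal compute-intensive configuration (1 job manager, 4 task managers), each larger size doubles the counts, and the IO-intensive reduction shown here (halving the task managers) is purely hypothetical since the source does not give exact figures:

```python
# Hypothetical T-shirt sizing helper; counts mirror the figures in the
# text, and the IO-intensive reduction is an assumed illustration.

def flink_sizing(scale: int, io_intensive: bool = False) -> dict:
    """Return job/task manager counts for a given scale factor.

    Compute-intensive baseline: 1 job manager, 4 task managers (minimal).
    Each step up doubles both counts. The IO-intensive variant trims the
    task-manager count (halving is an assumption for illustration).
    """
    job_managers = 1 * scale
    task_managers = 4 * scale
    if io_intensive:
        task_managers = max(1, task_managers // 2)  # assumed reduction
    return {"job_managers": job_managers, "task_managers": task_managers}


# Scale factors per T-shirt size, matching the text; the small size is
# omitted because the source does not state its counts.
SCALE = {"minimal": 1, "medium": 2, "large": 4}
```

For example, `flink_sizing(SCALE["medium"])` yields 2 job managers and 8 task managers, matching the medium compute-intensive configuration described above.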
Kubernetes cluster sizing:
SDP applications are hosted on a cluster that comprises a Flink runtime cluster and a stream store cluster. The stream store is globally shared across all project-level isolations of the applications. Therefore, applications wanting more compute increase the resources available to the Flink cluster, and those requiring more IO increase the resources on the Pravega cluster. However, these are not the only clusters on the stream store and analytics platform. In addition to the controller and segment stores for the global Pravega cluster, there are additional clusters required for components such as Zookeeper, Bookkeeper, and Keycloak. In fact, performance tuning of a Flink application spans several layers, stacks, and components. Tuning techniques such as reducing traffic, omitting unnecessary routines, performing operations in memory, removing bottlenecks and hot spots, load balancing, and increasing cluster sizes are all valid candidates for improving application performance.
Cluster sizing is accommodated based on the resources initially carved out for the entire stream store and analytics platform cluster. The resources associated with this cluster are all virtual, and the ratio of physical to virtual resources is adjusted externally to the platform.
All of the component clusters within the Kubernetes cluster can have their resources described and configured at installation or upgrade time via the values file. These specifications for memory, CPU, and storage affect how those components are installed and run. The number of replicas or container counts can be scaled dynamically, as is typical for Kubernetes deployments. The numbers of desired and ready instances will reflect whether the resource scaling was accommodated.
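As a rough illustration, such a values file typically expresses per-component replicas and resource requests along the following lines. The key names below are assumptions modeled on common Helm chart conventions, not the platform's actual schema:

```yaml
# Hypothetical values-file fragment; key names are illustrative only.
flink:
  taskmanager:
    replicas: 4
    resources:
      requests:
        cpu: "2"
        memory: 4Gi
      limits:
        cpu: "4"
        memory: 8Gi
zookeeper:
  replicas: 3
  persistence:
    size: 20Gi
```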
Leveraging K8s platform
Application execution is not directly visible to the K8s platform because it occurs on the Flink runtime. The logs, metrics, and events from Flink can, however, be made to flow to the Kubernetes runtime.
Leveraging stream store
Applications can continue to use the stream store for persisting custom state and intermediate execution results. This does not impact other applications and gives an application the flexibility to introduce logic such as pause and resume.
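One way to picture this pause-and-resume pattern is a sketch that checkpoints progress between runs. The `StateStore` below is an in-memory stand-in for a stream-store-backed state stream, and all names are hypothetical rather than any actual Pravega or SDP API:

```python
import json
from typing import Optional

class StateStore:
    """In-memory stand-in for a stream-store-backed state stream."""
    def __init__(self):
        self._events = []

    def append(self, record: dict) -> None:
        # Appends a JSON record, as an application might write an event
        # to its own state stream in the stream store.
        self._events.append(json.dumps(record))

    def latest(self) -> Optional[dict]:
        # Reads back the most recent checkpoint, if any.
        return json.loads(self._events[-1]) if self._events else None


def run(store: StateStore, items: list, pause_after: int) -> int:
    """Process items, persisting an offset so a later run can resume."""
    checkpoint = store.latest() or {"offset": 0, "total": 0}
    offset, total = checkpoint["offset"], checkpoint["total"]
    end = min(offset + pause_after, len(items))
    for i in range(offset, end):
        total += items[i]
    # Persist progress so the application can pause here and resume later.
    store.append({"offset": end, "total": total})
    return total
```

A first call processes part of the input and "pauses"; a second call resumes from the persisted offset. Because the checkpoint lives in the application's own state stream, this does not affect other applications sharing the store.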