Tuesday, June 9, 2020

Application troubleshooting continued

Apache does not publish a sizing specification for Flink runtime workloads, but a few common deployment shapes serve as T-shirt-size guidance.

These shapes are differentiated into IO-intensive and compute-intensive allocations and are listed below:

The minimal size is usually for non-production workloads and includes:

A compute-intensive configuration of 1 Zookeeper server, 1 job manager, and 4 task managers.

The IO-intensive configuration has fewer task managers.

The small, medium, and large sizes scale up proportionately from these configurations.

For example, the small configuration scales up from the minimal one described above.

The medium compute-intensive configuration involves 2 job managers and 8 task managers, and the Zookeeper cluster can be sized at three servers for each of the medium and large deployments.

The large compute-intensive configuration involves 4 job managers and 16 task managers.

The corresponding IO-intensive configurations scale these counts down.
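
As a rough sketch, the compute-intensive tiers described above can be captured in a small lookup table, as shown below; the small tier and the IO-intensive counts are not spelled out here, so they are omitted rather than guessed.

# Illustrative only: the compute-intensive T-shirt sizes described above.
COMPUTE_INTENSIVE_SIZES = {
    "minimal": {"zookeeper": 1, "job_managers": 1, "task_managers": 4},
    "medium":  {"zookeeper": 3, "job_managers": 2, "task_managers": 8},
    "large":   {"zookeeper": 3, "job_managers": 4, "task_managers": 16},
}

def describe(tier: str) -> str:
    """Summarize a compute-intensive tier in one line."""
    s = COMPUTE_INTENSIVE_SIZES[tier]
    return (f"{tier}: {s['zookeeper']} Zookeeper server(s), "
            f"{s['job_managers']} job manager(s), "
            f"{s['task_managers']} task manager(s)")

for tier in COMPUTE_INTENSIVE_SIZES:
    print(describe(tier))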

 

Kubernetes cluster sizing:

SDP applications are hosted on a cluster that comprises a Flink runtime cluster and a stream store cluster. The stream store is globally shared across all project-level isolations of the applications. Therefore, applications that want more compute increase the resources available to the Flink cluster, while those that require more IO increase the resources on the Pravega cluster. However, these are not the only clusters on the stream store and analytics platform. In addition to the controller and segment stores of the global Pravega cluster, additional clusters are required for components such as Zookeeper, Bookkeeper, and Keycloak. In fact, performance tuning of a Flink application spans several layers, stacks, and components. Techniques such as reducing traffic, omitting unnecessary routines, keeping operations in memory, removing bottlenecks and hot spots, load balancing, and increasing the size of the clusters are all valid candidates for improving application performance.
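
One way to see these component clusters side by side is to list their pods through the Kubernetes API, as in the sketch below; the namespace name and the "app" label key are assumptions and will differ per installation.

# A minimal sketch, assuming the platform components run in one namespace
# (here "nautilus-pravega") and carry an "app" label; both are assumptions.
from collections import defaultdict
from kubernetes import client, config

config.load_kube_config()          # or config.load_incluster_config()
core = client.CoreV1Api()

pods_by_component = defaultdict(list)
for pod in core.list_namespaced_pod("nautilus-pravega").items:
    component = (pod.metadata.labels or {}).get("app", "unlabeled")
    pods_by_component[component].append(pod.metadata.name)

# Expect entries for the controller, segment store, Zookeeper, Bookkeeper, etc.
for component, names in sorted(pods_by_component.items()):
    print(f"{component}: {len(names)} pod(s)")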

Cluster sizing is accommodated based on the resources initially carved out for the entire stream store and analytics platform cluster. The resources associated with this cluster are all virtual, and the ratio of physical to virtual resources is adjusted outside the platform.

All of the component clusters within the Kubernetes cluster can have their resources described and configured at installation or upgrade time via the values file. These specifications for memory, CPU, and storage affect how those components are installed and run. The number of replicas or container counts can be scaled dynamically, as is typical for Kubernetes deployments, and the number of desired versus ready instances reflects whether the resource scaling was accommodated.
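
The sketch below illustrates that dynamic scaling with the Kubernetes Python client; the deployment and namespace names are hypothetical, and the desired-versus-ready comparison is what indicates whether the scaling request was accommodated.

# A minimal sketch of scaling a component and checking whether the change was
# accommodated; the deployment and namespace names below are assumptions.
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

name, namespace = "flink-taskmanager", "my-project"   # hypothetical names

# Request more replicas (the dynamic scaling mentioned above).
apps.patch_namespaced_deployment_scale(
    name, namespace, body={"spec": {"replicas": 8}})

# Desired vs. ready counts show whether the cluster could accommodate it.
dep = apps.read_namespaced_deployment(name, namespace)
desired = dep.spec.replicas
ready = dep.status.ready_replicas or 0
print(f"{name}: {ready}/{desired} replicas ready")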

 

Leveraging K8s platform

Application execution is not directly visible to the Kubernetes platform because it occurs within the Flink runtime.

However, the logs, metrics, and events from Flink can be made to flow to the Kubernetes platform.
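
Once they do flow there, the standard Kubernetes API can surface them; the sketch below tails the logs of the Flink pods and lists the namespace events, with the namespace and label selector being assumptions.

# A minimal sketch of surfacing Flink output through the Kubernetes API;
# the namespace and label selector are assumptions, not documented values.
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

namespace = "my-project"               # hypothetical project namespace
selector = "component=taskmanager"     # hypothetical Flink pod label

# Tail the logs of each Flink task manager pod.
for pod in core.list_namespaced_pod(namespace, label_selector=selector).items:
    log_tail = core.read_namespaced_pod_log(
        pod.metadata.name, namespace, tail_lines=20)
    print(f"--- {pod.metadata.name} ---\n{log_tail}")

# Kubernetes events for the namespace (restarts, scheduling issues, etc.).
for event in core.list_namespaced_event(namespace).items:
    print(event.last_timestamp, event.reason, event.message)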

 

Leveraging stream store

Applications can continue to use the stream store for persisting custom state and intermediate execution results. This does not impact other applications and gives an application the flexibility to introduce logic such as pause and resume.
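
A minimal sketch of such pause-and-resume logic is shown below; write_to_stream and read_latest_from_stream are hypothetical placeholders for the actual stream store client calls, and the scope and stream names are assumptions.

# A pause/resume sketch: the application persists intermediate progress to its
# own stream and skips already-completed work on restart. write_to_stream and
# read_latest_from_stream are hypothetical placeholders for the real stream
# store client calls; the scope and stream names are assumptions.
import json
from typing import Optional

SCOPE, STREAM = "my-project", "app-intermediate-state"   # hypothetical names

def write_to_stream(scope: str, stream: str, event: str) -> None:
    """Placeholder for writing an event to the stream store."""
    print(f"[{scope}/{stream}] <- {event}")

def read_latest_from_stream(scope: str, stream: str) -> Optional[str]:
    """Placeholder for reading the most recently written checkpoint event."""
    return None   # nothing persisted yet in this sketch

def process(batch_id: int) -> None:
    print(f"processing batch {batch_id}")   # application-specific work

def run(batch_ids) -> None:
    # Resume: pick up after the last batch recorded before a pause, if any.
    checkpoint = read_latest_from_stream(SCOPE, STREAM)
    last_done = json.loads(checkpoint)["last_done"] if checkpoint else -1
    for batch_id in batch_ids:
        if batch_id <= last_done:
            continue                         # already covered by a prior run
        process(batch_id)
        # Persist intermediate progress so a later run can resume from here.
        write_to_stream(SCOPE, STREAM, json.dumps({"last_done": batch_id}))

run(range(5))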

