Cluster computing: Application troubleshooting continued

Sunday, June 7, 2020

Application troubleshooting continued

Multiple levels of scaling:

There are multiple levels of scaling involved. The first level is the Kubernetes level. The second level is the per-component cluster level. The third level is the resources assigned to the application at the application level. Not all scaling is the same: some changes increase replicas, while others increase the resources per cluster.

If the tasks are partitioned, scaling out the replicas helps with the compute. If the compute is not the bottleneck but the storage is, then the Pravega cluster can be scaled independently of the others. The patching of resources in the PravegaCluster occurs in the following order: Zookeeper, BookKeeper, Pravega Segment Store and lastly Pravega Controller. When these components are scaled, the existing application containers can continue to work without restarting.
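The patching order above can be captured in a small helper, so that whatever subset of components is being scaled is always applied in the same sequence. This is only a minimal sketch: the component labels and the `plan_scaling` function are hypothetical names chosen for illustration and are not fields of the PravegaCluster resource.

```python
# Patching order as stated in the text:
# Zookeeper, BookKeeper, Pravega Segment Store, Pravega Controller.
# The labels below are illustrative, not actual PravegaCluster field names.
PATCH_ORDER = ["zookeeper", "bookkeeper", "segmentstore", "controller"]


def plan_scaling(requested):
    """Return (component, replicas) pairs in the order they should be patched.

    `requested` maps a component label to its desired replica count.
    Components that are not requested are skipped and left untouched.
    """
    unknown = set(requested) - set(PATCH_ORDER)
    if unknown:
        raise ValueError(f"unknown components: {sorted(unknown)}")
    return [(name, requested[name]) for name in PATCH_ORDER if name in requested]


if __name__ == "__main__":
    wanted = {"controller": 2, "bookkeeper": 4, "segmentstore": 3}
    for name, replicas in plan_scaling(wanted):
        print(f"patch {name} to {replicas} replicas")
```

The plan only decides the sequence; applying each step to the cluster is left to whatever tooling is already in use.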

A scaling script may be available to help with the sizing, but the following preparation is required in any case:

Obtain the credentials to access the cluster and the admin privileges on the software components
Check the health of the clusters in terms of desired and ready members
Check for errors and the number of restarts on the containers
Check for the availability of resources to scale specific components or whole clusters
Check the actual scaling step, which is easy to verify from the number of members and their status
After the scaling, check the container logs to ensure that the containers are healthy and running successfully
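The member and restart checks in the list above can be sketched as a small function that runs before and after the scaling step. It assumes the pod descriptions have already been fetched and parsed into dictionaries shaped like the items in the JSON output of `kubectl get pods -o json`; the function name `summarize_health` and the `max_restarts` threshold are hypothetical choices for this illustration.

```python
def summarize_health(desired, pods, max_restarts=0):
    """Compare desired members with ready pods and flag restarted pods.

    `pods` is a list of pod dictionaries. Only the `metadata.name` and
    `status.containerStatuses` fields are read.
    """
    ready = 0
    restarts = {}
    for pod in pods:
        statuses = pod.get("status", {}).get("containerStatuses", [])
        # A pod counts as ready only if it reports containers and all are ready.
        if statuses and all(c.get("ready", False) for c in statuses):
            ready += 1
        count = sum(c.get("restartCount", 0) for c in statuses)
        if count > max_restarts:
            restarts[pod["metadata"]["name"]] = count
    return {
        "desired": desired,
        "ready": ready,
        "restarts": restarts,
        "healthy": ready == desired and not restarts,
    }


if __name__ == "__main__":
    # Hand-written sample data, not output from a real cluster.
    sample = [
        {"metadata": {"name": "member-0"},
         "status": {"containerStatuses": [{"ready": True, "restartCount": 0}]}},
        {"metadata": {"name": "member-1"},
         "status": {"containerStatuses": [{"ready": False, "restartCount": 3}]}},
    ]
    print(summarize_health(desired=2, pods=sample))
```

Running the same summary before and after scaling makes it easy to see whether the number of ready members reached the new desired count and whether any container restarted along the way.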

A simple check of the dashboard for the analytics and the stream can provide visual confirmation that all the components are healthy.

It is preferable not to attempt scaling just prior to an upgrade. Scaling can be done at any time when the components are running smoothly, and the steps for scaling do not affect the operations of the components that are not touched.
