Cluster computing

Friday, January 17, 2020

Ideas for a graceful shutdown of an application hosted on the Kubernetes orchestration framework (K8s), continued...

8. Eighth, StatefulSets have special capabilities, which include the following:

a. They can be used to create replicas when pods are being deployed. The pods are created sequentially, in order from 0 to N-1. When the pods are deleted, they are terminated in the reverse order.

b. They can be used to create ordered and graceful scaling. All of a pod's predecessors must be ready and running before a scaling operation is applied to it.

c. Before a pod is terminated, all of its successors must be completely shut down.

The above set of guarantees is referred to as the “OrderedReady” pod management policy.

There is also the “Parallel” pod management policy, which does not chain the pods: they are launched or terminated in parallel, without waiting on one another.
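The ordering guarantees above can be sketched in a few lines of Java. This only models the order in which pods would be created, terminated, and scaled down; it is not how the Kubernetes controller is implemented, and the set name "web" and the class and method names are made up for illustration. The pod names follow the StatefulSet convention of set name plus ordinal.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: models the ordering guarantees, not the Kubernetes controller.
class OrderedPodPlan {

    // Pods are created in ordinal order: name-0, name-1, ..., name-(N-1).
    static List<String> creationOrder(String setName, int replicas) {
        List<String> pods = new ArrayList<>();
        for (int ordinal = 0; ordinal < replicas; ordinal++) {
            pods.add(setName + "-" + ordinal);
        }
        return pods;
    }

    // Pods are terminated in reverse ordinal order: name-(N-1), ..., name-0.
    static List<String> terminationOrder(String setName, int replicas) {
        List<String> pods = new ArrayList<>();
        for (int ordinal = replicas - 1; ordinal >= 0; ordinal--) {
            pods.add(setName + "-" + ordinal);
        }
        return pods;
    }

    // Scaling down from 'current' to 'target' replicas removes only the
    // highest ordinals, highest first.
    static List<String> scaleDownOrder(String setName, int current, int target) {
        List<String> pods = new ArrayList<>();
        for (int ordinal = current - 1; ordinal >= target; ordinal--) {
            pods.add(setName + "-" + ordinal);
        }
        return pods;
    }

    public static void main(String[] args) {
        System.out.println("create:     " + creationOrder("web", 3));
        System.out.println("terminate:  " + terminationOrder("web", 3));
        System.out.println("scale 3->1: " + scaleDownOrder("web", 3, 1));
    }
}
```

With the Parallel policy none of these orderings would be enforced, so a sketch like this applies only to OrderedReady.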

StatefulSets can also be used to perform rolling updates. This is one case where healthy pods may be terminated: Kubernetes gradually replaces old pods with new ones. There are other such cases. If a node is drained, Kubernetes evicts the pods running on that node (DaemonSet-managed pods aside). If a node runs out of resources, pods may be terminated to free some of them.

While we discussed SIGTERM and the preStop hook, we have not discussed an appropriate limit for terminationGracePeriodSeconds on the pod spec. It defaults to 30 seconds and is typically set to 30 or 60 seconds, but it merely has to be greater than the time it takes to run all the chained handlers for the termination messages, the preStop hook included.
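One way to reason about the grace period is to write down the chained handlers with their worst-case durations and check the sum against the value on the pod spec. The sketch below does that and also registers the chain as a JVM shutdown hook, which the JVM runs when it receives SIGTERM. The class name, handler names, and durations are all hypothetical. Time spent in a preStop hook counts against the same grace period, so it would need to be added to the budget as well.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: handler names and durations below are made up for illustration.
class ShutdownChain {

    // One link in the chain: a name, a worst-case duration, and the work itself.
    private static final class Step {
        final String name;
        final long worstCaseSeconds;
        final Runnable work;

        Step(String name, long worstCaseSeconds, Runnable work) {
            this.name = name;
            this.worstCaseSeconds = worstCaseSeconds;
            this.work = work;
        }
    }

    private final List<Step> steps = new ArrayList<>();

    ShutdownChain add(String name, long worstCaseSeconds, Runnable work) {
        steps.add(new Step(name, worstCaseSeconds, work));
        return this;
    }

    // Worst-case time to run every handler back to back.
    long totalWorstCaseSeconds() {
        long total = 0;
        for (Step step : steps) {
            total += step.worstCaseSeconds;
        }
        return total;
    }

    // The grace period has to be strictly greater than the chain's total time.
    boolean fitsWithin(long terminationGracePeriodSeconds) {
        return terminationGracePeriodSeconds > totalWorstCaseSeconds();
    }

    // Runs the handlers in the order they were added; returns the names that ran.
    List<String> run() {
        List<String> completed = new ArrayList<>();
        for (Step step : steps) {
            step.work.run();
            completed.add(step.name);
        }
        return completed;
    }

    // The JVM runs shutdown hooks when it receives SIGTERM (and on normal exit).
    void registerWithJvm() {
        Runtime.getRuntime().addShutdownHook(new Thread(() -> run()));
    }

    public static void main(String[] args) {
        ShutdownChain chain = new ShutdownChain()
            .add("stop accepting work", 5, () -> System.out.println("closing listeners"))
            .add("drain in-flight work", 20, () -> System.out.println("draining"))
            .add("flush state", 10, () -> System.out.println("flushing"));

        System.out.println("worst case: " + chain.totalWorstCaseSeconds() + "s");
        System.out.println("fits in 30s: " + chain.fitsWithin(30));
        System.out.println("fits in 60s: " + chain.fitsWithin(60));
        chain.registerWithJvm();
    }
}
```

In this made-up example the chain needs 35 seconds in the worst case, so a 30-second grace period would be too short and 60 seconds would leave headroom.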

Please note the use of “lifecycle: command:” scripts in postStart and preStop. These should ideally not use “/bin/sh -c” because the shell typically does not pass signals on to the process it launches. It is preferable to use either dumb-init or an actual executable that handles the signal (the ^C event, for example).

When a software product comprises multiple independent applications, each application may get a message from the infrastructure. The application then handles the message as appropriate, regardless of who sent it. However, applications also tend to have coordinators in a cluster-based deployment model. In such a case, the coordinator might know a better way to gracefully shut down the application. For example, “./bin/flink stop” is a better way to shut down a long-running analytical application. This gives the coordinator a chance to relay any additional commands along with the shutdown, and the application a chance to piggyback a suitable response to the coordinator. The infrastructure message then becomes a form of communication in the layer above, one that the participating applications and the coordinator know best how to handle. The distributed model is especially beneficial for graceful shutdown because different roles in the cluster can now share the preparation chores for the shutdown, whether specific to that application or global. In such cases, the cleanup also provides an opportunity to save state for better and more efficient post-shutdown activities.
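The coordinator idea can be illustrated with a minimal in-process sketch: the coordinator turns the infrastructure's termination message into an application-level command, relays it to each role, and collects the response each role piggybacks on its acknowledgement. Everything here (the class, the role names, the "stop-with-checkpoint" command) is hypothetical and stands in for whatever protocol a real product uses; it does not depict Flink's own mechanism.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch only: role names, commands, and responses are hypothetical.
class ShutdownCoordinator {

    // Each role takes the coordinator's command and returns a response.
    private final Map<String, Function<String, String>> roles = new LinkedHashMap<>();

    void register(String roleName, Function<String, String> handler) {
        roles.put(roleName, handler);
    }

    // Relays the command to every role in registration order and collects
    // the response each role piggybacks on its acknowledgement.
    Map<String, String> shutdown(String command) {
        Map<String, String> responses = new LinkedHashMap<>();
        for (Map.Entry<String, Function<String, String>> role : roles.entrySet()) {
            responses.put(role.getKey(), role.getValue().apply(command));
        }
        return responses;
    }

    public static void main(String[] args) {
        ShutdownCoordinator coordinator = new ShutdownCoordinator();
        coordinator.register("reader", cmd -> "offset saved after '" + cmd + "'");
        coordinator.register("writer", cmd -> "buffers flushed after '" + cmd + "'");

        // The infrastructure's termination message is translated into a
        // richer, application-level command before it is relayed.
        Map<String, String> responses = coordinator.shutdown("stop-with-checkpoint");
        responses.forEach((role, reply) -> System.out.println(role + ": " + reply));
    }
}
```

The collected responses are where saved state, such as an offset or a checkpoint location, could be handed back for use after the shutdown.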

Finally, a software product can choose to alleviate inefficiencies in the distribution of termination messages by providing a one-publisher, one-subscriber model. The publisher will inevitably be the infrastructure, while the subscriber will be a component of the product. The termination message is always an interrupt and is most efficiently routed to a single destination that can guarantee a graceful shutdown. Efficiency in this case is not so much about the cost of communication as it is about increasing reliability and data safety during the graceful shutdown procedure by doing only the minimum necessary.
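As one possible reading of the single-subscriber model, the sketch below has exactly one component receive the termination message and makes sure the shutdown procedure runs once, even if the message is delivered more than once. The class and method names are hypothetical, and the once-only guard is an assumption about what "doing only the minimum necessary" could look like in code.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: a single destination for the termination message.
class TerminationSubscriber {

    private final AtomicBoolean handled = new AtomicBoolean(false);
    private final Runnable shutdownProcedure;

    TerminationSubscriber(Runnable shutdownProcedure) {
        this.shutdownProcedure = shutdownProcedure;
    }

    // Returns true only for the delivery that actually ran the procedure;
    // repeated deliveries are ignored.
    boolean onTerminationMessage() {
        if (handled.compareAndSet(false, true)) {
            shutdownProcedure.run();
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        AtomicInteger runs = new AtomicInteger();
        TerminationSubscriber subscriber =
            new TerminationSubscriber(() -> runs.incrementAndGet());

        System.out.println("first delivery handled:  " + subscriber.onTerminationMessage());
        System.out.println("second delivery handled: " + subscriber.onTerminationMessage());
        System.out.println("procedure ran " + runs.get() + " time(s)");
    }
}
```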

These are some of the techniques used for the purpose of a graceful shutdown of an application hosted on the Kubernetes orchestration framework.

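#Apache Flink: splitting events into windows

The following calls, applied to a keyed stream, separate the events into session windows:

.window(EventTimeSessionWindows.withGap(Time.milliseconds(1)))
.allowedLateness(Time.milliseconds(1))

A session window closes once 1 millisecond of event time passes with no new event for the key, and allowedLateness keeps the window around for 1 more millisecond so that late events can still be added to it.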
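To make the effect of a session gap concrete, here is a plain-Java sketch with no Flink dependency that groups event timestamps into sessions. It is an illustration of the idea only, not Flink's implementation: the class and method names are hypothetical, it ignores keys, watermarks, and lateness, and the choice to start a new session only when the distance is strictly greater than the gap is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: groups timestamps into sessions separated by a quiet gap.
class SessionGapSketch {

    // A new session starts when an event is more than gapMillis after the
    // previous event; otherwise the event extends the current session.
    static List<List<Long>> sessions(long[] timestampsMillis, long gapMillis) {
        long[] sorted = timestampsMillis.clone();
        Arrays.sort(sorted);

        List<List<Long>> result = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        for (long timestamp : sorted) {
            if (!current.isEmpty()
                    && timestamp - current.get(current.size() - 1) > gapMillis) {
                result.add(current);
                current = new ArrayList<>();
            }
            current.add(timestamp);
        }
        if (!current.isEmpty()) {
            result.add(current);
        }
        return result;
    }

    public static void main(String[] args) {
        long[] events = {10, 0, 1, 6, 5};
        // With a gap of 2 ms the events fall into three sessions.
        System.out.println(sessions(events, 2)); // prints [[0, 1], [5, 6], [10]]
    }
}
```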
