Ideas for a graceful shutdown of an application hosted on the Kubernetes (K8s) orchestration framework, continued...
6. Sixth, a technique that transparently passes the graceful shutdown message through the system and its hosted applications to just the user activities could be sufficient, since the user activities are the only unknown. They could be long-running jobs with or without checkpoints and savepoints. The ability to run a command such as ./bin/flink stop -d "jobID" during Flink container shutdown helps avoid losing state and lets the runtime take the necessary steps before it exits. User-mode activities take precedence over system activities because the latter are already fairly robust and are covered by the Kubernetes specifications.
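The Flink stop command above could be hooked into container shutdown roughly as follows. This is a minimal POSIX shell sketch, not an official Flink or Kubernetes mechanism. It assumes the flink CLI lives at ./bin/flink (as in the text) and that the job id is known inside the container; the variable names FLINK_BIN and JOB_ID and the function names are hypothetical.

```shell
#!/bin/sh
# Minimal sketch. FLINK_BIN and JOB_ID are hypothetical variable names;
# FLINK_BIN defaults to the ./bin/flink path used in the text.
FLINK_BIN="${FLINK_BIN:-./bin/flink}"

# `flink stop` performs a stop-with-savepoint; -d (--drain) additionally
# sends MAX_WATERMARK before the savepoint is taken.
stop_flink_job() {
  if [ -z "${JOB_ID:-}" ]; then
    echo "JOB_ID is not set; no job to stop" >&2
    return 1
  fi
  "$FLINK_BIN" stop -d "$JOB_ID"
}

# Hypothetical SIGTERM handler for a wrapper entrypoint: stop the job,
# then exit with the status of the stop command.
on_sigterm() {
  stop_flink_job
  exit $?
}
```

A wrapper entrypoint could install the handler with `trap on_sigterm TERM` before waiting on its main process; alternatively, `stop_flink_job` could be run from a Kubernetes preStop exec hook. Either way, the savepoint has to finish within the pod's termination grace period (30 seconds by default), after which the container is sent SIGKILL.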
7. Seventh, once the user processes have been sufficiently enhanced to take care of long-running jobs, the way containers are launched could be addressed next. This includes running a small init program such as dumb-init as PID 1 in the minimal containers that are popular on Kubernetes infrastructure. When a process runs directly as PID 1, the operating system gives it special treatment: signals for which it has not installed a handler, such as SIGTERM, are ignored instead of triggering their default action. Most processes are not written to handle these signals themselves, so when run as PID 1 they are not terminated by SIGTERM. Using a relay such as dumb-init, which becomes PID 1 instead, conveys SIGTERM (and other catchable signals) to the process for which the container was launched, while that process gets the ordinary treatment it needs in order to shut down. SIGKILL cannot be caught by any process, so it cannot be relayed; the graceful steps have to happen in response to SIGTERM.
Even a script launched with dumb-init propagates signals to the programs it starts, as shown below:
#!/usr/bin/dumb-init /bin/sh
aBackgroundProcess & # launch a process in the background
aForegroundProcess # launch another process in the foreground
The corresponding change to the Dockerfile is minimal, as shown below:
ENTRYPOINT ["/usr/local/bin/dumb-init", "--"]
CMD ["path/to/file"]
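To make the relay idea in item 7 concrete, the sketch below approximates in POSIX shell what a signal-forwarding init does. It is a simplified stand-in, not dumb-init's implementation: the function names relay and forward_signal are hypothetical, it forwards only TERM and HUP to the single child it starts (dumb-init signals the child's whole process group by default), and it does none of dumb-init's zombie reaping.

```shell
#!/bin/sh
# Simplified stand-in for dumb-init's signal forwarding, for illustration only.
# It does not reap orphaned zombies and signals only the one child it started.

# Send the named signal to the child started by relay.
forward_signal() {
  kill -s "$1" "$child" 2>/dev/null
}

# Run "$@" as a child, forward TERM and HUP to it, and return its exit status.
relay() {
  got_signal=0
  child=""
  # Install handlers before starting the child so no signal is missed.
  # INT is not forwarded: POSIX shells start background commands with
  # SIGINT ignored. SIGKILL cannot be trapped at all.
  for sig in TERM HUP; do
    trap "got_signal=1; forward_signal $sig" "$sig"
  done
  "$@" &
  child=$!
  while :; do
    got_signal=0
    wait "$child"
    status=$?
    # A trapped signal makes `wait` return early (status > 128) while the
    # child may still be running, so wait again in that case.
    if [ "$got_signal" -eq 0 ]; then
      trap - TERM HUP
      return "$status"
    fi
  done
}
```

An entrypoint script could end with `relay "$@"; exit $?`. In practice dumb-init itself remains the better choice, since it also reaps zombie processes and handles process groups.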