Ideas for a graceful shutdown of an application hosted on Kubernetes orchestration framework (K8s) continued...
2. Second the application specific actions to save state.
The application exits with an internal server error when it encounters an internal failure. This lets the probe for its health check to fail letting the Kubernetes host to deregister the service for this pod from its active list. This is no different from what an application would do when it is encountering a SIGTERM message. The incoming traffic to the application stops when the K8s decides or the readiness probe fails. The application has no control over the incoming traffic. It will often encounter failure to process current requests when such a message is received. There can be a number of incoming requests that become current after the SIGTERM was issued and during the grace period. This may be due to many reasons and cannot be avoided. For example, the alpine /bin/sh shell does not even send SIGTERM. This is a common pitfall when a Dockerfile ends with a “CMD myapp”. Instead, the following CMD [“myapp”] may be used. Applications written in GO language can certainly register for the syscall.SIGTERM rather than the os.Interrupt call.
3. Third the out-of-band packaging of state in a read-only manner.
Applications wishing to save state such as for backup, uninstall or out-of-band dynamic requests will usually trigger it in their components so that they can do so in their proprietary format. Since the application may track user resources per session, it is possible for the application to collect state corresponding to the user in a session from all of the components involved and package it in a manner that it can export and work with another instance of the application on another host or even an upgrade of the application on the same host. The tool that packs and unpacks this state from the resources used to serve the users’ requests can even be external to the application so that the application can remain online and focus on the business needs of the user rather the operational needs. Alternatively, the routines to save state may run on background workers of the application. The packing and unpacking of state allow restore either local or remote to the origin.
#codingexercise
Flink allows timestamps to be assigned but when we assign timestamps we ensure their distinctness and order. For example:
package com.dellemc.pravega.app;
import org.apache.flink.streaming.api.watermark.Watermark;
import javax.annotation.Nullable;
public class LogTimestampAssigner implements org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks<String>, org.apache.flink.streaming.api.functions.AssignerWithPunctuatedWatermarks<String> {
@Nullable
@Override
public Watermark getCurrentWatermark() {
return null;
}
@Nullable
@Override
public Watermark checkAndGetNextWatermark(String s, long l) {
return null;
}
@Override
public long extractTimestamp(String s, long l) {
return System.currentTimeMillis();
}
}
LogException{id='c9685d77-8c07-40ed-890d-1101b744df11', timestamp='2019-12-23 19:40:23,909', data='2019-12-23 19:40:23,909 ERROR Line1, at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:546), at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:421),'}
LogException{id='0685d218-d7ce-43b6-823c-6bd9f40abcc2', timestamp='2019-12-23 19:40:24,557', data='2019-12-23 19:40:24,557 INFO org.apache.flink.runtime.rest.RestClient - Shutting down rest endpoint.,'}
2. Second the application specific actions to save state.
The application exits with an internal server error when it encounters an internal failure. This lets the probe for its health check to fail letting the Kubernetes host to deregister the service for this pod from its active list. This is no different from what an application would do when it is encountering a SIGTERM message. The incoming traffic to the application stops when the K8s decides or the readiness probe fails. The application has no control over the incoming traffic. It will often encounter failure to process current requests when such a message is received. There can be a number of incoming requests that become current after the SIGTERM was issued and during the grace period. This may be due to many reasons and cannot be avoided. For example, the alpine /bin/sh shell does not even send SIGTERM. This is a common pitfall when a Dockerfile ends with a “CMD myapp”. Instead, the following CMD [“myapp”] may be used. Applications written in GO language can certainly register for the syscall.SIGTERM rather than the os.Interrupt call.
3. Third the out-of-band packaging of state in a read-only manner.
Applications wishing to save state such as for backup, uninstall or out-of-band dynamic requests will usually trigger it in their components so that they can do so in their proprietary format. Since the application may track user resources per session, it is possible for the application to collect state corresponding to the user in a session from all of the components involved and package it in a manner that it can export and work with another instance of the application on another host or even an upgrade of the application on the same host. The tool that packs and unpacks this state from the resources used to serve the users’ requests can even be external to the application so that the application can remain online and focus on the business needs of the user rather the operational needs. Alternatively, the routines to save state may run on background workers of the application. The packing and unpacking of state allow restore either local or remote to the origin.
#codingexercise
Flink allows timestamps to be assigned but when we assign timestamps we ensure their distinctness and order. For example:
package com.dellemc.pravega.app;
import org.apache.flink.streaming.api.watermark.Watermark;
import javax.annotation.Nullable;
public class LogTimestampAssigner implements org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks<String>, org.apache.flink.streaming.api.functions.AssignerWithPunctuatedWatermarks<String> {
@Nullable
@Override
public Watermark getCurrentWatermark() {
return null;
}
@Nullable
@Override
public Watermark checkAndGetNextWatermark(String s, long l) {
return null;
}
@Override
public long extractTimestamp(String s, long l) {
return System.currentTimeMillis();
}
}
will assign timestamps for the following string entries:
2019-12-23 19:40:23,909', data='2019-12-23 19:40:23,909 ERROR Line1
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:546)
at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:421)
2019-12-23 19:40:24,557 INFO org.apache.flink.runtime.rest.RestClient - Shutting down rest endpoint.,
as follows:
[KeyedProcess -> Sink: Print to Std. Out (1/1)] ERROR com.dellemc.pravega.app.LogExceptionKeyedProcessFunction - s=2019-12-23 19:40:23,909,timestamp=1578924603182
[KeyedProcess -> Sink: Print to Std. Out (1/1)] ERROR com.dellemc.pravega.app.LogExceptionKeyedProcessFunction - s=2019-12-23 19:40:24,557,timestamp=1578924603558
[KeyedProcess -> Sink: Print to Std. Out (1/1)] ERROR com.dellemc.pravega.app.LogExceptionKeyedProcessFunction - s=2019-12-23 19:40:23,909,timestamp=1578924603558
[KeyedProcess -> Sink: Print to Std. Out (1/1)] ERROR com.dellemc.pravega.app.LogExceptionKeyedProcessFunction - s=2019-12-23 19:40:23,909,timestamp=1578924603558
where the first value is the timestamp used to group the entries using ExceptionTimestampExtractor() and the second is the one assigned by LogTimestampAssigner.
The results remain unchanged when converting them to LogExceptions with stacktraces:LogException{id='c9685d77-8c07-40ed-890d-1101b744df11', timestamp='2019-12-23 19:40:23,909', data='2019-12-23 19:40:23,909 ERROR Line1, at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:546), at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:421),'}
LogException{id='0685d218-d7ce-43b6-823c-6bd9f40abcc2', timestamp='2019-12-23 19:40:24,557', data='2019-12-23 19:40:24,557 INFO org.apache.flink.runtime.rest.RestClient - Shutting down rest endpoint.,'}
No comments:
Post a Comment