Thursday, May 28, 2020

Application troubleshooting guide

Application debugging, job submission result and logging are not interactive during stream processing. Consequently, application developers rely on runtime querying and state persistence. A few other options are listed here:
Use of side outputs
When the events are processed from an infinite sequence in the main stream, the results can be stored in one or more additional output streams. This resulting stream is called side output. The type of events stored in a side output does not have to be the same type as the main stream. 
The side output stream is denoted with the OutputTag as shown here:
OutputTag<String> outputTag = new OutputTag<String>("side-output") {};
The side-output stream is retrieved using the getSideOutput(OutputTag) method on the result of the DataStream operation.
o Testing:
Testing of user-defined functions is made possible with mocks. The Flink collection comes with the so-called test harnesses such as OneInputStreamOperatorTestHarness for operator on DataStreams, KeyedOneInputStreamOperatorTestHarness for operators on KeyedStream, TwoInputStreamOperatorTestHarness for operators of ConnectedDataStreams of two DataStreams, KeyedTwoInputStreamOperatorTestHarness for operators on ConnectedStreams of two KeyedStreams
The ProcessFunction is the most common method used for processing the events in a stream. This can be unit-tested with the ProcessFunctionTestHarnesses. The PassThroughProcessFunction along with the previously mentioned types of DataStreams come useful to invoke the processElement method on the ProcessFunctionTestHarness. The extractOutputValues method on the harness comes useful to retrieve the list of emitted records for assertions.

Deployments:
Stream Store and analytics automates deployments of applications written using Flink.  This includes options for 
o Authentication and authorization
o Stream store metrics
o High availability
o State and fault tolerance via state backends
o Configuration including memory configuration and 
o All of the production readiness checklist, that includes:
Setting an explicit max parallelism
Setting UUID for all operators
Choosing the right State backend
Configuring high availability for job managers
Debugging and monitoring:
o Flink Applications have support for extensible user metrics in addition to System metrics. Although not all of these might be exposed via the stream and store analytics user interface, the applications can register metric types such as Counters, guages, Histograms and meters.
o Log4j and Logback can be configured with the appropriate log levels to emit log entries for the appropriate operators. These logs can also be collected continuously as the system makes progress.


No comments:

Post a Comment