Saturday, May 30, 2020

Application troubleshooting continued...

  • Debugging and monitoring: 

  • Flink applications support extensible user metrics in addition to system metrics. Although not all of these might be exposed via the stream store and analytics user interface, applications can register metric types such as counters, gauges, histograms and meters.

  • Log4j and Logback can be configured with the appropriate log levels to emit log entries for the appropriate operators. These logs can also be collected continuously as the system makes progress. 

  • The status and statistics of completed jobs that have been archived by the JobManager can be viewed via the HistoryServer once it is configured. The Flink user interface may have support for it.

  • Since an application can involve multiple jobs, each job can be queried independently using its job id.

  • Checkpoints can also be monitored, although the stream store and analytics user interface might not support it.

  • Checkpoints can be triggered and restorations can be performed. 

  • Backpressure can be detected. If a task produces data faster than the downstream operators can consume it, the task is given a backpressure rating (OK, LOW or HIGH).

  • A REST API is available for monitoring. It is implemented in the ‘flink-runtime’ project and hosted by the Dispatcher. The web dashboard for monitoring also shows this information. It is also possible to extend this API.

  • Event time and watermarks are powerful features that enable applications to handle late and out-of-order events so that the events remain sequenced. The Flink runtime allows a source to issue timestamps, or lets Flink assign them, using event origination time, ingestion time or processing time. Applications don’t have to implement time windows themselves, although they are not restricted from doing so.
    Monitoring event time is tricky for two reasons. First, when no event arrives, it is not clear whether time is advancing or whether there is simply no data. Second, when an event is received, there is no way of knowing whether another event with the same timestamp is still to come.
    The stream is usually not a single sequence of bytes but a coordination of multiple parallel segments. Each segment is a sequence of bytes and is not mixed with anything that is not data. Metadata exists in its own stream and is usually internal.
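The watermark behaviour described in the list above can be illustrated with a small, Flink-free sketch. This is not Flink's implementation, only a simplified model of the common "bounded out-of-orderness" idea: the watermark trails the largest timestamp seen so far by a fixed delay, and an event that falls behind the watermark is treated as late. The class name, the method names and the strict "less than" lateness test are all assumptions made for illustration.

```python
from typing import Optional


class BoundedOutOfOrdernessWatermark:
    """Simplified model: watermark = largest event time seen - allowed delay."""

    def __init__(self, max_delay_ms: int) -> None:
        self.max_delay_ms = max_delay_ms
        self.max_seen_ms: Optional[int] = None  # no events observed yet

    def watermark(self) -> Optional[int]:
        """Current watermark, or None until the first event arrives."""
        if self.max_seen_ms is None:
            return None
        return self.max_seen_ms - self.max_delay_ms

    def on_event(self, event_time_ms: int) -> bool:
        """Record an event and return True if it arrived behind the watermark."""
        current = self.watermark()
        late = current is not None and event_time_ms < current
        # Only a newer timestamp can move the watermark forward.
        if self.max_seen_ms is None or event_time_ms > self.max_seen_ms:
            self.max_seen_ms = event_time_ms
        return late


wm = BoundedOutOfOrdernessWatermark(max_delay_ms=5)
for ts in (10, 7, 4, 20, 12):
    print(ts, "late" if wm.on_event(ts) else "on time", "watermark:", wm.watermark())
```

Note that the watermark only moves when an event arrives. With no input it stays where it is, which is the first of the two monitoring difficulties mentioned above.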
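Several of the monitoring points in the list above (per-job status, checkpoints, backpressure) are reachable through the REST API. The sketch below builds the corresponding URLs with the Python standard library. The paths follow the Flink monitoring REST API as I understand it. The base address (localhost on port 8081), the helper names and the ids used in the example are assumptions, and the stream store and analytics deployment may expose the endpoint differently.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed address of the REST endpoint; adjust for the actual deployment.
BASE_URL = "http://localhost:8081"


def job_url(job_id: str, base: str = BASE_URL) -> str:
    """URL for the details of a single job, addressed by its job id."""
    return f"{base.rstrip('/')}/jobs/{quote(job_id)}"


def checkpoints_url(job_id: str, base: str = BASE_URL) -> str:
    """URL for the checkpoint statistics of a job."""
    return f"{job_url(job_id, base)}/checkpoints"


def backpressure_url(job_id: str, vertex_id: str, base: str = BASE_URL) -> str:
    """URL for the backpressure rating of one vertex (operator chain) of a job."""
    return f"{job_url(job_id, base)}/vertices/{quote(vertex_id)}/backpressure"


def fetch_json(url: str, timeout_s: float = 5.0) -> dict:
    """GET a URL and decode the JSON body; needs a reachable cluster."""
    with urlopen(url, timeout=timeout_s) as response:
        return json.load(response)


# Hypothetical ids, used only to show the shape of the URLs.
print(job_url("a1b2c3"))
print(checkpoints_url("a1b2c3"))
print(backpressure_url("a1b2c3", "v1"))
```

Calling fetch_json on any of these URLs against a running cluster returns the same data that the web dashboard displays.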

Friday, May 29, 2020

Application troubleshooting guide continued...

Stream store and analytics automates the deployment of applications written using Flink. This includes options for:

  • Authentication and authorization 

  • Stream store metrics 

  • High availability 

  • State and fault tolerance via state backends 

  • Configuration, including memory configuration, and

  • The full production readiness checklist, which includes:

  • Setting an explicit max parallelism 

  • Setting UUIDs for all operators

  • Choosing the right State backend 

  • Configuring high availability for job managers 
