Sunday, August 25, 2019

Today we continue discussing the summary of the book "Stream Processing with Apache Flink". Flink is an Apache project well known for unifying stream and batch processing.
It features low latency, high throughput, rich programmable APIs, flexible windows, and exactly-once processing. Compared with Apache Samza, which features state over streams but with low-level stream APIs, Apache Spark Streaming, which performs stream processing on top of batch processing, and Apache Storm, which offers true streaming but with low-level APIs, Apache Flink can be considered a good analytical framework suitable for all kinds of workloads such as event logs, historical data, and ETL transformations.
At its core, the Apache Flink stack supports the DataStream API with a stream optimizer running over the Flink runtime. The data stream sources can even include message brokers through their connectors.
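For instance, a Kafka topic can be consumed as a DataStream. The following is a minimal sketch assuming the Flink Kafka connector is on the classpath; the broker address, consumer group, and topic name are placeholders:
import java.util.Properties
import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer

val env = StreamExecutionEnvironment.getExecutionEnvironment

val props = new Properties()
props.setProperty("bootstrap.servers", "localhost:9092") // placeholder broker
props.setProperty("group.id", "flink-demo")              // placeholder consumer group

// Each Kafka record becomes one String element of the stream
val kafkaStream: DataStream[String] =
  env.addSource(new FlinkKafkaConsumer[String]("stock-topic", new SimpleStringSchema(), props))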
The DataStream API supports policy-based, flexible windowing semantics where time, count, delta, and other criteria can be used for windowing. The operators on streams include joins and crosses, and iterations are supported. Transformations over two connected streams are supported with CoMap and CoReduce functions, as in the sketch below.
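As a minimal sketch of the CoMap pattern (assuming a Flink 1.x Scala API; the case classes and stream names here are hypothetical), connect() pairs two streams of different element types and the two-argument map applies one function per input:
import org.apache.flink.streaming.api.scala._

// Hypothetical element types for the two inputs
case class Heartbeat(id: String)
case class Reading(id: String, value: Double)

// connect() pairs two streams of different types; the two-argument map
// (the CoMap pattern) applies one function to each input
def tagStreams(heartbeats: DataStream[Heartbeat],
               readings: DataStream[Reading]): DataStream[String] =
  heartbeats.connect(readings)
    .map(h => s"heartbeat:${h.id}", r => s"reading:${r.id}=${r.value}")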
This is illustrated with the help of financial data: multiple inputs are read, say to merge stock data; simple sum/count aggregations can be applied on the windows; the windows can be data-driven or time-based tumbling windows; and a streaming join can be performed on multiple data streams.
For example,
// Read "symbol,price" lines from a socket and parse them into StockPrice objects
val socketStockStream = env.socketTextStream("localhost", 9999)
  .map(x => {
    val split = x.split(",")
    StockPrice(split(0), split(1).toDouble)
  })
// Merge the socket stream with a generated stream for the "TCKR" symbol
val stockStream = socketStockStream.merge(env.addSource(generateStock("TCKR")(10) _))
// Sliding window of 10 seconds that is evaluated every 5 seconds
val windowedStream = stockStream.window(Time.of(10, SECONDS)).every(Time.of(5, SECONDS))
// Maximum price per stock symbol within each window
val maxByStock = windowedStream.groupBy("symbol").maxBy("price")
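The snippet above assumes a StockPrice case class, a generateStock source, and an env execution environment; in the referenced post these look roughly as follows:
import org.apache.flink.util.Collector
import scala.util.Random

case class StockPrice(symbol: String, price: Double)

// Source from the referenced example: emits a random walk of prices
// for the given symbol, with step size controlled by sigma
def generateStock(symbol: String)(sigma: Int)(out: Collector[StockPrice]): Unit = {
  var price = 1000.0
  while (true) {
    price = price + Random.nextGaussian * sigma
    out.collect(StockPrice(symbol, price))
    Thread.sleep(Random.nextInt(200))
  }
}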

As a data-driven example, the window method call could be:
.window(Delta.of(0.05, priceChange, defaultPrice))
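Here Delta.of takes a trigger threshold and a user-defined delta function; in the referenced example, the delta function computes the relative price change between two stock prices:
// Delta function from the referenced example: relative change between two prices
def priceChange(p1: StockPrice, p2: StockPrice): Double =
  Math.abs(p1.price / p2.price - 1)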
Even a stream from social media can be used for correlations:
val rollingCorrelation = tweetsAndWarning.window(Time.of(30, SECONDS)).mapWindow(computeCorrelation _)
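The computeCorrelation function computes the Pearson correlation coefficient over the paired counts collected in each window; a sketch along the lines of the referenced post:
import org.apache.flink.util.Collector

def average(xs: Seq[Double]): Double = xs.sum / xs.size

// Pearson correlation of the paired (tweet count, warning count)
// values collected in the current window
def computeCorrelation(input: Iterable[(Int, Int)], out: Collector[Double]): Unit = {
  if (input.nonEmpty) {
    val xs = input.map(_._1.toDouble).toSeq
    val ys = input.map(_._2.toDouble).toSeq
    val meanX = average(xs)
    val meanY = average(ys)
    val cov = average(xs.zip(ys).map { case (x, y) => (x - meanX) * (y - meanY) })
    val stdX = Math.sqrt(average(xs.map(x => Math.pow(x - meanX, 2))))
    val stdY = Math.sqrt(average(ys.map(y => Math.pow(y - meanY, 2))))
    out.collect(cov / (stdX * stdY))
  }
}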
Reference: https://flink.apache.org/news/2015/02/09/streaming-example.html
