Thursday, August 15, 2019

We were discussing stream processing. It is particularly helpful when streams represent data across past, present and future, which gives a unified view of the data. Windowing is a tool for processing bounded streams, which are segments of the larger stream. Since segments make up the stream, windowing is merely a convenience for transforming only a portion of it. Otherwise, the stream as a whole gives a complete picture of the data from its origin.
Stream segments are made up of events, so stream processing can be viewed as a transformation applied event by event. Event processing helps applications to be data driven. In this sense, batch processing has focused on historic data, stream processing on current data, and event-driven processing on future data as it arrives. All of this processing may need to be stateful. Algorithms and data structures process finite data, and the range of data is chosen by a time span that may include the past, present or future. Therefore, by treating the data as a span over all ranges of time, stream processing is positioned to provide the benefits of batch, stream and event-driven processing together.
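To make the idea of choosing a time span concrete, here is a minimal sketch in Python. The Event type, the process function and the sample data are all hypothetical names made up for illustration and do not come from any product. The same event-by-event, stateful fold serves a historic range, a current range, or the whole stream, depending only on the bounds passed in.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Event:
    timestamp: int  # hypothetical unit: seconds since the stream's origin
    key: str
    value: float


def process(
    events: Iterable[Event],
    start: Optional[int] = None,
    end: Optional[int] = None,
) -> Dict[str, float]:
    """Fold events one at a time into per-key totals over [start, end)."""
    state: Dict[str, float] = {}  # the state carried across events
    for event in events:
        if start is not None and event.timestamp < start:
            continue
        if end is not None and event.timestamp >= end:
            continue
        state[event.key] = state.get(event.key, 0.0) + event.value
    return state


# Hypothetical sample stream.
events = [
    Event(1, "a", 1.0),
    Event(2, "b", 2.0),
    Event(5, "a", 3.0),
    Event(9, "b", 4.0),
]

historic = process(events, end=5)    # the "batch" view of older data
recent = process(events, start=5)    # the "current" view
everything = process(events)         # the unified view from the origin
```

Only the time bounds differ between the three calls; the processing logic itself is unchanged.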
All we need is a unification of the stack in terms of storage, runtime and modeling. The stream storage and stream analytics products do just that. The APIs available from these products make programming easier.
Inside the stream storage product, the segment store, the controller and ZooKeeper sit at the same level in the stack over the tier 1 storage. They serve independent functions, and all of them are required to persist the streams. While the segment store hosts the Stream data API, the controller hosts the Stream controller API. The client library that writes and queries the event stream uses the stream abstraction over all three components.
Windowing is essential to stream processing since it allows a sequence of events to be processed together. Stream processing can otherwise work on one event at a time. This abstraction made it possible to unify the query language in SQL, where the predicate takes the time range as a parameter.
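As an illustration of windowing, the following Python sketch groups events into fixed-size, non-overlapping (tumbling) windows by timestamp. This is only one kind of window, and the function name, the (timestamp, value) tuple layout and the sample readings are assumptions made for the example rather than anything prescribed by a stream analytics product.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def tumbling_windows(
    events: Iterable[Tuple[int, float]], size: int
) -> Dict[int, List[float]]:
    """Group (timestamp, value) pairs into windows of `size` time units.

    Each window is keyed by its start time and covers [start, start + size).
    """
    windows: Dict[int, List[float]] = defaultdict(list)
    for timestamp, value in events:
        window_start = (timestamp // size) * size
        windows[window_start].append(value)
    return dict(windows)


# Hypothetical readings as (timestamp, value) pairs.
readings = [(0, 1.0), (3, 2.0), (10, 3.0), (14, 4.0), (25, 5.0)]

windows = tumbling_windows(readings, size=10)
sums = {start: sum(values) for start, values in windows.items()}
```

Each window here plays the role of the time-range predicate in a SQL query: the aggregate is computed over the events whose timestamps fall inside that range.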
However, the use of SQL forced the data to be accumulated prior to the execution of the query. With stream storage and stream analytics products, the data is viewed as if it were in transit on a pipeline, and the querying is done on a unified view of the data as it accrues. Therefore, stream processing has the ability to be near real-time.
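The difference between querying accumulated data and querying data as it accrues can be sketched with a running aggregate. The Python generator below is a hypothetical example, not a product API: it emits an updated mean after every event instead of waiting for the whole data set to be collected first.

```python
from typing import Iterable, Iterator


def running_mean(values: Iterable[float]) -> Iterator[float]:
    """Yield the mean of all values seen so far, one result per event."""
    count = 0
    total = 0.0
    for value in values:
        count += 1
        total += value
        yield total / count


# Hypothetical input; in practice the values would arrive over time.
results = list(running_mean([2.0, 4.0, 6.0]))
```

Because a result is available after each event, a consumer does not have to wait for the input to end, which is what makes near real-time answers possible.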
