Sunday, September 13, 2020

Field Property Extraction:

We were discussing a feature for the stream store that allows events to be accessed in sorted order with skip-level traversal. The events in a stream can be considered to follow a predetermined sequence, whether by offset or by timestamp, and these sequence numbers are in sorted order. Any event can then be treated as a batch bounded by a head StreamCut and a tail StreamCut that occur immediately before and after it, which makes reaching the event cheaper than a linear traversal of the stream. Access to an event from the historical set of events in the stream becomes O(log N). The skip-level access links, in the form of head and tail StreamCuts, can easily be built into the metadata on a catch-up basis after the events have accrued in the stream.
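The skip-level lookup can be illustrated with a minimal sketch. It assumes the sequence numbers are already sorted and uses plain list positions as stand-ins for head and tail StreamCuts; the names build_skip_index and locate are hypothetical and are not part of any stream store API. A binary search over the sampled index finds the bounding pair, and only the short segment between them is scanned.

```python
from bisect import bisect_right

def build_skip_index(sequence_numbers, stride):
    """Sample every `stride`-th sequence number together with its position.

    Each sampled position stands in for a StreamCut (an assumption made
    for illustration; a real StreamCut is not a list index).
    """
    return [(sequence_numbers[i], i) for i in range(0, len(sequence_numbers), stride)]

def locate(sequence_numbers, index, target):
    """Return the position of `target`, or None if it is absent."""
    keys = [key for key, _ in index]
    # Binary search for the last sampled key that is <= target.
    slot = bisect_right(keys, target) - 1
    if slot < 0:
        return None
    head = index[slot][1]
    # The tail is the next sampled position, or the end of the stream.
    tail = index[slot + 1][1] if slot + 1 < len(index) else len(sequence_numbers)
    # Scan only the bounded segment instead of the whole stream.
    for position in range(head, tail):
        if sequence_numbers[position] == target:
            return position
    return None

offsets = list(range(10, 1010, 10))      # 100 sorted sequence numbers
skip_index = build_skip_index(offsets, 8)
position = locate(offsets, skip_index, 570)
```

The search over the index is logarithmic in the number of sampled entries, and the final scan is bounded by the stride, so the stride trades index size against scan length.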

This says nothing about the contents of the events and carries little or no risk of data corruption, since it is entirely a metadata activity. If we consider that the data contains some text, we can leverage that text to enhance the metadata.

The data written to an event in the stream is not parsed, because it may contain binary data and the stream store does not know what may be useful to customers. That decision can be deferred by extracting all the key-value properties that occur as patterns in the text portion of the data.
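As a rough sketch of what such pattern-based extraction might look like, the snippet below decodes whatever text is present in a payload and picks out key=value and key: value pairs with a regular expression. The pattern and the function name extract_properties are assumptions chosen for illustration; real payloads would likely need a more careful pattern, since values such as clock times also contain colons.

```python
import re

# Hypothetical pattern: a word, then '=' or ':', then a quoted string or a bare token.
KEY_VALUE = re.compile(r'(\w+)\s*[=:]\s*("[^"]*"|[^\s,;]+)')

def extract_properties(payload: bytes) -> dict:
    """Collect key-value pairs from the text portion of an event payload."""
    # Bytes that are not valid UTF-8 are dropped rather than parsed.
    text = payload.decode("utf-8", errors="ignore")
    properties = {}
    for key, value in KEY_VALUE.findall(text):
        properties[key] = value.strip('"')
    return properties

properties = extract_properties(b'device=sensor-7 temp=21.5, status: "ok"')
```

Because the payload itself is only read, never rewritten, this stays a metadata activity.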


Many devices send IoT data as JSON or XML, if not as regular text, in addition to binary data. This gives an opportunity to collect the fields and the values that occur in the events so that they can be leveraged in subsequent queries. This enhancement of the metadata from events in the stream is useful for finding similar events.
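For JSON payloads, collecting fields and their observed values could look something like the following sketch. The dotted field naming, the function names flatten and catalog_fields, and the decision to skip non-JSON events and list values are all assumptions made to keep the example short.

```python
import json
from collections import defaultdict

def flatten(obj, prefix=""):
    """Yield (dotted field name, value) pairs for every leaf of a nested dict."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from flatten(value, f"{prefix}{key}.")
    else:
        yield prefix.rstrip("."), obj

def catalog_fields(events):
    """Map each field name to the set of scalar values seen across the events."""
    catalog = defaultdict(set)
    for raw in events:
        try:
            doc = json.loads(raw)
        except ValueError:
            continue  # not JSON: leave the event unparsed
        if not isinstance(doc, dict):
            continue
        for field, value in flatten(doc):
            # Lists and other unhashable values are skipped in this sketch.
            if isinstance(value, (str, int, float, bool)) or value is None:
                catalog[field].add(value)
    return dict(catalog)

events = [
    '{"device": {"id": "a1", "temp": 20}}',
    '{"device": {"id": "b2", "temp": 20}}',
    "plain text that is not JSON",
]
catalog = catalog_fields(events)
```

Two events that share a value for a field, such as the temperature here, can then be treated as similar without reading either payload again.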


The use of standard query operators with the events in the stream has been made possible by the Flink programming library, but the logic written with those operators is usually not aware of all the fields that have been extracted from the events. By closing the gap between field extraction and the use of new fields in query logic, applications can not only improve existing logic but also write new logic.


The enhancement of metadata for a stream is also useful independently of the events. For example, an application browsing the streams in a scope or a store now has properties with which to narrow down the streams of interest. This is significantly better than reading all the events in the store.
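Narrowing down streams by their properties might be sketched as below. The per-stream metadata is assumed to be a mapping from field name to the set of observed values, and both the stream names and the function streams_with_field are hypothetical.

```python
def streams_with_field(stream_fields, field, value=None):
    """Return the names of streams whose metadata mentions `field`.

    If `value` is given, the stream must also have recorded that value.
    No events are read; only the per-stream metadata is consulted.
    """
    matches = []
    for stream, fields in stream_fields.items():
        values = fields.get(field)
        if values is None:
            continue
        if value is None or value in values:
            matches.append(stream)
    return sorted(matches)

# Hypothetical metadata for two streams in one scope.
stream_fields = {
    "scope/telemetry": {"device.id": {"a1", "b2"}, "temp": {20}},
    "scope/audit": {"user": {"bob"}},
}
telemetry_like = streams_with_field(stream_fields, "device.id")
```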


The activity of building this metadata is incremental and progressive as events arrive, owing to the append-only nature of the stream. Even when the stream is truncated, the metadata does not lose its value. These activities can also keep up with the rate of incoming data.
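The incremental nature of this bookkeeping can be shown with a toy in-memory model. The class StreamWithMetadata is purely illustrative and assumes the fields of each event have already been extracted; it is not how a stream store keeps its metadata.

```python
class StreamWithMetadata:
    """Toy append-only stream that updates field metadata on every append."""

    def __init__(self):
        self.events = []
        self.field_counts = {}

    def append(self, fields: dict):
        # Appends never revisit earlier events, so each update touches
        # only the fields of the event being added.
        self.events.append(fields)
        for name in fields:
            self.field_counts[name] = self.field_counts.get(name, 0) + 1

    def truncate(self, count: int):
        # Truncation drops the oldest events but leaves the metadata intact.
        del self.events[:count]

stream = StreamWithMetadata()
stream.append({"device": "a1"})
stream.append({"device": "b2", "temp": 20})
stream.truncate(1)
```

After the truncation only one event remains, yet the metadata still records that the device field was seen twice.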

 
