Cluster computing: Support for small to large footprint introspection database and query

Sunday, August 9, 2020

Support for small to large footprint introspection database and query

Introspection analytics

A Flink job might be dedicated to perform periodic analysis of introspection data or to collect information from sensors. The job can also consolidate data from other sources that are internal to the stream store and hidden from users.

Batching and statistics are some of the changes with which the analytics job can help. Simple aggregate queries per time window for sum(), min(), max() can help make more meaningful events to be persisted in the stream store.

The FlinkJob may have network connectivity to read events from external data stores and these could include events published by sensors. Usually those events make their way to the stream store irrespective of whether there is an introspection store or analytics job or not in the current version of the system. In some cases, it is helpful for the analytical jobs to glance at backlog and rate of accumulation in the stream store as overall throughput and Latency for all the components taken together. Calculation and persistence of such diagnostics events is helpful for trends and investigations later.

The use of Flink job dedicated to introspection store immensely improves the querying capability. Almost all aspects of querying as outlined in the Stream processing of Apache Flink by O’Reilly Media can be used for this purpose

Distributed Collection agents

As with any store not just introspection store, data can come from different sources for the destination as the store. Collection agents for each type of source make it convenient for the data to be transferred to the store

The collection agents do not themselves need to be monitored. The data they send can be lossy but it should arrive at the store. The store is considered a singleton local instance. It may not even be in the same system as the one it serves. The rest of the store may be global and shared but the data transfer from collection agent does not have to go directly to the global shared storage. If it helps to have the introspection store serve as the same local destination for all collection agents, the introspection store can be kept outside the global storage. In either case the streams are managed centrally by the stream store and the storage refers to tier 2 persistence.

Cluster computing

Sunday, August 9, 2020

Support for small to large footprint introspection database and query

No comments:

Post a Comment