Friday, August 7, 2020

Support for small to large footprint introspection database and query.

 

Introspection comprises the runtime operational states, metrics, notifications, and insights saved from the ongoing activities of the system. Since the system can be deployed in different modes and with different configurations, this dedicated container and its query capability can take different shapes and forms. The smallest footprint may be little more than log files or JSON data, while the largest can open up querying to different programming languages and tools. Existing hosting infrastructure, such as a container orchestration framework or the etcd database, may already provide excellent publication channels that could do away with this store and query altogether, but those channels rely heavily on network connectivity. As discussed earlier, the introspection datastore does not compete with them; it enhances offline production support.

There are ways to implement introspection suitable to the size of the deployment of the overall system. Mostly these are incremental add-ons that grow with the system, except at the extremes of tiny and uber deployments. For example, while the introspection store may save runtime state only periodically for small deployments, it can do so continuously for large deployments. The runtime information gathered can also become more expansive as the deployment grows, and the gathering can expand to more data sources as they become available in a given mode of deployment of the same size. The Kubernetes mode of deployment usually has many Deployments and StatefulSets, and information on those may be available from custom resources as queried from the kube-apiserver and the etcd database. The introspection store is a container within the storage product, so it is elastic and can accommodate the data from various deployment modes and sizes. Only for the tiniest deployments, where repurposing a container for introspection cannot be accommodated, does a change of format for the introspection store become necessary. At the other extreme, the introspection store can include not just the dedicated store container but also snapshots of other runtime data stores as available. A cloud-scale service to virtualize these hybrid data stores for introspection queries could then be provided. These illustrate a sliding scale of options available for different deployment modes and sizes.
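The sliding scale above can be sketched as a capture policy selected by deployment size. This is a minimal, hypothetical sketch; the policy fields, size names, and source names are illustrative assumptions, not part of the product.

```python
from dataclasses import dataclass, field


@dataclass
class CapturePolicy:
    continuous: bool        # stream introspection events continuously vs. periodic snapshots
    interval_seconds: int   # snapshot period when capture is not continuous
    sources: list = field(default_factory=list)  # data sources to gather from


def policy_for(size: str) -> CapturePolicy:
    """Pick an introspection capture policy on the sliding scale by deployment size."""
    if size == "tiny":
        # No dedicated container can be repurposed: periodic dumps to log files / JSON.
        return CapturePolicy(False, 3600, ["logs"])
    if size == "small":
        # Periodic saves of runtime state.
        return CapturePolicy(False, 300, ["logs", "metrics"])
    if size == "large":
        # Continuous capture into the dedicated store container.
        return CapturePolicy(True, 0, ["logs", "metrics", "notifications"])
    # uber: continuous capture plus snapshots of other runtime stores (e.g. etcd).
    return CapturePolicy(True, 0, ["logs", "metrics", "notifications", "etcd-snapshots"])
```

The point of the sketch is that the store's format and sources change only at the extremes; in between, the same container simply captures more, more often.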


Introspection scope versus stream.

This section focuses on dedicating different levels of the container hierarchy in the stream store for the purpose of introspection. The earlier section put the size of the store on a sliding scale to suit its needs, but made no mention of the container. In a stream store, the container is always a stream, and the choice of using different streams for different event types remains available. If there is only one stream, all the events are of the same type, and it is up to the producer and consumer to use that type. At the same time, the producers and consumers do not always come from the same part of the system, since it can include different components, and each component can submit its own type of event into the stream. Separating event types by stream implies that different event types can be treated differently for the purposes of retention and stream truncation. The consumer usually sees only a singleton introspection store, either directly or via an interface, per instance of the product.
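The trade-off between a single shared stream and per-type streams can be made concrete with a small routing sketch. All names and retention values here are hypothetical assumptions for illustration; the stream store's real APIs are not shown.

```python
# Hypothetical per-type retention, in days. With separate streams, each type
# can be truncated independently; a shared stream must retain everything as
# long as the longest-lived type needs.
RETENTION_DAYS = {"metrics": 7, "notifications": 30, "states": 90}


def stream_for(event_type: str, separate_streams: bool) -> str:
    """Route an introspection event type to its stream."""
    if not separate_streams:
        return "introspection"               # single stream, single event-type contract
    return f"introspection-{event_type}"     # dedicated stream per event type


def retention_for(stream: str) -> int:
    """Retention that stream truncation would honor for this stream."""
    event_type = stream.removeprefix("introspection-")
    # A shared stream (or an unknown type) inherits the longest retention.
    return RETENTION_DAYS.get(event_type, max(RETENTION_DAYS.values()))
```

This makes the retention argument visible: with one stream, short-lived metrics are kept for 90 days because states share the stream; with per-type streams, metrics can be truncated after 7.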

Troubleshooting.
The introspection store helps with troubleshooting the product, but like any software it may malfunction. The store itself does not provide any ability to troubleshoot the information collection; that must be discerned from the logs of the publishers.
Introspection analytics 
A Flink job may be dedicated to performing periodic analysis of introspection data or to collecting information from sensors. The job can also consolidate data from other sources that are internal to the stream store and hidden from users.
Batching and statistics are some of the enhancements the analytics job can provide. Simple aggregate queries per time window, such as sum(), min(), and max(), can produce more meaningful events to persist in the stream store.
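The per-window aggregation described above can be sketched without the Flink runtime. This is an illustrative tumbling-window computation over (timestamp, value) pairs; the function name and event shape are assumptions, not the job's actual code.

```python
from collections import defaultdict


def tumbling_window_aggregates(events, window_seconds):
    """Group (timestamp, value) events into tumbling windows and compute
    sum/min/max per window, as the analytics job might before persisting
    the aggregates back to the introspection stream."""
    windows = defaultdict(list)
    for ts, value in events:
        # Align each event to the start of its window.
        windows[ts // window_seconds * window_seconds].append(value)
    return {
        start: {"sum": sum(vals), "min": min(vals), "max": max(vals)}
        for start, vals in sorted(windows.items())
    }


events = [(0, 5), (10, 3), (65, 9), (70, 1)]
result = tumbling_window_aggregates(events, 60)
# window [0, 60): sum=8, min=3, max=5; window [60, 120): sum=10, min=1, max=9
```

In the real job, the same logic would be expressed with Flink's windowed aggregations over the introspection stream rather than an in-memory dictionary.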
The Flink job may have network connectivity to read events from external data stores, and these can include events published by sensors. Usually those events make their way to the stream store regardless of whether an introspection store or analytics job exists in the current version of the system. In some cases, it is helpful for the analytical jobs to glance at the backlog and the rate of accumulation in the stream store as the overall throughput and latency for all the components taken together. Calculating and persisting such diagnostic events is helpful for trend analysis and investigations later.
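The backlog and rate-of-accumulation diagnostics mentioned above reduce to simple arithmetic over stream offsets. The parameter names below are hypothetical; a real job would obtain the offsets from the stream store's reader-group and tail positions.

```python
def backlog_and_rate(head_offset, tail_offset, prev_tail_offset, interval_seconds):
    """Derive two diagnostic events from stream offsets:
    - backlog: bytes (or events) written but not yet consumed (tail minus head)
    - rate_per_second: how fast the stream grew over the last sampling interval
    """
    backlog = tail_offset - head_offset
    rate = (tail_offset - prev_tail_offset) / interval_seconds
    return {"backlog": backlog, "rate_per_second": rate}
```

Persisting these derived events back into the introspection stream gives the trend data that later investigations can query directly.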
The use of a Flink job dedicated to the introspection store immensely improves the querying capability. Almost all aspects of querying as outlined in Stream Processing with Apache Flink (O'Reilly Media) can be used for this purpose.


