A REST-based data path to a stream store can serve as a log sink when the stream storage is hosted on PKS and deployed on a Kubernetes cluster, so that data from the source finds its way to the sink with little or no intervention. The generation, collection, sinking and analysis of log entries then follow a staged propagation in a pipeline model, which makes the logs available downstream to extract-transform-load, analysis and reporting solutions.
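As a minimal sketch of such a data path, a log producer can POST a JSON-encoded entry over HTTP to the stream store's ingest endpoint. The host, port, scope/stream path and payload shape below are illustrative assumptions rather than a documented API.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch of the REST-based data path: a producer POSTs one log entry to the
// stream store. Endpoint URL and JSON fields are assumed for illustration.
public class LogEntryPoster {
    public static void main(String[] args) throws Exception {
        String entry = "{\"level\":\"INFO\",\"source\":\"kube-apiserver\","
                     + "\"message\":\"request served\",\"ts\":\"2020-01-01T00:00:00Z\"}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://stream-store.example.com:9090/v1/scopes/logs/streams/apiserver/events"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(entry))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("Sink responded with status " + response.statusCode());
    }
}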
The stages are:
Collection at the source. The kube-apiserver is outside the Kubernetes cluster workloads and any products hosted on Kubernetes. As infrastructure, it is well suited to turning on these collection items and determining their transmission techniques; the upshot is that a set of command-line parameters is the input and a data flow is the output.
Transformation of the data. This step is required because the collected data is generally read-only. Transformation covers select, project, map, filter, reduce and other such operations, and a Flink application can be leveraged for this purpose (a sketch follows these stages).
Sink of events, where we leverage a data path directly into the stream store, allowing all reporting stacks to read from the store instead of the source.
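The transformation stage might look like the following Flink job, which filters and projects raw log lines. A bounded in-memory source stands in for the stream store connector, and the log format and field positions are assumptions made only for this sketch.

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Sketch of the transformation stage: filter error-level entries and
// project the component name from each matching line.
public class LogTransformJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // In the real pipeline this source would be the stream store connector.
        DataStream<String> rawLogs = env.fromElements(
                "INFO kube-apiserver request served in 12ms",
                "ERROR kube-apiserver request timed out",
                "INFO kube-scheduler pod scheduled");

        rawLogs.filter(line -> line.startsWith("ERROR"))   // select
               .map(line -> line.split(" ")[1])            // project
               .print();                                    // stand-in for the sink

        env.execute("log-transformation-sketch");
    }
}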
The logic for querying logs is usually written in two layers: a low-level primitive layer and a higher-level composite layer. Very rarely do we see joins or relations between logs. Instead, pipelining of operators takes precedence over correlation of data, because the stages of extracting from the source, transforming, putting into the sink, and utilizing by-products of the transformation for subsequent storage and analysis follow a data flow model.
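A small sketch of these two layers, assuming a simple line-oriented log format: the primitives are reusable predicates over a single entry, and the composite pipelines them with filter and map rather than joining data sets.

import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Sketch of the two query layers: low-level primitives and a
// higher-level composite built by pipelining operators, not joins.
public class LogQueryLayers {
    // --- low-level primitive layer ---
    static Predicate<String> hasLevel(String level) { return line -> line.startsWith(level); }
    static Predicate<String> mentions(String term)  { return line -> line.contains(term); }

    // --- higher-level composite layer ---
    static List<String> failedApiServerRequests(List<String> logs) {
        return logs.stream()
                   .filter(hasLevel("ERROR").and(mentions("kube-apiserver")))
                   .map(line -> line.substring(line.indexOf(' ') + 1)) // project the message
                   .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> logs = List.of(
                "INFO kube-apiserver request served",
                "ERROR kube-apiserver request timed out",
                "ERROR etcd leader changed");
        failedApiServerRequests(logs).forEach(System.out::println);
    }
}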
Indeed, a data-driven approach to log analysis is not usually the concern; most users would willingly search all the logs if it were not so time-consuming. What they really want is ease of writing and refining queries, because a curated library does not eliminate the need for ad hoc queries. In such cases, the library of existing code and scripts is merely a convenient starting point that can be edited for the task at hand.