Cluster computing

Monday, July 29, 2019

Today we continue discussion logging in Kubernetes. We start with a Kubernetes Daemonset that will monitor container logs and forward them to a log indexer. Any log forwarder can be used as the image for the container in the Daemonset yaml specification. Some environment variables and shell commands might be necessary to set and run on the container. There will also be a persistent volume typically mounted at /var/log. The mounted volumes must be verified to be available under the corresponding hostPaths.
Journald is used to collect logs from those components that do not run inside a container. For example, kubelet and the controller runtime which is usually Docker will write to journald when the host has systemd enabled. Otherwise, they write to .log files in /var/log directory. Klog is the logging library used by such system components.
One log indexer is sufficient for a three node cluster with thirty containers generating 1000 messages/second each even when the message size can be a mix of small (say 256 byte) and large (1KB).
Timber is an example of a log product. The use of this product typically entails logstash, elasticsearch, and Kibana where the elasticsearch is for api access and the kibana is for web user interface. Any busybox container image can be used to produce logs which we can use as test data for our logging configuration.
The logrotate tool rotates the log once the side exceeds a given threshold.
The typical strategies for pushing logs include the following:
1) use a node-level logging agent that runs on every node. For example, Stackdriver logging on Google cloud platform and ElasticSearch on conventional Kubernetes clusters.
2) include a dedicated sidecar container for logging in an application pod.
3) Push logs directly in the backend from within an application.
Between the options 1 and 2, the latter is preferable for performance reasons. It is not intrusive and i collects the logs with fluentd which provides a rich language to annotate or transform log sources. Also, option 2 can scale independently without impact to the rest of the cluster.

Cluster computing

Monday, July 29, 2019

No comments:

Post a Comment