Wednesday, September 18, 2019

There are a few caveats to observe when processing events with webhooks, as opposed to other resources.

Events can arrive in a cascade, and we cannot cause undue delay in the processing of any one of them. Events can also cause the generation of other events. If the webhook is responsible for transforming events and creating new ones, it must stop those new events from generating yet more. The same event can even occur again, either because the transformed event triggers the original event or because the transformed event is of the same kind as the original and so lands in the webhook once more. The webhook has to skip the processing of any event it has already handled.
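A minimal sketch of such a skip, assuming the webhook marks events it has already transformed with a hypothetical marker annotation and the incoming AdmissionReview request is saved in review.json:

# skip events that already carry the marker annotation
if jq -e '.request.object.metadata.annotations["example.com/transformed"] != null' review.json >/dev/null; then
  echo "already handled: allow the event through without a patch"
else
  echo "first visit: transform the event and add the marker annotation"
fi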

The core resources of Kubernetes are all subject to the same verbs such as get, create, and delete. Events, however, are far more numerous and occur at a faster clip than most other resources. This calls for handling and robustness comparable to the considerations in a networking protocol or a message queue. A webhook is not designed to handle such rate or latency, even though it might skip a lot of the events. Consequently, selector labels and processing criteria become all the more important.
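For illustration, a minimal sketch of a mutating webhook configuration that narrows invocations with a rule and an objectSelector might look like the following (the names and labels are hypothetical, and older clusters use admissionregistration.k8s.io/v1beta1):

apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: event-transformer
webhooks:
- name: events.example.com
  rules:
  - apiGroups: [""]
    apiVersions: ["v1"]
    operations: ["CREATE"]
    resources: ["events"]
  objectSelector:
    matchLabels:
      transform: "true"
  clientConfig:
    service:
      namespace: default
      name: event-transformer
      path: /mutate
  failurePolicy: Ignore
  sideEffects: None
  admissionReviewVersions: ["v1"]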

The performance of event processing inside the webhook aside, transforming an event by making an HTTP request to an external service also suffers from outbound considerations such as the timeouts and retries on that request. Since the performance of event webhooks, as called out above, differs significantly from the few and far between processing of other resources, the webhook might have to make do with lossy transformation as opposed to persistence-based buffering.
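A minimal sketch of such a lossy outbound call, assuming a hypothetical transformer endpoint and an event payload saved in event.json; the short timeout and single retry bound the delay, and on failure the transformation is simply dropped rather than buffered:

curl --silent --fail --max-time 2 --retry 1 -X POST -H "Content-Type: application/json" -d @event.json https://transformer.example.com/transform || echo "transformation dropped"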

None of the above criteria should matter when this kind of webhook is used for diagnostics, which is most relevant to troubleshooting analysis. Diagnostics place no performance requirement on the webhook. When the events are already filtered down to audit events, and the conversion of those events is even more selective, changes in the rate and delay of incoming events have little or no impact on diagnostics.

Tuesday, September 17, 2019

Data Storage and connectors:
The nature of data storage is to accumulate data over time for analytics. S3 APIs are a popular example of programmatic web access for storing data in the cloud. A connector works similarly, but it does not necessarily require the data storage to be remote or code to be written for the transfer. In fact, one of the most common asks of a storage product is that it facilitate data transfer using standard shell commands.
The purpose of the connector is to move data quickly and steadily between source and destination regardless of their type or the kind of data transferred. The connector therefore needs to be specific to the destination storage. It merely automates the steps to organize and fill the containers in the data store, sending data in fragments if necessary. The difference a connector makes is enormous, both in the convenience of stashing data and in the reusability of the same automation for different sources. The traditional stack of command-line tooling over programmable interfaces, which allows automation beyond programmatic access, is not lost. However, requiring customers to write their own command-line wrappers to send data is tedious and avoidable.
In addition to programmatic access for receiving data, data stores need to customize the intake of data from different contexts such as protocols, bridging technologies like message queues, and even the eccentricities of the sender. It is in these contexts that a no-code, ready-made tool is preferred. Data transfers may also need to be chained, requiring data to be relayed between systems like piping operations in a shell environment. A new connector may also wrap an existing connector.
One of the most common examples of a connector is a TCP-based data connector. The data is simply sent by opening a network socket to make a connection. This can be done with standard command-line tools as follows:
cat logfile | nc ipaddress port
The inclusion of such a data connector in a storage product is probably the most convenient form of data transfer. Even if the storage product requires programmatic access, wrapping the access APIs to present a TCP connector like the one above immensely benefits users, who then do not have to write code to send data to storage.
With the automation for a TCP connector written once, there is no need to repeat the effort elsewhere or to reinvent the wheel.
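As a minimal sketch of such wrapping, assuming the AWS CLI is installed and a hypothetical bucket and port, a listener can stream whatever arrives over TCP straight into object storage:

nc -l 9000 | aws s3 cp - s3://example-bucket/incoming/logfile  # some netcat variants require -l -p 9000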


Monday, September 16, 2019

Kubernetes provides webhooks as a way to interact with all system-generated events. They are the equivalent of HTTP handlers and modules in ASP.NET in their ability to intercept and change requests and responses. Webhooks, however, are an opportunity to work on system-generated resources such as pod creation requests and so on.
There are two stages where webhooks can run, and they are correspondingly named mutating and validating webhooks. The first is an opportunity to change requests on Kubernetes core v1 resources. The second is an opportunity to add validations to requests and responses.
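Both kinds answer the apiserver with an AdmissionReview object; a mutating webhook additionally carries a base64-encoded JSONPatch. A minimal sketch of such a response, with placeholder values and using admission.k8s.io/v1 (older clusters use v1beta1), looks like this:

{
  "apiVersion": "admission.k8s.io/v1",
  "kind": "AdmissionReview",
  "response": {
    "uid": "<uid copied from the request>",
    "allowed": true,
    "patchType": "JSONPatch",
    "patch": "<base64 of [{\"op\": \"add\", \"path\": \"/metadata/labels/transformed\", \"value\": \"true\"}]>"
  }
}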
Since these span a lot of system calls, the webhooks are invoked frequently. They must therefore be selective in the requests they modify, to exclude the possibility of touching requests that were not intended.
In addition to selectors, the power of webhooks is best demonstrated when they select all requests of a particular type to modify. For example, this is an opportunity for security to raise the baseline by allowing or denying all resources of a particular kind. The execution of privileged pods may be disabled in the cluster with the help of webhooks.
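As a minimal sketch, assuming the incoming AdmissionReview request has been saved to a hypothetical review.json and jq is available, the privileged check behind such a denial could look like this:

# deny the request if any container asks for privileged mode
if jq -e '[.request.object.spec.containers[]?.securityContext.privileged] | any' review.json >/dev/null; then
  echo '{"apiVersion": "admission.k8s.io/v1", "kind": "AdmissionReview", "response": {"uid": "<uid from the request>", "allowed": false, "status": {"message": "privileged pods are not allowed"}}}'
fi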
The webhooks are light to run and serve, similar to nginx HTTP parameter modifiers, and a number of them may be allowed to run.

Sunday, September 15, 2019

The difference between log forwarding and event forwarding becomes clear when the command-line options for the kube-apiserver are considered. For example, the audit-log-path option dumps the audit events to a log file that cannot be accessed from within the Kubernetes runtime environment of the cluster. This option therefore cannot be used with FluentD, because FluentD is a containerized workload. On the other hand, the audit-webhook option allows a service to listen for callbacks from the Kubernetes control plane on the arrival of audit events. The service listening on this webhook endpoint, Falco in this case, runs in its own container as a Kubernetes service. The control plane makes only one web request per audit event, and since the events are forwarded over HTTP, the Falco service can efficiently handle the rate and latency of the traffic.
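For reference, the corresponding kube-apiserver flag, --audit-webhook-config-file, takes a kubeconfig-format file that points at the receiving service. A minimal sketch, assuming Falco's documented audit endpoint and a hypothetical in-cluster service address:

apiVersion: v1
kind: Config
clusters:
- name: falco
  cluster:
    server: http://falco.default.svc.cluster.local:8765/k8s_audit
contexts:
- name: default-context
  context:
    cluster: falco
    user: ""
current-context: default-context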
The performance consideration between the two options is also notable. Log forwarding is the equivalent of running the tail command on the log file and forwarding the output over TCP with the netcat command. This transfers the same amount of data and uses a TCP connection, although it does not traverse as many layers as the webhook. It is also suitable for a syslog drain, which enables further performance improvements.
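The equivalent shell pipeline, assuming a hypothetical collector host; tail -F keeps following the file across rotations while nc streams the lines over TCP:

tail -F /var/vcap/sys/log/kube-apiserver/audit.log | nc collector.example.com 9000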
The webhook, by contrast, is a push mechanism and requires packing and unpacking of data as it traverses up and down the network layers. There is no buffering involved on the service side, so there is a chance that some data will be lost if the service goes down. The connectivity is also more subject to faults than the syslog drain. However, HTTP is best suited for message-broker intake, which facilitates filtering and processing that can significantly improve performance.
The ability to transform events is not necessarily restricted to the audit container-based service or to services specific to the audit framework. The audit data is rather sensitive, which is why its access is restricted. The transformation of events can even occur during analysis, which lets the event queries be simpler once the events are transformed. Streaming analysis enables a holistic and continuous view of the data from its origin, and with the help of windows over the data, the transformations are efficient.
Transformations can also be persisted where the computations are costly, so those costs are paid once rather than every time the data needs to be analyzed. Persisted transformations help with reuse and sharing, which makes a single source of truth convenient and efficient. Transformations can also be chained between operators to form a pipeline, which makes it easier to diagnose, troubleshoot, and improve separation of concerns.

Saturday, September 14, 2019

The native k8s events can also be transformed into custom events to suit the needs of any other event processing engine. Typically, organizations have their own event gateway and event stores that make events proprietary, such as for dial-home, network operations center, and remote diagnostic sessions. This ability to transform events then lets us do without reserving large storage, as long as some buffering is possible at the source.
It is this notion that can be extended to Extract-Transform-Load operations suitable to different downstream systems.
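A minimal sketch of such a transformation, assuming jq is available and a hypothetical downstream schema with source, reason, text, and timestamp fields:

kubectl get events --all-namespaces -o json | jq '[.items[] | {source: "k8s", reason: .reason, text: .message, at: .lastTimestamp}]'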

Friday, September 13, 2019


The architecture of Kubernetes has its control plane layered over network and storage made available by infrastructure providers.

The components above are facilitated with the use of Pivotal Container Service (PKS), which helps us migrate the same production stack across core infrastructure. Consequently, the security aspects of the production stack depend on PKS and Kubernetes features, and we have to reach out to the Kubernetes apiserver for auditing information from the containerized workloads.
The architecture is standard for reviewing any workloads hosted on Kubernetes. In particular, let us note the use of a distributed key-value database within the Kubernetes control plane. This database is ‘etcd’ and it is used to maintain the cluster. ‘etcd’ is written in Go and uses the Raft consensus algorithm to manage a highly available replicated log.
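For instance, the keys the apiserver keeps in this database can be listed with the etcd v3 client, assuming access to the etcd endpoint and placeholder paths for its certificates:

ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/path/to/ca.crt --cert=/path/to/client.crt --key=/path/to/client.key get /registry --prefix --keys-only | head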
Any distributed key-value database could do, and there may even be benefits if the database can be offloaded from the control plane. If this cluster database could be object storage, it would continue to provide durability and reliability while bringing some of the storage best practices.
The database is internal to the Kubernetes control plane, so it does not really fall within the scope of this document. However, the events from the Kubernetes execution environment do pass through the layers. K8s events are noted for their format, labels, and content. They help with monitoring, troubleshooting, and subsequent analysis from storage.

Thursday, September 12, 2019

Audit events originate from the kube-apiserver, usually running on the master VM in the PKS Kubernetes cluster.

There are essentially only two considerations:
First, we define the audit policy and the webhook, which are passed to the kube-apiserver as YAML file locations in the form of command-line arguments. [These command-line options are explained here: https://kubernetes.io/docs/reference/command-line-tools-reference/kube-apiserver/]. We can also include these options in the kube-apiserver configuration.
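A minimal sketch of such a policy file, recording every request at the Metadata level, looks like this:

apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: Metadata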

Second, we restart the kube-apiserver to use the specified policy and webhook. Changing the configuration file automatically restarts the kube-apiserver.

The steps to set up auditing so that events can be analyzed later include the following:

1) ssh admin@<pks-apiserver> # such as "ssh ubuntu@opsman.environment.local"

2) ubuntu@opsmanager-2-5:~$ sudo -i # pks and bosh commands are run from an elevated privilege account

3) pks login -a pks-api.environment.local -u -p -k  # this lets us view and use the PKS cluster

4) pks cluster <cluster_name> | grep UUID # this lets us get the UUID for the cluster. The convention for naming a service instance is usually service-instance_UUID. Replace the service instance name with whatever format applies in your environment.

5) bosh vms -d service-instance_874b838b-6391-4c62-991b-3e1528a4b37e # this lets us use the service instance to display the VMs. Usually there will be only one master. The kube-apiserver runs on this master.

6) bosh scp service-instance_874b838b-6391-4c62-991b-3e1528a4b37e master/b9a8aa9f-0e31-4579-8e4b-685c55a80f0e audit-policy.yaml :/var/vcap/jobs/kube-apiserver/config/audit-policy.yaml # we copy the local audit policy file to the VM where the kube-apiserver runs.

7) bosh ssh service-instance_874b838b-6391-4c62-991b-3e1528a4b37e master/b9a8aa9f-0e31-4579-8e4b-685c55a80f0e -c ' echo "--audit-policy-file=/var/vcap/jobs/kube-apiserver/config/audit-policy.yaml " >> /var/vcap/jobs/kube-apiserver/config/kube-apiserver.yaml' # here we update the configuration of the kube-apiserver with the policy file path. This is the input to the auditing system.

8) bosh ssh service-instance_874b838b-6391-4c62-991b-3e1528a4b37e master/b9a8aa9f-0e31-4579-8e4b-685c55a80f0e -c ' echo "--audit-log-path=/var/vcap/sys/log/kube-apiserver/audit.log" >> /var/vcap/jobs/kube-apiserver/config/kube-apiserver.yaml' # here we update the configuration of the kube-apiserver with the log path. This is the output of the auditing system.
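Optionally, we can verify that events are flowing once the kube-apiserver has restarted with the new flags:

9) bosh ssh service-instance_874b838b-6391-4c62-991b-3e1528a4b37e master/b9a8aa9f-0e31-4579-8e4b-685c55a80f0e -c 'tail -n 5 /var/vcap/sys/log/kube-apiserver/audit.log' # a quick check that audit events are being written to the configured log path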