Saturday, September 7, 2019

Let us review the sink architecture in PKS. It consists of a log sink for monitoring cluster and namespace logs and a metric sink for monitoring cluster metrics. The log sink and metric sink therefore serve different purposes, although the data may appear in a common JSON format. These resources have to be enabled using the observability manager.
The log architecture forwards logs to a common log destination. The forwarding is done with the help of Fluent Bit, where a daemon running as a pod on each node aggregates that node's events. In addition to the logs collected this way, an event collector gathers Kubernetes API events and a sink collector handles the CRD events pertaining to Fluent Bit ConfigMaps. The event collector and the sink collector are hosted independently. All aggregated events are then forwarded to the common log destination.
The metrics architecture is similar, with kubelets producing the metrics, but it differs in two aspects. Instead of Fluent Bit forwarding the aggregated events to a common log destination, a plugin is required to forward them to the common metrics destination. The second difference is that there is no sink collector for metrics; the CRD events are handled by the metrics controller, and only Telegraf is responsible for forwarding metrics.
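For instance, the components backing these sinks can usually be observed directly in the cluster. The pks-system namespace below is the conventional home for them in PKS, though the exact resource names vary by version:

    # list the sink components (Fluent Bit, Telegraf and their controllers) running in the cluster
    kubectl get pods -n pks-system
    # the log forwarder is expected to appear as a DaemonSet, one pod per node
    kubectl get daemonsets -n pks-system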
The sink architecture in PKS is merely for automation. It does not prevent direct access to the logs and configuration of the clusters. If the cluster logs were to be downloaded, the following steps would be necessary.
We gather the credential and IP address information for the BOSH Director, SSH into the Ops Manager VM, and use the BOSH CLI v2+ to log in to the BOSH Director from the Ops Manager VM. We specify the name of the deployment and list all of its virtual machines. We then choose a virtual machine and download its logs by specifying the “logs” command-line argument.
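As a rough illustration, the sequence of BOSH CLI commands might look like the following; the environment alias, deployment name, and instance name are placeholders:

    # log in to the BOSH Director using a previously created alias
    bosh -e pks-director log-in
    # list the deployments and the VMs in the chosen deployment
    bosh -e pks-director deployments
    bosh -e pks-director -d service-instance_xxxx vms
    # download the logs from a chosen VM, for example the first worker instance
    bosh -e pks-director -d service-instance_xxxx logs worker/0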
The sink architecture is also useful for monitoring. The logs and events are combined in a shared format, which provides operators with a robust set of monitoring and filtering options. All entries of the sink data are timestamped, contain the host ID, and are annotated with the namespace, pod ID, and container name. The logs are distinguished by the App-Name field.
The Kubernetes API event entries are distinguished by “k8s.event” in the App-Name field. Strings like “Error: ErrImagePull”, “Back-off restarting failed container”, and “Started container” help query the events to determine the cause of a failure or the time of a success.
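As a simple illustration, once the syslog-formatted entries have been collected to a file (the file name here is only a placeholder), ordinary text search is often enough to narrow down a failure:

    # find image pull failures and crash-looping containers in the collected entries
    grep -E "ErrImagePull|Back-off restarting failed container" cluster-sink.log
    # confirm when a container finally started
    grep "Started container" cluster-sink.log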
A sink resource enables PKS users to configure destinations for logs transported following the syslog protocol defined in RFC 5424. This resource is dependent on the IaaS infrastructure. The sink resource needs to be enabled because it is not on by default. As with service brokers, sinks are created at cluster and namespace scopes; they do not follow a namespace, bucket, and resource hierarchy. The “create-sink” command is used to create a sink for a cluster by pointing it at a log destination.
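A minimal sketch of creating a cluster-level sink with the PKS CLI might look like the following; the cluster name and destination URL are placeholders, and the exact syntax may vary across PKS versions:

    # create a log sink for a cluster, pointing at an RFC 5424 syslog destination
    pks create-sink my-cluster syslog-tls://logs.example.com:6514
    # list the sinks configured for the cluster
    pks sinks my-cluster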

Friday, September 6, 2019

PKS can also be monitored with sinks. A sink receives logs over TCP in the syslog format defined by RFC 5424, and sink resources let PKS send its logs to that destination. Logs as well as events use a shared format. The Kubernetes API events are denoted by the string “k8s.event” in their APP-NAME field. A typical Kubernetes API event also includes the host ID of the BOSH VM, the namespace, and the Pod ID. A failure to retrieve a container image from the registry is indicated by the string “Error: ErrImagePull”, a malfunctioning container by “Back-off restarting failed container”, and a container that started successfully by “Started container” in its events.
The logs for any cluster can also be downloaded from the PKS VM using a BOSH CLI command such as “bosh logs pks/0”.
Let us review the sink architecture in PKS. It consists of a log sink for monitoring cluster and namespace logs and a metric sink for monitoring cluster metrics. The log sink and metric sink therefore serve different purposes, although the data may appear in a common JSON format. These resources have to be enabled using the observability manager.
The log architecture forwards logs to a common log destination. The forwarding is done with the help of Fluent Bit, where a daemon running as a pod on each node aggregates that node's events. In addition to the logs collected this way, an event collector gathers Kubernetes API events and a sink collector handles the CRD events pertaining to Fluent Bit ConfigMaps. The event collector and the sink collector are hosted independently. All aggregated events are then forwarded to the common log destination.
The metrics architecture is similar, with kubelets producing the metrics, but it differs in two aspects. Instead of Fluent Bit forwarding the aggregated events to a common log destination, a plugin is required to forward them to the common metrics destination. The second difference is that there is no sink collector for metrics; the CRD events are handled by the metrics controller, and only Telegraf is responsible for forwarding metrics.

Thursday, September 5, 2019

PKS can also be monitored with sinks. A sink receives logs over TCP in the syslog format defined by RFC 5424, and sink resources let PKS send its logs to that destination. Logs as well as events use a shared format. The Kubernetes API events are denoted by the string “k8s.event” in their APP-NAME field. A typical Kubernetes API event also includes the host ID of the BOSH VM, the namespace, and the Pod ID. A failure to retrieve a container image from the registry is indicated by the string “Error: ErrImagePull”, a malfunctioning container by “Back-off restarting failed container”, and a container that started successfully by “Started container” in its events.
The logs for any cluster can also be downloaded from the PKS VM using a BOSH CLI command such as “bosh logs pks/0”.
Kubernetes master node VMs also run etcd, an open source distributed key-value store that Kubernetes uses for service discovery and configuration sharing. etcd also exposes metrics that help with cluster health monitoring.
Overall, VMware Enterprise PKS has a multi-layer security model. The layers are the application layer, the container management layer, the platform layer, and the infrastructure layer. IAM and monitoring span all of these layers, and all aspects of AAA (authentication, authorization, and accounting) apply to each layer with their help.

The application layer visibility is provided with the help of auditing. PKS integrates well with VMware tooling and leverages the monitoring of containerized applications and log events.
The platform layer security is provided by PKS identity and access management, which is handled primarily by a service called User Account and Authentication (UAA).
The container management layer is secured with the help of a private image registry, flexible multi-tenancy, and vulnerability scanning. PKS uses Clair, an open source project, to statically analyze container images while importing information about vulnerabilities from a variety of sources. Signed container images provide content trust.
Infrastructure security is provided by micro-segmentation, a unified network policy layer and operational tools including those for troubleshooting.

Wednesday, September 4, 2019

Configuring a PKS cluster for auditing:
In a previous post, we described that the auditing framework for any Kubernetes cluster can be enabled by specifying the audit flags on the command line of the kube-apiserver. Most systems facilitate that by allowing SSH to the host running that server.
In this document, we take a closer look at the PKS cluster. PKS facilitates this with the help of the BOSH command-line interface, which can be run from any PKS development environment. We gather the credential and IP address information for the BOSH Director and SSH into the Ops Manager VM.
We create a BOSH alias for the PKS environment with the IP address and the credentials so that it is easy to use with subsequent commands. The “bosh log-in” command then lets us log in to the BOSH Director using this alias.
The “bosh deployments” command gives us the name and hash of the deployment, which we can use to list the VM names. To SSH into the PKS VM, we run the BOSH “ssh” command against that deployment. The “cloud-check” command in particular is very helpful for detecting differences between the VM state database maintained by the BOSH Director and the actual state of the VMs. It also helps reboot, recreate, or delete references.
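A rough sketch of this flow with the BOSH CLI v2 follows; the Director address, certificate path, deployment name, and instance names are placeholders for whatever the environment uses:

    # create an alias for the Director and log in
    bosh alias-env pks-director -e 10.0.0.3 --ca-cert /var/tempest/workspaces/default/root_ca_certificate
    bosh -e pks-director log-in
    # find the PKS deployment and its VMs, then SSH into the PKS VM
    bosh -e pks-director deployments
    bosh -e pks-director -d pivotal-container-service-xxxx vms
    bosh -e pks-director -d pivotal-container-service-xxxx ssh pivotal-container-service/0
    # reconcile the Director's view of the VMs with their actual state
    bosh -e pks-director -d pivotal-container-service-xxxx cloud-check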
The PKS database stores information on the pods, which includes watermark and consumption. The watermark is the number of pods that can run at a single time; consumption is the memory and CPU usage of the pods. This database can also be accessed from the PKS VM. The billing database can be accessed with pre-defined credentials, which can then be used with SQL queries.
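Once connected with those credentials, a query along the following lines could report recent consumption. The database, table, and column names here are purely illustrative, since the actual schema depends on the PKS release:

    # connect to the billing database on the PKS VM and run an illustrative query;
    # the user, database, table, and column names are all placeholders
    mysql -u billing_user -p billing \
      -e "SELECT pod_name, cpu_usage, memory_usage FROM pod_consumption ORDER BY collected_at DESC LIMIT 10;"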
PKS can also be monitored with sinks. A sink receives logs over TCP in the syslog format defined by RFC 5424, and sink resources let PKS send its logs to that destination. Logs as well as events use a shared format. The Kubernetes API events are denoted by the string “k8s.event” in their APP-NAME field. A typical Kubernetes API event also includes the host ID of the BOSH VM, the namespace, and the Pod ID. A failure to retrieve a container image from the registry is indicated by the string “Error: ErrImagePull”, a malfunctioning container by “Back-off restarting failed container”, and a container that started successfully by “Started container” in its events.
The logs for any cluster can also be downloaded from the PKS VM using a BOSH CLI command such as “bosh logs pks/0”.
Kubernetes master node VMs also run etcd, an open source distributed key-value store that Kubernetes uses for service discovery and configuration sharing. etcd also exposes metrics that help with cluster health monitoring.
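As a small illustration, etcd serves these metrics in Prometheus format on its client port; on a secured deployment the member's certificates are required, so the paths below are placeholders:

    # query etcd health metrics from the master VM (certificate paths are placeholders)
    curl --cacert /path/to/ca.crt --cert /path/to/etcd-client.crt --key /path/to/etcd-client.key \
         https://127.0.0.1:2379/metrics | head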

Tuesday, September 3, 2019


Security review of a product involves the following activities:
Analysis:
Threat Modeling:
Perform detailed design analysis
List Assets, Activity matrix and Actions chart
Identify threats
Mitigate threats
Static Analysis:
Perform code scanning activity
Perform binary scanning activity
Publish Code Analysis Reports for review by component owners
Publish Binary Analysis Reports for review by component owners
Mitigate risks from Code Analysis
Mitigate risks from Binary Analysis

Network Vulnerability Scanning:
Use available tools on a deployed instance of the product
Publish findings from the tool
Mitigate risks from the findings
Web Security testing:
Request PSO office to perform testing
Publish findings from the testing
Mitigate risks from the findings
Malware scanning:
Request malware detection
Publish findings from malware detection
Mitigate any findings

Third party components:
Harden third party components
Use latest secure versions
Source third party components

Documentation:
Provide a security configuration guide
Document known false positives
Vulnerability response plan
Patching Capability plan

Governance:
Participate in Security Training
Security Champions identified
Enforce coding conventions
Periodic security review with Business Unit


Monday, September 2, 2019

Kubernetes security guidelines:

These include:
1. Hardening the Kubernetes deployment by allowing access only via published endpoints

2. Enumerating and securing each and every resource - whether a system or custom resource - with role-based access control

3. Leveraging the auditing framework available from the container engine with or without additional auditing enhancement products.

4. Descriptive logging from each and every system component, including event generation and collection via DaemonSets

5. Securing the storage and mounted persistent volume claims with read-only policies so that secrets are not divulged

6. Securing containers to run as non-root so that the code does not gain escalated privileges to run (a minimal sketch follows at the end of this list)

7. Using Linux capabilities as fine-grained permission sets for code running in the containers so that there is no undetected or uncontrolled access to the host

8. Securing all external and internal connectivity with the help of proxies and TLS for external-facing connections

9. Using service accounts specific to applications so that there is containment and isolation of the privilege with which applications run

10. Securing fine-grained permissions on individual operations and access to resources so that there is no unauthorized access

11. Setting up monitoring and alerting based on audit events so that all intrusions and anomalies in the system can be detected, notified, and mitigated.
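A minimal sketch of items 6 and 7, assuming a cluster reachable with kubectl; the pod name and image are placeholders:

    kubectl apply -f - <<'EOF'
    apiVersion: v1
    kind: Pod
    metadata:
      name: hardened-example
    spec:
      containers:
      - name: app
        image: busybox:1.36            # placeholder image
        command: ["sleep", "3600"]
        securityContext:
          runAsNonRoot: true           # item 6: refuse to run the container as root
          runAsUser: 1000
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          capabilities:
            drop: ["ALL"]              # item 7: grant no Linux capabilities to the process
    EOF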


Sunday, September 1, 2019

A survey of Kubernetes auditing frameworks:
1) A do-it-yourself approach: the native Kubernetes framework supports publishing all events generated by the system for core group resources such as pods, secrets, configmaps, and persistent volume claims. This can be turned on with the help of the following:
An audit policy specifies sections for verbs such as watch on resources such as pods and configmaps. It specifies levels so that changes are captured for publishing at increasing levels of detail: metadata only, metadata and request, or metadata, request, and response. Events are collected with the help of a collector agent such as fluentd. A log backend or a webhook backend may be used to specify the destination.
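A minimal sketch of this do-it-yourself setup is shown below. The file paths are placeholders, and the flags are the ones the kube-apiserver accepts on its command line:

    # write a small audit policy covering the core group resources at increasing levels of detail
    cat <<'EOF' > /etc/kubernetes/audit-policy.yaml
    apiVersion: audit.k8s.io/v1
    kind: Policy
    rules:
    - level: RequestResponse            # metadata, request and response bodies
      verbs: ["create", "update", "patch", "delete"]
      resources:
      - group: ""                       # the core API group
        resources: ["pods", "configmaps", "secrets", "persistentvolumeclaims"]
    - level: Metadata                   # everything else at metadata level only
    EOF
    # flags added to the kube-apiserver command line to enable auditing and rotate the log
    #   --audit-policy-file=/etc/kubernetes/audit-policy.yaml
    #   --audit-log-path=/var/log/kubernetes/audit.log
    #   --audit-log-maxage=30 --audit-log-maxbackup=10 --audit-log-maxsize=100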
2) Available product – Falco: This is an auditing solution that can be deployed to a Kubernetes cluster and automates the auditing-related activities. It also hosts a web server for querying audit events, enabling possibilities that would otherwise have required code to be written against the Kubernetes framework.
Falco is available as a container image for a DaemonSet, so its deployment follows the same rules as any other containerized resource on Kubernetes.
3) Kubesec.io - This is an open source solution that features a rich web API server. It can scan any Kubernetes resource from its declaration and provide a security analysis. The framework comes bundled with an HTTP server that supports dynamic queries for security scans on Kubernetes resources.
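For instance, the hosted API can be queried by posting a resource declaration to it; the manifest file name here is a placeholder:

    # post a pod or deployment manifest to the kubesec.io scanner and receive a scored analysis
    curl -sSX POST --data-binary @pod.yaml https://v2.kubesec.io/scan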
The approach taken by each of the methods above is somewhat different. The native Kubernetes auditing is the simplest, with very little use of resources, while allowing both the policy and the storage definitions to remain flexible. The use of fluentd is fairly standard for log collection. All audit events sink to the logs, although they could be redirected to syslog.
Falco takes the approach of deploying the Kubernetes Response Engine (KRE), which provides a way to send Falco alerts to a message broker service such as Amazon Web Services' Simple Notification Service. It also helps deploy security playbooks that can mitigate Falco alerts when they are raised. A service account needs to be created for Falco to connect to the Kubernetes API server and fetch resource metadata, and it must be secured with a suitable role-based access control policy, as is typical for any application hosted on Kubernetes. With the image, the default YAML configuration, the startup script, and the service account and role, Falco provides a simple DaemonSet deployment.
The approach taken by Kubesec.io, among other typical deployments, includes an admission controller that prevents a privileged DaemonSet, Deployment, or StatefulSet from being applied to a Kubernetes cluster, using a score computed by scanning the corresponding declarations. A minimum score is required to gain admission.
All three approaches require the kube-apiserver to be invoked with specific command-line parameters to initiate auditing, locate the rules declaration, and specify the log file and its rotation. This is done over SSH with the help of suitable credentials, the API server host address, and the path to the rules to be uploaded.
As a side note, when clusters run on PKS - a cloud technology for automating and hosting Kubernetes clusters - the SSH option may not be available. The PKS-native way is to log in with a username and password to the PKS API server and interact with it using the pks command-line tool. The file to be uploaded as audit rules is used across clusters, so this is a one-time requirement outside of the installer for the cluster. The same can also be mentioned in a user manual for the installer or the product's security configuration guide.
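For reference, logging in to the PKS API with the CLI looks roughly like this; the endpoint, user name, and certificate path are placeholders for the environment:

    # authenticate to the PKS API before issuing cluster commands
    pks login -a api.pks.example.com -u admin -p 'password' --ca-cert /path/to/cert
    # list the clusters this user can administer
    pks clusters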
Audit events collected in this way can subsequently be queried for specific insights.