Wednesday, September 4, 2019

Configuring a PKS cluster for auditing:
In a previous post, we described how the auditing framework for any Kubernetes cluster can be enabled by passing audit flags on the command line to the kube-apiserver. Most systems facilitate this by allowing SSH access to the host running that server.
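For reference, the flags in question are the standard kube-apiserver audit options; a minimal sketch (the file paths and retention values here are placeholders) looks like this:

    kube-apiserver \
      --audit-policy-file=/etc/kubernetes/audit-policy.yaml \
      --audit-log-path=/var/log/kubernetes/audit.log \
      --audit-log-maxage=30 \
      --audit-log-maxbackup=10 \
      --audit-log-maxsize=100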
In this post, we take a closer look at a PKS cluster. PKS facilitates this with the help of the BOSH command-line interface, which can be run from any PKS development environment. We gather the credentials and IP address of the BOSH director and SSH into the Ops Manager VM.
We create a BOSH alias for the PKS environment with the director's IP address and credentials so that it is easy to reference in subsequent commands. The “pks login” command similarly lets us log in to the PKS API.
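As a rough sketch, the alias and login steps look like the following (the environment name, addresses, certificate paths, and credentials are placeholders):

    # register the BOSH director under a friendly alias
    bosh alias-env pks -e <director-ip> --ca-cert /var/tempest/workspaces/default/root_ca_certificate
    bosh -e pks login
    # authenticate against the PKS API with the PKS CLI
    pks login -a <pks-api-address> -u <username> -p <password> --ca-cert <path-to-cert>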
The BOSH “deployments” command is another one we run to get the name and the hash of the deployment, which we can then use to list the VM names. To SSH into the PKS VM, we run the BOSH “ssh” command against that VM. The “cloud-check” command in particular is very helpful for detecting differences between the VM state database maintained by the BOSH director and the actual state of the VMs; it can also reboot or recreate VMs and delete stale references.
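A minimal sketch of that workflow with the BOSH CLI (the deployment hash and VM names below are placeholders taken from the output of the earlier commands):

    # list deployments to find the PKS deployment name and hash
    bosh -e pks deployments
    # list the VMs in that deployment to get their names
    bosh -e pks -d pivotal-container-service-<hash> vms
    # SSH into the PKS VM by the name reported above
    bosh -e pks -d pivotal-container-service-<hash> ssh pks/0
    # reconcile the director's state database with the actual VMs
    bosh -e pks -d pivotal-container-service-<hash> cloud-check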
The PKS database stores information on the pods, including watermark and consumption. The watermark is the number of pods that can run at a single time. Consumption is the memory and CPU usage of the pods. This database can be accessed from the PKS VM. The billing database can be accessed with predefined credentials and then queried with SQL.
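As a purely illustrative sketch, querying it might look something like the following (the table and column names here are hypothetical and differ by PKS release):

    -- hypothetical pod-usage table holding watermark and consumption figures
    SELECT cluster_id, watermark, cpu_used, memory_used, recorded_at
    FROM pod_usage
    ORDER BY recorded_at DESC
    LIMIT 10;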
PKS can also be monitored with sinks. A sink resource forwards cluster logs to a destination over TCP in the syslog format described by RFC 5424. Logs as well as events use this shared format. Kubernetes API events are denoted by the string “k8s.event” in their “APP-NAME” field. A typical Kubernetes API event includes the host ID of the BOSH VM, the namespace, and the pod ID as well. Failure to retrieve containers from a registry is indicated by the string “Error: ErrImagePull”. Malfunctioning containers are denoted by “Back-off restarting failed container” in their events. Successfully scheduled containers have “Started container” in their events.
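A minimal sink declaration sketch is shown below; the API group/version and the destination are assumptions that vary by PKS release:

    apiVersion: apps.pivotal.io/v1beta1
    kind: ClusterSink
    metadata:
      name: all-cluster-logs
    spec:
      # destination that accepts RFC 5424 syslog over TCP
      host: logs.example.com
      port: 514
      enable_tls: true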
The logs for any cluster can also be downloaded from the PKS VM using a BOSH CLI command such as “logs pks/0”.
Kubernetes master node VMs also run etcd, an open-source distributed key-value store that Kubernetes uses for service discovery and configuration sharing. etcd also exposes metrics that help with cluster health monitoring.
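For example, etcd health and its metrics endpoint can be checked from the master VM; a sketch (the certificate paths are placeholders that depend on how the cluster was deployed) is:

    # check member health
    ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
      --cacert=<ca.crt> --cert=<etcd.crt> --key=<etcd.key> endpoint health
    # scrape the Prometheus-format metrics used for cluster health monitoring
    curl --cacert <ca.crt> --cert <etcd.crt> --key <etcd.key> https://127.0.0.1:2379/metrics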

Tuesday, September 3, 2019


Security review of a product involves the following activities:
Analysis:
    Threat Modeling
        Perform detailed design analysis
        List assets, activity matrix, and actions chart
        Identify threats
        Mitigate threats
    Static Analysis
        Perform code scanning activity
        Perform binary scanning activity
        Publish code analysis reports for review by component owners
        Publish binary analysis reports for review by component owners
        Mitigate risks from code analysis
        Mitigate risks from binary analysis

Network Vulnerability Scanning:
    Use available tools on a deployed instance of the product
    Publish findings from the tool
    Mitigate risks from the findings
Web Security Testing:
    Request PSO office to perform testing
    Publish findings from the testing
    Mitigate risks from the findings
Malware Scanning:
    Request malware detection
    Publish findings from malware detection
    Mitigate any findings

Third Party Components:
    Harden third party components
    Use latest secure versions
    Source third party components

Documentation:
    Provide a security configuration guide
    Document known false positives
    Provide a vulnerability response plan
    Provide a patching capability plan

Governance:
    Participate in security training
    Identify security champions
    Enforce coding conventions
    Conduct periodic security reviews with the business unit


Monday, September 2, 2019

Kubernetes security guidelines:

These include:
1. Hardening the Kubernetes deployment by allowing access only via published endpoints

2. Enumerating and securing each and every resource - whether a system or custom resource - with role-based access control

3. Leveraging the auditing framework available from the container engine with or without additional auditing enhancement products.

4. Descriptive logging from each and every system component, including event generation and collection via DaemonSets

5. Securing the storage and mounted persistent volume claims with read-only policies so that secrets are not divulged

6. Securing containers to run as non-root so that code does not run with escalated privileges (see the securityContext sketch after this list)

7. Using Linux capabilities as fine-grained permission sets for code running in the containers so that there is no undetected or uncontrolled access to the host

8. Securing all external and internal connectivity with the help of proxies and TLS for external-facing connections.

9. Using service accounts specific to applications so that there is containment and isolation of the privileges with which applications run.

10. Securing fine-grained permissions on individual operations and access to resources so that there is no unauthorized access

11. Setting up monitoring and alerting based on audit events so that all intrusions and anomalies in the system can be detected, reported, and mitigated.
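As referenced in item 6, a minimal pod-level sketch of the non-root and capability settings (the names and image are placeholders) could be:

    apiVersion: v1
    kind: Pod
    metadata:
      name: hardened-app
    spec:
      containers:
      - name: app
        image: example.com/app:1.0        # placeholder image
        securityContext:
          runAsNonRoot: true              # item 6: refuse to start as root
          runAsUser: 1000
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          capabilities:
            drop: ["ALL"]                 # item 7: grant only the capabilities the code needs
            add: ["NET_BIND_SERVICE"]     # example of a narrowly scoped capability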


Sunday, September 1, 2019

A survey of Kubernetes auditing frameworks:
1) A do-it-yourself approach: the native Kubernetes framework supports publishing all events generated by the system for core-group resources such as pods, secrets, configmaps, persistent volume claims, etc. This can be turned on with the help of the following:
An audit policy that specifies rules for verbs such as watch on resources such as pods and configmaps. It specifies levels that capture increasing amounts of detail for publishing: metadata only, metadata and request, or metadata, request, and response. Events are collected with the help of a collector agent such as fluentd. A log backend or a webhook backend may be used to specify the destination.
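A minimal audit policy sketch illustrating these levels (the choice of resources and verbs is illustrative) looks like this:

    apiVersion: audit.k8s.io/v1
    kind: Policy
    rules:
    # capture request and response bodies for changes to secrets and configmaps
    - level: RequestResponse
      resources:
      - group: ""
        resources: ["secrets", "configmaps"]
    # capture metadata and the request body for watches on pods
    - level: Request
      verbs: ["watch"]
      resources:
      - group: ""
        resources: ["pods"]
    # record only metadata for everything else
    - level: Metadata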
2) Available product – Falco: this is an auditing solution that can be deployed to a Kubernetes cluster and automates the auditing-related activities. It also hosts a web server for querying audit events and enables possibilities that would otherwise have required code to be written against the Kubernetes framework.
Falco is available as a container image for a DaemonSet, so its deployment follows the same rules as any other containerized resource and is managed in the same way as any other Kubernetes workload.
3) Kubesec.io - this is an open-source solution that features a rich web API server. It can scan any Kubernetes resource from its declaration and provide a security analysis. The framework comes bundled with an HTTP server that supports dynamic queries for security scans on Kubernetes resources.
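As a usage sketch, a resource declaration can be posted to the hosted scanner for a score (the manifest file name is a placeholder and the public v2 endpoint is assumed to be reachable):

    curl -sSX POST --data-binary @deployment.yaml https://v2.kubesec.io/scan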
The approach taken by each of the methods above is somewhat different. Native Kubernetes auditing is the simplest, uses very few resources, and allows both the policy and storage definitions to remain flexible. The use of fluentd is fairly standard for log collection. All audit events sink to the logs, although they could be redirected to syslog.
Falco takes the approach of deploying the Kubernetes Response Engine (KRE), which provides a way to send Falco alerts to a message broker service such as Amazon Web Services’ Simple Notification Service. It also helps deploy security playbooks that can mitigate Falco alerts when they are raised. A service account needs to be created for Falco to connect to the Kubernetes API server and fetch resource metadata. The service account must be secured with a suitable role-based access control policy, which is typical for any application hosted on Kubernetes. With the help of the image, the default YAML configuration, the startup script, and the service account and role, Falco provides a simple DaemonSet deployment.
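A sketch of what such a service account and a read-only RBAC binding look like (the names and the exact resource list are illustrative rather than Falco's shipped manifest):

    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: falco
      namespace: falco
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: falco-reader
    rules:
    - apiGroups: [""]
      resources: ["pods", "namespaces", "services", "events"]
      verbs: ["get", "list", "watch"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: falco-reader
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: falco-reader
    subjects:
    - kind: ServiceAccount
      name: falco
      namespace: falco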
The approach taken by Kubesec.io, among other typical deployments, includes an admission controller that prevents a privileged DaemonSet, Deployment, or StatefulSet from being applied to a Kubernetes cluster, with the help of a score computed by scanning the declaration of each. A minimum score is required to gain admission.
All three approaches require the kube-apiserver to be started with specific command-line parameters to initiate auditing, locate the rules declaration, and specify the log file and its rotation. This is done over SSH with the help of suitable credentials, the apiserver host address, and the path to the rules to be uploaded.
As a side note, when clusters run on PKS - a cloud technology for automating and hosting Kubernetes clusters - the SSH option may not be available. The PKS-native way is to log in to the pks-api server with a username and password and interact with it using the pks command-line tool. The file uploaded as audit rules is shared across clusters, so this is a one-time requirement outside of the installer for the cluster. The same can also be mentioned in a user manual for the installer or in the product’s security configuration guide.
Audit events thus collected can subsequently be queried for specific insights.

Saturday, August 31, 2019

Diagnosing a Keycloak connection refused error from a server hosted in a Kubernetes container
The following is a trail of steps down the path of an investigation that proved time-consuming and hard to resolve. Before we begin, some introduction to the terminology may help explain the problem. These include the following:
A Keycloak server is an identity management framework that allows independent identity providers to be configured
A Kubernetes framework is an orchestration framework that hosts applications such as Keycloak and alleviates the automation effort of servicing and scaling the resources they use
A container is a lightweight host for an instance of the application and includes an operating-system view that is isolated for that application
A connection refused error comes from the Keycloak server, usually due to some configuration error
The diagnosis of the exact configuration error and its location requires thorough investigation of every aspect of the deployment pertaining to connectivity, because this internal error can have ripple symptoms all the way to the external boundary
The DNS entry for the Keycloak server should allow reaching the server by its name rather than by the cluster master IP address.
The IDENTITY_SERVER_URL must be configured properly with the scheme, address, port, path and qualifier.
The ingress resource for the service must be properly configured to translate the external connection to the internal connection to the backend. An improper configuration will promptly result in a “default backend 404” error. In the case of Keycloak, the external-facing host for the ingress may be set to keycloak.myrealm.cluster.local, with port 80 for HTTP and 443 for SSL. The backend is usually available at port 8080 for the Keycloak server.
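A sketch of such an ingress (the service name, TLS secret, and the 2019-era API version are assumptions) could look like:

    apiVersion: extensions/v1beta1
    kind: Ingress
    metadata:
      name: keycloak
    spec:
      rules:
      - host: keycloak.myrealm.cluster.local
        http:
          paths:
          - path: /
            backend:
              serviceName: keycloak      # assumed service fronting the Keycloak pods
              servicePort: 8080          # the Keycloak backend port mentioned above
      tls:
      - hosts:
        - keycloak.myrealm.cluster.local
        secretName: keycloak-tls         # assumed TLS secret serving port 443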
The application server itself has many settings that can be configured with application.properties, its configuration XML, and command-line options. In addition, settings can also be passed to Keycloak via proprietary script files that have a “.cli” extension.
The command line can include a 0.0.0.0 option to bind the application to all available network interfaces, as opposed to the localhost IP address of 127.0.0.1
The jboss.bind.address.management property can also be set to this address instead of localhost. These settings do not need to be in conflict.
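For illustration, a WildFly-style startup that applies both bindings (shown as an assumption about how this particular image launches the server) is:

    ./standalone.sh -b 0.0.0.0 -Djboss.bind.address.management=0.0.0.0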
The security context of the container does not need to be elevated for connections to work, and it can allow the Keycloak server to run as the jboss user
The application.properties can include role settings such as the following:
keycloak.securityConstraints[0].authRoles[0]=*
keycloak.securityConstraints[0].securityCollections[0].patterns[0]=/*
The troubles encountered during diagnosis include settings that don’t get applied, settings that get reset, typos in settings, and multiple sources for the same setting.

Friday, August 30, 2019

Auditing:
Audit events are system-generated events that are widely produced for compliance by almost all software products, including system software, orchestration frameworks, and user applications. As a source of data for both storage and analysis, audit events are an interesting use case. Charts and graphs for reporting, as well as queries for diagnostics and troubleshooting, are extremely helpful. Auditing is therefore very popular and applied in a variety of usages.
Stream storage is appropriate for audit events. In terms of scalability, consistency, durability, and performance, this storage handles not just the size but also the count of the events in the stream.
Auditing serves to detect unwanted access and to maintain compliance with regulatory agencies. Most storage services enable auditing in each and every component on the control path, much like logging for components. In addition, the application exposes a way to retrieve the audits.
The best way for a higher-level storage product to enforce spatial locality of the data is to store it directly on the raw device. However, this uses up disk partitions and the interfaces are OS-specific. Instead, developments like RAID and SAN have made the virtual device more appealing. Consequently, a storage product now allocates a single file and places its blocks directly in that file. The file is essentially treated as a linear array of disk-resident pages.
These higher-level storage products will try to postpone or reorder writes, and this may conflict with the OS read-ahead and write-behind approach. Write-ahead logging is required by these products to provide durability and correctness.

Thursday, August 29, 2019

Example of streaming queries (using the Flink DataStream API):
    DataStream<StockPrice> socketStockStream = env
            .socketTextStream("localhost", 9999)
            .map(new MapFunction<String, StockPrice>() {
                private String[] tokens;

                @Override
                public StockPrice map(String value) throws Exception {
                    tokens = value.split(",");
                    return new StockPrice(tokens[0],
                        Double.parseDouble(tokens[1]));
                }
            });
    //Merge all stock streams together
    DataStream<StockPrice> stockStream = socketStockStream
        .merge(env.addSource(new StockSource("TCKR", 10)));

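// Sliding window: 10-second windows evaluated every 5 seconds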
WindowedDataStream<StockPrice> windowedStream = stockStream
    .window(Time.of(10, TimeUnit.SECONDS))
    .every(Time.of(5, TimeUnit.SECONDS));

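// Maximum price per stock symbol within each window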
DataStream<StockPrice> maxByStock = windowedStream.groupBy("symbol")
    .maxBy("price").flatten();

A data-driven example of the window method call could be:
DataStream<String> priceWarnings = stockStream.groupBy("symbol")
    .window(Delta.of(0.05, new DeltaFunction<StockPrice>() {
        @Override
        public double getDelta(StockPrice oldDataPoint, StockPrice newDataPoint) {
            return Math.abs(oldDataPoint.price - newDataPoint.price);
        }
    }, DEFAULT_STOCK_PRICE))
.mapWindow(new SendWarning()).flatten();

Even a stream from social media can be used for correlations:
DataStream<Double> rollingCorrelation = tweetsAndWarning
    .window(Time.of(30, TimeUnit.SECONDS))
    .mapWindow(new WindowCorrelation());

The application stack for stream analysis can independently scale the analysis and storage tiers into their own clusters. Clustering in this case is not just for high availability but also a form of distributed processing for scale-out purposes. Many traditional desktop-centric applications are heavily invested in scale-up techniques, even though scale-out processing has become more attractive as workloads have grown smarter and narrowed the gap between peak and regular traffic.