Tuesday, September 24, 2019

We were discussing how to implement the Kubernetes sink destination as a service within the cluster. In this regard, the service merely needs to listen on a port.

import java.io.IOException;
import java.util.concurrent.TimeUnit;
import java.util.logging.Logger;

import io.grpc.Server;
import io.grpc.netty.NettyServerBuilder;

// CommonParams and PravegaServerImpl are assumed to be provided elsewhere in the project.
public class PravegaGateway {
    private static final Logger logger = Logger.getLogger(PravegaGateway.class.getName());

    private Server server;

    private void start() throws IOException {
        int port = CommonParams.getListenPort();
        server = NettyServerBuilder.forPort(port)
                .permitKeepAliveTime(1, TimeUnit.SECONDS)
                .permitKeepAliveWithoutCalls(true)
                .addService(new PravegaServerImpl())
                .build()
                .start();
        logger.info("Server started, listening on " + port);
        logger.info("Pravega controller is " + CommonParams.getControllerURI());
        Runtime.getRuntime().addShutdownHook(new Thread() {
            @Override
            public void run() {
                // Use stderr here since the logger may have been reset by its JVM shutdown hook.
                System.err.println("*** shutting down gRPC server since JVM is shutting down");
                PravegaGateway.this.stop();
                System.err.println("*** server shut down");
            }
        });
    }

    private void stop() {
        if (server != null) {
            server.shutdown();
        }
    }
}
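
For completeness, a minimal entry point could be added to the class along the following lines. This is only a sketch: blockUntilShutdown is an assumed helper, not part of the original snippet, and it simply waits on the gRPC server.

    public static void main(String[] args) throws IOException, InterruptedException {
        final PravegaGateway gateway = new PravegaGateway();
        gateway.start();
        gateway.blockUntilShutdown();
    }

    // Assumed helper: keeps the process alive until the gRPC server terminates.
    private void blockUntilShutdown() throws InterruptedException {
        if (server != null) {
            server.awaitTermination();
        }
    }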

Monday, September 23, 2019

We continue with our discussion on logging in Kubernetes. In the example provided, a log service is assumed to be made available; this service is accessible at a predefined host and port.
Implementing the log service within the cluster, listening on a TCP port, helps move the log destination inside the cluster: the stream storage becomes yet another application running within the cluster. A service listening on a port is typical for a Java application. The application will read from the socket and write to a pre-determined stream defined by the following parameters (a sketch of a parameter holder for these values follows the list):
Stream name
Stream scope name
Stream reading timeout
Stream credential username
Stream credential password
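
The parameters above could be gathered in a small holder class. The sketch below assumes they are supplied as environment variables; the class name, variable names and defaults are illustrative, not part of the actual project.

// Illustrative parameter holder; the environment variable names and defaults are assumptions.
public class LogServiceParams {
    public static String getScope() {
        return System.getenv().getOrDefault("PRAVEGA_SCOPE", "srsScope");
    }
    public static String getStreamName() {
        return System.getenv().getOrDefault("PRAVEGA_STREAM", "srsStream");
    }
    public static long getReaderTimeoutMs() {
        return Long.parseLong(System.getenv().getOrDefault("READER_TIMEOUT_MS", "2000"));
    }
    public static String getUsername() {
        return System.getenv("PRAVEGA_USERNAME");
    }
    public static String getPassword() {
        return System.getenv("PRAVEGA_PASSWORD");
    }
}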

public void write(String scope, String streamName, URI controllerURI, String routingKey, String message) {
        // Create the scope and stream if they do not already exist.
        StreamManager streamManager = StreamManager.create(controllerURI);
        final boolean scopeIsNew = streamManager.createScope(scope);

        StreamConfiguration streamConfig = StreamConfiguration.builder()
                .scalingPolicy(ScalingPolicy.fixed(1))
                .build();
        final boolean streamIsNew = streamManager.createStream(scope, streamName, streamConfig);
        streamManager.close();

        try (ClientFactory clientFactory = ClientFactory.withScope(scope, controllerURI);
             EventStreamWriter<String> writer = clientFactory.createEventWriter(streamName,
                                                                                new JavaSerializer<String>(),
                                                                                EventWriterConfig.builder().build())) {

            System.out.format("Writing message: '%s' with routing-key: '%s' to stream '%s / %s'%n",
                    message, routingKey, scope, streamName);
            // Wait for the write to complete; closing the writer also flushes any pending events.
            final CompletableFuture<Void> writeFuture = writer.writeEvent(routingKey, message);
            writeFuture.join();
        }
}

public void readAndTransform(String scope, String streamName, URI controllerURI) {
        // Create the scope and stream if they do not already exist.
        StreamManager streamManager = StreamManager.create(controllerURI);
        final boolean scopeIsNew = streamManager.createScope(scope);
        StreamConfiguration streamConfig = StreamConfiguration.builder()
                .scalingPolicy(ScalingPolicy.fixed(1))
                .build();
        final boolean streamIsNew = streamManager.createStream(scope, streamName, streamConfig);
        streamManager.close();

        final String readerGroup = UUID.randomUUID().toString().replace("-", "");
        final ReaderGroupConfig readerGroupConfig = ReaderGroupConfig.builder()
                .stream(Stream.of(scope, streamName))
                .build();
        try (ReaderGroupManager readerGroupManager = ReaderGroupManager.withScope(scope, controllerURI)) {
            readerGroupManager.createReaderGroup(readerGroup, readerGroupConfig);
        }

        try (ClientFactory clientFactory = ClientFactory.withScope(scope, controllerURI);
             EventStreamReader<String> reader = clientFactory.createReader("reader",
                                                                           readerGroup,
                                                                           new JavaSerializer<String>(),
                                                                           ReaderConfig.builder().build())) {
            System.out.format("Reading all the events from %s/%s%n", scope, streamName);
            EventRead<String> event = null;
            do {
                try {
                    event = reader.readNextEvent(READER_TIMEOUT_MS);
                    if (event.getEvent() != null) {
                        System.out.format("Read event '%s'%n", event.getEvent());
                        // transform() is an application-specific helper assumed to be defined elsewhere.
                        String message = transform(event.getEvent());
                        // Forward the transformed message to the pre-determined stream via the controller.
                        write("srsScope", "srsStream", URI.create("tcp://127.0.0.1:9090"), "srsRoutingKey", message);
                    }
                } catch (ReinitializationRequiredException e) {
                    // There are certain circumstances where the reader needs to be reinitialized.
                    e.printStackTrace();
                }
            } while (event.getEvent() != null);
            System.out.format("No more events from %s/%s%n", scope, streamName);
        }
    }
}

Sunday, September 22, 2019

Logging in Kubernetes illustrated:
Kubernetes is a container orchestration framework that hosts applications. The logs from the applications are saved local to the container. Hosts are nodes of a cluster and can be scaled out to as many as required for the application workload. A forwarding agent forwards the logs from the container to a sink. A sink is a destination where all the forwarded logs can be redirected to a log service that can better handle the storage and analysis of the logs. The sink is usually external to the cluster so it can scale independently, and it may be configured to receive logs via a well-known protocol such as syslog. A continuously running process forwards the logs from the container to the node, from where they drain to the sink via the forwarding agent. This process is specified with a resource referred to as a DaemonSet, which enables the same process to be started on a set of nodes, usually all the nodes in the cluster.

The log destination may be internal to the Kubernetes cluster if the application providing the log service is hosted within the cluster.
The following are the configurations to set up the above logging mechanism:
1. Registering a sink controller:
apiVersion: apps.pivotal.io/v1beta1
kind: Sink
metadata:
  name: YOUR-SINK
  namespace: YOUR-NAMESPACE
spec:
  type: syslog
  host: YOUR-LOG-DESTINATION
  port: YOUR-LOG-DESTINATION-PORT
  enable_tls: true

2. Deploying a forwarding agent
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluentd
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: fluentd
  namespace: kube-system
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - namespaces
  verbs:
  - get
  - list
  - watch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: fluentd
roleRef:
  kind: ClusterRole
  name: fluentd
  apiGroup: rbac.authorization.k8s.io
subjects:
- kind: ServiceAccount
  name: fluentd
  namespace: kube-system
3. Deploying the DaemonSet
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-system
  labels:
    k8s-app: fluentd-logging
    version: v1
    kubernetes.io/cluster-service: "true"
spec:
  template:
    metadata:
      labels:
        k8s-app: fluentd-logging
        version: v1
        kubernetes.io/cluster-service: "true"
    spec:
      serviceAccount: fluentd
      serviceAccountName: fluentd
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: fluentd
        image: fluent/fluentd-kubernetes-daemonset:elasticsearch
        env:
          - name: FLUENT_ELASTICSEARCH_HOST
            value: "sample.elasticsearch.com"
          - name: FLUENT_ELASTICSEARCH_PORT
            value: "30216"
          - name: FLUENT_ELASTICSEARCH_SCHEME
            value: "https"
          - name: FLUENT_UID
            value: "0"
          # X-Pack Authentication
          # =====================
          - name: FLUENT_ELASTICSEARCH_USER
            value: "someusername"
          - name: FLUENT_ELASTICSEARCH_PASSWORD
            value: "somepassword"
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
      terminationGracePeriodSeconds: 30
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers


Saturday, September 21, 2019

Aspects of the monitoring feature in a containerization framework.
This article describes the salient features of tools used for monitoring containers hosted on an orchestration framework. Some products are dedicated to this purpose and strive to make this use-case easier for administrators. They usually fall into two categories: one exposes a limited set of built-in metrics that help, say, the master manage the pods; the other gives access to custom metrics that help with, say, horizontal scaling of resources.
Most metrics products for the Kubernetes orchestration framework, such as Prometheus, fall into the second category. The first category is rather lightweight and is served over HTTP via the resource APIs.
The use of metrics is a common theme, and the metrics are defined in a JSON format with key-value pairs and timestamps. They might also carry additional descriptive information that aids the handling of the metrics. Metrics are evaluated by expressions that usually look at a time-based window, which gives slices of data points. These data points allow the use of aggregate functions such as sum, min, average and so on.
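
As a minimal sketch of this idea (the Sample and MetricWindow classes below are illustrative, not any particular product's API), a metric sample can be modeled as a named value with a timestamp, and an aggregate such as the average can be computed over a time-based window:

import java.time.Instant;
import java.util.List;

// Illustrative only: a metric sample as a named value with a timestamp,
// plus an aggregate computed over a time-based window of samples.
public class MetricWindow {
    public static class Sample {
        final String name;
        final double value;
        final Instant timestamp;

        Sample(String name, double value, Instant timestamp) {
            this.name = name;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Average of all samples whose timestamps fall in the window [start, end).
    public static double windowAverage(List<Sample> samples, Instant start, Instant end) {
        return samples.stream()
                .filter(s -> !s.timestamp.isBefore(start) && s.timestamp.isBefore(end))
                .mapToDouble(s -> s.value)
                .average()
                .orElse(0.0);
    }
}
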
Metrics often follow their own route to a separate sink. In some deployments of the Kubernetes orchestration framework, the sink refers to an external entity that knows how to store and query metrics. The collection of metrics at the source and the forwarding of metrics to the destination follow conventional mechanisms similar to those for logs and audit events, both of which have their own sinks.
As with all agents and services running in containers, a secret or a service account is required to control access to the resources they use. Role-based access control, together with namespace and global naming conventions, is a prerequisite for any agent.
The agent has to run continuously, forwarding data with little or no disruption. Some orchestrators facilitate this with concepts similar to a DaemonSet that run endlessly. The deployment is verified to be working correctly when a standard command produces the same output as a pre-defined expected output. Verification of monitoring capabilities thus becomes part of the installation feature.
Metrics become useful when evaluated against thresholds that trigger alerts. This mechanism completes the monitoring framework, which allows rules to be written with expressions involving thresholds that then raise suitable alerts. Alerts may be delivered via messages, email or any other form of notification service. Dashboards and mitigation tools may be provided by a product offering a full monitoring solution.
Almost all activities of any resource in the orchestrator framework can be measured. These include the core system resources, which may send their data to logs or to the audit event stream. The option to combine metrics, audit and logs effectively rests with the administrator rather than with a product designed around one or more of these.
Specific queries help with the charts for monitoring dashboards. These are well-prepared and part of a standard portfolio that helps with the analysis of the health of the containers.

Friday, September 20, 2019

We looked at a few examples of applying audit data for the overall product. In all these cases, the audit dashboards validate the security and integrity of the data.

  • The incident review audit dashboard provides an overview of the incidents associated with users. 
  • The Suppression audit dashboard provides an overview of notable event suppression activity. 
  • The Per-Panel Filter Audit dashboard provides information about the filters currently in use in the deployment.
  • The Adaptive Response Action Center dashboard provides an overview of the response actions initiated by adaptive response actions, including notable event creation and risk scoring.
  • The Threat Intelligence Audit dashboard tracks and displays the current status of all threat and generic intelligence sources. 
  • The product configuration health dashboard is used to compare the latest installed version of the product to prior releases and identify configuration anomalies. 

The Data model audit dashboard displays the information about the state of data model accelerations in the environment. Acceleration here refers to a speed up of data models that represent extremely large datasets where certain operations such as pivots become faster with the use of data-summary backed methods.
The connectors audit reports on hosts forwarding data to the product. This audit serves as an example for all other components that participate in data handling.
The data protection dashboard reports on the status of the data integrity controls.
Audit dashboards provide a significant opportunity to show complete, rich, start-to-finish user session activity data in real time. This includes all access attempts, session commands, data accessed, resources used, and more. Dashboards can also make it compelling and intuitive for administrators to intervene in the user experience. Security information and event management data can be combined from dedicated systems as well as from application audit. This helps to quickly and easily resolve security incidents. The data collection can be considered tamper-proof, which makes the dashboard the source of truth.

Thursday, September 19, 2019


Let us look at a few examples of applying audit data for the overall product. In all these cases, the audit dashboards validate the security and integrity of the data. The audit data must be forwarded and the data should not be tampered with.
The incident review audit dashboard provides an overview of the incidents associated with users. It displays how many incidents are associated with a specific user. The incidents may be selected based on different criteria, such as by status, by user, by kind, or by other forms of activity. Recent activities also help determine the relevance of the incidents.
The Suppression audit dashboard provides an overview of notable event suppression activity. This dashboard shows how many events are being suppressed, and by whom, so that notable event suppression can be audited and reported on. Suppression is just as important as an audit of access to resources.
The Per-Panel Filter Audit dashboard provides information about the filters currently in use in the deployment.
The Adaptive Response Action Center dashboard provides an overview of the response actions initiated by adaptive response actions, including notable event creation and risk scoring.
The Threat Intelligence Audit dashboard tracks and displays the current status of all threat and generic intelligence sources. As an analyst, you can review this dashboard to determine if threat and generic intelligence sources are current, and troubleshoot issues connecting to threat and generic intelligence sources.
The ES configuration health dashboard is used to compare the latest installed version of the product to prior releases and identify configuration anomalies. The dashboard can be made to review against specific past versions.
The Data model audit dashboard displays the information about the state of data model accelerations in the environment. Acceleration here refers to a speed up of data models that represent extremely large datasets where certain operations such as pivots become faster with the use of data-summary backed methods.
The connectors audit reports on hosts forwarding data to the product. This audit serves as an example for all other components that participate in data handling.
The data protection dashboard reports on the status of the data integrity controls.

Wednesday, September 18, 2019

There are a few caveats to observe when processing events as opposed to other resources with webhooks. 

Events can come in a cascade, so we cannot cause undue delay in processing any one of them. Events can cause the generation of other events. If the webhook is responsible for transforming and creating new events that in turn cause additional events, those additional events have to be stopped from being generated. The same event can occur again if the transformed event triggers the original event, or if the transformed event ends up in the webhook again because it is of the same kind as the original event. The webhook has to skip the processing of an event it has already handled, as the sketch below illustrates.
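
A minimal sketch of that de-duplication, assuming each event carries a unique identifier such as its UID (the class below is illustrative and not any Kubernetes client API):

import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Illustrative de-duplication: remember a bounded set of recently handled event ids
// so that a repeated event, or a transformed copy that re-enters the webhook, is skipped.
public class EventDeduplicator {
    private static final int MAX_REMEMBERED = 10_000;

    // Insertion-ordered set backed by a LinkedHashMap that evicts the oldest entry
    // once the bound is exceeded.
    private final Set<String> seen = Collections.newSetFromMap(
            new LinkedHashMap<String, Boolean>() {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                    return size() > MAX_REMEMBERED;
                }
            });

    // Returns true if the event should be processed, false if it was seen before.
    public synchronized boolean shouldProcess(String eventUid) {
        return seen.add(eventUid);
    }
}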

The core resources of Kubernetes are all subject to the same verbs: get, create, delete, and so on. However, events are far more numerous and occur at a faster clip than some of the other resources. This calls for handling and robustness equivalent to the considerations in a networking protocol or a messaging queue. The webhook is not designed to handle such a rate or latency, even though it might skip a lot of the events. Consequently, selector labels and processing criteria become all the more important. 

The performance considerations of event processing by webhooks aside, a transformation of an event by making an HTTP request to an external service also suffers from outbound-side considerations such as timeouts and retries on making the HTTP request. Since the performance profile of webhooks for events has been called out above as significantly different from the few-and-far-between processing for other resources, the webhook might have to make do with lossy transformation as opposed to persistence-based buffering.
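
One way to realize that trade-off is a lossy, in-memory hand-off along the following lines; the queue size, endpoint and timeout values are illustrative assumptions rather than recommendations.

import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative lossy buffering: when the bounded queue is full, new transformed
// events are dropped rather than persisted, keeping the webhook itself responsive.
public class LossyForwarder {
    private final BlockingQueue<String> buffer = new ArrayBlockingQueue<>(1000);

    // Called from the webhook path; never blocks, drops the event when the buffer is full.
    public boolean enqueue(String transformedEvent) {
        return buffer.offer(transformedEvent);
    }

    // Called from a background thread; posts one buffered event with explicit timeouts.
    public void forwardOne(String endpoint) throws InterruptedException, IOException {
        String payload = buffer.take();
        HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
        conn.setConnectTimeout(2000);   // fail fast on connect
        conn.setReadTimeout(2000);      // and on slow responses
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(payload.getBytes(StandardCharsets.UTF_8));
        }
        // A non-2xx status is simply logged here; retries are intentionally omitted.
        System.err.println("forward status: " + conn.getResponseCode());
        conn.disconnect();
    }
}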

None of the above criteria should matter when this kind of webhook is used for diagnostics, which is most relevant to troubleshooting analysis. Diagnostics place no performance requirement on the webhook. When the events are already filtered to audit events, and when the conversion of those events is even more selective, the impact on diagnostics from changes in the rate and delay of incoming events is little to none.