Wednesday, August 7, 2019

Relying on Kubernetes cluster-only log storage does not compete with log services external to the cluster. The cluster can be self-sufficient for a limited volume of logs, while the external services can provide durable storage along with all the best practices of storage engineering. Cluster-only log storage can leverage the high availability and load balancing indigenous to the cluster, while the external services can divert log-reading load away from the cluster itself. The cluster can reduce points of failure and facilitate capture of console sessions and other diagnostic actions taken on the cluster, while the services external to the cluster cannot change the source of truth. Cluster-specific log collection also allows fluentd rules to be specified, whereas services outside the cluster have to rely on classification.
The advantage of having cluster-specific rules distinct from cluster-external rules, over having no differentiation at all, is significant and a win-win for application deployers as well as their customers. Consider the use case where the analysis over logs needs proprietary logic, or query hints or annotations that help product support but need not be part of the queries that customers run. Taking this to another extreme, the cluster-deployed application may want a competitive advantage over the marketplace log-store capabilities outside the cluster. These use cases broaden the horizon for log storage, especially when the application is itself a storage product.
Let us take another use case where the cluster-specific solution provides interim remediation for disaster recovery, especially when the points of failure are the external services. In such a failure, the user would otherwise remain blind to the operations of the cluster, since the cluster-external log services are not giving visibility into the latest log entries. Similarly, an external network connection may have been taken for granted, whereas the administrator may find it easier to retrieve the logs from the cluster and send them offline for analysis by remote teams. The dual possibility of internal and external stores provides benefits for many other product perspectives.

Tuesday, August 6, 2019

It is customary to have specific log queries against the introspection log store within the cluster because they capture some of the common root-cause analyses performed by product Support and Sustaining Engineering. Some of these queries are mentioned in the attached document:
https://1drv.ms/w/s!Ashlm-Nw-wnWtCFt6NmEo6HhZmWm 
It would be even more flexible to have support for more than one kind of log store. For example, we could have a cluster-internal log store and a cluster-external log store. The external log store could be a time-series store, a database, or a web-accessible object store, each participating in different workflows with different usages.
These log queries then become as useful as log-specific d-scripts. The stores themselves may help with visualizations for these and other queries.
Provisioning of stores for logs external to the cluster depends on the service broker. This may be done during cluster setup or when the logs become available. The logs usually have a retention period of nearly five years, although queries mostly target the last 90 days. The maintenance of the store is not a concern as long as it is hosted in a public cloud.

Monday, August 5, 2019

The storage product is a sink for the log streams, but it is not the only sink. A log appender helps with registering other sinks and transferring the data to them. Using more than one sink helps publish the data for different usages. Some of those usages might involve notifications.
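As a rough illustration of that fan-out, here is a minimal Java sketch; the LogSink interface, the FanOutAppender class, and the sinks shown are hypothetical names used only for this example.

import java.util.ArrayList;
import java.util.List;

// Hypothetical sink abstraction: anything that can accept a log entry.
interface LogSink {
    void write(String entry);
}

// Hypothetical appender that registers several sinks and forwards every entry to each of them.
class FanOutAppender {
    private final List<LogSink> sinks = new ArrayList<>();

    void register(LogSink sink) {
        sinks.add(sink);
    }

    void append(String entry) {
        for (LogSink sink : sinks) {
            sink.write(entry);   // e.g., the storage product, an external store, a notifier
        }
    }
}

public class FanOutDemo {
    public static void main(String[] args) {
        FanOutAppender appender = new FanOutAppender();
        appender.register(entry -> System.out.println("storage-product <- " + entry));
        appender.register(entry -> System.out.println("notification    <- " + entry));
        appender.append("volume create succeeded");
    }
}

In practice, the appender facilities of the logging library already in use would play this role rather than a hand-rolled class.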

Sunday, August 4, 2019

Logs, like any other data, are subject to extract-transform-load, especially for deriving properties and annotations that serve to augment the information associated with the logs and facilitate search. The log channels are independent and can be numerous. Since each channel can be viewed as a data stream, it is also possible to customize log searches to be over streams rather than time-series buckets.
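As a small illustration of the transform step, the sketch below derives a few searchable properties from a raw log line with a regular expression; the line format and the property names are assumptions made only for this example.

import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogAnnotator {
    // Assumed line format: "<timestamp> <LEVEL> <component>: <message>"
    private static final Pattern LINE =
        Pattern.compile("^(\\S+)\\s+(INFO|WARN|ERROR)\\s+(\\S+):\\s+(.*)$");

    // Derive annotations that augment the log entry and facilitate search.
    static Map<String, String> annotate(String line) {
        Map<String, String> properties = new HashMap<>();
        Matcher m = LINE.matcher(line);
        if (m.matches()) {
            properties.put("timestamp", m.group(1));
            properties.put("level", m.group(2));
            properties.put("component", m.group(3));
            properties.put("message", m.group(4));
        }
        return properties;
    }

    public static void main(String[] args) {
        System.out.println(annotate("2019-08-04T10:15:30Z ERROR volumeservice: replication lag exceeded"));
    }
}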
In fact, if the logs are being generated with the use of a storage product, then storage for the logs can be earmarked in the same product as one of the sinks. This reservation is only for introspection on the operations of the product. The persistence plays a major part in doing away with the need for external systems. It also helps with dogfooding the product. This is essential because ongoing operations continuously produce more logs, and the more data there is, the better the storage product is exercised.

Saturday, August 3, 2019

Although logs are sanitized prior to persistence, their accessibility should also be secured.
This can be done by securing the logs in transit as well as at rest. For example, the logs may be sent over transport layer security. The service broker responsible for provisioning the log-handling service external to the cluster may use and validate tokens on the control plane before the logs are sent over the data plane, and may register access only for those users. This facilitates Role Based Access Control over all the control resources provisioned to be used with the data.
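A minimal sketch of shipping a batch of log entries over transport layer security with a bearer token, assuming Java 11 or later; the endpoint URL, the token environment variable and the payload shape are hypothetical.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class LogShipper {
    public static void main(String[] args) throws Exception {
        String token = System.getenv("LOG_SERVICE_TOKEN");   // token issued and validated via the broker
        String batch = "{\"entries\":[\"volume create succeeded\"]}";

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://logs.example.com/v1/ingest"))   // hypothetical https endpoint
                .header("Authorization", "Bearer " + token)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(batch))
                .build();

        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("ingest status: " + response.statusCode());
    }
}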
Obfuscation of the data was mentioned earlier as something to be controlled at the source. This is made possible with the help of pattern matching in log filters. The log filters are convenient when the log entries are flushed to file. In most other cases, though, the log entries have to be sanitized afterwards. This is made possible by processing the text in downstream systems or at the log index store. For example, log storage products are already equipped to remove Personally Identifiable Information that must be removed for compliance.
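As a rough sketch of the pattern matching idea, the filter below masks e-mail addresses and card-like digit sequences before an entry is persisted; the patterns and the masking policy are only illustrative assumptions, not a complete compliance solution.

import java.util.regex.Pattern;

public class PiiScrubber {
    // Illustrative patterns; a real deployment would maintain a vetted, compliance-driven list.
    private static final Pattern EMAIL = Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");
    private static final Pattern CARD  = Pattern.compile("\\b(?:\\d[ -]?){13,16}\\b");

    static String scrub(String entry) {
        String masked = EMAIL.matcher(entry).replaceAll("<email>");
        return CARD.matcher(masked).replaceAll("<number>");
    }

    public static void main(String[] args) {
        System.out.println(scrub("user jane.doe@example.com paid with 4111 1111 1111 1111"));
    }
}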

Friday, August 2, 2019

We were discussing the sidecar model for logging. We also discussed the service broker model for logging. Keycloak deployments are also facilitated with service brokers. The vanilla Keycloak deployment uses http. Usually a Keycloak pod is hosted within the cluster, and since the traffic is internal to the cluster, there is no security issue.
However, the traffic may need to be secured with tls when Keycloak is hosted outside the cluster or when the service broker implementation uses services outside the cluster.
Let us take a quick look at how to do this:
1) First we create a PKCS12 keystore.
2) Then we import it as a JKS keystore (a Java sketch of these first two steps appears after step 4 below).
3) Then we set the security realm in the standalone.xml config:

<security-realm name="ProductRealm">
    <server-identities>
        <ssl>
            <keystore path="keycloak.jks" relative-to="jboss.server.config.dir" keystore-password="password"/>
        </ssl>
    </server-identities>
</security-realm>

4) Then we set the security-realm on the https-listener as shown below:
<https-listener name="https" socket-binding="https" security-realm="ProductRealm" enable-http2="true"/>
<host name="default-host" alias="localhost">
    <location name="/" handler="welcome-content"/>
    <http-invoker security-realm="ProductRealm"/>
</host>
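The keystore steps are normally done with the JDK keytool; purely as an illustration of what steps 1 and 2 amount to, here is a minimal Java sketch, assuming the PKCS12 file is named keycloak.p12 and that its password matches the one referenced in the configuration above.

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.security.KeyStore;
import java.util.Enumeration;

public class KeystoreConverter {
    public static void main(String[] args) throws Exception {
        char[] password = "password".toCharArray();   // demo password, same as in standalone.xml above

        // Step 1: load the PKCS12 keystore that holds the server certificate and key.
        KeyStore p12 = KeyStore.getInstance("PKCS12");
        try (FileInputStream in = new FileInputStream("keycloak.p12")) {   // assumed file name
            p12.load(in, password);
        }

        // Step 2: copy every entry into a JKS keystore that the security realm can reference.
        KeyStore jks = KeyStore.getInstance("JKS");
        jks.load(null, password);
        KeyStore.ProtectionParameter protection = new KeyStore.PasswordProtection(password);
        Enumeration<String> aliases = p12.aliases();
        while (aliases.hasMoreElements()) {
            String alias = aliases.nextElement();
            if (p12.isKeyEntry(alias)) {
                jks.setEntry(alias, p12.getEntry(alias, protection), protection);
            } else {
                jks.setCertificateEntry(alias, p12.getCertificate(alias));
            }
        }
        try (FileOutputStream out = new FileOutputStream("keycloak.jks")) {
            jks.store(out, password);
        }
    }
}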

If the deployments were all within the cluster, another option to secure the services would be to use a service mesh.

Thursday, August 1, 2019

Kubernetes has an interesting behavior due to its garbage collector. Resources carry owner references, and the collector deletes objects that once had an owner and now do not. These objects are deleted automatically. When the dependents are deleted automatically along with the owner, it is called cascading deletion. There are two types of cascading deletion: foreground and background.
If the object is deleted without deleting its dependents automatically, the dependents are said to be orphaned.
Every dependent object has a metadata field named ownerReferences that points to the owning object. However, Kubernetes often sets the ownerReferences value automatically; this is true for objects created or adopted by a ReplicationController, ReplicaSet, StatefulSet, DaemonSet, Deployment, Job or CronJob.
The ownerReferences field is dumped along with the metadata when we use the kubectl command to display the resource. Manually overriding the ownerReferences is risky because the value is not corrected automatically and, if improperly specified, the objects may be orphaned.
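As a rough sketch, the ownerReferences can also be read programmatically; the example below assumes the fabric8 Kubernetes Java client and uses illustrative namespace and pod names.

import io.fabric8.kubernetes.api.model.OwnerReference;
import io.fabric8.kubernetes.client.DefaultKubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClient;

public class OwnerReferenceDump {
    public static void main(String[] args) {
        try (KubernetesClient client = new DefaultKubernetesClient()) {
            // Print each owner recorded on the pod's metadata (typically the ReplicaSet that created or adopted it).
            for (OwnerReference ref : client.pods()
                                            .inNamespace("default")
                                            .withName("example-pod")          // hypothetical pod name
                                            .get()
                                            .getMetadata()
                                            .getOwnerReferences()) {
                System.out.println(ref.getKind() + "/" + ref.getName() + " controller=" + ref.getController());
            }
        }
    }
}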
Cross-namespace owner references are not allowed by design. Moreover, the dependents have to be in the same or a smaller scope than the owner. For example, namespace-scoped dependents can only specify owners in the same namespace, not owners in a different namespace. They can, however, specify owners that are cluster-scoped. Cluster-scoped dependents can only specify cluster-scoped owners, not namespace-scoped owners.
When an object is deleted and its dependents are deleted along with it, the deletion is cascading. Foreground and background cascading deletion differ in when the owner is removed. Foreground deletion puts the root object in a 'deletion in progress' state and removes the dependents first, whereas background deletion deletes the owner object immediately and removes the dependents afterwards.
The deletion policy is specified by setting the propagationPolicy field on the deleteOptions argument. The value can be one of Orphan, Foreground, or Background. When deleting Deployments we must use propagationPolicy: Foreground to delete not only the ReplicaSets created but also their pods.
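A minimal sketch of requesting foreground cascading deletion from code, again assuming the fabric8 Kubernetes Java client (a recent version that exposes withPropagationPolicy) and an illustrative deployment name; kubectl passes the same propagationPolicy through its delete options.

import io.fabric8.kubernetes.api.model.DeletionPropagation;
import io.fabric8.kubernetes.client.DefaultKubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClient;

public class CascadingDelete {
    public static void main(String[] args) {
        try (KubernetesClient client = new DefaultKubernetesClient()) {
            // Foreground propagation: the Deployment stays in 'deletion in progress' until its
            // ReplicaSets and their pods are gone, and only then is the Deployment itself removed.
            client.apps().deployments()
                  .inNamespace("default")
                  .withName("example-deployment")     // hypothetical deployment name
                  .withPropagationPolicy(DeletionPropagation.FOREGROUND)
                  .delete();
        }
    }
}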
#codingexercise
int getToDo(List<Task> tasks) {
    // Count the tasks that have not been completed yet.
    int count = 0;
    for (Task task : tasks) {
        if (!task.Completed) {
            count += 1;
        }
    }
    return count;
}