Wednesday, April 1, 2020

Unlike traditional application frameworks, Kubernetes gives service accounts a special meaning. Service accounts remain distinct from user accounts, in that the former represent applications and the latter represent users, but Kubernetes persists only service accounts, delegating user accounts to identity providers. The rationale for not persisting user accounts is that Kubernetes does not authenticate users; it validates assertions. Each request must assert an identity to Kubernetes, and there is no web interface, login or session associated with this identity. Every action is expressed as an API request, and each request is unique, distinct and self-contained for authentication and authorization purposes. Identity is merely an assertion carried in these requests.
Service accounts are meant for interactions between applications and Kubernetes resources. If one is not provided, the admission controller injects a service account at the time of resource creation; this is the default service account and is named as such. A service account should never be mixed with a user account, otherwise both suffer several drawbacks. It should be authorized only with role-based access control, since any other scheme will produce far too many items to audit, and it should never be leaked, otherwise it can be abused indefinitely.
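As a minimal sketch, a dedicated service account authorized only through RBAC might look like the following; all names here (`app-reader`, `pod-reader`, `demo`) are illustrative, not prescribed:

```yaml
# A dedicated service account, authorized only through RBAC.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-reader
  namespace: demo
---
# A narrowly scoped role: read-only access to pods in this namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: demo
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
# Bind the role to the service account; this is the only grant it has.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: app-reader-binding
  namespace: demo
subjects:
- kind: ServiceAccount
  name: app-reader
  namespace: demo
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```

Keeping the grant to a single Role makes the audit surface exactly one binding per account.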
This last item calls for rotation of service accounts, so that the misuse of a leaked account is limited to the window between its discovery and the issue of a new account. A service account is represented by a token mounted at /var/run/secrets/kubernetes.io/serviceaccount in a Kubernetes pod. This token can be injected and made to work against the same resources as before with the help of wrapping one secret into another.
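The mounted token can be seen from inside any container in the pod; a throwaway pod like the following (the account name is the hypothetical one from above) shows where the kubelet places it:

```yaml
# Illustrative pod; the kubelet mounts the service account token
# at the well-known path inside every container.
apiVersion: v1
kind: Pod
metadata:
  name: token-demo
spec:
  serviceAccountName: app-reader   # hypothetical account name
  containers:
  - name: main
    image: busybox
    command: ["sh", "-c",
      "cat /var/run/secrets/kubernetes.io/serviceaccount/token && sleep 3600"]
```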
The persistence of service accounts and their usage as secrets make them a natural fit for an autonomous secret management system that can keep track of secrets and rotate them as necessary. An external key manager that manages keys and certificates was built for a similar purpose, where the secret was a key or certificate. That system can work well for Kubernetes, since the framework poses no limitation on injection or rotation.
The automation of periodic and on-demand rotation improves security and convenience for all Kubernetes usage.

Tuesday, March 31, 2020

There are a few advantages to using manifests and charts with custom resources that are self-contained, as opposed to those that are provisioned external to the Kubernetes infrastructure via service brokers. When the charts are self-contained, they are completely within the Kubernetes system and accessible for create, update and delete via kubectl. The kube-apiserver and the operators take care of state reconciliation, and the system becomes the source of truth. This is not the case with external resources, which can deviate from the truth to varying degrees depending on the external service provider.
Resources that are provisioned by a service broker can also be kustomized. The charts for these resources are oblivious to the provisioners; it is the kube-apiserver that determines which provisioners are to be tapped for a resource, looking them up in the service catalog. The binding between the catalog and the provisioner is a loose one: they can get out of sync, but the catalog guarantees a global registry. Manifests and charts work well to describe and define these externally provisioned custom resources.
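As a sketch, the service-catalog resources that tie a cluster to a broker-provisioned service look like the following; the class, plan and secret names are illustrative:

```yaml
# Request an externally provisioned instance by class and plan,
# as registered in the service catalog.
apiVersion: servicecatalog.k8s.io/v1beta1
kind: ServiceInstance
metadata:
  name: demo-db
  namespace: demo
spec:
  clusterServiceClassExternalName: example-database
  clusterServicePlanExternalName: standard
---
# The binding materializes the broker-issued credentials into a secret.
apiVersion: servicecatalog.k8s.io/v1beta1
kind: ServiceBinding
metadata:
  name: demo-db-binding
  namespace: demo
spec:
  instanceRef:
    name: demo-db
  secretName: demo-db-credentials
```

The charts that carry these definitions never name the provisioner; the catalog resolves the class and plan to a broker at provisioning time.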
Resources that are provisioned externally have their own data management systems, since they are external services responsible for the data they store. This data can also be exported if the external service provides an option. The Kubernetes resource can then merely include a resource identifier or a URI for the data associated with the external resource. In such a case, the resource becomes exclusively a control-plane object.
There is a size limit to a self-contained Kubernetes resource. We can embed Kubernetes secrets in them, but we cannot embed arbitrary byte ranges, and that would not be appropriate anyway, since a resource is meant for handling in the control plane. Any tarball of data should be downloadable via a URI from its respective service. This keeps the concerns of the control plane and the data plane separate. It is also easy on the kube-apiserver, while the synchronization between the custom resource and the external service providers is delegated elsewhere.
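A custom resource that follows this separation carries only a reference to its data; in this sketch, the group, kind and field names are all hypothetical:

```yaml
# Sketch of a control-plane-only custom resource: the bulk data stays
# with the external service and is reachable only through a URI.
apiVersion: example.com/v1alpha1
kind: Dataset
metadata:
  name: archive-2020-03
spec:
  sizeBytes: 10737418240
  dataRef:
    uri: https://external-service.example.com/exports/archive-2020-03.tar.gz
```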
The size and the separation of concerns between the Kubernetes resource and its associated data do not stop with the external service and the kube-apiserver. The binding between the resource and the external service is managed with the help of a service catalog, which lets services be hosted outside the cluster. It adheres to the Open Service Broker API. It allows services to be independent and scalable, and to define their own resources.

Monday, March 30, 2020

Both persistent volumes and network-accessible storage refer to disk storage for Kubernetes, which is a portable, extensible, open-source platform for managing containerized workloads and services. Adopting it is often a strategic decision for a company, because it improves the business value of their offering while relieving the business logic from host-specific chores, so that the same logic can work elsewhere with minimal disruption to its use.
Kubernetes provides a familiar notion of a shared storage system with the help of VolumeMounts accessible from each container. A volume mount is a shared file system that may be considered local to the container and reused across containers. File system protocols have always facilitated local and remote file storage with their support for distributed file systems. This allows databases, configurations and secrets to be available on disk across containers and provides a single point of maintenance. Most storage, regardless of access protocol (file system protocols, http(s), block or stream), essentially moves data to storage, so there is transfer and latency involved.
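A minimal sketch of the idea: two containers in one pod sharing a volume through their volumeMounts (the pod and volume names are illustrative, and an emptyDir stands in for whatever backing storage is chosen):

```yaml
# Two containers sharing one volume; the reader mounts it read-only.
apiVersion: v1
kind: Pod
metadata:
  name: shared-volume-demo
spec:
  volumes:
  - name: shared-data
    emptyDir: {}
  containers:
  - name: writer
    image: busybox
    command: ["sh", "-c", "echo hello > /data/msg && sleep 3600"]
    volumeMounts:
    - name: shared-data
      mountPath: /data
  - name: reader
    image: busybox
    command: ["sh", "-c", "sleep 5 && cat /data/msg && sleep 3600"]
    volumeMounts:
    - name: shared-data
      mountPath: /data
      readOnly: true
```

Both containers see the same files at their own mount paths, which is what makes the volume a single point of maintenance.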
The only question has been what latency and I/O throughput are acceptable for the application, and this has guided the decisions for storage systems, appliances and their integrations. When storage is tightly coupled with compute, such as between a database server and a database file, all the reads and writes measured in performance benchmarks require careful arrangement of bytes, their packing, organization, indexes, checksums and error codes. But most applications hosted on Kubernetes do not have the same requirements as a database server.
This design and relaxation of performance requirements from applications hosted on Kubernetes facilitates different connectors not just volume mounts. The notion that the same data can be sent to a destination regardless of the destination has been successfully demonstrated by log appenders which publish logs to a variety of destinations. Connectors, too, can help persist data written from the application to a variety of storage providers using consolidators, queues, cache and mechanisms that know how and when to write the data.
The native Kubernetes API does not support any storage connector other than the VolumeMount, but it does allow services to be written as Kubernetes applications that accept data published over http(s), just as a time-series database server accepts all kinds of events over a web protocol. The configuration of the endpoint, the binding of the service and the contract associated with the service for the connector definition may vary from destination to destination within the same data-publishing application. This may call for the application to become a consolidator that can provide different storage classes and support different data workload profiles. Appenders and connectors are popular design patterns that get reused often and justify their business value.
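Such a connector, hosted as a Kubernetes application, is exposed like any other in-cluster service; this sketch assumes a hypothetical consolidator application behind it, and the names and ports are illustrative:

```yaml
# Sketch: an in-cluster endpoint that accepts data over http(s);
# the "consolidator" app behind the selector is hypothetical.
apiVersion: v1
kind: Service
metadata:
  name: storage-connector
spec:
  selector:
    app: consolidator
  ports:
  - name: ingest
    protocol: TCP
    port: 443
    targetPort: 8443
```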
The shared data volume can be made read-only and accessible only to the pods, which facilitates access restrictions. While authentication, authorization and audit can be enabled for storage connectors, they will still require RBAC access; therefore, service accounts become necessary with storage connectors. A side benefit of this security is that the accesses can now be monitored and alerted on.

Sunday, March 29, 2020

This article introduces its readers to long-duration testing tools for storage engineering products. Almost all products made for digital storage have to deal with bytes of data that usually have a start offset, an end offset and content for the range of bytes. It is the pattern of reading and writing various byte ranges that generates load on the storage products. Whether the destination for data is a file in a folder, a blob in a bucket, a content-addressable store in a remote system or a stream in a scope, the requirements for reading and writing large amounts of data remain more or less the same. This would make most load or stress testing tools appear similar in nature, with a standard battery of tests. This article calls out some of those recurring tests, but it might also surprise us to know that these tools, much like the products they test, differ even within the same category, playing to one or more of their strengths.
These tests include: 
1) Generating precise storage I/O for short and long durations
2) Generating and controlling TCP or UDP traffic with full flexibility
3) Generating real-world data access patterns in terms of storage containers, their count and size
4) Supporting byte-range validations with hashing, journaling of reads and writes, replay of load patterns, and configurable compression
5) Identifying, locating and resolving system malfunctions through logging and support for test reports
6) Configuring target systems, the inclusion of loads and the parameters necessary for test automation
7) Testing across hundreds of remote clients and servers from a single controller
8) Determining server-side performance exclusively from client-side reporting
9) Pausing, resuming and showing progress
10) Providing complete and accurate visibility into the server or component of a storage product that shows anomalies
11) Overloading the system until it breaks, and determining the threshold both before the surge and after thrashing
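Several of these capabilities tend to surface as knobs in a tool's test plan. The following is a purely hypothetical configuration, not from any specific product; every field name is illustrative:

```yaml
# Hypothetical load-test plan covering some of the capabilities above.
test: byte-range-soak
duration: 72h              # long-duration run
targets:
  - endpoint: https://storage.example.com
clients: 200               # remote clients driven from a single controller
workload:
  pattern: random          # real-world access pattern
  containers: 1000
  objectSize: 4MiB
validation:
  hashing: sha256          # byte-range validation
  journal: true            # journal reads and writes for later replay
reporting:
  clientSideOnly: true     # server-side performance inferred from clients
  progress: true           # pause, resume and show progress
```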

Saturday, March 28, 2020

One thing manifests and charts should not do is rely exclusively on kustomizations to chain ownerships. Those belong in the chart or the operator code, which exclusively define the Kubernetes resources. There is an inherent attribute on every Kubernetes resource called ownerReferences that allows us to chain the construction and deletion of Kubernetes objects. This is very valuable because the kube-apiserver will perform these actions automatically. Specifying the chaining together with the resource makes it clear to the Kubernetes control plane that these objects are first-class citizens that it needs to manage; specifying it elsewhere leaves the onus on the application, not the kube-apiserver. If there is logic that cannot be handled via static yaml declarations, then it belongs in the operator code, not the charts and manifests.
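A sketch of such chaining: a child object declares its owner via ownerReferences, and the garbage collector deletes the child automatically when the owner goes away. The names are placeholders, and the uid must be that of the live owner object:

```yaml
# Child resource whose lifetime is chained to a parent Deployment.
apiVersion: v1
kind: ConfigMap
metadata:
  name: child-config
  ownerReferences:
  - apiVersion: apps/v1
    kind: Deployment
    name: parent-deployment
    uid: d9607e19-f88f-11e6-a518-42010a800195   # uid of the live owner
    controller: true
    blockOwnerDeletion: true
```

Because the uid is only known at runtime, operators typically set this field in code rather than in static yaml, which is exactly the point above.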

Friday, March 27, 2020

Kustomization allows us to create object types that are not defined by the Kubernetes cluster. These object types allow us to group objects so that they may all have overlaid or overridden configuration. One possible use case is the overriding of all images to be loaded from a specific internal registry. These images are treated as an object type to distinguish them from the Kubernetes image object. Regardless of their origin, these objects can now uniformly be loaded from the internal registry.
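The images transformer in a kustomization.yaml is one way to express this; the image name, registry host and tag below are illustrative:

```yaml
# kustomization.yaml — redirect a matching image to an internal registry.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- deployment.yaml
images:
- name: nginx
  newName: registry.internal.example.com/nginx
  newTag: "1.17"
```

Every occurrence of the named image across the listed resources is rewritten, regardless of which manifest it came from.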
The use of manifests for object types is a way of grouping resources for these changes. They can be modified to enhance annotations and labels. Those annotations are not limited to post-configuration processing or interpretation; they can be used up front with a selector to determine whether they are candidates for creating a resource.
The use of labels is similar to the use of annotations in that they are helpful for immediate use in templates from the charts. The grouping of resources by labels can in fact be specified as a static yaml configuration suitable for use within a chart, with the label as the parameter and a selector specified to match the label.
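A sketch of a chart template that takes the label as a parameter and uses a matching selector; the value names (`groupLabel`, `image`) are illustrative:

```yaml
# Deployment template: one label value from the values file drives
# both the labels and the selector that matches them.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-app
  labels:
    group: {{ .Values.groupLabel }}
spec:
  selector:
    matchLabels:
      group: {{ .Values.groupLabel }}
  template:
    metadata:
      labels:
        group: {{ .Values.groupLabel }}
    spec:
      containers:
      - name: app
        image: {{ .Values.image }}
```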
These mechanisms show a way to have the manifest generate what is needed by the charts in addition to their already well-established role in applying overlays and overrides via kustomization.



Thursday, March 26, 2020

The use of charts and manifests explained:
Kustomization, with all its benefits as described in earlier posts, also serves as template-free configuration.
When charts are authored, they have to be general purpose, with the user having the ability to specify the parameters for deployment. Most of these options are saved in the values file. Even if there is more than one chart, the user can provide these values to all the charts via a wrapper values file. These values are then used to override the default values that may come with the associated chart. The templates take these options from the values files and use them to create the Kubernetes resources as necessary, which makes a template work the same regardless of the deployment. The syntax used in the templates is quite flexible for reading options from the values file. Other than these parameters, the templates are mere static definitions of Kubernetes resources and encourage consistency throughout the definitions, such as with annotations and labels.
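A wrapper values file that overrides a chart's bundled defaults can be as small as this; all keys shown are illustrative and depend on what the chart's templates actually read:

```yaml
# values.yaml — user-supplied overrides for the chart's defaults.
image:
  repository: registry.internal.example.com/app
  tag: "2.4.1"
replicaCount: 3
annotations:
  team: storage
```

Applied with something like `helm install my-release ./chart -f values.yaml`, these values take precedence over the defaults shipped inside the chart.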
When the charts are used with a values file, kustomization is not necessary; it is just a value-add, per the discussion above. The user can choose between specifying manifests and charts to suit their deployment. They serve different purposes, and it would behoove the user to use either or both appropriately.
The user also has the ability to compose multiple values files and use them together, comma-separated, when specifying how to apply the chart. This allows the user to keep the options in each values file granular and then include them on a case-by-case basis.