Monday, March 30, 2020

Both persistent volumes and network-accessible storage refer to disk storage for Kubernetes, which is a portable, extensible, open-source platform for managing containerized workloads and services. Storage is often a strategic decision for any company because it improves the business value of their offering while relieving their business logic from host-specific chores, so that the same logic can work elsewhere with minimal disruption to its use.
Kubernetes provides a familiar notion of a shared storage system with the help of VolumeMounts accessible from each container. A volume mount is a shared file system which may be considered local to the container and reused across containers. File system protocols have always facilitated local and remote file storage with their support for distributed file systems. This allows databases, configurations and secrets to be available on disk across containers and provides a single point of maintenance. Most storage access, regardless of protocol – file system, http(s), block or stream – essentially moves data to storage, so there is transfer and latency involved.
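As a minimal sketch (the names and images here are illustrative, not from any particular deployment), a volume shared between two containers of the same pod could be declared as:
apiVersion: v1
kind: Pod
metadata:
  name: shared-volume-demo
spec:
  volumes:
  - name: shared-data
    emptyDir: {}        # a persistentVolumeClaim could be used instead for durable storage
  containers:
  - name: writer
    image: busybox
    command: ["sh", "-c", "echo hello > /data/hello.txt && sleep 3600"]
    volumeMounts:
    - name: shared-data
      mountPath: /data
  - name: reader
    image: busybox
    command: ["sh", "-c", "sleep 3600"]
    volumeMounts:
    - name: shared-data
      mountPath: /data
Both containers see the same files under /data, which is what makes configurations and secrets maintainable in one place.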
The only question has been what latency and I/O throughput are acceptable to the application, and this has guided the decisions for storage systems, appliances and their integrations. When the storage is tightly coupled with the compute, such as between a database server and a database file, all the reads and writes incurred from performance benchmarks require careful arrangement of bytes, their packing, organization, indexing, checksums and error codes. But most applications hosted on Kubernetes don’t have the same requirements as a database server.
This design and the relaxation of performance requirements for applications hosted on Kubernetes facilitate different connectors, not just volume mounts. The notion that the same data can be sent to any destination has been successfully demonstrated by log appenders, which publish logs to a variety of destinations. Connectors, too, can help persist data written from the application to a variety of storage providers using consolidators, queues, caches and mechanisms that know how and when to write the data.
The native Kubernetes API does not support any forms of storage connectors other than the VolumeMount, but it does allow services to be written as Kubernetes applications that accept data published over http(s), just as a time-series database server accepts all kinds of events over the web protocol. The configuration of the endpoint, the binding of the service and the contract associated with the service for the connector definition may vary from destination to destination within the same data-publishing application. This may call for the application to become a consolidator that can provide different storage classes and support different data workload profiles. Appenders and connectors are popular design patterns that get reused often and justify their business value.
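As a sketch of such an http(s) connector endpoint (the names are hypothetical), a Service could front the data-accepting application like this:
apiVersion: v1
kind: Service
metadata:
  name: data-connector      # hypothetical ingest endpoint for published data
spec:
  selector:
    app: data-connector
  ports:
  - port: 443
    targetPort: 8443
The application behind this Service would implement the contract for each destination and decide how and when the received data is persisted.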
The shared data volume can be made read-only and accessible only to the pods, which facilitates access restrictions. While authentication, authorization and audit can be enabled for storage connectors, they will still require RBAC access. Therefore, service accounts become necessary with storage connectors. A side benefit of this security is that the accesses can now be monitored and alerted on.
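A minimal sketch of such a locked-down mount, assuming a pre-existing claim and service account (names illustrative):
apiVersion: v1
kind: Pod
metadata:
  name: consumer
spec:
  serviceAccountName: storage-reader     # RBAC for the connector is granted to this account
  volumes:
  - name: shared-config
    persistentVolumeClaim:
      claimName: shared-config-claim
      readOnly: true
  containers:
  - name: app
    image: myapp:1.0                     # hypothetical application image
    volumeMounts:
    - name: shared-config
      mountPath: /etc/config
      readOnly: true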

Sunday, March 29, 2020

This article introduces its readers to long-duration testing tools for storage engineering products. Almost all products made for digital storage have to deal with bytes of data that usually have a start offset, an end offset and content for that byte range. It’s the pattern of reading and writing various byte ranges that generates load on the storage products. Whether the destination for data is a file in a folder, a blob in a bucket, content-addressable storage in a remote store or a stream in a scope, the requirements for reading and writing large amounts of data remain more or less the same. This would make most load or stress testing tools appear similar in nature, with a standard battery of tests. This article calls out some of those recurring tests, but it might also surprise us to know that these tools, much like the products they test, differ even within the same category, each playing to one or more of its strengths.
These tests include: 
1) Precise storage I/O for short and long durations
2) TCP or UDP traffic generation with full flexibility and control
3) Generating real-world data access patterns in terms of storage containers, their count and size
4) Supporting byte-range validation with hashing, journaling of reads and writes, replay of load patterns, and configurable compression
5) Ability to identify, locate and resolve system malfunctions through logging and support for test reports
6) Configurability of target systems, the included loads and the parameters necessary for test automation
7) Testing across hundreds of remote clients and servers from a single controller
8) Determining server-side performance exclusively from client-side reporting
9) Ability to pause, resume and show progress
10) Complete and accurate visibility into the server or component of the storage product that shows anomalies
11) Overloading the system until it breaks, and being able to determine the threshold before the surge as well as after thrashing

Saturday, March 28, 2020

One thing manifests and charts should not do is rely exclusively on kustomizations to chain ownerships. Those belong in the chart or the operator code, which exclusively define the Kubernetes resources. There is an inherent attribute on every Kubernetes resource called ownerReferences that allows us to chain the construction and deletion of Kubernetes objects. This is very valuable to the resource because the kube-apiserver will perform these actions automatically. Specifying the chaining together with the resource makes it clear to the Kubernetes control plane that these objects are first-class citizens that it needs to manage. Specifying it elsewhere leaves the onus on the application, not the kube-apiserver. If there is some logic that cannot be handled via static yaml declarations, then it belongs in the operator code, not the charts and manifests.
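A minimal sketch of ownerReferences on a dependent resource (the uid is a placeholder; in practice the operator or the kube-apiserver fills it in from the live owner object):
apiVersion: v1
kind: ConfigMap
metadata:
  name: child-config
  ownerReferences:
  - apiVersion: apps/v1
    kind: Deployment
    name: parent-deployment                      # hypothetical owning resource
    uid: 00000000-0000-0000-0000-000000000000    # placeholder; set from the live object
    controller: true
    blockOwnerDeletion: true
data:
  key: value
When parent-deployment is deleted, the garbage collector deletes child-config automatically.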
There are a few advantages to using manifests and charts with custom resources that are self-contained, as opposed to those that are provisioned externally to the Kubernetes infrastructure via service brokers. When the charts are self-contained, they are completely within the Kubernetes system and accessible for create, update and delete via kubectl. The kube-apiserver and the operators can take care of state reconciliation, and the system becomes the source of truth. This is not the case with external resources, which can have varying degrees of deviation from the truth depending on the external service provider.
Resources that are provisioned by a service broker can also be kustomized. The charts for these resources are oblivious to the provisioners. It’s the kube-apiserver that determines which provisioners are to be tapped for a resource; this is looked up in the service catalog. The binding between the catalog and the provisioner is a loose one. They can get out of sync, but the catalog guarantees a global registry. Manifests and charts work well to describe and define these externally provisioned custom resources.

Friday, March 27, 2020

Kustomization allows us to create object types that are not defined by the Kubernetes cluster. These object types allow us to group objects so that they may all have overlaid or overridden configuration. One possible use case is overriding all images to be loaded from a specific internal registry. These images are treated as an object type to distinguish them from the Kubernetes image object. Regardless of the origin of these images, they can now be loaded uniformly from the internal registry.
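As a sketch, kustomize's images transformer can rewrite image references to an internal registry (the registry name here is hypothetical):
images:
- name: nginx
  newName: registry.internal.example.com/nginx
  newTag: "1.17"
Every resource in the kustomization that refers to the nginx image is then rewritten to pull from the internal registry.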
The use of manifests for object types is a way of grouping resources for these changes. They can be modified to enhance annotations and labels. Those annotations are not limited to post-configuration processing or interpretation. They can be used up front with a selector to determine whether resources are candidates for creation.
The use of labels is similar to the use of annotations in that they are immediately useful in templates from the charts. The grouping of resources by labels can in fact be specified as a static yaml configuration suitable for use within a chart, with the label as the parameter and a selector specified to match the label.
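A sketch of such a chart snippet, assuming Helm-style templates (the value names are assumptions), with the label taken as a parameter and a selector matching it:
apiVersion: v1
kind: Service
metadata:
  name: {{ .Release.Name }}-svc
  labels:
    app: {{ .Values.appLabel }}
spec:
  selector:
    app: {{ .Values.appLabel }}
  ports:
  - port: 80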
These mechanisms show a way to have the manifest generate what is needed by the charts in addition to their already well-established role in applying overlays and overrides via kustomization.



Thursday, March 26, 2020

The use of charts and manifests explained:
Kustomization, with all its benefits as described in earlier posts, also serves as a template-free configuration.
When charts are authored, they have to be general purpose, with the user having the ability to specify the parameters for deployment. Most of these options are saved in the values file. Even if there is more than one chart, the user can provide these values to all the charts via a wrapper values file. These values are then used to override the default values that may come with the associated chart. The template takes these options from the values files and uses them to create the Kubernetes resources as necessary. This makes the template work the same regardless of the deployment. The syntax used in the templates is quite flexible for reading the options from the values file. Other than these parameters, the templates are mere static definitions of Kubernetes resources and encourage consistency throughout the definitions, such as with annotations and labels.
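For instance (the keys here are illustrative), a wrapper values file might override a chart's defaults:
# wrapper-values.yaml
image:
  repository: registry.internal.example.com/myapp
  tag: "1.2.3"
replicaCount: 3
and the template reads them with the usual syntax, e.g. replicas: {{ .Values.replicaCount }}.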
When the charts are used with values files, kustomization is not necessary. It is just a value-add based on our discussion above. The user can choose between specifying manifests and charts to suit their deployment. They serve different purposes and it would behoove the user to use either or both appropriately.
The user also has the ability to compose multiple values files and use them together, with comma separations, when applying the chart. This allows the user to organize the options in the values files to be granular and then include them on a case-by-case basis.
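With Helm, for example, the same effect can be achieved by passing the -f flag more than once (file names here are illustrative):
helm install myapp ./mychart -f base-values.yaml -f prod-values.yaml
Later files take precedence over earlier ones, which keeps each values file granular and composable.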



Wednesday, March 25, 2020

Billing requests are expected to be continuous for a business. Are they indeed a stream, so that they can be stored as such? Consider that billing requests have existing, established technologies that compete with the business justification for storing them as streams. Will there be cost savings as well as expanded benefits? If stream storage is a market-disruptive innovation, can it be overlaid on object storage, which has established itself as "standard storage" in the enterprise and cloud? As object storage brings many storage best practices (durability, scalability, availability and low cost) to its users, it can go beyond tier-2 storage to become nearline storage for vectorized execution. Web-accessible storage has been important for vectorized execution.
We suggest that some NoSQL stores can be overlaid on top of object storage and discuss an example with event storage. We focus on the use case of billing requests because they are not relational and resemble many of the use cases of object storage. Specifically, events conform to append-only stream storage due to their sequential nature. Billing requests are also processed in windows, making a stream processor such as Flink extremely suitable for them. Stream processors benefit from stream storage, and such storage can be overlaid on any tier-2 storage. In particular, object storage, unlike file storage, is very useful for this purpose since the data also becomes web-accessible for other analysis stacks. Object storage then transforms from a storage layer participating in vectorized executions to one that actively builds metadata, maintains organization, rebuilds indexes, and supports web access for those who don’t want to maintain local storage or want to leverage easy data transfers from a stash. Object storage utilizes a queue layer and a cache layer to handle processing of data for pipelines. We presented the notion of fragmented data transfer in an earlier document. Here we suggest that billing requests are similar to fragmented data transfer and that object storage can serve as both the source and the destination of billing requests.
Event storage gained popularity because a lot of IoT devices started producing events. Reads and writes were very different from conventional data because they were time-based, sequential and progressive. Although stream storage is best for events, any time-series database could also work. However, they are not web-accessible unless they are in an object store. Their need for storage is not very different from that of applications requiring object storage for store and access. However, as object storage makes inroads into vectorized execution, the data transfers become increasingly fragmented and continuous. At this junction it is important to facilitate data transfer between objects and events, and it is in this space that billing requests and object stores find suitability. Search, browse and query operations are facilitated in a web service using a web-accessible store.

Tuesday, March 24, 2020

We were discussing Kubernetes Kustomization. There are two advantages to using it. First, it allows us to configure the individual components of the application without requiring changes in them. Second, it allows us to combine components from different sources and overlay them or even override certain configurations. The kustomize tool provides this feature. Kustomize can add configmaps and secrets to the deployments using the configMapGenerator and secretGenerator respectively.
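For example, the generators can be declared in the kustomization file like this (names and literals are illustrative):
configMapGenerator:
- name: app-config
  literals:
  - LOG_LEVEL=debug
secretGenerator:
- name: db-secret
  literals:
  - password=changeme
Kustomize appends a content hash to the generated names, so deployments referencing them roll when the data changes.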
Kustomize is a static declaration. We can add labels across components. We can choose groups of Kubernetes resources dynamically using selectors, but they have to be declared as yaml. This kustomization yaml is usually stored with the manifests and applied on existing components, so they refer to other yamls. The manifests are a way of specifying the location of the kustomization files, which is passed as a command-line parameter to kubectl commands with the -k option.
For example, we can say:
commonLabels:
  app: potpourri-app
resources:
- deployment.yaml
- service.yaml
We can even add new resources such as a K8s secret.
This comes in useful to inject usernames and passwords, say for a database application, at the time of install and uninstall, with the help of a resource called secret.yaml. It just won't, say, detect a virus and force an uninstall of the product; those actions remain with the user.
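A minimal sketch of such a secret.yaml (the credentials here are placeholders):
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
type: Opaque
stringData:
  username: admin          # placeholder
  password: changeme       # placeholder
It is then listed under resources: in the kustomization alongside deployment.yaml and service.yaml.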
Kustomize also helps us do overlays and overrides. An overlay means we change parameters for one or more existing components. An override means we take an existing yaml and change portions of it, such as changing the service to be of type LoadBalancer instead of NodePort, or vice versa for developer builds. In this case, we provide just enough information to look up the declaration we want to modify and specify the modification. For example:
apiVersion: v1
kind: Service
metadata:
  name: myservice
spec:
  type: NodePort
If the above service type modification were persisted side by side for prod and dev environments, it would be called an overlay.
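A common layout for persisting such variants side by side might look like this (folder names are illustrative):
base/
  kustomization.yaml
  service.yaml             # type: NodePort by default
overlays/
  dev/
    kustomization.yaml     # refers to ../../base, no patch
  prod/
    kustomization.yaml     # refers to ../../base and patches type to LoadBalancer
Each overlay's kustomization points at the base and carries only the patch for its environment.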
Finally, the persistence of kustomization files is not strictly required and we can run:
kustomize build manifests_folder | kubectl apply -f -
or
kubectl apply -k manifests_folder
One of the interesting applications of Kustomization is the use of internal docker registries.
We use the secretGenerator to create the secret for the registry, which typically has the docker-server, docker-username, docker-password and docker-email fields, with the secret type set to docker-registry. This secret can take environment variables, and the kustomization file can even be stored in source control.
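A sketch of such a generator entry (the name and file are assumptions):
secretGenerator:
- name: registry-credentials
  type: kubernetes.io/dockerconfigjson
  files:
  - .dockerconfigjson=dockerconfig.json
where dockerconfig.json holds the registry credentials in the usual docker config format; an envs: list of key=value files can be used instead when the values come from environment-style files.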