Sunday, March 29, 2020

This article introduces long-duration testing tools for storage engineering products. Almost all products made for digital storage deal with bytes of data that usually have a start offset, an end offset and content for that range of bytes. It is the pattern of reading and writing various byte ranges that generates load on storage products. Whether the destination for the data is a file in a folder, a blob in a bucket, a content-addressable store in a remote system or a stream in a scope, the requirements for reading and writing large amounts of data remain more or less the same. This would make most load or stress testing tools appear similar in nature, with a standard battery of tests. This article calls out some of those recurring tests, but it may come as a surprise that these tools, much like the products they test, differ even within the same category, each playing to one or more of its strengths.
These tests include: 
1) Precise storage I/O for short and long durations
2) TCP or UDP traffic generation with full flexibility and control
3) Generating real-world data access patterns in terms of storage containers, their count and size
4) Supporting byte-range validations with hashing, journaling of reads and writes, replay of load patterns, and configurable compression
5) Identifying, locating and resolving system malfunctions through logging and support for test reports
6) Configurability to determine target systems, the inclusion of loads and the parameters necessary for test automation
7) Testing across hundreds of remote clients and servers from a single controller
8) Determining server-side performance exclusively from client-side reporting
9) Pausing, resuming and showing progress
10) Complete and accurate visibility into the server or component of the storage product that shows anomalies
11) Overloading the system until it breaks, and determining the threshold before the surge as well as after thrashing
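As a minimal sketch of item 4 above, a byte-range write can be journaled and validated with hashing using only standard Unix tools. The file name, offset and length here are illustrative, not from any particular testing product:

```shell
testfile=$(mktemp)
offset=4096   # start offset of the byte range
length=1024   # length of the byte range

# Write the range: a repeatable 1024-byte pattern at the chosen offset.
pattern=$(head -c "$length" /dev/zero | tr '\0' 'A')
printf '%s' "$pattern" | dd of="$testfile" bs=1 seek="$offset" conv=notrunc 2>/dev/null

# Journal the write: record offset, length and a content hash.
written_hash=$(printf '%s' "$pattern" | sha256sum | cut -d' ' -f1)
echo "$offset $length $written_hash" >> "$testfile.journal"

# Validate the read: re-read the same byte range and compare digests.
read_hash=$(dd if="$testfile" bs=1 skip="$offset" count="$length" 2>/dev/null | sha256sum | cut -d' ' -f1)
[ "$written_hash" = "$read_hash" ] && echo "range valid" || echo "range CORRUPT"
```

A long-duration tool would repeat this loop over randomized offsets and lengths, replaying the journal later to re-validate every range.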

Saturday, March 28, 2020

One thing manifests and charts should not do is rely exclusively on kustomizations to chain ownerships. Ownership belongs in the chart or the operator code, which exclusively define the Kubernetes resources. There is an inherent attribute on every Kubernetes resource called ownerReferences that allows us to chain the construction and deletion of Kubernetes objects. This is very valuable because the kube-apiserver performs these actions automatically. Specifying the chaining together with the resource makes it clear to the Kubernetes control plane that these objects are first-class citizens it needs to manage. Specifying it elsewhere leaves the onus on the application rather than the kube-apiserver. If there is some logic that cannot be handled via static yaml declarations, it belongs in the operator code, not the charts and manifests.
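As a sketch, an ownerReference declared together with a dependent resource might look like the following; the resource names and the uid are placeholders:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: myapp-config            # hypothetical dependent resource
  ownerReferences:
  - apiVersion: apps/v1
    kind: Deployment
    name: myapp                 # hypothetical owner
    uid: d9607e19-f88f-11e6-a518-42010a800195   # placeholder uid of the owner
    controller: true
    blockOwnerDeletion: true
```

With this in place, deleting the Deployment lets the kube-apiserver garbage-collect the ConfigMap automatically, with no application logic involved.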
There are a few advantages to using manifests and charts with custom resources that are self-contained, as opposed to those that are provisioned external to the Kubernetes infrastructure via service brokers. When the charts are self-contained, they are completely within the Kubernetes system and accessible for create, update and delete via kubectl. The kube-apiserver and the operators can take care of state reconciliation, and the system becomes the source of truth. This is not the case with external resources, which can deviate from the truth to varying degrees depending on the external service provider.
Resources that are provisioned by a service broker can also be kustomized. The charts for these resources are oblivious to the provisioners. It is the kube-apiserver that determines which provisioners are to be tapped for a resource; this is looked up in the service catalog. The binding between the catalog and the provisioner is a loose one. They can get out of sync, but the catalog guarantees a global registry. Manifests and charts work well to describe and define these externally provisioned custom resources.
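As an illustrative sketch, assuming the Service Catalog API is installed, an externally provisioned resource is requested declaratively and the broker lookup happens behind the scenes; the class and plan names below are hypothetical:

```yaml
apiVersion: servicecatalog.k8s.io/v1beta1
kind: ServiceInstance
metadata:
  name: my-db-instance                          # hypothetical instance name
spec:
  clusterServiceClassExternalName: example-db   # resolved via the service catalog
  clusterServicePlanExternalName: standard      # plan offered by the broker
```

The chart only declares this intent; which broker actually provisions the database is decided by the catalog lookup, which is what makes the binding loose.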

Friday, March 27, 2020

Kustomization allows us to create object types that are not defined by the Kubernetes cluster. These object types allow us to group objects so that they may all have overlaid or overridden configuration. One possible use case is overriding all images to be loaded from a specific internal registry. These images are treated as an object type to distinguish them from the Kubernetes image object. Regardless of their origin, these objects can now uniformly be loaded from the internal registry.
The use of manifests for object types is a way of grouping resources for these changes. They can be modified to enhance annotations and labels. Those annotations are not limited to post-configuration processing or interpretation; they can be used up front with a selector to determine whether a resource is a candidate for creation.
The use of labels is similar to the use of annotations in that both are immediately usable in templates from the charts. The grouping of resources by labels can in fact be specified as a static yaml configuration suitable for use within a chart, with the label as the parameter and a selector specified to match the label.
These mechanisms show a way to have the manifest generate what is needed by the charts in addition to their already well-established role in applying overlays and overrides via kustomization.
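A minimal kustomization.yaml sketch combining the mechanisms above — an internal-registry image override plus common labels and annotations applied across the grouped resources. The registry host, image names and label values are illustrative:

```yaml
# kustomization.yaml
resources:
- deployment.yaml
images:
- name: nginx                                    # image name as it appears in the manifests
  newName: registry.internal.example.com/nginx   # hypothetical internal registry
  newTag: "1.19"
commonLabels:
  app.kubernetes.io/part-of: potpourri-app       # applied to every resource and selector
commonAnnotations:
  owner: storage-team                            # illustrative annotation
```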



Thursday, March 26, 2020

The use of charts and manifests explained:
Kustomization, with all its benefits as described in earlier posts, also serves as template-free configuration.
When the charts are authored, they have to be general purpose, with the user having the ability to specify parameters for deployment. Most of these options are saved in the values file. Even if there is more than one chart, the user can provide these values to all the charts via a wrapper values file. These values are then used to override the default values that come with the associated chart. The template takes these options from the values files and uses them to create the Kubernetes resources as necessary. This makes the template work the same regardless of the deployment. The syntax used in the templates is quite flexible for reading options from the values file. Other than these parameters, the templates are mere static definitions of Kubernetes resources and encourage consistency throughout the definitions, such as with annotations and labels.
When the charts are used with a values file, kustomization is not necessary; it is just a value-add, based on our discussion above. The user can choose between specifying manifests and charts to suit the deployment. They serve different purposes, and it would behoove the user to use either or both appropriately.
The user also has the ability to compose multiple values files and pass them together when applying the chart. This allows the user to keep the options in the values files granular and then include them on a case-by-case basis.
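A minimal sketch of this composition, with hypothetical chart and file names. The template reads options from whichever values files the user supplies; when files are passed in sequence, later files override earlier ones:

```yaml
# values-base.yaml — granular defaults
replicaCount: 2
image:
  repository: myapp
  tag: "1.0"

# values-prod.yaml — overrides composed on top for one deployment
# replicaCount: 4

# A template line reading the option:
#   replicas: {{ .Values.replicaCount }}
#
# Composing the files at install time (helm applies them left to right):
#   helm install myapp ./mychart -f values-base.yaml -f values-prod.yaml
```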



Wednesday, March 25, 2020

Billing requests are expected to be continuous for a business. Are they indeed a stream, so that they can be stored as such? Billing requests have existing, established technologies that compete with the business justification for storing them as streams. Will there be cost savings as well as expanded benefits? If stream storage is a market-disruptive innovation, can it be overlaid on object storage, which has established itself as a "standard storage" in the enterprise and the cloud? As object storage brings many storage best practices, providing durability, scalability, availability and low cost to its users, it can go beyond tier-2 storage to become nearline storage for vectorized execution. Web-accessible storage has been important for vectorized execution. We suggest that some NoSQL stores can be overlaid on top of object storage, and discuss an example with event storage. We focus on the use case of billing requests because they are not relational and have many applications similar to the use cases of object storage. Specifically, events conform to append-only stream storage due to their sequential nature. Billing requests are also processed in windows, making a stream processor such as Flink extremely suitable for events. Stream processors benefit from stream storage, and such storage can be overlaid on any tier-2 storage. In particular, object storage, unlike file storage, is very useful for this purpose since the data also becomes web-accessible for other analysis stacks. Object storage then transforms from a storage layer participating in vectorized executions to one that actively builds metadata, maintains organization, rebuilds indexes, and supports web access for those who don't want to maintain local storage or want to leverage easy data transfers from a stash. Object storage utilizes a queue layer and a cache layer to handle the processing of data for pipelines.
We presented the notion of fragmented data transfer in an earlier document. Here we suggest that billing requests are similar to fragmented data transfers and that object storage can serve as both the source and the destination of billing requests.
Event storage gained popularity because many IoT devices started producing events. Reads and writes differed from conventional data because they were time-based, sequential and progressive. Although stream storage is best for events, any time-series database could also work. However, such databases are not web-accessible unless they are in an object store. Their need for storage is not very different from that of applications requiring object storage to facilitate store and access. However, as object storage makes inroads into vectorized execution, the data transfers become increasingly fragmented and continuous. At this juncture it is important to facilitate data transfer between objects and events, and it is in this space that billing requests and the object store find suitability. Search, browse and query operations are facilitated in a web service using a web-accessible store.

Tuesday, March 24, 2020

We were discussing Kubernetes Kustomization. There are two advantages to using it. First, it allows us to configure the individual components of the application without requiring changes in them. Second, it allows us to combine components from different sources and overlay them, or even override certain configurations. The kustomize tool provides this feature. Kustomize can add configmaps and secrets to deployments using their respective generators.
Kustomize is a static declaration. We can add labels across components. We can choose groups of Kubernetes resources dynamically using selectors, but they have to be declared as yaml. This kustomization yaml is usually stored with the manifests and applied on existing components, so it refers to other yamls. The manifest is a way of specifying the location of the kustomization files, which is passed as a command-line parameter to kubectl commands with the -k option.
For example, we can say:
commonLabels:
  app: potpourri-app
resources:
- deployment.yaml
- service.yaml
We can even add new resources, such as a K8s secret.
This comes in useful for injecting a username and password for, say, a database application at install and uninstall time with the help of a resource called secret.yaml. Kustomize will not, however, take actions such as detecting a virus and forcing an uninstall of the product; those actions remain with the user.
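A sketch of such a secret via the secretGenerator; the secret name and literal values are illustrative and would not be hard-coded in practice:

```yaml
# kustomization.yaml
secretGenerator:
- name: db-credentials        # hypothetical secret consumed by the database app
  literals:
  - username=admin            # placeholder value
  - password=change-me        # placeholder value
resources:
- deployment.yaml
- service.yaml
```

Kustomize appends a hash suffix to the generated secret's name, so deployments referencing it roll over automatically when the values change.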
Kustomize also helps us do overlays and overrides. An overlay means we change parameters for one or more existing components. An override means we take an existing yaml and change portions of it, such as changing the service to be of type LoadBalancer instead of NodePort, or vice versa for developer builds. In this case, we provide just enough information to look up the declaration we want to modify and specify the modification. For example:
apiVersion: v1
kind: Service
metadata:
  name: myservice
spec:
  type: NodePort
If the above service-type modification were persisted side by side as prod and dev environments, it would be called an overlay.
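A sketch of that side-by-side layout, with hypothetical directory names: the base holds the shared definitions, and each environment's kustomization patches only what differs.

```yaml
# overlays/dev/kustomization.yaml
resources:
- ../../base                  # shared deployment.yaml and service.yaml
patchesStrategicMerge:
- service-nodeport.yaml       # the NodePort patch, kept per environment
```

A sibling overlays/prod directory would carry a LoadBalancer patch instead, and `kubectl apply -k overlays/dev` or `overlays/prod` selects the environment.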
Finally, persistence of the kustomization files is not strictly required, and we can run:
kustomize build manifests_folder | kubectl apply -f -
or
kubectl apply -k manifests_folder
One of the interesting applications of kustomization is the use of internal docker registries.
We use the secretGenerator to create the secret for the registry, which typically has the
docker-server, docker-username, docker-password and docker-email, with the secret type set to docker-registry.
This secret can take environment variables, and the kustomization file can even be stored in source control.
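A sketch of generating such a registry secret; here the credentials come from a docker config file rather than literals, so the kustomization itself stays safe to commit. The file path is illustrative:

```yaml
# kustomization.yaml
secretGenerator:
- name: internal-registry-creds          # hypothetical secret name
  type: kubernetes.io/dockerconfigjson   # image-pull secret type
  files:
  # config.json holds the docker-server, username, password and email,
  # as produced by `docker login`; it stays out of source control.
  - .dockerconfigjson=config.json
```

Pods then reference the generated secret via imagePullSecrets to pull from the internal registry.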

Monday, March 23, 2020

Kubernetes Kustomization techniques:
Kustomize is a standalone tool for the Kubernetes platform that supports the management of objects using a kustomization file.
The "kubectl kustomize <kustomization_directory>" command allows us to view the resources that would be generated, without applying them. The apply verb with the -k flag can then be used to apply them.
It can help with generating resources, setting cross-cutting fields such as labels and annotations or metadata and composing or customizing groups of resources.
The resources can be generated and infused with specific configuration and secrets using a configMap generator and a secret generator respectively. For example, it can take an existing application.properties file and generate a ConfigMap that can be applied to new resources.
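A sketch of that properties-file case; the ConfigMap name is illustrative:

```yaml
# kustomization.yaml
configMapGenerator:
- name: app-config              # hypothetical ConfigMap name
  files:
  - application.properties      # existing file, embedded as a key in the ConfigMap
resources:
- deployment.yaml
```

The deployment can then mount the generated ConfigMap, and, as with secrets, the hashed name suffix triggers a rollout whenever the properties change.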
Kustomization allows us to override the registry for all images used in the containers for an application.
Any access to images on registries such as Docker Hub, GCR, or AWS is outbound from the kube cluster and will likely require credentials. Outbound connectivity from the pods is provided using well-known nameservers and a gateway added through the master, which goes through the host-visible network to the external world. There are two modes for making images available:
First, we can provide an insecure registry that is internal and private to the cluster. This registry has no images to begin with, and all images can subsequently be uploaded to it. It helps us create a manifest of the images required to host the application on the cluster, and it gives us the ability to use a non-TLS registry.
Second, we can use credentials with one or more external registries as an add-on, because the outbound request for pulling an image can be configured with credentials by referring to them as "registry-creds". Each external registry can accept the path to a credentials file, usually in json format, which configures the registry.
Together these options allow all images to be made available inside the cluster so that containers can be spun up on the cluster to host the application.
Kubernetes has a controller-manager, a kubelet, an apiserver, a proxy, etcd and a scheduler. All of these can be configured using a configurator. The --feature-gates flag can be used to govern which features are allowed to run, and the components can use these gates to provide selectivity in what is included when running the app.