Saturday, April 11, 2020

Considerations in testing software deployed to Minikube 
Products hosted on Minikube are expected to work the same as if they were deployed on any other Kubernetes container orchestration framework. This remains the case when the Minikube deployments are large. When they are small, they are more prone to failures, so additional validations are required for small deployments since they are not on par with a fully deployed instance on a regular cluster.
  1. The Minikube hosting feature is just like any other hosting platform such as AWS. It adds another dimension to existing test cases that involve a change of the host on which the product runs.
  2. The product may support both install and upgrade. The new test cases will usually target install, but they can be expanded to include upgrade as well.
  3. The product's upgrade paths may vary depending on whether it is a patch or a major version upgrade. Both paths will usually be supported on Minikube, and this remains true for both small and large deployments.
  4. Minikube's support is usually host-facing as opposed to facing the external world. It requires a Docker registry that is local or reachable from the pods so that they can pull their images. The use of a local Docker registry, with or without TLS, is an important consideration.
  5. Small Minikube deployments should target a lower number of replicas and containers. Specifying the CPU, memory and disk for the whole host does not necessarily lower the size of the various clusters used in this feature. A minimal-dev-values file is provided as guidance for lowering the number of replicas and containers and their consumption.
  6. Access to the cluster should be tested both from the user interface and with kubectl commands. If the product is used to create namespaces for users, testing may include one or two projects on Minikube because this is the typical case. A large number of namespaces is neither required nor supported on small deployments due to resource constraints.
  7. Error messages in the user interface will not be tweaked, since they are the same for the product across deployments regardless of size or flavor. A few negative test cases targeting error messages could be helpful.
  8. Security configuration of the Minikube installation is lower priority since the instance is already completely owned. Still, it might be better to test with the pre-install script provided in the scripts directory. Options for Pravega security, such as setting TLS on the Pravega controller, can also be exercised.
  9. Small Minikube deployments are expected to have frequent restarts due to low resources, but these should not number in the hundreds. A test case that lets the Minikube deployment run for some time is helpful here; a sketch of such a check appears after this list.
  10. If the product deployed on Minikube hosts user code, then that code should be tweaked to keep resource utilization within reasonable limits for what is available to the product.
These are some of the considerations that make deployment validations different on Minikube. 
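As one way to automate the long-running check in item 9, a test can poll the cluster and flag pods whose restart counts climb too high. The following is a minimal sketch using the client-go library against the default kubeconfig; the one-hour window and the threshold of 100 restarts are illustrative assumptions, not product requirements.

package main

import (
    "context"
    "fmt"
    "time"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    // Connect to the Minikube cluster through the default kubeconfig.
    config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
    if err != nil {
        panic(err)
    }
    client := kubernetes.NewForConfigOrDie(config)

    // Illustrative limits: observe for an hour, flag anything past 100 restarts.
    const threshold = 100
    deadline := time.Now().Add(1 * time.Hour)
    for time.Now().Before(deadline) {
        pods, err := client.CoreV1().Pods("").List(context.TODO(), metav1.ListOptions{})
        if err == nil {
            for _, pod := range pods.Items {
                for _, cs := range pod.Status.ContainerStatuses {
                    if cs.RestartCount > threshold {
                        fmt.Printf("pod %s/%s container %s restarted %d times\n",
                            pod.Namespace, pod.Name, cs.Name, cs.RestartCount)
                    }
                }
            }
        }
        time.Sleep(5 * time.Minute)
    }
}

Running a check like this alongside the deployment gives an early signal that the sizing in the minimal-dev-values file needs adjustment.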


Friday, April 10, 2020

We were discussing event storage, such as for telemetry and introspection.
The form of writing the logic for telemetry and introspection can differ from product to product. Some like to write a standalone tool while others incorporate it as diagnostic APIs and runtime queries. This leads to a collection of utilities, services and programs that evolve on a case-by-case basis. Some planning helps in the long run: allow the utilities to grow by creating an inventory of components and layers and by coming up with a common framework for querying their health. This system architecture is a core value proposition of the patent application and is best known to its maker.
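As a minimal sketch of what such a common framework could look like, consider a single contract that every component in the inventory implements so that health can be queried uniformly. The names below (Status, HealthReporter, Registry) are hypothetical and not taken from any particular product.

package health

import "context"

// Status is a minimal, uniform health record for any component or layer.
type Status struct {
    Component string
    Healthy   bool
    Detail    string
}

// HealthReporter is the common contract each inventoried component implements.
type HealthReporter interface {
    Health(ctx context.Context) Status
}

// Registry holds the inventory of components and layers.
type Registry struct {
    reporters []HealthReporter
}

// Register adds a component to the inventory.
func (r *Registry) Register(h HealthReporter) {
    r.reporters = append(r.reporters, h)
}

// Query fans out to every registered component and collects their health.
func (r *Registry) Query(ctx context.Context) []Status {
    results := make([]Status, 0, len(r.reporters))
    for _, h := range r.reporters {
        results = append(results, h.Health(ctx))
    }
    return results
}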
Among the implementations available for telemetry and introspection, one of the arguments against making it part of the product has traditionally been that it is essentially a reporting stack. Even if the data is continuous, as if in a stream, much of the analysis is read-only and independent from the performance-oriented system that is busy with read-write operations on the product. This calls for an event infrastructure framework built on a stream store that can be an independent analytical platform, available even as a separate stack that can be standardized as a published plugin for many products. This is true for all logs, metrics and events that are generated continuously and in real time from the products and are available to be read by these stacks. It follows a push model of health information from the products. The pull model of retrieving data from the products will require expertise from each component; in that case, the logic is part of the product and exposed via the packaging of logic referenced earlier. Both the push and the pull model have their respective usages. The discussion in this document is an argument for improving the pull model with consistency, innovation and sound system architecture while working well with other products in the ecosystem that relay the information.
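For contrast with the pull model above, a push model needs only a small emitter inside each component that relays health records outward at an interval. The sketch below posts JSON to a hypothetical collector endpoint and makes no assumption about the stream store behind that endpoint; the URL and the payload shape are placeholders.

package main

import (
    "bytes"
    "encoding/json"
    "net/http"
    "time"
)

// Push periodically relays a component's health record to a collector
// endpoint, which can front a stream store or any other analytical stack.
func Push(endpoint string, interval time.Duration, check func() interface{}) {
    for {
        body, err := json.Marshal(check())
        if err == nil {
            // Best-effort delivery; a dropped sample is not fatal for telemetry.
            if resp, err := http.Post(endpoint, "application/json", bytes.NewReader(body)); err == nil {
                resp.Body.Close()
            }
        }
        time.Sleep(interval)
    }
}

func main() {
    // Hypothetical collector URL; each component supplies its own check.
    Push("http://collector.local/events", 30*time.Second, func() interface{} {
        return map[string]interface{}{"component": "indexer", "healthy": true}
    })
}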

Thursday, April 9, 2020

Event storage is continuous, infinite, durable and eventually consistent. It is not limited to backup/IoT traffic, but it is different from applications requiring object storage. As object storage makes inroads into vectorized execution, the data transfers become increasingly fragmented and continuous. At this junction it is important to facilitate data transfer between objects and streams.
Since event storage overlays on Tier 2 storage on top of blocks, files and blobs, it is already transferring data to object storage. However, the reverse is not that frequent, although objects in a storage class can continue to be serialized to events in a continuous manner. This is also symbiotic to the audiences of both forms of storage.
Although stream storage is best for events, any time-series database could also work. However, they are not web-accessible unless they are in an object store. Their need for storage is not very different from that of applications requiring object storage to facilitate store and access. As object storage makes inroads into vectorized execution and data transfers become increasingly fragmented and continuous, it is in the space of transfer between objects and streams that events and the object store find suitability. Search, browse and query operations are facilitated in a web service using a web-accessible store.
A time-series database offers a viable option as an event store. For example, InfluxDB is used to record sensor data, and the Grafana stack is used to make charts and graphs for dashboards using the same data. Most of these time-series databases are also deployed using clusters and typically represent the growing trend toward this kind of Big Data and search-based analytics.
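For example, recording a sensor reading with the official InfluxDB Go client takes only a few lines; the URL, token, organization and bucket below are placeholders for a local setup.

package main

import (
    "context"
    "time"

    influxdb2 "github.com/influxdata/influxdb-client-go/v2"
)

func main() {
    // Placeholder connection details for a local InfluxDB instance.
    client := influxdb2.NewClient("http://localhost:8086", "my-token")
    defer client.Close()

    writeAPI := client.WriteAPIBlocking("my-org", "sensors")

    // One temperature reading tagged with its source device.
    point := influxdb2.NewPoint("temperature",
        map[string]string{"device": "sensor-01"},
        map[string]interface{}{"celsius": 21.5},
        time.Now())
    if err := writeAPI.WritePoint(context.Background(), point); err != nil {
        panic(err)
    }
}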
Moreover, not all requests need to reach the object storage. In some cases, a web time-series database may use temporary storage from hybrid choices. The benefits of using a web time-series database include saving bandwidth, reducing server load, and improving request-response time. If a dedicated content store is required, typically the storage and server are encapsulated into a content server. This is quite the opposite paradigm to using object storage and replicated objects to serve the content directly from the store. The distinction here is that there are two layers of functions: the first layer is the time-series database that solves read-only distribution using techniques such as governing, asset copying and load balancers; the second layer is the compute and storage bundling in the form of a server or a store, with shifting emphasis between compute and storage.

Wednesday, April 8, 2020

Till now, we have discussed custom resources and applying them to traditional clusters. Custom resources can also be used with personal computing Kubernetes clusters, such as those on Windows, where an insecure local registry together with a cluster that accepts it as a startup parameter lets the containers find and load their images, as described in the April 5 post below.
Kubernetes manifests can be applied across applications, services and charts. This makes Kustomize a very powerful tool for adding configmaps and secrets and changing existing definitions.
It is not a full configuration management system like SaltStack, nor a database. Its limited set of kustomization files provides Kubernetes-specific customization. A tool for keeping them in sync with the deployments can work outside its scope, since the kustomization files are declarative.
Configuration management software can be external to the Kubernetes cluster and work across clusters. Take ZooKeeper, for instance, which is popularly used by applications hosted on the Kubernetes cluster. ZooKeeper can work outside the cluster as a configuration management tool and can be used to implement dynamic configuration in a distributed application. In one of the basic schemes, each configuration is stored in the form of a ZNode. Processes wake up with the full name of the ZNode and set a watch on it. If the configuration is updated, the processes are notified and they apply the updates, as in the sketch below.
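A minimal sketch of this scheme with the community go-zookeeper client: the process reads the configuration ZNode with a watch, applies it, and blocks until the next change. The ensemble address and the /config/app path are assumptions for illustration.

package main

import (
    "fmt"
    "time"

    "github.com/go-zookeeper/zk"
)

func main() {
    // Connect to the ZooKeeper ensemble (placeholder address).
    conn, _, err := zk.Connect([]string{"127.0.0.1:2181"}, 5*time.Second)
    if err != nil {
        panic(err)
    }
    defer conn.Close()

    for {
        // GetW returns the data and a channel that fires when the znode changes.
        data, _, events, err := conn.GetW("/config/app")
        if err != nil {
            panic(err)
        }
        fmt.Printf("applying configuration: %s\n", data)
        <-events // block until the configuration is updated, then re-read
    }
}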
Sometimes a configuration change is dependent on others. For example, a host and port may be known only afterwards. In this case clients can co-ordinate with a rendezvous node. The node is filled in with details as and when they become available, and the workers set a watch on this node before they begin their changes. While Kubernetes implements coordination within its own system, DevOps can use ZooKeeper across Kubernetes clusters for configuration management. This brings versioning, syncing and change capture to the kustomization files that would otherwise only be possible by pushing redeployments through full development cycles involving source control. The configuration management system is also local to production and mission-critical systems.
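The rendezvous scheme can reuse the same client: workers watch a node that may not exist yet and proceed once whoever owns the details creates and fills it. The /rendezvous/db path below is again an assumption.

package main

import (
    "fmt"
    "time"

    "github.com/go-zookeeper/zk"
)

// waitForRendezvous blocks until the rendezvous node is created, then
// returns its contents (for example, a host:port decided at runtime).
func waitForRendezvous(conn *zk.Conn, path string) ([]byte, error) {
    for {
        exists, _, events, err := conn.ExistsW(path)
        if err != nil {
            return nil, err
        }
        if exists {
            data, _, err := conn.Get(path)
            return data, err
        }
        <-events // wait for the node to appear, then re-check
    }
}

func main() {
    conn, _, err := zk.Connect([]string{"127.0.0.1:2181"}, 5*time.Second)
    if err != nil {
        panic(err)
    }
    defer conn.Close()

    addr, err := waitForRendezvous(conn, "/rendezvous/db")
    if err != nil {
        panic(err)
    }
    fmt.Printf("co-ordinates received: %s\n", addr)
}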

Tuesday, April 7, 2020

We continue with our discussion on Kubernetes Kustomization using manifests from a few posts earlier. 
The use of metadata improves the information on a resource without having to introspect the resource, which helps in making decisions about the resources and provides visibility between the source and destination of a transfer.

Monday, April 6, 2020


Integrating SonarQube source code analyzers in repository builds
Go has been officially supported by SonarSource with SonarGo since May 2018. Any build integration may require the following items:
1) SonarGo
2) GoMetaLinter
3) SonarScanner
4) SonarQube Docker image
A community package for SonarQube for golang is available at https://github.com/uartois/sonar-golang
It requires GoMetaLinter reports using the checkstyle format for the scanner to run.
The GoMetaLinter can be obtained as follows:
go get -u gopkg.in/alecthomas/gometalinter.v1
gometalinter.v1 --install
The GoMetaLinter report can be generated with the command:
gometalinter.v1 --checkstyle > report.xml
The sonar-golang plugin will also require a sonar.properties file, which will look like this:
enableSonarQube=true

sonar.skip-tests=false
sonar.projectKey=group:app-name
sonar.projectName=app-name
sonar.projectVersion=1.1
sonar.sources=pkg/
sonar.sourceEncoding=UTF-8
sonar.host.url=http://localhost:9000

Any server will require the sonar-golang jar file to be put in the $SONAR_PATH/extensions/plugins folder. The SonarQube server will need to be stopped and restarted for the jar to be loaded.
The SonarScanner may run in its own Docker image during the build. There are several images available on Docker Hub, including the SonarQube server image. The project will have to be copied locally into the image, and the sonar properties will point to the server with the jar. This is preferable to do with a custom Dockerfile.
The Jenkinsfile can be modified to build the image with the SonarScanner from the corresponding Dockerfile, and then the project can be scanned with the "sonar-scanner" command.
The sonar.properties file helps point to the sonar.host.url where the reports will be published using the sonar.projectKey.

Sunday, April 5, 2020

Developing containerized applications on Windows 
Well-known container orchestration frameworks such as Kubernetes rose meteorically in popularity with their widespread acceptance on Linux hosts. The ecosystem built around Kubernetes fueled the growth and made development on Linux hosts mainstream. Applications developed on top of this framework were already modular and used smaller footprints than contemporary applications built for the cloud. With the popularity of MacBooks for Linux-based development, it was easy to host the framework and the applications in the personal computing space. The same did not hold for Windows until later.
With the arrival of the Docker runtime for Windows and the Kubernetes cluster for Windows, the situation changed. These tools allowed container images to be registered and the containers to be spun up with those images, respectively, which made it possible for applications to run on personal computers. Still, the existing popularity of supporting Linux virtual machines on Hyper-V took the wind out of these efforts to allow native development of Kubernetes applications on Windows using these tools.
Two technologies made it possible for developers to overcome the reduced support for native development on Windows. First, the ability to specify an insecure registry to house all the container images means that the containers don't require a login or have to go online to external registries to fetch their images. Second, the Kubernetes cluster for Windows would accept this insecure registry as a parameter at startup, allowing its orchestration framework to redirect all requests for container image lookups to the passed-in registry. Together they helped the containers find and load their images for applications to run.
The entire developer workspace on a personal computer is in the hands of the developer, so the need to set up and use secure registries is of very little concern in the overall steps to run an application. The Kubernetes cluster for Windows is also set up on a host-facing network only, which means that traffic needs to be proxied or tunneled into the cluster even if there is outbound connectivity from the containers within the cluster. The insecure registry makes it easy for the developer to get going with development rather than the otherwise elaborate deployment.
One of the advantages of hosting Docker and Kubernetes on Windows directly is that we can control their size. By provisioning the insecure registry entirely on a separate virtual machine, we can specify the storage to just exceed the total size of all the images to be registered in that registry. This lets Kubernetes on Windows have a CPU and memory limit specified to support just what the hosted application needs to run.
This separation of registry and container framework allows them to be more focused on their purpose and enables the developer to do more.