Cluster computing

Wednesday, March 25, 2020

Billing requests are expected to be continuous for a business. Are they indeed a stream, so that they can be stored as one? Do the established technologies already used for billing requests compete with the business justification for storing them as streams? Will there be cost savings as well as expanded benefits? If stream storage is a market-disruptive innovation, can it be overlaid on object storage, which has established itself as “standard storage” in the enterprise and the cloud? Because object storage brings many storage best practices that provide durability, scalability, availability and low cost to its users, it can go beyond tier-2 storage to become nearline storage for vectorized execution. Web-accessible storage has been important for vectorized execution. We suggest that some NoSQL stores can be overlaid on top of object storage, and we discuss event storage as an example. We focus on the use case of billing requests because they are not relational and have many applications similar to the use cases of object storage. Specifically, events suit append-only stream storage because of their sequential nature. Billing requests are also processed in windows, which makes a stream processor such as Flink well suited to them. Stream processors benefit from stream storage, and such storage can be overlaid on any tier-2 storage. In particular, object storage, unlike file storage, is very useful for this purpose because the data also becomes web accessible to other analysis stacks. Object storage then transforms from a storage layer that participates in vectorized execution into one that actively builds metadata, maintains organization, rebuilds indexes, and supports web access for those who don't want to maintain local storage or who want easy data transfers from a stash. Object storage uses a queue layer and a cache layer to handle the processing of data for pipelines.
We presented the notion of fragmented data transfer in an earlier document. Here we suggest that billing requests are similar to fragmented data transfer, and we show how object storage can serve as both the source and the destination of billing requests.
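To make the append-only and windowing argument concrete, here is a minimal Python sketch. It is not Flink and not any particular stream store: the BillingRequest record, the AppendOnlyStream class and the tumbling_window_totals function are hypothetical names chosen for illustration, and an in-memory list stands in for stream storage. Events are only appended and read sequentially from an offset, and totals are computed per account over fixed, non-overlapping time windows.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class BillingRequest:
    # Hypothetical record: account id, amount in cents, event time in epoch seconds.
    account: str
    amount_cents: int
    timestamp: int


@dataclass
class AppendOnlyStream:
    # In-memory stand-in for stream storage: events are appended, never updated.
    events: List[BillingRequest] = field(default_factory=list)

    def append(self, event: BillingRequest) -> int:
        self.events.append(event)
        return len(self.events) - 1  # offset assigned to the new event

    def read_from(self, offset: int) -> List[BillingRequest]:
        # Readers consume sequentially from an offset, as a stream processor would.
        return self.events[offset:]


def tumbling_window_totals(
    events: List[BillingRequest], window_seconds: int
) -> Dict[Tuple[int, str], int]:
    # Sum amounts per (window start, account) over fixed, non-overlapping windows.
    totals: Dict[Tuple[int, str], int] = defaultdict(int)
    for e in events:
        window_start = e.timestamp - e.timestamp % window_seconds
        totals[(window_start, e.account)] += e.amount_cents
    return dict(totals)


if __name__ == "__main__":
    stream = AppendOnlyStream()
    stream.append(BillingRequest("acme", 500, 10))
    stream.append(BillingRequest("acme", 250, 59))
    stream.append(BillingRequest("globex", 100, 61))
    stream.append(BillingRequest("acme", 300, 75))
    print(tumbling_window_totals(stream.read_from(0), window_seconds=60))
```

The window is derived from the event time rather than arrival time, which is what lets the same stored stream be re-read and re-aggregated later.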

Event storage gained popularity because many IoT devices started producing events. Reads and writes were very different from those of conventional data because they were time-based, sequential and progressive. Although stream storage is best for events, any time-series database could also work. However, such databases are not web accessible unless the data is in an object store. Their storage needs are not very different from those of applications that require object storage to facilitate store and access. However, as object storage makes inroads into vectorized execution, data transfers become increasingly fragmented and continuous. At this juncture, it is important to facilitate data transfer between objects and events, and it is in this space that billing requests and object storage find their fit. Search, browse and query operations are facilitated in a web service that uses a web-accessible store.
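The following sketch illustrates one way sequential events could be packed into immutable segments and named with time-based keys, so that an object store could serve them over the web. The key layout, the "ts" field and the seal_segments helper are assumptions for illustration; a plain dict stands in for the object store, and a real system would upload each segment with its object store's own client.

```python
import json
from datetime import datetime, timezone
from typing import Dict, List


def segment_key(stream_name: str, first_ts: int, seq: int) -> str:
    # Hypothetical layout: stream/YYYY/MM/DD/segment-N, so keys sort by time.
    day = datetime.fromtimestamp(first_ts, tz=timezone.utc).strftime("%Y/%m/%d")
    return f"{stream_name}/{day}/segment-{seq:08d}.json"


def seal_segments(stream_name: str, events: List[dict], max_events: int) -> Dict[str, bytes]:
    # Pack consecutive events into immutable segments; each would become one object.
    if max_events <= 0:
        raise ValueError("max_events must be positive")
    objects: Dict[str, bytes] = {}
    for seq, start in enumerate(range(0, len(events), max_events)):
        batch = events[start:start + max_events]
        key = segment_key(stream_name, batch[0]["ts"], seq)
        objects[key] = json.dumps(batch).encode("utf-8")
    return objects
```

Because segments are written once and never modified, they fit the append-only nature of events while remaining plain objects that other analysis stacks can fetch.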

Tuesday, March 24, 2020

We were discussing Kubernetes Kustomization. There are two advantages to using it. First, it allows us to configure the individual components of an application without requiring changes to them. Second, it allows us to combine components from different sources and overlay them, or even override certain configurations. The kustomize tool provides this feature. Kustomize can add ConfigMaps and Secrets to deployments using its configMapGenerator and secretGenerator, respectively.
Kustomize is static and declarative. We can add labels across components. We can choose groups of Kubernetes resources dynamically using selectors, but they have to be declared in YAML. The kustomization.yaml file is usually stored alongside the manifests and applied to existing components, so it refers to other YAML files. The directory containing the kustomization file is passed as a command-line parameter to kubectl with the -k option.
For example, we can say:
commonLabels:
  app: potpourri-app
resources:
- deployment.yaml
- service.yaml
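What commonLabels does to each listed resource can be sketched in a few lines of Python. This is a simplified emulation, not kustomize itself: the real tool also adds the labels to selectors and pod templates, while this hypothetical apply_common_labels helper only merges them into metadata.labels and leaves the inputs untouched. The Deployment and Service named "web" are made-up inputs.

```python
import copy
from typing import Dict, List


def apply_common_labels(resources: List[dict], labels: Dict[str, str]) -> List[dict]:
    # Return labeled copies; the input resources are left unchanged.
    labeled = []
    for resource in resources:
        resource = copy.deepcopy(resource)
        metadata = resource.setdefault("metadata", {})
        metadata.setdefault("labels", {}).update(labels)
        labeled.append(resource)
    return labeled


resources = [
    {"apiVersion": "apps/v1", "kind": "Deployment", "metadata": {"name": "web"}},
    {"apiVersion": "v1", "kind": "Service", "metadata": {"name": "web"}},
]
for r in apply_common_labels(resources, {"app": "potpourri-app"}):
    print(r["kind"], r["metadata"]["labels"])
```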
We can also add new resources, such as a Kubernetes Secret.
This is useful for injecting a username and password for, say, a database application at install and uninstall time, with the help of a resource called secret.yaml. Kustomize will not, however, detect a virus and force an uninstall of the product; those actions remain with the user.
Kustomize also helps us with overlays and overrides. An overlay changes parameters for one or more existing components. An override takes an existing YAML and changes portions of it, such as changing a service to type LoadBalancer instead of NodePort, or vice versa for developer builds. In this case, we provide just enough information to look up the declaration we want to modify, and then specify the modification. For example:
apiVersion: v1
kind: Service
metadata:
  name: myservice
spec:
  type: NodePort
If such service-type modifications were kept side by side for the prod and dev environments, they would be called overlays.
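The lookup-then-modify behaviour of an override can be sketched as a patch matched by apiVersion, kind and metadata.name and then deep-merged into its target. This is a simplification under stated assumptions: nested maps merge and everything else, including lists, is replaced wholesale, whereas kustomize's strategic merge patches can merge some lists by key. The deep_merge, identity and apply_patch names are hypothetical.

```python
import copy
from typing import List, Optional, Tuple


def deep_merge(base: dict, patch: dict) -> dict:
    # Nested dicts merge recursively; any other value (including lists) replaces.
    merged = copy.deepcopy(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def identity(resource: dict) -> Tuple[Optional[str], Optional[str], Optional[str]]:
    # The minimum needed to look up the declaration a patch targets.
    return (
        resource.get("apiVersion"),
        resource.get("kind"),
        resource.get("metadata", {}).get("name"),
    )


def apply_patch(resources: List[dict], patch: dict) -> List[dict]:
    # Only the resource whose identity matches the patch is modified.
    target = identity(patch)
    return [deep_merge(r, patch) if identity(r) == target else r for r in resources]
```

Applied to a LoadBalancer service named myservice, a patch carrying only the identity fields and spec.type: NodePort changes the type and keeps everything else, such as the ports.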
Finally, the rendered output of a kustomization does not have to be persisted; we can run:
kustomize build manifests_folder | kubectl apply -f -
or
kubectl apply -k manifests_folder
One interesting application of Kustomization is the use of internal Docker registries.
We use the secretGenerator to create the secret for the registry, which typically holds the docker-server, docker-username, docker-password and docker-email values, with the secret type set to kubernetes.io/dockerconfigjson (the type that kubectl create secret docker-registry produces).
This secret can take environment variables, and the kustomization file can even be stored in source control.
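For reference, a docker-registry secret carries a .dockerconfigjson payload whose "auths" map is keyed by registry server, with an "auth" field holding base64 of username:password. The sketch below builds such a Secret by hand in Python to show what the generator ultimately produces; the registry_secret helper and the sample values are hypothetical, and in practice the secretGenerator or kubectl would produce it.

```python
import base64
import json


def docker_config_json(server: str, username: str, password: str, email: str) -> str:
    # "auth" is base64("username:password"), as in a Docker config.json entry.
    auth = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return json.dumps(
        {"auths": {server: {"username": username, "password": password, "email": email, "auth": auth}}}
    )


def registry_secret(name: str, server: str, username: str, password: str, email: str) -> dict:
    # Secret data values are base64-encoded in the manifest.
    payload = docker_config_json(server, username, password, email)
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "kubernetes.io/dockerconfigjson",
        "data": {".dockerconfigjson": base64.b64encode(payload.encode("utf-8")).decode("ascii")},
    }
```

Pods then reference the secret by name under imagePullSecrets so the kubelet can authenticate to the internal registry.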

Monday, March 23, 2020

Kubernetes Kustomization techniques:
Kustomize is a standalone tool for the Kubernetes platform that supports the management of objects using a kustomization file.
“kubectl kustomize <kustomization_directory>” renders the customized resources so that we can view them. Using the apply verb instead, as in “kubectl apply -k <kustomization_directory>”, applies them to the cluster.
It can help with generating resources, setting cross-cutting fields such as labels and annotations or metadata and composing or customizing groups of resources.
The resources can be generated and infused with specific configuration and secrets using a ConfigMap generator and a Secret generator, respectively. For example, it can take an existing application.properties file and generate a ConfigMap that can be applied to new resources.
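As a minimal sketch of the ConfigMap generator idea, assuming an application.properties file made of simple key=value lines, the Python below turns the file into ConfigMap data. It roughly mirrors kustomize's envs style of input; kustomize can also embed the whole file under its file name, and it appends a content hash to the generated name, both of which this hypothetical config_map helper omits.

```python
from typing import Dict


def parse_properties(text: str) -> Dict[str, str]:
    # Assumes simple key=value lines; blank lines and '#' comments are skipped.
    data: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {raw!r}")
        data[key.strip()] = value.strip()
    return data


def config_map(name: str, properties_text: str) -> dict:
    # Hypothetical helper; kustomize would also append a content hash to the name.
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name},
        "data": parse_properties(properties_text),
    }
```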
Kustomization allows us to override the registry for all images used in the containers for an application.
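Overriding the registry comes down to splitting a <registry>/<name>:<tag> reference and swapping the first part. The sketch below follows the Docker convention that the first path component is a registry only if it contains a dot or a colon or is localhost, and it defaults the tag to latest; digest references are not handled. The split_image and override_registry names are hypothetical, and kustomize's own images field does this per image name rather than for all images at once.

```python
from typing import Optional, Tuple


def split_image(ref: str) -> Tuple[Optional[str], str, str]:
    # Split "<registry>/<name>:<tag>" into parts; the tag defaults to "latest".
    tag = "latest"
    if ":" in ref.rsplit("/", 1)[-1]:  # a colon after the last slash marks a tag
        ref, tag = ref.rsplit(":", 1)
    first, _, rest = ref.partition("/")
    # Docker convention: the first component is a registry host only if it
    # contains "." or ":" or is "localhost".
    if rest and ("." in first or ":" in first or first == "localhost"):
        return first, rest, tag
    return None, ref, tag


def override_registry(ref: str, registry: str) -> str:
    # Replace (or add) the registry while keeping the repository path and tag.
    _, name, tag = split_image(ref)
    return f"{registry}/{name}:{tag}"
```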
Any access to images on registries such as Docker Hub, GCR or AWS is outbound from the Kubernetes cluster and will likely require credentials. Outbound connectivity from the pods is provided through well-known nameservers and a gateway added through the master, and it passes through the host-visible network to the external world. There are two modes for downloading images:
First, we can provide an insecure registry that is internal and private to the cluster. This registry has no images to begin with, and all images can subsequently be uploaded to it. It helps us create a manifest of the images required to host the application on the cluster, and it lets us use a non-TLS registry.
Second, we can use credentials with one or more external registries as an add-on, because the outbound request for pulling an image can be configured with credentials by referring to them as "registry-creds". Each external registry can accept the path to a credentials file, usually in JSON format, which helps configure the registry.
Together these options allow all images to be made available inside the cluster so that containers can be spun up on the cluster to host the application.
Kubernetes has a controller-manager, a kubelet, an apiserver, a proxy, etcd and a scheduler. All of these can be configured using a configurator. The --feature-gates flag can be used to govern what is allowed to run. Each feature gate is a simple on/off switch, but the components can use them to selectively include features when running the app.
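The --feature-gates flag takes a comma-separated list of Name=true or Name=false pairs. Here is a small Python sketch of how a component might parse that value and consult it; the gate names in the usage are made up, and each real component has its own built-in defaults for gates that are not listed.

```python
from typing import Dict


def parse_feature_gates(flag_value: str) -> Dict[str, bool]:
    # Parse "Name=true,Other=false" (the value after --feature-gates=) into a dict.
    gates: Dict[str, bool] = {}
    for item in (part.strip() for part in flag_value.split(",")):
        if not item:
            continue
        name, sep, value = item.partition("=")
        if not sep or value.strip().lower() not in ("true", "false"):
            raise ValueError(f"invalid feature gate entry: {item!r}")
        gates[name.strip()] = value.strip().lower() == "true"
    return gates


def is_enabled(gates: Dict[str, bool], name: str, default: bool = False) -> bool:
    # Gates not mentioned on the command line fall back to the component's default.
    return gates.get(name, default)


gates = parse_feature_gates("FeatureA=true,FeatureB=false")
print(is_enabled(gates, "FeatureA"), is_enabled(gates, "FeatureB"))
```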

Sunday, March 22, 2020

A billing application over stream storage:

Saturday, March 21, 2020

High Availability for applications.
This article compares cluster mode and serverless computing. Both are techniques for scaling software deployments to meet the challenge of increasing client traffic. Back-of-the-envelope capacity calculations and an inclination to treat software deployments in the traditional way have been common practice. It is time to wake up to the lower costs that small changes in deployment mode and application modularity can bring.
High-availability clusters spin up additional nodes to handle the load. They scale out with fixed, long-running costs for nodes that continue until the cluster is resized. The intervention to increase or decrease the number of nodes is independent of fluctuations in demand and capacity. Every aspect of a node, from compute to the operating system and application stack, involves maintenance chores for the application. The application code also takes on routines such as health checks, alerts and notifications.
Compare this with serverless computing, where the resources required to execute the code are not only independent of the application but also dynamically provisioned and torn down, so we pay as we go. Applications are already familiar with the Platform-as-a-Service model and have shifted to deep divisions in application modularity, with separate hardware and software for each module managed independently of the applications. The shift toward serverless computing, also called Function as a Service, asks only a little more than the PaaS model, because applications have to organize their logic into smaller functions that can be executed without concern for resources. Applications also get to focus more on their actual business value rather than on mundane operational concerns.
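A back-of-the-envelope comparison can make the trade-off concrete. The prices below are invented placeholders, not any provider's rates, and the model ignores free tiers, cold starts and per-node overheads; it only shows that a fixed cluster bills for every hour while a pay-as-you-go function bills per invocation, so there is a break-even volume.

```python
HOURS_PER_MONTH = 730  # about 365 * 24 / 12


def cluster_monthly_cost(nodes: int, price_per_node_hour: float) -> float:
    # Nodes are billed for every hour they run, busy or idle.
    return nodes * price_per_node_hour * HOURS_PER_MONTH


def serverless_cost_per_invocation(
    avg_seconds: float,
    memory_gb: float,
    price_per_gb_second: float,
    price_per_million_requests: float,
) -> float:
    # Pay only for compute actually used plus a per-request fee.
    return avg_seconds * memory_gb * price_per_gb_second + price_per_million_requests / 1_000_000


def serverless_monthly_cost(invocations: int, per_invocation: float) -> float:
    return invocations * per_invocation


def break_even_invocations(cluster_cost: float, per_invocation: float) -> float:
    # Above this monthly volume the fixed cluster becomes the cheaper option.
    return cluster_cost / per_invocation


per_call = serverless_cost_per_invocation(0.2, 0.5, 0.00001, 0.20)
cluster = cluster_monthly_cost(3, 0.10)
print(round(cluster, 2), round(serverless_monthly_cost(2_000_000, per_call), 2))
print(round(break_even_invocations(cluster, per_call)))
```

With three nodes at a hypothetical 0.10 per node-hour, and 0.2-second, 0.5 GB invocations at 0.00001 per GB-second plus 0.20 per million requests, the cluster costs 219 a month, two million invocations cost 2.40, and the break-even point is 182.5 million invocations a month.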
The cluster model is hugely popular because it has a proven track record and well-tested cluster-management software. Cluster management also provides customized capabilities by way of dedicated nodes. Serverless computing is mainly provided by big cloud providers, who can take these costs away from our application through economies of scale. A serverless architecture may be standalone or distributed. In both cases, it remains an event-action platform that executes code in response to events. We can execute code written as functions in many different languages, and each function is executed in its own container. Because these functions are asynchronous to the frontend and backend, they need not perform continuous polling, which helps them be more scalable and resilient. OpenWhisk introduces an event programming model where the charges are only for what is used.
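The event-action idea can be sketched as a registry of functions keyed by event type, with a counter standing in for metering. This is a hypothetical, in-process and synchronous model; platforms such as OpenWhisk run each action in its own container and invoke it asynchronously, which this sketch does not attempt. The event name in the usage is made up.

```python
from typing import Any, Callable, Dict, List

Action = Callable[[dict], Any]


class EventActionPlatform:
    # Hypothetical in-process model: actions run only when their event arrives.
    def __init__(self) -> None:
        self._actions: Dict[str, List[Action]] = {}
        self.invocations = 0  # metering: what a pay-per-use platform would bill

    def register(self, event_type: str, action: Action) -> None:
        self._actions.setdefault(event_type, []).append(action)

    def emit(self, event_type: str, payload: dict) -> List[Any]:
        # No polling: nothing runs between events, and unknown events cost nothing.
        results = []
        for action in self._actions.get(event_type, []):
            self.invocations += 1
            results.append(action(payload))
        return results


platform = EventActionPlatform()
platform.register("billing.created", lambda e: e["amount"] * 2)
print(platform.emit("billing.created", {"amount": 5}), platform.invocations)
```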
The choice between cluster and serverless computing also depends on an organization's ability to adapt. Clusters are easy to provision on premises and on orchestration frameworks, whereas public cloud technologies have not yet penetrated organizations deeply enough to become the mainstream mode of deployment. Some organizations also have a genuine need for special-purpose hardware racks and cannot truly be software defined.

Friday, March 20, 2020

Kubernetes has a controller-manager, a kubelet, an apiserver, a proxy, etcd and a scheduler. All of these can be configured using a configurator. The --feature-gates flag can be used to govern what is allowed to run. Each feature gate is a simple on/off switch, but the components can use them to selectively include features when running the app.
The images do not have feature gates. They are atomic and whole: they are composed of layers but are treated as a single image with a <registry>/<name>:<tag> specifier. Images referenced by the "repository" key in the values file can simply specify the name and tag and rely on registry resolution to locate the image. The Kubernetes framework must be guided in this process.
Images are looked up locally first and then remotely. Even remotely, the configured registry is preferred over the default. This allows us to use manifests with charts effectively. A Helm chart contains at least two elements: a Chart.yaml that describes the package, and one or more templates that contain Kubernetes manifest files.
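The lookup order described above, local first and then the configured registry ahead of the default, can be sketched as follows. Sets stand in for the node's image cache and for registry catalogs, and the DEFAULT_REGISTRY value and resolve_image helper are assumptions; on a real node the behaviour is governed by the container runtime and the pod's imagePullPolicy (for example, IfNotPresent uses a local image when one exists).

```python
from typing import Dict, Optional, Set, Tuple

DEFAULT_REGISTRY = "docker.io"  # assumed default, for illustration only


def resolve_image(
    name_tag: str,
    local_cache: Set[str],
    remote_catalog: Dict[str, Set[str]],
    configured_registry: Optional[str] = None,
) -> Tuple[str, str]:
    # 1. Local first: no network access is needed if the image is already present.
    if name_tag in local_cache:
        return ("local", name_tag)
    # 2. Then remote, trying the configured registry before the default one.
    candidates = [r for r in (configured_registry, DEFAULT_REGISTRY) if r]
    for registry in candidates:
        if name_tag in remote_catalog.get(registry, set()):
            return (registry, f"{registry}/{name_tag}")
    raise LookupError(f"{name_tag} not found locally or in {candidates}")
```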
A manifest can specify chart override values such as either:
organization/chart-values: |-
  {
    "image": {
      "repository": "image1"
    }
  }
Or

organization/chart-values: |-
  {
    "global": {
      "registry": ""
    },
    "createdecksappResource": false
  }
In the first case, the image is used with its name and tag as specified. In the second case, the registry is prefixed to the image name and tag, pinning down where the image should be located.
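How such overrides might turn into a final image reference can be sketched by merging the manifest's chart values over the chart's defaults and prefixing the registry only when it is non-empty. The CHART_DEFAULTS values (including the decksapp repository and 1.0 tag) are hypothetical stand-ins for a chart's values.yaml, and the exact prefixing logic lives in each chart's templates.

```python
import copy
import json

# Hypothetical chart defaults, standing in for a chart's values.yaml.
CHART_DEFAULTS = {
    "global": {"registry": ""},
    "image": {"repository": "decksapp", "tag": "1.0"},
}


def merge(base: dict, override: dict) -> dict:
    # Override values win; nested maps are merged key by key.
    out = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], value)
        else:
            out[key] = value
    return out


def image_reference(chart_values: str) -> str:
    # chart_values is the JSON text of an organization/chart-values annotation.
    values = merge(CHART_DEFAULTS, json.loads(chart_values))
    registry = values.get("global", {}).get("registry", "")
    image = values["image"]
    name_tag = f"{image['repository']}:{image['tag']}"
    return f"{registry}/{name_tag}" if registry else name_tag
```

With an empty registry, as in the second example above, no prefix is added; a non-empty value such as a hypothetical registry.example.com yields registry.example.com/decksapp:1.0.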

Thursday, March 19, 2020

Image registry:
Kubernetes, as a container orchestration framework, requires images to launch containers. The image registry can be private, or it can be one of the well-known public registries.