Sunday, March 22, 2020

A billing application over stream storage:
Billing requests are expected to be continuous for a business. Are they indeed a stream, so that they can be stored as such? Billing requests already have established technologies that compete with the business justification for storing them as streams. Will there be cost savings as well as expanded benefits? If stream storage is a market-disruptive innovation, can it be overlaid on object storage, which has established itself as the "standard storage" in the enterprise and the cloud? Because object storage brings many storage best practices (durability, scalability, availability and low cost) to its users, it can go beyond Tier 2 storage to become nearline storage for vectorized execution. Web-accessible storage has been important for vectorized execution. We suggest that some NoSQL stores can be overlaid on top of object storage and discuss an example with event storage.
We focus on the use case of billing requests because they are not relational and resemble many of the use cases of object storage. Specifically, events conform to append-only stream storage due to their sequential nature. Billing requests are also processed in windows, making a stream processor such as Flink extremely suitable for them. Stream processors benefit from stream storage, and such storage can be overlaid on any Tier 2 storage. Object storage, unlike file storage, is particularly useful for this purpose since the data also becomes web-accessible to other analysis stacks. Object storage then transforms from a storage layer that merely participates in vectorized executions to one that actively builds metadata, maintains organization, rebuilds indexes, and supports web access for those who do not want to maintain local storage or who want to leverage easy data transfers from a stash. Object storage utilizes a queue layer and a cache layer to handle the processing of data for pipelines.
We presented the notion of fragmented data transfer in an earlier document. Here we suggest that billing requests are similar to fragmented data transfers and show how object storage can serve as both the source and the destination of billing requests.
Event storage gained popularity because a large number of IoT devices started producing events. Reads and writes differ from those of conventional data because they are time-based, sequential and progressive. Although stream storage is best for events, any time-series database could also work. However, such databases are not web-accessible unless they sit in an object store. Their need for storage is not very different from that of applications requiring object storage for store and access. However, as object storage makes inroads into vectorized execution, data transfers become increasingly fragmented and continuous. At this junction it is important to facilitate data transfer between object storage and event storage, and it is in this space that billing requests and the object store find suitability. Search, browse and query operations are facilitated in a web service using a web-accessible store.
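To make the windowing point concrete, here is a minimal sketch (plain Python, not Flink) of a tumbling window: billing events are bucketed into fixed, non-overlapping time windows and the amounts are totaled per window, which is the kind of aggregation a stream processor would run over such a store. The event values are illustrative.

```python
from collections import defaultdict

def tumbling_window_totals(events, window_seconds):
    """Group (timestamp, amount) billing events into fixed, non-overlapping
    windows and total the amount in each, as a stream processor would."""
    totals = defaultdict(float)
    for timestamp, amount in events:
        # Align each event to the start of its window
        window_start = timestamp - (timestamp % window_seconds)
        totals[window_start] += amount
    return dict(totals)

# Example: three events falling into two 60-second windows
events = [(0, 1.5), (30, 2.5), (65, 4.0)]
print(tumbling_window_totals(events, 60))   # {0: 4.0, 60: 4.0}
```

A real deployment would let Flink maintain these windows over the append-only stream; the bucketing logic is the same.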

Saturday, March 21, 2020

High Availability for applications.
This article is a comparison between cluster mode and serverless computing. Both are techniques to scale software deployments to meet the challenge of increasing traffic from clients. Back-of-the-envelope calculations for capacity and an inclination to treat software deployments as traditional have been common practice. It is time to wake up to lower costs with small changes in mode of deployment and application modularity.
High availability clusters spin up additional nodes to handle the load. They scale out with fixed long-running costs for nodes and continue until they are resized. The intervention to increase or decrease the number of nodes is independent of demand and capacity fluctuations. All aspects of a node, from compute to operating system and application stack, involve maintenance chores for the application. The application code also takes on routines such as health checks, alerts and notifications.
Compare this with serverless computing, where the resources required to execute the code are not only independent of the application but also dynamically provisioned and torn down, so we pay as we go. Applications are already familiar with the Platform-as-a-Service model and have shifted to deep divisions in application modularity, with separate hardware and software for each module managed independently of the applications. The shift toward serverless computing, also called Function as a Service, is only minimally more than the PaaS model because applications have to organize their logic into smaller functions that can be executed without concern for resources. The applications also get to focus more on their actual business value rather than mundane operational concerns.
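A function-as-a-service handler is typically just a small entry point; an OpenWhisk-style Python action, for instance, looks like the following sketch (the greeting logic is purely illustrative):

```python
# OpenWhisk-style Python action: the platform invokes main() with a dict of
# parameters and expects a JSON-serializable dict back. Note that no servers,
# polling loops, or health-check routines appear in the application code.
def main(params):
    name = params.get("name", "stranger")
    return {"greeting": "Hello, %s!" % name}
```

The platform runs each invocation in its own container in response to an event and tears it down afterward, which is why the code can stay this small.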
The cluster model is hugely popular since it has a proven track record and well-tested cluster-management software. Cluster management also provides customized capabilities by way of dedicated nodes. Serverless computing, on the other hand, is offered mainly by the big cloud providers, who can take those costs away from our application through economies of scale. The serverless architecture may be standalone or distributed. In both cases, it remains an event-action platform that executes code in response to events. We can execute code written as functions in many different languages, and each function is executed in its own container. Because these functions are asynchronous to the frontend and backend, they need not perform continuous polling, which helps them be more scalable and resilient. OpenWhisk introduces an event programming model where the charges are only for what is used.
The choice between cluster and serverless computing also depends on the organization's ability to adapt. Clusters are easy to provision on premise and on orchestration frameworks, whereas public cloud technologies have still not penetrated many organizations sufficiently to become the mainstream mode of deployment. Some organizations also have a genuine need for special-purpose hardware racks and cannot truly be software-defined.

Friday, March 20, 2020

Kubernetes has a controller-manager, a kubelet, an apiserver, a proxy, etcd and a scheduler. All of these can be configured using a configurator. The --feature-gates flag can be used to govern what is allowed to run. The options supported by the feature gates are few, but the components can utilize them to provide selectivity in what is included when running the app.
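As an illustration, a feature gate can be toggled either on a component's command line (for example, `--feature-gates=RotateKubeletServerCertificate=true`) or in a component configuration file such as the kubelet's (a sketch; the gate chosen here is just one example of a gate available at the time of writing):

```yaml
# Kubelet configuration file fragment with a feature gate toggled
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  RotateKubeletServerCertificate: true
```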
The images do not have feature gates. They are atomic and whole in that they are composed of layers but are treated as a single image with a <registry>/<name>:<tag> specifier. Images referenced by the "repository" key in the values file can simply specify the name and tag and rely on registry resolution to locate the image. The Kubernetes framework must be guided in this process.
Image loading is first local, then remote. Even for remote loading, the preference is the configured registry rather than the default. This allows us to use manifests with charts effectively. Helm charts contain at least two elements: a Chart.yaml that has a description of the package, and one or more templates that contain Kubernetes manifest files.
A manifest can specify chart override values such as either:
    organization/chart-values: |-
      {
        "image": {
          "repository": "image1"
        }
      }
or:

    organization/chart-values: |-
      {
        "global": {
          "registry" : ""
        },
        "createdecksappResource": false
      }
In the first case, the image is used with its name and tag as specified. In the second case, the registry is prefixed to the image name and tag, pinning down where the image should be located.
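A template in the chart can then honor both override styles. As a sketch (the key names follow the values shown above; the `if` guard is one way to skip the prefix when no registry is configured):

```yaml
# Deployment template fragment: prefix the registry only when it is set
image: "{{ if .Values.global.registry }}{{ .Values.global.registry }}/{{ end }}{{ .Values.image.repository }}:{{ .Values.image.tag }}"
```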


Thursday, March 19, 2020

Image registry:
Kubernetes, as a container orchestration framework, requires images to launch containers. The image registry can be private, or it can be one of the well-known public registries.
Any access to images on registries such as Docker Hub, GCR, or AWS is outbound from the kube cluster and will likely require credentials. Outbound connectivity from the pods is provided using well-known nameservers and a gateway added through the master, which goes through the host-visible network out to the external world. There are two modes for downloading images:
First, we can provide an insecure registry that is internal and private to the cluster. This registry has no images to begin with, and all images can subsequently be uploaded to it. It helps us create a manifest of the images required to host the application on the cluster, and it gives us the ability to use a non-TLS registry.
Second, we can use credentials with one or more external registries as an add-on, because the outbound request for pulling an image can be configured with credentials by referring to them as "registry-creds". Each external registry can accept the path to a credentials file, usually in JSON format, which helps configure the registry.
Together these options allow all images to be made available inside the cluster so that containers can be spun up on the cluster to host the application.
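For example, a pod can reference a credentials secret so that the kubelet can pull its image from a private registry. All names and the registry path below are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: billing-app                 # placeholder pod name
spec:
  containers:
  - name: app
    # assumed private registry path: <registry>/<name>:<tag>
    image: registry.internal.example.com/billing/app:1.0
  imagePullSecrets:
  - name: registry-creds            # secret of type kubernetes.io/dockerconfigjson
```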

Wednesday, March 18, 2020

The Kubernetes ingress is a resource that can help with load balancing. It provides an external endpoint to the cluster and usually has a backend to which it forwards traffic. This lets it work as an SSL termination endpoint and provide name-based virtual hosting.
An ingress controller controls the traffic into the Kubernetes cluster. Typically, this is done with the help of nginx. An ingress-nginx controller can be configured to use a certificate.
Nginx provides the option to specify a --default-ssl-certificate. The default certificate is used as a catch-all for all traffic into the server. Nginx also provides an --enable-ssl-passthrough feature. This bypasses nginx; the controller instead pipes the traffic forward and backward between the client and the backend. If the virtual domain cannot be resolved, the traffic passes to the default backend.
This calls for two steps to secure the ingress controller:
1) Use a library such as cert-manager to generate keys and certificates.
2) Load the generated key and certificate as Kubernetes secrets, and point the SSL configuration of the application at their locations.
The TLS port, usually 443 or 8443, can also be port-forwarded to the host and thereby to the rest of the clients on the host's external network. The ingress is an SSL termination point, and the host can use an SSL proxy or any suitable technique to relay the traffic to the ingress.
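The two steps above might come together in an ingress manifest like the following sketch (the host, secret, and service names are assumed; the API version is the one current at the time of writing):

```yaml
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: app-ingress
spec:
  tls:
  - hosts:
    - app.example.com
    secretName: app-tls        # Kubernetes secret holding tls.crt and tls.key
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        backend:
          serviceName: app-service
          servicePort: 443
```

Here the controller terminates TLS with the certificate from the `app-tls` secret before forwarding to the backend service.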
This does not affect the registry used for image pulling. Any access to images on registries such as Docker Hub, GCR, or AWS is outbound from the kube cluster. We have already established outbound connectivity from the pods using well-known nameservers and a gateway added through the master, which goes through the host-visible network out to the external world. There are two techniques possible here.
First, we can provide an insecure registry that is internal and private to the cluster. This registry has no images to begin with, and all images can subsequently be uploaded to it. It helps us create a manifest of the images required to host the application on the cluster, and it gives us the ability to use a non-TLS registry.
Second, we can use credentials with one or more external registries, because the outbound request for pulling an image can be configured with credentials. Minikube provides add-ons to configure these credentials by referring to them as "registry-creds". Each external registry can accept the path to a credentials file, usually in JSON format, which helps configure the registry.
Together these options allow all images to be made available inside the cluster so that containers can be spun up on the cluster to host the application.

Tuesday, March 17, 2020

Minikube tunnel creates a route to services deployed with type LoadBalancer and sets their ingress IP to the cluster IP.
We can also create an ingress resource with the nginx-ingress-controller. The resource is assigned an IP address that can be reached from the host. The /etc/hosts file on the host must map this IP address to the corresponding host name specified in the ingress resource.
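As a sketch, if the ingress resource advertises 192.168.49.2 (a typical minikube IP, assumed here) for a host name such as myapp.local, the host's /etc/hosts would gain a line like:

```
192.168.49.2   myapp.local
```

After that, a browser or curl on the host can reach the ingress by name.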
There is no support mentioned in the docs for replacing the network adapter with a bridged network even though that might solve external connectivity for the minikube cluster.

Monday, March 16, 2020

External key management on stream store analytics:


Stream stores are overlaid on Tier 2 storage, where the assumption is that the latter takes care of securing data at rest. Tier 2 storage such as object storage has always supported Data at Rest Encryption (D@RE) by maintaining a set of encryption keys in the system. These include Data Encryption Keys (DeKs) and Key Encryption Keys (KeKs). Certain object storage even supports external key management (EKM) by providing integration with Gemalto KeySecure servers, following industry best practice. With the help of external keys, there is reduced risk when a single instance of an application is compromised. Keys are rotated periodically, and this integration helps with performing the re-encryption of storage artifacts. Products that combine analytics with stream stores have at least two levels of data transfer: one involving the analytical application and the stream store, and another involving the stream store and Tier 2, which may be either an NFS file system or a blob store. These transfers can also occur side by side if the product allows storage independent of streams, or with a virtualizer that involves a storage-class provisioner, or finally with an abstraction that syncs between hybrid stores. In these cases, there is replicated data, often without protection. When the product supports the ability to use the same key to secure all parts of the data and their copies, along with the ability to rotate the keys, an external key manager becomes useful to safeguard the keys, both old and new.

Data is organized in containers and hierarchies specific to the store, and encryption can be applied at each hierarchical level. All the data lives at the lowest level, and each container has its own DeK, while the higher-level containers have their own KeKs. A master KeK is available for the overall store. When the stores are multiple and hybrid, the masters differ, but a master can be treated as just another intermediary level as long as the stores are registered at an abstraction layer.
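The DeK/KeK hierarchy can be sketched as envelope encryption: data is encrypted with a DeK, the DeK is wrapped with a KeK, and rotating the KeK only rewraps the small DeK rather than re-encrypting the data itself. The XOR "cipher" below is a deliberately toy stand-in for a real algorithm such as AES; it exists only to show the key hierarchy and is not secure:

```python
import hashlib
import secrets

def keystream(key, length):
    # Toy keystream: SHA-256 in counter mode (illustration only, NOT secure)
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor_cipher(key, data):
    # Symmetric toy cipher: XOR with the keystream; applying it twice decrypts
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

# Encrypt data with a fresh DeK, then wrap the DeK under the KeK
kek = secrets.token_bytes(32)
dek = secrets.token_bytes(32)
ciphertext = xor_cipher(dek, b"billing record 42")
wrapped_dek = xor_cipher(kek, dek)

# KeK rotation: unwrap with the old KeK, rewrap with the new one.
# The data ciphertext is untouched, so rotation stays cheap.
new_kek = secrets.token_bytes(32)
wrapped_dek = xor_cipher(new_kek, xor_cipher(kek, wrapped_dek))

# Decrypt: unwrap the DeK with the current KeK, then decrypt the data
recovered = xor_cipher(xor_cipher(new_kek, wrapped_dek), ciphertext)
assert recovered == b"billing record 42"
```

An external key manager would hold the KeKs (old and new) so that every replicated copy of the data, whether in the stream store or in Tier 2, stays recoverable under the same key hierarchy.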