Wednesday, March 18, 2020

The Kubernetes ingress is a resource that can help with load balancing. It provides an external endpoint to the cluster and usually has a backend to which it forwards traffic. This lets it act as an SSL termination point and provide name-based virtual hosting.
An ingress controller controls the traffic into the Kubernetes cluster. Typically, this is done with the help of nginx. An ingress-nginx controller can be configured to use a certificate.
Nginx provides the option to specify a --default-ssl-certificate. The default certificate is used as a catch-all for all traffic into the server. Nginx also provides an --enable-ssl-passthrough feature. This bypasses nginx, and the controller instead pipes the traffic back and forth between the client and the backend. If the virtual domain cannot be resolved, the traffic passes to the default backend.
Securing the ingress controller therefore requires two steps:
1) Use a tool such as cert-manager to generate keys and certificates.
2) Store the generated key and certificate as a Kubernetes secret, and reference that secret from the TLS section of the ingress resource so that its location is known to the SSL configuration of the application.
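The two steps above can be sketched as manifests. This is a minimal sketch, not a drop-in configuration: the names (example-cert, example-tls, example.com, letsencrypt-issuer, example-service) are hypothetical, it assumes ingress-nginx and cert-manager are installed, and the API versions may differ by cluster and cert-manager release.

```yaml
# Step 1: a cert-manager Certificate that writes its key pair into a secret.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: example-cert
spec:
  secretName: example-tls          # the secret cert-manager will create
  dnsNames:
    - example.com
  issuerRef:
    name: letsencrypt-issuer       # hypothetical ClusterIssuer
    kind: ClusterIssuer
---
# Step 2: an Ingress that terminates TLS using that secret.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress
spec:
  tls:
    - hosts:
        - example.com
      secretName: example-tls
  rules:
    - host: example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: example-service
                port:
                  number: 80
```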
The TLS port, usually 443 or 8443, can also be port-forwarded to the host and thereby to the rest of the clients on the host's external network. The ingress is an SSL termination point, and the host can use an SSL proxy or any technique suitable to relay the traffic to the ingress.
This does not affect a registry for image pulling. Any access to images on registries such as Docker Hub, GCR, or AWS ECR is outbound from the Kubernetes cluster. We have already established outbound connectivity from the pods using well-known nameservers and a gateway added through the master, which goes through the host-visible network to the external world. There are two techniques possible here.
First, we can provide an insecure registry that is internal and private to the cluster. This registry has no images to begin with, and all images can be subsequently uploaded to it. It helps us create a manifest of the images required to host the application on the cluster, and it gives us the ability to use a non-TLS registry.
Second, we can configure credentials for one or more external registries, since the request for pulling an image is outbound. Minikube provides an addon, "registry-creds", to configure these credentials. Each external registry can accept the path to a credentials file, usually in JSON format, which configures access to the registry.
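Enabling the addon is a short sequence of Minikube commands; a sketch, noting that the interactive prompts vary by Minikube version:

```shell
# Configure credentials for external registries (AWS ECR, GCR, Docker Hub);
# the command prompts interactively for each provider.
minikube addons configure registry-creds
# Enable the addon so the credentials become image pull secrets in the cluster.
minikube addons enable registry-creds
```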
Together these options allow all images to be made available inside the cluster so that containers can be spun up on the cluster to host the application.

Tuesday, March 17, 2020

Minikube tunnel creates a route to services deployed with type LoadBalancer and sets their ingress address to the cluster IP.
We can also create an ingress resource with the nginx-ingress-controller. The resource is assigned an IP address that can be reached from the host. The /etc/hosts file on the host must map this IP address to the hostname specified in the ingress resource.
There is no support mentioned in the docs for replacing the network adapter with a bridged network even though that might solve external connectivity for the minikube cluster.

Monday, March 16, 2020

External key management on stream store analytics:


Stream stores are overlaid on Tier 2 storage, where the assumption is that the latter takes care of securing data at rest. Tier 2 storage such as object storage has long supported Data at Rest Encryption (D@RE) by maintaining a set of encryption keys in the system. These include Data Encryption Keys (DEKs) and Key Encryption Keys (KEKs). Certain object storage even supports external key management (EKM) by providing integration with Gemalto SafeNet KeySecure servers, an industry best practice. With external keys, there is reduced risk when a single instance of an application is compromised. Keys are rotated periodically, and this integration helps with re-encrypting the storage artifacts.

Products that combine analytics with stream stores have at least two levels of data transfer: one between the analytical application and the stream store, and another between the stream store and Tier 2, which may be either an NFS file system or a blob store. These transfers can also occur side by side if the product allows storage independent of streams, with a virtualizer that involves a storage-class provisioner, or with an abstraction that syncs between hybrid stores. In these cases, there is replicated data, often without protection. When the product supports using the same key to secure all parts of the data and their copies, along with the ability to rotate the keys, an external key manager becomes useful to safeguard the keys, both old and new.

Data is organized in containers and hierarchies specific to the store, and encryption can be applied at each hierarchical level. All the data lives at the lowest level, with its own DEK per container, while the higher-level containers have their own KEKs. A master KEK is available for the overall store. When the stores are multiple and hybrid, the master KEKs differ, but each can be treated as just another intermediate level as long as the stores are registered with an abstraction layer.
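The DEK/KEK hierarchy above is the classic envelope-encryption pattern, which can be sketched with openssl. The file names are illustrative; a real store would keep the wrapped DEK alongside the data and hold the KEK in the external key manager.

```shell
# Envelope encryption sketch: a DEK encrypts the data; a KEK wraps the DEK.
echo "stream segment bytes" > data.bin
openssl rand -out dek.bin 32                       # 256-bit data encryption key
openssl enc -aes-256-cbc -pbkdf2 -in data.bin -out data.enc -pass file:dek.bin
openssl rand -out kek.bin 32                       # key encryption key, held by the EKM
openssl enc -aes-256-cbc -pbkdf2 -in dek.bin -out dek.wrapped -pass file:kek.bin
# Key rotation re-wraps dek.wrapped with a new KEK; data.enc is never touched.
```

Because the data is encrypted only under the DEK, rotating the KEK touches just the small wrapped-key artifact, not the bulk storage.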


Sunday, March 15, 2020

We were discussing Minikube applications and the steps taken for allowing connectivity to the applications.
The technique to allow external access to an application hosted on Minikube is port-forwarding. If the application is served over both HTTP and HTTPS, then a set of ports can be opened on the host to send traffic to and from the application.
On Windows we take extra precautions in handling network traffic. The default firewall settings may prevent access to these ports, so a new set of inbound and outbound rules must be specified for each new port on the host.
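On Windows these firewall rules can be added from an elevated prompt with netsh; a sketch, using port 9880 and hypothetical rule names as an example:

```shell
netsh advfirewall firewall add rule name="minikube-9880-in" dir=in action=allow protocol=TCP localport=9880
netsh advfirewall firewall add rule name="minikube-9880-out" dir=out action=allow protocol=TCP remoteport=9880
```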
Redirects continue to operate as before because all available endpoints will have port-forwarding. The web address itself does not need to be translated to include the localhost and host port as long as the application remains the same point of origin.
The other option aside from port-forwarding is to ask Minikube to expose the service. This option provides a URL with an IP address and port that can then be accessed from the host. There is no direct external network IP connectivity over the NAT without using static IP addressing. That said, Minikube does provide an option for tunneling.
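Both options are single Minikube commands; a sketch, assuming a service named myapp of type LoadBalancer:

```shell
# Print a host-reachable URL for the service.
minikube service myapp --url
# Or create a network route so LoadBalancer services receive a reachable ingress IP.
minikube tunnel
```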
Tunnel creates a route to services deployed with type LoadBalancer and sets their ingress address to the cluster IP.
We can also create an ingress resource with the nginx-ingress-controller. The resource is assigned an IP address that can be reached from the host. The /etc/hosts file on the host must map this IP address to the hostname specified in the ingress resource.

Saturday, March 14, 2020

Minikube applications can be accessed outside the host via port-forwarding. The applications hosted on Minikube have an external cluster-IP address, but the IP address is NAT'ed, which means it is on a private network where the address is translated from the external IP address.
The external and cluster IP addresses are two different layers of abstraction. The external address in this case is visible only to the host, since Minikube is hosted with a host-only network adapter. It has outbound external connectivity but no inbound access except what the host permits. The IP address does not route automatically to the pods within Minikube.
The cluster IP address refers to one that has been marked cluster-wide and is reachable from anywhere within the Kubernetes cluster. That does not mean it is accessible over the NAT. It is different from the internal IP addresses used for the pods.
The layering therefore looks like the following:
 - Outside world
     -   Host (IP connectivity)
          - Minikube (Network Address Translation)
              - Cluster IP address ( Kubernetes )
                  - Pod IP address  ( Kubernetes )

Kubernetes provides a feature that enables transmission of data between a pod and the outside world. This is called port-forwarding.
To transmit data to a web application serving at port 80, we can run the following commands on the host:
> kubectl port-forward pod/<podName> -n <namespace> 9880:80
Forwarding from 127.0.0.1:9880 -> 80
which accepts traffic only from the host itself, and
> kubectl port-forward --address 0.0.0.0 pod/<podName> -n <namespace> 9880:9000
Forwarding from 0.0.0.0:9880 -> 9000
which also accepts traffic from the rest of the host's network.

It is important to recognize that the inbound and outbound firewall rules must be specified separately for the same application. If the traffic involves both HTTP and HTTPS, then this results in a set of two rules for each kind of traffic: plain and encrypted.

Friday, March 13, 2020

Kubernetes application install on windows
Kubernetes is a portable, extensible, open-source platform for managing containerized workloads and services. It is often a strategic decision for any company because it decouples the application from the hosts so that the same application can work elsewhere with minimal disruption to its use.
Windows is the most common operating system on personal workstations, while most large-scale Kubernetes deployments run on Linux virtual machines. The developer workstation is considered a small deployment.
The most convenient way to install Kubernetes on Windows for hosting any application is with the help of a software product called Minikube. This software provisions a dedicated Kubernetes cluster ideal for use in a small resource environment.
It simplifies storage with the help of a storage class, which is an abstraction over how data is persisted. It uses a storage provisioner called k8s.io/minikube-hostpath which, unlike other storage provisioners, does not require static configuration beforehand for hosted applications to be able to persist files. All requests for persisting files are honored dynamically as and when they are made. It stores the data from the applications on the host itself, unlike the nfs-client-provisioner, which provisions on remote storage.
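A persistent volume claim against Minikube's default storage class is enough to exercise this provisioner; a sketch, where the claim name and size are hypothetical:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  storageClassName: standard      # Minikube's default class, backed by minikube-hostpath
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```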
It simplifies networking with the help of dual network adapters that let the cluster provide connectivity with the host and with the outside world, which lets the application appear as if it is on a traditional deployment that is reachable on the internet. The network adapter shared with the host provides the ability to seamlessly port-forward to services deployed on pods with an external cluster IP address.
Together, the storage and networking conveniences make application portability easy. Minikube also comes with its own Docker runtime and kubectl toolset that make it easy to provide the software to run on the cluster.
These make it convenient to host any application on Kubernetes on Windows in a resource-constrained environment.

Thursday, March 12, 2020

1) The Flink programming model helps a lot with writing queries for streams. Several examples of this are available on their documentation page and as a sample here. The ability to combine a global index for the stream store with their programming model boosts the analytics that can be performed on the stream. An example to create this index and use it with storage is shown here.
2) The utility of the index is in its ability to look up based on keywords. The query engine for using the index exposes additional semantics and syntax to the analytics user, beyond the Flink queries or for use together with them. The logic to use the queries and Flink can then be packaged in a maven-published jar. Credentials to access the streams can be injected into the jar with the help of a resolver that utilizes the application context.
3) Some of the streams may be generated by running map or flatMap operations on an existing stream, and they might come in useful later. Unlike the stream for the index, there could be a stream for the transformation of events. Such a transformation happens once and persists in a new stream. Indexes are rebuilt; a transformation is one-time. Indexes are used for a while; transformations are temporary and persisted only when used with several queries. Indexes might even be stored better as files since they are rewritten, whereas transformed streams are append-only. The Extract-Transform-Load operation to generate this stream could be packaged in a maven artifact that is easy to write in Flink. If the indexing automatically includes all streams in a project, then this transformed stream becomes automatically available to the user. If there is a way for the user to blacklist or whitelist the streams for inclusion in the index, it gives more power to the user and prevents unnecessary indexing. All project members can have the privilege to add a stream to the indexing. If the stream is earmarked to be indexed, indexing may even be kicked off by a project member or require the administrator to do so.
4) Overall, there is a comparison to be made between indexing across streams and transforming one or more streams into another stream. The Flink programming model works well with transformations. Index-based querying adds more power to the analytics. Finally, the data for the transformed streams and indexes can be stored on a Tier 2 that brings storage engineering best practice, allowing businesses to focus on the querying, transformations, and indexing.