Monday, May 13, 2019


Columnar store overlay over object storage
Object storage has established itself as a “standard storage” in the enterprise and cloud. As it brings many storage best practices to provide durability, scalability, availability and low cost to its users, it can go beyond tier 2 storage to become nearline storage for vectorized execution. Web-accessible storage has been important for vectorized execution. We suggest that some of the NoSQL stores can be overlaid on top of object storage and discuss an example with Column storage. We focus on the use case of columns because they are not relational and find many applications similar to the use cases of object storage. They also tend to become large with significant read-write access. Object storage then transforms from a storage layer participating in vectorized executions to one that actively builds metadata, maintains organization, rebuilds indexes, and supports web access for those who don’t want to maintain local storage or who want to leverage easy data transfers from a stash. Object storage utilizes a queue layer and a cache layer to handle the processing of data for pipelines. We presented the notion of fragmented data transfer in an earlier document. Here we suggest that Columns are similar to fragmented data transfer, and we show how object storage can serve as both the source and the destination of Columns.
Column storage gained popularity because cells could be grouped in columns rather than rows. Reads and writes are over columns, enabling fast data access and aggregation. Their need for storage is not very different from that of applications requiring object storage. However, as object storage makes inroads into vectorized execution, the data transfers become increasingly fragmented and continuous. At this junction it is important to facilitate data transfer between objects and Column storage.
File systems have long been the destination for storing artifacts on disk, and while the file system has evolved to stretch over clusters and not just remote servers, it remains inadequate as blob storage. Data writers have to self-organize and interpret their files while frequently relying on metadata stored separately from the files. Files also tend to become binaries with proprietary interpretations. Files can only be bundled in an archive, and there is no object-oriented design over the data. If the storage were to support organizational units in terms of objects, without requiring hierarchical declarations and while supporting is-a or has-a relationships, it would become more usable than files.
Since Column storage overlays on tier 2 storage on top of blocks, files and blobs, it is already transferring data to object storage. However, the reverse is not as frequent, although objects in a storage class can continue to be serialized to Columns in a continuous manner. This is symbiotic for the audiences of both storage systems.
Object storage offers better features and cost management, and it continues to stand out against most competitors in unstructured storage. The processors lower the costs of usage so that the total cost of ownership is also lowered, making object storage a whole lot more economical for its end users.

Sunday, May 12, 2019

Credentials and Identity 

Credentials and identity are assets to be managed, and there are surely plenty of reasons to dedicate software to these resources; otherwise such software would not be so ubiquitous. However, the two concepts are not necessarily separate. They are separate only when the identity associated with an individual uses one or more credentials. Two-factor authentication is a good example of different credentials: the password is what the user knows, and the one-time passcode is what the user has. This provides a separation of credentials, but they represent the same identity. On the other hand, an identity represented by an access key and access secret is different for each request made with an API call, since the server does not recognize those credentials except for the purpose of authorizing the call.

A key-secret pair can be used for the encryption of data just as much as it can be a form of identity. SSH access with a username and password can be substituted with a public and private key, and it would still represent the same identity. Since encryption of data can be applied to scopes determined by different-sized containers, the key-secret pair becomes a valuable asset and represents much more than an identity. Keys become assignable to parent keys and can be rotated so that they are not used again.
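As a minimal sketch, the same login identity is reachable with either credential; the host and key path below are placeholder assumptions:

# generate a key pair and install the public key for key-based login
ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_rsa_example
ssh-copy-id -i ~/.ssh/id_rsa_example.pub user@host
# same identity as the password login, different credential
ssh -i ~/.ssh/id_rsa_example user@host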

Key-secrets are used mostly with byte ranges, so they do not have any semantic content other than the context in which they were used. Yet they are just as important to keep in a secure store as any other secret. Such a digital key is a bearer-only access grantor, yet it can be used as a form of identity.

Such artifacts are exchangeable one for the other and yet the notion of identity remains virtually the same.

Saturday, May 11, 2019

Aliases and robots:
Usernames and passwords became a representation of the user. With the use of cryptography and X.509 certificates, we now have public keys and private keys as representations of identity. These can be used to generate the equivalent of a username and password as keys and secrets, which can then be used to authorize HTTP requests.
HTTP requests signed with keys and secrets are generally used by applications. Since there is no user involvement in creating these, such usage can also be called programmatic or robot access. Moreover, the keys and secrets can be generated dynamically for a limited lifetime, scope and purpose. These then constitute a set of credentials to be managed with expiration policies.
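As a rough sketch of how a key and secret authorize an HTTP request, the snippet below signs a request with an HMAC of the secret. The header format, endpoint and variable names are illustrative assumptions rather than any particular service's API:

# hypothetical key/secret pair issued to a robot account
ACCESS_KEY="AKEXAMPLE"
ACCESS_SECRET="s3cr3t"
# sign the method, path and timestamp with the secret
STRING_TO_SIGN="GET /api/v1/items $(date -u +%Y%m%dT%H%M%SZ)"
SIGNATURE=$(printf '%s' "$STRING_TO_SIGN" | openssl dgst -sha256 -hmac "$ACCESS_SECRET" -binary | base64)
# present the key and signature; the server recognizes them only to authorize this call
curl -H "Authorization: APIKEY $ACCESS_KEY:$SIGNATURE" https://api.example.com/api/v1/items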
Identity, therefore, is no longer for humans alone. It is a notion shared with every accessor of system resources. By giving identities to accessors, we can assign roles for proper authentication and authorization. There is very little difference to the system between identities for users and for machines. The notion of identity can change even for the same end user when the credentials change; old names or identifiers may be closed in favor of new ones. Machines simply make use of short-lived identities. They take this to the next level, where identities are frequently rotated, reducing the risk of compromise. The number of times an identity is generated makes no difference as long as the set of active identities is finite and manageable.
Identity as a resource for management has evolved into specialized and general-purpose Identity and Access Management products and solutions. They are set up to consolidate identities for the members of an entire organization with the use of a member directory. They provide different mechanisms for authentication, including the option to authenticate via federated, chained or standalone modes. Identity providers enable single sign-on, token generation and API integration.

Friday, May 10, 2019

We were discussing the ingress resource for the Kubernetes cluster.

The ingress resource can fan out traffic to different destinations based on service URI paths. This helps with routing API calls to independent service implementations. The ingress resource can be specified declaratively as YAML or programmatically, and the routing may be determined by the host header. The ingress resource can also route traffic to different backends; an ingress with no rules sends all traffic to a default backend, which is typically a configuration option of the ingress controller. The ingress resource can also route traffic to different name-based virtual hosts at the same IP address. This feature can also work without any name-based virtual host, in which case all traffic passes through to the default backend. Finally, the ingress resource can be used to secure the ingress by specifying a TLS private key and certificate.
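As a minimal illustration of such a fan-out, the following declaration routes two URI paths to independent services; the host and service names are assumptions, and the apiVersion is the one current for clusters of this vintage:

cat <<'EOF' | kubectl apply -f -
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: fanout-ingress
spec:
  rules:
  - host: foo.bar.com          # routing determined by the host header
    http:
      paths:
      - path: /orders          # each uri path fans out to its own service
        backend:
          serviceName: orders-svc
          servicePort: 80
      - path: /users
        backend:
          serviceName: users-svc
          servicePort: 80
EOF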

The deployment of services running on the Kubernetes cluster can be checked using:
kubectl cluster-info
This can also be checked programmatically for automation with the help of the Kubernetes APIs, as shown below:
localhost:~ # kubectl proxy --port=8080 &
[1] 18455
localhost:~ # Starting to serve on 127.0.0.1:8080
localhost:~ # curl http://localhost:8080/api/
{
  "kind": "APIVersions",
  "versions": [
    "v1"
  ],
  "serverAddressByClientCIDRs": [
    {
      "clientCIDR": "0.0.0.0/0",
      "serverAddress": "10.245.129.228:8443"
    }
  ]
}

localhost:~ # kubectl config view -o jsonpath='{"Cluster name\tServer\n"}{range .clusters[*]}{.name}{"\t"}{.cluster.server}{"\n"}{end}'
Cluster name    Server
10.245.129.228
pixie   https://pixie.abc.com:8443
localhost:~ # export CLUSTER_NAME="pixie"
localhost:~ # APISERVER=$(kubectl config view -o jsonpath="{.clusters[?(@.name==\"$CLUSTER_NAME\")].cluster.server}")
localhost:~ # echo $APISERVER
https://pixie.abc.com:8443
localhost:~ # TOKEN=$(kubectl get secrets -o jsonpath="{.items[?(@.metadata.annotations['kubernetes\.io/service-account\.name']=='default')].data.token}"|base64 -d)
localhost:~ # echo $TOKEN
eyJhbGciOiJSUzI1NiIsImtpZCI6IiJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJkZWZhdWx0Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZWNyZXQubmFtZSI6ImRlZmF1bHQtdG9rZW4tOWttMnIiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoiZGVmYXVsdCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6ImQ1OGUzMTI1LTViOGQtMTFlOS05NDUxLTAwNTA1NmJkZThjZiIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDpkZWZhdWx0OmRlZmF1bHQifQ.ruDYECIDICGosPh3sUwrPsIoEZlleENEqOy_9vWrANkkDxIVK659ROF2_jfVlUNPFAz9SgPbf3sYj2I7zgKxce-m_FukoWAoB6x68E8s1bIPaRaAq5jmQZ5TubLWS3Vfc7cEnWy1DujzabcGxF7s2tCfvjXVIjwyRTDojk9wYfmFDu61rfIohEkTnR09S43u6Py2iy3REzteTsksxK9eWjwPYeJJ-KX3VAa8ZM_nItKq_5tCvtFK8bSJe7E3qKpKquYA9-To0tAsqtQWWUCx4WF0gul_t65GWES0QOvdy6PLHLi1caGarfuzpOWPeUeXnNygQk1k_YzOZWBjx3efmQ
localhost:~ # curl -X GET $APISERVER/api --header "Authorization: Bearer $TOKEN" --insecure
{
  "kind": "APIVersions",
  "versions": [
    "v1"
  ],
  "serverAddressByClientCIDRs": [
    {
      "clientCIDR": "0.0.0.0/0",
      "serverAddress": "10.245.129.228:8443"
    }
  ]
}localhost:~ #




Thursday, May 9, 2019

KeyCloak is a complex piece of software, and so are the features of a gatekeeper. An end-to-end design will separate out the concerns of deployment, which should preferably be as close to a no-brainer as possible. All the custom logic should be handled by the application and not the deployment code, because that logic is likely tied to the application and not to the Kubernetes platform.

The ingress resource can fan out traffic to different destinations based on service URI paths. This helps with routing API calls to independent service implementations. The ingress resource can be specified declaratively as YAML or programmatically. The routing may be determined by the host header.

The ingress resource can also route traffic to different backends. An ingress with no rules sends all traffic to a default backend. The default backend is typically a configuration option of the ingress controller.

The ingress resource can also route traffic to different name-based virtual hosts at the same IP address. This feature can also work without any name-based virtual host, in which case all traffic passes through.

The ingress resource can be used to secure the ingress by specifying a TLS private key and certificate.

The ingress only supports a single TLS port, 443. The TLS secret must contain the key and certificate under the lookup names tls.key and tls.crt.
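A hedged sketch of securing the ingress follows, assuming the key and certificate are already available as files tls.key and tls.crt; the host and service names are placeholders:

kubectl create secret tls myapp-tls --key tls.key --cert tls.crt
cat <<'EOF' | kubectl apply -f -
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: myapp-ingress
spec:
  tls:
  - hosts:
    - myapp.example.com
    secretName: myapp-tls      # must hold tls.key and tls.crt
  rules:
  - host: myapp.example.com
    http:
      paths:
      - backend:
          serviceName: myapp-svc
          servicePort: 8080
EOF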

#codingexercise
Node getSecondSmallestInBst(Node root) {
    Node smallest = getLeftMost(root); // follow left children down to the minimum
    return getSuccessor(smallest);     // the in-order successor of the minimum is the second smallest
}

Wednesday, May 8, 2019

One of the least anticipated security vulnerabilities in ingress control is the use of an HTTPS proxy. As with any proxy, it splits the secure channel between sender and receiver. Since there is no longer a single tunnel, we now have to secure not just the ends of the tunnel with certificates for mutual authentication but also the HTTPS proxy itself.

The role of an HTTPS proxy is to consolidate calls to and from API endpoints that may even be hosted on different pods. This means it is very similar to an API gatekeeper while allowing statistics, monitoring and troubleshooting across all applications.

It is not enough to configure the gatekeeper without end-to-end security between the external incoming traffic and the internal API endpoints, via and including the ingress control, the applications exposing the API, and the gatekeeper or any man-in-the-middle modules.
  
Firewalls and network routing controls can secure the IP layer and the ports. The above method only addresses HTTP and HTTPS, which are usually allowed by default in all rules.

KeyCloak generates certificates. It is not necessary to use cert-manager if the application relies on an external OpenID provider like KeyCloak. However, passing the certificate over HTTPS through an API during deployment, before the ingress control is in place, is generally not preferable. A stock Kubernetes solution for deployment relieves this concern. On the other hand, changing the initial certificate for an application with the help of KeyCloak is always possible after the deployment in the form of custom code.

KeyCloak is a complex piece of software, and so are the features of a gatekeeper. An end-to-end design will separate out the concerns of deployment, which should preferably be as close to a no-brainer as possible. All the custom logic should be handled by the application and not the deployment code, because that logic is likely tied to the application and not to the Kubernetes platform.

Tuesday, May 7, 2019

Ingress control versus external and internal application endpoint security
An ingress controller controls the traffic into the Kubernetes cluster. Typically, this is done with the help of nginx, and an ingress-nginx controller can be configured to use a certificate. Kubernetes is a system for managing containerized applications; it facilitates deployment and scaling. As part of deployment, applications have to set up passwords, keys, certificates and other secrets. These secrets need to be made available as files and environment variables for the deployment to go through. Keys and certificates are used to secure data, with the public key used to encrypt and the private key to decrypt. A certificate is used as a stamp of authority and can include the public key; the certificate then becomes usable to secure the ends of a channel such as HTTPS.

Applications tend to require a key and a certificate in their configuration. Sometimes they require keystores and truststores as alternative formats. A keystore is a combination of key and certificate, made available in the form of a file with a .pfx or .p12 extension. A truststore is merely a collection of certificates to be trusted; it can include a certificate chain if the certificates are signed. Kubernetes takes keys, certificates, keystores and truststores as secrets. For example, we can specify:
kubectl create secret tls ${CERT_NAME} --key ${KEY_FILE} --cert ${CERT_FILE}
Two steps are required to secure the ingress controller:
1) Use a library such as cert-manager to generate keys and certificates.
2) Use the generated key and certificates as Kubernetes secrets, and generate the keystore and truststore whose locations are specified in the SSL configuration of the application (a sketch of this step follows the list).
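A minimal sketch of the second step, assuming cert-manager has already produced tls.key, tls.crt and a CA certificate ca.crt; the aliases, file names and passwords are placeholders:

# package the key and certificate into a PKCS#12 keystore
openssl pkcs12 -export -in tls.crt -inkey tls.key \
    -out keystore.p12 -name myapp -passout pass:changeit
# import the CA certificate into a truststore
keytool -importcert -file ca.crt -alias ca \
    -keystore truststore.p12 -storetype PKCS12 \
    -storepass changeit -noprompt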
An ingress resource is defined, for example on nginx, where the HTTP and HTTPS ports are declared. The ingress resource is merely a declaration of the traffic policy. An ingress controller can be made strictly HTTPS by redirecting HTTP traffic to HTTPS. In this respect it works more like a gateway.
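As a sketch, the nginx ingress controller can be asked for this redirect per ingress through an annotation; the ingress name here is an assumption:

# force the http-to-https redirect on an existing ingress
kubectl annotate ingress myapp-ingress \
    nginx.ingress.kubernetes.io/ssl-redirect="true" --overwrite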
In addition, applications may define their own endpoints that have their own ports or even require certificates beyond the catch-all that the ingress resource provides. In such cases, the application requires its own configuration, possibly with a separate key and certificate pair.