Friday, January 31, 2020

We continue with the discussion of the script to export and import Kubernetes resources.
By changing the namespace, the same files can be imported again.
The export order should be resources first, followed by definitions, while the import order should be definitions first, followed by resources.
The above script only prints the commands and does not execute them.
There are two modifications that may be required to the commands.
First, the IP addresses for the host, the cluster and the service endpoints will need to be modified when the resources are exported from a namespace on one cluster to another. These addresses are easy to find as we make incremental progress towards the definitions and resources.
Second, the uid will need to be updated for some resources because fields such as ownerReferences refer to existing resources by uid. An ownerReference indicates that the resource will be cleaned up when its owner gets cleaned up. The ownerReferences field is a way of chaining the cleanup and gives us an opportunity to do proper cleanup with a broader scope.
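These two fixups can be sketched against a local copy of the export. The file name, the addresses and the uid below are all hypothetical placeholders, not values from any real cluster:

```shell
# Create a stand-in for an exported manifest (contents are illustrative).
cat > resources.yaml <<'EOF'
spec:
  clusterIP: 10.96.0.15
metadata:
  ownerReferences:
  - apiVersion: apps/v1
    kind: Deployment
    name: my-app
    uid: 1a2b3c4d-0000-0000-0000-000000000000
EOF

# First fixup: rewrite the source cluster's service address to the destination's.
sed -i.bak 's/10\.96\.0\.15/10.100.0.42/' resources.yaml

# Second fixup: drop the stale owner uid; before import it should be replaced
# with the new owner's uid, e.g. from:
#   kubectl get deployment my-app -o jsonpath='{.metadata.uid}'
sed -i.bak '/uid:/d' resources.yaml
```

The same pattern extends to any other server-assigned field that points back at the source cluster.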
These two changes enable the resources and definitions to be recreated in the destination cluster and namespace. The commands also provide a point of comparison between this method and the Velero tool. It might be interesting to note that the latter involves specifications that may not be as lightweight and customized as the scripts above. At the same time, the tool works across deployments in general, while the scripts above will work on any Kubernetes cluster.

Thursday, January 30, 2020

The following script rounds out the retrieval of definitions and resources:
#!/bin/bash
# Exports (GET) or imports (PUT) the definitions and resources of a user namespace.
# Usage: ./ns-transfer.sh GET|PUT <namespace>
E_NOFILE=2

COMMAND=$1
if [ -z "$COMMAND" ]
then
  echo "Command not specified"
  exit 1
fi

NAMESPACE=$2
if [ -z "$NAMESPACE" ]
then
  echo "Namespace not specified"
  exit 1
fi


# Pack
if [[ "$COMMAND" == "GET" ]]; then
   kubectl api-resources --verbs=list --namespaced -o name | xargs -n 1 kubectl get -n "$NAMESPACE" -o yaml > definitions.yaml
   kubectl get all -n "$NAMESPACE" -o yaml > resources.yaml
fi
# Unpack
if [[ "$COMMAND" == "PUT" ]]; then
   kubectl create ns "$NAMESPACE"
   if [ ! -e "definitions.yaml" ]; then
       echo "definitions.yaml does not exist"
       exit $E_NOFILE
   fi
   # strip the server-populated fields that prevent re-creation
   sed -i.bak '/creationTimestamp/d' definitions.yaml
   sed -i.bak '/deletionTimestamp/d' definitions.yaml
   sed -i.bak '/uid:/d' definitions.yaml
   echo kubectl create -f definitions.yaml --validate=false

   if [ ! -e "resources.yaml" ]; then
       echo "resources.yaml does not exist"
       exit $E_NOFILE
   fi
   sed -i.bak '/creationTimestamp/d' resources.yaml
   sed -i.bak '/deletionTimestamp/d' resources.yaml
   sed -i.bak '/uid:/d' resources.yaml
   echo kubectl create -f resources.yaml --validate=false
fi



By changing the namespace, the same files can be imported again.
The export order should be resources first, followed by definitions, while the import order should be definitions first, followed by resources.

Wednesday, January 29, 2020

The dependencies between the exported resources are not obvious from their metadata since each is independently exported. To help with their import, their annotations could be enhanced to include a category or a level.
The order of restoring the Kubernetes resources on another cluster is also not known. When databases were backed up and restored, the order of creation was known from their schema. However, in this case the Kubernetes resources are independently exported and imported. Even for a custom resource, all that needs to be imported first is its custom resource definition.
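As a sketch of the idea, a hypothetical level annotation on each exported resource lets the import order be computed offline. The resource names and levels below are illustrative, and the index file is an assumption, not part of any tool:

```shell
# export-index.txt pairs each exported resource with its hypothetical "level":
# definitions at level 0, resources that depend on them at higher levels.
cat > export-index.txt <<'EOF'
1 deployment/my-app
0 customresourcedefinition/flinkclusters.flink.example.com
2 flinkcluster/session-cluster
EOF

# Importing in ascending level order restores definitions before the
# resources that depend on them.
sort -n export-index.txt | awk '{print $2}' > import-order.txt
```

The same ordering could instead be derived from the charts used at deployment time.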
A custom resource application can include other definitions, so there is definitely a cascading order of creation involved. This order is already determined in the charts that were used to deploy the product. The export order can be any order, but the import order has to be the same as what the charts had described at deployment time.
The order of export and import do not have to be the reverse of each other, but it certainly helps to verify the order offline. Attributes such as creationTimestamp, deletionTimestamp, uid and even finalizers can be removed prior to import.
A command like "kubectl get all -n <user-namespace> -o yaml > user-namespace-resources.yaml" will in fact determine the order appropriately, and its output can then be supplied directly to "kubectl create -f user-namespace-resources.yaml" as long as interfering attributes have been removed.
This won't export custom resource definitions or the resources that can be fetched with "kubectl api-resources", nor will it set up the persistent volumes and external services that are represented via service brokers within the Kubernetes clusters. Those are assumed to be ready before the import begins.

Tuesday, January 28, 2020

Some solutions like the Heptio Ark or Velero perform this nicely for some of the applications, but it is the collection of all resources and objects that determines a fully functional backup and restore.
The customizations performed over the backup and restore from the tool above should be minimal and just enough to support the packing and unpacking of user resources in a bundle to be exported from source and imported at destination. 
These include: 
  • Registrations in the catalog 
  • Custom resources being created: for example, FlinkApplications, FlinkSavepoints and FlinkClusters 
  • External logic saved as maven artifacts or other files on disk aside from custom resources 
  • Metrics data  
  • Events for all the user resources 
  • Logs for all the containers 
  • User accounts and roles 
  • Store connection info and containers 
  • Proprietary data migration for each container in etcd, databases, blobs, files and stream store. 

The methods for backup and restore of project artifacts created by the user must be based on retry loops. This logic would look something like this: 
#!/bin/bash
# Collect custom resources with retries until every expected resource is exported.
# $custom_resource_definitions starts as the full set of kinds to collect.
while (( count_of_resources_to_be_collected > 0 ))
do
  kubectl get $custom_resource_definitions -n user-namespace -o yaml > crds.yaml
  count=$(grep -i "kind" crds.yaml | wc -l)
  if (( count > 0 )); then
    count_of_resources_to_be_collected=$(( count_of_resources_to_be_collected - count ))
    # adjust $custom_resource_definitions to not include those that were read
  fi
done
  
Similarly, the resources to be restored on another cluster must be re-entrant and repeatable: 
#!/bin/bash
# Re-create the collected resources with retries; creation is repeatable because
# kubectl create reports resources that already exist without recreating them.
while (( count_of_resources_to_be_created > 0 ))
do
  kubectl create -f crds.yaml -n user-namespace
  count=$(kubectl get $custom_resource_definitions -n user-namespace -o name | wc -l)
  if (( count > 0 )); then
    count_of_resources_to_be_created=$(( count_of_resources_to_be_created - count ))
    # adjust crds.yaml and count_of_resources_to_be_created to not include
    # those that were already created
  fi
done

Monday, January 27, 2020

Exporting Kubernetes resources: 
The resources created in Kubernetes on one cluster can be exported to another cluster. While Kubernetes is a platform that allows separation of resources by namespaces, the user objects created through the Kubernetes API can be assigned to any namespace and be of any type. These objects include such things as secrets, serviceaccounts, configmaps, persistentvolumes and custom resources. Collecting all the resources for a particular namespace is as easy as: 
"kubectl get all --export -n <user-namespace>" 
And each resource can be collected in json or yaml format. 
The exported resources carry a creationTimestamp, and a deletionTimestamp if they were in the process of being deleted. These and some other attributes will need to be removed from the declaration before they can be imported into another cluster. 
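A minimal sketch of that cleanup, run on a local file whose name and contents are hypothetical stand-ins for a real export:

```shell
# Stand-in for an exported resource declaration.
cat > resource-definition.yaml <<'EOF'
metadata:
  name: my-config
  creationTimestamp: "2020-01-27T10:00:00Z"
  uid: 9f8e7d6c-0000-0000-0000-000000000000
EOF

# Remove the server-populated attributes before importing elsewhere;
# the resource's own name and spec are left untouched.
sed -i.bak -e '/creationTimestamp/d' -e '/deletionTimestamp/d' -e '/uid:/d' resource-definition.yaml
```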
When the resource is imported in the destination cluster with: 
"kubectl create -f <resource-definition.yaml> -n <user-namespace>" 
the corresponding resources get created just as they were in the origin cluster. This means that the application works essentially the same as before.  
This works for individual Kubernetes resources and their definitions. However, the links between the resources, such as the service account to be used with a particular resource, need to be in the exported and imported yaml declarations; otherwise they are not re-created. In order to do this, an application must be mindful of all the objects and their dependencies, including those that exist in the catalog or have their service brokers point to resources external to the cluster. 
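One such link can be checked offline before import. The sketch below verifies that every service account referenced by the export is itself included in it; the file name and resource names are illustrative:

```shell
# Stand-in for an exported bundle containing a ServiceAccount and a
# Deployment that references it.
cat > export.yaml <<'EOF'
kind: ServiceAccount
metadata:
  name: app-sa
---
kind: Deployment
spec:
  template:
    spec:
      serviceAccountName: app-sa
EOF

# Every serviceAccountName referenced by a workload...
grep 'serviceAccountName:' export.yaml | awk '{print $2}' | sort -u > wanted.txt
# ...should match a ServiceAccount name present in the export.
grep -A2 '^kind: ServiceAccount' export.yaml | awk '/name:/{print $2}' | sort -u > have.txt
comm -23 wanted.txt have.txt   # prints any dangling references
```

An empty result means no dangling service account links; anything printed must be added to the export before import.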