Cluster computing

Friday, November 11, 2016

Scoped and Persistent Connections

Let us take a look at an example where we use both stateful and stateless APIs. We use the example of a terminal console. In a stateless API, the console takes one command at a time and returns the output. There is no state maintained between the previous command and the current command. So if we wanted to list the files of a directory by changing to that directory, then we do it all at once in a single command otherwise we don’t do it. In a stateful API, we do processing of the command but maintain the state so we can issue one command for change directory and another command for listing the files.

One way to mimic the stateful API over the stateless API would be to replay the history of commands. If there is only one command, we execute it. If there is second command we look up the history to find the earlier commands, execute it and then execute the given command. If we tolerate the performance, we can use this method, however this assumes the commands are idempotent which means they can be called repeatedly to the same effect.

Generally the APIs are never interchanged since they serve different purposes. But one can be used over the other. In this case, we take the example of a state represented by a connection and use the stateless calls over these for the commands. The only difference is the connection has a setup and teardown and we persist for the duration of the calls. This is very similar to socket and in fact many networking protocols have a setup and tear down phase.

The commands coming over the stateless APIs are decoupled from the stateful connection setup and tear down that maintains a single connection over which all the commands are run and the output returned through the stateless APIs.

Here the state is maintained for the resource. In some cases, the state is maintained for the user. In such a case the order is reversed. The stateful layer is over the stateless APIs. This is exactly the case for many portals where the user session needs to be tracked so that the users have a seamless experience between the actions they take and they don’t need to repeat their authentication each time.

Such state have a lifetime and this is established beforehand or tied to user actions. In all these cases, the destruction of the state invalidates subsequent api calls and this is very much like the socket that is terminated so that no subsequent actions can be taken. Generally the state is referred to with a key or an identifier that is used by the subsequent API calls.

#codingexercise

given four keys letter A, ctrl A, ctrl C, ctrl  V find the maximum number of A we can print using the combination of keys in N keystrokes. the elements of repetitions are
A
Ctrl V
Ctrl V Ctrl V
Ctrl A Ctrl C Ctrl V
Therefore, one way to do this could be :

static int getCount(int N, int copy, int current)

{

if(N <=0)

return 0;

if (N <= 6)

return N;

int count0 = current+getCount(N-3, current, current * 2); // ctrl A+C+V

int count1 =  copy * 2 + getCount(N-2, copy, current + copy * 2); // ctrl V + ctrl V

int count2 = copy + getCount(N-1, copy, current+copy); // ctrl V

int count3 = 1 + getCount( N-1, copy, current+1);

var counts = new List<int>(){ count0, count1, count2, count3 };

int max = counts.Max();

return current + max;

}

Console.WriteLine("Max = {0}", getCount(6, 0, 0));

Console.WriteLine("Max = {0}", getCount(9, 0, 0));

Max = 6

Max = 12

we can also consider a variation of the above dynamic programming with varying length repetitions of CtrlV and CtrlACV at any position beyond 6th occurrence of A.

Thursday, November 10, 2016

Today we take a break to discuss stateful APIs. With the move towards REST Framework most of the APIs are necessarily and correctly stateless. However, once in a while we do come across use cases where the APIs have to be stateful. Take for instance a native os command. Within the APIs, we can invoke a native system command and return the output each time we encounter a command. Since all the commands issued are maintained in the .bash_history, we don't always require to maintain state. and can even replay the commands issued assuming they are idempotent. But if we open a python console and start issuing python statements, we no longer have that state unless we redirect the input and output via the pipes to the API and even so there is no stashing unless the APIs keep track of it separately.
Enter request.session.session_key and we now solve this for every session initiated by the user by attaching the state to the session. Since the state can be persisted, all we have to do is name the state with the session key so we can tell apart the sessions. Still this is mostly a front-end call and the APIs remain stateless. Many UI framework support this model by facilitating the notion of sessions.
So the question really is should the statefulness be pushed down to the API ?
Pautasso et al. classify the application integration style as shared database, remote procedure call, message bus and File Transfer.

#codingexercise

Problem: Given a sequence of words, print all anagrams together.

Sort the letters within a word
Sort the words in the sequence
This brings the anagram groupings
Since each anagram keeps track of its index in the sequence we can find the corresponding words and their groupings.

Void PrintAllAnagrams(List<string> words)

{

Var items = new List<Tuple<string, int>>();

Words.enumerate((x,i) => items.Add(new Tuple<string, int>(x, i)));

Items.forEach(x => sort(x));

Items.sort(new TupleSorter());

Items.ForEach(x => console.writeLine("{0} at index {1}", words[x.second], x.second);

}

Or we could simply cluster them based on anagram similarity. This clustering has to be hierarchical since we don’t know the number of anagrams and the threshold has to be zero because we are looking for exact similarity and anything else is not.

Void PrintAllAnagramsByClustering(List<string> words)

{

Var items = new List<Tuple<List<string>, int>>();

Words.enumerate((x,i) => items.Add(new Tuple<List<string>, int>(new List<string>(x),-1))); // all items have label -1 at start

Bool over = false;

While (!over)

{

Var newCluster = new Tuple<List<string>, int>();

For ( int I =0; I < items.length; i++)

For (int j =i+1; j < items.Length; j++)

{

If (i != j && distance(items[i], items[j]) == 0) && items[I].second != items[j].second){

match = true;

var merged = merge(items[i], items[j], i);

If (newCluster.Contains(merged) == false)

NewCluster.Add(merge);

}

Foreach (var item in items)

If newcluster.contains(item)

Items.Remove(item);

Else

Item.second = -1;

If (newCluster.empty() == false)

Item.Append(newCluster);

If (newCluster.empty()){

Console.WriteLine(newCluster.ToString());

over = true;

}

We could also use a hashing function that computes the sum of the ascii values of the letters in the words.

Wednesday, November 9, 2016

We continue reading on Kubernetes. Kubernetes is not a traditional, all-inclusive PaaS. Unlike PaaS that restricts applications, dictates choice of application frameworks, restrict supported language runtimes or distinguish apps from services, Kubernetes aims to support an extremely diverse variety of workloads. As long as the application has been compiled to run in a container, it will work with Kubernetes. PaaS provides databases, message bus, cluster storage systems but those can run on Kubernetes. There is also no click to deploy service marketplace. Kubernetes does not build user code or deploy it. However it facilitates CI workflows to run on it.
Kubernetes allows users to choose logging, monitoring and alerting Kubernetes also does not require a comprehensive application language or system. It is independent of machine configuration or management. But PaaS can run on Kubernetes and extend its reach to different clouds.
A Kubernetes cluster can be launched on machines running Ubuntu 16.04, CentOS 7 or HypriotOS v1.0.1+ with an installation tool called kubeadm. The process works with local VMs, physical servers and/or cloud servers. This tool assumes we have a set of machines virtual or real but is preferable over baremetal. Each machine should have 1 GB RAM and there should be full network connectivity between all machines in the cluster.
Kubernetes cluster can be installed on cloud. While this was preferable over AWS, we now have Azure supporting Kubernetes installation. This is done with a tool called kops and is well-known for its high availability support. It is self healing, auto-scaling and can directly provision or generate terraform manifests. It uses fully qualified name for addressing within the cluster and outside. It creates a route53 domain for the cluster so all the cluster machines can be looked up without remembering each one's ipaddress or name. It creates an s3 bucket to store the cluster state. It builds the cluster configuration. It creates the cluster in AWS and enables several add-ons such as WeaveNet, Calico, Flannel, Canal and Romana for networking, WeaveScope and Dashboard for web interface and Legacy Add-ons.
#codingexercise
Find a tour for a trucker to visit all gas stations in a circle
The naiive solution can be O(n^2) but we can make it linear.
int Tour(List<int>fill, List<int>use)
{
int min = int_max;
int reserve = 0;
int index = 0;
for (int i =0; i < fill.size; i++)
{
reserve += fill[i] - use[i];
if (reserve < min)
{
min = reserve;
index = (i+1) % fill.size;
}
}
if (reserve >= 0)
return index;
return -1;
}
Convert a binary tree into one where the nodes have the sum of left and right subtree
int ToSumTree(node root)
{
if (root == null) return 0;
int val = root.data;
root.data = ToSumTree(root.left) + ToSumTree(root.right);
return root.data + val;
}
The same algorithm remains true even if we wanted the difference instead of sum.

Given a sequence of words, print all anagrams together.

Void PrintAllAnagrams(List<string> words)
{
Var items = new List<Tuple<string, int>>();
Words.enumerate((x,i) => items.Add(new Tuple<string, int>(x, i)));
Items.forEach(x => sort(x));
Items.sort(new TupleSorter());
Items.ForEach(x => console.writeLine("{0} at index {1}", words[x.second], words[x.second]);

}
Or we could simply cluster them based on anagram similarity