Sunday, July 17, 2016

Docker Containers, Solaris Zones and vCenter Virtual machines
There are different levels of isolation and virtualization for applications. With the embrace of cloud computing, applications have become modular, with deep virtualization such as separating code to run on different virtual machines. While the guest operating system in a virtual machine may provide another level of resource isolation, such as with cgroups and namespaces, these mechanisms don't control data partitioning or load balancing. They improve portability so that the application can run seamlessly on premises, in a public or private cloud, on bare metal, and so on. Each level of isolation is encapsulated within another. An application running within a Docker container is bound to the single Linux instance that hosts it. Similarly, by their very nature, operating system zones or namespaces are bound to their machine. A virtual machine can move between datacenters or storage, such as with vMotion, but it is still bound to a single instance.
Most applications are indeed tied to their hosts. Where they depend on external services, those dependencies are mostly database connections or API calls. Eventually these API servers or database servers become consolidated, a single point of service for their purpose. The microservice model is a good example of such a dedicated single instance. While data can be partitioned, servers seldom are, because most operations are data intensive rather than computation intensive. Consequently, database servers and API servers grow into clusters and farms to take on the load.
An alternative approach to this model is a distributed container: one that spans multiple servers, each with its own partition of data. Consider a service that interacts with a database server for a table. If each pair of database server and API server could be partitioned for this data, then the workload would always stay within limits and performance would improve as the load is split. This separation into pairs of servers can even be expanded to clusters. In such a case the applications are written once with arbitrary data in mind but deployed almost always with their own regions to operate in. The API servers can then be differentiated from one another by their endpoints, such as us-region-1.apiserver.corp.company.com, or by qualifiers, such as apiserver.corp.company.com/region1/v1/exports.
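A minimal sketch of such region-based endpoint routing, assuming hypothetical region names and hosts (none of these servers are real; the mapping table stands in for whatever deployment inventory is used):

```python
# Hypothetical inventory mapping a region to the api server that owns its
# partition of data; each region resolves to its own api/database server pair.
REGIONS = {
    "us-region-1": "us-region-1.apiserver.corp.company.com",
    "eu-region-1": "eu-region-1.apiserver.corp.company.com",
}

def endpoint_for(region, path="/v1/exports"):
    """Return the full URL of the api server owning this region's partition."""
    host = REGIONS[region]
    return "https://" + host + path

print(endpoint_for("us-region-1"))
```

The same differentiation could be expressed with path qualifiers (for example, "/region1" + path) against a shared host instead of a per-region hostname.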
However, very little coupling is required between the servers in the example above, because most of the data is transferred over connection endpoints across the network. There is hardly any resource sharing between the clusters or the machines in the pool. Another way to look at this is that it is a deployment-time concern requiring a non-traditional partitioning of tables across database servers. Further, it requires additional mapping between code and data for each region: there is pairing information between each server and the database for its partition of data. This is simple, but it still does not harness the power of a container as an elastic loop around several machines.
Clusters solve this problem by allowing additional nodes to be brought online or taken offline seamlessly. However, clusters are usually homogeneous, not a combination of API servers and database servers. Imagine a platform-as-a-service in an enterprise where several microservices are hosted, each with its own purpose and not necessarily sharing the same database, database server or database cluster. If this set of services has hot spots in one or two services, it may be easy to bolster those services with web farms or load balancers. However, this is not the same as cloning the entire set of API servers for another region. In such a case we are looking at a container for a platform-as-a-service, not an application container alone.
This still does not require operating system facilities the way Docker relies on Linux; it is merely a production or deployment-time inventory classification and mapping exercise. If Docker were to enable containers spanning multiple machines, it would require some messaging framework between the servers so that applications could move among the participant machines. The trouble is that code is easy to clone but data is sticky. As data grows, the database becomes larger and demands more servicing and tighter application binding.
#codingexercise
Check if a tree satisfies the children sum property
bool isChildrenSum(node root)
{
    if (root == null || (root.left == null && root.right == null))
        return true;
    int left = 0;
    int right = 0;
    if (root.left != null) left = root.left.data;
    if (root.right != null) right = root.right.data;
    if (root.data == left + right && isChildrenSum(root.left) && isChildrenSum(root.right))
        return true;
    return false;
}
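The same check can be sketched as runnable Python, assuming a minimal Node class introduced here for illustration:

```python
class Node:
    """A basic binary tree node; not part of the original exercise."""
    def __init__(self, data, left=None, right=None):
        self.data, self.left, self.right = data, left, right

def is_children_sum(root):
    # An empty tree or a leaf trivially satisfies the property.
    if root is None or (root.left is None and root.right is None):
        return True
    left = root.left.data if root.left else 0
    right = root.right.data if root.right else 0
    # The node must equal the sum of its children, and both subtrees
    # must satisfy the property recursively.
    return (root.data == left + right
            and is_children_sum(root.left)
            and is_children_sum(root.right))

tree = Node(10, Node(8, Node(3), Node(5)), Node(2))
print(is_children_sum(tree))  # True: 10 == 8 + 2 and 8 == 3 + 5
```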
Convert an arbitrary tree into one that satisfies the children sum property
void ToChildrenSumTree(ref node root)
{
    int left = 0;
    int right = 0;
    if (root == null || (root.left == null && root.right == null)) return;
    ToChildrenSumTree(ref root.left);
    ToChildrenSumTree(ref root.right);
    if (root.left != null) left = root.left.data;
    if (root.right != null) right = root.right.data;
    int delta = left + right - root.data;
    if (delta >= 0)
        root.data = root.data + delta;
    else
        increment(ref root, -delta);
}

void increment(ref node root, int delta)
{
    // Add delta to only one child and propagate it down, so that the
    // child's subtree keeps the children sum property.
    if (root.left != null) {
        root.left.data += delta;
        increment(ref root.left, delta);
    } else if (root.right != null) {
        root.right.data += delta;
        increment(ref root.right, delta);
    }
}
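The conversion can likewise be sketched as runnable Python, again assuming a minimal Node class for illustration. The increment helper adds the deficit to only one child and pushes it down, so that the modified subtree keeps the property:

```python
class Node:
    """A basic binary tree node; not part of the original exercise."""
    def __init__(self, data, left=None, right=None):
        self.data, self.left, self.right = data, left, right

def to_children_sum_tree(root):
    if root is None or (root.left is None and root.right is None):
        return
    # Fix the subtrees bottom-up first.
    to_children_sum_tree(root.left)
    to_children_sum_tree(root.right)
    left = root.left.data if root.left else 0
    right = root.right.data if root.right else 0
    delta = left + right - root.data
    if delta >= 0:
        root.data += delta          # children sum to more: raise the parent
    else:
        increment(root, -delta)     # parent is larger: push the deficit down

def increment(root, delta):
    # Add delta to one child only, propagating so its subtree stays consistent.
    if root.left:
        root.left.data += delta
        increment(root.left, delta)
    elif root.right:
        root.right.data += delta
        increment(root.right, delta)

def is_children_sum(root):
    if root is None or (root.left is None and root.right is None):
        return True
    left = root.left.data if root.left else 0
    right = root.right.data if root.right else 0
    return (root.data == left + right
            and is_children_sum(root.left) and is_children_sum(root.right))

tree = Node(50, Node(7), Node(2))
to_children_sum_tree(tree)
print(is_children_sum(tree))  # True: the left child was raised to 48
```

Note that only increasing node values (never decreasing them) is what makes the bottom-up pass safe: raising a parent never invalidates its already-fixed subtrees.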
