Cluster computing

Yesterday we were discussing Linux containers and comparing them with virtual machines. Today we look at this comparison in depth.
First, containers are small and fast. A hypervisor carves out each VM as a separate machine that can even run a different operating system than the host's. Containers instead share the host's operating system kernel, so they run wherever that operating system is the same. Specifically, containers are best used for packaging applications so that they can move around.
The old way to deploy applications was to install them on a host using the operating system's package manager, which tied the application to the host OS. We could instead build immutable VM images to achieve predictable rollouts and rollbacks, but VMs are heavyweight and non-portable.
The new way is to deploy containers, which are based on operating-system-level virtualization rather than hardware virtualization. There are two significant advantages to this. First, containers are isolated from each other and from the host, to the point that each has its own file system, which makes them portable across clouds and OS distributions. Second, immutable container images can be created at build/release time rather than at deployment time, since each application does not need to be composed with the rest of the application stack or tied to the production infrastructure environment. Each application is packaged with its own set of libraries in the container, which lets a consistent environment be carried from development into production.
Moreover, containers are vastly more transparent than virtual machines, which facilitates monitoring and management. This is especially clear when the container process lifecycles are managed by the infrastructure rather than hidden by a process supervisor inside the container. Managing the applications now becomes the same as managing the containers, while the applications gain tremendous portability. Kubernetes extends this idea of app+container all the way to the case where the hosts are the nodes of a cluster.
With this introduction, we now list the differences between containers and virtual machines as follows:
1) Containers are a denser form of computing than virtual machines. While a hypervisor may support a few VMs, a single VM may support hundreds of containers.
2) Containers formalize the one-application-per-server model by isolating compute and storage.
3) Containers have very little overhead compared to VMs and are fast and small.
4) Containers make application creation and deployment easier.
5) Containers, as used in Kubernetes, are an improvement over PaaS because the image is fixed at build time and not just at deployment time.
6) Point 5 implies that containers decouple applications from infrastructure, which separates dev from ops.
7) Containers provide environmental consistency across development, testing and production because they run the same on a desktop or in a cluster.
8) Containers raise the level of abstraction and make it application-centric.
9) Containers enable loosely coupled, distributed, elastic, liberated micro-services.
10) Point 1 implies that containers demonstrate better resource isolation and improved resource utilization.
Kubernetes evolved as an industry effort from the operating system's native support for Linux containers. It can be considered a step towards a truly container-centric development environment.

#codingexercise
Given an array of positive numbers, find the maximum sum of a subsequence with the constraint that no two numbers in the subsequence are adjacent or next-to-next in the array, that is, chosen elements must be at least three positions apart. So 3 2 7 10 should return 13 (sum of 3 and 10), and 3 2 5 10 7 should also return 13 (sum of 3 and 10, since 3, 5 and 7 are next-to-next and cannot be combined).

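// Requires: using System; using System.Collections.Generic;
static int GetAltSum(List<int> nums, int start)
{
    // Past the end of the list: nothing left to add.
    if (start >= nums.Count) return 0;
    // Take nums[start], then skip the next two elements (adjacent and next-to-next).
    int incl_sum = GetAltSum(nums, start + 3) + nums[start];
    // Skip nums[start] and decide again from the following element.
    int excl_sum = GetAltSum(nums, start + 1);
    return Math.Max(incl_sum, excl_sum);
}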
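
The recursion above recomputes the same suffixes many times. As a sketch only, the same include/exclude choice can be tabulated from the end of the array so that each position is evaluated once. The class name AltSum, the method name GetAltSumDp and the best array are illustrative names, not part of the original exercise.

```csharp
using System;
using System.Collections.Generic;

static class AltSum
{
    // best[i] holds the maximum sum obtainable from nums[i..] when any two
    // chosen elements must be at least three positions apart.
    public static int GetAltSumDp(IReadOnlyList<int> nums)
    {
        int n = nums.Count;
        // Three extra zero slots stand in for "past the end of the array".
        var best = new int[n + 3];
        for (int i = n - 1; i >= 0; i--)
        {
            int include = nums[i] + best[i + 3]; // take nums[i], skip the next two
            int exclude = best[i + 1];           // leave nums[i] out
            best[i] = Math.Max(include, exclude);
        }
        return best[0];
    }
}
```

Calling AltSum.GetAltSumDp(new List<int> { 3, 2, 7, 10 }) returns 13, matching GetAltSum(nums, 0), while running in time linear in the length of the list.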

#algorithms
How do we perform independent sampling in a high-dimensional distribution?
Independent samples are those where we choose two different groups of items such that the values in one sample do not affect the values in the other.
In low dimensions, this is relatively easy because we can tell the samples apart. For example, to prove the effectiveness of a medicine, we use a test group and a control group as independent samples. The control group does not get the medicine but gets, say, a placebo instead. In higher dimensions, there are many more factors involved than just the one factor, the medicine.
In higher-dimensional distributions, we use the Metropolis-Hastings algorithm.
The algorithm generates samples iteratively so that they follow the desired distribution. Each iteration produces one more sample, and the next sample depends only on the current one, in the manner of a Markov chain; hence the algorithm is a Markov chain Monte Carlo (MCMC) method. The number of samples is proportional to the number of iterations. In each iteration, it proposes a candidate sample based on the current value. Then it measures how likely the candidate is under the desired distribution relative to the current value, and accepts or rejects the candidate accordingly. If the candidate is accepted, it becomes the current value. If it is rejected, the current value is reused. Because each sample depends on the previous one, consecutive samples are correlated, so when nearly independent samples are needed it is common to keep only every k-th sample.
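
The propose/accept/reject loop can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: it uses a symmetric Gaussian random-walk proposal (the Metropolis special case, in which the Hastings correction term cancels), and the target is a hypothetical standard normal given by its unnormalized log-density. The names MetropolisSketch, LogTarget, Sample, stepSize and seed are all illustrative.

```csharp
using System;
using System.Collections.Generic;

static class MetropolisSketch
{
    // Hypothetical target: unnormalized log-density of a standard normal in any dimension.
    public static double LogTarget(double[] x)
    {
        double sumSquares = 0.0;
        foreach (double v in x) sumSquares += v * v;
        return -0.5 * sumSquares;
    }

    // Box-Muller transform: one standard normal draw from two uniform draws.
    static double NextGaussian(Random rng)
    {
        double u1 = 1.0 - rng.NextDouble(); // in (0, 1], so Math.Log never sees 0
        double u2 = rng.NextDouble();
        return Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Cos(2.0 * Math.PI * u2);
    }

    public static List<double[]> Sample(Func<double[], double> logTarget, double[] start,
                                        int iterations, double stepSize, int seed)
    {
        var rng = new Random(seed);
        var current = (double[])start.Clone();
        double currentLog = logTarget(current);
        var samples = new List<double[]>(iterations);

        for (int t = 0; t < iterations; t++)
        {
            // Propose a candidate near the current value (symmetric random walk).
            var candidate = new double[current.Length];
            for (int d = 0; d < current.Length; d++)
                candidate[d] = current[d] + stepSize * NextGaussian(rng);

            // Accept with probability min(1, target(candidate) / target(current)),
            // compared in log space to avoid underflow in high dimensions.
            double candidateLog = logTarget(candidate);
            if (Math.Log(1.0 - rng.NextDouble()) < candidateLog - currentLog)
            {
                current = candidate;
                currentLog = candidateLog;
            }

            // On rejection the current value is reused as the next sample.
            samples.Add((double[])current.Clone());
        }
        return samples;
    }
}
```

For example, MetropolisSketch.Sample(MetropolisSketch.LogTarget, new double[10], 1000, 0.5, 42) yields 1000 ten-dimensional samples. Working with log-densities is a design choice here: in high dimensions the raw densities can underflow, while their logarithms stay well-behaved.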

Text analysis often uses high-dimensional vectors, so this may be of use there.

It's important to note that the samples need not be considered synthetic. Instead, the algorithm draws them from the desired distribution, which is why it is useful.

Monday, November 7, 2016
