Cluster computing

Sunday, November 6, 2016

Today we discuss Containers versus Virtual Machines (vm).

As a cloud provider and consumer, we are aware that to the end user the computing power and storage matters more than how it is provisioned and where. The ease of requesting the compute, the cleanliness of instantiating and deleting the associated resource, the ability to migrate applications and services far out weigh any consideration of the vendor or the mechanism just so long as the performance, security and costs are comparable.

Essentially the provisioning of such compute is a result of partitioning some wholesale undivided resource whether it is at the ESX level or an instance level. Each time a new resource is requested, a partition is made and the resource is provisioned. Partition do not interact with each other so we have isolation and this works well on shared resources. The partitioning policy can depend on the size of the resource being cut , the availability and the load on the underlying fabric. A private cloud provider for example has several regional datacenters to distribute the volume of requests for new virtual machine instances.

Yet it is widely understood in computing world that we don't aggressively partition lest we shoot ourselves in the foot. Instead we are as lazy to partition the resources as and when the need arises.
As an example, we don't excessively partition our hard disk into many drives up front only to realize later that one drive is filled up while the other is relatively unused. This brings up the importance of elasticity of the resource we partition. If we partition a datacenter into several virtual machines, is the datacenter elastic to more and more usage ? Yes for example, we can add ESX blades as appropriate in say a VMWare cloud. But what about a Linux container ? The same is true, if the current resource is exhausted, we provision elsewhere. However, the container is restricted to a vm and the vm may or may not be elastic.

Furthermore containers are governed by the quotas and limits of the underlying system to manage the shared resources. One of the reasons Solaris zones suffered from application migration considerations from the operating system consumers is that they kept running into application failures due to limits. Both linux containers and solaris zones must be understood as operating system features and not as standalone products for virtualization technologies.

That said linux containers have the advantage that they provide the same environment without the overhead of a separate kernel and hardware. This translates to lightning fast availability of an isolated shell as well as there reduction of failure points. Moreover hardware is not necessarily partitioned per virtual machine in any case. However, separtion of kernel does play significantly in terms of performance because the emulation is different from the real especially when compared to a standalone.

In addition, we now have integration with Open-stack for linux containers. The plugin allows the hosts to be compute nodes while the container takes on the workloads from the user.

#codingexercise
Given an array of positive numbers, find the maximum sum of a subsequence with the constraint that no two numbers in the sequence should be adjacent in the array. So 3 2 7 10 should return 13 (sum of 3 and 10) or 3 2 5 10 7 should return 15 (sum of 3, 5 and 7)

static int GetSum(List<int> nums, int start)
{
if (start >= nums.Count) return 0;
int incl_sum = GetSum(nums, start + 2) + nums[start];
int excl_sum = GetSum(nums, start + 1);
return Math.Max(incl_sum, excl_sum);
}

static void Main(string[] args)
{
var nums = new List<int>() { 3, 2, 5, 7, 10, 9 };
var nums1 = new List<int>() { 3, 2, 5, 10, 7 };
var nums2 = new List<int>() { 3, 2, 7, 10 };
var nums3 = new List<int>() { 3, 2 };
var nums4 = new List<int>() { 3 };
var nums5 = new List<int>() { };
Console.WriteLine("Max = {0}", GetSum(nums, 0));
Console.WriteLine("Max = {0}", GetSum(nums1, 0));
Console.WriteLine("Max = {0}", GetSum(nums2, 0));
Console.WriteLine("Max = {0}", GetSum(nums3, 0));
Console.WriteLine("Max = {0}", GetSum(nums4, 0));
Console.WriteLine("Max = {0}", GetSum(nums5, 0));
}

Max = 19
Max = 15
Max = 13
Max = 3
Max = 3
Max = 0

Saturday, November 5, 2016

We continue to review the mechanics of Active Directory replication:
We are referring to intrasite replication for the most often use cases of Active directory domain name-context item changes. Such AD replication takes place between DCs in pull operations, not pushes. AD controls and manages the transmission of AD Data. We discussed what makes up the replication metadata and we saw how it is put to use. We saw how it was used to address conflict resolution and usn rollbacks.
Today we look at ReadOnly domain controllers. This is usually deployed in branch offices where there is no need to have dedicated and full strength server, personnel or protection. There are no write operations and outbound replication is not permitted. Moreover, the objects are replicated without the user's password information. This creates an additional layer of security because if a DC "grows legs and walks away" there is no password information resident on the hard drive.

#codingexercise
We were discussing the celebrity problem yesterday and making it O(N) from O(N^2) complexity
We used a stack that compared adjacent elements. However we can do away with the stack and use the ends of the list.
int findCelebrity(int n)
{
int a = 0;
int b = n - 1;

while (a < b)
{
if (knows(a, b))
a++;
else
b--;
}

for (int i = 0; i < n; i++)
{
if ( (i != a) &&
(knows(a, i) || !knows(i, a)) )
return -1;
}
return a;
}

Remove half-nodes from a binary tree

Node RemoveHalfNodes(node root)
{
if (Root == null) return null;
root.left = RemoveHalfNodes(root.left);
root.right = Remove HalfNodes(root.right);
if (root.left == null && root.right == null)
return root;
if (root.left == null){
var newroot = root.right;
delete root;
return newroot;
}
if (root.right == null){
var newroot = root.left;
delete root;
return newroot;
}
return root;
}

Given an infinite string defined by function f(x)=x+”0″+f(complement x) find k’th bit from start
Assuming complement of x is inverting bits
we just keep track of position and iterations
for example if k is less than length(x) then the answer is trivial
else we count the length(x) + 1 + length of complement of x as the new x for the next iteration.
we iterate until the length spanned is greater than k and return the value that is immediately preceding k

Facebook API to get all friends' id:

FB.api('/me/friends?limit=200', function(response) {

Log.info('API response', response);

document.getElementById('results').innerHTML = '';

var names = JSON.parse(JSON.stringify(response));

for (var i=0; i < names["data"].length; i++) {

document.getElementById('results').innerHTML += JSON.stringify(names["data"][i]["id"]);

}

// document.getElementById('accountInfo').innerHTML += JSON.stringify(response);

});

Examples : https://github.com/ravibeta

Friday, November 4, 2016

Thursday, November 3, 2016

Today we continue to review the mechanics of Active Directory replication:
We are referring to intrasite replication for the most often use cases of Active directory domain name-context item changes. Such AD replication takes place between DCs in pull operations, not pushes. AD controls and manages the transmission of AD Data. We discussed what makes up the replication metadata and we saw how it is put to use. We saw how it was used to address conflict resolution.
Today we will see one more mitigation which is USN rollback. It has to do with object deletions. Rather than simply removing the object, it is marked deleted so that this is replicated to other DCs who know now that the object is deleted. Also the original object does not need to be passed around since it my have a big size. Only the shell with the isDeleted attribute is sent across aka tombstone. USN rollback occurs when a domain controller is so completely out of date that it has "missed" one or more tomb-stoned objects and thus is unable to bring its local copy of the Active Directory up to date. One way to avoid this is to have the DC recognize that it has been out of service longer than the longest tombstone period and not come back online. Instead it is rebuilt from scratch.
Rebuilding in general is the right approach as compared to repairing in many cluster based technologies.
Theoretically this is proved because it improves the mean time to data loss. The protection overhead is reduced, space is better utilized and there is no net addition to management overhead.
The second kind of usn rollback occurs when a DC has been restored with an inappropriate means and it has not kept current with the other DCs. This is referred to as going back in time because the restore was not AD aware. This is mitigated with event logging where the text of the log and the necessary steps to recover them are available.

In order to override automatic replication of AD objects, we could do one of the following:
1) invoke repadmin command on the destination DC to pull the changes
2) invoke repadmin command on the souce DC to push changes to all other DC.
3) issue active directory powershell cmdlet to invoke sync on a per object basis.

#puzzle
In a party of N people, only one person is known to everyone. Such a person may be present in the party, if yes, she doesn’t know anyone in the party. We can only ask questions like "does A know B?". Find the stranger in minimum number of questions.
This problem is better visualized with a directed graph where there is only one vertex that has all incoming edges from the N-1 other vertices and no outbound edges to anybody else. To construct a graph, we will need to form the adjacency matrix which is O(N^2), Then we can traverse to see if there is any vertex satisfying the criteria.

int FindCelebrity(int[,] m, int row, int col)
{
assert (row == col);
for (int i = 0; i < row; i++)
{
int known = 0;
for (int j = 0; j < col; j++)
known += m[i,j];
if (known != 0) continue;
for (int k=0; k<row; k++)
if (k!=i && m[k,i] > 0)
known += m[k,i];
if (known == row-1) {
console.WriteLine("Celebrity found at {0}", i);
return i;
}
}
return -1;
}

Wednesday, November 2, 2016

Today we continue to review the mechanics of Active Directory replication:
We are referring to intrasite replication for the most often use cases of Active directory domain name-context item changes. Such AD replication takes place between DCs in pull operations, not pushes. AD controls and manages the transmission of AD Data with replication metadata. We saw that this metadata included
1) USN which revisions changes on local machine
2) HWMV which informs whether it is originating or replicating
3) UTDV which prevents repeated processing of replicating changes and helps with propagation dampening
The replicated change therefore includes the following:
the GUID of the DC that is replicating the change
the USN from the DC that is replicating the change
the GUID of the DC that originated the change
the USN from the DC that originated the change
Based on the UTDV table, the DC determines whether it needs to make the change or simply update the HWMV entry.
The DC also has to perform conflict resolution. There are three types of conflict that can occur in Active Directory environment, each of which uses a different method of conflict resolution.
1) The same object is updated in two originating servers
2) name collision when two objects are created by the same name
3) If one object is created but another deletes the container.
To; address first two types of conflict, the replication metadata includes two things - a version ID for the attribute and a timestamp. The third type of conflict is managed by moving to a specially designated container called the orphaned container.
We were comparing the Active directory replication to a database replication and suggesting that the snapshot isolation method works very well in this regard. Just like databases can be distributed with namespace and versions, we have similar comparison with active directory.
Generally a database detects conflicts in one of the following ways:
1) current versus old : the receiving site detects an update conflict if there is any difference between the old values of the replicated row and the current values of the same row at the receiving site
2) dulpicate : the receiving site detects a uniqueness conflict if a uniqueness constraint violation occurs during an insert or an update of the replicated data
3) missing : the receiving site detects a delete conflict if it cannot find a row for an update or delete statement because the primary key of the row does not exist.
Question can the usn, hwmv and utdv be simplified : hint : think globally unique version numbers, timestamps, action name and idempotent operations.
Courtesy : Laura Hunter, Windows Server Administration

#codingexercise
Convert an infix expression to postfix expression
An infix expression is like this a + b*c + d
An postfix expression is like this abc*d++
Operators have standard precedence
As we can see we are moving the operator, consequently a stack comes in useful.
The method proceeds like this
For every operand, maintain its sequence in the result
For every ( push it on the stack
For every ) pop elements from stack till matching open ( arrives or invalidate. the popped elements are added to resulting expression.
For every operator, empty the stack onto the result until the topmost of the stack has higher precedence than the current operator and then include the operator in the resulting expression.
That's it. since we only add operators to the stack, the sequence of operands is maintained
The opening and closing paranthesis are not included in the resulting expression because the operators are stacked and two operands apply the operation following it.
For example,
(a*b)^c => ab*c^
a*(b^c) => abc^*

Tuesday, November 1, 2016

Monday, October 31, 2016

Today we take a break to review the mechanics of Active Directory replication:
We are referring to intrasite replication for the most often use cases of Active directory domain name-context item changes. Such AD replication takes place between DCs in pull operations, not pushes.
Each domain controller (DC) stores three naming contexts (NC) in its local copy of AD at the minimum. These are schema NC, Configuration NC, and AD Domain NC which corresponds to AD Data.  The last one is usually the one that frequently changes. There are two type of writes - an originating write and a replication write. Since directory changes can be made from any active directory DC, any one DC can be the originating DC. AD controls and manages the transmission of AD Data with replication metadata.
The first item of replication metadata is the USN which is the update sequence number and pertinent only locally at a DC to indicate a sequence number associated with the changes at that DC.
The other DCs keep track of these USN for each other. The other DCs are referred to by their GUIDs so that the information does not change when the DC is renamed or offline.
Since an originating change can be come a replicated change for the other DCs and perpetually propagated, the cycle is broken with the help of an indicator called the high watermark vector. This tells apart an originating change from a replicated change and also prevents repeated changes for the same entry.
#codingexercise
Given an array integer k, find the maximum of each and every contiguous subarray of size k
Yesterday we saw a simple but O(n^2) complexity for solving this exercise.
We could also make it linear by keeping track of a list of candidates in sorted descending manner. As the window moves the numbers smaller than the current number are removed from the tail and the max element from the head is printed. The list keeps track of index only.
void PrintMax(List<int>nums, int k)
{
var sorted = new List<int>();
if ( k < nums.count() || k == 0 || nums.empty()) return;
int I;
for (int I =0; I < k; i++)
{
while (!sorted.empty() && nums[I] >= nums[sorted.Last()]))
      sorted.RemoveLast();
Sorted.InsertLast(I);
}
for (int I = k; I < n; i++)
{
Console.WriteLine(nums[sorted.first()]);
while (!sorted.empty() && sorted.frst() <= I - k)
      sorted.RemoveFirst();
while (!sorted.empty() && nums[I] >= nums[sorted.Last()])
      sorted.RemoveLast();
sorted.InsertLast(I);
}
Console.WriteLine(nums[sorted.first()]);
}