Friday, August 11, 2017

We continue discussing ZooKeeper. It is a co-ordination service with elements from group messaging, shared registers and distributed lock services. It provides an interface that guarantees the wait-free property and FIFO execution of requests from each client. Write requests across all clients are also linearizable.
Both linearizable writes and FIFO client order form ZooKeeper's guarantees, and the two guarantees interact. For example, when many clients elect a leader, the new leader may take some time to complete its work before it can notify the others. During this time, the other clients may not make any changes, and if the leader dies partway through, the partial changes must not be used. The Chubby service does not handle the latter requirement. ZooKeeper handles both by designating a path as the "ready" znode: the other clients may act on the configuration only when that znode exists. The leader therefore only has to delete ready, make its changes, and re-create ready. If the leader dies midway, ready is absent, so the other clients will not use the partial changes. Even with a large number of clients, the per-znode change latency does not simply aggregate, because the requests are asynchronous and the changes are pipelined.
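As a rough sketch of that pattern, against a hypothetical client wrapper (the IZooKeeperClient interface and its methods below are illustrative assumptions, not the actual ZooKeeper client API):

using System.Collections.Generic;

// Hypothetical client shape used only for this sketch.
public interface IZooKeeperClient
{
    void CreateNode(string path);           // create a znode
    void DeleteNode(string path);           // delete a znode
    void SetData(string path, byte[] data); // write data to a znode
}

public class LeaderConfigUpdater
{
    private readonly IZooKeeperClient zk;
    public LeaderConfigUpdater(IZooKeeperClient zk) { this.zk = zk; }

    public void UpdateConfiguration(IDictionary<string, byte[]> changes)
    {
        // 1. Delete "ready" so followers stop trusting the configuration.
        zk.DeleteNode("/app/ready");
        // 2. Apply the changes; FIFO client order means they can be
        //    issued asynchronously and pipelined without reordering.
        foreach (var change in changes)
            zk.SetData("/app/config/" + change.Key, change.Value);
        // 3. Re-create "ready"; only now may followers read the new config.
        zk.CreateNode("/app/ready");
    }
}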
#codingexercise
Find the majority element in an array.
Naive implementation
// Returns the most frequent element; assumes A is non-empty.
int GetMajority(List<int> A)
{
    var h = new Dictionary<int, int>();
    for (int i = 0; i < A.Count; i++)
    {
        if (!h.ContainsKey(A[i]))
            h.Add(A[i], 1);
        else
            h[A[i]] += 1;
    }
    int max = int.MinValue;
    int count = 0;
    foreach (var kvp in h)
    {
        if (kvp.Value > count)
        {
            count = kvp.Value;
            max = kvp.Key;
        }
    }
    return max;
}
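For contrast with the naive version, the classic single-pass alternative is the Boyer-Moore voting algorithm, which uses constant space but assumes a true majority element (occurring more than A.Count / 2 times) actually exists:

// Boyer-Moore majority vote: O(n) time, O(1) space.
// Assumes some element occurs more than A.Count / 2 times.
int GetMajorityVote(List<int> A)
{
    int candidate = A[0];
    int votes = 1;
    for (int i = 1; i < A.Count; i++)
    {
        if (votes == 0) { candidate = A[i]; votes = 1; }
        else if (A[i] == candidate) votes++;
        else votes--;
    }
    return candidate; // verify with a second pass if a majority is not guaranteed
}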

Thursday, August 10, 2017

We continue discussing ZooKeeper. It incorporates elements from group messaging, shared registers and distributed lock services in a replicated, centralized service. It provides an interface that guarantees the wait-free property and FIFO execution of requests from each client. Write requests across all clients are also linearizable.
Caching is also used to improve performance. The id of the leader is worth caching because it is frequently needed, and caching it avoids asking other servers. ZooKeeper uses a watch mechanism to let clients cache data without the service managing the client cache directly. With this mechanism, a client can watch for an update to a given data object and receive a notification when the update occurs.
ZooKeeper organizes data objects in a hierarchical namespace.
The Chubby service provides strong locking and synchronization guarantees. Locks come in useful to implement leader election and group membership. ZooKeeper does not implement such primitives directly; rather, it implements a co-ordination service with an interface. This choice led to a co-ordination kernel that enables new primitives without changing the service core. Exposing blocking primitives also has an inherent issue: the processing of requests would depend on responses from, and failure detection of, other clients. ZooKeeper instead uses wait-free data objects organized hierarchically as in a filesystem.
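As a sketch of how a client might keep the leader id cached behind a watch (the IWatchingClient shape below is an assumption for illustration, not the real ZooKeeper client API):

using System;

// Hypothetical client shape: GetData returns the znode's content and
// arms a one-time watch that invokes the callback on the next change.
public interface IWatchingClient
{
    string GetData(string path, Action onChange);
}

public class LeaderCache
{
    private readonly IWatchingClient zk;
    private volatile string leaderId;

    public LeaderCache(IWatchingClient zk) { this.zk = zk; Refresh(); }

    private void Refresh()
    {
        // Watches are one-time triggers, so every notification
        // must re-read the data and re-arm the watch.
        leaderId = zk.GetData("/app/leader", Refresh);
    }

    public string LeaderId { get { return leaderId; } } // served from the local cache
}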
#codingexercise
Count all possible decodings of a digit sequence, where 1 maps to A, 2 to B, ..., 26 to Z.
// Counts the decodings of the digit string A[0..n-1].
private int GetCount(string A, int n)
{
    if (n == 0 || n == 1) return 1;
    int count = 0;
    // The last digit stands alone unless it is '0'.
    if (A[n - 1] > '0')
        count = GetCount(A, n - 1);
    // The last two digits form a letter when they spell 10..26.
    if (A[n - 2] == '1' || (A[n - 2] == '2' && A[n - 1] <= '6'))
        count += GetCount(A, n - 2);
    return count;
}
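The recursion above recomputes the same prefixes many times; as a sketch, a bottom-up table brings it to linear time (GetCountDP is a name introduced here, not part of the original exercise):

// Bottom-up count of decodings in O(n) time and space.
private int GetCountDP(string A)
{
    int n = A.Length;
    var dp = new int[n + 1];
    dp[0] = 1;                      // empty prefix decodes one way
    dp[1] = A[0] > '0' ? 1 : 0;     // a leading '0' cannot be decoded
    for (int i = 2; i <= n; i++)
    {
        if (A[i - 1] > '0')
            dp[i] += dp[i - 1];
        if (A[i - 2] == '1' || (A[i - 2] == '2' && A[i - 1] <= '6'))
            dp[i] += dp[i - 2];
    }
    return dp[n];
}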

Wednesday, August 9, 2017

We continue discussing ZooKeeper. It incorporates elements from group messaging, shared registers and distributed lock services in a replicated, centralized service. It provides an interface that guarantees the wait-free property and FIFO execution of requests from each client. Write requests across all clients are also linearizable. To achieve this, it implements a leader-based atomic broadcast protocol called Zab. Zab is only used to linearize writes; reads can happen in any order, and ZooKeeper services them at the local server.
Caching is also used to improve performance. The id of the leader is worth caching because it is frequently needed, and caching it avoids asking other servers. ZooKeeper uses a watch mechanism to let clients cache data without the service managing the client cache directly. With this mechanism, a client can watch for an update to a given data object and receive a notification when the update occurs. Chubby, on the other hand, updates the data centrally and blocks updates until it has invalidated the caches of all clients caching the changed data. Leases are used to prevent a client from blocking an update indefinitely, but leases only bound the impact of slow or faulty clients, whereas ZooKeeper's watches avoid this problem altogether.
ZooKeeper therefore demonstrates the following:
1) A co-ordination kernel, which provides a wait-free co-ordination service with relaxed consistency guarantees for use in distributed systems
2) A set of higher-level co-ordination primitives that are easy to use in distributed computing
3) Experience with co-ordination in a variety of distributed systems
In ZooKeeper terminology, a client denotes a user of the ZooKeeper service, a server denotes a process providing the ZooKeeper service, and a znode denotes an in-memory data node, organized in a hierarchical namespace called the data tree.
#codingexercise
Print the level order traversal of a tree in spiral form
void PrintSpiral(Node root)
{
    bool leftToRight = false;
    // Levels are 1-based; include the last level with <=.
    for (int i = 1; i <= Height(root); i++)
    {
        PrintGivenLevel(root, i, leftToRight);
        leftToRight = !leftToRight; // flip direction on every level
    }
}

int Height(Node root)
{
    if (root == null) return 0;
    return 1 + Math.Max(Height(root.left), Height(root.right));
}

void PrintGivenLevel(Node root, int level, bool leftToRight)
{
    if (root == null) return;
    if (level == 1)
    {
        Console.WriteLine(root.data);
    }
    else
    {
        // Visit children in the order dictated by the current direction.
        if (leftToRight)
        {
            PrintGivenLevel(root.left, level - 1, leftToRight);
            PrintGivenLevel(root.right, level - 1, leftToRight);
        }
        else
        {
            PrintGivenLevel(root.right, level - 1, leftToRight);
            PrintGivenLevel(root.left, level - 1, leftToRight);
        }
    }
}

Instead of the recursive option above, we can also enumerate each level with the help of a queue and reverse only alternate levels. Here we serialize the nodes level by level and switch direction on alternate levels in a single pass.
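A sketch of that iterative variant, assuming the same Node type as above with an int data field; a queue drives the breadth-first pass and a per-level list is reversed on alternate levels:

// Iterative spiral print: one BFS pass, reversing alternate levels.
// Uses Queue<T> and List<T> from System.Collections.Generic.
void PrintSpiralIterative(Node root)
{
    if (root == null) return;
    var q = new Queue<Node>();
    q.Enqueue(root);
    bool reverse = false;
    while (q.Count > 0)
    {
        int levelSize = q.Count;        // nodes in the current level
        var level = new List<int>();
        for (int i = 0; i < levelSize; i++)
        {
            var node = q.Dequeue();
            level.Add(node.data);
            if (node.left != null) q.Enqueue(node.left);
            if (node.right != null) q.Enqueue(node.right);
        }
        if (reverse) level.Reverse();   // flip only alternate levels
        foreach (var v in level)
            Console.WriteLine(v);
        reverse = !reverse;
    }
}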

Tuesday, August 8, 2017

ZooKeeper incorporates elements from group messaging, shared registers and distributed lock services in a replicated, centralized service. It provides an interface that guarantees the wait-free property and FIFO execution of requests from each client. Write requests across all clients are also linearizable. Read requests are satisfied by local servers, and the service can handle hundreds of thousands of transactions per second. ZooKeeper differentiates itself from other co-ordination services by being a one-stop shop. A service like Amazon Simple Queue Service is great for queuing, and there are other services specific to leader election. The Chubby service provides strong locking and synchronization guarantees; locks come in useful to implement leader election and group membership. ZooKeeper does not implement such primitives directly; rather, it implements a co-ordination service with an interface. This choice led to a co-ordination kernel that enables new primitives without changing the service core. Exposing blocking primitives also has an inherent issue: the processing of requests would depend on responses from, and failure detection of, other clients. ZooKeeper instead uses wait-free data objects organized hierarchically as in a filesystem. The interface for ZooKeeper looks very much like that of a filesystem, and also like Chubby without the lock methods and without open and close. Moreover, it can implement consensus for any number of processes with FIFO client ordering and linearizable writes.
Availability and performance are improved with a collection of servers. The service itself implements a pipelined architecture that allows hundreds of thousands of requests to be outstanding while still achieving low latency. Pipelining also lets clients issue requests asynchronously and generate more requests than they could synchronously, which is especially useful for a client that wishes to become a leader.

Monday, August 7, 2017

We continue to discuss ZooKeeper, which provides wait-free co-ordination for Internet-scale systems. It allows distributed processes to co-ordinate with each other through a shared hierarchical namespace of data registers. It is also used for distributed cluster management, and it can provide distributed synchronization in the form of locks, barriers and queues. The ZooKeeper service is replicated over a set of machines, and all machines store a copy of the data in memory. A leader is elected on service startup. Each client connects to a single ZooKeeper server and maintains a TCP connection. Clients can read from any ZooKeeper server; writes go through the leader and need majority consensus. The data model consists of a hierarchical namespace where each node is called a znode, and every znode is identified by a path. Znode paths are canonical, absolute and slash-separated, with no relative references. Sequential znodes have a monotonically increasing number appended to their path; in leader election, the znode whose path ends in the lowest number becomes the leader. Each node lets clients set watch operations, which ZooKeeper uses to send notifications to the client. All watches are time-ordered, but the client should allow for notification latency. The APIs supported by all nodes are both synchronous and asynchronous.
ZooKeeper introduces elements from group messaging, shared registers, and distributed lock services in a replicated centralized service.
#codingexercise
Yesterday we discussed an optimal solution for this problem: Given an integer array, find the maximum for each and every contiguous subarray of size k
Today we see a simple solution which works for small k
// Brute force: scan each window of size k; O(n * k) time.
void PrintMaxInWindowSizeK(List<int> A, int k)
{
    for (int i = 0; i <= A.Count - k; i++)
    {
        int max = A[i];
        for (int j = 1; j < k; j++)
        {
            if (A[i + j] > max)
                max = A[i + j];
        }
        Console.WriteLine("{0},", max);
    }
}
The same works with a deque of size k by making only one pass over the list of integers.

Sunday, August 6, 2017

We continue to discuss ZooKeeper, which provides wait-free co-ordination for Internet-scale systems. It allows distributed processes to co-ordinate with each other through a shared hierarchical namespace of data registers. We can think of it as a distributed lock server, but it involves much more. It is available as open source and offers a simple interface for naming, configuration management, locks and synchronization, and group services, all of which can be customized. It is used for configuration management in an unattended manner or for simpler deployment or provisioning, for distributed cluster management, and for distributed synchronization in the form of locks, barriers and queues. We recall that the gossip protocol was used to propagate updates via message exchanges. The ZooKeeper service is replicated over a set of machines, and all machines store a copy of the data in memory. A leader is elected on service startup. Each client connects to a single ZooKeeper server and maintains a TCP connection. Clients can read from any ZooKeeper server; writes go through the leader and need majority consensus. The data model consists of a hierarchical namespace where each node is called a znode, and every znode is identified by a path. Znode paths are canonical, absolute and slash-separated, with no relative references. Nodes may have varying lifetimes, from persistent, which remain until deleted, to ephemeral, which remain until the session closes. Nodes can also be sequential, in which case a monotonically increasing number is appended to the end of the path. Each node allows creation and deletion operations, getting and setting data or ACLs, and syncing, all of which can easily be issued from the command line. Each node supports client watch operations, which ZooKeeper uses to send notifications to the client. All watches are time-ordered, but the client should allow for notification latency. The APIs supported by all nodes are both synchronous and asynchronous.
Read requests are processed locally at the ZooKeeper server to which the client connects. Write requests, however, are forwarded to the leader and go through a majority consensus before a response is generated. Consistency is enforced via sequential updates, atomicity, a single system image, reliability and timeliness. To illustrate with ZooKeeper as a cluster manager, we see the following operations: each client host sets a watch on all the cluster members, and each cluster member is itself represented by an ephemeral znode. Whenever a node joins or leaves, it generates an alert, and membership status is kept current as nodes change state.
In this case, we can also walk through the operations for leader election. Each znode has a path. All participants of the election process create an ephemeral-sequential node on the same election path. The node with the smallest sequence number is the leader, and each follower watches the node with the next lower sequence number. When the leader is removed, another one is elected; when a session expires, re-election is conducted. We can see this in action when clients request a distributed lock. Each client creates a sequential-ephemeral znode under a common path, and the client corresponding to the lowest number holds the lock. When the lock is released, its znode is deleted, which promotes the next contender. ZooKeeper ships client libraries in many languages so that programmability options are expanded. It should be noted that watches are one-time triggers; continuous watching requires resetting the watch after every trigger. This leads to a design issue, the "herd effect", when there are many watches on the same node. Some performance improvements therefore involve steps such as separating the ZooKeeper transaction log onto its own disk, setting the heap size to avoid swapping, keeping session timeouts long enough to allow proper cleanup, and handling znode changes carefully. Note that since a cluster is involved, addition or removal of hosts is easy. This comes in very handy for production hosts: services deployed on hosts that change frequently can now easily be tracked with ZooKeeper. In fact, Twitter uses this very technique for its service discovery.
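A sketch of that lock recipe, again against a hypothetical client wrapper (ILockClient, CreateEphemeralSequential, GetChildren and WatchNode are illustrative stand-ins, not the real client API):

using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical client shape used only for this sketch.
public interface ILockClient
{
    // Creates an ephemeral-sequential znode and returns its full path.
    string CreateEphemeralSequential(string pathPrefix);
    List<string> GetChildren(string path);
    // Arms a one-time watch; fires the callback when the znode is deleted.
    void WatchNode(string path, Action onDeleted);
}

public class DistributedLock
{
    private readonly ILockClient zk;
    private string myNode;

    public DistributedLock(ILockClient zk) { this.zk = zk; }

    public void Acquire(Action onAcquired)
    {
        myNode = zk.CreateEphemeralSequential("/app/lock/member-");
        TryTakeOwnership(onAcquired);
    }

    private void TryTakeOwnership(Action onAcquired)
    {
        var children = zk.GetChildren("/app/lock").OrderBy(c => c).ToList();
        string mine = myNode.Substring(myNode.LastIndexOf('/') + 1);
        int myIndex = children.IndexOf(mine);
        if (myIndex == 0)
        {
            onAcquired();   // lowest sequence number holds the lock
            return;
        }
        // Watch only the immediate predecessor to avoid the herd effect.
        string predecessor = "/app/lock/" + children[myIndex - 1];
        zk.WatchNode(predecessor, () => TryTakeOwnership(onAcquired));
    }
}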
#codingexercise
Given an integer array, find the maximum for each and every contiguous subarray of size k
void PrintMaxInWindowSizeK(List<int> A, int k)
{
    if (k > A.Count) return;
    // The deque (a LinkedList here) holds indices whose values are in
    // decreasing order; the front is the maximum of the current window.
    var d = new LinkedList<int>();
    for (int i = 0; i < k; ++i)
    {
        while (d.Count > 0 && A[i] >= A[d.Last.Value])
            d.RemoveLast();
        d.AddLast(i);
    }
    for (int i = k; i < A.Count; ++i)
    {
        Console.WriteLine("{0},", A[d.First.Value]);
        // Drop indices that have slid out of the window.
        while (d.Count > 0 && d.First.Value <= i - k)
            d.RemoveFirst();
        // Drop smaller elements; they can never be a window maximum.
        while (d.Count > 0 && A[i] >= A[d.Last.Value])
            d.RemoveLast();
        d.AddLast(i);
    }
    Console.WriteLine("{0},", A[d.First.Value]);
}
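For example, with A = {1, 3, 2, 5, 4} and k = 3, the windows are {1, 3, 2}, {3, 2, 5} and {2, 5, 4}, so this prints 3, 5, 5.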

Saturday, August 5, 2017

Today we start discussing ZooKeeper, which provides wait-free co-ordination for Internet-scale systems. It allows distributed processes to co-ordinate with each other through a shared hierarchical namespace of data registers. We can think of it as a distributed lock server, but it involves much more. It is available as open source and offers a simple interface for naming, configuration management, locks and synchronization, and group services, all of which can be customized. It is used for configuration management in an unattended manner or for simpler deployment or provisioning, for distributed cluster management, and for distributed synchronization in the form of locks, barriers and queues. We recall that the gossip protocol was used to propagate updates via message exchanges. The ZooKeeper service is replicated over a set of machines, and all machines store a copy of the data in memory. A leader is elected on service startup. Each client connects to a single ZooKeeper server and maintains a TCP connection. Clients can read from any ZooKeeper server; writes go through the leader and need majority consensus. The data model consists of a hierarchical namespace where each node is called a znode. Znode paths are canonical, absolute and slash-separated, with no relative references.
#codingexercise
Find the product of the first k magic numbers, where a magic number is a sum of distinct powers of 5 (5, 25, 30, 125, 130, ...).
// The n-th magic number: the bits of n select which powers of 5 to sum.
int GetMagicN(int n)
{
    int power = 1;
    int result = 0;
    while (n > 0)
    {
        power = power * 5;
        if ((n & 1) != 0)
            result += power;
        n >>= 1;
    }
    return result;
}
long GetMagicProduct(int k)
{
    long result = 1;             // multiplicative identity; 0 would zero the product
    for (int i = 1; i <= k; i++)
        result *= GetMagicN(i);  // multiply by the i-th magic number, not the k-th
    return result;
}
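For instance, GetMagicN(1) = 5, GetMagicN(2) = 25 and GetMagicN(3) = 30, so GetMagicProduct(3) = 5 * 25 * 30 = 3750.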
It might be interesting to note that since the magic numbers map to the bit representations of consecutive integers, we can easily tell how far apart perfect powers of 5 fall in the magic number sequence.