Cluster computing

Monday, August 14, 2017

Sunday, August 13, 2017

We continue discussing the ZooKeeper. It is a co-ordination service with elements from group messaging, shared registers and distributed lock services. It provides a interface that guarantees wait-free property and FIFO execution of requests from each client. Requests across all clients are also linearized.
Both the linearizable writes and the FIFO client order form the guarantees for ZooKeeper. These two guarantees also interact. ZooKeeper handles both by designating a path as the "ready" ZNode. The other clients may make changes only when that ZNode exists.
ZNodes can be both regular and ephemeral. Regular nodes are the ones that clients can create and delete explicitly. Ephemeral nodes are the ones that the clients can create for the duration of the session. Sessions have an associated timeout. A session is initiated when the client connects and ends when the client closes or ZooKeeper deems faulty. Sessions come with associated state changes that inform the stage in which the execution of the operations are. Sessions also enable a client to move from one ZooKeeper server to another.
Perhaps the most important aspect of ZooKeeper is the interface it provides the clients. These include the following methods:
1) create (path, data, flags) : Used to create a ZNode with the name path and to store data array in it. It returns the name of the newly created ZNode.
2) delete(path, version) : delete the ZNode path if it is at the expected version
3) exists(data, path): Returns trues if ZNode with name path exists, and returns false otherwise
4) getData (path, watch) : returns the data and metadata for the node.
5) setData (path, data, version): writes the data object array to the node path if the version matches
6) getChildren(path, watch) - returns the set of names of the children of a node
7) sync(path) - Waits for all updates pending at the start of the operation to the concerned Zookeeper server.
#codingexercise
Find the position of the number closest to the midpoint of the range of numbers in a row-wise and column-wise sorted two dimensional array
Tuple<int, int> GetPosition(int [,] A, int r, int c)
{
int min = A[0,0];
int max = A[r-1, c-1];
int mid = (min+max)/2;
var t = new Tuple<int, int>();
int threshold = INT_MAX;
for (int i = 0; i < r-1; i++) // row wise
for (int j = 0; j < c-1; j++) // column wise
// we could optimize this to picking the element bottom-right, or bottom or right for the one closet to mid
// here we simply traverse through the numbers
{ int distance = Math.Abs(A[i,j] - mid);
if (distance <= threshold){
threshold = distance;
t.first = i;
t.second = j;
}
return t;
}

Saturday, August 12, 2017

We continue discussing the ZooKeeper. It is a co-ordination service with elements from group messaging, shared registers and distributed lock services. It provides a interface that guarantees wait-free property and FIFO execution of requests from each client. Requests across all clients are also linearized.
Both the linearizable writes and the FIFO client order form the guarantees for ZooKeeper. These two guarantees also interact. For example, when many clients elect a leader, the leader may take some time to complete its work before it can notify others. During this time, the other clients may not make any changes. And if the leader dies, the partial changes may not be used. Chubby service does not handle the latter requirement. ZooKeeper handles both by designating a path as the "ready" ZNode. The other clients may make changes only when that ZNode exists. Therefore the leader is only responsible for deleting ready, making changes and creating ready. If the leader dies, the ready is not available, so the other clients won't use the changes. If there are a large number of clients, then the latency per client ZNode change will aggregate but because the requests are asynchronous and the changes are pipelined. it does not take that much time.
There are two types of nodes a client can create
clients can create and delete the nodes explicitly and these are called regular ZNodes
clients can create the nodes for the duration of a session and these are called ephemeral ZNodes
#codingexercise
Find the majority element in an array.
using Moore's voting algorithm:
int GetMajorityCandidate(List<int> A)
{
int mindex = 0;
int count = 1;
for (int i = 1; i < A.Count; i++)
{
if (A[mindex] == A[i])
count++;
else
count--;
if count == 0;
{
mindex = i;
count = 1;
}
}
return A[mindex];
}
bool IsMajorityCandidate(List<int>A, int candidate)
{
int i, count = 0;
for (int i = 0; i < A.Count; i++)
if (A[i] == candidate)
count ++;
if (count > n/2)
return true;
return false;
}

Friday, August 11, 2017

Thursday, August 10, 2017

Wednesday, August 9, 2017

We continue discussing the ZooKeeper. It incorporates elements from group messaging, shared registers and distributed lock services in a replicated, centralized service. It provides a interface that guarantees wait-free property and FIFO execution of requests from each client. Requests across all clients are also linearized. In order to do this, it implements a leader based atomic broadcast protocol called Zab. Zab is only used to linerarize writes. Reads can happen in any order and ZooKeeper services them at the local server.
Caching is also used to improve performance. The id of the leader is useful to cache because it is frequently required and avoids asking other servers. ZooKeeper uses a watch mechanism to enable clients to cache data without managing the client cache directly. With this mechanism, a client can watch for an update to a given data object, and receive notification about update. Chubby on the other hand centrally updates the data and blocks updates to invalidate the caches of all clients caching the data being changed. Leases are used to prevent a client from blocking the update indefinitely but leases only affect slow or faulty clients whereas ZooKeeper avoids this problem altogether.
ZooKeeper therefore demonstrates the following :
1) A co-ordination kernel which provides a wait free co-ordination service with relaxed consistency guarantees to be used in distributed systems
2) A set of higher level co-ordination primitives that are easy to be used in distributed computing.
3) Experience with co-ordination in a variety of distributed systems.
In ZooKeeper terminology, a client denotes the user of a ZooKeeper service, a server denotes a process providing ZooKeeper service, and znode denotes an in-memory data node which is organized in a hierarchical namespace as the data tree.
#codingexercise
Print Level order traversal of a tree
Void PrintSpiral (Node root)
{
bool change = false;
for (int i = 1; i < height (root); i++)
{
PrintGivenLevel (root, i, change);
change = !change;
}
}
void PrintGivenLevel(Node root, int level, bool direction)
{
if (root == null) return;
if (level ==1) print(root.data);
else if level > 1{
if (direction){
PrintGivenLevel(root.left, level - 1, direction);
PrintGivenLevel(root.right, level -1, direction);
}else{
PrintGivenLevel(root.right, level-1, direction);

PrintGivenLevel(root.left, level -1, direction);
}
}
}
Instead of the recursive option above, we can also enumerate each level with the help of a queue and reverse only alternate levels. Here we can serialize the nodes with level demarcators and then switch alternatively in one pass.

Tuesday, August 8, 2017

The ZooKeeper incorporates elements from group messaging, shared registers and distributed lock services in a replicated, centralized service. It provides a interface that guarantees wait-free property and FIFO execution of requests from each client. Requests across all clients are also linearized. Read requests are satisfied by local servers. It can handle hundreds of thousands of transactions per second. Zookeeper differentiates from other co-ordination services by being a one stop shop. Services like Amazon Simple Queue Service is great for queuing. There are other services specific to leader election. Chubby service provides strong locking and synchronization guarantees. Locks come in useful to implement leader election and group membership. ZooKeeper does not implement primitives, rather it implements a co-ordination service with an interface. Such a choice led to the implementation of a co-ordination kernel that enables new primitives without changing the service core. Exposing primitives also had an inherent issue. The processing of requests depended on responses and failure detection of other clients. On the other hand, ZooKeeper uses a wait free data objects organized hierarchically as in filesystems. The interface for ZooKeeper looks very much like that of a filesystem. It also looks Chubby without the lock methods, open and close. Moreover it can implement consensus for any number of processes with FIFO client ordering and linearized writes.
Availability and performance is improved with a collection of servers The service itself implements a pipelined architecture that enables hundreds of thousands of requests to be outstanding which still achieving low latency. Moreover it also enables clients to be asynchronous and generate more requests than they could with being synchronous. This is especially true for a client that wishes to be a leader.