Cluster computing

Saturday, October 12, 2013

Asynchronous consensus with process failures.
When there is one fault, it is impossible to solve the consensus problem in a distributed system that involves asynchronous model. Unfortunately, asynchronous design is common. Faults also happen such as failures of disk. And it only takes one fault to disrupt consensus.
This has been proved based on the observation that a protocol begins in a decision state value is either 0 or 1. And it continues in which either result is still possible.
In transaction systems, for commit/abort we used the two-phase locking semantics. In group membership, leader election and reliable multicast too , this problem is common Hence, some suggestions to modify the assumptions of the problem include:
1) Synchrony - If a system can provide an upper bound on the delay or recovery, then this problem can be solved.
2) Randomness - Some probabilistic algorithms can help
3) Failure Detectors - a delayed process is difficult to differentiate from a crashed process without a failure detector.
4) Approximation - Even if all the process cannot agree on the same value, some arbitrary value can suffice.
Let us look at synchronous agreement with process failures.
Every process broadcasts to all other processes its initial value and this can be done in a single round of messages. All of them can then agree on say the minimum.
If there were to be a crash before the first round completes, then they may not have all the values. So a second round of messages could be initiated with all the values from the first round so that everyone has the same set.
Notice that if there are more failures, then there must be more rounds before the agreement. In such a case, there must be an upper bound on the number of rounds otherwise this may not converge.
Also there could be authentication and encryption involved in the messages so that it is not tampered and that it can be shared effectively. We don't focus on those at this time because we are addressing a different issue.
When we discussed the solution for synchronous systems, we may see how the assumption modifications come useful for asynchronous networks.

We will continue our discussion on garbage collection. The process has some steps that are not obvious. We will go over them in this post.
First, we look at the propagator step we added. The propagator tests a condition for each edge. This condition marks an edge between x and y as ok if x is a root and with an edge between x and y, they are connected with y as root. That is the propagator takes each garbage and connects them to food. It continues execution until a termination condition has been reached where the root is connected with all the vertices visited such that each edge x,y is ok.
Between two edges there is only one condition when the edge is not ok. This is when x is not a root. If x is not a root, it is manure. A manure is now easy to collect.
The fact that x is manure and is not a root is maintained as an invariant with the progress because we add edges. Notice that in the ok condition, we checked and maintained that y is root. This is a refinement from the earlier definition where we considered y is food. This let us come up with two separate connected graph where one is manure and the other is food. Everything else is unconnected and garbage.
To be able to prove that this is the case, we considered our actions and maintain the invariant that if x is manure it is not a root. Said differently, each vertex is neither manure or not a root for the marker to propagate. We show that this invariant holds throughout the progress.
Initially there is a root for food and manure is part of garbage. This is easy to establish because the food is mutually exclusive. By definition, food is any vertex that is reachable to the root. So our initial condition holds.
Next in our actions, we look at the two that change a vertex from being or not being a root.
The first is when we add the edge between x and y and y is food. In this case since y is food, it is not manure. Since the initial invariant holds for all vertex that is not y, adding an edge from that vertex to y, maintains the invariant.
The other action we do is when x is a root and now there exists an edge between x and y. In this case we proceed from the left. If x is not manure and we connect y, then y is not manure. if y is not manure, then y is root. This re-establishes our invariant.
Thus by maintaining our invariant and our progress, we are able to propagate. we detect the termination condition when the root is chosen and all the edges reaching to it are ok.

Friday, October 11, 2013

In the previous post we described how the garbage collection works however we did not explain the second attempt or provide proof of correctness. We will look into that now. Remember that for our jargon a vertex is food if it can be reached from the root. If it is not a food, it is garbage. Now onto garbage collection.
The mutator and the marker program superimpose their actions. The mutator actions are not affected and it can continue to add and remove the edges. However we cannot say the same for the marker because the mutator actions can frustrate the marker. For example, marking a single vertex as food could depend on the edges added or removed. To fix this difficulty, we may have to take another step.
We modify the mutator program from the original where it maintained that a vertex that is food is connected to the root and a vertex that is not food is garbage. In the original program, it assigned an edge between x and y if y was food and set the edge to true. If this was not the case, it would delete the edge.
Now we add an extra step that we set y as root when y is food. Also, if x is root and adding an edge to x,y implies only y is root. This is called the propagator step How do we know that this program is correct ? Because we now work off of a root. Two vertices will not satisfy the propagator step only when x is manure. We know the marker has terminated when we mark the root and for all edges we have completed the propagator.
Note that we mentioned that manure vertices need not be marked. Since they were marked earlier and we will collect them in the next iteration. As an aside, some of the properties that remain true are as follows: a garbage remains a garbage, a food remains a food, a manure is a garbage. Each of the category can only be reached from themselves.
Now we discuss the proof of correctness. Here we have to prove the invariant that if x is manure then x cannot be root. For all vertex, each is neither a manure nor a root.
There are two actions that can modify the root property.
First is the action step we discussed. That is, if x is added to y and y is food, then edge exists between x and y and y is root. Since y is food, y cannot be manure. For all x that is not y, if x is not manure or x is not root and y is not manure, we maintain the invariant.
For the other action where x is root and the edge between x,y is added with y becoming root, the invariant is maintained because x is not a manure. That and y not being a manure implies y can remain a root. Thus maintaining the invariant.

Another interesting application in distributed computing is garbage collection. When a program runs there are objects created on the heap. These objects are cleaned up by a managed runtime. This is called a garbage collector. All the objects on the heap derive from a base object. So the objects have a common root. During a collection, the entire set of objects on the heap then translates to an object graph which is finite and directed with a fixed set of vertices representing the objects but not the edges. The edges are added and deleted by a mutator process that modifies the graph objects that are connected can be called food and those that are not can be called garbage so we know which objects are to be collected. The mutator only adds edges if it has been marked as food.
The task is to define a marker process that runs concurrently with the mutator and is responsible for marking things as food. The marker need not be exact. It can be conservative and mark things that are not food. At the same time, we do not want the marker to be weak to mark at least all the food. i.e. we want to mark as few non-food vertices as possible. we can skip a few objects in the current iteration because they may change and we will call this set manure.
The root vertex in a garbage collector when applied to managing runtime memory, keeps track of the segment header and the free store list. The edges are merely the references to memory. Each vertex keeps track of the users of its memory with a reference count. When the reference count is zero, it is garbage. We reclaim useless memory and add them to the free store list.
We use the marker algorithm to mark the things that are food and this can keep running in the background. When the mutator deletes an edge, the reference count decreases. In a way, the marker algorithm is like the termination detection problem in that it has to know that the marking process has terminated. So we add a special node that every active process has a reference to. When a process goes idle, it deletes its reference to the special node.
As earlier, we have two sets of vertex, one that is food and the other that is garbage. The garbage may have some special vertices called the manure.
The marker program begins by adding a marked field to each vertex. It also adds a boolean to say that the marker has completed. The safety of the program is that there is a final invariant that all vertices have the completed set to true. The progress is in marking each vertex boolean for completed as true.
The mutator and the marker do not interfere with each others running because they work on basis of superposition. That is they touch data that are not related to each other.
We begin with the root and simply mark vertices that can be reached from the already marked ones. This behaves just like the gossip program mentioned earlier. However, the mutator can still frustrate the marker. So we add a second logic to say that we add an edge to x,y when y is food and set both x and y to completed, we delete an edge between x and y when either of them is false
and if x is connected to root, then we set y to completed.

Thursday, October 10, 2013

In distributed computing, termination detection is another interesting topic and one which has an application for snapshot we discussed earlier. Consider a topology of processes with unidirectional channels for communication. The graph need not be connected. Here each process can be in one of two states - an idle state or active state. A process that is in the active state can either send a message, receive a message or switch to idle. An idle process can only receive messages.
Initially all processes are active and all channels are empty. The system has come to termination when the there are no active processes and there are no messages in the channels.
To detect the termination, we introduce a special process called the termination detector. The detector is connected to all processes. When a process is idle, it sends a message to the detector. So the detector knows when all the processes are idle. However is this a sufficient condition for the termination detection. No, a process could send a message to another and both may have gone idle after the send and before the receive. So the detector gets to know that both are idle but the detector has not yet seen the message in transit.
Therefore, something more needs to be sent from each process. This information is the number of messages on each of its outgoing channels and the number of messages on each of its incoming channels.
The detector then detects whether all the processes are idle and for each channel, whether the number of messages put into the channel is equal to the number of messages taken out of the channel.
The detector works well when the termination is a valid snapshot. The state seen by the detector should be a valid snapshot.
As before, the marker algorithm comes useful to take a valid snapshot. Here we send out markers to flush the messages in the channels prior to taking a snapshot. Each process receives a marker, records its local state, and records the incoming channel used by the marker as empty and then sends the markers along each outgoing channel. When a process receives a subsequent marker, it records the state of that incoming channel as completed. The process is complete when all the process have received markers along all incoming channels.
This way the snapshot is determined to be valid and the detector can safely perform termination detection.

In the previous post, we talked about snapshots. However, we did not give the rigor for the correctness of the marker approach. In the current post, I want to show just that. The snapshot is the accurate picture of the global state.
The goal here is to make sure that each computation occurred. We do this by labeling each action in the computation as either pre or post. This helps us know which actions occured before the local state was saved and which actions occurred after the state was saved. A pre action is the action that was recorded before the local state was saved. And the post action is the action that was taken after the state was saved. The actual computation occurred with a interleaved sequence of pre and post actions. We will swap adjacent pre and post actions in the computation. We will retain the order of all the pre's as they appear in the original. We will also retain the order of all the post's as they appear in the original. This way we have ordered all the pres to occur before the state was saved and all the posts to occur after the state was saved. By swapping and retaining the order, we want to have an equivalent computation as the original where each computation can be acertained to have occurred.
To complete the equivalency, let us look at the swapping of a pair of actions that are out of order. Such a pair has the form <apost, bpre>. The a and b are different processes.
There are three cases for bpre:
bpre is a local action. In this case, local actions can be swapped since the two does not affect each other.
bpre is a send action. In this case too the swap can be done.
bpre is a receive action. The only case we cannot swap with apost is when apost is the corresponding send. We show that it is impossible for apost to be a corresponding send because the markers must have been sent and since the channels are a queue, the marker must be delivered before the message sent in action a. So b must be a post and they cannot be out of order.
Thus in all cases we are able to swap. By continuing with the swapping until the desired sequence of all pres and all posts, we establish equivalency and the correctness that the snapshot is accurately taken by the markers approach.

Wednesday, October 9, 2013

In today's post, we will see an interesting use for the logical time we discussed earlier. and we will discuss snapshots.
Snapshots are a capture of a global state of a system. The global state is the combination of all the local states of processes and the channels that connect them. We treated the channels earlier as message queues where processes send messages to each other. For our discussion, processes can do one of three things, they can change their internal state, they can send a message or they can receive a message. The snapshot is this global state capture without affecting the process computations. Looking back at the event sequence diagram and the directed graph we saw earlier with processes, this snapshot is a record of the state on each of the processes, this snapshot appears as a boundary across processes in a wave like manner with regard to the real time. If all the processes shared the same time from a shared clock, this snapshot would have been a straight line at that time. This is where the concept of logical time helps us. We can take a snapshot at the same logical time across processes.
But next we will discuss whether this guarantees correctness.
Take an example of two bank accounts where one has a balance of 1000 dollars and the other has no balance and one sends half the amount to the other. Now if a snapshot is taken where the banks record their state right after send at the sender and right after the receive at the receiver, the snapshot has a net worth of 1500 which is an inconsistent state. Hence a consistent cut is required where the cut is a valid snapshot, cut is consistent and cut has no incoming edges.
How do we make a consistent cut? Use logical time because an inconsistency occurs only because the message between two process is in flight and the source and destination have not had a chance to synchronize their logical time. The logical time is updated with the max between the local time and the max of the message received.
This is made possible by recording the snapshot at the same logical time as one solution.

Another solution to the taking consistent snapshots is to send out markers through the channels to flush the messages.The initiator records its local state and sends out marker along each outgoing channel. Similarly a process receives a marker, records its local state, and records the incoming channel used by marker as empty and then sends markers along each outgoing channel. When a process receives a subsequent marker, it records the state of that incoming channel as completed. The process is complete when all the process have received markers along all incoming channels.