Friday, October 11, 2013

Another interesting application in distributed computing is garbage collection. When a program runs, objects are created on the heap, and a managed runtime cleans them up with a component called a garbage collector. All the objects on the heap derive from a base object, so they share a common root. During a collection, the set of objects on the heap translates to an object graph that is finite and directed, with a fixed set of vertices representing the objects; the edges, however, are not fixed. They are added and deleted by a mutator process that modifies the graph. Objects that are connected to the root can be called food and those that are not can be called garbage, so we know which objects are to be collected. The mutator only adds an edge to a vertex that has been marked as food.
The task is to define a marker process that runs concurrently with the mutator and is responsible for marking things as food. The marker need not be exact. It can be conservative and mark some things that are not food. At the same time, we do not want the marker to be too weak: it must mark at least all the food, and ideally as few non-food vertices as possible. We can also skip a few objects in the current iteration because they may change; we will call this set manure.
When the garbage collector is applied to managing runtime memory, the root vertex keeps track of the segment header and the free store list. The edges are merely references to memory. Each vertex keeps track of the users of its memory with a reference count; when the reference count drops to zero, the vertex is garbage. We reclaim that memory and add it back to the free store list.
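To make this concrete, here is a toy sketch of such an object graph with reference counts and a free store list. The Heap and HeapObject names are just for illustration and do not come from any particular runtime:

```python
class HeapObject:
    def __init__(self, name):
        self.name = name
        self.ref_count = 0        # number of users of this object's memory
        self.references = []      # outgoing edges: objects this one refers to
        self.marked = False       # used later by the marker

class Heap:
    def __init__(self):
        self.objects = []         # vertices of the object graph
        self.free_store = []      # reclaimed objects, available for reuse

    def new_object(self, name):
        obj = HeapObject(name)
        self.objects.append(obj)
        return obj

    def add_edge(self, src, dst):
        src.references.append(dst)
        dst.ref_count += 1        # one more user of dst's memory

    def delete_edge(self, src, dst):
        src.references.remove(dst)
        dst.ref_count -= 1
        if dst.ref_count == 0:    # no users left: this vertex is garbage
            self.reclaim(dst)

    def reclaim(self, obj):
        self.objects.remove(obj)
        self.free_store.append(obj)
```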
We use the marker algorithm to mark the things that are food, and this can keep running in the background. When the mutator deletes an edge, the reference count decreases. In a way, the marker algorithm resembles the termination detection problem in that it has to know when the marking process has terminated. So we add a special node that every active process has a reference to; when a process goes idle, it deletes its reference to the special node.
As earlier, we have two sets of vertices: one that is food and the other that is garbage. The garbage may contain some special vertices called manure.
The marker program begins by adding a marked field to each vertex. It also adds a per-vertex boolean to say that the marker has completed its work on that vertex. The safety of the program is a final invariant: all vertices have their completed boolean set to true. The progress is in eventually setting each vertex's completed boolean to true.
The mutator and the marker do not interfere with each other's execution because they work on the basis of superposition. That is, they touch data that are not related to each other.
We begin with the root and simply mark vertices that can be reached from the already marked ones. This behaves just like the gossip program mentioned earlier. However, the mutator can still frustrate the marker. So we add a second set of rules: we add an edge (x, y) only when y is food, and we set both x and y to completed; we delete an edge between x and y when either of them is false; and if x is connected to the root, then we set y to completed.
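Ignoring the concurrent mutator for a moment, the basic marking step is just a graph traversal from the root. Here is a rough sketch building on the toy heap above. This is a stop-the-world simplification; the extra rules above are what make marking safe to run alongside the mutator:

```python
def mark_from_root(root):
    """Mark every vertex reachable from the root as food, starting
    from the root and spreading to vertices reachable from already
    marked ones, like the gossip program."""
    root.marked = True
    frontier = [root]
    while frontier:
        vertex = frontier.pop()
        for neighbor in vertex.references:
            if not neighbor.marked:
                neighbor.marked = True       # conservative: may mark some non-food
                frontier.append(neighbor)

def sweep(heap):
    """Anything left unmarked after marking is garbage and can be reclaimed."""
    for obj in list(heap.objects):
        if not obj.marked:
            heap.reclaim(obj)
        else:
            obj.marked = False               # reset for the next iteration
```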

Thursday, October 10, 2013

In distributed computing, termination detection is another interesting topic, and one that makes use of the snapshot technique we discussed earlier. Consider a topology of processes with unidirectional channels for communication. The graph need not be connected. Each process can be in one of two states: idle or active. An active process can send a message, receive a message, or switch to idle. An idle process can only receive messages.
Initially all processes are active and all channels are empty. The system has come to termination when there are no active processes and there are no messages in the channels.
To detect termination, we introduce a special process called the termination detector, which is connected to all processes. When a process goes idle, it sends a message to the detector, so the detector knows when all the processes are idle. But is this a sufficient condition for termination detection? No: a process could send a message to another, and both may have gone idle after the send and before the receive. The detector then learns that both are idle, yet it has not seen the message still in transit.
Therefore, each process needs to report something more: the number of messages it has put into each of its outgoing channels and the number of messages it has taken out of each of its incoming channels.
The detector then checks whether all the processes are idle and, for each channel, whether the number of messages put into the channel equals the number of messages taken out of it.
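As a rough sketch of that check (the report format and the names here are my own assumptions, not from any particular system):

```python
def detect_termination(reports):
    """reports maps each process id to a dict of the form:
       {"idle": bool,
        "sent":     {channel_id: count},   # messages put into outgoing channels
        "received": {channel_id: count}}   # messages taken out of incoming channels
    """
    # Every process must be idle.
    if not all(r["idle"] for r in reports.values()):
        return False

    # For each channel, messages put in must equal messages taken out.
    sent, received = {}, {}
    for r in reports.values():
        for channel, count in r["sent"].items():
            sent[channel] = sent.get(channel, 0) + count
        for channel, count in r["received"].items():
            received[channel] = received.get(channel, 0) + count

    return all(sent.get(c, 0) == received.get(c, 0)
               for c in set(sent) | set(received))
```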
The detector works correctly only when the state it sees is a valid snapshot of the system.
As before, the marker algorithm comes in useful for taking a valid snapshot. Here we send out markers to flush the messages in the channels prior to taking a snapshot. When a process first receives a marker, it records its local state, records the incoming channel used by the marker as empty, and then sends markers along each of its outgoing channels. When it receives a subsequent marker, it records the state of that incoming channel as complete. The snapshot is complete when all the processes have received markers along all their incoming channels.
This way the snapshot is determined to be valid and the detector can safely perform termination detection.
In the previous post, we talked about snapshots. However, we did not give a rigorous argument for the correctness of the marker approach. In the current post, I want to show just that: the snapshot is an accurate picture of the global state.
The goal here is to show that the recorded snapshot corresponds to a computation that could actually have occurred. We do this by labeling each action in the computation as either pre or post: a pre action is one taken before the local state of its process was saved, and a post action is one taken after. The actual computation is an interleaved sequence of pre and post actions. We will swap adjacent out-of-order pre and post actions in the computation, while retaining the order of all the pres as they appear in the original and the order of all the posts as they appear in the original. This way, all the pres are ordered before the state was saved and all the posts after it. By swapping while retaining the relative order, we obtain a computation equivalent to the original in which the snapshot state can be ascertained to have occurred.
To complete the equivalence, let us look at swapping a pair of adjacent actions that are out of order. Such a pair has the form <apost, bpre>. The actions a and b must belong to different processes, since within a single process every pre action occurs before every post action.
There are three cases for bpre:
bpre is a local action. In this case, the swap is allowed since the two actions do not affect each other.
bpre is a send action. In this case too the swap can be done.
bpre is a receive action. The only case in which we cannot swap with apost is when apost is the corresponding send. But apost cannot be the corresponding send: since a is a post action, the sender had already recorded its state and sent a marker before sending the message in action a, and since the channels are queues, the marker must be delivered before that message. The receiver would therefore have already recorded its state, making b a post action too, so the pair could not be out of order.
Thus in all cases we are able to swap. By continuing to swap until we reach the desired sequence of all pres followed by all posts, we establish the equivalence, and hence the correctness of the snapshot taken by the marker approach.
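Purely for intuition, and not as part of the proof itself, the reordering can be pictured as repeatedly swapping adjacent out-of-order pairs until the sequence is all pres followed by all posts, with the relative order within each class preserved:

```python
def reorder(actions):
    """actions is a list of (label, action) pairs, label being 'pre' or 'post'.
    Repeatedly swapping adjacent out-of-order (post, pre) pairs yields all
    pres first, then all posts, preserving the relative order of each class."""
    changed = True
    while changed:
        changed = False
        for i in range(len(actions) - 1):
            if actions[i][0] == "post" and actions[i + 1][0] == "pre":
                actions[i], actions[i + 1] = actions[i + 1], actions[i]
                changed = True
    return actions

# Example: the result keeps p1 before p2 and q1 before q2.
print(reorder([("pre", "p1"), ("post", "q1"), ("pre", "p2"), ("post", "q2")]))
```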

Wednesday, October 9, 2013

In today's post, we will see an interesting use for the logical time we discussed earlier, and we will discuss snapshots.
Snapshots are a capture of the global state of a system. The global state is the combination of all the local states of the processes and of the channels that connect them. We treated the channels earlier as message queues through which processes send messages to each other. For our discussion, processes can do one of three things: change their internal state, send a message, or receive a message. The snapshot is a capture of this global state without affecting the process computations. Looking back at the event sequence diagram and the directed graph of processes we saw earlier, the snapshot is a record of the state on each of the processes; it appears as a boundary across the processes, wave-like with respect to real time. If all the processes shared the same time from a shared clock, this boundary would have been a straight line at that time. This is where the concept of logical time helps us: we can take a snapshot at the same logical time across processes.
But next we will discuss whether this guarantees correctness.
Take an example of two bank accounts where one has a balance of 1000 dollars and the other has no balance, and the first sends half the amount to the other. Now if a snapshot is taken where the sender records its state before the send and the receiver records its state after the receive, the snapshot shows a net worth of 1500 dollars, which is an inconsistent state. Hence a consistent cut is required: for the cut to be a valid snapshot, no recorded receive may correspond to a send that happens after the cut, i.e. the cut has no edges coming into it from the future.
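The arithmetic of the example, as a toy illustration:

```python
# Account A starts with 1000, account B with 0; A sends 500 to B.
a_before_send, b_before_receive = 1000, 0
a_after_send, b_after_receive = a_before_send - 500, b_before_receive + 500

# Inconsistent cut: record A before its send but B after its receive.
print(a_before_send + b_after_receive)   # 1500 -- money appears out of nowhere

# Consistent cuts: both before, or both after, the transfer (any in-flight
# message would have to be counted as channel state).
print(a_before_send + b_before_receive)  # 1000
print(a_after_send + b_after_receive)    # 1000
```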
How do we make a consistent cut? Use logical time: an inconsistency arises only when a message between two processes is in flight and the source and destination have not had a chance to synchronize their logical clocks. On a receive, the logical time is updated to the maximum of the local clock and the timestamp carried by the message.
One solution, then, is to record the snapshot at the same logical time on every process.
Another solution for taking consistent snapshots is to send out markers through the channels to flush the messages. The initiator records its local state and sends a marker along each of its outgoing channels. Similarly, when a process first receives a marker, it records its local state, records the incoming channel used by the marker as empty, and then sends markers along each of its outgoing channels. When a process receives a subsequent marker, it records the state of that incoming channel as complete. The snapshot is complete when all the processes have received markers along all their incoming channels.
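Here is a condensed sketch of what a single process does under this scheme. The class, callback, and message names are illustrative assumptions, and the message transport is left out:

```python
class SnapshotProcess:
    def __init__(self, incoming, outgoing, send):
        self.incoming = set(incoming)     # ids of incoming channels
        self.outgoing = list(outgoing)    # ids of outgoing channels
        self.send = send                  # send(channel_id, message) callback
        self.recorded_state = None
        self.channel_done = set()         # incoming channels whose marker arrived

    def initiate(self, local_state):
        # The initiator records its state and sends markers on all outgoing channels.
        self.recorded_state = local_state
        for out in self.outgoing:
            self.send(out, "MARKER")

    def on_marker(self, channel, local_state):
        if self.recorded_state is None:
            # First marker: record the local state, record this channel as empty,
            # then forward markers along every outgoing channel.
            self.recorded_state = local_state
            for out in self.outgoing:
                self.send(out, "MARKER")
        # First or subsequent marker: this incoming channel is now recorded.
        self.channel_done.add(channel)

    def snapshot_complete(self):
        # Complete once markers have arrived on all incoming channels.
        return self.recorded_state is not None and self.channel_done == self.incoming
```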

Time, Clocks and Synchronization
In a distributed world, the concept of time is very important for the progress of operations. Time is available on a local processor because the processor relies on a clock or counter. In the distributed world, there is no shared clock, yet a shared notion of time is critical. Instead of maintaining a shared central counter for the time, let us look at how this time is used.
The processors want to know about time because they are looking to order their events. Time is used to place events such that some events "happen before" others. In a distributed system, there are three ways in which one event A can affect another event B:
1) A and B are on the same processor and A occurs earlier in the computation than B
2) A and B are on different processors and A sends a message to B
3) A can affect a third event C and C can affect B
Let us look at how to represent these events. We know that events occur on different processors at different times. So we can structure a diagram where all events that occur on a processor P are arranged on a straight line, with parallel lines for the different processors. The events are represented by the vertices on these lines and the "happens-before" relation is represented by directed edges between the vertices. Time proceeds to the right, so the edges are directed up or down or towards the right. A horizontal axis represents the "true time".
An abstraction of a monotonic counter is used for a clock. The value read from this counter is the timestamp. But there is no central clock; each processor maintains its own.
Timestamps are issued such that each timestamp is greater than those of all the events that happened before.
There are three cases:
Local event: the clock is merely incremented
Send event: the clock is incremented and the updated value is the timestamp for the event and the message
Receive event: the clock is updated to the maximum of the current clock and the timestamp of the message.
The invariant here is that the clock shows the most recently assigned timestamp. This is the concept of the logical clock.
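A small sketch of such a logical clock, following the three cases above. The class and method names are my own; I also increment after taking the maximum on a receive, a common variant that keeps the receive event's timestamp strictly later than the message it receives:

```python
class LogicalClock:
    def __init__(self):
        self.time = 0                     # monotonic counter local to this processor

    def local_event(self):
        self.time += 1                    # local event: just increment
        return self.time

    def send_event(self):
        self.time += 1                    # increment; the new value timestamps
        return self.time                  # both the event and the outgoing message

    def receive_event(self, message_timestamp):
        # Take the max of the local clock and the message's timestamp, then
        # increment so the receive event is later than both (an assumption
        # beyond the plain "max" rule stated above).
        self.time = max(self.time, message_timestamp) + 1
        return self.time
```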

Tuesday, October 8, 2013

In distributed computing, there is a very interesting technique called gossip. We will describe it in this post. By the way, my next several posts are going to be on distributed computing, just like the one this Sunday, because I'm reading the book I cited earlier. Gossip refers to the mechanism of diffusing computations from one process to all the other processes. The computations are forwarded as messages, and although the mechanism may not be obvious, the goal is that all the processes will have received the computations.
First let us consider the topology of processes as an undirected graph, such that processes are able to communicate with their neighbors. By undirected I mean neighbors can send messages to each other in either direction. We will shortly see why the ordering of messages is not important, but if we try to focus on the exact order in which the messages will be sent, there are so many possibilities that we can easily lose ourselves. Instead, let us focus on the tenets of this algorithm.
The safety of the algorithm is a final invariant: each process will have received the computations.
The progress of the algorithm is satisfied by some forwarding of messages from processes to their neighbors such that the computations spread to everyone. When a process forwards a message, it informs the others, and it wants to be informed that the others received theirs. So each process forwards the message on to its immediate neighbors but sends an acknowledgement only to its parent. There seem to be two different kinds of messages, then: the computations and the acknowledgements. For the sake of efficiency, we can eliminate the explicit acknowledgements and say that the first copy a process receives carries the computations, and every later copy acts as an acknowledgement now that the process has received its own. When the sender gets back the computation from the receiver, that same message acts as an acknowledgement. However, there is a possibility of confusing computations and acknowledgements due to race conditions. Assume that X and Y send computations to each other. X may not know whether Y was sending the computation to it before X sent, or whether Y was sending an acknowledgement after it received.
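Here is a compact, single-machine sketch of the spreading step on an undirected graph, with illustrative names. The real setting is asynchronous message passing between processes, which is exactly why the race just described can occur:

```python
def gossip(graph, initiator):
    """graph maps each node to the set of its neighbors (undirected).
    Returns the parent of each node in the tree along which the
    computations spread; parents are the targets of the implicit acks."""
    parent = {initiator: None}
    queue = [initiator]
    while queue:
        node = queue.pop(0)
        for neighbor in graph[node]:
            if neighbor not in parent:
                parent[neighbor] = node   # first copy received: the computation
                queue.append(neighbor)
            # any later copy from a neighbor acts as an acknowledgement
    return parent

# Example: every process ends up with the computations, whatever the topology.
print(gossip({"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}, "a"))
```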
Therefore convincing ourselves that the algorithm works requires some rigor. And there are great applications to this technique such as barrier synchronization.
And this is how we present the proof of correctness. A process can only be in one of three states: idle, active, and complete. A process starts out in the idle state. Once it hears the gossip and passes it to its children, it becomes active. Once it receives the acknowledgements and sends its own acknowledgement to its parent, it becomes complete.
We divide the graph into two directed graphs: the first consists of the nodes that are active or complete, and the second consists of the nodes that are active. Both include the initiator and the edges that connect their nodes. The second starts out as a proper subset of the first. We notice that both directed graphs are trees, where the first tree only grows and the second tree grows and shrinks. We determine that the second tree shrinks only because a node moves into the first tree as complete.
In the previous post we had talked about cshtml and aspx pages. Both may require database access, and this can be done with the same enterprise data block mentioned earlier.