Saturday, October 19, 2013

In this post, we will talk about fairness, both weak and strong. Every program has a skip action, and every program has multiple actions. In an informal model, we reach into a hat and pick an action. If we are very unlucky, we pick the skip action over and over again; none of the others are ever chosen, and the program does nothing. To prevent this kind of selection, we impose a fairness requirement on the selection of actions. There are two kinds of fairness - weak and strong.
Under weak fairness, every action is guaranteed to be selected infinitely often. Said another way, between any two selections of the same action, there are a finite number of selections of other actions. There is, however, no guarantee of how many actions may be selected before all actions have been selected at least once.
Let us take an example where the program maintains its state in a boolean variable b and a counter n. One action increments the counter modulo four whenever b is true; another action assigns false to b whenever n is zero.
A directed graph can be drawn representing the program, with the vertices as states of the program and the directed edges representing the actions. Weak fairness then says that each edge label is guaranteed to repeat infinitely often in an infinite path.
If the program begins execution in the state <false, 1>, then every possible computation leaves the state unchanged. If, on the other hand, the initial state is <true, 1>, then there is no guarantee the state reaches the fixed point <false, 1>; it can cycle repeatedly through <true, 1>, <true, 2>, <true, 3> and <true, 0>. The action that assigns false to b must be selected an infinite number of times, but it may be selected only when n = 3, where it has no effect. In such a computation every action is selected infinitely often, so the weak fairness requirement is satisfied.
Under strong fairness, in addition to the weak fairness requirement, any action that is enabled infinitely often is selected infinitely often. Note that in the previous example, the selection was extraordinarily unlucky: one action was only chosen at particular times when it happened to be disabled. Under strong fairness, the cycle would not be allowed to repeat forever without the action that assigns false to b being selected while enabled. The program could then be guaranteed to reach the fixed point and terminate.
In general, if we prove a property of a program under weak fairness, then that property holds regardless of whether the actual execution is weakly or strongly fair. However, if a property relies on strong fairness, that property may or may not hold under weak fairness. Where possible, a program should therefore assume only weak fairness.
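To make the example concrete, here is a small Python sketch of one reading of the program described above, together with an adversarial weak-fair schedule. The function names and the fixed schedule are illustrative, not from the post.

```python
# A boolean b and a counter n form the state <b, n>.

def increment(b, n):
    # If b is true, advance the counter modulo four; otherwise skip.
    return (b, (n + 1) % 4) if b else (b, n)

def reset(b, n):
    # If the counter is zero, assign false to b; otherwise skip.
    return (False, n) if n == 0 else (b, n)

# Adversarial schedule: reset is selected infinitely often, but only
# when n == 3, where it is disabled and acts as a skip. Weak fairness
# is satisfied, yet the fixed point <false, 1> is never reached.
schedule = [increment, increment, reset, increment, increment]

state = (True, 1)
states = []
for i in range(10):
    state = schedule[i % len(schedule)](*state)
    states.append(state)
# b stays True forever; the state cycles through <true,1>..<true,0>.
```

Running this, the trace revisits <true, 1> every five steps, illustrating why weak fairness alone cannot guarantee termination here.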


Friday, October 18, 2013

In today's post we will talk about one more distributed computing topic before we move on to the next subject: time synchronization of physical clocks. We mentioned that virtual clocks and logical time can capture a partial order of events in a distributed system. In this post, however, we will talk about real clocks. Such a clock advances at a fixed rate, e.g. one tick per millisecond. Each process may have its own clock.
The issue here is that the current time displayed by a clock is not shared state. Moreover, a watch that ticks accurately is not enough: it cannot by itself be used to tell time, though it can be used to measure the duration of some interval. The clocks need to be synchronized to tell time, and even then they drift over time, requiring synchronization again.
Consider a process receiving a time notification from another process. The receiving process has no way of knowing the delay in transmission. One solution involves sending a message first and then waiting for an echo. The requesting process then knows the total time elapsed.
Since the process knows the timestamp on the echo, it can split the elapsed round-trip time to estimate the delay in receiving the echo.
Sending a request and receiving an echo can be repeated in the hope of improving accuracy. If the delays are the same, there is no improvement. If the delays vary, then the repetitions narrow the possible interval, improving the accuracy of the clock.
As an example, let's say the timestamp on the first echo is 3:00 and that on the second echo is 3:30. If the elapsed time for the first round trip was ten minutes, and the elapsed time for the second was twelve minutes, we have two intervals for the current time at the other process. The intersection of these two intervals is narrower than either, thus improving the accuracy in predicting the time at the other process.
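One way to make this concrete is to track the offset between the two clocks as an interval. If we send a request at local time t0, receive the echo at local time t1, and the echo carries remote timestamp ts, then the remote-minus-local offset lies in [ts - t1, ts - t0]. A sketch, with made-up probe values rather than the figures from the post:

```python
def offset_interval(t0, t1, ts):
    # Request sent at local t0, echo received at local t1 carrying the
    # remote timestamp ts: the remote-minus-local offset lies here.
    return (ts - t1, ts - t0)

def intersect(a, b):
    # Intersect two intervals; None if they do not overlap.
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None

# Two probes with different delays (times in minutes, made up):
first = offset_interval(0, 10, 5)     # width 10
second = offset_interval(20, 30, 27)  # width 10
combined = intersect(first, second)   # narrower than either probe
```

Each probe alone bounds the offset to a 10-minute window; their intersection narrows it to 8 minutes.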
When this is repeated with several servers, and each server yields its own interval, the intersection of all these intervals will be even narrower, further increasing the accuracy of the time.
It might happen that the intersection of the intervals comes out empty.
This means that one or more of the servers has a wider drift. So we take each interval as a vote and go with the majority.
One way to do such a tally is to see how many intervals contain a given time. Sweeping from left to right, we increment a counter whenever we enter an interval and decrement it whenever we leave one, updating the minimum and maximum bounds on each entry and exit. At the maximum count of overlapping intervals, we know the intersection that the majority agree on.
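This sweep is, in essence, Marzullo's algorithm for interval intersection. A minimal sketch (the event encoding is one common way to implement it):

```python
def marzullo(intervals):
    # Encode each interval as two events: (point, -1) on entry and
    # (point, +1) on exit; sorting puts entries before exits at ties.
    events = []
    for lo, hi in intervals:
        events.append((lo, -1))
        events.append((hi, +1))
    events.sort()

    best = count = 0
    best_lo = best_hi = None
    for i, (point, typ) in enumerate(events):
        if typ == -1:
            count += 1
            if count > best:
                best = count
                best_lo = point
                # the best interval closes at the next event point
                best_hi = events[i + 1][0]
        else:
            count -= 1
    return best, (best_lo, best_hi)
```

For example, with intervals (8, 12), (11, 13) and (14, 15), two of the three servers agree on the window from 11 to 12, so the majority estimate is that interval.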

Thursday, October 17, 2013

In some of the previous posts we discussed distributed computing. My next book reading is going to be on search.
Meanwhile here is a quick summary of things that we discussed.
1) We talked about reasoning about programs and how to give proofs of correctness for a program.
2) We talked about some sample programs, such as Earliest Meeting Time and the Greatest Common Divisor of two integers X and Y.
3) We talked about time, clocks, and synchronization. In particular, we discussed what logical time is and how to maintain a logical clock. There are vector clocks that keep track of time at all other processes. We talked about synchronizing clocks.
4) We talked about diffusing computations, or gossip, and how we know that they progress and terminate. We mentioned interesting applications of gossip.
5) We discussed mutual exclusion as a layer that resolves conflicts between processes.
6) We discussed solutions for mutual exclusion that relied on distributed atomic variables, and non-token based solutions including Lamport's algorithm of sending acknowledgements and its optimizations.
7) We discussed token based solutions, including a simple token ring, a token ring with requests, and a token tree.
8) The token tree approach let us generalize to a token graph solution. The key idea was that tokens are neither created nor destroyed, which guarantees safety.
9) The tree is directed and the token is always at the root. A process sends only one request and may maintain a list of pending requests. The tree is generalized to a partial order. To maintain the partial order, make all the edges incoming for the node with the token.
10) We discussed the dining philosophers problem and the partial order solution, aka the hygienic solution. The key idea was to break the symmetry by forming a partial order. To lower a philosopher in the partial order, make all its edges outgoing. Tokens called forks are used for mutual exclusion. Priority is encoded in clean and dirty forks. Request tokens are used to determine whether a neighbor is hungry.
11) We discussed snapshots and how to cut through the timelines of various processes. We discussed solutions for taking a snapshot, including a logical-time based solution and the marker algorithm, and included a proof of correctness. The marker algorithm was meant to flush the channels of messages.
12) We discussed termination detection and the use of a special detector process. We applied snapshots to help with the detection.
13) We discussed garbage collection and its relationship to termination detection. We discussed the principle of superposition and the use of a marker with the mutator. We made first and second attempts with a propagator to spread the marks in a gossip-like manner and check to see which garbage is not reachable from a root, i.e. it's manure.
14) We talked about tolerating faults in a distributed system, both byzantine failures and crashes. We talked about the use of authenticated messages and building a consensus tree.
15) We also talked about discrete event simulation with sequential and time-driven simulation. We talked about conservative approaches using null events and lookaheads. We also talked about optimistic approaches where we allow mistakes that can be undone, i.e. allow stragglers and do rollbacks.

Wednesday, October 16, 2013

In addition to the previous post, we can consider some optimizations. When a request enters the queue, it is acknowledged immediately. However, one optimization is that not all acknowledgements are needed. For example, if a request has already been sent with a later timestamp, then that request itself acts as an acknowledgement. The timestamp in the packet guarantees a known time for this process that is greater than the request time.
Another optimization, from Ricart-Agrawala, eliminates acknowledgements for even more requests. If we have already sent a request with an earlier timestamp than a received request, and our request is still pending, there is no need to send an acknowledgement: when our request is granted and we leave the critical section, we will send a release message, and that will serve as the acknowledgement. So when a process Pi receives <req, timestamp> from a process Pj, it defers the acknowledgement if Pi is in its critical section or if Pj's request has a later timestamp than Pi's own pending request. When Pi leaves the critical section, it sends out the deferred acknowledgements.
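The deferral decision can be sketched as a small predicate. This is a sketch only; the pid tie-break for equal timestamps is a standard detail assumed here, not stated in the post.

```python
def should_defer(in_cs, my_req, their_req):
    # my_req and their_req are (timestamp, pid) pairs; my_req is None
    # when we have no pending request of our own.
    if in_cs:
        return True            # defer everything while in the critical section
    if my_req is None:
        return False           # nothing pending: acknowledge immediately
    return my_req < their_req  # our earlier request has priority: defer
```

A process collects every deferred requester and replies to all of them in one sweep when it releases the critical section.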
This completes the non-token based approach for mutual exclusion that we wanted to discuss.
Reverting to the topic of program proofs of correctness, here is one more:
Take the program to find the greatest common divisor of two given integers. The program is described with the following:
Let X and Y be two given positive integers. The program computes two numbers x and y such that, at termination, x = y = gcd(X, Y).
Initially the program requires X and Y to be greater than zero and sets x = X and y = Y.
At each step, the program assigns x := x - y if x > y, or y := y - x if y > x. By repeating this, the program converges to the gcd.
The fixed point of the program is where neither action changes the state: from the above, this requires y = 0 whenever x > y and x = 0 whenever y > x. Together with the invariant that x and y stay positive, this means x = y.
The invariant adds the condition that gcd(x, y) = gcd(X, Y), in addition to requiring that both x and y are positive integers.
We can use x + y as the metric, since this value decreases on every effective step and remains bounded below, because a positive number is subtracted from either x or y. The program guarantees the metric will change: if x and y are different, their sum decreases at the next step. Thus the program terminates.
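The program above can be written down directly; a Python sketch, with comments marking the fixed point and the metric:

```python
def gcd_subtract(X, Y):
    # Precondition: X > 0 and Y > 0.
    # Invariant: gcd(x, y) == gcd(X, Y), with x > 0 and y > 0.
    x, y = X, Y
    while x != y:        # fixed point: x == y
        if x > y:
            x = x - y    # metric x + y decreases, bounded below
        else:
            y = y - x
    return x             # x == y == gcd(X, Y)
```

For example, gcd_subtract(12, 18) converges through (12, 6) and (6, 6) to 6.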


Non-token based solutions to mutual exclusion in distributed computing include Lamport's algorithm. Previously, there was a single centralized pending-request queue that allowed processes to enter their critical sections on a first come, first served basis. This queue is just a data structure like any other, and we can support a distributed version of a data structure, e.g. an integer with increment or double operations. We can use the same strategy to distribute the pending-request queue.
Each process keeps a queue of time-stamped requests for the critical section, sorted in ascending order, and a list of known times for all other processes.
To request entry to its critical section, P broadcasts <req, t> to all other processes.
When Pj receives a request to enter the critical section, it returns a timestamped acknowledgement.
To enter the critical section, P must have its req at the head of its reqQ, and the request's timestamp must be smaller than all the known times for the other processes.
To release the critical section, P removes the request from its reqQ and broadcasts a release message to all the other processes.
When a process receives a release message, it removes P's request from its reqQ. Note that this may cause the receiving process to enter its critical section, since its own request may now have the minimum timestamp.
We can prove that this works by contradiction. Suppose two processes Pa and Pb were both in their critical sections; they would have <reqa, ta> and <reqb, tb> respectively at the heads of their sorted reqQs.
Assume ta < tb. Since Pb is in its critical section, tb must be less than the known time of Pa at Pb, so Pb has heard from Pa at a time later than ta. Hence reqa with timestamp ta must be in the reqQ at Pb, and therefore reqb is not at the head of Pb's reqQ - a contradiction.
Thus only one process can be in the critical section at a time, meeting our goal.
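One process's bookkeeping for this algorithm can be sketched as follows. This is a sketch only: message transport, the logical clock, and the broadcasts are assumed to be handled elsewhere, and the pid tie-break on equal timestamps is an assumed detail.

```python
import heapq

class LamportMutex:
    def __init__(self, pid, peers):
        self.pid = pid
        self.reqQ = []                       # heap of (timestamp, pid)
        self.known = {p: 0 for p in peers}   # latest timestamp seen per peer

    def request(self, ts):
        heapq.heappush(self.reqQ, (ts, self.pid))
        # broadcast <req, ts> to all peers (not shown)

    def on_request(self, ts, sender):
        heapq.heappush(self.reqQ, (ts, sender))
        self.known[sender] = max(self.known[sender], ts)
        # reply with a timestamped acknowledgement (not shown)

    def on_ack(self, ts, sender):
        self.known[sender] = max(self.known[sender], ts)

    def can_enter(self):
        # Enter iff our request heads the queue and every peer has been
        # heard from at a time later than our request's timestamp.
        if not self.reqQ or self.reqQ[0][1] != self.pid:
            return False
        ts = self.reqQ[0][0]
        return all(t > ts for t in self.known.values())

    def release(self):
        heapq.heappop(self.reqQ)
        # broadcast a release message to all peers (not shown)

    def on_release(self, sender):
        self.reqQ = [(t, p) for (t, p) in self.reqQ if p != sender]
        heapq.heapify(self.reqQ)
```

The entry test mirrors the two conditions in the post: the request at the head of the sorted queue, and a known time from every other process exceeding its timestamp.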

Tuesday, October 15, 2013

When proving the correctness of a program, it is helpful to do it in steps. First, consider the invariant of the program and a fixed point. If the program reaches the fixed point, it terminates. So if the program terminates, we know both conditions are satisfied - the invariant holds and the fixed point has been reached - and together these should imply the goal.
The second step is to establish that the program indeed terminates. This step involves finding a metric. A metric should be guaranteed to change eventually; otherwise the program does not progress. Further, a metric should be non-increasing and bounded below, or non-decreasing and bounded above, for it to indicate that the fixed point is eventually reached.
Skilled programmers can easily correlate this to recursive functions. The termination condition for the recursion is specified first so that the function can terminate. The metric corresponds to the argument on which the recursion continues: each call moves it closer to the base case, enforcing convergence to the fixed point.
Take the example of the Earliest Meeting Time problem. This problem consists of finding the first time at which three people are all simultaneously available. The three people are named A, B and C. With each person, we associate a function, f, g and h respectively. These functions give the earliest time each is available. For example, f.t is the earliest time at or after time t at which A is available.
f.t = t means that A is available at time t.
The earliest time at which all three can meet is M, the minimum time t at which t = f.t = g.t = h.t.
To calculate M, we maintain a variable r that denotes a candidate time, initialized to zero. At each step we assign r := f.r, or r := g.r, or r := h.r.
The goal is to reach a fixed point, which we define as
r = f.r = g.r = h.r
This fixed point implies that r >= M. The value of r is guaranteed not to decrease because f.t >= t.
We define the invariant as r <= M.
Therefore at termination r = M
The steps above are in the order we stated earlier for discussing the correctness of a program. We already noted the fixed point and the invariant; what remains is the metric, r. We guarantee that r will change whenever r < M: in that case, one of the persons is not available at r, otherwise all would be available and r would be M (proof by contradiction). Therefore, say, f.r > r, so the action r := f.r increases the metric.
Since r increases toward M and is bounded above by M, the program terminates.
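The computation of M can be sketched with three hypothetical availability functions - say A is free at multiples of two hours, B at multiples of three, and C at any hour - so the earliest common time at or after hour 1 is hour 6. These functions and the starting time are made up for illustration.

```python
def next_multiple(k):
    # Earliest time at or after t that is a multiple of k (so f.t >= t).
    return lambda t: t if t % k == 0 else t + (k - t % k)

f, g, h = next_multiple(2), next_multiple(3), next_multiple(1)

def earliest_meeting(f, g, h, r):
    # Repeatedly apply the availability functions until r is a common
    # fixed point: r == f(r) == g(r) == h(r).
    while not (f(r) == g(r) == h(r) == r):
        for fn in (f, g, h):
            r = fn(r)
    return r
```

Starting from r = 1, the candidate climbs 1 -> 2 -> 3 -> 4 -> 6 and stops at the fixed point 6.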
In the previous post we discussed the conservative approach to handling event queues, together with its refinements and extensions. By conservative we meant that we handled only events that are safe - events that we can process knowing that no event with an earlier timestamp will arrive later.
In today's post, we discuss the optimistic approach, which doesn't require that condition to be met. The processes simulate their queues with the hope that events with an earlier timestamp will not arrive. If an event with an earlier timestamp does arrive, the process must undo part of the simulation. This going back in time is called time warp, and an event that arrives with an earlier timestamp is called a straggler. The optimistic algorithms deal with stragglers.
To address a straggler that arrives with a timestamp t, the process finds the last valid state it recorded before time t and rolls back to that state.
However, events may have been sent to other processes after time t. These events must be undone as well, so the process sends a so-called anti-event to those processes. The anti-event cancels the corresponding event.
An anti-event can arrive in two different ways. It could arrive with a timestamp greater than the current local time; in that case, the corresponding event has not been processed yet, so the cancellation simply removes the corresponding event from the queue.
The second case is when the anti-event arrives with a timestamp less than the current local time. The anti-event is then itself a straggler. To handle this straggler, more anti-events may need to be fired, leading to a cascade of anti-events. How do we know this does not continue indefinitely? We use the concept of a global virtual time (GVT), the minimum timestamp in the system. There are two advantages now.
The algorithm never rolls back to before GVT.
The event scheduled at time GVT can be executed without causing a rollback.
In the approach discussed so far, we need not roll back all actions. As an optimization, we can skip sending anti-events for events that would be generated regardless. If a process proceeded optimistically on an assumption that later turned out to be false, it may be that the same events would have been scheduled anyway. So lazy cancellation is introduced: after a process rolls back, it resumes the simulation without sending anti-events. An anti-event is sent only when the optimistic run generated an event that the new simulation does not generate.
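A minimal sketch of the straggler handling described above: checkpoints plus rollback, with the state reduced to a single counter for illustration. Anti-event dispatch and re-execution of undone events are omitted, and all names are illustrative.

```python
class TimeWarpProcess:
    def __init__(self):
        self.clock = 0
        self.state = 0
        self.saved = [(0, 0)]        # (time, state) checkpoints

    def handle(self, t, delta):
        if t < self.clock:           # straggler: roll back first
            self.rollback(t)
        self.clock = t
        self.state += delta
        self.saved.append((t, self.state))

    def rollback(self, t):
        # Restore the last state saved before time t. Anti-events for
        # outputs sent after t would be dispatched here (not shown),
        # and the undone events would normally be re-executed.
        while self.saved and self.saved[-1][0] >= t:
            self.saved.pop()
        self.clock, self.state = self.saved[-1]
```

Because GVT is a lower bound on every timestamp still in flight, checkpoints older than GVT can safely be discarded - the algorithm never rolls back past it.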