Cluster computing

We briefly summarize the discussion of sequential consistency from the WRL report on shared memory consistency models. There are two requirements. First, a processor must ensure that its previous memory operation is complete before proceeding with its next memory operation in program order. This is the "program order" requirement. To determine that an operation is complete, acknowledgement messages are exchanged, and in a cache-based system a write must also generate invalidate or update messages. The second requirement pertains only to cache-based systems and concerns write atomicity. It requires that writes to the same memory location be serialized, that is, made visible to all processors in the same order, and that the value of a write not be returned by a read until all the invalidate or update messages generated by the write have been acknowledged. This is the "write atomicity" requirement.
Compiler optimizations such as register allocation can violate sequential consistency and are therefore unsafe. Optimizations that do not violate sequential consistency include the following two techniques. The first prefetches ownership for any write operations that are delayed by the program order requirement, thus partially overlapping them with the operations that precede them in program order. This technique is only applicable to cache-based systems that use an invalidation-based protocol. The second speculatively services read operations that are delayed by the program order requirement. Sequential consistency is preserved in this case by rolling back and reissuing the read and subsequent operations if the line that was read gets invalidated. This technique is generally applicable to dynamically scheduled processors because they already implement rollback. Other latency-hiding techniques, such as non-binding software prefetching or hardware support for multiple contexts, also improve performance, but the two techniques above are widely applicable. Next we review more relaxed memory models.
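Before moving on, the program order requirement can be made concrete with a small sketch. It is only a toy model, not something taken from the report: the two-processor flag program, the tuple encoding of operations, and the helper names `interleavings` and `run` are all assumptions made for illustration. The sketch enumerates every interleaving that keeps each processor's operations in program order and runs it against a single shared memory, which is one way to picture sequential consistency.

```python
from itertools import combinations

# Hypothetical program: each processor sets its own flag, then reads the other's.
P1 = [("write", "flag1", 1), ("read", "flag2", "r1")]
P2 = [("write", "flag2", 1), ("read", "flag1", "r2")]

def interleavings(a, b):
    """Yield every merge of a and b that keeps each list in program order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        ai, bi = iter(a), iter(b)
        yield [next(ai) if i in slots else next(bi) for i in range(n)]

def run(schedule):
    """Execute one interleaving against a single memory, one operation at a time."""
    memory = {"flag1": 0, "flag2": 0}
    regs = {}
    for op, loc, arg in schedule:
        if op == "write":
            memory[loc] = arg          # the write completes before the next operation
        else:
            regs[arg] = memory[loc]    # the read sees the latest completed write
    return regs["r1"], regs["r2"]

# Every (r1, r2) pair that some sequentially consistent execution can produce.
outcomes = {run(s) for s in interleavings(P1, P2)}
```

In this model `outcomes` is `{(0, 1), (1, 0), (1, 1)}`. The pair `(0, 0)`, where both processors miss each other's flag, never appears, and that is the outcome the relaxed models can expose.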
#codingexercise
Decimal GetAlternateNumberRangeMin(Decimal[] A)
{
    // AlternateNumberRangeMin() is assumed to be an extension method defined elsewhere.
    if (A == null) return 0;
    return A.AlternateNumberRangeMin();
}
Relaxed memory consistency models are characterized by two things: how they relax the program order requirement and how they relax the write atomicity requirement. For program order, we can further distinguish models by whether they relax the order from a write to a following read, between two writes, or from a read to a following read or write. In all cases, the relaxation applies only to pairs of operations on different addresses. These relaxations resemble the optimizations for architectures without caches that we discussed earlier.
With respect to the write atomicity requirement, we distinguish models by whether they allow a read to return the value of another processor's write before all cached copies of the accessed location have received the invalidation or update messages generated by the write. This relaxation applies only to cache-based systems.
Finally, we consider a relaxation related to both program order and write atomicity, in which a processor is allowed to read the value of its own previous write before the write is made visible to other processors. In a cache-based system, this relaxation allows the read to return the value of the write before the write is serialized with respect to other writes to the same location and before the invalidations or updates of the write reach any other processor.

Relaxed models typically provide programmers with mechanisms for overriding such relaxations. For example, fence instructions may be provided to override program order relaxations. Such override mechanisms are often referred to as safety nets.

The different relaxations can now be categorized as:
Relax write to read program order
Relax write to write program order
Relax read to read and read to write program order
Read others' write early
Read own write early

These relaxations are usually enabled by straightforward implementations of the corresponding model. We will review the above relaxations in detail. We implicitly assume that the following constraints are satisfied. First, all models require a write to eventually be made visible to all processors and writes to the same location to be serialized. This holds trivially if shared data is not cached, and is enforced by the cache coherence protocol when it is. Second, all models enforce uniprocessor data and control dependences. Finally, models that relax the program order from reads to following writes must also maintain a subtle form of multiprocessor data and control dependences.

We now look at the first relaxation: relaxing the write to read program order.
Here the optimization enabled is that a read may be reordered with respect to previous writes from the same processor. As a consequence of this reordering, programs such as the shared-bus write buffer example, in which two processors each set their own flag and then check the other's, are no longer sequentially consistent. The safety net here is to provide serialization instructions that may be placed between two operations. Serialization instructions are either special memory instructions used for synchronization (e.g. compare-and-swap) or non-memory instructions such as a branch. Placing a serialization instruction after the write on each processor yields sequentially consistent results for the program even with the optimization enabled. However, this is not the only technique used by these models. A read-modify-write operation can also provide the illusion that program order is maintained between a write and a following read. Usually, making the read part of a read-modify-write instruction, or replacing it with one, is enough to restore sequential consistency. Doing the same for the write works only in models that require that no other writes to any location appear to occur between the read and the write of the read-modify-write; in models without this requirement it is not sufficient.
Next, to enforce write atomicity in models that relax it, making the read part of a read-modify-write, or replacing it with one, also suffices, as long as the technique is not used indiscriminately. There is, however, some cost in performing the write when a read is replaced with a read-modify-write, unless the read is already part of one. Moreover, this relaxation alone is of limited use to compilers, which require the full flexibility of reordering both a write followed by a read and a read followed by a write.
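The flag program and its safety net can be sketched with a toy write buffer model. This is a simplified illustration under stated assumptions, not a description of any particular machine: the `Processor` class, its `serialize` method, and the fixed schedule in `flag_test` are hypothetical names chosen for this sketch. Each processor buffers its writes, lets reads bypass the buffer, and forwards its own buffered value to its own reads.

```python
class Processor:
    """Toy processor with a write buffer in front of shared memory (assumed model)."""

    def __init__(self, memory):
        self.memory = memory
        self.buffer = []                      # pending (location, value) writes, oldest first

    def write(self, loc, value):
        self.buffer.append((loc, value))      # buffered, not yet visible to other processors

    def read(self, loc):
        # Read own write early: forward the newest buffered value for this location.
        for buffered_loc, value in reversed(self.buffer):
            if buffered_loc == loc:
                return value
        return self.memory[loc]               # otherwise the read bypasses the buffer

    def serialize(self):
        # Safety net: drain every pending write before any later operation runs.
        for loc, value in self.buffer:
            self.memory[loc] = value
        self.buffer.clear()

def flag_test(use_serialization):
    """Run one fixed schedule of the flag program and return (r1, r2)."""
    memory = {"flag1": 0, "flag2": 0}
    p1, p2 = Processor(memory), Processor(memory)
    p1.write("flag1", 1)
    p2.write("flag2", 1)
    if use_serialization:
        p1.serialize()
        p2.serialize()
    r1 = p1.read("flag2")
    r2 = p2.read("flag1")
    return r1, r2
```

In this model `flag_test(False)` returns `(0, 0)`, the outcome that sequential consistency forbids, while `flag_test(True)` returns `(1, 1)` because each write is drained to memory before the following read.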

We now look at relaxing the write to read and write to write program orders.
This set of models further relaxes the program order requirement by eliminating ordering constraints between writes to different locations. Writes to different locations from the same processor can be pipelined or overlapped and are allowed to reach memory or other cached copies out of program order. With respect to atomicity requirements, this model allows a processor to read the value of its own write early but prohibits a processor from reading the value of another processor's write before the write is visible to all processors.
The safety net is the same as before, except that the serializing instruction is placed between writes. In this case, an STBAR instruction imposes program order between two writes. With FIFO write buffers, the STBAR is inserted into the write buffer, and the retiring of writes buffered after the STBAR is delayed until the writes buffered before it have retired and completed.
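The effect of STBAR on a write buffer can be sketched in the same toy style. The model below is an assumption for illustration only: it treats "retire" as "reach memory", represents the barrier as a marker entry in the buffer, and uses the hypothetical helpers `retirable` and `memory_orders` to list the orders in which buffered writes may become visible.

```python
STBAR = "STBAR"   # marker entry; every other buffer entry is a (location, value) write

def retirable(buffer):
    """Return the pending writes that may retire next, in any order."""
    ready = []
    for entry in buffer:
        if entry == STBAR:
            if ready:
                break         # writes behind this barrier must wait for those ahead of it
            continue          # no pending writes ahead of it, so the barrier is satisfied
        ready.append(entry)
    return ready

def memory_orders(buffer):
    """Enumerate every order in which the buffered writes can reach memory."""
    pending = retirable(buffer)
    if not pending:
        return [[]]
    orders = []
    for write in pending:
        rest = list(buffer)
        rest.remove(write)    # this write retires; the others stay buffered
        orders += [[write] + tail for tail in memory_orders(rest)]
    return orders

# Hypothetical producer: write the data, then set a flag that announces it.
no_barrier = [("data", 42), ("flag", 1)]
with_barrier = [("data", 42), STBAR, ("flag", 1)]
```

Without the barrier, `memory_orders(no_barrier)` contains both orders, including the one where the flag reaches memory before the data. With the barrier, the only order left is data first and then flag.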


Monday, December 22, 2014
