We continue our discussion of the WRL report on shared memory consistency models. Earlier we looked at implementing sequential consistency in architectures without caches, where we specifically reviewed write buffers with bypassing capability, overlapping write operations, and non-blocking read operations. We now study architectures with caches. Caching can also introduce reordering behavior, so an implementation with caches must hide it as well. For example, if a processor issues a read to its cache, the read cannot return a value until the processor's previous operations are complete. The replication of shared data introduces three additional issues. First, the presence of multiple copies requires a cache coherence protocol to propagate newly written values to all cached copies of the modified location. Second, detecting when a write is complete requires more transactions in the presence of replication. Third, propagating changes to multiple copies is inherently non-atomic. We review these issues in this order.
Cache coherence has often been taken to imply two conditions: that a write is eventually made visible to all processors, and that writes to the same location appear to be seen in the same order by all processors. These conditions are not sufficient to enforce sequential consistency, which requires that writes to all locations, not just to the same location, be made visible to all processors, and that the operations of a single processor appear in program order. To propagate a newly written value, we either invalidate cached copies or update them to the newly written value. The sequential consistency requirements can be translated into an early and a late bound on when a new value may be propagated to any given processor: the early bounds should be such that a sequential order is maintained between the early and late bounds of successive operations, and the late bounds should be such that program order is maintained.
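As a rough illustration of the two propagation choices, here is a minimal C# sketch; the Directory and Cache classes and all their member names are illustrative assumptions, not the report's design:
using System.Collections.Generic;

// A minimal sketch: on a write, a directory either invalidates every
// other cached copy of the location or updates it with the new value.
class Cache
{
    readonly Dictionary<string, int> lines = new Dictionary<string, int>();
    public void Invalidate(string location) { lines.Remove(location); }
    public void Update(string location, int value) { lines[location] = value; }
}

class Directory
{
    enum Policy { Invalidate, Update }

    readonly Policy policy = Policy.Invalidate;
    readonly Dictionary<string, List<Cache>> sharers = new Dictionary<string, List<Cache>>();

    public void Write(string location, int value, Cache writer)
    {
        if (!sharers.TryGetValue(location, out var copies)) return;
        foreach (var cache in copies)
        {
            if (cache == writer) continue;
            if (policy == Policy.Invalidate)
                cache.Invalidate(location);     // drop the stale copy
            else
                cache.Update(location, value);  // push the new value
            // An acknowledgement from each copy marks the point after
            // which the write can be considered complete.
        }
    }
}
An invalidation forces the next read at that processor to miss and fetch the new value, while an update pushes the new value directly into the cached copy.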
To detect the completion of write operations, the design without caches used acknowledgement messages, and there an acknowledgement could be generated as soon as the value reached the target memory module. With caches, we have to wait until every cached copy has been invalidated or updated. For example, with overlapped writes in a general interconnect system, suppose processor P2 holds Data in its cache. P1 proceeds with the write to Head after its previous write to Data has reached the target memory module but before that write has propagated to P2. P2 can then read the new value of Head and yet return an old value of Data. This is avoided by updating (or invalidating) P2's cached copy of Data before the write to Head is performed.
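As a concrete illustration, here is a minimal C# sketch of the two-processor program above; the thread scaffolding and the value 2000 written to Data are assumptions about how the fragment would be written out in full:
using System;
using System.Threading;

class WriteCompletionExample
{
    static int Data = 0;
    static int Head = 0;

    static void P1()
    {
        Data = 2000;   // this write must reach P2's cached copy ...
        Head = 1;      // ... before this one is made visible
    }

    static void P2()
    {
        while (Head == 0) { }      // wait for the flag
        Console.WriteLine(Data);   // can print the stale 0 if the update
                                   // to Data has not yet reached this cache
    }

    static void Main()
    {
        var t1 = new Thread(P1);
        var t2 = new Thread(P2);
        t2.Start();
        t1.Start();
        t1.Join();
        t2.Join();
    }
}
On a sequentially consistent implementation P2 always prints 2000; the stale read of Data is exactly what becomes possible when the write to Head is allowed to become visible before the write to Data has completed at P2's cache.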
Therefore invalidation and update messages need to be exchanged in addition to acknowledgement messages.
To maintain the illusion of atomicity for writes, we first look at the problems caused by non-atomicity. Consider an update-based protocol with four processors, two of which write values to shared variables while the other two read those variables; initially all variables are set to zero. It is then possible to violate sequential consistency if the writers' updates reach the two reading processors in arbitrary orders, making the write of A appear non-atomic. This can be avoided either by requiring that writes to the same location be serialized or by having the network preserve the ordering of messages.
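A minimal C# sketch of the four-processor program follows; the variable names A and B and the exact read order in each reader are assumptions based on the description above:
using System;
using System.Threading;

class AtomicityExample
{
    static int A = 0, B = 0;                 // all variables start at zero
    static int r1, r2, r3, r4;

    static void P1() { A = 1; }              // writer
    static void P2() { B = 1; }              // writer
    static void P3() { r1 = A; r2 = B; }     // reader
    static void P4() { r3 = B; r4 = A; }     // reader

    static void Main()
    {
        var threads = new[] { new Thread(P1), new Thread(P2),
                              new Thread(P3), new Thread(P4) };
        foreach (var t in threads) t.Start();
        foreach (var t in threads) t.Join();
        // Under sequential consistency the combined outcome
        // r1 == 1, r2 == 0, r3 == 1, r4 == 0 is impossible. An update
        // protocol whose messages reach P3 and P4 at different times
        // can produce it, making the write of A appear non-atomic.
        Console.WriteLine($"r1={r1} r2={r2} r3={r3} r4={r4}");
    }
}
P3 observing A's write but not B's while P4 observes B's write but not A's means the two readers saw the two writes in different orders, which no interleaving of a sequentially consistent execution can produce.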
#codingexercise
Decimal GetAlternateNumberRangeMax(Decimal[] A)
{
    if (A == null) return 0;
    // AlternateNumberRangeMax is assumed to be the extension method
    // defined in the earlier exercise in this series.
    return A.AlternateNumberRangeMax();
}
We continue our discussion with considerations for compilers. A key requirement for the compiler is to preserve program order among shared memory operations in its optimizations. This requirement directly restricts any uniprocessor compiler optimization that can result in reordering memory operations. Beyond reordering, optimizations such as register allocation also lead to the elimination of certain shared memory operations, which can likewise violate sequential consistency. Take the example described above: if Head is register-allocated in P2, the loop may never observe P1's update to Head, whereas a sequentially consistent execution clearly avoids this. To summarize, some common compiler optimizations can violate sequential consistency; fortunately, compilers for parallel programs avoid such unsafe optimizations.
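To make the register-allocation hazard concrete, here is a hedged C# sketch of the transformation applied to P2's loop; the method names and the before/after shapes are illustrative assumptions, not a claim about any particular compiler:
class RegisterAllocationExample
{
    static int Head = 0;
    static int Data = 0;

    // As written: Head is re-read on every iteration of the loop,
    // so P1's write to Head is eventually observed.
    static int P2AsWritten()
    {
        while (Head == 0) { }
        return Data;
    }

    // After register allocation of Head, a legal uniprocessor
    // transformation: Head is loaded once, so if it was 0 at that
    // load the loop spins forever and never observes P1's write.
    static int P2AfterRegisterAllocation()
    {
        int head = Head;        // single load into a "register"
        while (head == 0) { }
        return Data;
    }
}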