Friday, January 2, 2015

Today we will continue our discussion of the WRL research report on the Shasta shared memory protocol. We introduced the system in the last post, and in this post we will look at some of the checks that Shasta instruments. First let us look at a basic shared miss check. This code first checks whether the target address is in the shared memory range and, if not, skips the remainder of the check. Otherwise, the code calculates the address of the state table entry corresponding to the target address and checks that the line containing the target address is in the exclusive state. Even though Shasta optimizes these checks, the cost of the miss checks is still high, so some more advanced optimizations are used to reduce the overhead further. These are described next.
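To make the structure of the basic check concrete, here is a minimal C sketch. The report actually instruments compiled Alpha code, so the names used here (shared_base, shared_end, state_table, shasta_miss_handler, a 64-byte line size) are my own assumptions for illustration, not the report's.

```c
#include <stdint.h>

#define LINE_SHIFT      6   /* assumed 64-byte coherence line; Shasta line sizes are configurable */
#define STATE_INVALID   0
#define STATE_SHARED    1
#define STATE_EXCLUSIVE 2

extern uintptr_t shared_base, shared_end;   /* assumed bounds of the shared address range */
extern uint8_t   state_table[];             /* assumed one state byte per line */
extern void shasta_miss_handler(void *addr, int is_store);  /* assumed protocol miss routine */

static inline void store_miss_check(void *addr)
{
    uintptr_t a = (uintptr_t)addr;

    /* Range check: addresses outside shared memory skip the rest of the check. */
    if (a < shared_base || a >= shared_end)
        return;

    /* State table lookup for the line containing the target address. */
    uint8_t state = state_table[(a - shared_base) >> LINE_SHIFT];

    /* A store may proceed only if the line is held in exclusive state. */
    if (state != STATE_EXCLUSIVE)
        shasta_miss_handler(addr, 1);
}
```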
Invalid Flag technique: Whenever a line on a processor becomes invalid, the Shasta protocol stores a particular flag value in each longword of the line. The miss check code for a load can then just compare the value loaded with the flag value. If the loaded value is not equal to the flag value, the data must be valid and the application code can continue immediately. If the loaded value is equal to the flag value, a miss routine is called that first does the normal range check and state table lookup. The state check distinguishes an actual miss from a "false miss" and simply returns to the application code in the case of a false miss. A false miss is one where the application data actually happens to contain the flag value. Another advantage of this technique is that the load of the state table entry is eliminated from the common case.

The second optimization is batching of miss checks. When the checks for multiple loads and stores are batched together, the overhead is reduced significantly. If the base register is the same and the offsets are all less than or equal to the Shasta line size, then a sequence of these loads and stores collectively touches at most two consecutive lines in memory. Therefore, if inline checks verify that those two lines are in the correct state, all the loads and stores can proceed without further checks. It is convenient to check both lines by simply checking the beginning and ending addresses of the range. This batching technique also applies to loads and stores via multiple base registers, as long as the range of addresses touched can be determined. Sketches of both techniques follow below.
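Here is a rough C sketch of the invalid-flag load check, under the same assumed names as the earlier sketch and with a hypothetical FLAG_VALUE sentinel of my own choosing:

```c
#define FLAG_VALUE 0xDEADBEEFu   /* assumed sentinel written into every longword of an invalid line */

static inline uint32_t load_with_flag_check(uint32_t *addr)
{
    uint32_t value = *addr;   /* the application load itself */

    /* Fast path: any value other than the flag must be valid data. */
    if (value != FLAG_VALUE)
        return value;

    /* Slow path: distinguish a real miss from a "false miss" where the
     * application data just happens to equal the flag value. The miss
     * routine redoes the normal range check and state table lookup. */
    uintptr_t a = (uintptr_t)addr;
    if (a >= shared_base && a < shared_end &&
        state_table[(a - shared_base) >> LINE_SHIFT] == STATE_INVALID) {
        shasta_miss_handler(addr, 0);   /* fetch the line, then redo the load */
        value = *addr;
    }
    return value;   /* on a false miss the originally loaded value was already correct */
}
```

And a sketch of the batching idea, reusing store_miss_check from the first sketch: when a run of loads and stores shares a base register and the offsets span no more than a line, checking the lowest and highest addresses covers the at most two consecutive lines the run can touch.

```c
static inline void batched_store_check(char *base, long first_off, long last_off)
{
    store_miss_check(base + first_off);   /* line containing the lowest address in the run */
    store_miss_check(base + last_off);    /* line containing the highest address in the run */
    /* All loads and stores within [base+first_off, base+last_off] can now proceed unchecked. */
}
```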
