Cluster computing

Saturday, January 3, 2015

Today we continue the discussion of the performance of the Shasta distributed shared memory (DSM) protocol. We study the optimizations in Shasta that reduce the effect of the long latencies and large message overheads typical of software DSM systems. These optimizations include minimizing extraneous protocol messages, supporting prefetch and home placement directives, supporting coherence and communication at multiple granularities, exploiting a relaxed memory model, batching together requests for multiple misses, and optimizing access to migratory data. We review these one by one.

The number of messages plays a significant role in the overhead of a DSM protocol, so the first optimization is to trim the extraneous ones. First, the Shasta protocol has the key property that the current owner node specified by the directory is guaranteed to service a request that is forwarded to it. Hence there are no retries, and no messages such as negative acknowledgements that would trigger them. Transient states are handled by allocating queue space at the target processor so that servicing an incoming request can be delayed. The protocol also avoids sharing write-backs, ownership changes, and replacements, and the current owner always has a valid copy of the data. Second, Shasta supports dirty sharing, which eliminates the need to send an up-to-date copy of the line back to the home node when the home node is remote and the data is dirty in a third node. Third, supporting exclusive (or upgrade) requests is an important optimization, since it removes the need to fetch data on a store when the requesting processor already has the line in the shared state. Lastly, the number of invalidation acknowledgements expected for an exclusive request is piggybacked on one of the invalidation acknowledgements sent to the requestor instead of being sent as a separate message.
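The piggybacked acknowledgement count can be illustrated with a small sketch. This is a toy model, not Shasta's actual message format: the class names and the `expected_acks` field are hypothetical, and the only idea taken from the text is that one invalidation acknowledgement carries the total count so that no separate message is needed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InvalidationAck:
    """Acknowledgement sent by a former sharer to the requestor (toy model)."""
    sender: int
    # Hypothetical field: set on exactly one ack to carry the total number
    # of acks the requestor should expect, instead of a separate message.
    expected_acks: Optional[int] = None


class PendingExclusiveRequest:
    """Requestor-side bookkeeping for one outstanding exclusive request."""

    def __init__(self) -> None:
        self.received = 0
        self.expected: Optional[int] = None  # unknown until the carrying ack arrives

    def on_ack(self, ack: InvalidationAck) -> bool:
        """Record one ack; return True once every expected ack has arrived."""
        self.received += 1
        if ack.expected_acks is not None:
            self.expected = ack.expected_acks
        return self.expected is not None and self.received == self.expected


# Three sharers are invalidated; the ack from node 1 carries the count.
# Acks may arrive in any order, so the count need not arrive first.
request = PendingExclusiveRequest()
arrivals = [
    InvalidationAck(sender=2),
    InvalidationAck(sender=1, expected_acks=3),
    InvalidationAck(sender=3),
]
results = [request.on_ack(ack) for ack in arrivals]
```

Because the count may arrive after other acknowledgements, the requestor in this sketch counts every acknowledgement and only declares the request complete once it knows the expected total.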
Shasta supports multiple granularities for communication and coherence, even within a single application. Across applications, this is enabled by making the granularity configurable in the inline state check. Within one application, data that is close together can be communicated at a coarser granularity, while data that suffers from false sharing can be communicated at a finer granularity. Shared data moves between only three states: invalid, shared, and exclusive. Shasta maintains cache coherence by requiring a shared miss check on the data referenced by each load and store. Shared data is divided, by address range, into blocks and lines. The supported operations are read, read-exclusive, and exclusive/upgrade, and a state table holds the state of each line. Each processor maintains a directory and an owner for each line. With these data structures and tightly controlled state transitions, coherence is guaranteed through communication.
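A minimal sketch of the state table and miss check described above may help. It assumes a single fixed line size of 64 bytes and tracks only the local state of each line; the class and method names are hypothetical, and the directory, the owner, and the actual message exchange are left out.

```python
from enum import Enum
from typing import Optional

LINE_SIZE = 64  # bytes per line; an assumed value for illustration


class LineState(Enum):
    INVALID = "invalid"
    SHARED = "shared"
    EXCLUSIVE = "exclusive"


class StateTable:
    """Per-line state consulted by the miss check before each load or store."""

    def __init__(self, num_lines: int) -> None:
        self.states = [LineState.INVALID] * num_lines

    def _line(self, addr: int) -> int:
        return addr // LINE_SIZE

    def check_load(self, addr: int) -> Optional[str]:
        """Return the request to issue on a load miss, or None on a hit."""
        if self.states[self._line(addr)] is LineState.INVALID:
            return "read"
        return None

    def check_store(self, addr: int) -> Optional[str]:
        """Return the request to issue on a store miss, or None on a hit."""
        state = self.states[self._line(addr)]
        if state is LineState.EXCLUSIVE:
            return None
        if state is LineState.SHARED:
            return "upgrade"  # data is already local, only ownership is needed
        return "read-exclusive"  # data and ownership are both needed

    def complete(self, addr: int, request: str) -> None:
        """Apply the state transition once a request has been serviced."""
        if request == "read":
            self.states[self._line(addr)] = LineState.SHARED
        elif request in ("read-exclusive", "upgrade"):
            self.states[self._line(addr)] = LineState.EXCLUSIVE
        else:
            raise ValueError(f"unknown request: {request}")

    def invalidate(self, addr: int) -> None:
        """Drop the local copy when another node gains exclusive access."""
        self.states[self._line(addr)] = LineState.INVALID
```

In this sketch a store to a line held in the shared state yields an upgrade request rather than a read-exclusive, which mirrors the optimization of not fetching data the processor already has.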
#codingexercise
// Assumes an extension method AlternateNumberRangeVariance() on Double[] is defined elsewhere.
Double GetAlternateNumberRangeVariance(Double[] A)
{
    if (A == null) return 0;
    return A.AlternateNumberRangeVariance();
}

#codingexercise

// Assumes an extension method AlternateNumberRangemid() on Double[] is defined elsewhere.
Double GetAlternateNumberRangemid(Double[] A)
{
    if (A == null) return 0;
    return A.AlternateNumberRangemid();
}
