Monday, December 8, 2014


We resume our discussion of the WRL system's cache analysis program. When simulating split instruction and data first-level caches and a direct-mapped second-level cache, analysis plus trace generation was found to take about 100 times as long as untraced execution. For example, tracing and analyzing a run of TV, a WRL timing verification program, extends the runtime from 25 minutes to 45 hours.

Saving traces for later analysis:
On-the-fly analysis does not require storage. When traces had to be saved, they were written to tape for the sake of capacity and portability. The data was written using the Mache scheme, which relies on a differencing technique. The original trace data is first converted into a cache difference trace consisting of reference points and differences: each address is represented as a difference from the previous address rather than as an absolute address. This exposes repeating patterns in the trace reference string. The cache difference trace is then compressed, with common sequences of data translated to single code words. Reference locality and the regularity of trace entry formats allow trace data to be compressed very effectively.
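To make the differencing idea concrete, here is a minimal sketch in Python. It is not the actual WRL tooling: the trace is a made-up run of sequential instruction fetches, the helper names are hypothetical, and zlib stands in for the code-word compression step purely for illustration.

```python
import struct
import zlib

def delta_encode(addresses):
    """Keep the first address as a reference point; store the rest as differences."""
    if not addresses:
        return []
    deltas = [addresses[0]]
    for prev, cur in zip(addresses, addresses[1:]):
        deltas.append(cur - prev)
    return deltas

def delta_decode(deltas):
    """Rebuild absolute addresses by accumulating the differences."""
    addresses = []
    current = 0
    for i, d in enumerate(deltas):
        current = d if i == 0 else current + d
        addresses.append(current)
    return addresses

def pack(values):
    """Serialize signed 64-bit integers so they can be compressed as bytes."""
    return struct.pack(f"<{len(values)}q", *values)

# Hypothetical trace: sequential instruction fetches, 4 bytes apart.
trace = [0x400000 + 4 * i for i in range(10_000)]
deltas = delta_encode(trace)

# zlib is an assumption here, standing in for the code-word compressor.
raw_size = len(zlib.compress(pack(trace)))
delta_size = len(zlib.compress(pack(deltas)))
print(raw_size, delta_size)
```

Because almost every difference in this trace is the same small number, the difference trace is far more repetitive than the absolute addresses, which is what lets the compressor do well.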

We next review the base case cache organizations. A number of assumptions were made about the machine used for the analysis.
The only assumption related to the generation of traces is that of a RISC architecture in which instruction fetches and explicit data loads and stores are the only forms of memory reference. This makes the simple form of the traces useful even though they contain no information about instruction types other than loads and stores.
Among the assumptions made about the machine, one is that it has a pipelined architecture. If there are no cache misses, a new instruction is started on every cycle. This is possible when the page offset is used to index the correct line in the cache while the real page is simultaneously retrieved from the TLB. If there are no delays other than those caused by cache misses, the base cost is one cycle per instruction. The goal of the memory hierarchy design is therefore to keep the CPI close to 1.
Another assumption is that the machine is fast enough to require two levels of cache. The cost of going to the second-level cache was taken to be about 12 cycles.
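As a rough illustration of how misses pull the CPI away from 1, the sketch below adds miss penalties to the one-cycle base cost. The 12-cycle second-level cost comes from the text above; the function name, the miss rates, and the main-memory penalty are hypothetical values chosen only for the example.

```python
def effective_cpi(l1_misses_per_instr, l2_misses_per_instr,
                  l2_penalty=12, memory_penalty=200, base_cpi=1.0):
    """Estimate CPI as the base cost plus the average stall cycles per instruction.

    l2_penalty is the roughly 12-cycle cost of going to the second-level cache.
    memory_penalty is an assumed cost of going to main memory (not from the text).
    """
    l1_stalls = l1_misses_per_instr * l2_penalty
    l2_stalls = l2_misses_per_instr * memory_penalty
    return base_cpi + l1_stalls + l2_stalls

# Hypothetical workload: 5 first-level misses and 0.2 second-level misses
# per 100 instructions.
print(effective_cpi(0.05, 0.002))
```

Even a modest first-level miss rate adds a large fraction of a cycle to every instruction, which is why the hierarchy is judged by how close it keeps the CPI to 1.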

These assumptions are in line with the discussions on this topic with the hardware engineers who built the WRL system.

The first-level data cache is treated as a write-through cache. It is backed by a four-entry FIFO that queues up writes to the second level.
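A four-entry FIFO of this kind could be modeled along the following lines. This is only a sketch under stated assumptions: the class and its names are hypothetical, each queued write is assumed to take the 12-cycle second-level cost to retire, writes are assumed to drain one at a time, and the CPU is assumed to stall only when the buffer is full.

```python
from collections import deque

class WriteBuffer:
    """Four-entry FIFO that queues write-through stores headed to the second level."""

    def __init__(self, capacity=4, drain_cycles=12):
        self.capacity = capacity
        self.drain_cycles = drain_cycles  # assumed cost of retiring one write
        self.pending = deque()            # completion cycles of queued writes, oldest first
        self.stall_cycles = 0

    def write(self, now):
        """Issue a store at cycle `now`; return the cycle at which the CPU may continue."""
        # Retire writes that have already reached the second level.
        while self.pending and self.pending[0] <= now:
            self.pending.popleft()
        # If the buffer is full, the CPU waits for the oldest write to finish.
        if len(self.pending) == self.capacity:
            oldest_done = self.pending.popleft()
            self.stall_cycles += oldest_done - now
            now = oldest_done
        # Writes drain one at a time, so this one starts after the previous one ends.
        start = self.pending[-1] if self.pending else now
        self.pending.append(max(start, now) + self.drain_cycles)
        return now

# Five back-to-back stores: the first four are absorbed, the fifth has to wait.
buf = WriteBuffer()
resume_cycles = [buf.write(cycle) for cycle in range(5)]
print(resume_cycles, buf.stall_cycles)
```

The buffer hides the write latency for short bursts of stores; only a burst longer than four entries makes the processor wait.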

#codingexercise
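using System;
using System.Linq;
decimal getStdDev(decimal[] a)
{
    if (a == null || a.Length == 0) return 0m;
    decimal mean = a.Average();
    // Population standard deviation: square root of the mean squared deviation.
    double variance = (double)a.Average(x => (x - mean) * (x - mean));
    return (decimal)Math.Sqrt(variance);
}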

