Monday, December 8, 2014

#codingexercise
Decimal GetDistinctRangeVariance(Decimal [] A)
{
if (A == null) return 0;
Return A.DistinctRangeVariance ();
}


We continue our discussion on WRL systems. We saw that the first level caches was a write through and behaves like a four entry FIFO, queuing up writes to the second level cache. The second level cache is write back. It may be either virtually or physically addressed.

We next review the choice of user programs to trace made in this paper. User programs that were real and in use, those that require large amounts of memory not because they are poorly written but because they problems they solve are inherently large are the user programs chosen for the study.
One such program was a timing verifier (tv) for VLSI circuits which is in regular use at WRL. It generates a linked data structure and traverses it. SOR is another program chosen which implements successive overrelaxation algorithm using sparse matrices.  Tree is a compiled Scheme program chosen for this study. It builds and searches for the largest element in the  tree. This implementation uses a garbage collection algorithm. Mult is another program chosen because it's a multiprogramming workload.
It was found that the TV made the most memory references to the tune of billions. The number of instructions as a percent of memory references was generally higher across all the chosen programs than in CISC programs.

Measures for cache behavior: The most common measure for cache performance is the miss ratio, the fractions of requests that are not satisfied by a cache.  Though the miss ratio is useful for comparing individual caches, it does not give a clear picture of the effect of the cache's performance on the overall or its relative importance in a multi cache system. A better measure in such case is the CPI, the average cycles per instruction. CPI accounts for variations in the number of references to the different caches and differences in the costs of misses to the caches and can be broken down into the components contributed by each part of the memory hierarchy. Since we are concerned only with memory delays and assume one instruction is issued per cycle, the total CPI over some execution interval is one plus the CPI from data plus the CPI from instructions plus the CPI from level 2 and the CPI as the contribution of the write buffer. CPI total thus gives a better indicator of the performance of the base architecture.

If for each cache,
    m-cache-type = number of misses during the interval
    c - cache type = cost in cycles of a miss
    i = number of instructions in the interval
then CPI = the ratio of the product m with c over i
We want to reduce the total CPI. 

No comments:

Post a Comment