Thursday, December 11, 2014

#codingexercise
Decimal GetEvenNumberRangeMean(Decimal[] A)
{
    if (A == null) return 0;
    // EvenNumberRangeMean is assumed to be an extension method defined elsewhere.
    return A.EvenNumberRangeMean();
}

Today we will continue discussing the WRL system for trace generation. We were discussing long traces and their usefulness. This is not limited to multi-level caches. Even for single-level caches, with varying cache size and degree of associativity, the effects of trace length were studied and found to be significant. Short traces suffered from issues such as high miss ratios, cold start, and erroneous results or predictions depending on where the short run was chosen. The miss ratios are not always lower for long traces, since not all programs behave the same and the outcome depends on where the short run was taken. Some programs terminate before the miss ratio stabilizes; others appear to stabilize early and then jump after a spell; yet others stabilize reasonably well but require a long run. Moreover, the cache warm-up itself needs millions of references.
The WRL system does not need to save traces. To analyze a program with a different cache organization, the program only needs to be rerun with a differently parameterized analysis program. This works well for single-user programs that are deterministic, because they generate identical traces on different runs. For multiple processes this is no longer deterministic, because the processes get interleaved differently on different runs. This is not necessarily bad: if we repeat the runs a few times, we can find the dominant trends. However, the variations may make it difficult to judge the significance of individual peculiarities. To overcome this limitation, the cache analysis program could be modified to simulate multiple caches at the same time. In practice, the variations have been within tolerable limits, so this has not been required.
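The idea of one parameterized analysis program serving many cache organizations can be sketched as follows. This is a minimal illustration, not the WRL code: it replays a single address trace through several direct-mapped cache models of different sizes in one pass, so one run yields miss ratios for every configuration. The cache sizes, line size, and toy trace are all assumptions for the example.

```python
# Sketch only (not the WRL analysis program): replay one reference trace
# through several parameterized cache models in a single pass.

class DirectMappedCache:
    """A direct-mapped cache model that only tracks hits and misses."""

    def __init__(self, size_bytes, line_bytes=64):
        self.line_bytes = line_bytes
        self.num_lines = size_bytes // line_bytes
        self.tags = [None] * self.num_lines   # one tag slot per cache line
        self.refs = 0
        self.misses = 0

    def access(self, addr):
        self.refs += 1
        line = addr // self.line_bytes        # which memory line this address is in
        index = line % self.num_lines         # direct-mapped placement
        if self.tags[index] != line:
            self.tags[index] = line           # fill the line on a miss
            self.misses += 1

    def miss_ratio(self):
        return self.misses / self.refs if self.refs else 0.0

# Simulate several cache sizes against the same trace simultaneously.
caches = {size: DirectMappedCache(size)
          for size in (512 * 1024, 4 * 2**20, 16 * 2**20)}

# Toy trace: two sequential sweeps over a 64 KB region (illustrative only).
trace = [i * 64 for i in range(1000)] * 2
for addr in trace:
    for cache in caches.values():
        cache.access(addr)

for size, cache in caches.items():
    print(size, round(cache.miss_ratio(), 3))
```

Because all three model caches consume the same trace in the same loop, a nondeterministic multi-process trace would still be compared against identical input, which is exactly the benefit the paragraph above describes.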
The contribution of the size of the second-level cache was also studied. The CPI was measured for three sizes of second-level cache: 512K, 4M, and 16M. Except for the early part of a run of the TV program, there is a dramatic reduction in CPI when the cache size is increased to 4M, but a considerably smaller improvement from the jump to 16M. When a program actually references a large part of its address space, a bigger second-level cache helps tremendously. However, most programs do not have such a large working set.
We can get more insight from the existing data by dividing second-level cache misses into three categories: compulsory, capacity, and conflict. Compulsory misses happen the first time an address is referenced and occur regardless of the size of the cache. The capacity miss ratio is the difference between the fully associative miss ratio and the compulsory miss ratio. The conflict miss ratio is the difference between the direct-mapped miss ratio and the fully associative miss ratio.
Capacity misses are due to the fixed cache size. They decrease as the cache size increases, but only up to a certain size. Conflict misses result from too many addresses mapping to the same cache lines. These increase between the 512K and 4M cache sizes but drop tremendously from 4M to 16M, where the interfering blocks are spread out more evenly. This is where pid hashing reduces the miss ratio for the 16M cache.
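The decomposition described above is simple subtraction once the miss ratios have been measured. The numbers below are made up purely for illustration; only the formulas come from the definitions in the text.

```python
# Illustrative (made-up) miss ratios for one cache size; the decomposition
# follows the compulsory/capacity/conflict definitions given above.
direct_mapped_miss_ratio = 0.060  # measured with a direct-mapped cache
fully_assoc_miss_ratio = 0.045    # measured with a fully associative cache of the same size
compulsory_miss_ratio = 0.020     # first-reference misses, independent of cache size

# capacity = fully associative - compulsory
capacity_miss_ratio = fully_assoc_miss_ratio - compulsory_miss_ratio
# conflict = direct mapped - fully associative
conflict_miss_ratio = direct_mapped_miss_ratio - fully_assoc_miss_ratio

print(round(capacity_miss_ratio, 3))
print(round(conflict_miss_ratio, 3))
```

With these sample numbers, capacity misses account for 0.025 and conflict misses for 0.015 of the total 0.060 miss ratio, with compulsory misses making up the rest.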

#codingexercise
Decimal GetEvenNumberRangeMode(Decimal[] A)
{
    if (A == null) return 0;
    // EvenNumberRangeMode is assumed to be an extension method defined elsewhere.
    return A.EvenNumberRangeMode();
}
#codingexercise

Decimal GetEvenNumberRangeSum(Decimal[] A)
{
    if (A == null) return 0;
    // EvenNumberRangeSum is assumed to be an extension method defined elsewhere.
    return A.EvenNumberRangeSum();
}

