Two of the most important of these issues are, first, the hardware overhead that the use of directories implies, and, second, the increased distance to memory, which is responsible for the higher cache miss latencies currently observed in cc-NUMA architectures.
Additionally, the MemMax Scheduler adds support for XOR burst sequences, which minimize cache miss latency for DDR3 DRAMs and are configurable on a per-thread basis, allowing chip architects to make fine-grained QoS trade-offs between memory utilization and latency minimization.
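The details of MemMax's XOR burst sequences are proprietary, but the underlying idea of XOR address swizzling is well known: folding higher address bits into the bank-select bits so that power-of-two strides, which would otherwise hammer a single bank, are spread across all banks. A minimal sketch, with hypothetical bit-field positions (`bank_shift`, `xor_shift` are illustrative, not taken from any real controller):

```python
def bank_index(addr, bank_bits=3, bank_shift=13, xor_shift=16):
    """Return (plain, swizzled) bank indices for a DRAM address.

    plain:    bank bits taken directly from the address.
    swizzled: the same bits XOR-ed with higher (row) address bits,
              so power-of-two strides rotate through the banks.
    Field positions are hypothetical; real controllers differ.
    """
    mask = (1 << bank_bits) - 1
    plain = (addr >> bank_shift) & mask
    swizzled = plain ^ ((addr >> xor_shift) & mask)
    return plain, swizzled

# A 64 KiB stride maps every access to the same plain bank,
# while the XOR-swizzled mapping spreads them across all 8 banks.
addrs = [i * (1 << 16) for i in range(8)]
plain_banks = {bank_index(a)[0] for a in addrs}
xor_banks = {bank_index(a)[1] for a in addrs}
```

With the plain mapping the stride serializes on one bank; with swizzling the same trace touches all eight, which is the effect a per-thread XOR configuration lets an architect tune.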
The address traces generated by our Gauss-Seidel execution were fed into this profiler, which in turn modelled the corresponding cache operation to produce cache miss statistics.
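The profiler itself is not specified here, but a trace-driven cache model of this kind can be sketched in a few lines. The following is a minimal, illustrative direct-mapped model (the `n_lines` and `line_size` parameters are assumptions, not values from the text); a real profiler would also model associativity and replacement:

```python
class DirectMappedCache:
    """Minimal trace-driven miss profiler for a direct-mapped cache."""

    def __init__(self, n_lines=64, line_size=64):
        self.n_lines = n_lines
        self.line_size = line_size
        self.tags = [None] * n_lines   # block number cached in each line
        self.hits = self.misses = 0

    def access(self, addr):
        block = addr // self.line_size
        idx = block % self.n_lines
        if self.tags[idx] == block:
            self.hits += 1
        else:                          # cold or conflict miss: fill the line
            self.tags[idx] = block
            self.misses += 1

# Feed an address trace (e.g. from a Gauss-Seidel sweep) into the model.
cache = DirectMappedCache()
trace = [i * 8 for i in range(1000)]   # sequential 8-byte accesses
for a in trace:
    cache.access(a)
miss_rate = cache.misses / (cache.hits + cache.misses)
```

For this sequential trace, one miss per 64-byte line (8 accesses) gives a miss rate of 0.125, the kind of statistic such a profiler reports.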
Average reductions in the cache miss rate of between 30% and 60%, and peak reductions greater than 200%, are obtained.
For example, a processor might fail when a particular instruction is in a specific pipeline stage at the same time that a data FIFO is full, a cache miss occurs, and a specific interrupt arrives.
The large data set, with its low locality of reference, results in high data cache miss rates and low effective utilization (less than 30%) of a single-threaded processor.
The SR71040B enhances system performance analysis by including dual performance counters that track many of the processor's internal metrics, such as cache miss rates or the number of stalls.