Re: [hatari-devel] Suspicious instruction & data cache hit/miss accounting
Hi,
On 02/06/2018 03:29 PM, Nicolas Pomarède wrote:
[...]
> I wanted to add details to my latest mail, but as you guessed it, the
differences you see are indeed mostly due to prefetch / pipeline inside
the 68020/30.
For the details, see "11.2.2 instruction pipe" in the 68030 user manual
doc.
Basically, the cpu has an internal 32 bit reg named "cache holding
register" CAHR. This reg is used to fill the internal stages A, B, C and
D of the cpu.
One of the differences with the 68000 is that this reg is 32 bits, while
on the 68000 it's 16 bits.
So, on 68000, you have at least a mem access during every instruction to
keep this 16 bit prefetch reg filled.
On the 68030, you only need to refill once both words of the cache
holding reg have been pushed to stage A.
So, if we take the example of a flow of instructions where each
instruction would be 2 bytes (eg "adda.l d2,a2", "movea.l (a3),a3"), you
can see that if the CAHR was filled just before, then you can get 1 word
without doing an external mem access, and without even doing an i-cache
access.
Imagine a flow of 100 NOPs (1 word each): you will get 1 access to
the i-cache every 2 instructions (it could be a hit or a miss). For every
other instruction, you get a "free" access to the opcode.
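
The NOP example can be sketched as a small C model. This is only an illustration under simplifying assumptions, not Hatari's or WinUAE's actual prefetch code: the holding register is assumed to cover exactly one aligned long word, every refill is counted as one i-cache access (hit or miss), and the names `fetch_stats` and `simulate_linear_fetch` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical counters, not Hatari's real profiler structures. */
struct fetch_stats {
    unsigned icache_accesses; /* CAHR refills, each one a cache hit or miss */
    unsigned free_fetches;    /* opcode words served from an already filled CAHR */
};

/*
 * Simplified model: fetch `words` consecutive 16-bit opcode words starting
 * at byte address `pc`, with no branches in between. The holding register
 * is assumed to contain the aligned long word that was last fetched.
 */
struct fetch_stats simulate_linear_fetch(uint32_t pc, unsigned words)
{
    struct fetch_stats st = {0, 0};
    int cahr_valid = 0;     /* nothing loaded into CAHR yet */
    uint32_t cahr_addr = 0; /* address of the long word held in CAHR */

    assert((pc & 1) == 0);  /* 680x0 opcodes are word aligned */

    for (unsigned i = 0; i < words; i++, pc += 2) {
        uint32_t long_addr = pc & ~(uint32_t)3;

        if (cahr_valid && cahr_addr == long_addr) {
            st.free_fetches++;      /* word already sits in CAHR */
        } else {
            cahr_addr = long_addr;  /* refill CAHR through the i-cache */
            cahr_valid = 1;
            st.icache_accesses++;
        }
    }
    return st;
}
```

With these assumptions, `simulate_linear_fetch(0x1000, 100)` reports 50 i-cache accesses and 50 free fetches, matching the "1 access every 2 instructions" description. A taken branch would have to clear `cahr_valid` in this model, which is left out here.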
On the contrary, when you have an instruction involving a branch, CAHR
must be refilled at the new PC, and you will need to access
cache/external mem to do so (so the hit or miss counter will increase).
Note that in the end, it doesn't necessarily mean that the code will be
faster (it depends on the RAM speed), this just explains the flow of
memory accesses.
If your RAM is not capable of 32 bit access (unlike so-called fast RAM),
refilling CAHR will take 2 word accesses, instead of 1 long word access.
In the case of the i-cache counters in the profiler, maybe you can add a
3rd case besides hit or miss, like "prefetch", for when the hit/miss
counters were both 0 for the current instruction.
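
A minimal sketch of that three-way accounting in C follows. The struct and function names are hypothetical and do not reflect the actual Hatari profiler code. It assumes the emulated CPU core reports how many i-cache hits and misses occurred while fetching the current instruction.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical profile counters; names are illustrative only. */
struct icache_profile {
    unsigned long hits;       /* total i-cache hits */
    unsigned long misses;     /* total i-cache misses */
    unsigned long prefetched; /* instructions that needed no i-cache access */
};

/*
 * Account one executed instruction. `hits` and `misses` are the i-cache
 * events seen for this instruction only. When both are zero, the opcode
 * is assumed to have come from the already filled holding register.
 */
void icache_profile_update(struct icache_profile *p,
                           unsigned hits, unsigned misses)
{
    assert(p != NULL);

    p->hits += hits;
    p->misses += misses;
    if (hits == 0 && misses == 0)
        p->prefetched++;
}
```

The third counter counts instructions rather than accesses, so it is not directly comparable with the hit and miss totals, which may be worth stating in the profiler output.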
I added that statistic, but now I started to wonder about terminology,
what would be best understood by the users of the profiler.
Is "prefetch" the correct name for moving data from the i-cache to the
CAHR register, or do people normally interpret "prefetch" to mean reading
instructions from system RAM to the i-cache?
I.e. maybe the zero i-cache hit/miss case should be named as:
"Cache holding register (CAHR) already refilled from i-cache"
instead of my current "Already prefetched" name?
- Eero