Re: [hatari-devel] Issues with cache hits/misses?
On 24/05/2015 21:35, Eero Tamminen wrote:
Both TOS4 and EmuTOS have:
CACR 00003111
Looking at Mikro's post here:
http://dhs.nu/bbs-coding/index.php?request=3608
It seems that in addition to both caches being enabled,
burst mode is also enabled for both.
But still there are no d-cache hits/misses reported...
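As a quick cross-check of the CACR value quoted above, here is a minimal sketch that decodes it using the 68030 CACR bit layout. The helper name `decode_cacr` is only illustrative and is not part of Hatari.

```python
# Bit positions of the 68030 Cache Control Register (CACR) flags.
CACR_BITS = {
    0: "EI",    # enable instruction cache
    1: "FI",    # freeze instruction cache
    2: "CEI",   # clear entry in instruction cache
    3: "CI",    # clear instruction cache
    4: "IBE",   # instruction burst enable
    8: "ED",    # enable data cache
    9: "FD",    # freeze data cache
    10: "CED",  # clear entry in data cache
    11: "CD",   # clear data cache
    12: "DBE",  # data burst enable
    13: "WA",   # write allocate
}


def decode_cacr(value):
    """Return the names of the CACR flags set in 'value', lowest bit first."""
    return [name for bit, name in sorted(CACR_BITS.items()) if value & (1 << bit)]


if __name__ == "__main__":
    print(decode_cacr(0x00003111))  # ['EI', 'IBE', 'ED', 'DBE', 'WA']
```

So 00003111 means both caches enabled, burst enabled for both, plus write allocate on the data cache.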
Hi,
when Hatari starts in this mode, it should print something like:
CPU cycleunit: 256 (0.500)
run_1
CPU reset
What run_xx value do you have? Did you try using cycle-exact mode?
I'm seeing more instruction cache misses per instruction,
up to 6 misses per instruction, just from TOS4 desktop boot,
and going over desktop menus.
Need to be sure that 'misses' here are actually misses in the cache, and
not physical words having to be fetched from the bus. There will be 2x as
many fetches as misses in general, because of the Falcon's 16-bit bus.
Nicolas, which one is it?
When words need to be read for the instruction (to keep the prefetch
queue filled), they're fetched from the i-cache, but when the
instructions need to read memory, it's fetched from the d-cache.
So it should be correctly counted, but I don't have any test program to
check this, so if there's a reproducible case on Falcon that doesn't
work, I could use it.
Note that one possible way to check this could be to have a small
program that starts an MFP timer and executes the instructions under
test until the timer completes. We can then count how many
instructions were executed on Falcon and under Hatari and see if the
counts match (with cache on/off, burst on/off).
e.g.:
  start timer A for the equivalent of half a VBL, for example
loop:
  [do some instructions]
  increment a counter
  if timer A has not completed, goto loop
We should then test some instructions that do no memory access (e.g. add.l
d0,d1), or just one access (e.g. move.w d0,(a0)), and so on...
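To give an idea of the numbers such a test would produce, here is a rough sketch of the timer arithmetic. It assumes the usual 2.4576 MHz MFP clock; the 16 MHz CPU clock, the 50 Hz VBL and the 20 cycles per loop iteration are placeholders for illustration, not measured values, and the function names are hypothetical.

```python
MFP_CLOCK_HZ = 2_457_600  # MFP 68901 timer clock on ST/Falcon


def timer_period_s(prescaler, data):
    """Duration of one timer A countdown in seconds (delay mode)."""
    return prescaler * data / MFP_CLOCK_HZ


def expected_iterations(prescaler, data, cpu_hz, cycles_per_iteration):
    """Whole loop iterations that fit in one timer countdown.

    cpu_hz and cycles_per_iteration are assumptions supplied by the caller.
    """
    cpu_cycles = timer_period_s(prescaler, data) * cpu_hz
    return int(cpu_cycles // cycles_per_iteration)


def relative_difference(real_count, emulated_count):
    """Difference between emulator and real hardware, relative to hardware."""
    return (emulated_count - real_count) / real_count


if __name__ == "__main__":
    # Prescaler /200 with data 123 is about 10 ms, roughly half a 50 Hz VBL.
    print(timer_period_s(200, 123))
    # Assumed: 16 MHz CPU, 20 cycles per loop iteration (placeholder).
    print(expected_iterations(200, 123, 16_000_000, 20))
```

The counter read back on a real Falcon and under Hatari can then be compared with `relative_difference()` for each cache/burst combination.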
I was able to get 6 instruction cache misses also with MMU, on
EmuTOS desktop, but it happens more rarely than without MMU.
(Misses aren't from menus, but from interrupt handling as they
can happen while desktop is completely idle. Because they
happen more often when mouse is moved, I guess it's the EmuTOS
mouse handler.)
Note that I think it's an error to print a warning in profiler.c when
misses are above 5 or 6, for example; you shouldn't print anything.
For example, a movem.l could transfer 13 long words, which could be 13
hits in the data cache (or fewer), but it could also be 26 hits if the
data are unaligned in the cache.
Movem should be the instruction with the maximum possible cache access:
with up to 16 long words being read, this means a possible max of 32
d-cache accesses, so printing a warning for values above 5 or 6 will
certainly print too many useless warnings.
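The movem figures above can be reproduced with a small sketch. It assumes, as in the reasoning above, that a long word which is not aligned on a 4-byte boundary costs two d-cache accesses instead of one; `dcache_accesses` is a hypothetical helper, not a function from profiler.c.

```python
def dcache_accesses(address, n_longs):
    """Count d-cache accesses for n_longs consecutive long-word reads.

    Assumption: an unaligned long word (address not a multiple of 4)
    straddles two cache long words and so needs two accesses.
    """
    total = 0
    for i in range(n_longs):
        long_address = address + 4 * i
        total += 1 if long_address % 4 == 0 else 2
    return total


if __name__ == "__main__":
    print(dcache_accesses(0x1000, 13))  # aligned, 13 registers: 13
    print(dcache_accesses(0x1002, 13))  # unaligned, 13 registers: 26
    print(dcache_accesses(0x1002, 16))  # unaligned, 16 registers: 32
```

Under that assumption, any warning threshold below 32 accesses can be exceeded by a single legitimate instruction.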
Hmm. The MMU shouldn't affect things. The MMU has its own cache (ATC) but
not for instructions - for the MMU tables themselves. It can inhibit
caching but should not affect timing unless it has to fetch a table
entry, and should not affect hit/miss counts in the CPU caches.
According to (Amiga) WHDL documentation:
"caches on 68030..68060 are controlled by the Cache Control Register (CACR)
and the MMU! In the CACR the caches will be globally enabled or disabled.
Using the MMU single Pages (4 KiB with WHDLoad) will be marked how they can
be cached. On the 68030 a memory page can be Cacheable or NotCacheable."
No idea whether TOS uses MMU to do something like that though
(e.g. set interrupt handler code area to be cached differently).
Maybe TOS just does something differently in presence of MMU...
TOS sets up a small MMU translation table, but that's mostly to
translate some 24-bit hardware register addresses into 32-bit compatible
addresses. There's no special behaviour for the interrupt handlers; with
TOS the translations are not as small as 4 kB pages.
Sounds wrong to me, but someone else might have an opinion here.
I think WinUAE MMU emulation has also other differences
than just MMU HW part of the emulation. Nicolas?
The MMU part of the 68030 should be the same for Amiga and Falcon, but
timings of the MMU are not really described in docs, so the cycles when
applying translations and going through 1 of the 3 possible levels of
translation could be incomplete at the moment (that's what Toni told me
if I recall correctly).
This could explain some differences between Hatari and real Falcon.
Nicolas