Re: [hatari-devel] Re: Profiler - long history



On 21/05/2015 23:09, Eero Tamminen wrote:

Hi


- If it's a 68030/40/60, you should use I_Cache_miss, I_Cache_hit,
D_Cache_miss, D_Cache_hit.

Unused values should remain zero, so it shouldn't be a problem to
output both instruction & data cache values, right?

yes, it should be ok.


I added data cache miss counter support to Hatari profiler, but
I'm not getting any data cache misses *OR* hits for Falcon emulation
with TOS4.  Is TOS4 disabling data cache at boot?

I don't know, you should check the value of the CACR to see how it's configured.
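
As a rough illustration of that check, here is a minimal sketch that decodes the cache enable bits of a 68030 CACR value. The bit positions are the ones documented for the MC68030; the macro and function names are hypothetical and are not taken from Hatari or WinUAE.

```c
#include <stdint.h>
#include <stdio.h>

/* MC68030 CACR bit positions. Names are illustrative only. */
#define CACR030_EI   (1u << 0)    /* Enable Instruction cache */
#define CACR030_IBE  (1u << 4)    /* Instruction Burst Enable */
#define CACR030_ED   (1u << 8)    /* Enable Data cache */
#define CACR030_DBE  (1u << 12)   /* Data Burst Enable */
#define CACR030_WA   (1u << 13)   /* Write Allocate */

/* Hypothetical helpers: return non-zero if the given cache is enabled. */
static int cacr030_icache_enabled(uint32_t cacr)
{
    return (cacr & CACR030_EI) != 0;
}

static int cacr030_dcache_enabled(uint32_t cacr)
{
    return (cacr & CACR030_ED) != 0;
}

/* Print a one-line summary of the cache configuration. */
static void cacr030_describe(uint32_t cacr)
{
    printf("CACR=$%04x: i-cache %s, d-cache %s\n",
           (unsigned)cacr,
           cacr030_icache_enabled(cacr) ? "on" : "off",
           cacr030_dcache_enabled(cacr) ? "on" : "off");
}
```

If the value read back has bit 8 (ED) clear, the data cache is off, which would be consistent with seeing neither data cache hits nor misses.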


Also, when looking at the code, I see D_Cache variables being
updated only in dcache030 functions, not in dcache040 ones?

The 68040/60 cache emulation is not complete in WinUAE at the moment, so I haven't added counters for the 68040/68060 yet. As these CPUs are not used in "normal" Falcon/TT machines, this should not be a problem.


So, a read for 32 bits could yield :
   - 1 hit
   - or 1 miss
   - or 1 hit and 1 miss
   - or 2 hits
   - or 2 misses
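
The combinations above can be reproduced with a toy model. This is only a sketch, assuming a 68030-style organisation (16 lines of 4 long words, one tag per line, one valid bit per long word) and assuming that a 32-bit read not aligned on a long word looks up two entries. All names are hypothetical; this is not the WinUAE code.

```c
#include <stdint.h>
#include <string.h>

#define TOY_LINES          16
#define TOY_LONGS_PER_LINE 4

typedef struct {
    uint32_t tag[TOY_LINES];
    uint8_t  valid[TOY_LINES][TOY_LONGS_PER_LINE];
    unsigned hits, misses;
} ToyCache;

static void toy_cache_reset(ToyCache *c)
{
    memset(c, 0, sizeof *c);
}

/* Look up one long-word-aligned entry and count a hit or a miss. */
static void toy_lookup_long(ToyCache *c, uint32_t addr)
{
    unsigned line = (addr >> 4) & (TOY_LINES - 1);
    unsigned lw   = (addr >> 2) & (TOY_LONGS_PER_LINE - 1);
    uint32_t tag  = addr >> 8;

    if (c->tag[line] == tag && c->valid[line][lw]) {
        c->hits++;
        return;
    }
    c->misses++;
    if (c->tag[line] != tag) {
        /* New tag: the other long words of this line become invalid. */
        memset(c->valid[line], 0, sizeof c->valid[line]);
        c->tag[line] = tag;
    }
    c->valid[line][lw] = 1;   /* fill only the entry that was read */
}

/* A 32-bit read touches one entry if aligned, two if it straddles. */
static void toy_read32(ToyCache *c, uint32_t addr)
{
    uint32_t base = addr & ~(uint32_t)3;

    toy_lookup_long(c, base);
    if (addr & 3u)
        toy_lookup_long(c, base + 4);
}
```

In this model an aligned read gives 1 hit or 1 miss, while a misaligned read gives 2 hits, 2 misses, or 1 of each, depending on what is already cached.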

I'm seeing more instruction cache misses per instruction,
up to 6 misses per instruction, just from TOS4 desktop boot,
and going over desktop menus.

WARNING: 6 CPU instruction cache misses > 5 at 0xe00c9a:
$00e1c3b8 : 4e73                               rte
0.00% (7, 248, 17, 0)
$00e00c9a : 3f00                               move.w    d0,-(sp)
0.00% (1, 8, 0, 0)

WARNING: 6 CPU instruction cache misses > 5 at 0xe03288:
$00e1c236 : 4eb9 00e0 946a                     jsr       $e0946a
0.30% (294183, 9415132, 1176836, 0)
$00e03288 : 48e7 f0f0                          movem.l   d0-d3/a0-a3,-(sp)
0.00% (401, 14708, 324, 0)

You can get 2 hits or 2 misses per *32 bit access*, but an instruction can read/write more than 1 long word, so this doesn't seem like an error to me. For example, RTE will unstack SR, PC, fetch long word at new PC, ...


Interestingly, above happens only without MMU.  With MMU, maximum
number of i-cache misses per instruction is 4 for same use-case.

(Previous Hatari WinUAE core had only up to 3 i-cache misses per
instruction.)


Of course, if the cache is disabled, you get 0 hits and 0 misses.

Also, in order to not slow down the main cpu loop in newcpu.c, it's the
external profiler *that must clear the hit/miss cache counter*. This
way, counters will be cleared only when needed, no need to clear them
every time in newcpu.c if the profiler is not used anyway.
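
A minimal sketch of that split of responsibilities, assuming one shared counter struct: the CPU side only increments, and the profiler side reads the values and then clears them. Every name here is a hypothetical stand-in, not the actual Hatari or WinUAE variables.

```c
#include <string.h>

/* Stand-in for the counters the CPU core updates per instruction. */
typedef struct {
    unsigned i_hit, i_miss, d_hit, d_miss;
} ToyCacheCounters;

static ToyCacheCounters toy_counters;

/* CPU side: only increments, never clears (keeps the main loop cheap). */
static void toy_cpu_icache_miss(void) { toy_counters.i_miss++; }
static void toy_cpu_icache_hit(void)  { toy_counters.i_hit++; }
static void toy_cpu_dcache_miss(void) { toy_counters.d_miss++; }
static void toy_cpu_dcache_hit(void)  { toy_counters.d_hit++; }

/* Per-address totals kept by the profiler. */
typedef struct {
    unsigned long i_hit, i_miss, d_hit, d_miss;
} ToyProfileEntry;

/* Profiler side: called when profiling is (re-)initialized. */
static void toy_profiler_init(void)
{
    memset(&toy_counters, 0, sizeof toy_counters);
}

/* Profiler side: called after each instruction while profiling is on.
 * Adds the counters to the entry, then clears them for the next one. */
static void toy_profiler_collect(ToyProfileEntry *e)
{
    e->i_hit  += toy_counters.i_hit;
    e->i_miss += toy_counters.i_miss;
    e->d_hit  += toy_counters.d_hit;
    e->d_miss += toy_counters.d_miss;
    memset(&toy_counters, 0, sizeof toy_counters);
}
```

With this layout, nothing is cleared while profiling is off; stale values are only dropped at the next toy_profiler_init() call.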

This is fine (and I noticed you already made this change).
Same clearing needs to be done also when profiling is
(re-)initialized.

Yes, I put some default code there as a reminder that it should be done, but feel free to move the code somewhere else if you don't want to clear the counters that soon.


If you don't use the "hit" counters for now, I will comment out this part of the code in newcpu.c; it will save a few cycles.

Nicolas


