Re: [hatari-devel] Re: Profiler - long history



Hi,

On Tuesday 19 May 2015, Nicolas Pomarède wrote:
> Code updated. As indicated, 68030/40/60 also have a data cache, so I
> added more variables to track this :
> 
> typedef struct {
>          int     I_Cache_miss;
>          int     I_Cache_hit;
>          int     D_Cache_miss;
>          int     D_Cache_hit;
> } cpu_instruction_t;
> 
> Eero, for now, I only updated debug/profilecpu.c to use I_Cache_miss, as
> it was before with the older version of WinUAE cpu.

Ok, thanks!


> To get correct cache stats, you will now need to check the cpu model and
> update profilecpu.c accordingly :
> 
> - If it's a 68020, you should use only I_Cache_miss and I_Cache_hit.

Because of the memory needed to store all the profiling info
(>100MB for 14MB of ST-RAM), I'm going to store only misses.

(Data just goes into a struct array sized to match the emulated
memory.  It's wasteful, but fast, and with OS overcommit it should
work fine as long as one doesn't try profiling with a 32-bit Hatari
that has lots of TT-RAM configured.)
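
To illustrate the idea, here is a minimal sketch of such a memory-sized
array.  The struct layout, field names and the helpers profile_alloc()
and profile_slot() are hypothetical, made up for this example; they are
not the ones in debug/profilecpu.c.  It assumes one slot per 16-bit
word (68k instructions are word-aligned) and four 32-bit counters per
slot.  Under those assumptions, 14MB of RAM needs 7M slots of 16 bytes,
about 112MB, which is in line with the figure above.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-address profile slot (illustrative field names). */
typedef struct {
    uint32_t count;     /* how many times the instruction was executed */
    uint32_t cycles;    /* cycles spent in it */
    uint32_t i_misses;  /* instruction cache misses */
    uint32_t d_misses;  /* data cache misses */
} profile_item_t;

/* Allocate one zeroed slot per 16-bit word of emulated RAM.
 * With OS overcommit, untouched pages of the array cost nothing. */
static profile_item_t *profile_alloc(uint32_t ram_bytes)
{
    return calloc(ram_bytes / 2, sizeof(profile_item_t));
}

/* Direct lookup: the PC itself is the index, no hashing or searching. */
static profile_item_t *profile_slot(profile_item_t *base, uint32_t pc)
{
    return &base[pc / 2];
}
```

The lookup is a single shift and add, which is why this is fast
despite being wasteful.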


> - If it's a 68030/40/60, you should use I_Cache_miss, I_Cache_hit,
> D_Cache_miss, D_Cache_hit.

Unused values should remain zero, so it shouldn't be a problem to
output both instruction & data cache values, right?

I added data cache miss counter support to the Hatari profiler, but
I'm not getting any data cache misses *OR* hits for Falcon emulation
with TOS4.  Is TOS4 disabling the data cache at boot?

Also, when looking at the code, I see D_Cache variables being
updated only in dcache030 functions, not in dcache040 ones?


> Regarding data cache, it's not fully implemented yet for 68040/60, so
> results are not to be trusted I guess. But 68030 cache should give
> correct values.

Ok, I'll try to remember to update the Hatari manual once the profiler
and its data seem to be good enough for 030.


> One thing to note about 68030 data cache is that if a long word (32
> bits) must be read, it might be stored in 2 cache entries, depending
> on whether the address is aligned on 2 or 4 bytes, requiring 2 reads
> in the cache.
> 
> So, a read for 32 bits could yield :
>   - 1 hit
>   - or 1 miss
>   - or 1 hit and 1 miss
>   - or 2 hits
>   - or 2 misses
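
The alignment point above can be sketched as follows.  This is a
simplified model, not the WinUAE code: it assumes the cache is looked
up in long word (4-byte) entries and just counts how many such entries
an access touches.  The function name is hypothetical.

```c
#include <stdint.h>

/* Number of long word entries covered by an access of 'size' bytes
 * (size >= 1) starting at 'addr'.  Each entry touched is one cache
 * lookup, so each one can independently be a hit or a miss. */
static unsigned longword_entries_touched(uint32_t addr, unsigned size)
{
    uint32_t first = addr >> 2;               /* entry of first byte */
    uint32_t last  = (addr + size - 1) >> 2;  /* entry of last byte */
    return (unsigned)(last - first + 1);
}
```

A 32-bit read at a 4-byte aligned address touches one entry (1 hit or
1 miss), while one at an address aligned only on 2 bytes touches two
entries, which gives the 2 hits / 2 misses / 1 hit and 1 miss cases.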

I'm seeing more instruction cache misses per instruction,
up to 6 misses per instruction, just from TOS4 desktop boot
and going over desktop menus.

WARNING: 6 CPU instruction cache misses > 5 at 0xe00c9a:
$00e1c3b8 : 4e73            rte                          0.00% (7, 248, 17, 0)
$00e00c9a : 3f00            move.w    d0,-(sp)           0.00% (1, 8, 0, 0)

WARNING: 6 CPU instruction cache misses > 5 at 0xe03288:
$00e1c236 : 4eb9 00e0 946a  jsr       $e0946a            0.30% (294183, 9415132, 1176836, 0)
$00e03288 : 48e7 f0f0       movem.l   d0-d3/a0-a3,-(sp)  0.00% (401, 14708, 324, 0)

Interestingly, the above happens only without MMU.  With MMU, the
maximum number of i-cache misses per instruction is 4 for the same
use-case.

(The previous Hatari WinUAE core had only up to 3 i-cache misses per
instruction.)


> Of course, if cache is disabled, you get 0 hit and 0 miss.
> 
> Also, in order to not slow down the main cpu loop in newcpu.c, it's the
> external profiler *that must clear the hit/miss cache counter*. This
> way, counters will be cleared only when needed, no need to clear them
> every time in newcpu.c if the profiler is not used anyway.

This is fine (and I noticed you already made this change).
The same clearing also needs to be done when profiling is
(re-)initialized.
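
As a rough sketch of that read-then-clear pattern: the
cpu_instruction_t struct is the one quoted earlier in this mail, while
profile_totals_t, profile_clear_counters() and profile_collect() are
hypothetical names used only for this example, not the actual
profilecpu.c code.

```c
#include <stdint.h>

/* Counter struct as quoted earlier in this mail. */
typedef struct {
    int I_Cache_miss;
    int I_Cache_hit;
    int D_Cache_miss;
    int D_Cache_hit;
} cpu_instruction_t;

/* Hypothetical per-address totals kept by the profiler (misses only). */
typedef struct {
    uint32_t i_misses;
    uint32_t d_misses;
} profile_totals_t;

/* Zero all counters; would also be called at profiling (re-)init. */
static void profile_clear_counters(cpu_instruction_t *cur)
{
    cur->I_Cache_miss = 0;
    cur->I_Cache_hit  = 0;
    cur->D_Cache_miss = 0;
    cur->D_Cache_hit  = 0;
}

/* Add the counters of the last instruction to the totals, then clear
 * them so the next instruction starts from zero.  The CPU loop never
 * clears anything itself. */
static void profile_collect(cpu_instruction_t *cur, profile_totals_t *totals)
{
    totals->i_misses += (uint32_t)cur->I_Cache_miss;
    totals->d_misses += (uint32_t)cur->D_Cache_miss;
    profile_clear_counters(cur);
}
```

Since unused counters stay at zero (68020 has no data cache, and a
disabled cache gives 0 hits and 0 misses), the same code can be used
regardless of CPU model.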


	- Eero


