Re: [hatari-devel] Issues with cache hits/misses?


Hi,

On Friday 22 May 2015, Douglas Little wrote:
> The data cache actually misses most of the time. It's not like the
> instruction cache. The more common case is a miss even in well-optimized
> code (optimized code that isn't specifically aware of a data cache).
> 
> However it is extremely interesting to see where the data cache gets
> successful hits, simply because it is relatively difficult to arrange.
> 
> So, if possible, I would suggest that i-cache records misses (since it's
> easy to predict where hits will occur a lot of the time) and d-cache
> records hits. This is probably the most useful arrangement if trying to
> limit what gets recorded.

I'm fairly sure you're going to be the one using this feature most,
and the one with the best feedback on how well Hatari emulates the
real machine regarding cache usage.  I'm just wondering whether
others have differing opinions on this?


> > I added data cache miss counter support to Hatari profiler, but
> > I'm not getting any data cache misses *OR* hits for Falcon emulation
> > with TOS4.  Is TOS4 disabling data cache at boot?
> 
> TOS4 should enable it at boot.
> 
> On 68030, a value of $0101 in CACR will enable both caches (IIRC - best
> doublecheck that).

Both TOS4 and EmuTOS have:
	CACR 00003111

Looking at Mikro's post here:
	http://dhs.nu/bbs-coding/index.php?request=3608

It seems that in addition to both caches being enabled,
burst mode is also enabled for both.
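
As a quick cross-check of that reading, here is a small Python sketch
that decodes a CACR value into its set bits.  The bit positions are
my reading of the MC68030 CACR layout, and the names CACR_BITS and
decode_cacr are made up for this mail, so treat it as an assumption
to verify against the manual rather than as a reference.  By this
table, $3111 also has the write allocate (WA) bit set.

```python
# Assumed MC68030 CACR bit layout (verify against the user's manual).
CACR_BITS = {
    0: "EI",    # enable instruction cache
    1: "FI",    # freeze instruction cache
    2: "CEI",   # clear entry in instruction cache
    3: "CI",    # clear instruction cache
    4: "IBE",   # instruction burst enable
    8: "ED",    # enable data cache
    9: "FD",    # freeze data cache
    10: "CED",  # clear entry in data cache
    11: "CD",   # clear data cache
    12: "DBE",  # data burst enable
    13: "WA",   # write allocate
}


def decode_cacr(value):
    """Return the names of the bits set in a CACR value, lowest bit first."""
    return [name for bit, name in sorted(CACR_BITS.items())
            if value & (1 << bit)]


if __name__ == "__main__":
    for cacr in (0x0101, 0x3111):
        print(f"CACR ${cacr:08x}: {' '.join(decode_cacr(cacr))}")
```

Running it on $0101 and $3111 lists which enable and burst bits each
value carries.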

But still there are no d-cache hits/misses reported...


> > I'm seeing more instruction cache misses per instruction,
> > up to 6 misses per instruction, just from TOS4 desktop boot,
> > and going over desktop menus.
> 
> Need to be sure that 'misses' here are actually misses in the cache, and
> not physical words having to be fetched from the bus. There will be 2x as
> many fetches as misses in general, because of the Falcon's 16-bit bus.

Nicolas, which one is it?


> But assuming it really is referring to cache misses (longs) then it seems
> like a lot for one instruction. 24 bytes!
> 
> IIRC the CPU won't fetch half of an instruction - it will try to complete
> the fetch, so it can fetch beyond the immediate longword needed. But 6
> seems like a lot to me...

Also with burst mode?
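
To make the arithmetic in the quoted text explicit, here is a rough
Python sketch.  It assumes each counted miss is one longword (4 bytes)
and that the Falcon bus moves 2 bytes per access; the function name
miss_cost and the longs_per_miss parameter are hypothetical.  The
parameter is only there to show how the numbers would grow if a miss
filled a whole 4-longword cache line, which is what I would expect
burst mode to do where it is actually in effect.

```python
LONGWORD_BYTES = 4    # a cache miss is counted per longword (assumption)
BUS_WIDTH_BYTES = 2   # Falcon has a 16-bit data bus


def miss_cost(misses, longs_per_miss=1):
    """Return (bytes fetched, bus accesses) for a number of cache misses."""
    bytes_fetched = misses * longs_per_miss * LONGWORD_BYTES
    bus_accesses = bytes_fetched // BUS_WIDTH_BYTES
    return bytes_fetched, bus_accesses


if __name__ == "__main__":
    print(miss_cost(6))                     # single-longword fills
    print(miss_cost(6, longs_per_miss=4))   # hypothetical full-line fills
```

With single-longword fills, 6 misses correspond to the 24 bytes
mentioned above, i.e. twice as many 16-bit bus accesses as misses.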


> > WARNING: 6 CPU instruction cache misses > 5 at 0xe00c9a:
> > $00e1c3b8 : 4e73                               rte
> 
> RTE/RTS might be a special case, since it's a flow control operation. It
> may pull in a lot more when it jumps.
> 
> > WARNING: 6 CPU instruction cache misses > 5 at 0xe03288:
> > $00e1c236 : 4eb9 00e0 946a                     jsr       $e0946a
> 
> Again, flow control instruction returning.
> 
> > 0.30% (294183, 9415132, 1176836, 0)
> > $00e03288 : 48e7 f0f0                          movem.l   d0-d3/a0-a3,-(sp) 0.00% (401, 14708, 324, 0)
> > 
> > Interestingly, above happens only without MMU.  With MMU, maximum
> > number of i-cache misses per instruction is 4 for same use-case.

This MMU behavior was with TOS4.

I was able to get 6 instruction cache misses also with MMU, on
EmuTOS desktop, but it happens more rarely than without MMU.

(Misses aren't from menus, but from interrupt handling as they
can happen while desktop is completely idle.  Because they
happen more often when mouse is moved, I guess it's the EmuTOS
mouse handler.)


> Hmm. The MMU shouldn't affect things. The MMU has its own cache (ATC) but
> not for instructions - for the MMU tables themselves. It can inhibit
> caching but should not affect timing unless it has to fetch a table
> entry, and should not affect hit/miss counts in the CPU caches.

According to (Amiga) WHDL documentation:
"caches on 68030..68060 are controlled by the Cache Control Register (CACR) 
and the MMU!  In the CACR the caches will be globally enabled or disabled. 
Using the MMU single Pages (4 KiB with WHDLoad) will be marked how they can 
be cached. On the 68030 a memory page can be Cacheable or NotCacheable."

No idea whether TOS uses MMU to do something like that though
(e.g. set interrupt handler code area to be cached differently).
Maybe TOS just does something differently in presence of MMU...
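
The rule described in that WHDLoad text can be sketched roughly as
follows in Python.  This is only my reading of it, not how Hatari or
WinUAE implement it: the CACR enable bit positions are assumed from
the 68030 layout, and access_is_cacheable and page_cache_inhibit are
hypothetical names for illustration.

```python
CACR_EI = 1 << 0   # assumed: enable instruction cache
CACR_ED = 1 << 8   # assumed: enable data cache


def access_is_cacheable(cacr, is_instruction, page_cache_inhibit):
    """An access is cacheable only if the matching cache is globally
    enabled in CACR and the MMU page is not marked cache-inhibited."""
    enable_bit = CACR_EI if is_instruction else CACR_ED
    return bool(cacr & enable_bit) and not page_cache_inhibit


if __name__ == "__main__":
    # Data access with both caches enabled, page cacheable vs. inhibited.
    print(access_is_cacheable(0x3111, False, False))
    print(access_is_cacheable(0x3111, False, True))
```

If TOS marked some pages as non-cacheable through the MMU, accesses
to them would never show up as hits regardless of the CACR value.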


> Sounds wrong to me, but someone else might have an opinion here.

I think the WinUAE MMU emulation also has differences other
than just the MMU HW part of the emulation.  Nicolas?


	- Eero


