Re: [hatari-devel] Hatari profiling question


To: hatari-devel@xxxxxxxxxxxxxxxxxxx
Subject: Re: [hatari-devel] Hatari profiling question
From: Toni Wilen <twilen@xxxxxxxxxx>
Date: Thu, 11 Feb 2021 10:58:04 +0200

If you look at the code in newcpu.c, you can see that it's surrounded by

#ifdef WINUAE_FOR_HATARI

so it's Hatari-specific code, not Toni's code :)


Yeah... Not sure what the numbers mean, but...

$00e21794 : adda.w    d2,a0       2.79% (1178064, 2361790, 1419, 0)
$00e21796 : adda.w    d3,a1       2.79% (1178064, 2353471, 38, 0)
$00e21798 : move.l    d1,d0       2.79% (1178064, 7068967, 1178050, 0)
$00e2179a : move.w    (a0),d0     2.79% (1178064, 10689954, 190, 577010)
$00e2179c : swap      d0          2.79% (1178064, 7069362, 1178150, 0)
$00e2179e : move.l    d0,d1       2.79% (1178064, 76, 0, 0)
$00e217a0 : rol.l     d4,d0       2.79% (1178064, 76, 0, 0)
$00e217a2 : jmp       (a2)        2.79% (1178064, 14139013, 2356248, 0)

Like I said, I'm not sure how to interpret the I-cache misses,
particularly in the last line. Is it because it's a JMP and both the
cache miss while fetching the instruction as well as the cache miss
while fetching the jump target count towards the number? Or is it
because the cache misses for the instruction *preceding* the JMP (ROL.L,
shown with 0 I-cache misses) are counted towards the JMP instruction?

Assuming the question is about JMP causing many instruction cache misses: this can be normal, because a pipeline refill means 2 long reads (the pipeline needs 3 words), both of which can miss the cache, and they happen during JMP execution (even if, at least in theory, the pipeline refill after any branch is probably an internally separate "instruction").

The 68020 and 68030 have a long-word-sized holding register, which is loaded either from the instruction cache or from memory. This means that executing a single-word instruction might not need any cache or memory read, and a prefetch can read the opcodes for the two following instructions.

Alignment of the opcode also matters (long-aligned vs. only word-aligned). For example, jumping to a non-long-word-aligned address "wastes" one word, because prefetches are always long-aligned long reads (e.g. a jump to $122 causes prefetches from $120 and $124).

Cache hit/miss counts should be mostly accurate. The order of accesses might not be correct, because no one knows how the 68020/030 decides the order of prefetches vs. data accesses when both are pending (the CPU is still microcoded, but the bus sequencer and instruction execution run separately). Cycle timing is not very accurate.


Hope this helped.

References:
- [hatari-devel] Hatari profiling question (was: Accelerating blitting on TT by code re-arrangement (on Emutos-devel))
  - From: Christian Zietz
- Re: [hatari-devel] Hatari profiling question (was: Accelerating blitting on TT by code re-arrangement (on Emutos-devel))
  - From: Eero Tamminen
- Re: [hatari-devel] Hatari profiling question
  - From: Nicolas Pomarède
