Re: [hatari-devel] Hatari profiling question



If you look at the code in newcpu.c, you can see that it's surrounded by

#ifdef WINUAE_FOR_HATARI

so it's code specific to Hatari, not Toni's code :)

Yeah.. Not sure what those numbers mean, but..

$00e21794 : adda.w    d2,a0       2.79% (1178064, 2361790, 1419, 0)
$00e21796 : adda.w    d3,a1       2.79% (1178064, 2353471, 38, 0)
$00e21798 : move.l    d1,d0       2.79% (1178064, 7068967, 1178050, 0)
$00e2179a : move.w    (a0),d0     2.79% (1178064, 10689954, 190, 577010)
$00e2179c : swap      d0          2.79% (1178064, 7069362, 1178150, 0)
$00e2179e : move.l    d0,d1       2.79% (1178064, 76, 0, 0)
$00e217a0 : rol.l     d4,d0       2.79% (1178064, 76, 0, 0)
$00e217a2 : jmp       (a2)        2.79% (1178064, 14139013, 2356248, 0)

Like I said, I'm not sure how to interpret the I-cache misses,
particularly in the last line. Is it because it's a JMP and both the
cache miss while fetching the instruction as well as the cache miss
while fetching the jump target count towards the number? Or is it
because the cache misses for the instruction *preceding* the JMP (ROL.L,
shown with 0 I-cache misses) are counted towards the JMP instruction?

Assuming the question is about JMP causing many instruction cache misses: this can be normal. A pipeline refill means 2 long reads (the pipeline needs 3 words), both reads can miss the cache, and both happen during JMP execution (even if, at least in theory, the pipeline refill after any branch is probably internally a separate "instruction").
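As a rough sanity check of the "2 long reads per refill" explanation, the quoted profile lines can be turned into misses per execution. This is only a sketch: it assumes the first number in the parentheses is the execution count and the third is the I-cache miss count (a guess from the question above, not something stated in this thread), and the helper name is made up for illustration.

```python
import re

# Three lines copied from the profile quoted above.
# ASSUMPTION: the fields are (execution count, cycles, I-cache misses, other).
PROFILE = """\
$00e2179e : move.l    d0,d1       2.79% (1178064, 76, 0, 0)
$00e217a0 : rol.l     d4,d0       2.79% (1178064, 76, 0, 0)
$00e217a2 : jmp       (a2)        2.79% (1178064, 14139013, 2356248, 0)
"""

LINE_RE = re.compile(
    r"^\$(?P<addr>[0-9a-f]+) : (?P<insn>.+?)\s+(?P<pct>[\d.]+)% "
    r"\((?P<count>\d+), (?P<cycles>\d+), (?P<imiss>\d+), (?P<other>\d+)\)$"
)


def misses_per_execution(profile_text):
    """Map instruction address -> (assumed) I-cache misses per execution."""
    result = {}
    for line in profile_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip anything that is not a profile line
        result[int(m["addr"], 16)] = int(m["imiss"]) / int(m["count"])
    return result


for addr, ratio in misses_per_execution(PROFILE).items():
    print(f"${addr:08x}: {ratio:.4f} misses per execution")
```

Under those assumptions the JMP comes out at almost exactly 2 misses per execution, which fits two missed long reads per pipeline refill, while ROL.L comes out at 0.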

The 68020 and 68030 have a long-word-sized holding register that is loaded either from the instruction cache or from memory. This means that executing a single-word instruction might not need any cache or memory read, and a prefetch can read the opcodes of the two following instructions.

Alignment of the opcode also matters (long aligned vs. only word aligned). For example, jumping to a non-long-word-aligned address would "waste" one word, because prefetches are always long-aligned long reads: a jump to $122 would cause prefetches from $120 and $124.
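The alignment arithmetic can be sketched in a few lines. This is an illustration only, not emulator code: the function names are invented, and the default of 2 long reads per refill is taken from the explanation above.

```python
def prefetch_longs(target, n_longs=2):
    """Long-aligned addresses read when refilling after a jump to `target`.

    ASSUMPTION: a refill is `n_longs` consecutive long-aligned long reads.
    """
    if target % 2:
        raise ValueError("instruction addresses must be word aligned")
    base = target & ~3  # round down to a long (4-byte) boundary
    return [base + 4 * i for i in range(n_longs)]


def wasted_words(target):
    """Words fetched in front of the jump target (0 or 1)."""
    return (target & 3) // 2


for target in (0x120, 0x122):
    longs = ", ".join(f"${a:x}" for a in prefetch_longs(target))
    print(f"jump to ${target:x}: long reads at {longs}, "
          f"wasted words: {wasted_words(target)}")
```

For a jump to $122 this gives long reads at $120 and $124 with one wasted word (the word at $120), matching the example above. A jump to $120 reads the same two longs but wastes nothing.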

Cache hit/miss counts should be mostly accurate. The order of accesses might not be correct, because no one knows how the 68020/030 decides the order of prefetches vs. data accesses when both are pending (the CPU is still microcoded, but the bus sequencer and instruction execution run separately). Cycle timing is not very accurate.

Hope this helped.


