Re: [hatari-devel] Suspicious instruction & data cache hit/miss accounting |
[ Thread Index |
Date Index
| More lists.tuxfamily.org/hatari-devel Archives
]
Hi,
On 02/02/2018 11:37 PM, Nicolas Pomarède wrote:
Le 02/02/2018 à 21:54, Eero Tamminen a écrit :
Here's example disassembly from EmuTOS 0.9.9.1 on Falcon emu.
Instructions which have either zero instruction cache hits & misses,
or zero data cache hits & misses, are marked with '*':
[...]
As you can see, they're the majority (as indicated by
the profiler cache hit/miss histogram).
If you want more output, I pushed commit that shows the info
after you set "DEBUG" to 1 in profilecpu.c, re-build Hatari,
start Falcon or TT emulation, and enable profiling:
https://hg.tuxfamily.org/mercurialroot/hatari/hatari/rev/822222b90afb
It's common enough that you see it immediately, regardless
of what you run and on what 030 TOS version.
regarding data cache, most instructions in these lines are writing data,
not reading them. So this seems normal that there's no hit/miss when
writing, only when reading.
Ok.
(I'll add a reminder to cache histogram info that data cache
events can happen only for instructions doing data reads.)
As for instructions cache, do you have another example where some small
piece of code would be repeated in a loop but there would be no hit/miss
for instr cache ? Such case would be indeed strange as instr are likely
to go into cache during a small loop.
I changed the cache debugging code to include both hits & misses
for both instruction & data cache in the disassembly.
Attached is profile for beginning of New Beat's Falcon demo
called "Blue". It has several short loops.
The items inside parenthesis are:
- instruction execution count for given address
- cycle count
- i-cache hits
- i-cache misses
- d-cache hits
- d-cache misses
The simplest loop (with code surrounding it) looks like this:
------------------------------------------------------------------
$0001f772: adda.l d2,a2 0.00% (91, 0, 0, 0, 0, 0)
$0001f774: movea.l $21c9a,a3 0.00% (91, 728, 182, 0, 0, 0)
$0001f77a: movea.l (a3),a3 0.00% (91, 728, 0, 0, 0, 0)
$0001f77c: move.w #$1c1f,d5 0.00% (91, 0, 91, 0, 0, 0)
$0001f780: move.l (a3)+,(a2)+ 8.05% (655200, 10483200, 0, 0, 0, 0)
$0001f782: dbra d5,$1f780 8.05% (655200, 0, 1310400, 0, 0, 0)
$0001f786: rts 0.00% (91, 819, 182, 0, 0, 0)
------------------------------------------------------------------
As can be seen from the disassembly stats for the loop,
i-cache data is there only for the branching instruction
(as I deducted from Hatari code).
"dbra" gets 2x i-cache hits for each executed instruction, and
no cycles, whereas the other loop instruction gets all cycles.
Branching at "rts" gets also 2x i-cache hits, and cycles.
Are the hits for instructions leading to the loop, due to
there being prefetch done on them and there naturally being
a hit as there's no diverging code-flow?
Then the other loop with 2+1 instructions:
------------------------------------------------------------------
$1f302 tst.b $21cac 27.93% (2271792, 17040162, 71, 71, 0, 0)
$1f308 beq $1f3e0 27.93% (2271791, 14767904, 4543233, 407, 0, 0)
$1f30c cmpi.w #1,$21c50 0.00% (189, 2268, 0, 189, 0, 0)
....
$1f3da clr.b $21cac 0.00% (189, 756, 0, 0, 0, 0)
$1f3e0 bra $1f302 27.93% (2271790, 18175267, 6815453, 95, 0, 0)
$1f3e4 move.b #1,$21cac 0.00% (189, 2271, 0, 189, 0, 0)
$1f3ec rte 0.00% (189, 5300, 466, 2, 0, 0)
------------------------------------------------------------------
Same thing here, except that the "bra" instruction that's
alone gets actually 3x hits for each executed instruction,
and none of the loop instructions is missing cycles.
(The few i-cache misses are likely due to some interrupt
handler(s) running in the background.)
So, above corresponds somewhat to what I saw in the code,
where the (Hatari specific) CpuInstruction struct gets updated.
How often instruction prefetch is supposed to happen on 030,
when non-branching code is being executed?
- Eero
Hatari CPU profile (Hatari v2.0.0, WinUAE CPU core)
Cycles/second: 16042494
Field names: Executed instructions, Used cycles, Instruction cache misses, Data cache hits
Field regexp: ^\$([0-9a-f]+) :.*% \((.*)\)$
ST_RAM: 0x000000-0x400000
ROM_TOS: 0xe00000-0xe80000
CARTRIDGE: 0xfa0000-0xfc0000
PROGRAM_TEXT: 0x01f054-0x01fe5e
# disassembly with profile data: <instructions percentage>% (<sum of instructions>, <sum of cycles>, <sum of i-cache misses>, <sum of d-cache hits>)
$0001f302 : tst.b $21cac 27.93% (2271792, 17040162, 71, 71, 0, 0)
$0001f308 : beq $1f3e0 27.93% (2271791, 14767904, 4543233, 407, 0, 0)
$0001f30c : cmpi.w #1,$21c50 0.00% (189, 2268, 0, 189, 0, 0)
$0001f314 : beq.s $1f34a 0.00% (189, 3024, 189, 378, 0, 0)
$0001f316 : movea.l $2009e,a0 0.00% (189, 3024, 0, 189, 0, 0)
$0001f31c : movea.l $200a2,a1 0.00% (189, 4536, 0, 378, 0, 0)
$0001f322 : exg a1,a0 0.00% (189, 0, 0, 0, 0, 0)
$0001f324 : move.l a0,$2009e 0.00% (189, 4408, 16, 362, 0, 0)
$0001f32a : move.l a1,$200a2 0.00% (189, 2896, 16, 173, 0, 0)
$0001f330 : move.b $200a3,$ffff8201.w 0.00% (189, 4091, 32, 346, 0, 0)
$0001f338 : move.b $200a4,$ffff8203.w 0.00% (189, 4536, 0, 378, 0, 0)
$0001f340 : move.b $200a5,$ffff820d.w 0.00% (189, 3024, 0, 189, 0, 0)
$0001f348 : bra.s $1f390 0.00% (189, 3213, 0, 378, 0, 0)
[...]
$0001f390 : movea.l $2005e,a0 0.00% (189, 4536, 0, 378, 0, 0)
$0001f396 : move.l 8(a0),d0 0.00% (189, 3024, 0, 189, 0, 0)
$0001f39a : move.l $21c9e,d1 0.00% (189, 3024, 0, 189, 0, 0)
$0001f3a0 : cmpi.l #$fffffc18,(a0) 0.00% (189, 3024, 0, 189, 0, 0)
$0001f3a6 : beq.s $1f3e4 0.00% (189, 3024, 0, 378, 0, 0)
$0001f3a8 : cmp.l d0,d1 0.00% (189, 0, 0, 0, 0, 0)
$0001f3aa : blt.s $1f3c0 0.00% (189, 780, 374, 4, 0, 0)
$0001f3ac : move.l #0,$21c9e 0.00% (2, 64, 0, 6, 0, 0)
$0001f3b6 : adda.w #$c,a0 0.00% (2, 0, 2, 0, 0, 0)
$0001f3ba : move.l a0,$2005e 0.00% (2, 16, 2, 0, 0, 0)
$0001f3c0 : move.l 4(a0),$21c9a 0.00% (189, 3024, 378, 0, 0, 0)
$0001f3c8 : cmpi.l #$fffffc18,(a0) 0.00% (189, 1512, 189, 0, 0, 0)
$0001f3ce : beq.s $1f3e4 0.00% (189, 0, 378, 0, 0, 0)
$0001f3d0 : jsr ([a0]) 0.00% (189, 4536, 189, 189, 0, 0)
$0001f3d4 : addq.l #1,$21c9e 0.00% (189, 5260, 189, 189, 0, 0)
$0001f3da : clr.b $21cac 0.00% (189, 756, 0, 0, 0, 0)
$0001f3e0 : bra $1f302 27.93% (2271790, 18175267, 6815453, 95, 0, 0)
$0001f3e4 : move.b #1,$21cac 0.00% (189, 2271, 0, 189, 0, 0)
$0001f3ec : rte 0.00% (189, 5300, 466, 2, 0, 0)
[...]
$0001f6ec : move.l $21c9a,d0 0.00% (189, 1512, 378, 0, 0, 0)
$0001f6f2 : cmp.l $21cce,d0 0.00% (189, 1512, 0, 0, 0, 0)
$0001f6f8 : beq.s $1f71c 0.00% (189, 3008, 4, 376, 0, 0)
$0001f6fa : move.l #$2022f,$2006e 0.00% (2, 48, 0, 4, 0, 0)
$0001f704 : move.w #$f0,$21cc6 0.00% (2, 40, 0, 4, 0, 0)
$0001f70c : move.w #0,$21cb2 0.00% (2, 40, 0, 4, 0, 0)
$0001f714 : move.w #0,$21cca 0.00% (2, 40, 0, 4, 0, 0)
$0001f71c : cmpi.w #2,$21cb2 0.00% (189, 2268, 0, 189, 0, 0)
$0001f724 : beq.s $1f7a4 0.00% (189, 2832, 205, 346, 0, 0)
$0001f726 : movea.l $2006e,a1 0.00% (173, 2768, 0, 173, 0, 0)
$0001f72c : move.l d0,$21cce 0.00% (173, 4152, 0, 346, 0, 0)
$0001f732 : move.w $21cc6,d2 0.00% (173, 2076, 0, 173, 0, 0)
$0001f738 : cmpi.w #1,$21cb2 0.00% (173, 1384, 173, 0, 0, 0)
$0001f740 : beq.s $1f74c 0.00% (173, 692, 482, 0, 0, 0)
$0001f742 : cmpi.b #$f6,(a1) 0.00% (136, 680, 0, 0, 0, 0)
$0001f746 : beq.s $1f788 0.00% (136, 680, 272, 0, 0, 0)
$0001f748 : move.b (a1)+,d1 0.00% (56, 336, 0, 0, 0, 0)
$0001f74a : bra.s $1f758 0.00% (56, 168, 112, 0, 0, 0)
$0001f74c : suba.w #1,a1 0.00% (37, 148, 37, 0, 0, 0)
$0001f750 : move.b (a1),d1 0.00% (37, 148, 37, 0, 0, 0)
$0001f752 : cmpi.b #$ec,(a1) 0.00% (37, 148, 0, 0, 0, 0)
$0001f756 : beq.s $1f7a4 0.00% (37, 188, 73, 1, 0, 0)
$0001f758 : move.l a1,$2006e 0.00% (91, 1253, 182, 0, 0, 0)
$0001f75e : ext.w d1 0.00% (91, 0, 0, 0, 0, 0)
$0001f760 : sub.w d1,d2 0.00% (91, 0, 91, 0, 0, 0)
$0001f762 : move.w d2,$21cc6 0.00% (91, 588, 91, 0, 0, 0)
$0001f768 : muls.w #$240,d2 0.00% (91, 2072, 91, 0, 0, 0)
$0001f76c : movea.l $2009e,a2 0.00% (91, 980, 182, 0, 0, 0)
$0001f772 : adda.l d2,a2 0.00% (91, 0, 0, 0, 0, 0)
$0001f774 : movea.l $21c9a,a3 0.00% (91, 728, 182, 0, 0, 0)
$0001f77a : movea.l (a3),a3 0.00% (91, 728, 0, 0, 0, 0)
$0001f77c : move.w #$1c1f,d5 0.00% (91, 0, 91, 0, 0, 0)
$0001f780 : move.l (a3)+,(a2)+ 8.05% (655200, 10483200, 0, 0, 0, 0)
$0001f782 : dbra d5,$1f780 8.05% (655200, 0, 1310400, 0, 0, 0)
$0001f786 : rts 0.00% (91, 819, 182, 0, 0, 0)
$0001f788 : addi.w #1,$21cca 0.00% (80, 1120, 160, 0, 0, 0)
$0001f790 : cmpi.w #$46,$21cca 0.00% (80, 400, 80, 0, 0, 0)
$0001f798 : blt.s $1f7a2 0.00% (80, 400, 161, 0, 0, 0)
$0001f79a : move.w #1,$21cb2 0.00% (1, 11, 1, 0, 0, 0)
$0001f7a2 : rts 0.00% (80, 720, 160, 0, 0, 0)
$0001f7a4 : move.w #2,$21cb2 0.00% (18, 211, 18, 0, 0, 0)
$0001f7ac : rts 0.00% (18, 162, 36, 0, 0, 0)
# <callee>: <caller1> = <calls> <types>[ <inclusive/totals>[ <exclusive/totals>]], <caller2> ..., <callee name>
# types: u = unknown PC change, n = PC moved to next instruction, b = branch/jump, s = subroutine call, r = return from subroutine, e = exception, x = return from exception,
# totals: calls/instructions/cycles/i-misses/d-hits