Re: [hatari-devel] Hatari data cache tests |
[ Thread Index |
Date Index
| More lists.tuxfamily.org/hatari-devel Archives
]
Le 21/06/2015 23:41, Eero Tamminen a écrit :
Hi,
On sunnuntai 21 kesäkuu 2015, Nicolas Pomarède wrote:
Le 20/06/2015 11:39, Douglas Little a écrit :
The tests I did were in CE/PF mode. It seemed like the i-cache was
affecting cycles but the d-cache was not (cycle sum unaffected), even
if the d-cache hit ratio would change.
However I only looked at that one sequence of code - I'll look closer
elsewhere and see if there is some additional context.
I don't use profiler that often, but maybe there's a bug in it when
accumulating stats for all the loops ?
Is it possible to profile the block of instructions you posted in
previous mail but by running it only once ? This way we can see if
cycles reported by the profiler's stats are the same as those shown when
running hatari with "--trace cpu_all".
Just do:
profile on
trace cpu_all
c <instruction count>
profile addresses [1]
And compare the results.
Hi
I set breakpoints at start/end of the code, using CE mode, and results
don't match :(
cpu video_cyc= 50408 232@ 98 : 000C4EE8 2039 0004 0000 MOVE.L
$00040000,D0
cpu video_cyc= 50416 240@ 98 : 000C4EEE 4e71 NOP
cpu video_cyc= 50418 242@ 98 : 000C4EF0 2039 0004 0000 MOVE.L
$00040000,D0
cpu video_cyc= 50424 248@ 98 : 000C4EF6 4e71 NOP
cpu video_cyc= 50426 250@ 98 : 000C4EF8 2039 0004 0000 MOVE.L
$00040000,D0
cpu video_cyc= 50432 256@ 98 : 000C4EFE 4e71 NOP
cpu video_cyc= 50434 258@ 98 : 000C4F00 2039 0004 0000 MOVE.L
$00040000,D0
cpu video_cyc= 50440 264@ 98 : 000C4F06 41f9 0005 0000 LEA.L
$00050000,A0
cpu video_cyc= 50444 268@ 98 : 000C4F0C 1010 MOVE.B
(A0),D0
cpu video_cyc= 50450 274@ 98 : 000C4F0E 4e71 NOP
cpu video_cyc= 50452 276@ 98 : 000C4F10 1010 MOVE.B
(A0),D0
cpu video_cyc= 50456 280@ 98 : 000C4F12 4e71 NOP
cpu video_cyc= 50458 282@ 98 : 000C4F14 1010 MOVE.B
(A0),D0
cpu video_cyc= 50462 286@ 98 : 000C4F16 4e71 NOP
1st move takes 8 cycles (240-232), then nop takes 2 cycles, etc.
But the profiler gives :
Normal RAM (0-0x200000):
- active address range:
0x0c4ee8-0x0c4f16
- active instruction addresses:
14 (100.00% of all)
- executed instructions:
14 (100.00% of all)
- instruction cache misses:
12 (100.00% of all)
- data cache hits:
5 (100.00% of all)
- used cycles:
112 (100.00% of all)
= 0.00001s
> profile addresses
# disassembly with profile data: <instructions percentage>% (<sum of
instructions>, <sum of cycles>, <sum of i-cache misses>, <sum of d-cache
hits>)
$000c4ee8 : move.l $40000,d0 7.14% (1,
16, 2, 0)
$000c4eee : nop 7.14% (1,
4, 0, 0)
$000c4ef0 : move.l $40000,d0 7.14% (1,
12, 2, 1)
$000c4ef6 : nop 7.14% (1,
4, 0, 0)
$000c4ef8 : move.l $40000,d0 7.14% (1,
12, 2, 1)
$000c4efe : nop 7.14% (1,
4, 0, 0)
$000c4f00 : move.l $40000,d0 7.14% (1,
12, 2, 1)
$000c4f06 : lea $50000,a0 7.14% (1,
8, 1, 0)
$000c4f0c : move.b (a0),d0 7.14% (1,
12, 1, 0)
$000c4f0e : nop 7.14% (1,
4, 0, 0)
$000c4f10 : move.b (a0),d0 7.14% (1,
8, 1, 1)
$000c4f12 : nop 7.14% (1,
4, 0, 0)
$000c4f14 : move.b (a0),d0 7.14% (1,
8, 1, 1)
$000c4f16 : nop 7.14% (1,
4, 0, 0)
So, here 1st move take 16 cycles (instead of 8), then nop takes 4 cycles
(instead of 2), etc.
Basically, those are twice the values from the disasm trace. I think
there's a problem when the debugger/profiler gets the cycles count (as
if the x2 factor due to 16 MHz instead of 8 MHz was not taken into account).
What variables do you use when computing the cycles for the profiler ?
Nicolas