Re: [hatari-devel] Hatari data cache tests

[ Thread Index | Date Index | More lists.tuxfamily.org/hatari-devel Archives ]


Le 21/06/2015 23:41, Eero Tamminen a écrit :
Hi,

On sunnuntai 21 kesäkuu 2015, Nicolas Pomarède wrote:
Le 20/06/2015 11:39, Douglas Little a écrit :
The tests I did were in CE/PF mode. It seemed like the i-cache was
affecting cycles but the d-cache was not (cycle sum unaffected), even
if the d-cache hit ratio would change.

However I only looked at that one sequence of code - I'll look closer
elsewhere and see if there is some additional context.

I don't use profiler that often, but maybe there's a bug in it when
accumulating stats for all the loops ?

Is it possible to profile the block of instructions you posted in
previous mail but by running it only once ? This way we can see if
cycles reported by the profiler's stats are the same as those shown when
running hatari with "--trace cpu_all".

Just do:

	profile on
	trace cpu_all
	c <instruction count>
	profile addresses       [1]

And compare the results.


Hi

I set breakpoints at start/end of the code, using CE mode, and results don't match :(

cpu video_cyc= 50408 232@ 98 : 000C4EE8 2039 0004 0000 MOVE.L $00040000,D0
cpu video_cyc= 50416 240@ 98 : 000C4EEE 4e71                     NOP
cpu video_cyc= 50418 242@ 98 : 000C4EF0 2039 0004 0000 MOVE.L $00040000,D0
cpu video_cyc= 50424 248@ 98 : 000C4EF6 4e71                     NOP
cpu video_cyc= 50426 250@ 98 : 000C4EF8 2039 0004 0000 MOVE.L $00040000,D0
cpu video_cyc= 50432 256@ 98 : 000C4EFE 4e71                     NOP
cpu video_cyc= 50434 258@ 98 : 000C4F00 2039 0004 0000 MOVE.L $00040000,D0 cpu video_cyc= 50440 264@ 98 : 000C4F06 41f9 0005 0000 LEA.L $00050000,A0 cpu video_cyc= 50444 268@ 98 : 000C4F0C 1010 MOVE.B (A0),D0
cpu video_cyc= 50450 274@ 98 : 000C4F0E 4e71                     NOP
cpu video_cyc= 50452 276@ 98 : 000C4F10 1010 MOVE.B (A0),D0
cpu video_cyc= 50456 280@ 98 : 000C4F12 4e71                     NOP
cpu video_cyc= 50458 282@ 98 : 000C4F14 1010 MOVE.B (A0),D0
cpu video_cyc= 50462 286@ 98 : 000C4F16 4e71                     NOP

1st move takes 8 cycles (240-232), then nop takes 2 cycles, etc.

But the profiler gives :

Normal RAM (0-0x200000):
- active address range:
  0x0c4ee8-0x0c4f16
- active instruction addresses:
  14 (100.00% of all)
- executed instructions:
  14 (100.00% of all)
- instruction cache misses:
  12 (100.00% of all)
- data cache hits:
  5 (100.00% of all)
- used cycles:
  112 (100.00% of all)
  = 0.00001s

> profile addresses
# disassembly with profile data: <instructions percentage>% (<sum of instructions>, <sum of cycles>, <sum of i-cache misses>, <sum of d-cache hits>) $000c4ee8 : move.l $40000,d0 7.14% (1, 16, 2, 0) $000c4eee : nop 7.14% (1, 4, 0, 0) $000c4ef0 : move.l $40000,d0 7.14% (1, 12, 2, 1) $000c4ef6 : nop 7.14% (1, 4, 0, 0) $000c4ef8 : move.l $40000,d0 7.14% (1, 12, 2, 1) $000c4efe : nop 7.14% (1, 4, 0, 0) $000c4f00 : move.l $40000,d0 7.14% (1, 12, 2, 1) $000c4f06 : lea $50000,a0 7.14% (1, 8, 1, 0) $000c4f0c : move.b (a0),d0 7.14% (1, 12, 1, 0) $000c4f0e : nop 7.14% (1, 4, 0, 0) $000c4f10 : move.b (a0),d0 7.14% (1, 8, 1, 1) $000c4f12 : nop 7.14% (1, 4, 0, 0) $000c4f14 : move.b (a0),d0 7.14% (1, 8, 1, 1) $000c4f16 : nop 7.14% (1, 4, 0, 0)

So, here 1st move take 16 cycles (instead of 8), then nop takes 4 cycles (instead of 2), etc.

Basically, those are twice the values from the disasm trace. I think there's a problem when the debugger/profiler gets the cycles count (as if the x2 factor due to 16 MHz instead of 8 MHz was not taken into account).
What variables do you use when computing the cycles for the profiler ?

Nicolas



Mail converted by MHonArc 2.6.19+ http://listengine.tuxfamily.org/