Re: [hatari-devel] Feature request/idea: cycle counting



On 02/02/2018 at 00:08, Eero Tamminen wrote:
Hi,

On 02/01/2018 03:14 AM, Miro Kropáček wrote:
Just enable profiling before you run the code, and you have the info
in the disassembly:

Wow, this is new.

Profiling data in disassembly has actually been there for a few
years, as it was added in Hatari v1.5, in 2011...

A separate "profile addresses" command was added in v1.7, in 2013.

Profile cache information is much newer, as cache emulation itself
is fairly new.  That was initially added in v1.9, in 2015.


How could I forget this!?

profile addresses
# disassembly with profile data: <instructions percentage>% (<sum of
instructions>, <sum of cycles>, <sum of i-cache misses>, <sum of d-cache
hits>)
$00e007a2 : cmpi.w    #$73,d0  0.53% (10311, 0, 0, 0)
$00e007a6 : bne.s     $e007b0  0.53% (10311, 68795, 3453, 0)

$00e007a8 : jsr       $e43ed8  0.53% (10311, 240558, 20622, 0)
$00e007ae : rte                0.53% (10311, 309330, 20622, 13748)

I don't really understand this output.

I see four instructions, but the 'sum of instructions' is always the same?
How so?

The above was just an excerpt of the full command output.

The sum of instructions tells how many times the instruction at each
of these addresses was executed.  From the above you can see that the
"bne" branch wasn't taken, as the following "jsr" instruction was
executed the same number of times.

When code is run multiple times, you can see from the counts how
the code has flowed: which branches were taken more often, which
functions were called more often, etc.
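
As an illustration of reading the counts this way, here is a minimal
Python sketch that parses lines in the format quoted above and compares
the execution counts of neighbouring instructions.  The regular
expression and the names (parse_profile_line, likely_fell_through) are
made up for this example and only assume the line layout shown in the
excerpt; they are not part of Hatari.  Equal counts are only a hint
that a branch fell through, since the next address could also be
reached some other way.

```python
import re

# Layout assumed from the excerpt quoted in this thread:
# $<address> : <instruction>  <percentage>% (<instr>, <cycles>, <i-miss>, <d-hit>)
LINE_RE = re.compile(
    r"^\$(?P<addr>[0-9a-fA-F]+)\s*:\s*(?P<insn>.*?)\s+"
    r"(?P<pct>[\d.]+)%\s*\((?P<counts>[\d,\s]+)\)\s*$"
)

def parse_profile_line(line):
    """Return a dict for one disassembly line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    counts = [int(c) for c in m.group("counts").split(",")]
    if len(counts) != 4:
        return None
    return {
        "addr": int(m.group("addr"), 16),
        "insn": " ".join(m.group("insn").split()),  # collapse column padding
        "pct": float(m.group("pct")),
        "count": counts[0],
        "cycles": counts[1],
        "icache_misses": counts[2],
        "dcache_hits": counts[3],
    }

def likely_fell_through(branch, following):
    """Heuristic: same execution count suggests the branch was not taken."""
    return branch["count"] == following["count"]

EXCERPT = [
    "$00e007a2 : cmpi.w    #$73,d0  0.53% (10311, 0, 0, 0)",
    "$00e007a6 : bne.s     $e007b0  0.53% (10311, 68795, 3453, 0)",
    "$00e007a8 : jsr       $e43ed8  0.53% (10311, 240558, 20622, 0)",
    "$00e007ae : rte                0.53% (10311, 309330, 20622, 13748)",
]

rows = [parse_profile_line(line) for line in EXCERPT]
for prev, cur in zip(rows, rows[1:]):
    if prev["insn"].startswith("bne"):
        print(prev["insn"], "fell through:", likely_fell_through(prev, cur))
```

For real use, the excerpt list would be replaced by the saved output of
the "profile addresses" command.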


Also the cycles look quite strange, so it's 0 cycles, 68795 cycles,
240558 cycles?

These are also totals for each instruction (address).

I think zero cycles means that the instruction was somehow paired /
executed together with neighboring instruction(s), and the cost
gets attributed to the last one.

(I remember that this has been discussed before, but not what
the conclusion was.  The counts come as-is from the CPU core,
they're just summed together by the profiler.)
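
If that guess is right, one way to read the totals is to fold each
zero-cycle instruction into the next instruction that has a cycle
count, and look at the average cost per execution of the group.  The
following Python sketch does that for the numbers in the excerpt.  The
grouping rule is only the assumption described above, not confirmed
CPU core behaviour, and group_zero_cycle is a hypothetical helper name.

```python
def group_zero_cycle(rows):
    """Fold zero-cycle instructions into the next one that has cycles.

    rows: list of (instruction, execution count, total cycles).
    Returns a list of (instructions, count, cycles, average cycles per run).
    """
    groups = []
    pending = []
    for insn, count, cycles in rows:
        pending.append(insn)
        if cycles > 0:
            avg = cycles / count if count else 0.0
            groups.append((pending, count, cycles, avg))
            pending = []
    if pending:
        # Trailing zero-cycle instructions with nothing to attach to
        groups.append((pending, rows[-1][1], 0, 0.0))
    return groups

# Totals taken from the excerpt quoted in this thread
ROWS = [
    ("cmpi.w #$73,d0", 10311, 0),
    ("bne.s $e007b0", 10311, 68795),
    ("jsr $e43ed8", 10311, 240558),
    ("rte", 10311, 309330),
]

groups = group_zero_cycle(ROWS)
for insns, count, cycles, avg in groups:
    print(" + ".join(insns), "=>", round(avg, 2), "cycles per execution")
```

With this reading, the "cmpi.w" and "bne.s" pair would cost a little
under 7 cycles per execution on average, which is more plausible than
a free compare.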

Nicolas?



Hi

There were some cases like this, but I don't remember the details. The 68030 can run the head/tail of consecutive instructions in parallel under the right conditions, so maybe it's a case like this.

For such cases, the whole 680xx code would need to be run only once to see how it interacts with the caches.

Nicolas



