Re: [hatari-devel] Feature request/idea: cycle counting



Hi,

On 02/01/2018 12:39 AM, Miro Kropáček wrote:
> On 1 February 2018 at 07:27, Eero Tamminen <oak@xxxxxxxxxxxxxx> wrote:
>> You were talking about cache hits & misses, which on the 030 can have
>> a huge impact on performance.  That's a run-time property dependent
>> on what you've run before (e.g. whether you're doing things in a loop,
>> how long your loop is, etc).
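To illustrate why cache behaviour is a run-time property, here is a toy model of the 68030 instruction cache: 256 bytes organised as 16 direct-mapped lines of four longwords, each longword with its own valid bit. It is a sketch under simplifying assumptions (no burst fill, no cache freeze, function codes ignored in the tag, every fetch modelled as a sequential longword read). It is not how Hatari's CPU core models the cache. `ICache030`, `misses_per_iteration` and the base address are hypothetical.

```python
LINES = 16          # 68030 i-cache: 16 lines ...
LONGS_PER_LINE = 4  # ... of four longwords each (16 * 16 bytes = 256 bytes)


class ICache030:
    """Toy direct-mapped model of the 68030 instruction cache.

    Simplifications (assumptions): no burst fill, no freeze bit, function
    code ignored in the tag, and a miss fills only the longword fetched.
    """

    def __init__(self):
        self.tags = [None] * LINES
        self.valid = [[False] * LONGS_PER_LINE for _ in range(LINES)]

    def fetch(self, addr):
        """Fetch the longword containing addr; return True on a cache hit."""
        index = (addr >> 4) & (LINES - 1)          # address bits 7..4
        slot = (addr >> 2) & (LONGS_PER_LINE - 1)  # address bits 3..2
        tag = addr >> 8                            # remaining upper bits
        if self.tags[index] == tag and self.valid[index][slot]:
            return True
        if self.tags[index] != tag:
            # A different block maps to this line: replace the tag and
            # invalidate the other longwords of the line.
            self.tags[index] = tag
            self.valid[index] = [False] * LONGS_PER_LINE
        self.valid[index][slot] = True
        return False


def misses_per_iteration(loop_bytes, iterations=3, base=0xE00000):
    """Run a straight-line loop body of loop_bytes bytes several times and
    return the i-cache misses of each iteration.
    base is an arbitrary (hypothetical) start address of the loop."""
    cache = ICache030()
    result = []
    for _ in range(iterations):
        misses = sum(not cache.fetch(addr)
                     for addr in range(base, base + loop_bytes, 4))
        result.append(misses)
    return result


print(misses_per_iteration(128))  # [32, 0, 0]: fits, only the first pass misses
print(misses_per_iteration(320))  # [80, 32, 32]: head and tail evict each other
```

The same 64 bytes of loop head cost nothing per pass in a short loop but miss on every pass once the loop grows past 256 bytes. That is why per-instruction cycle counts only make sense after the code has actually run.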

> I'm sorry, I should stop expecting other people to read my mind. :)
>
> This is basically what I often do: I have some code (either mine or someone
> else's) and I wonder how much I would gain by using this or that approach
> (different addressing, reordering instructions, different head/tail
> offloads etc). So I would set up a breakpoint (say at the beginning of a
> loop), look at the "timing tagged" code, let it run for one iteration and
> look again: which instructions gained the most from the cache, which
> instructions still take the most cycles, etc. All of this in the disassembly
> view (so I'm not flooded with *every* instruction, only the loop I'm
> interested in).

Just enable profiling before you run the code, and you'll have the info
in the disassembly:
--------------------------------
> profile on
Profiling enabled.
> c
Returning to emulation...
....
> profile addresses
# disassembly with profile data: <instructions percentage>% (<sum of instructions>, <sum of cycles>, <sum of i-cache misses>, <sum of d-cache hits>)
$00e007a2 : cmpi.w    #$73,d0  0.53% (10311, 0, 0, 0)
$00e007a6 : bne.s     $e007b0  0.53% (10311, 68795, 3453, 0)
$00e007a8 : jsr       $e43ed8  0.53% (10311, 240558, 20622, 0)
$00e007ae : rte                0.53% (10311, 309330, 20622, 13748)
....
> d 0x0e007a2
$00e007a2 : cmpi.w    #$73,d0  0.53% (10311, 0, 0, 0)
$00e007a6 : bne.s     $e007b0  0.53% (10311, 68795, 3453, 0)
$00e007a8 : jsr       $e43ed8  0.53% (10311, 240558, 20622, 0)
$00e007ae : rte                0.53% (10311, 309330, 20622, 13748)
--------------------------------

The only difference between these two commands is that the "d" command
lacks the explanatory heading, and it also lists instructions that
haven't been run, i.e. ones that lack cycle information.
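The numbers in parentheses are sums, so dividing them by the execution count gives per-execution averages (for example 309330 / 10311 = 30 cycles for the rte above). Below is a small sketch that does this for pasted debugger output. The regular expression is inferred from the sample lines above, not from any documented output format, and `parse_line` / `per_execution` are hypothetical helper names. Lines without a profile suffix, which "d" also prints for instructions that haven't run, are kept with `stats` set to `None`.

```python
import re

# Inferred from sample lines such as:
# $00e007a6 : bne.s     $e007b0  0.53% (10311, 68795, 3453, 0)
LINE_RE = re.compile(
    r"^\$(?P<addr>[0-9a-fA-F]+)\s*:\s*(?P<insn>.*?)"
    r"(?:\s+(?P<pct>[\d.]+)%\s+\((?P<stats>[\d,\s]+)\))?\s*$"
)


def parse_line(line):
    """Return address, instruction text and summed profile stats (or None)."""
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    entry = {"addr": int(m["addr"], 16), "insn": m["insn"], "stats": None}
    if m["stats"]:
        count, cycles, i_misses, d_hits = (int(x) for x in m["stats"].split(","))
        entry["stats"] = {"pct": float(m["pct"]), "count": count,
                          "cycles": cycles, "i_misses": i_misses,
                          "d_hits": d_hits}
    return entry


def per_execution(entry):
    """Average cycles / i-cache misses / d-cache hits per execution."""
    s = entry["stats"]
    if not s or s["count"] == 0:
        return None  # instruction never ran: nothing to average
    n = s["count"]
    return {"cycles": s["cycles"] / n, "i_misses": s["i_misses"] / n,
            "d_hits": s["d_hits"] / n}


sample = """\
$00e007a2 : cmpi.w    #$73,d0  0.53% (10311, 0, 0, 0)
$00e007a6 : bne.s     $e007b0  0.53% (10311, 68795, 3453, 0)
$00e007a8 : jsr       $e43ed8  0.53% (10311, 240558, 20622, 0)
$00e007ae : rte                0.53% (10311, 309330, 20622, 13748)"""

for line in sample.splitlines():
    e = parse_line(line)
    avg = per_execution(e)
    # e.g. the rte line averages 30.00 cycles and 2.00 i-cache misses
    print(f"${e['addr']:08x}  {e['insn']:<20} {avg['cycles']:6.2f} cycles"
          f"  {avg['i_misses']:4.2f} i-miss  {avg['d_hits']:4.2f} d-hit")
```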

If you want cycles for just one loop iteration, set breakpoints
so that the loop runs only once each time you continue.


	- Eero

> So you see, it's very fine-grained work.

> And while I agree it's not 100% related to (or even useful for) profiling,
> if it's something easy to add, I'd even maintain my own branch for it;
> that's how useful I find it. Especially with all the recalculation needed
> for the Falcon's 16-bit bus, where you have to do quite a few calculations
> by hand even if you have the numbers from the timing table (which assumes
> a 32-bit bus).
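The by-hand correction can be written as a small formula. On a 16-bit port, the 68030's dynamic bus sizing splits each 32-bit access into two word cycles, so every longword that actually reaches the bus costs one more bus cycle than the 32-bit timing table assumes. This is a simplified sketch, not Hatari's model. The clocks per bus cycle are a parameter because wait states vary. The example numbers are placeholders, not real table values. `LongBusAccesses` and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LongBusAccesses:
    """32-bit accesses of one instruction that actually reach the bus."""
    operand: int  # longword reads missing the data cache, plus all longword
                  # writes (the 68030 data cache is write-through)
    fetch: int    # instruction-cache misses, each filling one longword
                  # (assumes no burst fills)


def extra_cycles_16bit(acc: LongBusAccesses, clocks_per_bus_cycle: int) -> int:
    # Dynamic bus sizing: a longword on a 16-bit port takes two word cycles,
    # i.e. one bus cycle more than a 32-bit timing table assumes.
    return (acc.operand + acc.fetch) * clocks_per_bus_cycle


def estimate_16bit(table_cycles: int, acc: LongBusAccesses,
                   clocks_per_bus_cycle: int) -> int:
    # Ignores overlap of the extra bus cycles with internal execution and
    # misaligned accesses, so treat the result as a ballpark figure.
    return table_cycles + extra_cycles_16bit(acc, clocks_per_bus_cycle)


# Placeholder numbers, not from any timing table: an instruction listed at
# 9 cycles, with two longword operand accesses and one i-cache miss, on a
# bus assumed to need 4 clocks per bus cycle.
print(estimate_16bit(9, LongBusAccesses(operand=2, fetch=1), 4))  # 21
```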

>> Only executing the instructions with the CPU emulation code takes
>> these things into account; disassembly considers each line only in
>> isolation.

> I'm aware of that; see above for my detailed explanation. We are in
> agreement here.

>> I.e. you need to run the code for cycle information.

> Yes.