[hatari-devel] Re: Suspicious profiler output

Hi Eero,

I.e. calls to _Atari800_Frame() symbol address are 0.02% of all symbols
called within the profiling run, and profiler heuristics were able to
track 36% of all the calls to be originating under _Atari800_Frame().

And this is a wrong conclusion. In fact, 99% of all calls should be tracked there.

* When a a "subroutine call" (JSR/BSR) or "exception" instruction
switches PC to an address listed in symbol table, that is tracked
at run-time as a call in the callgraph
* When "subroutine return" or "exception return" instruction returns
PC to address from which previous subroutine was called from, accrued
run-time costs are transferred to the context of the calling symbol
* only address sites which have a symbol present *when profiling is
active* are tracked like this

This makes sense.

If symbol gets called in any other way, e.g. by jumping, as can
happen with code generated by GCC for static functions, or by stack
manipulation (e.g. EmuTOS does some function calls using RTS!), that
doesn't get recorded in as a call (except EmuTOS thing which I've
special-cased in profiler).

Hmm... so if I have a function "A" which is in assembly and within that I call function "B" which is in C trough an absolute jump (jmp instruction), A's costs wont include B's cost and B will be shown as standalone in the graph?

You need to investigate the symbol calls sites in the profiler's
annotated assembly output to find out what's happening...

While _RTI symbol address is visible in the profiler assembly, I didn't
find from the assembly anything that calls that address (although it's
executed). I.e. it's called by some other means, not through something
Hatari tracks as call from the executed assembly instruction.

See above. It's called from another assembly function trough a jump table. But this still doesn't make sense to me -- if the profiler can't differentiate it as a separate function, that's fine. But why the total cost is messed up? If the caller contains any number of jumps, those jumps should be included in the total cost of the caller then, no?

And as you can see from the assembly, _CPU_rts_handler() address
is *never* executed, only addresses slightly after that, i.e. there
really is nothing that directly calls it.

Well, it's a function pointer. So it's called from a number of places via that pointer.

It can be correct function, if something had jumped in middle of that
function, or an incorrect function, if you're lacking a symbol for
the correct one.

-> It's important that your symbol table contains all symbols.

It does.

Only thing suspicious is that by normal statistics I would expect
half of them to be switches *to* interrupt handler, not returns from
one. Based on symbol names, none of them look like interrupt
handlers, and looking at the assembly, your code doesn't have
any RTEs, only TOS code uses those.

Yep, it's 100% user space application.

Fortunately, I didn't get stuck with this problem because I can change between C and asm backends. And guess what - C backend (doing basically the same, using same symbol names etc) has 100% correct graph, i.e. Atari800_Frame() takes really 99% of all CPU cycles, all call counts are shown correctly etc. So that makes me wonder how a few symbols coming from an assembly file can mess things up so badly. I've been wondering whether vasm couldn't be a culprit here, i.e. producing some a.out symbol names incorrectly.

Anyway -- if you like, I can send you some intermediate object files and builds to compare the outputs (C vs asm backend), I was able to track down the performance bug, saved me hours of investigation. Using all those scripts is a bit awkward but the result is astonishing.

MiKRO / Mystic Bytes
http://mikro.atari.org