Re: [hatari-devel] Re: Suspicious profiler output

[ Thread Index | Date Index | More lists.tuxfamily.org/hatari-devel Archives ]


Hi Miro,

Regarding the profiled Atari800 program...
	https://atari800.github.io/

If I start Hatari with:
hatari --monitor vga --trace os_base -s 14 --machine falcon --dsp none --tos etos512k.img --fpu 68881 ./

Or with "--machine tt" (FPU included), and then:
- set resolution to 320x240@256 from GEM desktop
- start atari800.gtp with some ROM file

It asks to select cartridge module after which it either shown
blank screen, or a screen that gets constant '>' character output.

How I should use the Atari800 program?


	- Eero

PS. the Atari800 program seems a bit buggy, if I don't give it a ROM
file as argument, it complains that it runs only on real Atari OS.

On 09/20/2017 04:55 AM, Miro Kropáček wrote:
Hi Eero,

I.e. calls to _Atari800_Frame() symbol address are 0.02% of all symbols
called within the profiling run, and profiler heuristics were able to
track 36% of all the calls to be originating under _Atari800_Frame().

And this is a wrong conclusion. In fact, 99% of all calls should be tracked
there.

* When a a "subroutine call" (JSR/BSR) or "exception" instruction
   switches PC to an address listed in symbol table, that is tracked
   at run-time as a call in the callgraph
* When "subroutine return" or "exception return" instruction returns
   PC to address from which previous subroutine was called from, accrued
   run-time costs are transferred to the context of the calling symbol
* only address sites which have a symbol present *when profiling is
   active* are tracked like this

This makes sense.

If symbol gets called in any other way, e.g. by jumping, as can
happen with code generated by GCC for static functions, or by stack
manipulation (e.g. EmuTOS does some function calls using RTS!), that
doesn't get recorded in as a call (except EmuTOS thing which I've
special-cased in profiler).

Hmm... so if I have a function "A" which is in assembly and within that I
call function "B" which is in C trough an absolute jump (jmp instruction),
A's costs wont include B's cost and B will be shown as standalone in the
graph?


You need to investigate the symbol calls sites in the profiler's
annotated assembly output to find out what's happening...

While _RTI symbol address is visible in the profiler assembly, I didn't
find from the assembly anything that calls that address (although it's
executed).  I.e. it's called by some other means, not through something
Hatari tracks as call from the executed assembly instruction.

See above. It's called from another assembly function trough a jump table.
But this still doesn't make sense to me -- if the profiler can't
differentiate it as a separate function, that's fine. But why the total
cost is messed up? If the caller contains any number of jumps, those jumps
should be included in the total cost of the caller then, no?

And as you can see from the assembly, _CPU_rts_handler() address
is *never* executed, only addresses slightly after that, i.e. there
really is nothing that directly calls it.

Well, it's a function pointer. So it's called from a number of places via
that pointer.

It can be correct function, if something had jumped in middle of that
function, or an incorrect function, if you're lacking a symbol for
the correct one.

-> It's important that your symbol table contains all symbols.

It does.

    Only thing suspicious is that by normal statistics I would expect
    half of them to be switches *to* interrupt handler, not returns from
    one.  Based on symbol names, none of them look like interrupt
    handlers, and looking at the assembly, your code doesn't have
    any RTEs, only TOS code uses those.

Yep, it's 100% user space application.

Fortunately, I didn't get stuck with this problem because I can change
between C and asm backends. And guess what - C backend (doing basically the
same, using same symbol names etc) has 100% correct graph, i.e.
Atari800_Frame() takes really 99% of all CPU cycles, all call counts are
shown correctly etc. So that makes me wonder how a few symbols coming from
an assembly file can mess things up so badly. I've been wondering whether
vasm couldn't be a culprit here, i.e. producing some a.out symbol names
incorrectly.

Anyway -- if you like, I can send you some intermediate object files and
builds to compare the outputs (C vs asm backend), I was able to track down
the performance bug, saved me hours of investigation. Using all those
scripts is a bit awkward but the result is astonishing.





Mail converted by MHonArc 2.6.19+ http://listengine.tuxfamily.org/