| [hatari-devel] Re: Opinions on Hatari profiler "user interface"? |
[ Thread Index | Date Index | More lists.tuxfamily.org/hatari-devel Archives ]
Hi,
(Sorry, huge mail, but this is important for anybody interpreting
the profiler post-processor results.)
Next profiler "user interface" issue is showing the caller information
in callgraphs.
Hatari has now code for getting real cumulative costs for functions called
as subroutine calls (with m68k JSR/BSR instructions or their DSP variants),
in addition to the post-processor summing costs listed in the disassembly
and assigning them to a symbol preceeding the instructions having these
costs.
Both methods are needed because they have their own pros & cons:
1. subroutine call/return method and resulting cost information:
+ This information is accurate, *as long as* program doesn't do any
wierd stack manipulation. (I've added special handling for EmuTOS
task switcher which uses *RTS* to call the subroutines)
+ Callgrind format output for Kcachegrind GUI needs accurate data.
- This information is available in profile *only* if user loaded symbol
address information to debugger before starting profiling.
- Subroutine calls are used only for a subset of functions in optimized
code, even global functions can get called with BRA & JMP, so showing
data just for subroutine calls could miss *a lot* of the details in
the code.
2. in-between-symbols method and resulting cost information:
+/- Symbols can be anything; from functions to loop labels. I.e. single
function cost could be split to function entry, and different loops
it has, with function apparently calling loop label etc.
+ It's nice to have this detailed information on how program is structured
and where exactly the costs go. (different "call" instruction types are
indicated with different kind of arrows in callgraphs)
- With this method one cannot get total costs for higher level symbols
that call the low level functionality.
- This information can be easily misleading, not just when there are
additional symbols (for loop etc labels), but especially when symbols
are missing. Then completely unrelated costs will be assigned to
a symbol preceeding instructions that have these costs.
NOTE: always give all relevant symbols to the post-processor, regarless
of whether profile data already contains symbol information.
Current callgraphs produced by the post-processing script by default show
information from latter method, because subroutine information is optional.
If caller information was included in the profile data,
with the post-processor "-p" option:
- "top" lists show subroutine call costs
- callgraph nodes for functions that are called as subroutines
(i.e. with BSR/JSR), show subroutine information instead
(-> such nodes have different shape)
See the attached example (Bad Mood CPU side).
The problem with the "-p" option is that:
- different (looking) nodes contain different information
- subroutine costs may not match costs acquired with the in-between-symbols
method for above mentioned reasons
While having more information is good, it can be confusing to people trying
to interpret the graphs.
When it looks like there are conflicts in this data, it's an indication
that provided symbol data was incomplete, and the issue can be checked
from the profile disassembly information. Such issues are easiest to
see from "top" lists.
For example profile information of EmuTOS boot (first column is
in-between-symbols cost, second column is total from subroutine calls):
------------
Calls:
10.80% 10.83% 1800 ___mulsi3
8.03% 8.09% 1339 _memcpy
5.20% 866 _int_timerc
4.61% 768 _call_user_wheel
3.86% 3.89% 643 _min
3.76% 3.80% 627 _sti
3.76% 3.76% 627 _cli
3.34% 3.35% 556 _bcostat4
1.88% 0.78% 314 _gsx2
1.88% 23.13% 314 _screen
1.88% 25.05% 314 _GSX_ENTRY
1.88% 1.88% 313 _get_vwk_by_handle
1.87% 312 _win_sinfo
1.87% 45.38% 312 _gem
1.87% 311 _back
1.87% 37.21% 311 _super
1.86% 1.86% 310 _rs_obfix
1.70% 1.69% 283 _gettext
1.67% 23.56% 278 _gsx_ncode
1.58% 263 _int_vbl
1.47% 25.74% 245 _rsrc_obfix
1.30% 1.30% 216 _kb_timerc_int
1.30% 1.30% 216 _sndirq
Executed instructions:
61.63% 61.85% 1897542 _timeout_gpip
8.60% 9.32% 264918 _draw_rect
4.96% 4.97% 152579 _run_calibration
4.06% 4.08% 124944 _blank_out
3.22% 99134 meminit
1.86% 1.88% 57254 _abline
1.43% 1.46% 43936 _text_blt
1.30% 0.02% 39985 _get_start_addr
1.05% 32332 _act_bdown
Used cycles:
61.62% 62.05% 22773364 _timeout_gpip
5.67% 6.82% 2093624 _draw_rect
4.29% 1586108 meminit
4.15% 4.18% 1534476 _run_calibration
3.04% 3.07% 1121752 _blank_out
2.98% 3.08% 1100712 _stop_until_interrupt
1.45% 1.49% 536520 _abline
1.20% 1.25% 443000 _text_blt
1.05% 1.06% 389052 ___mulsi3
1.01% 0.03% 371820 _get_start_addr
------------
....shows total subroutine calls count for "_gsx2" being smaller
than the normal call count:
1.88% 0.78% 314 _gsx2
This is because most of the calls to it were were jumps/branches
and it itself didn't do any subroutine calls that would be added
to the total.
This can be seen from the profile disassembly output:
------------
_gsx2:
$e0347c : move.l #$1fe94,$48a8 0.01% (314, 8792, 0)
$e03486 : move.l #$48a8,d1 0.01% (314, 3768, 0)
$e0348c : moveq #$73,d0 0.01% (314, 1256, 0)
$e0348e : trap #2 0.01% (314, 10048, 0)
$e03490 : rts 0.01% (314, 5024, 0)
------------
$ grep e0347c etos-boot.txt
$e387d4 : jsr $e0347c 0.00% (22, 440, 0)
$e38822 : jsr $e0347c 0.00% (4, 80, 0)
$e38f40 : jmp $e0347c 0.00% (10, 120, 0)
$e39b0a : jmp $e0347c 0.01% (278, 3336, 0)
0xe0347c: 0xe39b0a = 278 b, 0xe387d4 = 22 s 110/2276/32864 22/462/6600,
0xe38f40 = 10 b, 0xe38822 = 4 s 20/424/6208 4/84/1200, _gsx2
------------
In the case of "_get_start_addr" instruction & cycle count:
1.30% 0.02% 39985 _get_start_addr
The in-between method accounts unrelated costs to "_get_start_addr":
------------
_get_start_addr:
$e33782 : move.l d2,-(sp) 0.00% (24, 288, 0)
$e33784 : move.w 8(sp),d2 0.00% (24, 288, 0)
$e33788 : andi.w #$fff0,d2 0.00% (24, 192, 0)
$e3378c : moveq #0,d0 0.00% (24, 96, 0)
$e3378e : move.w $15fa,d0 0.00% (24, 384, 0)
$e33794 : lea $e0aba0,a0 0.00% (24, 288, 0)
$e3379a : move.b (a0,d0.l),d0 0.00% (24, 384, 0)
$e3379e : ext.w d0 0.00% (24, 96, 0)
$e337a0 : lsr.w d0,d2 0.00% (24, 192, 0)
$e337a2 : andi.l #$ffff,d2 0.00% (24, 384, 0)
$e337a8 : add.l $44e,d2 0.00% (24, 576, 0)
$e337ae : moveq #0,d0 0.00% (24, 96, 0)
$e337b0 : move.w $15fc,d0 0.00% (24, 384, 0)
$e337b6 : move.l d0,-(sp) 0.00% (24, 288, 0)
$e337b8 : movea.w $e(sp),a0 0.00% (24, 288, 0)
$e337bc : move.l a0,-(sp) 0.00% (24, 288, 0)
$e337be : jsr $e60790 0.00% (24, 480, 0)
$e337c4 : addq.l #8,sp 0.00% (24, 192, 0)
$e337c6 : add.l d2,d0 0.00% (24, 192, 0)
$e337c8 : move.l (sp)+,d2 0.00% (24, 288, 0)
$e337ca : rts 0.00% (24, 384, 0)
[...]
$e337ce : lea $ffec(sp),sp 0.00% (11, 88, 0)
....
$e33c60 : move d0,sr 0.00% (72, 864, 0)
$e33c62 : rts 0.00% (72, 1152, 0)
[...]
__v_show_c:
$e33dd6 : movea.l $1602,a0 0.00% (12, 240, 0)
------------
$ grep -A2 _get_start_addr etos512k.sym
0x00e33782 T _get_start_addr
0x00e33c64 T _v_locator
0x00e33dd6 T __v_show_c
-------------
I.e. the actual "_get_start_addr" function is in $e33782-$e337ca, but
because EmuTOS symbols are missing for code between $e337ce-$e33c62,
(large) costs for that area get assinged for "_get_start_addr"
with the in-between-symbols method.
My question is whether you think there's something I can do to make all
this clearer to the users of the profiler information?
- Eero
PS. Note that while information needed for in-between-symbols is in
the profile data, post-processor cannot produce graphs that would contain
just subroutine call information. Adding support for that that would
require changes also to profile data produced by Hatari, and some option
in profiler to select what kind data is collected...
Attachment:
bmsym-2-crop.pdf
Description: Adobe PDF document
| Mail converted by MHonArc 2.6.19+ | http://listengine.tuxfamily.org/ |