[hatari-devel] Re: Opinions on Hatari profiler "user interface"? |
[ Thread Index | Date Index | More lists.tuxfamily.org/hatari-devel Archives ]
Hi, (Sorry, huge mail, but this is important for anybody interpreting the profiler post-processor results.) Next profiler "user interface" issue is showing the caller information in callgraphs. Hatari has now code for getting real cumulative costs for functions called as subroutine calls (with m68k JSR/BSR instructions or their DSP variants), in addition to the post-processor summing costs listed in the disassembly and assigning them to a symbol preceeding the instructions having these costs. Both methods are needed because they have their own pros & cons: 1. subroutine call/return method and resulting cost information: + This information is accurate, *as long as* program doesn't do any wierd stack manipulation. (I've added special handling for EmuTOS task switcher which uses *RTS* to call the subroutines) + Callgrind format output for Kcachegrind GUI needs accurate data. - This information is available in profile *only* if user loaded symbol address information to debugger before starting profiling. - Subroutine calls are used only for a subset of functions in optimized code, even global functions can get called with BRA & JMP, so showing data just for subroutine calls could miss *a lot* of the details in the code. 2. in-between-symbols method and resulting cost information: +/- Symbols can be anything; from functions to loop labels. I.e. single function cost could be split to function entry, and different loops it has, with function apparently calling loop label etc. + It's nice to have this detailed information on how program is structured and where exactly the costs go. (different "call" instruction types are indicated with different kind of arrows in callgraphs) - With this method one cannot get total costs for higher level symbols that call the low level functionality. - This information can be easily misleading, not just when there are additional symbols (for loop etc labels), but especially when symbols are missing. Then completely unrelated costs will be assigned to a symbol preceeding instructions that have these costs. NOTE: always give all relevant symbols to the post-processor, regarless of whether profile data already contains symbol information. Current callgraphs produced by the post-processing script by default show information from latter method, because subroutine information is optional. If caller information was included in the profile data, with the post-processor "-p" option: - "top" lists show subroutine call costs - callgraph nodes for functions that are called as subroutines (i.e. with BSR/JSR), show subroutine information instead (-> such nodes have different shape) See the attached example (Bad Mood CPU side). The problem with the "-p" option is that: - different (looking) nodes contain different information - subroutine costs may not match costs acquired with the in-between-symbols method for above mentioned reasons While having more information is good, it can be confusing to people trying to interpret the graphs. When it looks like there are conflicts in this data, it's an indication that provided symbol data was incomplete, and the issue can be checked from the profile disassembly information. Such issues are easiest to see from "top" lists. For example profile information of EmuTOS boot (first column is in-between-symbols cost, second column is total from subroutine calls): ------------ Calls: 10.80% 10.83% 1800 ___mulsi3 8.03% 8.09% 1339 _memcpy 5.20% 866 _int_timerc 4.61% 768 _call_user_wheel 3.86% 3.89% 643 _min 3.76% 3.80% 627 _sti 3.76% 3.76% 627 _cli 3.34% 3.35% 556 _bcostat4 1.88% 0.78% 314 _gsx2 1.88% 23.13% 314 _screen 1.88% 25.05% 314 _GSX_ENTRY 1.88% 1.88% 313 _get_vwk_by_handle 1.87% 312 _win_sinfo 1.87% 45.38% 312 _gem 1.87% 311 _back 1.87% 37.21% 311 _super 1.86% 1.86% 310 _rs_obfix 1.70% 1.69% 283 _gettext 1.67% 23.56% 278 _gsx_ncode 1.58% 263 _int_vbl 1.47% 25.74% 245 _rsrc_obfix 1.30% 1.30% 216 _kb_timerc_int 1.30% 1.30% 216 _sndirq Executed instructions: 61.63% 61.85% 1897542 _timeout_gpip 8.60% 9.32% 264918 _draw_rect 4.96% 4.97% 152579 _run_calibration 4.06% 4.08% 124944 _blank_out 3.22% 99134 meminit 1.86% 1.88% 57254 _abline 1.43% 1.46% 43936 _text_blt 1.30% 0.02% 39985 _get_start_addr 1.05% 32332 _act_bdown Used cycles: 61.62% 62.05% 22773364 _timeout_gpip 5.67% 6.82% 2093624 _draw_rect 4.29% 1586108 meminit 4.15% 4.18% 1534476 _run_calibration 3.04% 3.07% 1121752 _blank_out 2.98% 3.08% 1100712 _stop_until_interrupt 1.45% 1.49% 536520 _abline 1.20% 1.25% 443000 _text_blt 1.05% 1.06% 389052 ___mulsi3 1.01% 0.03% 371820 _get_start_addr ------------ ....shows total subroutine calls count for "_gsx2" being smaller than the normal call count: 1.88% 0.78% 314 _gsx2 This is because most of the calls to it were were jumps/branches and it itself didn't do any subroutine calls that would be added to the total. This can be seen from the profile disassembly output: ------------ _gsx2: $e0347c : move.l #$1fe94,$48a8 0.01% (314, 8792, 0) $e03486 : move.l #$48a8,d1 0.01% (314, 3768, 0) $e0348c : moveq #$73,d0 0.01% (314, 1256, 0) $e0348e : trap #2 0.01% (314, 10048, 0) $e03490 : rts 0.01% (314, 5024, 0) ------------ $ grep e0347c etos-boot.txt $e387d4 : jsr $e0347c 0.00% (22, 440, 0) $e38822 : jsr $e0347c 0.00% (4, 80, 0) $e38f40 : jmp $e0347c 0.00% (10, 120, 0) $e39b0a : jmp $e0347c 0.01% (278, 3336, 0) 0xe0347c: 0xe39b0a = 278 b, 0xe387d4 = 22 s 110/2276/32864 22/462/6600, 0xe38f40 = 10 b, 0xe38822 = 4 s 20/424/6208 4/84/1200, _gsx2 ------------ In the case of "_get_start_addr" instruction & cycle count: 1.30% 0.02% 39985 _get_start_addr The in-between method accounts unrelated costs to "_get_start_addr": ------------ _get_start_addr: $e33782 : move.l d2,-(sp) 0.00% (24, 288, 0) $e33784 : move.w 8(sp),d2 0.00% (24, 288, 0) $e33788 : andi.w #$fff0,d2 0.00% (24, 192, 0) $e3378c : moveq #0,d0 0.00% (24, 96, 0) $e3378e : move.w $15fa,d0 0.00% (24, 384, 0) $e33794 : lea $e0aba0,a0 0.00% (24, 288, 0) $e3379a : move.b (a0,d0.l),d0 0.00% (24, 384, 0) $e3379e : ext.w d0 0.00% (24, 96, 0) $e337a0 : lsr.w d0,d2 0.00% (24, 192, 0) $e337a2 : andi.l #$ffff,d2 0.00% (24, 384, 0) $e337a8 : add.l $44e,d2 0.00% (24, 576, 0) $e337ae : moveq #0,d0 0.00% (24, 96, 0) $e337b0 : move.w $15fc,d0 0.00% (24, 384, 0) $e337b6 : move.l d0,-(sp) 0.00% (24, 288, 0) $e337b8 : movea.w $e(sp),a0 0.00% (24, 288, 0) $e337bc : move.l a0,-(sp) 0.00% (24, 288, 0) $e337be : jsr $e60790 0.00% (24, 480, 0) $e337c4 : addq.l #8,sp 0.00% (24, 192, 0) $e337c6 : add.l d2,d0 0.00% (24, 192, 0) $e337c8 : move.l (sp)+,d2 0.00% (24, 288, 0) $e337ca : rts 0.00% (24, 384, 0) [...] $e337ce : lea $ffec(sp),sp 0.00% (11, 88, 0) .... $e33c60 : move d0,sr 0.00% (72, 864, 0) $e33c62 : rts 0.00% (72, 1152, 0) [...] __v_show_c: $e33dd6 : movea.l $1602,a0 0.00% (12, 240, 0) ------------ $ grep -A2 _get_start_addr etos512k.sym 0x00e33782 T _get_start_addr 0x00e33c64 T _v_locator 0x00e33dd6 T __v_show_c ------------- I.e. the actual "_get_start_addr" function is in $e33782-$e337ca, but because EmuTOS symbols are missing for code between $e337ce-$e33c62, (large) costs for that area get assinged for "_get_start_addr" with the in-between-symbols method. My question is whether you think there's something I can do to make all this clearer to the users of the profiler information? - Eero PS. Note that while information needed for in-between-symbols is in the profile data, post-processor cannot produce graphs that would contain just subroutine call information. Adding support for that that would require changes also to profile data produced by Hatari, and some option in profiler to select what kind data is collected...
Attachment:
bmsym-2-crop.pdf
Description: Adobe PDF document
Mail converted by MHonArc 2.6.19+ | http://listengine.tuxfamily.org/ |