Re: [hatari-devel] Profiler question

[ Thread Index | Date Index | More lists.tuxfamily.org/hatari-devel Archives ]


Hi!

I've got a problem to get the profiler showing some reasonable data. I've compiled the app by vasm, exported symbols, followed Hatari manual... when I want to finally see something:

What format are you using to import symbols? 

There are two - one can load the symbols direct from the PRG/TOS if the file was built with GST/DRI (?) format. This did workwell for me from Devpac, I'm not sure I tried it with VASM, since I really only used VASM to generate GCC ".a" format linkables to use with a C project...

The other format is NM format (ASCII formatted symbol list), which must be exported using the NM tool from GCC's binutils. It's specifically for GCC-generated code. I use this one for BadMood.

When you import symbols from NM format text files, you have to specify the names of the text, data, bss sections the symbols are relative to. e.g.

symbols test.sym TEXT DATA BSS

TBH I had a lot of trouble with this, especially when mixing C with asm. Sometimes you get nonsense symbol addresses if you specify the DATA BSS fields, and only get sensible results with TEXT alone. I haven't looked into the reasons for this yet but it's probably due to differences in section naming.


[...]
- 0x282d8: dp_calc_sam (return = 0x28268)
- 0x282d8: dp_calc_sam (return = 0x28264)
- 0x282d8: dp_calc_sam (return = 0x28260)
- 0x282d8: dp_calc_sam (return = 0x2825c)
- 0x282a8: dp_swap_buffers (return = 0x2820a)
Allocated CPU profile address buffer (13 KB).
Normal RAM (0-0xE00000):
- active address range:
  0x01fa22-0x03933c
- active instruction addresses:
  2873 (100.00% of all)
- executed instructions:
  17700459 (100.00% of all)
- instruction cache misses:
  270156 (100.00% of all)
- used cycles:
  170191864 (100.00% of all)
  = 10.60882s
ROM TOS (0xE00000-0xE80000):
- no activity
Cartridge ROM (0xFA0000-0xFC0000):
- no activity

= 10.60882s

Cache misses per instruction, number of occurrences:
- 0: 17451036
- 1: 235377
- 2: 7359
- 3: 6687

CPU=$3262a, VBL=5633, FrameCycles=74, HBL=0, LineCycles=74, DSP=$537
> profile counts 8
addr:           count:
0x0387d2         3.59%  636015  DC.W      $b090
0x0387d4         3.59%  636015  DC.W      $67fc
0x037e58         2.60%  459881  DC.W      $0828
0x037e5e         2.60%  459881  DC.W      $660a
0x037e60         2.60%  459792  DC.W      $33fc
0x037e68         2.60%  459792  DC.W      $60ee
0x033616         1.45%  256320  DC.W      $3225
0x033618         1.45%  256320  DC.W      $6708
8 CPU addresses listed.

1. I can't believe there can be 100% instruction cache misses, that is just nonsense

I don't think that means your code has 100% miss rate. It just means the profiler was able to account for all of the misses which did occur, based on cross-referenced sums. I'm sure you can ignore it (ask Eero for details).

It's worth mentioning that there are 2 separate levels of profile analysis from Hatari. The first level is the 'raw' annotated disassembly, which just lists the whole program disassembly with hit sums, cycle sums and cache miss sums for every address encountered. You get this data by doing this:

> profile save myprofile.txt

...and then you can read the disassembly with all that data attached. If you find the disassembly contains lots of garbage data, it generally means some executed code has been replaced by data while the program was running. Note that the profile disassembly - unlike a normal disassembly - will only include instructions which were actually used. So there will be gaps [...] between functions for unused code, or where branches are always taken to skip very rare special-case code.

The second level of profile analysis is a python script which reads this raw data, parses it and turns it into 'classic' profiler results and callgraphs organised by symbols (assuming symbols are present - it's useless without them, unlike the raw disassembly profile which is usable with or without symbols).

It's better to experiment first with the raw profiled disassembly listing to get a feel for whats happening.
 

2. most used stuff doesn't make any sense, too

Hatari console doesn't report any errors, I see the symbols are imported, everything all right. Am I missing something obvious here?

It's certainly possible the symbols are not properly 'rebased' relative to your text section. This will cause them all to have wrong addresses (See my notes above).

The fact you are getting DC.W disassembly for the hottest addresses is very strange! This would normally only happen if the memory is being reused by more than one bit of code and/or data. The profiler is a static process which monitors activity at fixed addresses - it expects code remains in one place, and is not replaced or moved after it executes.

I can't think of any other reason you'd get DC.W as profiled code (I haven't seen it happen myself).....

D.



Mail converted by MHonArc 2.6.19+ http://listengine.tuxfamily.org/