Re: [hatari-devel] Hatari data cache tests
On Thursday 18 June 2015, Nicolas Pomarède wrote:
> On 18/06/2015 18:19, Douglas Little wrote:
> > I did get some warnings like this during recording, which I haven't
> > investigated yet....
> > ERROR: trying to add costs to non-existing 0x1012480 caller of 0x10114e8!
> > ERROR: trying to add costs to non-existing 0x101242c caller of 0x10114e8!
> > ERROR: trying to add costs to non-existing 0x1012480 caller of 0x10114e8!
> > ERROR: trying to add costs to non-existing 0x101242c caller of 0x10114e8!
> > ERROR: trying to add costs to non-existing 0x1012480 caller of 0x10114e8!
If you had asserts enabled at build time, this would trigger
an assert in Hatari, as it has detected an inconsistency in the call graph.
Can you provide a test case that triggers it?
> > I also noticed that trying to profile code with TT ram allocated
> > doesn't work very well as it slows my test down to less than 1fps, and
> > can halt for several seconds at a time. While it's probably reasonable
> > that it could get slower as the memory footprint increases, it seems
> > to be happening in a very nonlinear way and the actual amount of code
> > being executed remains the same. The code is executing from TT ram at
> > the time, so perhaps that has something to do with it.
> The error message is related to the profiler, and the slowdown you
> noticed might also be due to the profiler collecting some stats in a
> linear way (instead of using a dichotomic (binary) search or something
> similar to search a huge list).
> Eero will certainly know better how this works.
There aren't linear searches for anything.
However, when debug symbols are loaded, the profiler automatically
does call graph tracking, which can be very heavy.
Call tracking is done for every symbol: on every instruction,
the profiler does a binary search to check whether the current address
matches any of the tracked call addresses.
That slowdown should be constant though. If it's not, the problem
is probably very frequent symbol address matches, as on each match
the profiler allocates & records call stack information.
A tight loop on a symbol address, on both the CPU & DSP side at the
same time, would probably be the worst case.
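
To illustrate the per-instruction lookup described above, here is a
minimal Python sketch. It is not Hatari's actual (C) code: the name
is_tracked and all addresses are made up for illustration. It only
shows why the search itself is cheap, while a tight loop on a tracked
address produces a match (and thus call stack bookkeeping) on every
iteration.

```python
from bisect import bisect_left

def is_tracked(sorted_addrs, pc):
    """Binary search: is pc one of the tracked symbol addresses?"""
    i = bisect_left(sorted_addrs, pc)
    return i < len(sorted_addrs) and sorted_addrs[i] == pc

# Hypothetical tracked symbol addresses, kept sorted.
tracked = sorted([0x1000, 0x1200, 0x1480])

# Hypothetical instruction trace: a 3-instruction loop whose first
# instruction sits on a symbol address, executed three times.
trace = [0x1200, 0x1204, 0x1208] * 3

# Every match is where a profiler would have to do extra work.
matches = sum(1 for pc in trace if is_tracked(tracked, pc))
```

Here every loop iteration hits the symbol at 0x1200, so the number of
matches grows with the iteration count rather than with the number of
functions called.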
With traditional DRI/GST symbol data, that's not a problem, as only
functions get such symbols. GCC can also generate symbols for loop
addresses, but those should be removed on debugger symbol import
(they can be detected by name).
So, the problem is probably with manually written assembly code.
You'll want to grep out loop addresses from the symbol list.
Besides unnecessarily slowing down the call-graph tracking, they
also cause messy & misleading call graphs.
-> I need to add a note about this to the Hatari manual.
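
If plain grep is not convenient, the filtering could also be done with
a few lines of Python. This is only a sketch under assumptions: it
expects an ASCII symbol list with "<hex address> <type> <name>" on each
line, and the LOOP_LABEL pattern is a hypothetical naming convention
that you would need to adapt to the label names in your own code. The
symbol names and addresses are invented.

```python
import re

# Hypothetical naming patterns for loop labels; adjust to your own code.
LOOP_LABEL = re.compile(r"(^\.L\d+$)|loop", re.IGNORECASE)

def drop_loop_labels(lines):
    """Keep only symbol lines whose name does not look like a loop label."""
    kept = []
    for line in lines:
        parts = line.split()
        # Assumed layout: <hex address> <type> <name>
        if len(parts) == 3 and LOOP_LABEL.search(parts[2]):
            continue
        kept.append(line)
    return kept

# Invented sample symbol list.
symbols = [
    "00001000 T _main",
    "00001200 T _render",
    "00001240 t .render_loop",
    "00001480 t .L12",
]
filtered = drop_loop_labels(symbols)
```

Lines that do not have the assumed three fields are passed through
unchanged, so comments or blank lines in the list are not lost.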
Each symbol is listed as a node in the graph. With loop labels,
function costs get split into:
- a function node containing costs for code before the loop label, and
- a loop node containing costs for code after the loop label.
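
The split can be shown with a simplified model where each instruction's
cost is attributed to the nearest preceding symbol. This is just an
illustration in Python, not how Hatari or its post-processing actually
does the accounting; the function name, symbol names, addresses and
costs are all made up.

```python
from bisect import bisect_right

def costs_per_symbol(symbols, samples):
    """Attribute each (address, cost) sample to the nearest preceding symbol."""
    addrs = sorted(symbols)
    totals = {symbols[a]: 0 for a in addrs}
    for pc, cost in samples:
        i = bisect_right(addrs, pc) - 1
        if i >= 0:  # ignore addresses before the first symbol
            totals[symbols[addrs[i]]] += cost
    return totals

# Hypothetical function: short setup code, then an expensive loop.
samples = [(0x1000, 4), (0x1004, 8), (0x1010, 100), (0x1014, 200)]

# With a loop label in the symbol list, the costs are split in two.
with_loop = costs_per_symbol({0x1000: "render", 0x1010: "render_loop"}, samples)

# Without it, the whole cost lands on the function.
without_loop = costs_per_symbol({0x1000: "render"}, samples)
```

In this example most of the cost ends up under the loop label, so the
function itself looks cheap in the graph unless the label is removed.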