Re: [hatari-devel] Very slow emulation when enabling Cycle Exact

Hi,

On 02/14/2018 08:22 PM, Nicolas Pomarède wrote:
On 14/02/2018 at 19:03, Jerome Vernet wrote:
On 13/02/2018 at 22:34, Eero Tamminen wrote:
During bootup, CPU utilization was somewhat higher.

I profiled it with Valgrind Callgrind tool.  Attached is
a callgraph of where most of the PC CPU *instructions* are spent
according to it.  Cache prefill emulation seems to cost a lot.

Yeah, that's what I can see here: fill_prefetch_030, fill_prefetch_030_ntx are using most of the CPU (about 70 %).

[Attached image: profiler]

Nice profiler, is this an Apple tool?

There are tools that can provide somewhat similar results
on Linux too.  On the proprietary side there's Intel VTune
(requires its own kernel module), there are quite a few
tracing tools, and for sampling based CPU usage profiling
there are Valgrind callgrind/cachegrind + interactive
KCachegrind GUI, and stuff based on top of "perf".

(Valgrind is an x86 CPU emulator for debugging all kinds of
things without need to recompile programs for profiling.)


For the callgraph I provided...

To collect profiling data:
http://valgrind.org/docs/manual/cl-manual.html

To view it with interactive callgraphs etc., you can use
KCachegrind:
http://kcachegrind.sourceforge.net/
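As a rough illustration of the collection step, here is a minimal Python sketch that builds the command line for running a program under Callgrind. The helper name `callgrind_command` is hypothetical, and running plain `hatari` with `--machine falcon` is only an assumed example invocation; `--tool=callgrind` and `--callgrind-out-file` are standard Valgrind options, and `%p` in the output name is expanded to the process ID.

```python
import shlex


def callgrind_command(program, program_args=(), out_file="callgrind.out.%p"):
    """Return the argv list for running `program` under Valgrind's Callgrind.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    return [
        "valgrind",
        "--tool=callgrind",
        f"--callgrind-out-file={out_file}",  # %p is replaced by the PID
        program,
        *program_args,
    ]


if __name__ == "__main__":
    # Assumed example invocation; adjust the program and its options.
    argv = callgrind_command("hatari", ["--machine", "falcon"])
    print(shlex.join(argv))
    # To actually profile, pass argv to subprocess.run(argv, check=True),
    # then open the resulting callgrind.out.<pid> file in KCachegrind.
```

Expect the program to run many times slower than normal under Valgrind, so a short, focused session is usually enough.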

Here are some screenshots of it:
https://www.google.fi/search?q=kcachegrind&tbm=isch


Note: profiling data produced by the Hatari profiler can also
be exported to callgrind format, so that one can interactively
browse it with KCachegrind.  The format isn't fully supported,
so it's not quite as nice as PC profiling info though.
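For a quick look at such a file without a GUI, the callgrind text format can also be summarised with a few lines of Python. The sketch below is only an assumption-laden illustration, not a description of Hatari's exact output: it assumes a single position column, takes the first event column as the cost, and handles the `(id) name` compression for `fn=` and `cfn=` lines. The function name `self_costs` is made up for this example.

```python
import re
from collections import defaultdict

# Matches compressed names such as "(12) some_function" or just "(12)".
_COMPRESSED = re.compile(r"^\((\d+)\)(?:\s+(.*))?$")


def _resolve(value, names):
    """Expand a possibly compressed name, remembering new definitions."""
    match = _COMPRESSED.match(value)
    if not match:
        return value
    ident, name = match.groups()
    if name:
        names[ident] = name
    return names.get(ident, f"({ident})")


def self_costs(lines):
    """Sum the first event column per function, excluding callee costs."""
    names = {}
    costs = defaultdict(int)
    current = None
    skip_next = False  # the cost line after "calls=" is the callee's cost
    for raw in lines:
        line = raw.strip()
        if line.startswith("fn="):
            current = _resolve(line[3:], names)
        elif line.startswith("cfn="):
            _resolve(line[4:], names)  # only record the name mapping
        elif line.startswith("calls="):
            skip_next = True
        elif line and line[0] in "0123456789+-*":
            if skip_next:
                skip_next = False
            elif current is not None:
                fields = line.split()
                if len(fields) >= 2:
                    costs[current] += int(fields[1])
    return dict(costs)


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as handle:
        totals = self_costs(handle)
    for name, cost in sorted(totals.items(), key=lambda kv: -kv[1])[:10]:
        print(f"{cost:>12}  {name}")
```

Run as a script with the exported file as its only argument, it prints the ten functions with the highest self cost.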


	- Eero

I can't see any SDL function in the profiler at all, so SDL now accounts for a negligible part of the performance cost. BUT I found something interesting: the second thread at 50% is not Hatari, but an Apple thread between Hatari's audio part and Mac OS X, called com.apple.audio.IOTHreadClient. This thread uses 50% of the CPU, even if there is no sound playing.
In any other mode (ST/TT/...), this thread uses less than 1% of the CPU...

The difference between ST and Falcon is that the Falcon has an optional sound recording capability (enabled when the PortAudio library is found).

Could you try building without PortAudio to see if this lowers the CPU usage of this extra thread?

Nicolas





