Re: [hatari-devel] Very slow emulation when enabling Cycle Exact
Hi,

On 02/13/2018 07:10 PM, Jerome Vernet wrote:
> On 13/02/2018 at 17:58, Nicolas Pomarède wrote:
>> It is required for the most accurate emulation, especially for some demos that require perfect sync between CPU and DSP. But at the moment, cycle exact mode is not perfect either, some instructions don't have the right timing yet. But it's still more accurate than when using "prefetch mode" for example.
>
> In fact, disabling MMU emulation while cycle exact (and prefetch mode) are enabled keeps it usable. Just about 90% CPU in TOS alone.
On my old (2010) 3 GHz i3, Hatari with cycle exact mode enabled...

TOS v4 idling in desktop takes:
- 80-85% CPU with DSP enabled, regardless of MMU & CPU exact settings
- 70% CPU with DSP emu disabled

This was according to "top", which doesn't take into account the frequency at which the CPU is running.

DSP can take a *lot* more CPU when it's heavily used. In the TOS desktop it's running just an idle loop, that's why there's only a 10-15% difference in CPU usage.

During bootup, CPU utilization was somewhat higher. I profiled it with the Valgrind Callgrind tool. Attached is a callgraph of where most of the PC CPU *instructions* are spent according to it. Cache prefill emulation seems to cost a lot.

NOTE: While callgrind gives a good indication of where a program might be spending its time, it's quite inaccurate and maybe more interesting for finding out how functions get called in a given use-case. This is because the instruction count can differ *a lot* from what actually takes time (CPU cycles), as it doesn't consider the impact of cache & instruction pipelining. (E.g. disabling tracing, which is visible in the callgraph, didn't noticeably reduce the Hatari CPU usage reported by "top".)

EmuTOS v0.9.9.1 512k version idling in desktop takes:
- 40% CPU regardless of DSP / MMU setting
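To make the caveat about "top" concrete: its percentage is essentially CPU time consumed divided by wall-clock time elapsed, so the same workload shows a higher figure when the core is clocked down. The following is only a minimal sketch of that ratio, not how top itself is implemented; the function names cpu_percent and measure are made up for illustration.

```python
import time


def cpu_percent(cpu_seconds, wall_seconds):
    """Share of one core kept busy over an interval, as a percentage.

    This is a plain time ratio: it says nothing about the clock
    frequency the core was running at during the interval.
    """
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return 100.0 * cpu_seconds / wall_seconds


def measure(work):
    """Run work() and return (cpu_seconds, wall_seconds) for this process."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work()
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used, wall_used


if __name__ == "__main__":
    # Hypothetical workload: a short busy loop.
    cpu_used, wall_used = measure(lambda: sum(range(1_000_000)))
    print(f"{cpu_percent(cpu_used, wall_used):.0f}% of one core")
```

For example, 0.8 s of CPU time over a 1 s interval gives 80%, whether the core ran at 3 GHz or was throttled to half that.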
> There are 2 cores used, both at about 50%, so things can be improved.
Hatari is single threaded, but SDL audio handling uses an extra thread. I.e. the 2 core usage is probably from your OS ping-ponging the process between two cores, which can be part of the problem.

In Linux you can bind a process to a single core with:
  taskset 0x1 <program>

On MacOS, SDL has traditionally been a performance hog. While that improved with v2, maybe it's still somewhat of an issue.

	- Eero
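For reference, the 0x1 argument to taskset is a bit mask in which bit N selects core N, so 0x1 means "core 0 only". The sketch below shows that decoding and the same kind of pinning done from inside a process on Linux. It is an illustration under assumptions: mask_to_cpus and pin_current_process are hypothetical helper names, and os.sched_setaffinity is only available on some platforms (Linux among them).

```python
import os


def mask_to_cpus(mask):
    """Translate a taskset-style bit mask into a set of core indices."""
    if mask <= 0:
        raise ValueError("mask must select at least one core")
    return {bit for bit in range(mask.bit_length()) if (mask >> bit) & 1}


def pin_current_process(mask):
    """Restrict the calling process to the cores selected by mask.

    Returns the affinity set reported by the OS afterwards.
    """
    if not hasattr(os, "sched_setaffinity"):
        raise OSError("CPU affinity is not available on this platform")
    os.sched_setaffinity(0, mask_to_cpus(mask))  # pid 0 = calling process
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    print(mask_to_cpus(0x1))  # the mask used with taskset above
```

A mask of 0x3 would allow cores 0 and 1, and 0x5 cores 0 and 2.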
Description: PNG image