|Re: [hatari-devel] Falcon Sound|
On 12/10/2016 at 23:17, Eero Tamminen wrote:
On 10/12/2016 09:07 AM, Anders Eriksson wrote:
But rather amusingly, my 8-core 2.2 GHz Xeon system can't emulate the DSP
fast enough. Small DSP programs are ok (Protracker style players) but
stuff like ACE and Graoumf is a no-no. 7 cores are idle, 1 core
overflows :) I wish that some day the different parts of the Falcon
emulation could each be put in their own thread (CPU, DSP, Videl..).
Hatari DSP emulation came originally from Aranym and it *was* threaded.
Threading makes sense when the CPU & DSP cores can run large
parts independently and every emulated program does explicit
synchronization. However, that turned out not to be the case
with many of Falcon's DSP-using programs, so threading was removed.
Programs expect things "just to happen" at the right speed without explicit
synchronization, and that can be guaranteed only when DSP core
instructions are emulated in lock-step with each CPU-side instruction.
Adding threading and locking on top of that, to be able to run a few DSP
instructions at a time on a separate core, wouldn't help much and could
even slow things down.
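
To make "lock-step" concrete, here is a minimal toy sketch, not Hatari's actual code. It assumes the Falcon's clock ratio (68030 at 16 MHz, DSP56001 at 32 MHz, so 2 DSP cycles per CPU cycle) and, as a simplification, that every DSP instruction costs 2 DSP cycles. The function and parameter names are made up for illustration.

```python
def run_lockstep(cpu_instruction_cycles, dsp_cycles_per_cpu_cycle=2,
                 dsp_cycles_per_instruction=2):
    """Interleave DSP emulation with each emulated CPU instruction.

    cpu_instruction_cycles: cycle cost of each CPU instruction, in order.
    Returns a list of (cpu_clock, dsp_clock) pairs, one per CPU instruction.
    The default ratios are assumptions for this sketch.
    """
    cpu_clock = 0
    dsp_clock = 0
    timeline = []
    for cost in cpu_instruction_cycles:
        # Emulate one CPU instruction.
        cpu_clock += cost
        # Immediately let the DSP catch up to the same point in emulated time,
        # so neither core ever runs ahead of the other by more than one step.
        target = cpu_clock * dsp_cycles_per_cpu_cycle
        while dsp_clock < target:
            dsp_clock += dsp_cycles_per_instruction  # one DSP instruction
        timeline.append((cpu_clock, dsp_clock))
    return timeline
```

With only a handful of DSP instructions executed between two CPU instructions, there is very little work in each step that a second host thread could take over.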
Once Hatari has been verified to have cycle-accurate 030 (including
instruction and data cache hits/misses), FPU and DSP emulation,
maybe we could consider looking into threading the DSP again. That's still
far off though.
Maybe Aranym used threading for the DSP, but was it more of a proof of concept?
I really doubt that it had no big negative impact on emulation; doing
context switches in the OS for every emulated CPU/DSP opcode would be
really time-consuming.
At best, you could imagine running the DSP alone as long as it uses only
its own RAM and doesn't send/receive data to/from the 68030 CPU. Then the DSP
could run in its own thread for longer, minimizing the thread switches.
But even so, if you consider MP2/MP3 replays or 3D code as used by DML
in "030", then the number of consecutive DSP instructions you could run
without needing to synchronize with the CPU would be rather limited (10
or 20 in one go maybe), still requiring lots of context switches.
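
As a rough back-of-the-envelope sketch of why that is still a lot: assuming a 32 MHz DSP56001 with 2 clock cycles per instruction (a simplification), batches of 10 to 20 instructions mean around a million synchronizations per emulated second. The handoff cost used below is a purely hypothetical figure, not a measurement, and the function names are only illustrative.

```python
DSP_CLOCK_HZ = 32_000_000        # Falcon DSP56001 clock
CYCLES_PER_DSP_INSTRUCTION = 2   # assumption: every instruction takes 2 cycles


def syncs_per_emulated_second(batch_size):
    """Number of CPU<->DSP synchronizations per emulated second when the DSP
    can run `batch_size` instructions between two sync points."""
    instructions_per_second = DSP_CLOCK_HZ // CYCLES_PER_DSP_INSTRUCTION
    return instructions_per_second // batch_size


def handoff_time_per_emulated_second(batch_size, handoff_cost_ns):
    """Host seconds spent only on thread handoffs per emulated second.
    `handoff_cost_ns` is a hypothetical per-handoff cost in nanoseconds."""
    return syncs_per_emulated_second(batch_size) * handoff_cost_ns / 1_000_000_000
```

For example, `syncs_per_emulated_second(10)` gives 1,600,000 and `syncs_per_emulated_second(20)` gives 800,000. If a handoff cost 1000 ns, batches of 10 would burn about 1.6 host seconds per emulated second on synchronization alone, before emulating anything.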