Re: [hatari-devel] DSP performance
On 26/06/2015 13:37, Douglas Little wrote:
Just a quick note - perhaps someone can have a look and confirm that the
DSP in Hatari is correctly counting cycles w.r.t. the emulation master
On closer inspection of my profile results (not using Hatari's profiler
but rather using an MFP based profiler and comparing Hatari with HW) I
see what looks like a 50% performance benefit on Hatari's side.
i.e. the DSP seems to be clocked at 50MHz or something like that. This
sort of explains some of the performance differences I noticed when I
picked up Hatari 1.9 and also explains why I didn't run into any timing
issues straight away - a faster DSP hides most of the types of issues I
would have hit.
Basically what I see is that the CPU timings appear fairly similar to
Hatari up to v1.8 - most functions have comparable durations with real
HW, perhaps a bit too favourable in places - but the DSP performance has
magically gone up by a big margin. One of the DSP-only routines (when
serialized for measurement) records 15.5ms on real HW, but only 9.5ms in
Hatari.
Anyway I haven't had time to confirm this 100% other than via the
profiler I mentioned - it needs to be done in another way to be sure.
I'll try to do this with a cycle counter later and see if results tie
up. But if anyone has an explanation meantime that would help :D
Ok I quickly ran DSPBENCH on Hatari 1.9 and this confirms a magical 70%
speed increase over real HW...
I don't think my own project is affected by this as much as it might
have been, because DSP time is often hidden/overlapped - but it does
show quite obviously in some cases.
I can't say right now if cycles are not correct for DSP, but it's quite
possible because the way it works is the following :
- emulate the current 68030 instruction, which takes 'n' cycles
- then run the DSP for twice that number of cycles, because the DSP
freq is 32 MHz for a CPU freq of 16 MHz. So, we call the DSP emulation
for 2 x n cycles
As can be seen from this, the DSP speed will depend on the cycle
accuracy of the 68030 instructions that run at this time. The number of
cycles will vary depending on whether you choose prefetch mode or CE
mode, but even in CE mode, not all instructions have correct cycles at
the moment (but at least we should have better results for the caches
now).
If the cycles on the CPU side are underestimated, you will get more
instructions per second and more DSP speed. At the moment, we can't do
much about this :(
But the benchmarks you sent some days ago are the kind of thing that can
help fix this. The more CPU instructions we get correct, the more
accurate the DSP speed will be.