Re: [hatari-devel] WinUAE CPU core CPU/FPU/DSP performance according to Centurbo benchmark
On 02/01/2015 15:12, Eero Tamminen wrote:
I was mainly wondering how CPU speed can be
off by >10x, whereas e.g. DSP is only off by <2x...
Numbers were the same for 020; these are for a 16 MHz 040:
- CPU 294 MHz
- FPU 926 MHz
- DSP 32 MHz
No difference in FPU speed depending on the FPU type,
regardless of Wikipedia stating 040 FPU to be a lot
Attached are 030 Falcon results, also from Gembench 4.03.
Integer division seems to be off quite a lot (5x).
Also attached is a profile of what the CPU & DSP sides of the Centurbo
benchmark do, with ROM calls removed (I think they're for GUI updates).
FPU test seems to be just a bunch of these:
And CPU test a bunch of these:
DSP test seems slightly larger.
The problem is that those instructions are mostly the ones that are not
cycle exact at the moment :(
div.l always returns 8 cycles, which is wrong: div/mul take a different
number of cycles depending on their operands' values, and the exact
timing is not really known for CPUs above the 68000.
The same goes for the FPU: its cycle counts are not correct.
So the problem with this benchmark is that it mixes results from
memory copies (move) with arithmetic operations (add/div/...) and FPU
operations. You could have very good results for move, but if div or
FPU timings differ too much from real HW, your global benchmark score
will be off by a very large factor.
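
To make the effect concrete, here is a minimal sketch of how one
mis-timed instruction class can dominate a mixed score. Every number in
it is an assumption chosen for illustration: the instruction mix and the
"real" cycle counts are made up, and only the fixed 8 cycles charged for
div comes from the behaviour described above. The function name
apparent_speedup is hypothetical too.

```python
def apparent_speedup(mix, real_cycles, emulated_cycles):
    """Return real/emulated total cycles for one pass of a mixed benchmark.

    A value above 1.0 means the emulator charges fewer cycles than real
    hardware would, so the benchmark reports an inflated speed.
    """
    real = sum(count * real_cycles[op] for op, count in mix.items())
    emulated = sum(count * emulated_cycles[op] for op, count in mix.items())
    return real / emulated


# Hypothetical instruction mix: equal numbers of each operation.
mix = {"move": 1000, "add": 1000, "div": 1000}

# Hypothetical per-instruction costs. The "real" values are placeholders,
# NOT measured 68030/68040 timings.
real_cycles = {"move": 4, "add": 2, "div": 80}

# Emulated costs: move and add assumed correct, div fixed at 8 cycles.
emulated_cycles = {"move": 4, "add": 2, "div": 8}

speedup = apparent_speedup(mix, real_cycles, emulated_cycles)
move_only = apparent_speedup({"move": 1000}, real_cycles, emulated_cycles)

print(f"mixed benchmark looks {speedup:.2f}x too fast")
print(f"move-only benchmark looks {move_only:.2f}x too fast")
```

With these placeholder values the move-only run is exact, while the
mixed run is off by a large factor purely because of div. That is the
reason for testing one instruction class at a time.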
For now, what we need is to have our own *very simple* benchmarks,
involving mainly 4-5 instructions at a time if possible:
- copying memory with .B .W .L variants, with or without cache: this
is where we need to update the RAM access times and account for the
fact that they are often rounded to 4 cycles. This would really be the
reference test. As long as memory access times are not correct, we
won't be good.
- doing lots of arithmetic operations: apart from div.L/mul.L, I
think all timings should be good already
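
As a rough illustration of the rounding mentioned in the first item,
here is a sketch of how an expected cycle count for a simple copy loop
could be computed when each memory access is rounded up to a multiple of
4 cycles. The helper names, the raw per-access cost and the model of one
read plus one write per item are assumptions for the example, not
Hatari code or measured values; loop overhead, prefetch and cache are
ignored.

```python
def align_to_bus_slot(cycles, slot=4):
    """Round a cycle count up to the next multiple of `slot`."""
    return -(-cycles // slot) * slot


def copy_loop_cycles(n_items, cycles_per_access, slot=4):
    """Estimate cycles to copy n_items, one read and one write per item.

    Each access is rounded up to a bus slot boundary. Everything else
    (loop overhead, prefetch, cache) is deliberately left out.
    """
    per_access = align_to_bus_slot(cycles_per_access, slot)
    return n_items * 2 * per_access


# Hypothetical example: copy 1024 bytes using .B, .W and .L moves,
# assuming the same raw cost of 5 cycles per access for every width.
total_bytes = 1024
for name, width in (("move.b", 1), ("move.w", 2), ("move.l", 4)):
    items = total_bytes // width
    print(name, copy_loop_cycles(items, cycles_per_access=5))
```

Comparing such an estimate with what the emulator and the real machine
report for the same loop would show whether the rounding is applied.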
IIRC Doug posted some results of a test program he wrote some weeks ago;
that would be a good way of comparing emulation and real HW when he has
time to work on it.
Some of the Gembench results could be used (RAM/ROM access, int
division, ...) but unless we disassemble it, we don't know what kind of
operations are done. It would be better to start with our own, simpler
tests to ensure our bases are solid, then move to more complex
benchmarks made by others.
In any case, I think the memory wait cycles, which are not yet correct,
will make most tests fail for the moment, as they change prefetch
time and caching time at the lowest level.