[hatari-devel] Falcon emulation accuracy



I've been trying to build up a picture of how Hatari's performance differs from that of real Falcon machines, for compatibility purposes.

I find Hatari to be very compatible with Falcon stuff, provided care is taken not to rely on timing. Most Falcon stuff doesn't need to rely on timing, but CPU/DSP synchronisation is a regular headache. Also, a few optimizations make things faster on one but slower on the other.

I took these grabs using the internal profiler in my game, which uses a sampling timer and function-boundary events to work out the distribution of work. This has the benefit of working both in Hatari and on a real machine, and it can to an extent measure its own impact on everything else (about 15-20%).
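For anyone unfamiliar with the approach, here is a minimal sketch in C of how a sampling timer and function-boundary events can be combined. It is not the game's actual profiler: the names (`Profiler`, `prof_enter`, `prof_exit`, `prof_sample`, `prof_percent`) and the fixed table sizes are hypothetical, and it assumes `prof_sample` is called from a periodic timer interrupt while the instrumented code calls `prof_enter`/`prof_exit` at function boundaries.

```c
#include <assert.h>
#include <string.h>

#define PROF_MAX_FUNCS 8   /* hypothetical limit on tracked functions */
#define PROF_MAX_DEPTH 16  /* hypothetical limit on tracked nesting   */

typedef struct {
    unsigned long samples[PROF_MAX_FUNCS]; /* timer ticks charged to each function */
    unsigned long total;                   /* all timer ticks seen                 */
    int stack[PROF_MAX_DEPTH];             /* ids of the functions currently open  */
    int depth;                             /* current nesting depth                */
} Profiler;

static void prof_init(Profiler *p)
{
    memset(p, 0, sizeof *p);
}

/* Function-boundary event: entering function 'id'. */
static void prof_enter(Profiler *p, int id)
{
    assert(id >= 0 && id < PROF_MAX_FUNCS);
    if (p->depth < PROF_MAX_DEPTH)
        p->stack[p->depth] = id;
    p->depth++; /* keep counting so exits stay balanced past the limit */
}

/* Function-boundary event: leaving the most recently entered function. */
static void prof_exit(Profiler *p)
{
    if (p->depth > 0)
        p->depth--;
}

/* Sampling event: charge one tick to the innermost tracked function.
   Ticks that arrive outside any tracked function only count in total. */
static void prof_sample(Profiler *p)
{
    int top = (p->depth < PROF_MAX_DEPTH ? p->depth : PROF_MAX_DEPTH) - 1;
    p->total++;
    if (top >= 0)
        p->samples[p->stack[top]]++;
}

/* Share of all ticks charged to function 'id', as a whole percentage. */
static unsigned prof_percent(const Profiler *p, int id)
{
    assert(id >= 0 && id < PROF_MAX_FUNCS);
    return p->total ? (unsigned)(p->samples[id] * 100UL / p->total) : 0U;
}
```

Because the same counters are updated the same way under Hatari and on a real machine, the two distributions can be compared directly. The cost of the enter/exit calls and the timer interrupt is itself part of what gets measured.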

Hardware:
https://dl.dropboxusercontent.com/u/12947585/hw.png

Hatari:
https://dl.dropboxusercontent.com/u/12947585/hatari.png

Note that the FPS vs msec measures were not properly calibrated for that display type, so the figures there are really closer to 7.9/7.2 fps (not 8.8/8.1).


Experience with various tests and optimizations, and with both the sampling profiler and the Hatari profiler, has built up a picture of some of the differences. It's not a perfect picture but it's getting clearer.

- Hatari is consistently faster than real HW at accessing RAM, particularly in the higher-bandwidth video modes.
- Indexing data (e.g. translating indexed colours through a table) is often faster on HW due to the presence of the data cache.
- Fully-cached code seems a bit faster on real HW, in some cases at least (A:mux is fully i-cached but does not use the data cache).
- Uncached/code-generated routines (and i-cache misses) are slower on real HW, especially in high-bandwidth video modes. This is probably contention between instruction and data fetching, combined with Hatari's bus access being quicker.
- DSP/host port timing is unclear, but it seems like the Hatari port is slower, perhaps as a compatibility helper to make up for the faster CPU execution of uncached code (R:TexStream is solid contiguous writes to the DSP). The DSP timing itself is accurate, but the timing of exchanges over the host port is not (R:VisPlanes also does a lot of host port read access, and seems much quicker on HW).
- FPU timing doesn't seem to be emulated at all (but I could be wrong about that), so Hatari is quicker.
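To make the table-indexing case concrete, this is the kind of loop I mean, as a minimal sketch in C. The function name `translate_indexed` and the 8-bit-index to 16-bit-pixel format are assumptions for illustration, not code from the game. Every pixel costs one extra data read from the table, which is the access the data cache can help with on real HW.

```c
#include <stddef.h>
#include <stdint.h>

/* Translate 8-bit colour indices into 16-bit pixels through a 256-entry
   lookup table. Each output pixel needs one read from src and one
   dependent read from table. */
static void translate_indexed(uint16_t *dst, const uint8_t *src,
                              size_t count, const uint16_t table[256])
{
    size_t i;
    for (i = 0; i < count; i++)
        dst[i] = table[src[i]];
}
```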

(Also: not all FPU opcodes appeared to disassemble in the debugger. Some do, others don't! Maybe this is an 060 vs 882 instruction set difference? I didn't make a note of the ops, but the disassembly was a mixture of real instructions and garbage encoded as $fxxx.)


BTW this isn't a request for anything, just some guiding information.






