Re: [hatari-devel] DSP performance


I believe the current way is the right one, as long as we manage to set the correct timings for each instruction. The static table was of course a best/worst-case approach, not an "exact timings" solution.

As far as I know, we'll never reach the exact cycles for some instructions like mul or div, but if we can get close to the correct timings for the current ones in all CPU modes (MMU, cycle exact, prefetch, ...) it would be a big step.

Let's forget the table approach (I've kept a Hatari 1.7 on my hard drive to compare with future versions, so no need to keep it in the current versions).

Laurent



On 29/06/2015 00:03, Nicolas Pomarède wrote:
On 28/06/2015 23:47, Laurent Sallafranque wrote:
Hi all,

Until now, I've always thought that the first battle for Falcon emulation
was the accuracy of the CPU cycles, as the CPU is THE clock for the whole
system.

When I did the static cycles table in the previous version of Hatari
(until 1.8), I recomputed the whole table for 16-bit memory access
and for .w or .l instructions (cycles are different due to the bus
accesses).
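
To illustrate why the figures change with the bus width, here is a minimal sketch that counts the bus cycles needed for one operand. It assumes a simple model: one bus cycle per bus-width-aligned unit that the operand touches. The function name and the model are illustrative assumptions, not Hatari or WinUAE code, and real cycle counts depend on much more than this.

```python
def bus_cycles(address: int, size_bytes: int, bus_width_bytes: int) -> int:
    """Count the bus cycles needed to transfer one operand.

    Assumption (simplified model): one bus cycle is spent for every
    bus-width-aligned unit that the operand overlaps.
    """
    first_unit = address // bus_width_bytes
    last_unit = (address + size_bytes - 1) // bus_width_bytes
    return last_unit - first_unit + 1


# Aligned .w and .l accesses on a 16-bit bus versus a 32-bit bus.
for name, size in ((".w", 2), (".l", 4)):
    print(name,
          "16-bit bus:", bus_cycles(0x1000, size, 2),
          "32-bit bus:", bus_cycles(0x1000, size, 4))
```

Under this model an aligned .l access costs two bus cycles on a 16-bit bus but only one on a 32-bit bus, which is the kind of difference a table computed for a 32-bit machine would miss.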

Maybe the current 68030 cycles are for a 32-bit computer (as the Amiga
68020 is) and the cycles are not recomputed for a 16-bit bus. As the
CPU core comes from WinUAE, it may be something like that.
Maybe there's something else to look for ;)

I know my static table was not perfect at all, but it seemed to give
timings not so far from a real Falcon.
I spent more than a month recomputing the figures according to Mikro's
documentation about 68030 cycles in the Falcon.

I don't know where the cycles are computed in the new engine (I should
take the time to have a closer look at this).


Hi

In the WinUAE CPU core (as in the old UAE CPU core), cycles are computed not from a table but from a basic set of "rules" that combine the time needed to prefetch, to access memory, to do bit operations, arithmetic and so on, taking the operand size into account.
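
As a rough illustration of the "rules" idea (not the actual WinUAE code), the sketch below adds up a prefetch cost, a memory access cost that depends on the operand size, and an ALU cost. Every constant, field and function name here is a hypothetical placeholder chosen for the example; none of the numbers are real 68030 timings.

```python
from dataclasses import dataclass

# Hypothetical costs, for illustration only (NOT real 68030 values).
PREFETCH_CYCLES = 2      # cost of fetching one 16-bit instruction word
MEM_ACCESS_CYCLES = 4    # cost of one bus access
BUS_WIDTH_BYTES = 2      # assumed 16-bit data bus


@dataclass
class Instr:
    name: str
    words: int          # instruction length in 16-bit words
    operand_size: int   # operand size in bytes: 1, 2 or 4
    mem_accesses: int   # number of operand reads and writes
    alu_cycles: int     # cost of the operation itself


def rule_based_cycles(instr: Instr) -> int:
    """Combine prefetch, memory and ALU costs into one cycle estimate."""
    # Bus accesses needed per operand: ceiling of size / bus width.
    accesses_per_operand = -(-instr.operand_size // BUS_WIDTH_BYTES)
    prefetch = instr.words * PREFETCH_CYCLES
    memory = instr.mem_accesses * accesses_per_operand * MEM_ACCESS_CYCLES
    return prefetch + memory + instr.alu_cycles


add_w = Instr("add.w (a0),d0", words=1, operand_size=2, mem_accesses=1, alu_cycles=2)
add_l = Instr("add.l (a0),d0", words=1, operand_size=4, mem_accesses=1, alu_cycles=2)
print(add_w.name, rule_based_cycles(add_w))
print(add_l.name, rule_based_cycles(add_l))
```

A static table would instead store one fixed number per instruction, whereas this kind of composition lets the same rules follow changes in operand size or memory cost.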

On average, it's possible the table gave better results, or better results for the instructions that are most commonly used on Falcon when cycle accuracy is needed.

But it didn't model the instruction/data caches with the real logic of a 68030 (it used worst/best-case values).

And in the end, it was too difficult to merge the new WinUAE CPU with this table; there are too many differences between the two approaches.

Also, I have the feeling the table was based on the 68020, not the 68030? For example, NOP took 2 cycles in cache and 4 cycles with no cache, but the 68030 doc says it's always 2 cycles (same for EXG dx,dy: the timings differ between cache and no cache, but that should not be the case).

All in all, I have no "fit all" solution. Keeping an old version of the WinUAE CPU core was not good, as many 68000 CE behaviours are handled much more cleanly/accurately in the latest WinUAE; this also fixed several games/demos for Falcon that didn't work before.

By comparing real Falcon cycle results with the latest Hatari, we will be able to spot some differences, and this could give hints on how some instructions should work internally to reach the correct number of cycles, but until then, we will have a mix of improvements/regressions.
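
Such a comparison could be sketched roughly as follows, assuming the measurements are available as one cycle count per test case. The function name and the data layout are assumptions made for this example, and the numbers are placeholders, not real measurements.

```python
def cycle_differences(real, emulated):
    """List the test cases whose cycle counts differ, largest gap first.

    Both arguments map a test-case name to a cycle count. Each result is
    (name, real cycles, emulated cycles, emulated - real).
    """
    diffs = []
    # Only compare test cases that were measured on both sides.
    for name in sorted(real.keys() & emulated.keys()):
        delta = emulated[name] - real[name]
        if delta != 0:
            diffs.append((name, real[name], emulated[name], delta))
    diffs.sort(key=lambda d: abs(d[3]), reverse=True)
    return diffs


# Placeholder numbers only, NOT real Falcon or Hatari measurements.
real_falcon = {"test_a": 10, "test_b": 4, "test_c": 20}
hatari = {"test_a": 12, "test_b": 4, "test_c": 14}

for name, real_c, emu_c, delta in cycle_differences(real_falcon, hatari):
    print(f"{name}: real={real_c} emulated={emu_c} delta={delta:+d}")
```

Sorting by the size of the gap puts the instructions most worth investigating at the top of the list.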

Nicolas







