[hatari-devel] Improved cycle counting / rasters for 16/32 MHz
I just committed some changes that improve cycle counting when the cpu
runs at x2 or x4 speed. This should help later with better 68030 cycle
accuracy.
For those interested in details, until now Hatari based all its timings
(cpu, video, sound, fdc, mfp, ...) on an 8 MHz clock (same as STF/STE).
To simulate a 16 or 32 MHz cpu, the "normal" cycle count of each
instruction was divided by 2 or 4. This was a quick and easy way to get
a faster cpu without altering the behaviour of other components.
But it has the drawback of creating rounding errors. First, consider a
case where it works: a MOVE instruction that takes 16 cycles at 8 MHz
would take 8 cycles at 16 MHz or 4 cycles at 32 MHz.
If we consider a 50 Hz VBL with 160256 cycles, we then have :
- 160256/16 = 10016 of these MOVE at 8 MHz
- 160256/8 = 20032 of these MOVE at 16 MHz
- 160256/4 = 40064 of these MOVE at 32 MHz
So, there are really 2 or 4 times as many MOVE executed during a VBL,
which looks like a faster cpu.
The problem is when an instruction's cycle count is not a multiple of 4.
For example, 6 cycles at 8 MHz gives 3 cycles at 16 MHz and 1.5 cycles
at 32 MHz. These values would be rounded up to a multiple of 2, as on
the ST, so it would give 6, 4 and 2 cycles: the instruction is only 1.5
or 3 times faster instead of 2 or 4 times.
If a block of 100 such instructions is executed, this rounding error
accumulates, and in the end it can add up to a rather large number of
wrongly counted cycles per second.
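The arithmetic above can be checked with a small C sketch. It is only an
illustration of the old "divide and round" scheme as described in this
mail, not code taken from Hatari; the function names are hypothetical
and the rounding rule (round up to the next multiple of 2) is assumed
from the 6 / 4 / 2 example.

```c
/* shift is 0 for 8 MHz, 1 for 16 MHz, 2 for 32 MHz. */

/* Hypothetical helper: cost of one instruction under the old scheme,
 * i.e. the 8 MHz cycle count divided by 1, 2 or 4, then rounded up
 * to the next multiple of 2 (assumed rounding rule). */
static int old_scaled_cycles(int cycles_8mhz, int shift)
{
    int divisor = 1 << shift;
    int scaled = (cycles_8mhz + divisor - 1) / divisor; /* ceiling division */
    return (scaled + 1) & ~1;                           /* round up to even */
}

/* How many such instructions fit in one 50 Hz VBL of 160256 cycles. */
static int old_instr_per_vbl(int cycles_8mhz, int shift)
{
    return 160256 / old_scaled_cycles(cycles_8mhz, shift);
}
```

With a 16 cycle instruction the count per VBL doubles exactly at each
step, but with a 6 cycle instruction it goes from 26709 to 40064 at
"16 MHz", which is only about 1.5 times more.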
In the new code, I do the opposite (which is more logical and closer to
real HW) : each instruction keeps its number of cycles (ie 16 in this
example), but all other components have their clock updated accordingly.
So, if a VBL took the equivalent time of 160256 cpu cycles at 8 MHz, it
will now take 320512 cycles at 16 MHz or 641024 at 32 MHz.
Same for all other components where timings are expressed as an
equivalent number of CPU cycles (mfp, ym, fdc, acia, blitter, ...)
As can be seen, this requires modifying all other parts to ensure they
run at a constant speed when cpu speed increases (eg sound should not
play faster when cpu is set to 32 MHz), which was the big part of the
work.
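As a rough sketch of the new approach (again with hypothetical names,
not the actual Hatari code): the instruction keeps its cycle count and
the component durations are scaled instead. Here cpu_freq_shift is
assumed to be 0, 1 or 2 for 8, 16 or 32 MHz.

```c
#include <stdint.h>

#define VBL_CYCLES_8MHZ 160256u   /* 50 Hz VBL length in 8 MHz cpu cycles */

/* Hypothetical helper: convert a duration measured in 8 MHz cpu cycles
 * into cycles of the current cpu clock, so that it lasts the same time. */
static uint32_t cycles_at_cpu_speed(uint32_t cycles_8mhz, int cpu_freq_shift)
{
    return cycles_8mhz << cpu_freq_shift;
}

/* Instructions per VBL when the instruction cost is left untouched
 * and only the VBL length is scaled. */
static uint32_t instr_per_vbl(uint32_t instr_cycles, int cpu_freq_shift)
{
    return cycles_at_cpu_speed(VBL_CYCLES_8MHZ, cpu_freq_shift) / instr_cycles;
}
```

For a 6 cycle instruction this gives 26709, 53418 and 106837
instructions per VBL, so the x2 and x4 ratios hold without any rounding
of the instruction itself.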
When using internal timers (from cycInt.c), there are now 3 units to
specify a delay :
- INT_CPU_CYCLE : delay will use the direct number of cycles (cpu, blitter)
- INT_CPU8_CYCLE : this is for delays that were measured on an 8 MHz
machine and should take the same number of microseconds at any cpu
speed (fdc, sound, acia, dma sound, ...). The given number of cycles is
shifted left (<<CpuFreqShift) to compensate for the faster cpu.
- INT_MFP_CYCLE : this is for delays measured in MFP cycles on an 8 MHz
machine. The real duration of the delay should not change when cpu goes
to 16 or 32 MHz.
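To make the three units more concrete, here is a minimal sketch of how a
delay could be converted to cycles of the current cpu clock. The unit
names come from the list above, but everything else is an assumption for
illustration: the function name, the enum values, and the clock
constants (about 8.021247 MHz for the cpu and 2.4576 MHz for the MFP)
are not copied from cycInt.c.

```c
#include <stdint.h>

/* Unit names as listed above; values are illustrative only. */
enum delay_unit { INT_CPU_CYCLE, INT_CPU8_CYCLE, INT_MFP_CYCLE };

/* Assumed clocks for this example. */
#define CPU8_FREQ_HZ 8021247ULL   /* cpu clock of an 8 MHz machine */
#define MFP_FREQ_HZ  2457600ULL   /* MFP timer clock */

/* Hypothetical helper: express 'delay' in cycles of the current cpu
 * clock; cpu_freq_shift is 0, 1 or 2 for 8, 16 or 32 MHz. */
static uint64_t delay_to_cpu_cycles(uint64_t delay, enum delay_unit unit,
                                    int cpu_freq_shift)
{
    switch (unit) {
    case INT_CPU_CYCLE:
        /* Already in cycles of the current cpu clock. */
        return delay;
    case INT_CPU8_CYCLE:
        /* Measured at 8 MHz: scale so the real duration is unchanged. */
        return delay << cpu_freq_shift;
    case INT_MFP_CYCLE:
        /* MFP cycles -> current cpu cycles, rounded to nearest. */
        return (delay * (CPU8_FREQ_HZ << cpu_freq_shift) + MFP_FREQ_HZ / 2)
               / MFP_FREQ_HZ;
    }
    return delay;
}
```

Under these assumptions, one second of MFP cycles (2457600) maps to
8021247 cpu cycles at 8 MHz and 16042494 at 16 MHz, while a delay given
as INT_CPU_CYCLE is returned unchanged at any speed.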
As a consequence of this, video emulation will now also support color
changes at 16 and 32 MHz. This is visible with demos that use timer b or
hbl to change colors.
For example, try F1 or F2 screen in the FNIL demo by TNT, or the LCD
demo by TEX, and compare with the results we had under Hatari 1.9
(flickering colors, unstable rasters)
This should also be closer to real HW : in theory nothing prevents
coding a fullscreen demo on a 16 MHz Mega ST, you just need to carefully
adapt the freq switches to take the faster cpu speed into account.
Hatari should be able to do this too now (but I don't know of any demo
that runs at 16 MHz and removes the borders).
To sum it up, this is a preliminary and necessary step to improve 68030
accuracy in Hatari : dividing cycles by 2 or 4 created rounding errors
and made it very difficult to have correct sync between cpu and dsp.
The next big task is to improve 68030 cycle counts by taking into
account the Falcon/TT specific bus time for each memory access.
But this will be later :)