[hatari-devel] Improved cycle counting / rasters for 16/32 MHz



Hello

I just committed some changes that improve cycle counting when running the cpu at x2 or x4 speed. This should help later with better 68030 cycle accuracy.


For those interested in details, Hatari has so far used timings (cpu, video, sound, fdc, mfp, ...) based on an 8 MHz clock (same as STF/STE).

In order to simulate 16 or 32 MHz cpu, the "normal" cycle count for each instruction was divided by 2 or 4. This was a quick and easy way to have faster cpu without altering other components' behaviour.

But it has the drawback of creating rounding errors. First, consider a case that works well: a MOVE instruction that takes 16 cycles would take 8 cycles at 16 MHz or 4 cycles at 32 MHz.

If we consider a 50 Hz VBL with 160256 cycles, we then have :
 - 160256/16 = 10016 of these MOVE at 8 MHz
 - 160256/8  = 20032 of these MOVE at 16 MHz
 - 160256/4  = 40064 of these MOVE at 32 MHz

So, there are really 2 or 4 times more MOVEs executed during a VBL, which looks like a faster cpu.

The problem appears when an instruction's cycle count is not a multiple of 4. For example, 6 cycles at 8 MHz gives 3 cycles at 16 MHz and 1.5 cycles at 32 MHz. These cycle counts would be rounded up to a multiple of 2 in the ST, so it would give 6, 4 and 2 cycles instead of 6, 3 and 1.5. We can see that if a block of 100 such instructions is executed, the rounding error is repeated 100 times, and this can add up to a rather large number of wrong cycles per second in the end.
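The rounding problem can be sketched in a few lines of C. This is only an illustration, not Hatari's actual code: the function name is hypothetical, and the "round up to an even number" rule is an assumption inferred from the 6, 4 and 2 figures above.

```c
/* Old scheme (illustrative sketch): divide the 8 MHz cycle count by
 * 2 or 4, then round the result up to an even number of cycles.
 * shift: 0 = 8 MHz, 1 = 16 MHz, 2 = 32 MHz.
 * The function name is hypothetical, not taken from Hatari. */
static int old_scaled_cycles(int cycles_8mhz, int shift)
{
    int divisor = 1 << shift;                        /* 1, 2 or 4 */
    int c = (cycles_8mhz + divisor - 1) / divisor;   /* divide, rounding up */
    return (c + 1) & ~1;                             /* round up to even */
}

/* Extra cycles counted for 'count' identical instructions, compared
 * with the exact (fractional) division. */
static double old_scheme_error(int cycles_8mhz, int shift, int count)
{
    double exact = (double)cycles_8mhz / (double)(1 << shift);
    return (old_scaled_cycles(cycles_8mhz, shift) - exact) * count;
}
```

Under these assumptions, `old_scaled_cycles(6, 1)` returns 4 where 3 would be exact, so a block of 100 such instructions is counted 100 cycles too long at 16 MHz, and 50 cycles too long at 32 MHz. A 16-cycle instruction divides exactly and shows no error.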

In the new code, I do the opposite (which is more logical and closer to real HW): each instruction keeps its number of cycles (i.e. 16 for the MOVE above), but all other components have their clock updated accordingly.

So, if a VBL took the equivalent time of 160256 cpu cycles at 8 MHz, it will now take 320512 cycles at 16 MHz or 641024 at 32 MHz. Same for all other components where timings are expressed as an equivalent number of CPU cycles (mfp, ym, fdc, acia, blitter, ...)
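As a minimal sketch of this new approach (with hypothetical names, not the ones used in Hatari's sources), the VBL length is scaled up while the instruction cost stays untouched:

```c
#include <stdint.h>

/* Cycles per 50 Hz VBL at 8 MHz, figure taken from the text above.
 * The macro and function names are hypothetical. */
#define CYCLES_PER_VBL_8MHZ 160256u

/* New scheme: the VBL lasts more cpu cycles when the cpu is faster.
 * shift: 0 = 8 MHz, 1 = 16 MHz, 2 = 32 MHz. */
static uint32_t vbl_cycles(unsigned shift)
{
    return CYCLES_PER_VBL_8MHZ << shift;
}

/* Number of whole instructions of a given (unscaled) cost per VBL. */
static uint32_t instr_per_vbl(uint32_t instr_cycles, unsigned shift)
{
    return vbl_cycles(shift) / instr_cycles;
}
```

For the 16-cycle MOVE this gives the same 10016, 20032 and 40064 instructions per VBL as before, but no per-instruction division or rounding is involved anymore, so a 6-cycle instruction is counted as exactly 6 cycles at any cpu speed.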

As can be seen, this requires modifying all other parts to ensure they run at constant speed when cpu speed increases (e.g. sound should not be faster when cpu is set to 32 MHz), which was the big part of the work.


When using internal timers (from cycInt.c), there are now 3 units to specify a delay:

- INT_CPU_CYCLE : delay will use the direct number of cycles (cpu, blitter)
- INT_CPU8_CYCLE : this is for delays that were measured on an 8 MHz machine and should take the same number of microsec at any cpu speed (fdc, sound, acia, dma sound, ...). Given number of cycles is <<CpuFreqShift to compensate for the faster cpu.
- INT_MFP_CYCLE : this is for delays measured in MFP cycles on an 8 MHz machine. Delay should not be changed when cpu goes to 16 or 32 MHz.
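The three units could be modelled roughly as below. This is a sketch under stated assumptions, not the code from cycInt.c: the unit names and CpuFreqShift come from the text above, but the enum values, the helper function and the clock constants (nominal PAL ST cpu clock and MFP clock) are assumptions chosen for illustration.

```c
#include <stdint.h>

/* Unit names as listed above; the numeric values are placeholders. */
enum delay_unit { INT_CPU_CYCLE, INT_CPU8_CYCLE, INT_MFP_CYCLE };

/* Assumed nominal clocks, for illustration only. */
#define CPU_FREQ_8MHZ 8021247u   /* cpu clock of an 8 MHz ST, in Hz */
#define MFP_FREQ      2457600u   /* MFP clock, in Hz */

/* Hypothetical helper: express a delay as a number of cpu cycles at
 * the current cpu speed (cpu_freq_shift: 0 = 8, 1 = 16, 2 = 32 MHz). */
static uint64_t delay_to_cpu_cycles(uint64_t delay, enum delay_unit unit,
                                    unsigned cpu_freq_shift)
{
    switch (unit) {
    case INT_CPU_CYCLE:
        return delay;                       /* used directly */
    case INT_CPU8_CYCLE:
        return delay << cpu_freq_shift;     /* same microsec at any speed */
    case INT_MFP_CYCLE:
        /* MFP cycles -> 8 MHz cpu cycles -> cycles at current speed */
        return (delay * CPU_FREQ_8MHZ / MFP_FREQ) << cpu_freq_shift;
    }
    return delay;
}
```

With this model, a 512 cycle delay given as INT_CPU8_CYCLE becomes 2048 cpu cycles at 32 MHz, which is the same duration in microsec as 512 cycles at 8 MHz, while the same value given as INT_CPU_CYCLE stays at 512 cycles and therefore finishes 4 times sooner.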


As a consequence of this, video emulation will now also support color changes at 16 and 32 MHz. This is visible with demos that use timer b or hbl to change colors. For example, try the F1 or F2 screen in the FNIL demo by TNT, or the LCD demo by TEX, and compare with the results we had under Hatari 1.9 (flickering colors, unstable rasters).

This should also be closer to real HW: in theory nothing prevents coding a fullscreen demo on a 16 MHz Mega ST, you just need to carefully adapt the freq switches to take the faster cpu speed into account. Hatari should be able to do this too now (but I don't know any demo that runs at 16 MHz and removes borders).


To sum it up, this is a preliminary and necessary step to improve 68030 accuracy in Hatari : dividing cycles by 2 or 4 created rounding errors and made it very difficult to have correct sync between cpu and dsp.

The next big piece of work is to improve 68030 cycles by taking into account the Falcon/TT specific bus time for each memory access.

But this will be later :)


Nicolas



