[hatari-devel] Improved internal timers performances in cycInt.c



Hi

Despite not having much spare time at the moment, I have finally completed a rewrite of cycInt.c that should give better performance, and sometimes a huge boost in emulation speed.

The current code in cycInt.c does several things:
1) after each CPU instruction, check whether an internal interrupt should be processed
2) call the corresponding handler, then reorder all interrupts
3) add/remove timers and reorder everything again.

To do this, cycInt.c stores, for each timer, the delay in cycles before it happens. This means that each time a timer happens, we must correct the relative delay of all the other timers.
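To make that cost concrete, here is a minimal C sketch of a relative-delay scheme, assuming a fixed array of interrupt slots. Only the InterruptHandlers[] array and its Cycles field come from this mail; RelInterrupt, MAX_INTERRUPTS, rel_find_next() and rel_on_fire() are hypothetical names, and the real cycInt.c is organised differently. Each expiry costs one pass over every slot to adjust the delays and another pass to find the next minimum.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch of the relative-delay scheme (not Hatari's code). */
#define MAX_INTERRUPTS 8

typedef struct {
    bool    Active;
    int64_t Cycles;   /* cycles remaining until this timer fires */
} RelInterrupt;

static RelInterrupt InterruptHandlers[MAX_INTERRUPTS];

/* Return the index of the active slot with the smallest delay, or -1.
 * Every slot is visited, active or not. */
static int rel_find_next(void)
{
    int best = -1;
    for (int i = 0; i < MAX_INTERRUPTS; i++) {
        if (InterruptHandlers[i].Active &&
            (best < 0 || InterruptHandlers[i].Cycles < InterruptHandlers[best].Cycles))
            best = i;
    }
    return best;
}

/* Called when slot 'fired' expires: the elapsed time must be subtracted
 * from every other active delay, then the next timer searched again. */
static int rel_on_fire(int fired)
{
    int64_t elapsed = InterruptHandlers[fired].Cycles;
    InterruptHandlers[fired].Active = false;
    for (int i = 0; i < MAX_INTERRUPTS; i++) {
        if (InterruptHandlers[i].Active)
            InterruptHandlers[i].Cycles -= elapsed;   /* O(N) fix-up */
    }
    return rel_find_next();                            /* O(N) search */
}
```

For example, with slots 0 and 3 pending at 100 and 250 cycles, firing slot 0 rewrites slot 3 to 150 cycles before it can be selected as the next timer.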

Instead of storing a relative delay, the new code uses the global cycle counter and stores the absolute cycle of each timer. This means that when you reorder, you don't have to update the InterruptHandlers[].Cycles values.
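A minimal sketch of the absolute-cycle idea, assuming a 64-bit global counter. CyclesGlobalClock, AbsInterrupt, abs_schedule() and abs_remaining() are hypothetical names, not Hatari's actual API. The expiry cycle is computed once when the timer is scheduled, and the remaining delay is derived from the counter only when needed, so firing one timer never touches the others.

```c
#include <stdint.h>

/* Hypothetical global cycle counter; the real name in Hatari may differ. */
static uint64_t CyclesGlobalClock;

typedef struct {
    uint64_t Cycles;  /* absolute cycle at which the timer fires */
} AbsInterrupt;

/* Schedule a timer 'delay' cycles from now: computed once, never adjusted. */
static void abs_schedule(AbsInterrupt *it, uint64_t delay)
{
    it->Cycles = CyclesGlobalClock + delay;
}

/* Remaining delay, derived on demand (negative if the timer is late).
 * Assumes both values stay below 2^63 so the signed casts are exact. */
static int64_t abs_remaining(const AbsInterrupt *it)
{
    return (int64_t)it->Cycles - (int64_t)CyclesGlobalClock;
}
```

For example, with the counter at 1000, timers scheduled 100 and 250 cycles ahead store 1100 and 1250; when the first one fires at cycle 1100, the second still stores 1250 and abs_remaining() reports 150.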

The new code also keeps the active interrupts in a doubly linked list (next/prev members), sorted by ascending 'Cycles'. This means that when an interrupt happens, you immediately get the next active interrupt through the 'next' member and don't need to reorder anything.

And when you add/remove an interrupt, you only need to walk through the list of active interrupts (instead of checking all possible interrupts, as the current code does).
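The list handling from the two paragraphs above can be sketched as follows. This is an illustration under assumptions, not the code from cycInt.c: the Cycles/next/prev fields follow the description, while Interrupt, ActiveHead, list_insert(), list_remove() and check_interrupts() are hypothetical. Insertion walks only the active entries; in this sketch removal is O(1) because the entry's own prev/next pointers are used; and the per-instruction check is a single comparison of the list head against the global cycle counter.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of a sorted, doubly linked list of active interrupts. */
typedef struct Interrupt {
    uint64_t          Cycles;     /* absolute cycle when it fires */
    int               Active;     /* 1 while linked into the list */
    struct Interrupt *next;       /* next active interrupt (same or later Cycles) */
    struct Interrupt *prev;       /* previous active interrupt */
    void            (*Handler)(void);
} Interrupt;

static Interrupt *ActiveHead;      /* earliest pending interrupt, or NULL */
static uint64_t   CyclesGlobalClock;

/* Insert keeping ascending Cycles order; only active entries are walked. */
static void list_insert(Interrupt *it)
{
    Interrupt *prev = NULL, *cur = ActiveHead;
    while (cur && cur->Cycles <= it->Cycles) {   /* equal Cycles keep FIFO order */
        prev = cur;
        cur = cur->next;
    }
    it->prev = prev;
    it->next = cur;
    if (prev) prev->next = it; else ActiveHead = it;
    if (cur)  cur->prev = it;
    it->Active = 1;
}

/* Unlink using the entry's own prev/next pointers. */
static void list_remove(Interrupt *it)
{
    if (!it->Active) return;
    if (it->prev) it->prev->next = it->next; else ActiveHead = it->next;
    if (it->next) it->next->prev = it->prev;
    it->next = it->prev = NULL;
    it->Active = 0;
}

/* Per-instruction check: compare the head with the global counter.
 * The head is unlinked before its handler runs, so 'next' simply
 * becomes the new head and nothing else is reordered. */
static void check_interrupts(void)
{
    while (ActiveHead && ActiveHead->Cycles <= CyclesGlobalClock) {
        Interrupt *it = ActiveHead;
        list_remove(it);
        if (it->Handler) it->Handler();
    }
}
```

Unlinking the head before calling its handler lets a handler re-arm itself with list_insert(), as long as the new expiry cycle lies in the future; interrupts inactive in the list cost nothing at all.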

All in all, this brings two benefits:

- a big speedup when an interrupt happens very often at high frequency (e.g. timer D at boot with some STF/STE TOS versions)

- we can add many more interrupt sources (for example for SCSI or other hard drive hardware, as discussed some time ago) without any impact on emulation speed, as long as those interrupts remain disabled. This is not the case with the current code, where CycInt_SetNewInterrupt and CycInt_UpdateInterrupt always check all the interrupts, even inactive ones; so the more the list grows, the slower it gets.

I took some measurements to show the improvements. As written above, the gain mainly depends on how much the running program uses high-frequency timers, but even with lower-frequency timers the new code is always faster.

Tests were run with 'patch timer D' not applied (the default setting), in benchmark mode with audio/video disabled:

hatari --machine ste --tos tos162fr.img --benchmark --sound off --disable-video 1 --run-vbls 8000

Values are emulated frames/sec with the old code and the new code:

Program                                                  old    new   gain
hatari_21.msa                                            499    792   +58%
[Inner Circle]-Decade Demo (patched).st                  562    895   +59%
[Oxygene]-Nostalgic-O-Demo (STNICCC 2000 Edition).msa    489    711   +45%
gem desktop idle (patch timer d off)                     575    854   +48%
gem desktop idle (patch timer d on)                     1135   1235    +9%
UnionDemo.stx                                           1162   1267    +9%

For the first 3 demos, we see a boost of 45-60%, because those demos don't stop the "buggy" timer D set at boot by TOS.

The same goes for the GEM desktop when timer D is not disabled.

When timer D is disabled, the gain is only about 9%, which is still not bad (Union Demo really does stop timer D at boot).


Profiling with gprof (gmon) confirms the gain:

old code, gem desktop idle (patch timer d off)

     %  cumulative     self                self    total
  time     seconds  seconds      calls   s/call   s/call  name
 23.18        2.70     2.70  143858790     0.00     0.00  CycInt_SetNewInterrupt
 18.07        4.81     2.11  143858788     0.00     0.00  CycInt_UpdateInterrupt
  1.72        8.07     0.20   44701327     0.00     0.00  CycInt_AddRelativeInterruptWithOffset
  0.86        9.59     0.10   43365456     0.00     0.00  CycInt_RemovePendingInterrupt
  0.43       10.09     0.05   50239066     0.00     0.00  CycInt_AcknowledgeInterrupt
  0.00       11.65     0.00    5552941     0.00     0.00  CycInt_AddRelativeInterrupt

-> ~44% of emulation time is spent in CycInt code


new code, gem desktop idle (patch timer d off)

     %  cumulative     self                self    total
  time     seconds  seconds      calls   s/call   s/call  name
  6.24        1.00     0.44   44700213     0.00     0.00  CycInt_AddRelativeInterruptWithOffset
  3.19        3.48     0.23   43378651     0.00     0.00  CycInt_RemovePendingInterrupt
  2.41        4.54     0.17   50237968     0.00     0.00  CycInt_AcknowledgeInterrupt
  1.06        5.37     0.08   50237968     0.00     0.00  CycInt_CallActiveHandler
  0.99        5.52     0.07    5552973     0.00     0.00  CycInt_AddRelativeInterrupt

-> only ~14% of emulation time is spent in CycInt code
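The ~44% and ~14% figures are the sums of the '% time' column over the CycInt_* rows listed in each profile:

```latex
\begin{aligned}
\text{old code: } & 23.18 + 18.07 + 1.72 + 0.86 + 0.43 + 0.00 = 44.26\,\% \\
\text{new code: } & 6.24 + 3.19 + 2.41 + 1.06 + 0.99 = 13.89\,\%
\end{aligned}
```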


As always with such low-level changes, regressions might happen. I tested lots of demos that require precise MFP timer emulation and haven't seen any problem so far. Don't hesitate to test some games/demos you like.


Also, as a bonus, this new code allows setting any MFP external clock value instead of the usual 2.4576 MHz (some models are known to have a slightly different clock). There's no option to change it at the moment, it has to be done in src/clocks_timings.c, but an option will be added later.
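One way to picture how an arbitrary MFP clock fits an absolute-cycle design is to convert absolute MFP tick counts into absolute CPU cycles. This is an assumption about the approach, not a description of src/clocks_timings.c: Clocks, mfp_period_ticks() and mfp_ticks_to_cpu_cycle() are hypothetical, and the 8 MHz CPU clock used below is a round illustrative value. Because each expiry is computed from an absolute tick count, the integer rounding happens once per conversion instead of accumulating period after period, whatever ratio the two clocks have.

```c
#include <stdint.h>

/* Hypothetical clock description: both frequencies are parameters. */
typedef struct {
    uint64_t cpu_hz;   /* CPU clock (illustrative value in the example) */
    uint64_t mfp_hz;   /* MFP external clock, usually 2457600 Hz */
} Clocks;

/* MFP clock ticks in one timer period: prescaler * data,
 * where a data register value of 0 counts as 256. */
static uint64_t mfp_period_ticks(unsigned prescaler, unsigned data)
{
    return (uint64_t)prescaler * (data == 0 ? 256u : data);
}

/* Convert an absolute MFP tick count into an absolute CPU cycle.
 * Truncation happens once per call, so errors do not accumulate.
 * With an 8 MHz CPU clock, overflow needs more than ~2.3e12 ticks. */
static uint64_t mfp_ticks_to_cpu_cycle(const Clocks *c, uint64_t mfp_ticks)
{
    return mfp_ticks * c->cpu_hz / c->mfp_hz;
}
```

With cpu_hz = 8000000 and mfp_hz = 2457600, one period with prescaler 4 and data 1 maps to cycle 13, yet 1000 such periods map to cycle 13020 rather than 13 * 1000 = 13000.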


Nicolas


