[hatari-devel] Improved internal timers performances in cycInt.c
Hi
Despite not having much spare time at the moment, I finally completed a
rewrite of cycInt.c that should give better performance, and sometimes
a huge boost in emulation speed.
The current code in cycInt.c does several things:
1) after each CPU instruction, check if an internal interrupt should
be processed
2) call the corresponding handler, then reorder all interrupts
3) add/remove some timers and reorder everything.
To do this, cycInt.c stores, for each timer, the delay in cycles before it
next fires. This means that each time a timer fires, we must correct the
relative delay of all the other timers.
Instead of storing a relative delay, the new code uses the global cycle
counter and stores the absolute cycle at which each timer fires. This means
that when you reorder, you don't have to update the
InterruptHandlers[].Cycles values.
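
To illustrate the difference, here is a minimal sketch in C. It is not the
actual cycInt.c code: the type and function names (RelTimer, AbsTimer,
rel_advance, abs_advance, abs_is_due) and the table size are hypothetical and
only chosen for the example. With relative delays every active slot must be
rewritten when time passes; with absolute cycles only the global counter moves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_TIMERS 4   /* hypothetical size, not Hatari's */

/* Old scheme: each slot holds the cycles REMAINING before it fires. */
typedef struct { bool active; int64_t remaining; } RelTimer;

/* New scheme: each slot holds the ABSOLUTE cycle at which it fires. */
typedef struct { bool active; uint64_t fire_at; } AbsTimer;

/* Relative scheme: after 'elapsed' cycles, every active slot needs a
 * correction. Returns how many slots had to be rewritten. */
int rel_advance(RelTimer t[], int n, int64_t elapsed)
{
    int touched = 0;
    assert(n <= SKETCH_MAX_TIMERS);
    for (int i = 0; i < n; i++) {
        if (t[i].active) {
            t[i].remaining -= elapsed;
            touched++;
        }
    }
    return touched;
}

/* Absolute scheme: the stored values never change when time passes,
 * only the global cycle counter is advanced. */
uint64_t abs_advance(uint64_t now, uint64_t elapsed)
{
    return now + elapsed;
}

/* A timer is due once the global counter has reached its absolute cycle. */
bool abs_is_due(const AbsTimer *t, uint64_t now)
{
    return t->active && now >= t->fire_at;
}
```

In the relative version the cost of one timer firing grows with the number of
active timers; in the absolute version it does not.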
The new code also keeps the active interrupts in a doubly-linked list
(next/prev members), sorted by ascending 'Cycles'. This means that when an
interrupt happens you can immediately get the next active interrupt (using
the 'next' member) and you don't need to reorder anything.
And when you add/remove an interrupt, you just need to walk through the
list of active interrupts (instead of checking all possible interrupts
as the current code does).
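
A sorted list of this kind could be sketched as follows. This is only an
illustration under assumptions, not the code from the patch: the names
(SkInt, SkList, sk_insert, sk_remove, sk_pop_due), the use of array indices
for prev/next and the table size are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SK_MAX  8      /* hypothetical number of interrupt sources */
#define SK_NONE (-1)   /* "no entry" marker for prev/next/head */

typedef struct {
    uint64_t cycles;   /* absolute cycle at which the interrupt fires */
    int prev, next;    /* neighbours in the active list (indices) */
    int active;
} SkInt;

typedef struct {
    SkInt h[SK_MAX];
    int head;          /* earliest active interrupt, or SK_NONE */
} SkList;

void sk_init(SkList *l)
{
    for (int i = 0; i < SK_MAX; i++) {
        l->h[i].cycles = 0;
        l->h[i].prev = l->h[i].next = SK_NONE;
        l->h[i].active = 0;
    }
    l->head = SK_NONE;
}

/* Insert 'id' so the list stays sorted by ascending cycles.
 * Only ACTIVE entries are walked, never the whole table. */
void sk_insert(SkList *l, int id, uint64_t cycles)
{
    int before = SK_NONE, after = l->head;
    assert(id >= 0 && id < SK_MAX && !l->h[id].active);
    while (after != SK_NONE && l->h[after].cycles <= cycles) {
        before = after;
        after = l->h[after].next;
    }
    l->h[id].cycles = cycles;
    l->h[id].active = 1;
    l->h[id].prev = before;
    l->h[id].next = after;
    if (before == SK_NONE) l->head = id; else l->h[before].next = id;
    if (after != SK_NONE) l->h[after].prev = id;
}

/* Unlink 'id' in constant time thanks to the prev/next members. */
void sk_remove(SkList *l, int id)
{
    assert(id >= 0 && id < SK_MAX && l->h[id].active);
    int p = l->h[id].prev, n = l->h[id].next;
    if (p == SK_NONE) l->head = n; else l->h[p].next = n;
    if (n != SK_NONE) l->h[n].prev = p;
    l->h[id].prev = l->h[id].next = SK_NONE;
    l->h[id].active = 0;
}

/* If the earliest interrupt is due at 'now', unlink and return it.
 * The next one is then already at the head: no reordering needed. */
int sk_pop_due(SkList *l, uint64_t now)
{
    int id = l->head;
    if (id == SK_NONE || l->h[id].cycles > now)
        return SK_NONE;
    sk_remove(l, id);
    return id;
}
```

The check done after each CPU instruction then only has to compare the global
cycle counter with the head of the list.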
All in all, this can give a big speedup:
- when an interrupt happens very often at high frequency (e.g. timer D at
boot on some STF/STE TOS)
- when adding many more interrupt sources (for example for SCSI or
other hard drive hardware, as was discussed some time ago): this has no
impact on the emulation speed as long as those interrupts remain disabled
(which is not the case with the current code, where CycInt_SetNewInterrupt
and CycInt_UpdateInterrupt always check all the interrupts, even inactive
ones; so the more the list grows, the slower it gets)
I made some measurements to show the improvement. As written above, the gain
mainly depends on how much the running program uses high-frequency timers,
but even with lower-frequency timers the new code will always be faster.
Tests were made with 'patch timer D' not applied (this is the default
setting), running in benchmark mode with audio/video disabled:
    hatari --machine ste --tos tos162fr.img --benchmark --sound off \
        --disable-video 1 --run-vbls 8000
Values are in emulated frames/sec for the old code and the new code:

                                                          old    new   gain
hatari_21.msa                                             499    792   +58%
[Inner Circle]-Decade Demo (patched).st                   562    895   +59%
[Oxygene]-Nostalgic-O-Demo (STNICCC 2000 Edition).msa     489    711   +45%
gem desktop idle (patch timer d off)                      575    854   +48%
gem desktop idle (patch timer d on)                      1135   1235    +9%
UnionDemo.stx                                            1162   1267    +9%

For the first 3 demos, we see a boost of 45-60%, because those demos
don't stop the "buggy" timer D set at boot by TOS.
The same goes for the GEM desktop when timer D is not disabled.
When timer D is disabled, the gain is only about 9%, which is not
bad anyway (on boot, Union Demo really stops timer D).
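
For reference, the gain figures are the relative increase in frames/sec,
(new - old) / old. A small sketch of that arithmetic (the function name
gain_percent is made up for this example):

```c
#include <assert.h>

/* Relative gain in percent between two frames/sec figures. */
double gain_percent(double old_fps, double new_fps)
{
    assert(old_fps > 0.0);
    return (new_fps - old_fps) * 100.0 / old_fps;
}
```

For the idle GEM desktop with timer D running, (854 - 575) / 575 gives about
48.5%; the percentages quoted in the results are whole numbers within one
point of the exact values.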
Using the gprof profiler, we get confirmation of the gain:
old code, gem desktop idle (patch timer d off)
  %   cumulative   self                self     total
 time   seconds   seconds      calls  s/call   s/call  name
23.18      2.70     2.70  143858790    0.00     0.00  CycInt_SetNewInterrupt
18.07      4.81     2.11  143858788    0.00     0.00  CycInt_UpdateInterrupt
 1.72      8.07     0.20   44701327    0.00     0.00  CycInt_AddRelativeInterruptWithOffset
 0.86      9.59     0.10   43365456    0.00     0.00  CycInt_RemovePendingInterrupt
 0.43     10.09     0.05   50239066    0.00     0.00  CycInt_AcknowledgeInterrupt
 0.00     11.65     0.00    5552941    0.00     0.00  CycInt_AddRelativeInterrupt
-> ~44% of emulation is spent in CycInt code
new code, gem desktop idle (patch timer d off)
  %   cumulative   self                self     total
 time   seconds   seconds      calls  s/call   s/call  name
 6.24      1.00     0.44   44700213    0.00     0.00  CycInt_AddRelativeInterruptWithOffset
 3.19      3.48     0.23   43378651    0.00     0.00  CycInt_RemovePendingInterrupt
 2.41      4.54     0.17   50237968    0.00     0.00  CycInt_AcknowledgeInterrupt
 1.06      5.37     0.08   50237968    0.00     0.00  CycInt_CallActiveHandler
 0.99      5.52     0.07    5552973    0.00     0.00  CycInt_AddRelativeInterrupt
-> only ~14% of emulation is spent in CycInt code
As always with such low-level changes, regressions might happen. I tested
lots of demos that require precise MFP timer emulation and didn't see
any problem so far. Don't hesitate to test some games/demos you like.
Also, as a bonus, this new code allows setting any MFP external clock
value, instead of the usual 2.4576 MHz one (we know some models had a
slightly different clock). There's no option to change it at the moment, it
needs to be done in src/clocks_timings.c, but an option will be added later.
Nicolas