Re: [hatari-devel] Improved internal timers performances in cycInt.c


Hi,

After recompiling with your changes, Hatari terminates with a segmentation
fault when I start it from the command line and then quit it with
Ctrl-C. I'm quite sure there was no segmentation fault before. The platform
is 64-bit Linux.

Take care

Uwe

> Hi
> 
> despite not having much spare time at the moment, I finally completed a 
> rewrite of cycInt.c that should give better performance, and sometimes a 
> huge boost in emulation speed.
> 
> The current code in cycInt.c does several things:
>   1) after each CPU instruction, check if an internal interrupt should 
> be processed
>   2) call the corresponding handler + reorder all interrupts after that
>   3) add/remove some timers and reorder everything.
> 
> To do this, cycInt.c stores the delay in cycles before each timer fires. 
> This means that each time a timer fires, we must correct the relative 
> delay of all the other timers.
> 
> Instead of storing a relative delay, the new code uses the global cycle 
> counter and stores the absolute cycle of each timer. This means that when 
> you reorder, you don't have to update the InterruptHandlers[].Cycles values.
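
The difference between the two storage schemes can be sketched as follows. This is only an illustration: the types and functions (RelTimer, AbsTimer, rel_elapse, abs_arm, ...) and the table size are hypothetical and are not taken from cycInt.c.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_TIMERS 4            /* hypothetical table size */

/* Old scheme: each slot holds the delay (in cycles) until it fires. */
typedef struct { int active; int64_t delay; } RelTimer;

/* New scheme: each slot holds the absolute cycle at which it fires. */
typedef struct { int active; uint64_t cycle; } AbsTimer;

/* Relative: letting time pass means touching every active slot. */
static void rel_elapse(RelTimer t[], int64_t elapsed)
{
    for (int i = 0; i < MAX_TIMERS; i++)
        if (t[i].active)
            t[i].delay -= elapsed;
}

/* Relative: a timer is due once its remaining delay reaches zero. */
static int rel_is_due(const RelTimer *t)
{
    return t->active && t->delay <= 0;
}

/* Absolute: arming a timer reads the global cycle counter once... */
static void abs_arm(AbsTimer *t, uint64_t now, uint64_t delay)
{
    assert(!t->active);
    t->active = 1;
    t->cycle = now + delay;
}

/* ...and checking it is one comparison; no other slot is modified. */
static int abs_is_due(const AbsTimer *t, uint64_t now)
{
    return t->active && now >= t->cycle;
}
```

With absolute values, the stored cycle of a timer stays valid no matter how many other timers fire in the meantime.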
> 
> Also, the new code stores the list of active interrupts as a 
> doubly linked list (next/prev members) in ascending 'Cycles' order. This 
> means that when an interrupt fires you can immediately get the next 
> active interrupt (using the 'next' member) and you don't need to reorder 
> anything.
> 
> And when you add/remove an interrupt, you just need to walk through the 
> list of active interrupts (instead of checking all possible interrupts, 
> as the current code does).
> 
> All in all, this can give a big speedup when:
> 
>   - an interrupt fires very often at high frequency (e.g. timer D at 
> boot on some STF/STE TOS)
> 
>   - we add many more interrupt sources (for example for SCSI or 
> other hard drive HW, as was discussed some time ago): this has no 
> impact on the emulation speed as long as those interrupts remain disabled 
> (which is not the case with the current code, where CycInt_SetNewInterrupt 
> and CycInt_UpdateInterrupt always check all the interrupts, even inactive 
> ones; so the more the list grows, the slower it gets)
> 
> I made some measurements to show the improvements; as written above, the 
> gain will mainly depend on the use of high-frequency timers by the running 
> program, but even with lower-frequency timers the new code will always 
> perform faster.
> 
> Tests were made with 'patch timer D' not applied (this is the default 
> setting), running in benchmark mode with audio/video disabled.
> 
> hatari --machine ste --tos tos162fr.img --benchmark --sound off 
> --disable-video 1 --run-vbls 8000
> 
> values are in emulated frames/sec for old code and new code
> 
> program                                                 old    new   gain
> hatari_21.msa                                           499    792   +58%
> [Inner Circle]-Decade Demo (patched).st                 562    895   +59%
> [Oxygene]-Nostalgic-O-Demo (STNICCC 2000 Edition).msa   489    711   +45%
> gem desktop idle (patch timer d off)                    575    854   +48%
> gem desktop idle (patch timer d on)                    1135   1235    +9%
> UnionDemo.stx                                          1162   1267    +9%
> 
> For the first 3 demos, we see a boost of 45-60%, because those demos 
> don't stop the "buggy" timer D set at boot by TOS.
> 
> Same for the GEM desktop when timer D is not disabled.
> 
> When timer D is disabled, we see the gain is only 9-10%, which is not 
> bad anyway (on boot, Union Demo really stops timer D).
> 
> 
> Using the gprof profiler, we get confirmation of the gain:
> 
> old code, gem desktop idle (patch timer d off)
> 
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls   s/call   s/call  name
>  23.18      2.70     2.70 143858790     0.00     0.00  CycInt_SetNewInterrupt
>  18.07      4.81     2.11 143858788     0.00     0.00  CycInt_UpdateInterrupt
>   1.72      8.07     0.20  44701327     0.00     0.00  CycInt_AddRelativeInterruptWithOffset
>   0.86      9.59     0.10  43365456     0.00     0.00  CycInt_RemovePendingInterrupt
>   0.43     10.09     0.05  50239066     0.00     0.00  CycInt_AcknowledgeInterrupt
>   0.00     11.65     0.00   5552941     0.00     0.00  CycInt_AddRelativeInterrupt
> 
> ->  ~44% of emulation is spent in CycInt code
> 
> 
> new code, gem desktop idle (patch timer d off)
> 
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls   s/call   s/call  name
>   6.24      1.00     0.44  44700213     0.00     0.00  CycInt_AddRelativeInterruptWithOffset
>   3.19      3.48     0.23  43378651     0.00     0.00  CycInt_RemovePendingInterrupt
>   2.41      4.54     0.17  50237968     0.00     0.00  CycInt_AcknowledgeInterrupt
>   1.06      5.37     0.08  50237968     0.00     0.00  CycInt_CallActiveHandler
>   0.99      5.52     0.07   5552973     0.00     0.00  CycInt_AddRelativeInterrupt
> 
> ->  only ~14% of emulation is spent in CycInt code
> 
> 
> As always with such low-level changes, regressions might happen. I tested 
> lots of demos that require precise MFP timer emulation and haven't seen 
> any problem so far. Don't hesitate to test some games/demos you like.
> 
> 
> Also, as a bonus, this new code allows setting any MFP external clock 
> value instead of the usual 2.4576 MHz one (as we know, some models had a 
> slightly different clock). There's no option to change it at the moment; it 
> needs to be done in src/clocks_timings.c, but an option will be added later.
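
To illustrate why the MFP clock value matters for timer scheduling, here is a rough sketch of converting one MFP timer period into CPU cycles. The helper name, its parameters and the 8 MHz CPU clock used in the example are assumptions for illustration only; this is not the code in src/clocks_timings.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: CPU cycles for one MFP timer period.
 * The period lasts prescaler * data ticks of the MFP clock (mfp_hz). */
static uint64_t mfp_period_to_cpu_cycles(uint32_t prescaler, uint32_t data,
                                         uint32_t cpu_hz, uint32_t mfp_hz)
{
    uint64_t ticks = (uint64_t)prescaler * data;

    assert(mfp_hz != 0);
    /* Round to the nearest cycle; 64-bit math avoids overflow. */
    return (ticks * cpu_hz + mfp_hz / 2) / mfp_hz;
}
```

For example, with an assumed 8 MHz CPU clock, a prescaler of 200 and a data value of 192, the usual 2457600 Hz MFP clock gives a period of 125000 CPU cycles; a slightly slower MFP clock gives a slightly longer period.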
> 
> 
> Nicolas
> 
> 


