Re: [hatari-devel] Improved internal timers performances in cycInt.c
- To: hatari-devel@xxxxxxxxxxxxxxxxxxx
- Subject: Re: [hatari-devel] Improved internal timers performances in cycInt.c
- From: Uwe Seimet <Uwe.Seimet@xxxxxxxxx>
- Date: Sun, 5 Dec 2021 19:13:13 +0100
Hi,
After recompiling with your changes, Hatari crashes with a segmentation
fault when I start it from the command line and then terminate it with
Ctrl-C. I'm quite sure there was no segmentation fault before. The platform
is 64-bit Linux.
Take care
Uwe
> Hi
>
> Despite not having much spare time at the moment, I finally completed a
> rewrite of cycInt.c that should give better performance, and sometimes
> a huge boost in emulation speed.
>
> The current code in cycInt.c does several things:
> 1) after each CPU instruction, check whether an internal interrupt
> should be processed
> 2) call the corresponding handler, then reorder all interrupts
> 3) add/remove some timers and reorder everything.
>
> To do this, cycInt.c stores the delay in cycles before the next timer
> fires. This means that each time a timer fires, we must correct the
> relative delay for all other timers.
>
> Instead of storing a relative delay, the new code uses the global cycle
> counter and stores the absolute cycle of each timer. This means that when
> you reorder, you don't have to update the InterruptHandlers[].Cycles values.
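
A minimal sketch of the relative-versus-absolute difference described above.
This is not the actual cycInt.c code: the types, MAX_TIMERS and all function
names are hypothetical and only illustrate why absolute cycles avoid rewriting
every entry when time advances.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TIMERS 4   /* hypothetical table size, for illustration only */

/* Relative scheme: each slot holds the delay left until the timer fires. */
typedef struct { int active; int64_t delay; } RelTimer;

/* Absolute scheme: each slot holds the global cycle at which it fires. */
typedef struct { int active; uint64_t cycle; } AbsTimer;

/* Relative: every active entry must be corrected when time advances.
 * Returns the number of entries that had to be rewritten. */
static int rel_advance(RelTimer t[], int64_t elapsed)
{
    int writes = 0;
    for (int i = 0; i < MAX_TIMERS; i++) {
        if (t[i].active) {
            t[i].delay -= elapsed;
            writes++;
        }
    }
    return writes;
}

/* Absolute: advancing time only touches the global counter.
 * Returns the number of timer entries rewritten, which is always 0. */
static int abs_advance(uint64_t *now, uint64_t elapsed)
{
    assert(now != NULL);
    *now += elapsed;
    return 0;
}

/* A timer is due once the global counter has reached its absolute cycle. */
static int abs_is_due(const AbsTimer *t, uint64_t now)
{
    assert(t != NULL);
    return t->active && t->cycle <= now;
}
```

With three active timers, rel_advance() rewrites three entries per call, while
abs_advance() rewrites none; the sketch assumes a 64-bit counter that does not
wrap in practice.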
>
> Also, the new code stores the active interrupts in a doubly linked
> list (next/prev members) in ascending 'Cycles' order. This means that
> when an interrupt happens you can immediately get the next active
> interrupt (using the 'next' member) and you don't need to reorder
> anything.
>
> And when you add/remove an interrupt, you only need to walk through the
> list of active interrupts (instead of checking all possible interrupts,
> as the current code does).
>
> All in all, this can give a big speedup when:
>
> - an interrupt fires at high frequency (e.g. timer D at boot on some
> STF/STE TOS versions)
>
> - we add many more interrupt sources (for example for SCSI or other
> hard drive hardware, as was discussed some time ago): they have no
> impact on emulation speed as long as those interrupts remain disabled
> (which is not the case with the current code, where CycInt_SetNewInterrupt
> and CycInt_UpdateInterrupt always check all the interrupts, even
> inactive ones; so the more the list grows, the slower it gets)
>
> I took some measurements to show the improvements. As written above, the
> gain mainly depends on how much the running program uses high-frequency
> timers, but even with lower-frequency timers the new code will always
> be faster.
>
> The tests were made with 'patch timer D' not applied (this is the default
> setting), running in benchmark mode with audio/video disabled.
>
> hatari --machine ste --tos tos162fr.img --benchmark --sound off
> --disable-video 1 --run-vbls 8000
>
> Values are in emulated frames/sec for the old code and the new code:
>
> program                                                 old   new  gain
> hatari_21.msa                                           499   792  +58%
> [Inner Circle]-Decade Demo (patched).st                 562   895  +59%
> [Oxygene]-Nostalgic-O-Demo (STNICCC 2000 Edition).msa   489   711  +45%
> gem desktop idle (patch timer d off)                    575   854  +48%
> gem desktop idle (patch timer d on)                    1135  1235   +9%
> UnionDemo.stx                                          1162  1267   +9%
>
> For the first 3 demos, we see a boost of 45-60%, because those demos
> don't stop the "buggy" timer D set at boot by TOS.
>
> The same goes for the GEM desktop when timer D is not disabled.
>
> When timer D is disabled, the gain is only 9-10%, which is not
> bad anyway (on boot, Union Demo really stops timer D).
>
>
> Using the gmon profiler, we get confirmation of the gain:
>
> old code, gem desktop idle (patch timer d off)
>
>    %  cumulative     self               self    total
>   time   seconds  seconds     calls   s/call   s/call  name
>  23.18      2.70     2.70 143858790     0.00     0.00  CycInt_SetNewInterrupt
>  18.07      4.81     2.11 143858788     0.00     0.00  CycInt_UpdateInterrupt
>   1.72      8.07     0.20  44701327     0.00     0.00  CycInt_AddRelativeInterruptWithOffset
>   0.86      9.59     0.10  43365456     0.00     0.00  CycInt_RemovePendingInterrupt
>   0.43     10.09     0.05  50239066     0.00     0.00  CycInt_AcknowledgeInterrupt
>   0.00     11.65     0.00   5552941     0.00     0.00  CycInt_AddRelativeInterrupt
>
> -> ~44% of emulation is spent in CycInt code
>
>
> new code, gem desktop idle (patch timer d off)
>
>    %  cumulative     self               self    total
>   time   seconds  seconds     calls   s/call   s/call  name
>   6.24      1.00     0.44  44700213     0.00     0.00  CycInt_AddRelativeInterruptWithOffset
>   3.19      3.48     0.23  43378651     0.00     0.00  CycInt_RemovePendingInterrupt
>   2.41      4.54     0.17  50237968     0.00     0.00  CycInt_AcknowledgeInterrupt
>   1.06      5.37     0.08  50237968     0.00     0.00  CycInt_CallActiveHandler
>   0.99      5.52     0.07   5552973     0.00     0.00  CycInt_AddRelativeInterrupt
>
> -> only ~14% of emulation is spent in CycInt code
>
>
> As always with such low-level changes, regressions might happen. I tested
> lots of demos that require precise MFP timer emulation and didn't see
> any problems so far. Don't hesitate to test some games/demos you like.
>
>
> Also, as a bonus, the new code allows setting any MFP external clock
> value instead of the usual 2.4576 MHz one (as we know, some models had
> a slightly different clock). There's no option to change it at the moment;
> it needs to be done in src/clocks_timings.c, but an option will be added
> later.
>
>
> Nicolas
>
>