Re: [hatari-devel] DSP performance

Hi,

On Friday 26 June 2015, Douglas Little wrote:
> Ok thanks for explaining - I didn't understand that the MFP depends on
> the CPU in the same way as the DSP.
> 
> Still, I can't explain why it worked so well before, unless somebody
> deliberately did something to make the results look correct in earlier
> versions (other than CPU cycles alone - which were definitely not correct
> in the cases I mentioned). Weird.

I think there *is* a rather large change in how cycles are handled
in the WinUAE CPU core.

AFAIK earlier 030 versions used hand-coded, static, per-instruction
cycle counts added by Laurent, after he found that WinUAE cycles did
not match what was expected.  Now that Nicolas has updated to a much
newer WinUAE version, Hatari relies on the WinUAE CPU core itself
calculating the cycles correctly.

This means e.g. that cycle counts can differ a lot when executing
exactly the same instruction, depending on cache state etc.
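
To make the difference between the two approaches concrete, here is a
minimal sketch in C.  It is not taken from Hatari or WinUAE: the
function names and all cycle figures are hypothetical and chosen only
for illustration.  The static model returns the same cost for an
instruction every time, while the dynamic model lets the cost depend
on whether the access hit the cache.

```c
#include <stdbool.h>

/* Hypothetical cycle figures, chosen only for illustration.
 * They are NOT real 68030 timings. */
enum {
    STATIC_COST        = 6, /* fixed, hand-coded cost per instruction */
    CACHE_HIT_COST     = 4, /* cost when the access hits the cache    */
    CACHE_MISS_PENALTY = 4  /* extra cycles added on a cache miss     */
};

/* Static model: one fixed cost, whatever the machine state is. */
static int static_cycles(void)
{
    return STATIC_COST;
}

/* Dynamic model: the same instruction costs more on a cache miss. */
static int dynamic_cycles(bool cache_hit)
{
    return cache_hit ? CACHE_HIT_COST
                     : CACHE_HIT_COST + CACHE_MISS_PENALTY;
}
```

With these made-up numbers the static model always reports 6 cycles,
while the dynamic model reports 4 or 8 for the very same instruction.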

Nicolas, can you confirm this?

<speculation>
I think that because Hatari doesn't take all bus activity / peripheral
delays into account, this means that sometimes cycles can be
unrealistically short.

Douglas, could that have unexpected effects on the DSP code behavior?
</speculation>
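
As a rough illustration of why this could matter, here is a small C
sketch.  It assumes (this is not confirmed in the thread) that the
emulated DSP is advanced in proportion to the CPU cycles the emulator
accounts for, at 2 DSP cycles per CPU cycle (32 MHz DSP vs 16 MHz CPU
on a Falcon).  The function names and the example figures are
hypothetical.

```c
/* Assumption: DSP clock is twice the CPU clock (32 MHz vs 16 MHz). */
enum { DSP_CYCLES_PER_CPU_CYCLE = 2 };

/* DSP cycles that elapse while the CPU runs one pass of a polling
 * loop, given the CPU cycles the emulator accounts for that pass. */
static int dsp_cycles_per_poll(int cpu_cycles_per_poll)
{
    return cpu_cycles_per_poll * DSP_CYCLES_PER_CPU_CYCLE;
}

/* Number of polls the CPU completes before a DSP job of
 * dsp_job_cycles is finished (rounded up to whole polls). */
static int polls_until_done(int dsp_job_cycles, int cpu_cycles_per_poll)
{
    int per_poll = dsp_cycles_per_poll(cpu_cycles_per_poll);
    return (dsp_job_cycles + per_poll - 1) / per_poll;
}
```

For a hypothetical DSP job of 1200 cycles, a poll accounted at 10 CPU
cycles completes after 60 polls, but a poll accounted at only 6 CPU
cycles needs 100 polls.  Code that counts loop iterations would then
see the DSP as slower, even though nothing changed on the DSP side.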

The OldUAE CPU core still uses the static cycle count approach,
so you could try building and timing the DSP with that too.  If
that gives correct results, we know that the issue is in the CPU
core itself, not on the MFP / DSP side.


	 - Eero


> Anyway, I'm way out of my depth with this side of Hatari - maybe the
> detailed cause will surface later and all will become clear :)
> 
> D
> 
> On 26 June 2015 at 14:13, Nicolas Pomarède <npomarede@xxxxxxxxxxxx> wrote:
> > On 26/06/2015 15:03, Douglas Little wrote:
> >> Hi,
> >> 
> >>     Maybe timings were right before in CE mode by luck when data
> >>     caching was not enabled, and now that it's enabled they're not
> >>     good anymore ?
> >>     
> >>     If you compile your own Hatari, could you try to force a "return
> >>     false" in function cancache030() around line 7607 cpu/newcpu.c .
> >>     Do you get better values then wrt DSP speed ?
> >> 
> >> Ok I can give it a try. I was only able to build SDL1 versions in the
> >> past under Cygwin so hopefully that is still possible?
> > 
> > yes it should, I tested some days ago.
> > 
> >> However I suspect d-cache changes will have no meaningful impact,
> >> based on what I can see so far.
> >> 
> >> - the code which waits on DSP in the first test case (the game) is a
> >> host-port status spinloop. The cost for these spin instructions was
> >> never accurate vs real HW, and the new timing hasn't changed much from
> >> what I can see. Not by 70% for sure. It's within 10% of previous
> >> versions.
> > 
> > note that the MFP works the same way as the DSP: if 68030 cpu cycles
> > are not correct, then the duration of an MFP timer (if you convert it
> > into a number of milliseconds) will not be correct either.
> > 
> > So, you can't have a reference delay by any means in the emulated
> > machine if some cycles differ too much from real HW.
> > 
> >> - The code which waits on the DSP in the second test case (DSPBENCH)
> >> is based on MFP events. i.e. the waiting time is dictated by
> >> something other than the CPU. If the CPU cycle costs have increased,
> >> it will just execute fewer CPU cycles during the test. I *think* this
> >> is why DSPBENCH reported correct results previously (IIRC to within a
> >> decimal point) even if the CPU timings were never perfect.
> >> 
> >> - The performance gain measured on the DSP side should vary a lot
> >> depending on the CPU side instructions which are running concurrently.
> >> I don't see that happening - it's pretty much fixed (maybe some
> >> variation, I'm not sure - but it seems to remain close to 70% when
> >> calculating back).
> >> 
> >> - The host port status/data registers (which execute in the spinloop,
> >> while timing the DSP) are not data-cacheable. They are volatile-mapped
> >> HW memory. If it was cacheable, the software would lose coherency with
> >> HW and quickly crash. I can't be sure that introducing the d-cache
> >> support is unrelated, but in real terms disabling the cache should
> >> have no effect on that test.
> >> 
> >> So taking these into account, I believe the change has something to do
> >> with DSP clocks relative to the MFP or the master timer - and not in
> >> relation to the CPU at all. There are too many clues beginning to
> >> point there I think. The MFP-based timing seems the most concrete of
> >> those.
> > 
> > There was no change in the MFP lately. Many STF demos rely on precise
> > MFP timings to remove the top border, and they still work. If
> > something was broken with the MFP for 68030, it would affect 68000
> > mode too.
> > 
> > No idea so far :(  Let's see what you get when compiling the suggested
> > change.
