[no subject]


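Actually this whole thing still bothers me at some level. I can't quite
put my finger on it, but something doesn't add up in my head :-)

If there is one big table that specifies cycle counts for CPU
instructions (which seems fair) and everything is relative to that as a
master reference, then surely all other divergences would just cancel
out? Except for divergence vs wallclock time - but that's not an issue
here.

i.e. if the DSP runs an op which it thinks takes 2 master cycles and the
CPU runs an op which it thinks takes 8 master cycles, then you can expect
4 DSP ops executed in that same time.

If something breaks the CPU op timing so it takes 16 cycles, the DSP just
gets to execute 8 ops instead. The CPU got virtually slower. The DSP did
not get faster.

From the MFP's perspective, the CPU got slower also. So MFP events happen
more often relative to that one CPU instruction. i.e. the MFP and DSP
both agree that the CPU got slower.

So it doesn't seem to matter that everything hangs off the CPU timing
table. What matters is that they all agree on the CPU timings. This is
where something is going amiss - they don't seem to agree.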
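
To make the "agree on the CPU timings" point concrete, here is a minimal
sketch. It is not Hatari code: the function name, the 64-cycle MFP period
and the idea of crediting each component separately are assumptions for
illustration only, and the figures (2 master cycles per DSP op, a CPU op
charged at 8 or 16 cycles) are just example numbers.

```python
def dsp_ops_per_mfp_tick(cpu_ops, cycles_seen_by_dsp, cycles_seen_by_mfp,
                         dsp_cycles_per_op=2, mfp_period=64):
    """Return DSP ops executed per MFP timer tick (hypothetical model).

    Each CPU op advances the DSP clock and the MFP clock by the number of
    master cycles that component is told the op took.
    """
    dsp_clock = 0
    mfp_clock = 0
    for _ in range(cpu_ops):
        dsp_clock += cycles_seen_by_dsp
        mfp_clock += cycles_seen_by_mfp
    dsp_ops = dsp_clock // dsp_cycles_per_op
    mfp_ticks = mfp_clock // mfp_period
    return dsp_ops / mfp_ticks


# Both components agree the CPU op costs 8 cycles.
print(dsp_ops_per_mfp_tick(64, 8, 8))    # 32.0
# Both agree it costs 16: the CPU looks slower, the ratio is unchanged.
print(dsp_ops_per_mfp_tick(64, 16, 16))  # 32.0
# They disagree: the DSP now looks twice as fast to an MFP-timed test.
print(dsp_ops_per_mfp_tick(64, 16, 8))   # 64.0
```

In this toy model a wrong CPU cycle table alone never changes the DSP
speed as measured against the MFP. Only a mismatch between what the two
are credited with does.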

Does that make sense?

D


On 26 June 2015 at 14:27, Douglas Little <doug694@xxxxxxxxxxxxxx> wrote:

> Ok thanks for explaining - I didn't understand that the MFP depends on
> the CPU in the same way as the DSP.
>
> Still, I can't explain why it worked so well before, unless somebody
> deliberately did something to make the results look correct in earlier
> versions (other than CPU cycles alone - which were definitely not correct
> in the cases I mentioned). Weird.
>
> Anyway, I'm way out of my depth with this side of Hatari - maybe the
> detailed cause will surface later and all will become clear :)
>
> D
>
>
>
> On 26 June 2015 at 14:13, Nicolas Pomarède <npomarede@xxxxxxxxxxxx> wrote:
>
>> On 26/06/2015 15:03, Douglas Little wrote:
>>
>>> Hi,
>>>
>>>     Maybe timings were right before in CE mode by luck when data
>>>     caching was not enabled, and now that it's enabled they're not
>>>     good anymore?
>>>
>>>     If you compile your own Hatari, could you try to force a "return
>>>     false" in function cancache030() around line 7607 of
>>>     cpu/newcpu.c? Do you get better values then wrt DSP speed?
>>>
>>>
>>> Ok I can give it a try. I was only able to build SDL1 versions in the
>>> past under Cygwin so hopefully that is still possible?
>>>
>>>
>> yes it should, I tested some days ago.
>>
>>
>>> However I suspect d-cache changes will have no meaningful impact, based
>>> on what I can see so far.
>>>
>>> - the code which waits on DSP in the first test case (the game) is a
>>> host-port status spinloop. The cost for these spin instructions was
>>> never accurate vs real HW, and the new timing hasn't changed much from
>>> what I can see. Not by 70% for sure. It's within 10% of previous
>>> versions.
>>>
>>
>> note that MFP works the same as DSP: if 68030 cpu cycles are not
>> correct, then the duration of an MFP timer (if you convert it into a
>> number of milliseconds) will not be correct either.
>>
>> So you can't have a reference delay by any means in the emulated
>> machine if some cycles differ too much from real HW.
>>
>>
>>> - The code which waits on the DSP in the second test case (DSPBENCH) is
>>> based on MFP events. i.e. the waiting time is dictated by something
>>> other than the CPU. If the CPU cycle costs have increased, it will just
>>> execute fewer CPU cycles during the test. I *think* this is why DSPBENCH
>>> reported correct results previously (IIRC to within a decimal point)
>>> even if the CPU timings were never perfect.
>>>
>>> - The performance gain measured on the DSP side should vary a lot
>>> depending on the CPU side instructions which are running concurrently.
>>> I don't see that happening - it's pretty much fixed (maybe some
>>> variation, I'm not sure - but it seems to remain close to 70% when
>>> calculating back).
>>>
>>> - The host port status/data registers (which execute in the spinloop,
>>> while timing the DSP) are not data-cacheable. They are volatile-mapped
>>> HW memory. If they were cacheable, the software would lose coherency
>>> with HW and quickly crash. I can't be sure that introducing the d-cache
>>> support is unrelated, but in real terms disabling the cache should have
>>> no effect on that test.
>>>
>>> So taking these into account, I believe the change has something to do
>>> with DSP clocks relative to the MFP or the master timer - and not in
>>> relation to the CPU at all. There are too many clues beginning to point
>>> there I think. The MFP-based timing seems the most concrete of those.
>>>
>>
>> There was no change in MFP lately. Many STF demos rely on precise MFP
>> timings to remove top border, and they still work. If something was
>> broken with MFP for 68030, it would affect 68000 mode too.
>>
>> No idea so far :(  Let's see what you get when compiling the suggested
>> change.
>>
>>
>>
>

