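From what I could see in the UAE newcpu source, the burst mode implementation
has a big #if (0) around it, so there may be no burst support for the i-cache
on Amiga either :)
There may be a bug somewhere in there, but there may also be a valid 'middle
case' which I didn't think of before today, and which may still be 'real'.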
That is: if the loop fits exactly in the cache, and the CPU reaches the
final 'dbra', does it prefetch beyond the dbra, always assuming contiguous
execution? Or does it wait until the condition is known, and therefore
hold up the CPU? There is no branch prediction, so it should be one or
the other (and I don't know of any case where a condition forces a stall).
It would make sense to prefetch another longword beyond the loop
(invalidating one valid entry from earlier in the loop) and then discard
it, forcing a refetch of the loop head shortly afterwards. This is in
fact what Hatari's profiler shows: misses at the loop head only, not
anywhere inside the loop. Just the first 4 bytes get thrashed.
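A minimal sketch of that idea, in Python rather than anything taken from Hatari or WinUAE. It assumes a deliberately simplified model: a direct-mapped cache of 64 independent longword entries (256 bytes), a longword-aligned loop whose size is a multiple of 4, and exactly one longword prefetched past the final dbra and then discarded. The function name, the base address and the iteration count are all made up for illustration.

```python
# Hypothetical model, not Hatari or WinUAE code: a direct-mapped cache of
# 64 independent longword entries (256 bytes in total).
CACHE_BYTES = 256
ENTRY_BYTES = 4
NUM_ENTRIES = CACHE_BYTES // ENTRY_BYTES


def warm_loop_misses(base, loop_bytes, iterations, prefetch_past_end=True):
    """Count misses per loop address, ignoring the first (cold) pass.

    Assumes base is longword aligned and loop_bytes is a multiple of 4.
    """
    cache = [None] * NUM_ENTRIES  # each slot remembers the address it holds
    misses = {}

    def fetch(addr, record):
        index = (addr // ENTRY_BYTES) % NUM_ENTRIES
        if cache[index] != addr:
            cache[index] = addr  # replace whatever was in this slot
            if record:
                misses[addr] = misses.get(addr, 0) + 1

    for i in range(iterations):
        warm = i > 0  # the first pass only fills the cache
        for addr in range(base, base + loop_bytes, ENTRY_BYTES):
            fetch(addr, record=warm)
        if prefetch_past_end:
            # Assumption: one longword beyond the final dbra is fetched
            # and then thrown away when the branch is taken.
            fetch(base + loop_bytes, record=False)
    return misses


if __name__ == "__main__":
    for size in (252, 256):
        result = warm_loop_misses(0x1000, size, iterations=10)
        print(size, {hex(a): n for a, n in result.items()})
```

Under these assumptions a 256-byte loop misses once per iteration at the loop head and nowhere else, while a 252-byte loop records no warm misses at all.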
However, even if this is the case, there must be something else strange
happening, because a 252-byte loop should 'fit' under this model, but
Hatari tests show it doesn't (250 bytes or less works OK; confirmed now
with 2 separate bits of code).
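The arithmetic behind "should fit" can be checked with a rough sketch like the one below. It assumes the same simplified picture (64 longword entries, a longword-aligned loop, one extra longword prefetched past the end); the helper names are hypothetical, and it only counts contiguous longwords rather than emulating anything.

```python
CACHE_LONGWORDS = 64  # 256 bytes / 4, assuming longword-granular entries


def longwords_needed(loop_bytes, base=0, prefetch_longwords=1):
    """Longwords covered by the loop, plus the assumed prefetch past its end."""
    first = base // 4
    last = (base + loop_bytes - 1) // 4
    return (last - first + 1) + prefetch_longwords


def fits(loop_bytes, base=0, prefetch_longwords=1):
    """True if the loop plus its prefetch occupies at most 64 contiguous longwords."""
    needed = longwords_needed(loop_bytes, base, prefetch_longwords)
    return needed <= CACHE_LONGWORDS


if __name__ == "__main__":
    # 68k instructions are a multiple of 2 bytes, so only even sizes are listed.
    for size in range(246, 258, 2):
        print(size, longwords_needed(size), fits(size))
```

With one prefetched longword this sketch puts the limit at 252 bytes, so the single-longword model on its own does not reproduce the 250-byte threshold seen in the Hatari tests.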
Curious.
D.
On 11 July 2013 22:38, Miro Kropáček <miro.kropacek@xxxxxxxxx> wrote:
> I can't say whether it's a bug or not on Amiga but it's incorrect in Atari
> because we don't have any burst mode in the first place :) So if the code
> is smaller than 256 bytes and it's aligned on 4 bytes, it must fit in the
> cache else it's a bug.
>
>
> On Fri, Jul 12, 2013 at 5:35 AM, Eero Tamminen <oak@xxxxxxxxxxxxxx> wrote:
>
>> Hi,
>>
>> There's interesting discussion between Douglas and Cyprian
>> on 030 caching & prefetching on the Bad Mood Atari-forum
>> thread:
>> http://www.atari-forum.com/viewtopic.php?f=68&t=24561&p=234100#p234100
>>
>> There's a potential bug on WinUAE code used to emulate
>> that in Hatari. According to Douglas it has had rewrites
>> in more recent WinUAE code.
>>
>> Laurent, any comments on that? :-)
>>
>>
>> - Eero
>>
>>
>>
>>
>
>
> --
> MiKRO / Mystic Bytes
> http://mikro.atari.org
>