Re: [hatari-devel] DSP Mandelbrot bug
Douglas and Laurent, I'm very sorry! Obviously the hint to test on real hardware was the solution ...

Douglas, I like the optimizations, although I don't understand them ;-)

I hope you did not invest too much time trying to find that nonexistent bug (I did).

You might like the explanation from a NeXT insider:

"The generation of "Mandelbrot" figures is floating point intensive. In particular, the fine details that appear in the renderings are closely related to the maximum precision of the floating point calculations as the function is iterated many, many times.

The FPU has 80 bit registers (64 bit mantissa, sign bit, and 15 bit signed exponent).

The 56K DSP is a fixed point processor with 24 bit data words and registers that can be ganged for a 48 bit word, and 56 bit accumulators.

The different levels of precision produce different results when used to solve heavily iterative math. That shows up in the displayed results. (And yes, Prof. Crandall had to explain this to Mr. Jobs...)"

On 23.11.2015 at 11:37, Douglas Little <doug694@xxxxxxxxxxxxxx> wrote:

...but fully prepared to admit to this being an imperfect hack for purists, because configured like this it will steal the very tip of the set boundary at one specific point, at the closest possible zoom levels. For realtime hackery though it's good :)

Andreas.... another small optimization for the NeXT :)

You can use the knowledge that a) the first iteration won't escape, and b) the beginning of the second iteration already gives a clue to where in the set you are (e.g. inside the trap zone), so you can perform a quick bounding test and eliminate (total-1) iterations for maybe 15% of the interior of the set - which is where most of the wasted iterations go.

If you're determined you can unroll the whole thing and find an arbitrary bound at each iteration, but I won't bother here.

In any case I've thrashed this to death by now - well offtopic - so I'll leave it here!
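As a rough illustration of the shortcut described above, ahead of the DSP listing that follows, here is a minimal Python sketch. It is not a translation of the DSP code: the function names, the floating-point arithmetic, the assumption that the first iteration simply yields z = c, and the bound TRAP_RADIUS_SQ are all chosen for this example. The bound used here (|c| < 1/4) is one that is known to lie inside the main cardioid, so the shortcut in this sketch never changes the result; the constant and scaling in the DSP version may well differ.

```python
# Illustrative only: TRAP_RADIUS_SQ and the iteration layout are assumptions
# for this sketch, not the constants or scaling used in the DSP listing.
TRAP_RADIUS_SQ = 1.0 / 16.0   # |c| < 1/4 lies inside the main cardioid
ESCAPE_RADIUS_SQ = 4.0        # usual escape test |z|^2 > 4


def norm_sq(z):
    """Squared magnitude of a complex number."""
    return z.real * z.real + z.imag * z.imag


def escape_time(c, max_iter=64, shortcut=True):
    """Return how many iterations ran before escape, or max_iter if none did."""
    # Iteration 1 starting from z = 0 gives z = c; it is assumed not to
    # escape, so no escape test is done for it.
    z = c
    # Early bounding test at the start of iteration 2: a very small |z|^2
    # means the point is treated as interior and the loop is skipped.
    if shortcut and norm_sq(z) < TRAP_RADIUS_SQ:
        return max_iter
    for n in range(1, max_iter):
        z = z * z + c
        if norm_sq(z) > ESCAPE_RADIUS_SQ:
            return n
    return max_iter


if __name__ == "__main__":
    grid = [complex(x / 20, y / 20) for x in range(-40, 21) for y in range(-30, 31)]
    skipped = sum(1 for c in grid if norm_sq(c) < TRAP_RADIUS_SQ)
    print("points resolved by the shortcut:", skipped, "of", len(grid))
```

The saving comes from interior points, which would otherwise run the full iteration count; how large the trap zone can safely be is exactly the trade-off mentioned above about stealing the tip of the set boundary.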
; bounding iteration #1
mpy y0,y0,b a,x0
mac x0,x0,b
mpy y0,x0,b y1,a
asl b
addl a,b
mpy -y0,y0,a b,y0
mac x0,x0,a x1,b
addl b,a
; bounding iteration #2
mpy y0,y0,b a,x0
mac x0,x0,b #<$2,a
cmp a,b
jlt <_iters_bound ; 99.999% certain we're trapped! shortcut!
mpy y0,x0,b y1,a
asl b
addl a,b
mpy -y0,y0,a b,y0
mac x0,x0,a x1,b
addl b,a
do #mand_iters-2,_iters
mpy y0,y0,b a,x0
mac x0,x0,b
jes <_stop
mpy y0,x0,b y1,a
asl b
addl a,b
mpy -y0,y0,a b,y0
mac x0,x0,a x1,b
addl b,a
_iters:

D

On 23 November 2015 at 09:15, Douglas Little <doug694@xxxxxxxxxxxxxx> wrote:

Hi,

I was able to run a test of the NeXT mandelbrot routine by tidying up the debug log and extracting the inner loop instructions (which my last post was based on), before trying to optimize it.

From what I could see, there were actually no bad pixels in the output. This test was run inside Hatari (i.e. not on real HW - didn't bother trying that since I didn't see a problem).

Of course I don't have the setup code for the NeXT routine, only the inner part. So it may be something to do with initial conditions, or what happens after escaping (calculation from LC).

TBH I have myself seen an odd bug in my own code when doing calculations from LC remainder. And it doesn't involve mandelbrot - something else.

I have not figured out what that issue is yet - assuming for now it's just a bug in my code, but there's a small chance these things are related? I'll need to check that bug on real HW soon - just haven't had time.

Doug.

On 22 November 2015 at 22:29, Andreas Grabher <andreas.grabher@xxxxxxxxxxxx> wrote:

Laurent and Douglas, you are of course right. This should be checked on real hardware. I'll try to find someone to test it for me. But until then, given the sense for perfection of all NeXT stuff, I think it is quite unlikely they shipped it with that obvious bug. There are also obvious issues (black line in the middle) if the default parameters are used.

I'll report back as soon as I have information from real hardware.

Laurent, I tried to answer your questions below.

Douglas, your optimizations are interesting. I think NeXT would have been interested in them, because the main purpose of the Mandelbrot demo was to demonstrate the performance advantage of the DSP over the CPU for certain tasks ;-)

I'll have a look if I can isolate the complete DSP application. It seems to be embedded into the binary. Maybe I'll just print it from the DSP RAM ...

On 22.11.2015 at 12:06, Laurent Sallafranque <laurent.sallafranque@xxxxxxx> wrote:

[Andreas, replying to the "Is the next one correct ?" question in the trace below:] I think this is all correct. It gets back to 00:7... later just before the jec.

Hi Andreas,
First, are you sure the resulting bad pixels don't appear on the real computer too ?
(just to be sure).
When I have a closer look at the trace, I can read :
; Previous loop
p:0099 0af0a5 0000a1 (06 cyc) jec p:$00a1
; New loop that seems to go wrong
p:00a1 2000d8 (02 cyc) mpy +y0,x0,b
Reg: b $00:4a0b9a:44a2c4 -> $00:2355a0:579062
Reg: sr $8040 -> $8050
p:00a2 20003a (02 cyc) asl b
Reg: b $00:2355a0:579062 -> $00:46ab40:af20c4
Reg: sr $8050 -> $8040
; Is the next one correct ? (we have b = $00:8... ) ?
[Andreas, replying to the Y1 question below:] y1 is 0x2c28f8, it gets set up at the beginning and does not get changed during the calculation.
p:00a3 20003a (02 cyc) asl b
Reg: b $00:46ab40:af20c4 -> $00:8d5681:5e4188
Reg: sr $8040 -> $8060
; What is the value of Y1 just below ? (I haven't found it in the trace)
; The problem may be here
[Andreas, replying to the "overflow" comment below:] I think that overflow is normal behavior. But I'm not sure about it.
p:00a4 200078 (02 cyc) add y1,b
Reg: b $00:8d5681:5e4188 -> $00:ae4c44:5e4188
; Just below, we've got the "overflow" (7fffff) (into y0)
; The problem seems to be that the program copies $00:ae4c44:5e4188 into y0 and sets it to $7fffff
[Andreas, on the trace that follows:] The problem seems to be that, unlike with the "good" pixels, a is too small to get to 00:8...
p:00a5 21e696 (02 cyc) mac -y0,y0,a b,y0
Reg: y0 $4e71df -> $7fffff
Reg: a $00:19f86d:2f6242 -> $ff:e9e540:1a21c0
Reg: sr $8060 -> $8058
p:00a6 200032 (02 cyc) asl a
Reg: a $ff:e9e540:1a21c0 -> $ff:d3ca80:344380
Reg: sr $8058 -> $8059
p:00a7 200060 (02 cyc) add x1,a
Reg: a $ff:d3ca80:344380 -> $ff:fff376:344380
Reg: sr $8059 -> $8058
p:0096 21c498 (02 cyc) mpy +y0,y0,b a,x0
Reg: x0 $39a7ef -> $fff376
Reg: b $00:ae4c44:5e4188 -> $00:7ffffe:000002
Reg: sr $8058 -> $8040
p:0097 200080 (02 cyc) mpy +x0,x0,a
Reg: a $ff:fff376:344380 -> $00:000001:3a74c8
Reg: sr $8040 -> $8050
p:0098 200018 (02 cyc) add a,b
Reg: b $00:7ffffe:000002 -> $00:7fffff:3a74ca <-- Here KO ?
Reg: sr $8050 -> $8040
p:0099 0af0a5 0000a1 (06 cyc) jec p:$00a1
Laurent
On 22/11/2015 at 10:24, Andreas Grabher wrote:
Update:
Looking at the values of b at the time of the final jec, it seems that it might also be some kind of rounding issue:
00:800d9f:248aca <--- 3 pixels before
00:80064e:53e34a
00:8001b2:34bbf4
00:7fffff:3a74ca <--- bad pixel
00:80016c:07e922
00:800631:562b04
00:800e89:ff27ca <--- 3 pixels after
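To see how close these values are to each other, the following Python sketch converts the accumulator dumps above into numbers. It assumes the usual DSP56k reading of a 56-bit accumulator as a signed fraction with the binary point just below bit 47, so that $00:800000:000000 is +1.0. The helper names are made up for this example, and what 1.0 stands for in the Mandelbrot routine's own scaling is not known from the thread.

```python
from fractions import Fraction

ONE = 1 << 47  # $00:800000:000000, i.e. +1.0 in accumulator format


def acc_to_int(text):
    """Parse 'ee:mmmmmm:llllll' (ext:msp:lsp) into a signed 56-bit integer."""
    ext, msp, lsp = (int(part, 16) for part in text.split(":"))
    raw = (ext << 48) | (msp << 24) | lsp
    # Two's complement: bit 55 is the sign bit of the accumulator.
    return raw - (1 << 56) if raw & (1 << 55) else raw


def acc_to_fraction(text):
    """Exact fractional value of the accumulator dump."""
    return Fraction(acc_to_int(text), ONE)


B_VALUES = [
    "00:800d9f:248aca",  # 3 pixels before
    "00:80064e:53e34a",
    "00:8001b2:34bbf4",
    "00:7fffff:3a74ca",  # bad pixel
    "00:80016c:07e922",
    "00:800631:562b04",
    "00:800e89:ff27ca",  # 3 pixels after
]

if __name__ == "__main__":
    for text in B_VALUES:
        # Distance from 1.0 in units of the least significant accumulator bit.
        delta = acc_to_int(text) - ONE
        print(f"{text}  {float(acc_to_fraction(text)):.10f}  delta={delta:+d}")
```

Only the bad pixel ends up below 1.0, and only by a margin far smaller than the spread between its neighbours, which fits the suspicion of a rounding or precision effect.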
Begin forwarded message:
From: Andreas Grabher <andreas.grabher@xxxxxxxxxxxxx>
Subject: [hatari-devel] DSP Mandelbrot bug
Date: 22 November 2015 09:53:17 CET
Reply-To: hatari-devel@xxxxxxxxxxxxxxxxxxx
Hello Hatari Community,
I am experiencing a hard-to-find DSP bug here with Previous. Luckily it can be made "visible" using NeXTstep's included Mandelbrot demo. It might also be responsible for some distorted audio in other applications.
I attached a screenshot of the Mandelbrot application where the effect of the bug is clearly visible, with one failing pixel marked. I also attached some debugging output containing the calculation of the failing pixel and of the pixels just before and after it. The last attached file gives an overview of the variables during the calculation.
Short overview of the calculation:
Every pixel is calculated separately. The visible pixel color is derived from the remaining loop count of the calculation: the higher the remaining count, the lower the output value of the function. The loop is controlled by a jec instruction (jump if extension clear): it keeps iterating while the extension bit is clear and exits once the bit is set.
For the "good" pixels it exits after the second run, because the upper 9 bits of b are no longer all 0. For the "bad" pixel it does not exit, because these bits are still all 0. The most suspect part of the calculation seems to be mpy +x0,x0,a at p:0097. The value of a after the third call of that instruction does not seem to fit the pattern.
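The exit condition described above can be checked against the trace with a small Python sketch. It assumes the DSP56k rule for the extension (E) bit with no scaling mode active: E is clear when bits 55..47 of the accumulator are all zeros or all ones, and set otherwise. It also assumes E is bit 5 of the status register. The function names are invented for this example.

```python
def parse_acc(text):
    """Parse 'ee:mmmmmm:llllll' into the raw (unsigned) 56-bit accumulator."""
    ext, msp, lsp = (int(part, 16) for part in text.split(":"))
    return (ext << 48) | (msp << 24) | lsp


def extension_bit(acc):
    """E bit, assuming no scaling: 0 if bits 55..47 are all 0 or all 1."""
    top9 = (acc >> 47) & 0x1FF
    return 0 if top9 in (0x000, 0x1FF) else 1


def jec_taken(acc):
    """jec jumps (the loop keeps iterating) while the E bit is clear."""
    return extension_bit(acc) == 0


def sr_extension(sr):
    """Extract the E flag from a status register value (assumed bit 5)."""
    return (sr >> 5) & 1


if __name__ == "__main__":
    # (value of b, status register afterwards), both taken from the trace.
    samples = [
        ("00:8d5681:5e4188", 0x8060),
        ("00:7fffff:3a74ca", 0x8040),
    ]
    for text, sr in samples:
        acc = parse_acc(text)
        print(text, "E =", extension_bit(acc), "sr E =", sr_extension(sr),
              "jec taken:", jec_taken(acc))
```

With this rule the bad pixel's final b value ($00:7fffff:3a74ca) leaves E clear, so the jec is taken and the loop runs on, while the neighbouring pixels with b at $00:80... set E and exit.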
Can someone with more DSP experience see the bug? It might or might not be in dsp_mul56.
Any help is greatly appreciated!
Andreas