Re: [hatari-devel] DSP performance


On 30/06/2015 22:57, Laurent Sallafranque wrote:
I've had a quick look at this:

Running the winuae core, falcon mode, cycle exact, no MMU

I've traced a few lines and something seems strange to me :

cpu video_cyc=   158 158@  0 : 0005197E 4a39 0018 105e TST.B $0018105e
p:0370  08f4a0 000001  (06 cyc)  movep #$000001,x:$ffe0
p:0372  0aa980 000370  (06 cyc)  jclr #0,x:$ffe9,p:$0370
p:0370  08f4a0 000001  (06 cyc)  movep #$000001,x:$ffe0
p:0372  0aa980 000370  (06 cyc)  jclr #0,x:$ffe9,p:$0370
p:0370  08f4a0 000001  (06 cyc)  movep #$000001,x:$ffe0
p:0372  0aa980 000370  (06 cyc)  jclr #0,x:$ffe9,p:$0370
p:0370  08f4a0 000001  (06 cyc)  movep #$000001,x:$ffe0
cpu video_cyc=   164 164@  0 : 00051984 6706 BEQ.B #$00000006 == $0005198c (F)
p:0372  0aa980 000370  (06 cyc)  jclr #0,x:$ffe9,p:$0370
p:0370  08f4a0 000001  (06 cyc)  movep #$000001,x:$ffe0
cpu video_cyc=   166 166@  0 : 00051986 4eb9 0005 7a18 JSR $00057a18
p:0372  0aa980 000370  (06 cyc)  jclr #0,x:$ffe9,p:$0370
p:0370  08f4a0 000001  (06 cyc)  movep #$000001,x:$ffe0
p:0372  0aa980 000370  (06 cyc)  jclr #0,x:$ffe9,p:$0370
p:0370  08f4a0 000001  (06 cyc)  movep #$000001,x:$ffe0
p:0372  0aa980 000370  (06 cyc)  jclr #0,x:$ffe9,p:$0370
p:0370  08f4a0 000001  (06 cyc)  movep #$000001,x:$ffe0
p:0372  0aa980 000370  (06 cyc)  jclr #0,x:$ffe9,p:$0370
p:0370  08f4a0 000001  (06 cyc)  movep #$000001,x:$ffe0
p:0372  0aa980 000370  (06 cyc)  jclr #0,x:$ffe9,p:$0370
p:0370  08f4a0 000001  (06 cyc)  movep #$000001,x:$ffe0
p:0372  0aa980 000370  (06 cyc)  jclr #0,x:$ffe9,p:$0370
cpu video_cyc=   174 174@  0 : 00057A18 6000 4b32 BT .W #$4b32 == $0005c54c (T)



Unless I'm misreading the trace, I can see that:

TST.B $0018105e                        is 6 CPU cycles (164-158)
BEQ.B #$00000006 == $0005198c (F)      is 2 CPU cycles (166 - 164)
JSR $00057a18                          is 8 CPU cycles (174 - 166)

But the DSP cycles executed during each CPU instruction are not twice
but 4 to 6 times the cycles of that CPU instruction.

Am I wrong somewhere?


Hi

Looking at this, it seems there is indeed a problem. From what I see in DSP_Run, we have:

void DSP_Run(int nHostCycles)
{
#if ENABLE_DSP_EMU
        save_cycles += nHostCycles * 2;
        /* ... rest of the function omitted ... */
#endif
}

So DSP_Run already doubles the CPU cycles it is given.

But when I migrated to the recent WinUAE CPU core, I copied the code that was in newcpu.c. For example, in Hatari 1.8 we had in cpu/newcpu.c:
     DSP_Run(Cycles_GetCounter(CYCLES_COUNTER_CPU) * 2);
or   DSP_Run(cpu_cycles*2/ CYCLE_UNIT);

So cpu_cycles was already multiplied by 2 in newcpu.c? In that case we end up with a x4 factor in the DSP.
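
To make the suspected double scaling concrete, here is a minimal sketch in C. It is not Hatari code: save_cycles_sketch, dsp_run_sketch, cpu_step_doubling, cpu_step_raw and dsp_budget are hypothetical names that only mimic the shape of the excerpts quoted above (a callee that doubles its argument, and a caller that may or may not double it first).

```c
/* Hypothetical stand-in for the DSP cycle budget (save_cycles above). */
static int save_cycles_sketch = 0;

/* Callee: doubles whatever it receives, like the DSP_Run excerpt. */
static void dsp_run_sketch(int host_cycles)
{
    save_cycles_sketch += host_cycles * 2;
}

/* Caller that already doubles before the call, like the newcpu.c lines. */
static void cpu_step_doubling(int cpu_cycles)
{
    dsp_run_sketch(cpu_cycles * 2);
}

/* Caller that passes the raw CPU cycle count. */
static void cpu_step_raw(int cpu_cycles)
{
    dsp_run_sketch(cpu_cycles);
}

/* Reset the budget, run one CPU step, and return the DSP cycles granted. */
static int dsp_budget(void (*step)(int), int cpu_cycles)
{
    save_cycles_sketch = 0;
    step(cpu_cycles);
    return save_cycles_sketch;
}
```

With these definitions, dsp_budget(cpu_step_doubling, 8) gives 32 DSP cycles for an 8-cycle CPU instruction, while dsp_budget(cpu_step_raw, 8) gives 16. Which of the two matches the real call path is exactly the open question here.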

And in the old CPU core, we have 2 cases:
     DSP_Run( Cycles_GetCounter(CYCLES_COUNTER_CPU) * 2);  (in run_1)
or DSP_Run( Cycles_GetCounter(CYCLES_COUNTER_CPU) ); (in run_2, less accurate)

So it seems the number of DSP cycles has always been too high?
At least, a x4 factor matches this trace:


cpu video_cyc= 1130 106@ 2 : 0002EBAA 0838 0000 a202 BTST.B #$0000,$ffffa202
p:02fa  228f05         (02 cyc)  cmp b,a r4,b
p:02fb  0e22fa         (04 cyc)  jne p:$02fa
p:02fa  228f05         (02 cyc)  cmp b,a r4,b
p:02fb  0e22fa         (04 cyc)  jne p:$02fa
p:02fa  228f05         (02 cyc)  cmp b,a r4,b
p:02fb  0e22fa         (04 cyc)  jne p:$02fa
p:02fa  228f05         (02 cyc)  cmp b,a r4,b
p:02fb  0e22fa         (04 cyc)  jne p:$02fa
p:02fa  228f05         (02 cyc)  cmp b,a r4,b
p:02fb  0e22fa         (04 cyc)  jne p:$02fa
cpu video_cyc= 1134 110@ 2 : 0002EBB0 67f8 BEQ.B #$fffffff8 == $0002ebaa (T)

btst takes 8 cycles (remember that the CPU cycles displayed in the disassembly are right-shifted (>>) by an amount that depends on the CPU frequency). This means we should run the DSP for 16 cycles, but here we see it runs for 30 cycles, which is close to a x4 factor instead of x2 (maybe there's also an error here: why does the DSP stop after 30 and not 32 cycles?)

Regarding Laurent's example, the tst.b in fact takes 12 cycles and we see that the DSP runs for 42 cycles. Here also, with a x4 factor it should run for 48 cycles, not 42, but the ratio is again much closer to x4 than to x2.
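
The arithmetic for both traces can be checked with a small C sketch. The helper names are hypothetical, and the shift of 1 is an assumption inferred from the numbers in this message (displayed 4 becoming 8 real cycles, displayed 6 becoming 12); it is not taken from the Hatari source.

```c
/* Assumption: displayed CPU cycles = real CPU cycles >> shift. */
static int real_cpu_cycles(int displayed_delta, int shift)
{
    return displayed_delta << shift;
}

/* DSP cycles we would expect for a given CPU-to-DSP scaling factor. */
static int expected_dsp_cycles(int real_cpu, int factor)
{
    return real_cpu * factor;
}

/* DSP cycles actually seen in a trace: loop iterations times cycles each. */
static int observed_dsp_cycles(int iterations, int cycles_per_iteration)
{
    return iterations * cycles_per_iteration;
}

/* How many cycles short of the x4 expectation the trace is. */
static int shortfall_vs_x4(int real_cpu, int observed)
{
    return expected_dsp_cycles(real_cpu, 4) - observed;
}
```

For the BTST trace, real_cpu_cycles(1134 - 1130, 1) is 8 and observed_dsp_cycles(5, 2 + 4) is 30, so the shortfall against x4 is 2 cycles. For the TST.B trace, real_cpu_cycles(164 - 158, 1) is 12 and observed_dsp_cycles(7, 6) is 42, a shortfall of 6 cycles. The sketch only restates the numbers; it does not explain where the missing cycles go.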

Laurent, is there really a bug that uses a x4 factor instead of x2, or am I missing something in the way DSP emulation runs? And why are 42 cycles used instead of 48?

Nicolas


