Re: [hatari-devel] EXA demo Entracte : Improvement in search

On 05/02/2017 at 22:32, Laurent Sallafranque wrote:
I don't understand something :

According to hatari debug mode, I can read :

cpu video_cyc=  1034  10@  1 : 0004EB82 4a78 0468                TST.W $00000468
cpu video_cyc=  1038  14@  1 : 0004EB86 6600 0006                BNE.W #$0006 == $0004eb8e (F)
cpu video_cyc=  1046  22@  1 : 0004EB8A 5240                     ADD.W #$00000001,D0
cpu video_cyc=  1048  24@  1 : 0004EB8C 60f4                     BT .B #$fffffff4 == $0004eb82 (T)

It's the CPU speed loop of the EXA demos.
The code runs from the instruction cache.

I've taken the 4th iteration (they all take the same time except the
first one, which seems OK to me).

If I can rely on the debugger, I read 14 cycles for this iteration.


Your count is wrong: 14 doesn't include the bra.s at $4eb8c. If you count it, you add 6 and the total is 20 cycles, which is what I see too, and which is on average 1 cycle slower than on real HW (if we want to get D0 between $4100 and $4200).
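
To make the 14 vs 20 difference explicit, here is a small Python sketch that redoes the arithmetic from the trace quoted above (spacing condensed). It assumes that video_cyc is the counter value at the start of each instruction, and it takes the 6 cycles for the taken bra.s from the reply rather than measuring it; the variable names are only illustrative.

```python
import re

# Trace lines copied from the debugger output quoted above (spacing condensed).
TRACE = """\
cpu video_cyc=  1034  10@  1 : 0004EB82 4a78 0468  TST.W $00000468
cpu video_cyc=  1038  14@  1 : 0004EB86 6600 0006  BNE.W #$0006 == $0004eb8e (F)
cpu video_cyc=  1046  22@  1 : 0004EB8A 5240       ADD.W #$00000001,D0
cpu video_cyc=  1048  24@  1 : 0004EB8C 60f4       BT .B #$fffffff4 == $0004eb82 (T)
"""

# Assumption: video_cyc is the counter value when the instruction starts.
starts = [int(m) for m in re.findall(r"video_cyc=\s*(\d+)", TRACE)]

# Cost of each instruction that has a successor in this excerpt.
deltas = [b - a for a, b in zip(starts, starts[1:])]

# Span from the first to the last listed instruction: the final branch
# has started but its own cost is not included yet.
span_without_last = starts[-1] - starts[0]

# Assumption: 6 cycles for the taken bra.s, the figure given in the reply.
BRA_TAKEN = 6
loop_total = span_without_last + BRA_TAKEN

print(deltas, span_without_last, loop_total)
```

The span between the first and last trace lines is 14, and adding the branch gives the 20 cycles per iteration discussed here.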


If I compute the loop by hand, I get :

tst.w     $0468.w        {2,    2,     8,1,0,0,    12,1,2,0}, // TST.W (xxx).W
bne       $1cd2a         {6,    0,     6,0,0,0,  8,0,1,0},    // Bcc.W (not taken)
addq.w    #1,d0          {2,    0,     2,0,0,0,  4,0,1,0},    // ADDQ.W #<data>,Dn
bra.s     $1cd1e         {6,    0,     6,0,0,0, 12,0,2,0},    // Bcc (taken)

Which is 8+6+2+6-2 cycles = 20 cycles.

(-2 cycles because of the head/tail overlap between the first and second
instructions)
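
The hand computation can be sketched in Python as follows. This is only an illustration under stated assumptions: that the first three fields of each table row are head, tail and cache-case cycles, in that order, and that two consecutive instructions overlap by min(tail of the previous one, head of the next one). The names LOOP and loop_cycles are made up for this example and are not taken from the Hatari sources.

```python
# (name, head, tail, cache_case_cycles), copied from the table rows above.
# Assumption: the first three fields of each row are head, tail and
# cache-case cycles, in that order.
LOOP = [
    ("tst.w $0468.w", 2, 2, 8),
    ("bne (not taken)", 6, 0, 6),
    ("addq.w #1,d0", 2, 0, 2),
    ("bra.s (taken)", 6, 0, 6),
]


def loop_cycles(instrs):
    """Return (raw sum, overlap, net cycles) for one loop iteration.

    Assumption: consecutive instructions overlap by
    min(tail of previous, head of next), and the last instruction
    wraps around to the first one because the code is a loop.
    """
    raw = sum(cycles for _, _, _, cycles in instrs)
    overlap = 0
    for i, (_, _, tail, _) in enumerate(instrs):
        next_head = instrs[(i + 1) % len(instrs)][1]
        overlap += min(tail, next_head)
    return raw, overlap, raw - overlap


raw, overlap, net = loop_cycles(LOOP)
print(raw, overlap, net)
```

With these rows the only non-zero tail is the one of tst.w, so the only overlap is the 2 cycles between the first and second instructions, giving 22 - 2 = 20.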


So it seems that the loop runs faster under Hatari than it should, but
the D0 value is lower than $4100 (it should be higher, no?)


See above: 20 is the theoretical value, but as discussed with Toni, the 68030 has a complex pipeline and it seems some accesses take less time; the rule is not known yet.







