Re: [hatari-devel] DSP performance



Here is the whole mail, in case it helps to remember what was exchanged between Mikro and me on this list:

The thread was from 2011 and was called: "Re: [hatari-devel] Long mail : new cpu cores + 2 questions."


NOP (2 or 4 cycles) ?
68030 UM: 2(0/0/0) 2(0/1/0)

(total number of cycles) - (number of bus activity cycles) = (number of internal cycles), where each bus access in the UM figures is counted as 2 cycles

Cache case: 2 - 0*2 = 2
Non-cache case: 2 - 1*2 = 0

Once we have this number (internal cycles), we can recalculate the bus activity for the Falcon: 1 bus cycle takes 4 clock cycles, and every instruction has to be split into a byte/word variant and a long variant, since a long access needs two bus cycles on the Falcon's 16-bit bus:

Cache case: 2 + 0*4 = 2 cycles (no prefetch, no data, the number of cycles stays)
Non-cache case: 0 + 1*4 = 4 cycles (instruction prefetch = 1 bus access)
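
The two steps above can be sketched in C as follows. This is only an illustration of the arithmetic in this mail, not code taken from Hatari: the function name falcon_cycles and its parameters are made up for the example, and it assumes that every bus access costs 2 clocks in the 68030 UM figures and 4 clocks on the Falcon.

```c
#include <assert.h>

/* Assumed costs, taken from the arithmetic in this mail. */
enum {
    UM_CLOCKS_PER_BUS_ACCESS     = 2, /* 68030 UM figures */
    FALCON_CLOCKS_PER_BUS_ACCESS = 4  /* Falcon 16-bit bus */
};

/*
 * Hypothetical helper (not part of Hatari).
 * um_total:        total cycles given by the 68030 UM
 * um_accesses:     number of bus accesses counted by the UM (r + p + w)
 * falcon_accesses: number of 16-bit bus accesses needed on the Falcon
 */
int falcon_cycles(int um_total, int um_accesses, int falcon_accesses)
{
    /* Step 1: strip the bus activity to get the internal cycles. */
    int internal = um_total - um_accesses * UM_CLOCKS_PER_BUS_ACCESS;
    assert(internal >= 0);

    /* Step 2: add the bus activity back at the Falcon cost. */
    return internal + falcon_accesses * FALCON_CLOCKS_PER_BUS_ACCESS;
}
```

For NOP, falcon_cycles(2, 0, 0) gives 2 in the cache case and falcon_cycles(2, 1, 1) gives 4 in the non-cache case, matching the numbers above.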

 
MOVE.W D0,D1
MOVE.L D0,D1
68030 UM: 2(0/0/0) 2(0/1/0) (MOVE Rn,Dn)
Same as NOP, no bus access for data
 
MOVE.W D0,(A0)
MOVE.L D0,(A0)
68030 UM: 3(0/0/1) 4(0/1/1)
Internal cycles: cache 3 - 1*2 = 1, non-cache 4 - 2*2 = 0

Falcon numbers for the word variant:
Cache: 1 + 1*4 (word write) = 5
Non-cache: 0 + 1*4 (instruction prefetch) + 1*4 (word write) = 8
 
Falcon numbers for the long variant:
Cache: 1 + 2*4 = 9
Non-cache: 0 + 1*4 (instruction prefetch) + 2*4 (long write) = 12
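
The word/long split can be sketched the same way. Again this is just an illustration under stated assumptions, not Hatari code: the struct um_timing and the function falcon_cycles_sized are hypothetical names, the (r/p/w) fields are taken as read / prefetch / write accesses, a prefetch is counted as one 16-bit bus access as in the examples of this mail, and a long data access is counted as two.

```c
#include <assert.h>

/* Hypothetical container for one 68030 UM entry: total(r/p/w). */
struct um_timing {
    int total;      /* total cycles from the UM */
    int reads;      /* r: data read accesses */
    int prefetches; /* p: instruction prefetch accesses */
    int writes;     /* w: data write accesses */
};

/*
 * Hypothetical helper (not part of Hatari): recompute the cycle count
 * for the Falcon 16-bit bus. is_long != 0 means the data operand is a
 * long, so every data access is split into two bus cycles.
 */
int falcon_cycles_sized(struct um_timing t, int is_long)
{
    int um_accesses = t.reads + t.prefetches + t.writes;

    /* Internal cycles: the UM figures count 2 clocks per bus access. */
    int internal = t.total - um_accesses * 2;
    assert(internal >= 0);

    /* Data accesses double for longs; prefetches stay as they are. */
    int data_accesses = (t.reads + t.writes) * (is_long ? 2 : 1);

    /* Every Falcon bus access costs 4 clocks. */
    return internal + (t.prefetches + data_accesses) * 4;
}
```

Fed with 3(0/0/1) and 4(0/1/1), this gives 5 and 8 for the word variant and 9 and 12 for the long variant, the same values as computed by hand above.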

 
MOVE.W (A0), D0
MOVE.L (A0), D0
I'm lazy, sorry :) But it's about the same, again and again. We are "lucky" that the data cache is write-through: no need to worry about writes, they always take the same number of cycles.

Another nice thing about timing is that you don't need to worry about misaligned longs: they are always "misaligned", because you read them in two bus cycles.






On 30/06/2015 23:24, Laurent Sallafranque wrote:
Hi Nicolas,

That's where I don't agree.
I recomputed the whole table according to Mikro's explanation, and my static table contained ATARI 68030 cycles (16-bit bus), not AMIGA ones (32-bit bus).

NOP is given in the documentation like this:

/*903 */    {0,    0,     2,0,0,0,     2,0,1,0},    // NOP.L


So, in instruction cache mode, NOP is 0 head, 0 tail and 2 cycles.

But in non-cached mode there is one bus access, so the instruction takes 4 cycles, not 2 (I'll try to find the rule behind this again).
That's why I recomputed the whole table by hand, to have the Falcon 16-bit bus values and not the 68030's default 32-bit bus ones.

Regards

Laurent





On 29/06/2015 00:03, Nicolas Pomarède wrote:
Also, I have the feeling the table was based on the 68020, not the 68030? For example, NOP took 2 cycles with cache and 4 cycles without cache, but the 68030 doc says it's always 2 cycles (same for EXG dx,dy: the timings differ between cache and no cache, but that should not be the case).





