Re: [hatari-devel] DSP performance



(total number of cycles) - (number of bus activity cycles) = (number of internal cycles)

Non-cache case 2(0/1/0): 2 - (0*2 + 1*2 + 0*2) = 0 internal cycles

So, the total number of cycles is 0 (internal cycles) + 0*4 (read
access) + 1*4 (instruction access) + 0*4 (write access) = 4 cycles.

The same for all instructions.

For a long instruction, you have to count 2 accesses to the bus, so
multiply by 8 instead of 4. Example: an instruction that would be
6(0/1/1) would take 6 - (0*2 + 1*2 + 1*2) = 2 internal cycles, and

    2 + 0*4 + 1*4 + 1*4 = 10 cycles in word access
    2 + 0*8 + 1*8 + 1*8 = 18 cycles in long access
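
The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function and constant names are made up here and are not Hatari code, and the figures (2 cycles per access in the documented timing, 4 cycles per word access on the bus, 8 for a long) are simply the ones quoted in this mail.

```python
# Illustrative sketch only; names are hypothetical, not taken from Hatari.
DOC_CYCLES_PER_ACCESS = 2   # cycles per bus access assumed by the documented figures
WORD_ACCESS_CYCLES = 4      # cycles per word bus access, as quoted in this mail
LONG_ACCESS_CYCLES = 8      # a long is counted as two word accesses


def internal_cycles(total, reads, prefetches, writes):
    """Documented total minus the bus activity cycles it already includes."""
    return total - (reads + prefetches + writes) * DOC_CYCLES_PER_ACCESS


def machine_cycles(total, reads, prefetches, writes, access_cycles):
    """Internal cycles plus every bus access charged at the machine's real cost."""
    accesses = reads + prefetches + writes
    return internal_cycles(total, reads, prefetches, writes) + accesses * access_cycles


# 2 cycles, 0 reads, 1 instruction access, 0 writes, word access
print(machine_cycles(2, 0, 1, 0, WORD_ACCESS_CYCLES))
# 6 cycles, 0 reads, 1 instruction access, 1 write, word access
print(machine_cycles(6, 0, 1, 1, WORD_ACCESS_CYCLES))
# same instruction, long access
print(machine_cycles(6, 0, 1, 1, LONG_ACCESS_CYCLES))
```

This prints 4, 10 and 18, matching the hand calculations.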

The problem is that prefetch has to be long word sized: the prefetch buffer and the instruction cache are both long word sized. The CPU logic issues a long word fetch, which the bus sequencer splits into 2 word accesses.

That would make NOP 8 cycles. But since it apparently does not, the pipeline can probably "hide" it. Of course there will be a stall if the following instruction is also very fast, without other memory accesses or internal calculations.

Do you see different NOP speeds if you align the first NOP to a long boundary and then run the same test with the NOP at long boundary + 2? (at least 2 NOPs back to back)

Prefetch sequencing is a very poorly documented part. Logic analyzer checks would probably reveal something.

That's the whole job I did for all the 1900+ instructions and
addressing modes in the static table. I've converted every addressing
mode for all instructions with the Falcon values.

I'll check your table someday, perhaps I'll find something interesting.

I think a good approach for a generic 68030 emulator would be to
keep the number of internal cycles of each instruction and each
addressing mode, and then compute the final cycle values according to
the bus access cycles of the machine.
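
One way to picture this approach is a generic table that stores only internal cycles and access counts, from which a static per-machine table is derived once. The sketch below is an assumption about how that could look, not the layout of the actual table: the keys are invented placeholders, and the two entries are just the two examples from this thread reduced to internal cycles.

```python
from typing import NamedTuple


class Timing(NamedTuple):
    internal: int    # cycles with no bus activity
    reads: int       # operand read accesses
    prefetches: int  # instruction accesses
    writes: int      # operand write accesses


# Placeholder entries: the keys are invented, the values come from the
# two examples in this thread (2 cycles with one instruction access,
# 6 cycles with one instruction access and one write).
GENERIC_TABLE = {
    "example_a": Timing(internal=0, reads=0, prefetches=1, writes=0),
    "example_b": Timing(internal=2, reads=0, prefetches=1, writes=1),
}


def build_machine_table(generic, bus_access_cycles):
    """Derive a static per-machine cycle table from the generic one."""
    return {
        key: t.internal + (t.reads + t.prefetches + t.writes) * bus_access_cycles
        for key, t in generic.items()
    }


# 4 cycles per access, as in the Falcon word-access case of this thread.
falcon_like = build_machine_table(GENERIC_TABLE, bus_access_cycles=4)
# 2 cycles per access gives back the documented totals.
two_cycle_bus = build_machine_table(GENERIC_TABLE, bus_access_cycles=2)
print(falcon_like)
print(two_cycle_bus)
```

This only works when the cost of a bus access is a fixed number for the machine.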

I don't see how it can work accurately if any memory access length can be practically "random". This is extremely common on the Amiga when accessing chip RAM while some chipset DMA is active at the same time; it is practically the last remaining unexpanded A1200 compatibility problem.

This is what I tried to do: divide each instruction into its internal "atomic" operations (calculate EA, operand fetches, the instruction itself, writes and so on). The real 68020/030 microcode most likely does the same; unfortunately it is not documented anywhere.
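
A very rough sketch of that idea, assuming each instruction is reduced to a list of steps and every bus step asks the machine how long that particular access takes right now. The step names, the callback and the contention figure are all hypothetical; this is not how any existing emulator is structured, and the order of the steps is invented for the example.

```python
# Illustrative sketch only: step names and the cost callback are hypothetical.
INTERNAL, BUS = "internal", "bus"


def run_steps(steps, bus_cost):
    """Sum cycles over an instruction's steps.

    steps: list of (kind, value); value is a cycle count for INTERNAL
           steps and a label for BUS steps.
    bus_cost: callable returning the cycles this particular access takes.
    """
    cycles = 0
    for kind, value in steps:
        cycles += value if kind == INTERNAL else bus_cost(value)
    return cycles


# The 6-cycle example from this thread: 2 internal cycles,
# one instruction access and one write.
steps = [(BUS, "prefetch"), (INTERNAL, 2), (BUS, "write")]

# Fixed 4-cycle bus: same result as the static calculation.
print(run_steps(steps, lambda label: 4))

# Made-up contention: the write takes 10 cycles instead of 4.
print(run_steps(steps, lambda label: 10 if label == "write" else 4))
```

With the fixed bus this prints 10; with the delayed write it prints 16, which a precomputed table cannot express.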


