Re: [hatari-devel] DSP performance
(total number of cycles) - (number of bus activity cycles) = (number
of internal cycles)
Non-cache case 2(0/1/0): 2 - (0*2 + 1*2 + 0*2) = 0 internal cycles
So, the total number of cycles is 0 (internal cycles) + 0*4 (read
access) + 1*4 (instruction access) + 0*4 (write access) = 4 cycles.
The same for all instructions.
For a long instruction, you have to count 2 accesses to the bus, so
multiply by 8 instead of 4. Example: an instruction that would be
6(0/1/1) would take 6 - (0*2 + 1*2 + 1*2) = 2 internal cycles and
2 + 0*4 + 1*4 + 1*4 = 10 cycles in word access
2 + 0*8 + 1*8 + 1*8 = 18 cycles in long access
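
The arithmetic above can be written as a small helper. This is only a sketch of the formula as described in this mail, not code from Hatari; the function and parameter names are made up for illustration, and it assumes the manual's figures count 2 cycles per bus access while the target machine takes 4 cycles per word access (8 per long access).

```python
# Sketch of the conversion described above. Names are hypothetical,
# not taken from the Hatari sources.

def internal_cycles(total, reads, prefetches, writes, manual_bus_cycles=2):
    """Cycles left once the manual's bus activity is subtracted.

    Assumes the manual's figure total(r/p/w) counts
    manual_bus_cycles per bus access.
    """
    return total - (reads + prefetches + writes) * manual_bus_cycles


def machine_cycles(total, reads, prefetches, writes, access_cycles):
    """Re-add the bus accesses at the machine's real cost per access."""
    internal = internal_cycles(total, reads, prefetches, writes)
    return internal + (reads + prefetches + writes) * access_cycles


if __name__ == "__main__":
    print(machine_cycles(2, 0, 1, 0, access_cycles=4))  # 2(0/1/0), word access
    print(machine_cycles(6, 0, 1, 1, access_cycles=4))  # 6(0/1/1), word access
    print(machine_cycles(6, 0, 1, 1, access_cycles=8))  # 6(0/1/1), long access
```

With the values from the examples this gives 4, 10 and 18 cycles, matching the hand calculation.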
The problem is that prefetch has to be long word sized. The prefetch
buffer and instruction cache are long word sized. CPU logic sends it as
a long word fetch, which the bus sequencer splits into 2 word accesses.
That would make NOP 8 cycles. But because apparently it does not, the
pipeline can probably "hide" it. Of course there will be a stall if the
following instruction is also very fast, without other memory accesses
or internal calculations.
Do you see different NOP speeds if you align first NOP to long boundary
and then do some test with NOP at long boundary + 2? (at least 2 NOPs
back to back)
Prefetch sequencing is one very undocumented part. Logic analyzer checks
would probably reveal something.
That's the whole job I did for all the 1900+ instructions and
addressing modes in the static table. I've converted every addressing
mode for all instructions with the Falcon values.
I'll check your table someday, perhaps I'll find something interesting.
I think the good approach for a generic 68030 emulator should be to
keep the value of the inner cycles of each instruction and each
addressing mode and then compute the final cycle values according to
the bus access cycles of the machine.
I don't see how it can work accurately if any memory access length can
be practically "random". This is extremely common in Amiga when
accessing chip ram and some chipset DMA is active at the same time,
practically the last remaining unexpanded A1200 compatibility problem.
This is what I tried to do, divide each instruction to its internal
"atomic" operations (calculate ea, operand fetches, instruction itself,
writes and so on). Real 68020/030 microcode most likely does the same,
unfortunately it is not documented anywhere.
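
To illustrate the idea (not the actual emulator code), here is a rough sketch of an instruction split into "atomic" steps, where the cost of each bus access is asked for at the moment it happens instead of being taken from a static table. Everything here is hypothetical: the Step type, the step order for a 6(0/1/1) instruction, and the toy "contended" bus that only lets an access start on a multiple of 4 cycles.

```python
# Hypothetical sketch: per-step timing with a variable bus cost.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    kind: str        # "internal", "read", "prefetch" or "write"
    cycles: int = 0  # only used for "internal" steps


def run_instruction(steps: List[Step],
                    bus_cost: Callable[[str, int], int]) -> int:
    """Sum the steps; bus accesses are priced when they occur."""
    now = 0
    for step in steps:
        if step.kind == "internal":
            now += step.cycles
        else:
            # bus_cost sees the access type and the current cycle count,
            # so it can model wait states caused by other bus masters.
            now += bus_cost(step.kind, now)
    return now


def fixed_bus(kind: str, now: int) -> int:
    return 4  # every word access takes 4 cycles


def contended_bus(kind: str, now: int) -> int:
    # Toy model (an assumption): access may only start on a
    # multiple of 4 cycles, then takes 4 cycles.
    wait = (-now) % 4
    return wait + 4


# Assumed step order for an instruction listed as 6(0/1/1).
example = [Step("prefetch"), Step("internal", 2), Step("write")]

if __name__ == "__main__":
    print(run_instruction(example, fixed_bus))      # 10
    print(run_instruction(example, contended_bus))  # 12
```

With the fixed bus the result matches the static calculation (10 cycles); with the toy contended bus the write has to wait 2 cycles, so the same instruction takes 12. A precomputed per-instruction total cannot express that difference.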