Re: [hatari-devel] DSP performance


To: hatari-devel@xxxxxxxxxxxxxxxxxxx
Subject: Re: [hatari-devel] DSP performance
From: Toni Wilen <twilen@xxxxxxxxxx>
Date: Wed, 1 Jul 2015 16:10:27 +0300
Organization: arabuusimiehet

(total number of cycles) - (number of bus activity cycles) = (number
of internal cycles)

Non-cache case 2(0/1/0): 2 - (0*2 + 1*2 + 0*2) = 0 internal cycles

So, the total number of cycles is 0 (internal cycles) + 0*4 (read
access) + 1*4 (instruction access) + 0*4 (write access) = 4 cycles.

The same for all instructions.

For a long instruction, you have to count 2 accesses to the bus, so
multiply by 8 instead of 4. Example: an instruction that would be
6(0/1/1) would take 6 - (0*2 + 1*2 + 1*2) = 2 internal cycles, and
  2 + 0*4 + 1*4 + 1*4 = 10 cycles in word access
  2 + 0*8 + 1*8 + 1*8 = 18 cycles in long access
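
The arithmetic above can be written as a small sketch. The function and parameter names here are made up for illustration and are not taken from Hatari or WinUAE; the only inputs are the table notation total(reads/fetches/writes), the 2-cycle bus access the table assumes, and the 4 or 8 cycles per access quoted above.

```python
def internal_cycles(total, reads, fetches, writes, ideal_bus_cycles=2):
    """Cycles left once the bus activity assumed by the table entry is removed."""
    return total - (reads + fetches + writes) * ideal_bus_cycles


def machine_cycles(total, reads, fetches, writes, bus_cycles):
    """Internal cycles plus every access charged at the machine's bus cost."""
    accesses = reads + fetches + writes
    return internal_cycles(total, reads, fetches, writes) + accesses * bus_cycles


print(machine_cycles(2, 0, 1, 0, bus_cycles=4))  # 2(0/1/0), word access
print(machine_cycles(6, 0, 1, 1, bus_cycles=4))  # 6(0/1/1), word access
print(machine_cycles(6, 0, 1, 1, bus_cycles=8))  # 6(0/1/1), long access
```

The three calls reproduce the 4, 10 and 18 cycle figures worked out by hand above.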

Problem is that prefetch has to be long word sized. Prefetch buffer and instruction cache are long word sized. CPU logic sends it as a long word fetch, which the bus sequencer splits into 2 word accesses.

It would make NOP 8 cycles. But because apparently it does not, the pipeline can probably "hide" it. Of course there will be a stall if the following instruction is also very fast, without other memory accesses or internal calculations.

Do you see different NOP speeds if you align the first NOP to a long boundary and then run the same test with the NOP at long boundary + 2? (at least 2 NOPs back to back)
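
To show why alignment could matter for that test, here is a rough sketch that only counts how many distinct long words a run of instructions touches. It assumes every prefetch is one aligned long word and says nothing about real 68030 prefetch timing or pipeline overlap; the function name and the addresses are made up for illustration.

```python
def long_fetches(start_addr, instr_sizes):
    """Count the distinct aligned long words covered by a run of instructions."""
    longs = set()
    addr = start_addr
    for size in instr_sizes:
        # Instruction streams are word-granular, so step through each word.
        for word_addr in range(addr, addr + size, 2):
            longs.add(word_addr & ~3)  # round down to the long boundary
        addr += size
    return len(longs)


NOP = 2  # a NOP opcode is one word (2 bytes)

print(long_fetches(0x1000, [NOP, NOP]))  # first NOP on a long boundary
print(long_fetches(0x1002, [NOP, NOP]))  # first NOP at long boundary + 2
```

Two back-to-back NOPs fit in one long word when the first is aligned, but straddle two long words when it starts at boundary + 2, which is the difference the measurement would be looking for.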

Prefetch sequencing is one very undocumented part. Logic analyzer checks would probably reveal something.

That's the whole job I did for all the 1900+ instructions and
addressing modes in the static table. I've converted every addressing
mode for all instructions with the Falcon values.


I'll check your table someday, perhaps I'll find something interesting.

I think the good approach for a generic 68030 emulator should be to
keep the value of the inner cycles of each instruction and each
addressing mode and then compute the final cycles values according to
the bus access cycles of the machine.

I don't see how it can work accurately if any memory access length can be practically "random". This is extremely common on the Amiga when accessing chip RAM while some chipset DMA is active at the same time; it is practically the last remaining unexpanded A1200 compatibility problem.

This is what I tried to do: divide each instruction into its internal "atomic" operations (calculate EA, operand fetches, the instruction itself, writes and so on). The real 68020/030 microcode most likely does the same; unfortunately it is not documented anywhere.
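
A minimal sketch of that idea, not actual WinUAE or Hatari code: an instruction is a list of steps, and every memory access asks a callback how long it takes at that moment, so the cost can vary from access to access. The step list, both callbacks and the "one bus slot every 8 cycles" rule are invented purely for illustration.

```python
def run_steps(steps, access_cost):
    """Add up internal cycles and per-access costs for one instruction.

    steps: list of ("internal", cycles) or ("access", size_in_bytes).
    access_cost: callable(size, now) -> cycles this access takes at time `now`.
    """
    cycles = 0
    for kind, value in steps:
        if kind == "internal":
            cycles += value
        elif kind == "access":
            cycles += access_cost(value, cycles)
        else:
            raise ValueError(f"unknown step kind: {kind}")
    return cycles


def fixed_bus(size, now):
    """Every access costs 4 cycles (size is ignored in this toy model)."""
    return 4


def contended_bus(size, now):
    """Hypothetical contention: wait for the next multiple of 8, then 4 cycles."""
    wait = (-now) % 8
    return wait + 4


# A 6(0/1/1)-style instruction: one fetch, 2 internal cycles, one write.
steps = [("access", 2), ("internal", 2), ("access", 2)]

print(run_steps(steps, fixed_bus))      # 4 + 2 + 4
print(run_steps(steps, contended_bus))  # the write waits 2 cycles for its slot
```

With a fixed bus this gives the same 10 cycles as the static formula; with the made-up contention rule the same instruction takes 12, which a single precomputed total per instruction cannot express.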

References:
- Re: [hatari-devel] DSP performance
  - From: laurent . sallafranque
