Re: [hatari-devel] Hatari profiler updates and DSP cycle questions
Hi,
There's one answer for Eero and one question for Doug:
Eero,
I've had a look at the code of this add instruction in dsp_cpu.c.
This instruction is one of the hardcoded instructions (see the dsp_add_x1_a() function).
It cannot take any value other than 2 cycles.
It only calls 2 other functions, dsp_add56() and dsp_ccr_update_e_u_n_z(), and neither of them modifies the cycle count.
dsp_core.instr_cycle is set to 2 (the minimum cycle count for an instruction) only at the beginning of the main loop, and extra cycles are added only:
- in opcodes that take more than 2 cycles (with an explicit add of XXX cycles)
- in EA addressing modes like (Rx+Nx)
- in the main loop, if there is more than 1 external memory access (see below)
- in the move L: function
Eero, so this problem may be on your side.
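The cycle accounting described above can be sketched in C as a single function. This is a minimal illustration, not Hatari's dsp_cpu.c: the struct instr_cost_t, its fields and instr_cycles() are hypothetical names standing in for what the opcode handlers, EA decoding and memory access code contribute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of what one DSP instruction contributes to
   dsp_core.instr_cycle; not a real Hatari type. */
typedef struct {
    uint32_t opcode_extra;  /* extra cycles added by a multi-cycle opcode */
    uint32_t ea_extra;      /* extra cycles from EA modes such as (Rx+Nx) */
    uint32_t move_l_extra;  /* extra cycles added by the move L: code */
    uint32_t ext_accesses;  /* external memory accesses counted for it */
} instr_cost_t;

/* Total cycles for one instruction under the rules listed above. */
static uint32_t instr_cycles(const instr_cost_t *c)
{
    assert(c != NULL);
    uint32_t cycles = 2;            /* minimum, set at the top of the main loop */
    cycles += c->opcode_extra;
    cycles += c->ea_extra;
    cycles += c->move_l_extra;
    if (c->ext_accesses > 1)        /* current external-access waitstate rule */
        cycles += c->ext_accesses - 1;
    return cycles;
}
```

A hardcoded instruction such as add x1,a has no opcode, EA or move L: extras, and its only memory access is its own opcode fetch (so at most one external access, which adds nothing). Under these rules instr_cycles() therefore always returns 2 for it.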
Doug,
For external memory accesses, my code in Hatari does the following:
In the memory access functions, I count the number of accesses to external memory in nb_access_to_extMemory.
Then, in the main loop, I do:
/* Add the waitstate due to external memory access */
if (nb_access_to_extMemory > 1)
    dsp_core.instr_cycle += nb_access_to_extMemory - 1;
Do you think I should replace it with:
/* Add the waitstate due to external memory access */
if (nb_access_to_extMemory > 1)
    dsp_core.instr_cycle += nb_access_to_extMemory*2 - 2;
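Written as two small C functions (hypothetical names, not Hatari code), the difference between the current and proposed waitstate rules is easy to see: the proposed one charges 2 cycles per external access beyond the first instead of 1, since nb_access_to_extMemory*2 - 2 = 2*(nb_access_to_extMemory - 1).

```c
#include <assert.h>
#include <limits.h>

/* Current rule: one waitstate cycle per external access beyond the first. */
static unsigned waitstates_current(unsigned nb_access_to_extMemory)
{
    return nb_access_to_extMemory > 1 ? nb_access_to_extMemory - 1 : 0;
}

/* Proposed rule: two waitstate cycles per external access beyond the first. */
static unsigned waitstates_proposed(unsigned nb_access_to_extMemory)
{
    assert(nb_access_to_extMemory <= UINT_MAX / 2);  /* avoid overflow in *2 */
    return nb_access_to_extMemory > 1 ? nb_access_to_extMemory * 2 - 2 : 0;
}
```

With 0 or 1 external accesses both rules add nothing; with 2 accesses they add 1 and 2 cycles respectively, with 3 accesses 2 and 4. Which one matches real DSP56001 bus timing is the open question here.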
Laurent
----- Original Message -----
From: "Douglas Little" <doug694@xxxxxxxxxxxxxx>
To: hatari-devel@xxxxxxxxxxxxxxxxxxx
Sent: Friday 1 February 2013 10:34:50
Subject: Re: [hatari-devel] Hatari profiler updates and DSP cycle questions
> I changed it from min,max to max-min, i.e. the difference. That way it's
> much easier to notice when it happens, and the post-processor can
> handle the differences as "cache misses".
That seems like a good way to do it.
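The min/max bookkeeping behind this change could look like the sketch below. It assumes one hypothetical prof_entry_t per DSP address and is not the actual Hatari profiler code. Reporting max_cyc - min_cyc means 0 says "always the same cycle count", so any non-zero value stands out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-address profile entry. */
typedef struct {
    uint64_t count;    /* times the instruction was executed */
    uint64_t cycles;   /* total cycles spent on it */
    uint32_t min_cyc;  /* smallest cycle count seen */
    uint32_t max_cyc;  /* largest cycle count seen */
} prof_entry_t;

/* Record one execution that took cyc cycles. */
static void prof_record(prof_entry_t *e, uint32_t cyc)
{
    if (e->count == 0 || cyc < e->min_cyc)
        e->min_cyc = cyc;
    if (e->count == 0 || cyc > e->max_cyc)
        e->max_cyc = cyc;
    e->count++;
    e->cycles += cyc;
}

/* Value to report instead of min and max: 0 for constant-time instructions. */
static uint32_t prof_cycle_diff(const prof_entry_t *e)
{
    assert(e->count > 0);
    return e->max_cyc - e->min_cyc;
}
```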
> In the Doomino demo, I got such a thing in only one place out
> of 1258 instructions:
In well-optimised code it should be rare, but it would occur more often in 'support' code, which doesn't get the same attention.
> ....
> p:0447 0608a0        (04 cyc)  rep #$08           0.38% (960218, 3840872, 0)
> p:0448 200032        (02 cyc)  asl a              3.04% (7681744, 15363488, 0)
> p:0449 0bcc67        (04 cyc)  btst #7,a1         0.38% (960218, 3840872, 0)
> p:044a 0af0a0 00044f (07 cyc)  jcc p:$044f        0.38% (960218, 6721526, 0)
> p:044c 45f400 ffff00 (05 cyc)  move #$ffff00,x1   0.19% (484216, 2421080, 0)
> p:044e 200060        (02 cyc)  add x1,a           0.19% (484217, 968439, 3)
> p:044f 44ee00        (05 cyc)  move x:(r6+n6),x0  0.38% (960219, 4801095, 0)
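A quick arithmetic check on this listing, assuming (the message does not say so) that each triple is (executions, total cycles, max-min cycle difference): total divided by executions matches the nominal "(NN cyc)" value exactly for every line except add x1,a. The table copies the listing's values; listing_line_t, excess_cycles() and count_anomalies() are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One line of the profiler listing above (values copied verbatim). */
typedef struct {
    const char *disasm;
    uint64_t nominal;  /* "(NN cyc)" column */
    uint64_t count;    /* first value in parentheses, assumed: executions */
    uint64_t total;    /* second value, assumed: total cycles */
} listing_line_t;

static const listing_line_t doomino[] = {
    { "rep #$08",          4,  960218,  3840872 },
    { "asl a",             2, 7681744, 15363488 },
    { "btst #7,a1",        4,  960218,  3840872 },
    { "jcc p:$044f",       7,  960218,  6721526 },
    { "move #$ffff00,x1",  5,  484216,  2421080 },
    { "add x1,a",          2,  484217,   968439 },
    { "move x:(r6+n6),x0", 5,  960219,  4801095 },
};

/* Cycles spent beyond count * nominal for one line. */
static uint64_t excess_cycles(const listing_line_t *l)
{
    assert(l->total >= l->count * l->nominal);
    return l->total - l->count * l->nominal;
}

/* Number of lines whose total differs from count * nominal. */
static size_t count_anomalies(void)
{
    size_t n = 0;
    for (size_t i = 0; i < sizeof doomino / sizeof doomino[0]; i++)
        if (excess_cycles(&doomino[i]) != 0)
            n++;
    return n;
}
```

Only add x1,a shows an excess: 968439 - 2*484217 = 5 cycles over 484217 executions, on the same line where the third value is non-zero.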
I think this is suspicious because 'add x1,a' is a trivial instruction which references no memory except its own instruction fetch. Penalties are not possible on that instruction.
They will only be seen on instructions which have 2 or more memory accesses, where 2 or more of those accesses go to external memory.
p:044f 44ee00 (05 cyc) move x:(r6+n6),x0 0.38% (960219, 4801095, 0)
For example, this one might sometimes see a penalty: the program address is >$100, and the X: address it fetches from *might* also be >$200, which would mean competition for the external bus within a single opcode.
(Note: internal P memory is half the size of internal X or Y, hence the $100/$200 boundaries mentioned above. IIRC (?) this is because P: addresses are twice as 'wide', 2 words per address or 48 bits... so 2 fetches per opcode, which is also probably why no operation takes fewer than 2 oscillator cycles.)
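Doug's rule (a bus penalty is only possible when 2 or more of an instruction's accesses go to external memory) can be sketched as a small classifier. The boundaries below are the ones quoted in this message (external P from $100, external X/Y from $200), which Doug himself marks as uncertain, so they should be checked against the DSP56001 documentation. mem_space_t, is_external(), count_external() and bus_contention() are hypothetical names, not Hatari code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { SPACE_P, SPACE_X, SPACE_Y } mem_space_t;

/* First external address per space, as stated in the message (assumption). */
#define EXT_P_START  0x100u
#define EXT_XY_START 0x200u

static bool is_external(mem_space_t space, uint32_t addr)
{
    return addr >= (space == SPACE_P ? EXT_P_START : EXT_XY_START);
}

/* Count how many of an instruction's n accesses hit external memory. */
static unsigned count_external(const mem_space_t *spaces,
                               const uint32_t *addrs, size_t n)
{
    assert(n == 0 || (spaces != NULL && addrs != NULL));
    unsigned ext = 0;
    for (size_t i = 0; i < n; i++)
        if (is_external(spaces[i], addrs[i]))
            ext++;
    return ext;
}

/* True when 2+ accesses compete for the external bus within one opcode. */
static bool bus_contention(const mem_space_t *spaces,
                           const uint32_t *addrs, size_t n)
{
    return count_external(spaces, addrs, n) >= 2;
}
```

For move x:(r6+n6),x0 at p:$044f the opcode fetch is external under these boundaries, so contention depends only on whether r6+n6 is at or above $200. For add x1,a the only access is its own opcode fetch, so bus_contention() is always false.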
D.