Re: [hatari-devel] Hatari profiler updates and DSP cycle questions



Hi,

There's one answer for Eero and one question for Doug:

Eero,

I've had a look at the code for this add instruction in dsp_cpu.c.

This instruction is one of the hardcoded instructions (see the dsp_add_x1_a() function).

It cannot take any value other than 2 cycles.
It only calls 2 other functions, dsp_add56() and dsp_ccr_update_e_u_n_z(), and neither of them modifies the cycle count.

dsp_core.instr_cycle is set to 2 (the minimum number of cycles for an instruction) only at the beginning of the main loop, and extra cycles are added only:
   - in the opcodes that take more than 2 cycles (by adding XXX cycles)
   - in the ea addressing modes like (Rx+Nx)
   - in the main loop if there is more than 1 external access (see below)
   - in the move L: function

Eero, this problem may be for you.



Doug,

For external memory accesses, my code in Hatari does the following:

In the memory access code, I count the number of accesses to external memory in nb_access_to_extMemory.

Then, in the main loop, I do:

	/* Add the waitstate due to external memory access */
	if (nb_access_to_extMemory > 1)
		dsp_core.instr_cycle += nb_access_to_extMemory - 1;


Do you think I should replace it with:

	/* Add the waitstate due to external memory access */
	if (nb_access_to_extMemory > 1)
		dsp_core.instr_cycle += nb_access_to_extMemory*2 - 2;

Laurent


----- Original message -----
From: "Douglas Little" <doug694@xxxxxxxxxxxxxx>
To: hatari-devel@xxxxxxxxxxxxxxxxxxx
Sent: Friday 1 February 2013 10:34:50
Subject: Re: [hatari-devel] Hatari profiler updates and DSP cycle questions




I changed it from min,max to max-min i.e. diff. That way it's 
much easier to notice when it happens and post-processor can 
handle the differences as "cache misses". 



That seems like a good way to do it. 


In the doomino demo, I got such a thing in only one place out 
of 1258 instructions: 



In well-optimised code it should be rare but it would occur more often in 'support' code which doesn't get the same attention. 





.... 
p:0447 0608a0 (04 cyc) rep #$08 0.38% (960218, 3840872, 0) 
p:0448 200032 (02 cyc) asl a 3.04% (7681744, 15363488, 0) 
p:0449 0bcc67 (04 cyc) btst #7,a1 0.38% (960218, 3840872, 0) 
p:044a 0af0a0 00044f (07 cyc) jcc p:$044f 0.38% (960218, 6721526, 0) 
p:044c 45f400 ffff00 (05 cyc) move #$ffff00,x1 0.19% (484216, 2421080, 0) 
p:044e 200060 (02 cyc) add x1,a 0.19% (484217, 968439, 3) 
p:044f 44ee00 (05 cyc) move x:(r6+n6),x0 0.38% (960219, 4801095, 0) 



I think this is suspicious because 'add x1,a' is a trivial instruction which references no memory except its own instruction fetch. Penalties are not possible on that instruction. 


They will only be seen on instructions which have 2 or more memory accesses and where 2 or more of them come from external memory... 



p:044f 44ee00 (05 cyc) move x:(r6+n6),x0 0.38% (960219, 4801095, 0) 






For example this one might see a penalty sometimes - since the program address is >$100 and the X: address it is fetching from *might* also be >$200, which would mean competition for the external bus inside a single opcode. 


(note: internal P memory is half the size of internal X or Y, hence the $100/$200 boundaries mentioned above - IIRC (?) this is because P: addresses are twice as 'wide' - 2 words per address or 48bits... 2 fetches per opcode, which is also probably why no operation takes less than 2 osc cycles) 


D.


