Re: [hatari-devel] Hatari profiler updates and DSP cycle questions



Hi,

On Friday 01 February 2013, Douglas Little wrote:
> > I haven't committed the min/max cycles code yet as it makes the output
> > more verbose and I'm wondering how useful it is.  I guess for 99% of
> > DSP code, there's no difference between min and max cycles, they
> > always are executed the same way...?
> 
> I think the main value is for the programmer to notice that a particular
> block of code has an unexpected (potentially large) penalty due to the
> location of data, more than anything else - and a penalty that perhaps
> doesn't show all the time. i.e. perhaps something that's difficult to
> find any other way except reading all the code and layout.
> 
> However this is from a developer/optimisation perspective only :-) i.e.
> this is how I would use it. I don't know if it's of use to anyone
> else....

I changed the output from separate min,max values to a single
max-min value, i.e. the difference.  That way it's much easier
to notice when it happens, and the post-processor can handle
the differences as "cache misses".

In the doomino demo, I got a non-zero difference in only one
place out of 1258 instructions:
---------
....
p:0447  0608a0         (04 cyc)  rep #$08            0.38% (960218, 3840872, 0)
p:0448  200032         (02 cyc)  asl a               3.04% (7681744, 15363488, 0)
p:0449  0bcc67         (04 cyc)  btst #7,a1          0.38% (960218, 3840872, 0)
p:044a  0af0a0 00044f  (07 cyc)  jcc p:$044f         0.38% (960218, 6721526, 0)
p:044c  45f400 ffff00  (05 cyc)  move #$ffff00,x1    0.19% (484216, 2421080, 0)
p:044e  200060         (02 cyc)  add x1,a            0.19% (484217, 968439, 3)
p:044f  44ee00         (05 cyc)  move x:(r6+n6),x0   0.38% (960219, 4801095, 0)
....
---------

The maximum difference between the cycle counts that
"add x1,a" took was 3.

Either a couple of calls, out of about half a million, took
up to 5 cycles instead of 2, leaving the total 5 cycles above
2 cycles per call:
	968439-484217*2 = 5

Which seems unlikely, or the call alternated e.g. between 1 and 4:
	968439-484217*1-1*2-121055*4 = 0
:-)
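
For anyone who wants to repeat that check on a whole profile, here
is a rough Python sketch, not the actual post-processor.  It assumes
each entry is on one line and that the three numbers in parentheses
are (count, total cycles, max-min diff); I'm reading that off the
numbers above, where total cycles equals count times the "(NN cyc)"
value whenever the diff is 0.  The regex and the names parse_line and
excess_cycles are made up for this example.

```python
import re

# Assumed layout of one profiler line: address, opcode words, nominal
# cycle count, disassembly, then "percent% (count, cycles, diff)".
LINE_RE = re.compile(
    r"p:(?P<addr>[0-9a-f]{4})\s+.*?\((?P<nominal>\d+) cyc\)\s+(?P<asm>.*?)\s+"
    r"(?P<pct>[\d.]+)% \((?P<count>\d+), (?P<cycles>\d+), (?P<diff>\d+)\)"
)

def parse_line(line):
    """Return a dict for one profile line, or None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {
        "addr": int(m.group("addr"), 16),
        "asm": m.group("asm"),
        "nominal": int(m.group("nominal")),
        "count": int(m.group("count")),
        "cycles": int(m.group("cycles")),
        "diff": int(m.group("diff")),
    }

def excess_cycles(entry):
    """Cycles spent beyond count * nominal cycles for this instruction."""
    return entry["cycles"] - entry["count"] * entry["nominal"]

# Two lines copied from the doomino output above.
SAMPLE = [
    "p:044c  45f400 ffff00  (05 cyc)  move #$ffff00,x1    0.19% (484216, 2421080, 0)",
    "p:044e  200060         (02 cyc)  add x1,a            0.19% (484217, 968439, 3)",
]

entries = [parse_line(line) for line in SAMPLE]
# Only instructions whose cycle count varied between executions.
flagged = [e for e in entries if e is not None and e["diff"] > 0]
for e in flagged:
    print(f"p:{e['addr']:04x} {e['asm']}: "
          f"diff {e['diff']}, excess {excess_cycles(e)} cycles")
```

With the two sample lines only "add x1,a" gets flagged, with the same
5 excess cycles as in the first calculation above.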


	- Eero



