Re: [hatari-devel] WinUAE and 030 cache hits/misses?



Hi,

On Sunday 27 January 2013, Thomas Huth wrote:
> Eero Tamminen <oak@xxxxxxxxxxxxxx> wrote:
> > On Sunday 27 January 2013, Thomas Huth wrote:
> > > In cycle exact mode, the CPU core counts each cycle separately, so
> > > the global "CurrentInstrCycles" variable is not needed and not set.
> > 
> > Where does the CPU core keep/count that information in cycle exact mode?
> > 
> > That information is needed by the profiler.
> 
> Since every CPU mode is doing it slightly differently, you could maybe
> add a counter to cycles.c and use Cycles_GetCounter() after each
> instruction to get the up-to-date cycles count.

Why would I need a new counter if I'm interested in CPU cycles
that already seem to be tracked, apparently in multiple places? :-)


Instead of using those two variables, I tested using:
	Cycles_GetCounter(CYCLES_COUNTER_CPU)

For the cycle exact CPU core, this gives the following kind of results:
-------
> profile addresses 
$e0054c :             bsr       $e01422                    0.00% (5, 30, 5)
$e00550 :             dbra      d1,$e0054c                 0.00% (6, 60, 6)
$e00554 :             moveq     #2,d0                      0.00% (1, 10, 1)
$e00556 :             bsr       $e00bd2                    0.00% (1, 2, 1)
$e0055a :             moveq     #3,d1                      0.00% (1, 10, 3)
$e0055c :             move.w    $184c.w,d2                 0.00% (1, 2, 3)
$e00560 :             bne.s     $e00566                    0.00% (1, 6, 1)
[...]
$e00566 :             move.w    d2,$184c.w                 0.00% (1, 6, 1)
$e0056a :             move.l    #$e00030,$046e.w           0.00% (1, 6, 1)
$e00572 :             move.w    #1,$0452.w                 0.00% (1, 10, 1)
-------

Which to me looks at least somewhat sane.

(Profiling information is in parentheses: first is the instruction count,
then the used cycles, and last the instruction cache misses.)


Unfortunately for the non-cycle exact CPU core the results look insane:
-------
> profile addresses 
$e00790 :             bsr       $e00986                    0.00% (1, 80888274, 0)
$e00794 :             bsr       $e01320                    0.00% (1, 105172758, 0)
$e00798 :             tst.w     $482                       0.00% (1, 105633218, 0)
$e0079e :             beq.s     $e007be                    0.00% (1, 105633222, 0)
[...]
$e007be :             bsr       $e01102                    0.00% (1, 105633226, 0)
-------
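
A note on the second listing: the cycle values only ever grow, and the
last three (105633218, 105633222, 105633226) differ by 4 each.  One guess,
and it is only a guess from these numbers, is that in this core the counter
keeps running instead of being reset for each instruction.  If so, the
profiler could use the difference between successive readings.  A minimal
C sketch of that idea follows; cycle_delta_t and cycle_delta_update() are
hypothetical names, not Hatari code, and the sketch assumes the counter
never wraps or moves backwards.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical profiler helper, not part of Hatari: remembers the
 * previous reading of a running cycle counter and returns the cycles
 * that passed since that reading. */
typedef struct {
    uint64_t prev;   /* last counter value seen */
    int primed;      /* 0 until the first reading has been stored */
} cycle_delta_t;

static uint64_t cycle_delta_update(cycle_delta_t *d, uint64_t now)
{
    uint64_t delta = 0;

    if (d->primed) {
        /* Assumption: the counter is monotonic between readings. */
        assert(now >= d->prev);
        delta = now - d->prev;
    }
    /* The very first reading has no predecessor, so it yields 0. */
    d->prev = now;
    d->primed = 1;
    return delta;
}
```

The profiler would call cycle_delta_update() once after each instruction
with the current counter value and add the returned delta to that
instruction's cycle total.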

As I saw DSP_Run() using it, I checked what else it was using:
-------
$ grep DSP_Run */newcpu.c 
cpu/newcpu.c:                            DSP_Run(cpu_cycles* 2 / CYCLE_UNIT);
cpu/newcpu.c:                    DSP_Run(Cycles_GetCounter(CYCLES_COUNTER_CPU) * 2);
cpu/newcpu.c:                    DSP_Run(cpu_cycles*2/ CYCLE_UNIT);
cpu/newcpu.c:                    DSP_Run(cpu_cycles* 2 / CYCLE_UNIT);
uae-cpu/newcpu.c:            DSP_Run( Cycles_GetCounter(CYCLES_COUNTER_CPU) * 2);
uae-cpu/newcpu.c:            DSP_Run( Cycles_GetCounter(CYCLES_COUNTER_CPU) );
-------

So the old core and one of the WinUAE core loops use
Cycles_GetCounter(), but the other WinUAE core loops use cpu_cycles
instead.  This is very inconsistent.

Couldn't the same API be used for the cycles in every CPU core?
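
To illustrate what I mean by "the same API", here is a rough C sketch of a
single accessor that hides where the cycle value comes from.  Everything in
it is hypothetical: the names are invented for this mail, and
SKETCH_CYCLE_UNIT is an assumed stand-in value, the real CYCLE_UNIT comes
from the WinUAE core headers.  Note also that the grep output above
multiplies by 2 before dividing, while this sketch divides first, so the
rounding can differ.

```c
#include <assert.h>

/* Assumed value, for illustration only. */
#define SKETCH_CYCLE_UNIT 512

/* Hypothetical tag for the two ways the loops above obtain cycles. */
typedef enum {
    SRC_CPU_CYCLES_VAR,   /* raw value is in CYCLE_UNIT fractions */
    SRC_CYCLES_COUNTER    /* raw value is already in CPU cycles */
} cycle_source_t;

/* Convert a raw value from either source to plain CPU cycles. */
static int sketch_cpu_cycles(cycle_source_t src, int raw)
{
    assert(raw >= 0);
    switch (src) {
    case SRC_CPU_CYCLES_VAR:
        return raw / SKETCH_CYCLE_UNIT;
    case SRC_CYCLES_COUNTER:
        return raw;
    }
    return 0;   /* not reached for valid enum values */
}

/* All but one of the DSP_Run() calls listed above pass twice the CPU cycles. */
static int sketch_dsp_cycles(cycle_source_t src, int raw)
{
    return sketch_cpu_cycles(src, raw) * 2;
}
```

With something like this, both the profiler and the DSP_Run() callers
could ask one function for the cycles of the last instruction, regardless
of which CPU core loop is running.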


	 - Eero

PS. The last DSP_Run() call listed above doesn't double the counter value.
Is that a bug?


