Re: [hatari-devel] WinUAE and 030 cache hits/misses?
Hi,
On Sunday 27 January 2013, Thomas Huth wrote:
> Eero Tamminen <oak@xxxxxxxxxxxxxx> wrote:
> > On Sunday 27 January 2013, Thomas Huth wrote:
> > > In cycle exact mode, the CPU core counts each cycle separately, so
> > > the global "CurrentInstrCycles" variable is not needed and not set.
> >
> > Where does the CPU core keep/count that information in cycle exact mode?
> >
> > That information is needed by the profiler.
>
> Since every CPU mode is doing it slightly differently, you could maybe
> add a counter to cycles.c and use Cycles_GetCounter() after each
> instruction to get the up-to-date cycles count.
Why would I need a new counter if I'm interested in CPU cycles
that already seem to be tracked, apparently in multiple places? :-)
Instead of using those two variables, I tested using:
Cycles_GetCounter(CYCLES_COUNTER_CPU)
For the cycle exact CPU core, this gives the following kind of results:
-------
> profile addresses
$e0054c : bsr $e01422 0.00% (5, 30, 5)
$e00550 : dbra d1,$e0054c 0.00% (6, 60, 6)
$e00554 : moveq #2,d0 0.00% (1, 10, 1)
$e00556 : bsr $e00bd2 0.00% (1, 2, 1)
$e0055a : moveq #3,d1 0.00% (1, 10, 3)
$e0055c : move.w $184c.w,d2 0.00% (1, 2, 3)
$e00560 : bne.s $e00566 0.00% (1, 6, 1)
[...]
$e00566 : move.w d2,$184c.w 0.00% (1, 6, 1)
$e0056a : move.l #$e00030,$046e.w 0.00% (1, 6, 1)
$e00572 : move.w #1,$0452.w 0.00% (1, 10, 1)
-------
To me, this looks at least somewhat sane.
(Profiling information is in parentheses: first is the instruction count,
then the used cycles, and last the instruction cache misses.)
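To make the three numbers in parentheses concrete, here is a minimal sketch of a per-address record and how it could be updated and printed. The names `profile_item_t`, `profile_update()` and `profile_format()` are made up for illustration and are not Hatari's actual profiler code; the sketch also assumes that the cycles and misses fields are totals accumulated over all executions of that address.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-address profile record (illustrative names only). */
typedef struct {
    uint32_t count;   /* how many times the instruction was executed */
    uint32_t cycles;  /* assumed: total CPU cycles spent at this address */
    uint32_t misses;  /* assumed: total instruction cache misses */
} profile_item_t;

/* Account for one executed instruction at this address. */
static void profile_update(profile_item_t *item, uint32_t cycles, uint32_t misses)
{
    assert(item != NULL);
    item->count++;
    item->cycles += cycles;
    item->misses += misses;
}

/* Write the record in the "(count, cycles, misses)" layout.
 * Returns what snprintf() returns: the length the text needs. */
static int profile_format(const profile_item_t *item, char *buf, size_t size)
{
    assert(item != NULL && buf != NULL);
    return snprintf(buf, size, "(%" PRIu32 ", %" PRIu32 ", %" PRIu32 ")",
                    item->count, item->cycles, item->misses);
}
```

Under those assumptions, five executions costing 6 cycles and 1 miss each would show up as (5, 30, 5), the same shape as the first line of the output above.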
Unfortunately, for the non-cycle exact CPU core the results look insane:
-------
> profile addresses
$e00790 : bsr $e00986 0.00% (1, 80888274, 0)
$e00794 : bsr $e01320 0.00% (1, 105172758, 0)
$e00798 : tst.w $482 0.00% (1, 105633218, 0)
$e0079e : beq.s $e007be 0.00% (1, 105633222, 0)
[...]
$e007be : bsr $e01102 0.00% (1, 105633226, 0)
-------
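One possible reading of these numbers (only a guess from the output, not something I have verified in the code): the last three values differ by just 4 each, which would fit a running total rather than a per-instruction cost. If that were the case, the profiler would need the difference between successive readings. A minimal sketch of that idea, with the hypothetical helper name `cycles_deltas()`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Turn n successive readings of a running cycle counter into n-1
 * per-instruction costs: out[i] = readings[i + 1] - readings[i].
 * Returns the number of values written to out.
 * Assumption: the counter only grows between readings (no reset, no wrap). */
static size_t cycles_deltas(const uint64_t *readings, size_t n, uint64_t *out)
{
    if (n < 2) {
        return 0;  /* a single reading has nothing to be compared against */
    }
    for (size_t i = 0; i + 1 < n; i++) {
        assert(readings[i + 1] >= readings[i]);
        out[i] = readings[i + 1] - readings[i];
    }
    return n - 1;
}
```

Fed with 105633218, 105633222 and 105633226 from the listing above, this yields two deltas of 4 cycles, which would be a much more believable cost for tst.w and beq.s.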
As I saw DSP_Run() being called with that counter, I checked what else it is called with:
-------
$ grep DSP_Run */newcpu.c
cpu/newcpu.c: DSP_Run(cpu_cycles* 2 / CYCLE_UNIT);
cpu/newcpu.c: DSP_Run(Cycles_GetCounter(CYCLES_COUNTER_CPU) * 2);
cpu/newcpu.c: DSP_Run(cpu_cycles*2/ CYCLE_UNIT);
cpu/newcpu.c: DSP_Run(cpu_cycles* 2 / CYCLE_UNIT);
uae-cpu/newcpu.c: DSP_Run( Cycles_GetCounter(CYCLES_COUNTER_CPU) * 2);
uae-cpu/newcpu.c: DSP_Run( Cycles_GetCounter(CYCLES_COUNTER_CPU) );
-------
So, the old core and one of the WinUAE core loops use
Cycles_GetCounter(), but the other loops do not. This is very inconsistent.
Couldn't the same API be used for the cycles in every CPU core?
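A rough sketch of what I mean by "same API": one helper that every CPU loop calls, so the conversion to DSP cycles lives in a single place. Everything here is hypothetical (the enum, the function name `dsp_cycles_for()` and its parameters); the values are passed in as plain arguments so the sketch does not depend on how the real cores store them.

```c
#include <assert.h>
#include <stdint.h>

/* Where a given CPU loop gets its cycle count from (hypothetical enum). */
typedef enum {
    CYCLES_FROM_COUNTER,    /* a value like Cycles_GetCounter(CYCLES_COUNTER_CPU) */
    CYCLES_FROM_CPU_CYCLES  /* a raw cpu_cycles value, scaled by CYCLE_UNIT */
} cycles_source_t;

/* Return the cycle count to hand to the DSP for the chosen source.
 * Both branches mirror the expressions in the grep output above:
 * counter * 2, and cpu_cycles * 2 / cycle_unit (integer division). */
static uint32_t dsp_cycles_for(cycles_source_t src, uint32_t counter_value,
                               uint32_t cpu_cycles, uint32_t cycle_unit)
{
    assert(cycle_unit != 0);
    switch (src) {
    case CYCLES_FROM_COUNTER:
        return counter_value * 2;
    case CYCLES_FROM_CPU_CYCLES:
        return cpu_cycles * 2 / cycle_unit;
    }
    return 0;  /* not reached for valid enum values */
}
```

Each loop would then do something like DSP_Run(dsp_cycles_for(...)), and the profiler could use a similar helper without the doubling. With the factor applied in one function, call sites could no longer disagree about it.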
- Eero
PS. The last listed DSP_Run() call doesn't double the counter value. Is
that a bug?