Re: [hatari-devel] WinUAE and 030 cache hits/misses?



From my point of view, it's inconsistent to have so many different CPU cores for the Falcon.
The Falcon is already quite slow to emulate (it needs good hardware).

So choosing cycle exact over non-cycle exact emulation should not cost that much extra speed compared to the DSP calls.

I think we should have only 1 CPU core (2 if we want to keep with/without MMU, but that makes no sense on a Falcon) and focus on this (these) core(s) instead of having 3 or 4 different CPU cores with different behaviour.

Laurent

On 27/01/2013 21:36, Eero Tamminen wrote:
Hi,

On Sunday 27 January 2013, Thomas Huth wrote:
Eero Tamminen <oak@xxxxxxxxxxxxxx> wrote:
On Sunday 27 January 2013, Thomas Huth wrote:
In cycle exact mode, the CPU core counts each cycle separately, so
the global "CurrentInstrCycles" variable is not needed and not set.
Where does the CPU core keep/count that information in cycle exact mode?

That information is needed by the profiler.
Since every CPU mode is doing it slightly differently, you could maybe
add a counter to cycles.c and use Cycles_GetCounter() after each
instruction to get the up-to-date cycles count.
Why would I need a new counter if I'm interested in CPU cycles
that already seem to be tracked, apparently by multiple things? :-)
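Thomas's suggestion amounts to the profiler sampling a cycle counter once per instruction. A minimal sketch of a read-and-reset variant, assuming a hypothetical counter that every CPU core advances as it consumes cycles; the names below are illustrative only and are not Hatari's actual cycles.c API:

```c
#include <stdint.h>

/* Hypothetical profiler counter; not part of Hatari's real cycles.c. */
static uint32_t profile_cycle_counter;

/* Called by a CPU core whenever it consumes cycles. A cycle exact core
 * could call this for every bus access, a non-cycle-exact core once per
 * instruction with the whole instruction cost. */
void profile_add_cycles(uint32_t cycles)
{
    profile_cycle_counter += cycles;
}

/* Called by the profiler after each instruction: returns the cycles
 * consumed since the previous call and restarts the count at zero. */
uint32_t profile_take_instr_cycles(void)
{
    uint32_t cycles = profile_cycle_counter;
    profile_cycle_counter = 0;
    return cycles;
}
```

With this shape, a core that reports 4 + 4 cycles in two bus accesses and a core that reports 8 cycles at once give the profiler the same per-instruction figure, which is the point of routing every core through one counter.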


Instead of using those two variables, I tested using:
	Cycles_GetCounter(CYCLES_COUNTER_CPU)

For the cycle exact CPU core, this gives results like these:
-------
profile addresses
$e0054c :             bsr       $e01422                    0.00% (5, 30, 5)
$e00550 :             dbra      d1,$e0054c                 0.00% (6, 60, 6)
$e00554 :             moveq     #2,d0                      0.00% (1, 10, 1)
$e00556 :             bsr       $e00bd2                    0.00% (1, 2, 1)
$e0055a :             moveq     #3,d1                      0.00% (1, 10, 3)
$e0055c :             move.w    $184c.w,d2                 0.00% (1, 2, 3)
$e00560 :             bne.s     $e00566                    0.00% (1, 6, 1)
[...]
$e00566 :             move.w    d2,$184c.w                 0.00% (1, 6, 1)
$e0056a :             move.l    #$e00030,$046e.w           0.00% (1, 6, 1)
$e00572 :             move.w    #1,$0452.w                 0.00% (1, 10, 1)
-------

That looks at least somewhat sane to me.

(Profiling information is in parentheses: first the instruction count,
then the used cycles and last the instruction cache misses.)
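The three numbers per address can be kept in a small per-instruction record. A sketch, assuming a hypothetical fixed-size table covering a 512 KiB area starting at $e00000 and indexed by (address - base) / 2, since 68k instructions are word aligned; the real profiler's layout may well differ:

```c
#include <stdint.h>
#include <stdio.h>

#define PROFILE_BASE  0xe00000u  /* hypothetical: start of profiled area */
#define PROFILE_WORDS 0x40000u   /* hypothetical: 512 KiB, counted in words */

typedef struct {
    uint32_t count;   /* times the instruction was executed */
    uint64_t cycles;  /* total cycles spent in it */
    uint32_t misses;  /* instruction cache misses */
} profile_item_t;

static profile_item_t profile_items[PROFILE_WORDS];

/* Returns NULL for addresses outside the profiled area. */
profile_item_t *profile_item(uint32_t pc)
{
    if (pc < PROFILE_BASE || (pc - PROFILE_BASE) / 2 >= PROFILE_WORDS)
        return NULL;
    return &profile_items[(pc - PROFILE_BASE) / 2];
}

/* Accumulates one executed instruction into its record. */
void profile_update(uint32_t pc, uint32_t cycles, uint32_t misses)
{
    profile_item_t *item = profile_item(pc);
    if (!item)
        return;
    item->count++;
    item->cycles += cycles;
    item->misses += misses;
}

/* Formats the "(count, cycles, misses)" suffix used in the listings. */
int profile_format(char *buf, size_t len, uint32_t pc)
{
    const profile_item_t *item = profile_item(pc);
    if (!item)
        return snprintf(buf, len, "(n/a)");
    return snprintf(buf, len, "(%u, %llu, %u)", (unsigned)item->count,
                    (unsigned long long)item->cycles,
                    (unsigned)item->misses);
}
```

Because cycles are summed per address, the second column only stays meaningful if each update passes the cycles of that one instruction, not a running total.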


Unfortunately for the non-cycle exact CPU core the results look insane:
-------
profile addresses
$e00790 :             bsr       $e00986                    0.00% (1, 80888274, 0)
$e00794 :             bsr       $e01320                    0.00% (1, 105172758, 0)
$e00798 :             tst.w     $482                       0.00% (1, 105633218, 0)
$e0079e :             beq.s     $e007be                    0.00% (1, 105633222, 0)
[...]
$e007be :             bsr       $e01102                    0.00% (1, 105633226, 0)
-------
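The cycle values above only ever grow (80888274, 105172758, 105633218, ...), which is what one would expect if a running total were recorded instead of a per-instruction amount; that is an inference from the listing, not something verified in the code. If the counter in that core is never reset, the profiler could keep the previous reading and record the difference instead. A minimal sketch with hypothetical names; in a real core the value passed in would come from something like Cycles_GetCounter(CYCLES_COUNTER_CPU):

```c
#include <stdint.h>

/* Hypothetical sampler state for a never-reset cycle counter. */
typedef struct {
    uint64_t prev;    /* counter value seen after the previous instruction */
    int      primed;  /* 0 until the first sample has been taken */
} cycle_sampler_t;

/* Returns the cycles used by the instruction that just finished, given the
 * current value of a monotonically increasing counter. The first call has
 * no reference point and returns 0. Unsigned subtraction also stays
 * correct if the counter wraps around once. */
uint64_t sampler_delta(cycle_sampler_t *s, uint64_t now)
{
    uint64_t delta = s->primed ? now - s->prev : 0;
    s->prev = now;
    s->primed = 1;
    return delta;
}
```

Fed the last three values from the listing, this would report differences of 4 and 4 rather than numbers in the hundreds of millions; it does not, however, tell which of the two approaches each core actually needs.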

As I saw DSP_Run() using it, I checked what the other DSP_Run() calls use:
-------
$ grep DSP_Run */newcpu.c
cpu/newcpu.c:                            DSP_Run(cpu_cycles* 2 / CYCLE_UNIT);
cpu/newcpu.c:                    DSP_Run(Cycles_GetCounter(CYCLES_COUNTER_CPU) * 2);
cpu/newcpu.c:                    DSP_Run(cpu_cycles*2/ CYCLE_UNIT);
cpu/newcpu.c:                    DSP_Run(cpu_cycles* 2 / CYCLE_UNIT);
uae-cpu/newcpu.c:            DSP_Run( Cycles_GetCounter(CYCLES_COUNTER_CPU) * 2);
uae-cpu/newcpu.c:            DSP_Run( Cycles_GetCounter(CYCLES_COUNTER_CPU) );
-------

So the old core and one of the WinUAE core loops use
Cycles_GetCounter(), but the others don't.  This is very inconsistent.

Couldn't the same API be used for the cycles in every CPU core?
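One shape such a shared API could take is a single conversion helper that every core calls before DSP_Run(). A sketch, assuming (as the grep output suggests) that cpu_cycles in the WinUAE core is scaled by CYCLE_UNIT while Cycles_GetCounter() returns plain CPU cycles, and that the factor 2 reflects the Falcon DSP running at twice the 16 MHz CPU clock; both function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed: the Falcon DSP runs at twice the CPU clock (32 vs. 16 MHz). */
#define DSP_CPU_CLOCK_RATIO 2

/* For cores that count plain CPU cycles (e.g. via Cycles_GetCounter()). */
int32_t cpu_to_dsp_cycles(int32_t cpu_cycles)
{
    return cpu_cycles * DSP_CPU_CLOCK_RATIO;
}

/* For cores that count in CYCLE_UNIT-scaled units (WinUAE style).
 * Multiplying before dividing keeps the same rounding as the existing
 * "cpu_cycles * 2 / CYCLE_UNIT" expressions. */
int32_t scaled_to_dsp_cycles(int32_t scaled_cycles, int32_t cycle_unit)
{
    assert(cycle_unit > 0);
    return scaled_cycles * DSP_CPU_CLOCK_RATIO / cycle_unit;
}
```

Keeping the ratio in one place means a call site can no longer silently drop or double it; the remaining per-core difference is only which of the two input units that core tracks.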


	 - Eero

PS. The last listed DSP_Run() call doesn't double the counter value;
is that a bug?
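To see the size of the discrepancy, the argument expressions from the grep output can be evaluated for the same instruction. A sketch, assuming an 8-cycle instruction, a hypothetical CYCLE_UNIT of 512, and that the WinUAE-style cpu_cycles holds plain cycles multiplied by CYCLE_UNIT:

```c
#include <stdint.h>

#define EXAMPLE_CYCLE_UNIT 512  /* hypothetical value, for illustration only */

/* DSP_Run(cpu_cycles * 2 / CYCLE_UNIT), with cpu_cycles assumed to be
 * plain cycles scaled by CYCLE_UNIT. */
int32_t arg_winuae_scaled(int32_t cycles)
{
    int32_t cpu_cycles = cycles * EXAMPLE_CYCLE_UNIT;
    return cpu_cycles * 2 / EXAMPLE_CYCLE_UNIT;
}

/* DSP_Run(Cycles_GetCounter(CYCLES_COUNTER_CPU) * 2) */
int32_t arg_counter_doubled(int32_t cycles)
{
    return cycles * 2;
}

/* Last uae-cpu call: DSP_Run(Cycles_GetCounter(CYCLES_COUNTER_CPU)) */
int32_t arg_counter_plain(int32_t cycles)
{
    return cycles;
}
```

Under these assumptions the first two forms pass 16 and the last passes 8, so on that code path the DSP would be given half the cycles the other paths give it; whether that path is reachable, or is deliberately different, is exactly what the question above asks.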