Re: [hatari-devel] Hatari profiler updates and CPU cycle questions
Hi Doug,
Could you send a small text file with an example of this?
I can check from the DSP source code if needed.
Laurent
----- Original Message -----
From: "Douglas Little" <doug694@xxxxxxxxxxxxxx>
To: hatari-devel@xxxxxxxxxxxxxxxxxxx
Sent: Thursday 31 January 2013 15:16:32
Subject: Re: [hatari-devel] Hatari profiler updates and CPU cycle questions
> The only problem I noticed involved a few cases where the expected cycle
> count (?cyc) did not match (total cycles / count), (evaluated to 4cyc
> instead of 5cyc)....
Isn't it possible that the same instruction gets different cycles on
different invocations? I think the disassembly output shows the cycles
based on instructions being executed in strictly linear order...?
It's possible, although the DSP architecture intentionally limits that sort of behaviour - it is supposed to be a 'constant time' processor in most respects. However, that's not always true: timing can change if the data being fetched crosses the P:$0100 / XY:$0200 boundary, since below those addresses is internal memory with 100% parallel bus access, while above them access is serialised and subject to competition.
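To make the boundary condition concrete, here is a rough Python sketch of the check described above. The limits are simply the P:$0100 / XY:$0200 values quoted in this message, and the names `INTERNAL_LIMIT`, `is_internal` and `timing_may_vary` are made up for illustration; none of this comes from the Hatari sources.

```python
# Internal-memory limits as quoted in the message (assumption, not taken
# from Hatari): addresses below the limit are internal, the rest external.
INTERNAL_LIMIT = {"P": 0x0100, "X": 0x0200, "Y": 0x0200}


def is_internal(space, addr):
    """Return True if addr lies in internal memory for the given space."""
    return addr < INTERNAL_LIMIT[space]


def timing_may_vary(space, addresses):
    """Return True if the accesses mix internal and external memory.

    A single instruction whose data addresses fall on both sides of the
    boundary is the case where per-invocation cycle counts could differ.
    """
    kinds = {is_internal(space, a) for a in addresses}
    return len(kinds) > 1
```

For example, a buffer pointer in X: space that advances from $01F0 to $0210 would be flagged, while one that stays below $0200 would not.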
OTOH, I would expect that sort of switching behaviour to be rare in my case because I only use internal memory for fast access stuff, and not for advancing buffers which may spill over...
P: addresses can't lead to variable timing, as the code is not copied around or relocated (yet). So for unstable timings in my specific program, the instruction would need to be in P:EXT and the data would need to vary, via an addressing mode, across XY:INT/EXT. I don't think this happens - and not in the case I observed.
Anyway, I'll be able to check more closely as I get used to profiling with this. If I become sure it is wrong in any way I'll report the details. For now it's just a suspicion. :-)
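In the meantime, the comparison itself (expected cycles against total cycles / count) is easy to automate. Below is a minimal Python sketch, assuming the profiler data has already been parsed into tuples; the `(address, expected_cycles, total_cycles, count)` layout and the function name `cycle_mismatches` are hypothetical and do not reflect the actual profiler output format.

```python
def cycle_mismatches(rows, tolerance=0.0):
    """Find instructions whose average cycles differ from the expected count.

    rows: iterable of (address, expected_cycles, total_cycles, count)
          tuples (a hypothetical layout, not the real profiler format).
    Returns a list of (address, expected_cycles, average_cycles).
    """
    mismatches = []
    for address, expected, total, count in rows:
        if count == 0:
            continue  # never executed, nothing to compare
        average = total / count
        if abs(average - expected) > tolerance:
            mismatches.append((address, expected, average))
    return mismatches


# Made-up sample data: the second instruction averages 4 cycles
# although 5 were expected.
sample = [(0x0040, 4, 400, 100), (0x0042, 5, 400, 100)]
for address, expected, average in cycle_mismatches(sample):
    print(f"P:${address:04x} expected {expected}cyc, measured {average:.2f}cyc")
```

A non-integer average would suggest that the same instruction took different cycle counts on different invocations, whereas a clean integer that differs from the expected value would point more towards the displayed count.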
To get more digits, you can apply the attached (untested) patch
to Hatari sources.
But I would suggest starting with the post-processor so that you get
function level information, percentages on that should be much
Yes of course - however I didn't want to depend too much on the post-processing just yet - more layers of conversion means more potential problems (until it all has time to settle) so having those extra digits is handy for confirmation just now.
I do expect later it will become mostly redundant as the post-processor makes the output more manageable...
D.