Re: [hatari-devel] Hatari profiler updates and CPU cycle questions

> The only problem I noticed involved a few cases where the expected cycle
> count (?cyc) did not match (total cycles / count), (evaluated to 4cyc
> instead of 5cyc)....

Isn't it possible that the same instruction gets different cycles on
different invocations? I think the disassembly output shows the cycles
based on instructions being executed in strictly linear order...?

It's possible, although the DSP architecture intentionally limits that sort of behaviour - it is supposed to be a 'constant time' processor in most respects. However, that's not always true: timing can vary if data is being fetched from an address range crossing P:$0100 / XY:$0200, since below those addresses is internal memory with fully parallel bus access, while above them accesses are serialised and compete with each other.
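To make the boundary case concrete, here is a rough sketch of the check I have in mind. The limits are just the ones quoted above (I haven't re-checked them against the manual), and the function names are made up for illustration; nothing here comes from Hatari itself.

```python
# Internal-memory limits per address space, as quoted in this thread.
# These values are an assumption taken from the message, not from the manual.
INTERNAL_LIMIT = {"P": 0x0100, "X": 0x0200, "Y": 0x0200}


def is_internal(space, addr):
    """Return True if addr in the given space lies below the internal limit."""
    return addr < INTERNAL_LIMIT[space]


def crosses_boundary(space, start, length):
    """Return True if the word range [start, start + length) spans both
    internal and external memory, i.e. timing could differ per access."""
    if length <= 0:
        return False
    last = start + length - 1
    return is_internal(space, start) != is_internal(space, last)
```

A buffer in Y: starting at $01F0 with $20 words would be flagged by crosses_boundary("Y", 0x01F0, 0x20), while one that stays entirely on one side of the limit would not.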

OTOH, I would expect that sort of switching behaviour to be rare in my case because I only use internal memory for fast access stuff, and not for advancing buffers which may spill over...

P: addresses can't lead to variable timing, as the code is not copied around or relocated (yet). So for unstable timings in my specific program, the instruction would need to sit in P:EXT and its data would need to vary, via an addressing mode, across XY:INT/EXT. I don't think this happens, and it does not in the case I observed.

Anyway, I'll be able to check more closely as I get used to profiling with this. If I become sure it is wrong in any way, I'll report the details. For now it's just a suspicion. :-)

To get more digits, you can apply the attached (untested) patch
to Hatari sources.

But I would suggest starting with the post-processor so that you get
function level information, percentages on that should be much

Yes, of course. However, I didn't want to depend too much on the post-processing just yet: more layers of conversion mean more potential problems (until it all has had time to settle), so having those extra digits is handy for confirmation just now.
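As an illustration of why the extra digits help, here is a minimal sketch of the comparison I'd do by hand. The helper names and the sample numbers are hypothetical, and I'm only assuming that a displayed average could lose its fractional part; I haven't checked how Hatari actually rounds it.

```python
def average_cycles(total_cycles, count):
    """Average cycles per execution as a float, or None if never executed."""
    if count == 0:
        return None
    return total_cycles / count


def cycle_mismatch(expected, total_cycles, count, tolerance=0.005):
    """Return True if the measured average differs from the expected
    per-instruction cycle count by more than the tolerance."""
    avg = average_cycles(total_cycles, count)
    if avg is None:
        return False
    return abs(avg - expected) > tolerance


# Made-up example: one instruction executed twice, 9 cycles in total.
total, count = 9, 2
print(total // count)                         # integer view of the average
print(f"{average_cycles(total, count):.2f}")  # fractional view of the average
```

With only the integer view, a mix of cheaper and more expensive invocations looks like a flat mismatch against the expected count; the fractional view shows that the instruction did not cost the same on every invocation.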

I do expect later it will become mostly redundant as the post-processor makes the output more manageable...