I noticed that the CPU cycle counter returns cumulative values, so I compensated
for that, and for the cycle value belonging to the previous instruction [1].
Great stuff. I have the code now but have been too busy to rebuild and experiment with it today.
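For reference, the compensation described above could be sketched roughly like this. It is only an illustration and not the actual profiler code: the trace format (one `(pc, cumulative_cycles)` reading taken just before each instruction runs), the function name `per_instruction_cycles` and the sample numbers are all assumptions.

```python
from collections import defaultdict


def per_instruction_cycles(samples):
    """Turn cumulative counter readings into per-address cycle totals.

    samples: list of (pc, cumulative_cycles) pairs, one per executed
    instruction, read just before that instruction runs (assumed format).
    """
    totals = defaultdict(int)
    for (prev_pc, prev_count), (_, count) in zip(samples, samples[1:]):
        # The counter is cumulative, so the cost is the difference between
        # two consecutive readings. That difference belongs to the
        # instruction that ran between them, i.e. the previous one.
        totals[prev_pc] += count - prev_count
    return dict(totals)


# Hypothetical trace: the reading does not advance between 0x1002 and
# 0x1004, so the instruction at 0x1002 ends up with a difference of 0.
trace = [(0x1000, 0), (0x1002, 4), (0x1004, 4), (0x1008, 12)]
cycles = per_instruction_cycles(trace)
```

The last reading in the trace has no successor, so the instruction at that address gets no cost assigned until the next reading arrives.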
Now I'd like to know, is it possible for an instruction to use 0 cycles,
as the CPU cycle counter difference would indicate for subq here:
Good question. My initial reaction would be no.
However, I've seen things on a real Falcon that indicate some 'trivial' ALU instructions can hide behind read or write latency (i.e. if the previous instruction involved a memory operation).
So in fact I'd have to say yes. However, I wouldn't jump to any conclusions on that until we understand where the zeroes are really coming from in UAE (?)... some of the other numbers there look strange too (104?).
Also, what is the largest cycle count that a single m68k instruction
can have?
One for Laurent's cycle sheet probably...
It depends on whether memory fetches (cache misses) are included in the cost. Without memory fetches it's usually single or low double digits, rising to mid double digits for mul/div and even more for the funny 020 addressing modes.
FPU can reach into mid triple digits for a single op.
[1] I was really mystified how a single instruction could take
most of the cycles when it was seemingly called from only a single place,
but it turned out to be the first instruction of an interrupt handler, and
the cycles actually belonged to the STOP instruction... :-)
Interrupt or bus spikes could be quite confusing, yes :-/
For values collected as averages (or as a total sum), I think it shouldn't adversely affect usability, except at specific points in the application code where synchronization occurs, either deliberately or accidentally. Any values recorded as min/max would be quite wrong, though, so that's probably best avoided.
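A small sketch of why the average holds up while min/max does not. The numbers are made up (an 8-cycle instruction sampled 1000 times, with one sample absorbing a hypothetical 500-cycle interrupt spike), and `summarize` is just a helper invented for this example.

```python
def summarize(costs):
    """Return total, average, min and max for a list of cycle costs."""
    return {
        "total": sum(costs),
        "avg": sum(costs) / len(costs),
        "min": min(costs),
        "max": max(costs),
    }


# Hypothetical data: the same instruction sampled 1000 times.
clean = [8] * 1000
# Same run, but one sample was charged an extra 500 cycles by an interrupt.
spiked = [8] * 999 + [508]

clean_stats = summarize(clean)
spiked_stats = summarize(spiked)
```

With these numbers the average only moves from 8.0 to 8.5, while the max jumps from 8 to 508 and no longer says anything about the instruction itself.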
D.