[hatari-devel] Spinloop detection in emulated code
Hi,
I'm answering Douglas's question on Atari-forum here
http://www.atari-forum.com/viewtopic.php?f=68&t=24561&p=235531#p235470
because it could be of more generic interest:
--------- quote ---------------
I'm wondering if it's possible for the profiler to analyze spinloops - any
pair of instructions which result in a tight loop, on either CPU or DSP.
Originally I was interested in detecting just DSP host port spinloops
because they are relatively easy to decode on the DSP - they always look the
same. On the CPU side however, they can vary a bit and it's harder.
I then realized that any 2-opcode loop is a spinloop, and the semantics are
going to be similar so it's probably easier/better to just detect all of
them and profile them in the same way (blitter spinloops could also benefit
from this). So on the CPU side any pair of ops where the 2nd is a branch
back to the first - doesn't matter what the first opcode happens to be.
The general idea is to track the activity of spinloops one level higher than
the instruction counts. Specifically, recording the minimum, maximum and
average iteration count for each spinloop recognized. A digest of the
spinloop sites with these metrics is immensely useful because it becomes
possible to spot a stall which occurs only infrequently but for a
significant duration. It also helps pinpoint those spinloops which never
iterate due to a favourable performance ratio, and can probably be removed.
I think the most valuable side to watch is the CPU side, because the CPU
should not be waiting for the DSP except in some rare cases for vector
operations where inputs and outputs follow each other closely. Watching the
DSP side is also useful though - it can give some indication of where there
are unused/idle cycles which could absorb nearby work if some code is moved.
etc.
Anyway see what you think :) The way I did this before involved a very large
and unresponsive spreadsheet with the profiler disasm pasted into it and
some simple column calcs to spot the stalls. It only handled the DSP side
but it 'inferred' the CPU side by counting any DSP spinloops with low
iteration counts (relative to neighbour ops), as a DSP bottleneck (i.e. CPU
side is spinning). It's better though to actually track the CPU side since
buffering between the two sides makes 'inferring' a bit less reliable.
--------------------------------
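To make the quoted rule ("any pair of ops where the 2nd is a branch back to the first") concrete, here is a minimal C sketch. It is not Hatari code: detect_two_op_loop() and loop_detector_t are hypothetical names, and it assumes the emulator can report, after each executed instruction, its address, whether it was a taken branch, and the branch target.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU (or per-DSP) detector state; not part of Hatari. */
typedef struct {
    uint32_t prev_pc;   /* address of the previously executed instruction */
    bool     have_prev; /* false until one instruction has been seen */
} loop_detector_t;

/*
 * Call once per executed instruction.
 *   pc:     address of the instruction just executed
 *   taken:  true if it was a branch and the branch was taken
 *   target: branch target address (ignored unless taken)
 * Returns true when this instruction branched back to the instruction
 * executed right before it, i.e. the pair forms a 2-op spinloop.
 * What the first instruction does is irrelevant.
 */
static bool detect_two_op_loop(loop_detector_t *d, uint32_t pc,
                               bool taken, uint32_t target)
{
    bool is_loop = d->have_prev && taken && target == d->prev_pc;

    d->prev_pc = pc;
    d->have_prev = true;
    return is_loop;
}
```

Because only the previous address is compared, the same check works for CPU and DSP traces alike; a one-instruction loop (a branch to itself) would need a separate check.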
I was thinking that I could add a profile command for setting a "spinloop"
output file. The CPU and DSP profiling code could then append information
about each detected spinloop to that file whenever the loop is exited.
If execution goes through a spinloop without looping (i.e. the branch back
is never taken), no information is saved.
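A rough sketch of the "report on loop exit" part, assuming something upstream tells it whether each executed instruction is a taken branch back to a given loop start. spin_step() and the two structs are hypothetical. Iterations are counted as taken back-branches, so a pass where the branch is never taken is never tracked and produces no record, matching the rule above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state for the loop currently being executed. */
typedef struct {
    bool     active;      /* true while inside a loop that has iterated */
    uint32_t start;       /* address of the loop's first instruction */
    uint32_t branch;      /* address of the branch closing the loop */
    uint32_t iterations;  /* taken back-branches so far */
} spin_tracker_t;

/* What gets reported when a loop is exited. */
typedef struct {
    uint32_t start;
    uint32_t iterations;
} spin_record_t;

/*
 * Call once per executed instruction. 'looped' must be true when the
 * instruction at 'pc' is a taken branch back to 'loop_start'.
 * Returns true when a tracked loop was just exited; its data is in *out.
 */
static bool spin_step(spin_tracker_t *t, uint32_t pc, bool looped,
                      uint32_t loop_start, spin_record_t *out)
{
    bool exited = false;

    if (looped && t->active && t->start == loop_start && t->branch == pc) {
        t->iterations++;              /* one more round of the same loop */
        return false;
    }
    if (t->active && (looped || pc != t->start)) {
        /* The branch fell through, execution left the loop, or a
         * different loop started: report the finished one. */
        out->start = t->start;
        out->iterations = t->iterations;
        t->active = false;
        exited = true;
    }
    if (looped) {                     /* first taken back-branch of a loop */
        t->active = true;
        t->start = loop_start;
        t->branch = pc;
        t->iterations = 1;
    }
    return exited;
}
```

In the profiler, a true return value would be the point where a line gets appended to the spinloop file.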
Each line in the file could contain:
<address> <number of iterations> <VBLs[1]> <branched instruction>
[1] Number of VBLs since boot at loop exit, i.e. at the moment the info
on that spin is saved.
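One way to produce such a line, as a C sketch: format_spin_line() and append_spin_line() are hypothetical helpers, the address is printed as plain hex, and "branched instruction" is assumed to be the disassembly text of the branch.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Formats "<address> <iterations> <VBLs> <branched instruction>\n".
 * Returns the snprintf() result, i.e. the length the full line needs. */
static int format_spin_line(char *buf, size_t size, uint32_t address,
                            unsigned long iterations, unsigned long vbls,
                            const char *branch_insn)
{
    /* Disassembler output may end in a newline; keep only the first line. */
    int insn_len = (int)strcspn(branch_insn, "\n");

    return snprintf(buf, size, "%06lx %lu %lu %.*s\n",
                    (unsigned long)address, iterations, vbls,
                    insn_len, branch_insn);
}

/* Appends one record to the already opened spinloop file.
 * Returns 0 on success, -1 on formatting or write errors. */
static int append_spin_line(FILE *fp, uint32_t address,
                            unsigned long iterations, unsigned long vbls,
                            const char *branch_insn)
{
    char line[128];
    int len = format_spin_line(line, sizeof line, address,
                               iterations, vbls, branch_insn);

    if (len < 0 || (size_t)len >= sizeof line)
        return -1;                    /* error, or line would be truncated */
    return fputs(line, fp) == EOF ? -1 : 0;
}
```

Keeping it to one whitespace-separated record per line is what makes the later sort/grep step trivial.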
The address and the iteration count together allow calculating the
min/max/avg iteration counts in post-processing, and make the info easy
to sort and grep.
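The post-processing could just as well be a script; to stay with one language, here is a minimal C sketch of per-address min/max/avg aggregation over such lines. spin_stats_add(), spin_stats_avg() and the fixed MAX_SITES table are illustrative assumptions, and only the first three fields of each line are parsed.

```c
#include <stdio.h>

#define MAX_SITES 256   /* arbitrary limit for this sketch */

typedef struct {
    unsigned long address;
    unsigned long count;       /* number of loop exits recorded */
    unsigned long min, max;    /* iteration count extremes */
    unsigned long long total;  /* summed iterations, for the average */
} spin_stats_t;

/* Adds one "<address> <iterations> <VBLs> ..." line to the table.
 * Returns the new number of sites, or -1 for an unparsable line or a
 * full table (the caller should then keep its previous count). */
static int spin_stats_add(spin_stats_t *sites, int nsites, const char *line)
{
    unsigned long addr, iters, vbls;

    if (sscanf(line, "%lx %lu %lu", &addr, &iters, &vbls) != 3)
        return -1;
    for (int i = 0; i < nsites; i++) {
        spin_stats_t *s = &sites[i];
        if (s->address == addr) {
            if (iters < s->min) s->min = iters;
            if (iters > s->max) s->max = iters;
            s->total += iters;
            s->count++;
            return nsites;
        }
    }
    if (nsites == MAX_SITES)
        return -1;
    sites[nsites] = (spin_stats_t){addr, 1, iters, iters, iters};
    return nsites + 1;
}

static double spin_stats_avg(const spin_stats_t *s)
{
    return s->count ? (double)s->total / (double)s->count : 0.0;
}
```

A site with a low average but a high max is exactly the "infrequent but long stall" case from the quoted message.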
The VBL information is useful both for knowing whether spinning (e.g. on
the CPU and the DSP) happens within the same frame, and for seeing, in the
output sorted by iteration count, in which order the spinloops happened.
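For the same-frame question, a sketch: given the VBL field of the CPU records and of the DSP records, each in file order (and therefore non-decreasing), count the frames in which both sides spun. frames_with_both() is hypothetical; note that a spin is attributed to the frame in which it exited, so a spin crossing a VBL boundary only counts for the later frame.

```c
#include <stddef.h>

/* Counts frames (distinct VBL numbers) in which both the CPU and the DSP
 * recorded at least one spinloop exit. Both arrays must be sorted in
 * non-decreasing order, as they are when read back in file order. */
static size_t frames_with_both(const unsigned long *cpu, size_t ncpu,
                               const unsigned long *dsp, size_t ndsp)
{
    size_t i = 0, j = 0, frames = 0;

    while (i < ncpu && j < ndsp) {
        if (cpu[i] < dsp[j]) {
            i++;
        } else if (dsp[j] < cpu[i]) {
            j++;
        } else {
            unsigned long vbl = cpu[i];

            frames++;
            /* skip the remaining records of this frame on both sides */
            while (i < ncpu && cpu[i] == vbl) i++;
            while (j < ndsp && dsp[j] == vbl) j++;
        }
    }
    return frames;
}
```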
Having each spin listed separately allows checking in detail the order in
which things actually happened and what kinds of spins occur within a
frame, and even providing per-VBL metrics about spinloops.
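Per-VBL metrics then amount to grouping consecutive records by their VBL field. A minimal C sketch, with hypothetical spin_rec_t/frame_stats_t types and a hypothetical per_vbl_stats() helper, that counts spins and sums iterations per frame:

```c
#include <stddef.h>

/* One parsed record: only the fields needed here. */
typedef struct {
    unsigned long vbl;
    unsigned long iterations;
} spin_rec_t;

typedef struct {
    unsigned long vbl;
    size_t spins;                  /* loop exits recorded in this frame */
    unsigned long long iterations; /* summed iterations in this frame */
} frame_stats_t;

/* Groups records (in file order, so VBLs are non-decreasing) by frame.
 * Writes at most 'max_out' entries and returns how many were written. */
static size_t per_vbl_stats(const spin_rec_t *recs, size_t n,
                            frame_stats_t *out, size_t max_out)
{
    size_t nout = 0;

    for (size_t i = 0; i < n; i++) {
        if (nout == 0 || out[nout - 1].vbl != recs[i].vbl) {
            if (nout == max_out)
                break;             /* output table full */
            out[nout].vbl = recs[i].vbl;
            out[nout].spins = 0;
            out[nout].iterations = 0;
            nout++;
        }
        out[nout - 1].spins++;
        out[nout - 1].iterations += recs[i].iterations;
    }
    return nout;
}
```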
The contents of this file will be reset each time profiling is
restarted. This allows using it with Hatari's recent "worst frame"
profiling support, i.e. in the case of Bad Mood, one can get spinloop
information specific to the worst frame.
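The reset itself can be as simple as reopening the file in truncating mode whenever profiling starts. The sketch below assumes a single global FILE pointer and hypothetical spinloop_profile_start()/spinloop_profile_stop() hooks; Hatari's actual profiler interface may look quite different.

```c
#include <stdio.h>

/* Hypothetical: the spinloop output file set via the proposed profile
 * command, or NULL when no file is open. */
static FILE *spin_fp;

/* Called whenever profiling is (re)started: "w" truncates the file, so
 * it only holds spins from the current profiling run. */
static int spinloop_profile_start(const char *path)
{
    if (spin_fp)
        fclose(spin_fp);
    spin_fp = fopen(path, "w");
    return spin_fp ? 0 : -1;
}

/* Called when profiling stops, so that buffered records reach the disk. */
static void spinloop_profile_stop(void)
{
    if (spin_fp) {
        fclose(spin_fp);
        spin_fp = NULL;
    }
}
```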
How does that sound?
- Eero