[hatari-devel] Spinloop detection in emulated code


Hi,

I'm answering Douglas's question from Atari-forum here:
http://www.atari-forum.com/viewtopic.php?f=68&t=24561&p=235531#p235470

because it could be of more general interest:
---------  quote ---------------
I'm wondering if it's possible for the profiler to analyze spinloops - any 
pair of instructions which result in a tight loop, on either CPU or DSP.

Originally I was interested in detecting just DSP host port spinloops 
because they are relatively easy to decode on the DSP - they always look the 
same. On the CPU side however, they can vary a bit and it's harder.

I then realized that any 2-opcode loop is a spinloop, and the semantics are 
going to be similar so it's probably easier/better to just detect all of 
them and profile them in the same way (blitter spinloops could also benefit 
from this). So on the CPU side any pair of ops where the 2nd is a branch 
back to the first - doesn't matter what the first opcode happens to be.

The general idea is to track the activity of spinloops one level higher than 
the instruction counts. Specifically, recording the minimum, maximum and 
average iteration count for each spinloop recognized. A digest of the 
spinloop sites with these metrics is immensely useful because it becomes 
possible to spot a stall which occurs only infrequently but for a 
significant duration. It also helps pinpoint those spinloops which never 
iterate due to a favourable performance ratio, and can probably be removed.

I think the most valuable side to watch is the CPU side, because the CPU 
should not be waiting for the DSP except in some rare cases for vector 
operations where inputs and outputs follow each other closely. Watching the 
DSP side is also useful though - it can give some indication of where there 
are unused/idle cycles which could absorb nearby work if some code is moved. 
etc.

Anyway see what you think :) The way I did this before involved a very large 
and unresponsive spreadsheet with the profiler disasm pasted into it and 
some simple column calcs to spot the stalls. It only handled the DSP side 
but it 'inferred' the CPU side by counting any DSP spinloops with low 
iteration counts (relative to neighbour ops), as a DSP bottleneck (i.e. CPU 
side is spinning). It's better though to actually track the CPU side since 
buffering between the two sides makes 'inferring' a bit less reliable.
--------------------------------


I was thinking that I could add a profile command for setting a
"spinloop" output file.  The CPU and DSP profiling functionality could
then append information about each detected spinloop to that file,
every time a loop is exited.

If a spinloop is passed through without looping, no information is saved.


Information for each line in the file could be:
  <address> <number of iterations> <VBLs[1]> <branched instruction>

[1] VBLs since boot, at loop exit, when info on that spin is saved.
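
To make the proposal concrete, here is a toy model of the detection and
of the record format above.  It is only a sketch, not Hatari code: the
SpinloopTracker class, its step() method, the hex address formatting
and the made-up instruction trace are all assumptions for illustration.
It treats "previous instruction branched back to the one before it" as
one iteration and writes a record only when the loop is left.

```python
import io


class SpinloopTracker:
    """Toy model of 2-instruction spinloop detection (hypothetical, not Hatari code)."""

    def __init__(self, out):
        self.out = out        # any object with a write() method
        self.history = []     # last two executed (pc, disasm) pairs
        self.loop = None      # (loop_addr, branch_addr, branch_text) while spinning
        self.iterations = 0   # times the branch was taken back to loop_addr

    def step(self, pc, vbl, disasm=""):
        """Feed one executed instruction: its address, current VBL count, disassembly."""
        if self.loop is not None:
            loop_addr, branch_addr, branch_text = self.loop
            if pc not in (loop_addr, branch_addr):
                # Loop exited: <address> <iterations> <VBLs> <branch instruction>
                self.out.write(f"{loop_addr:06x} {self.iterations} {vbl} {branch_text}\n")
                self.loop = None
                self.iterations = 0
        if len(self.history) == 2:
            (older_pc, _), (prev_pc, prev_text) = self.history
            if pc == older_pc and prev_pc > pc:
                # The previous instruction branched back to the one before it.
                if self.loop is None:
                    self.loop = (pc, prev_pc, prev_text)
                self.iterations += 1
        self.history = (self.history + [(pc, disasm)])[-2:]


# Made-up trace: a test + branch pair executed three times, then fall-through.
out = io.StringIO()
tracker = SpinloopTracker(out)
trace = [(0x1000, "tst.b (a0)"), (0x1002, "beq.s $1000")] * 3 + [(0x1004, "nop")]
for pc, text in trace:
    tracker.step(pc, vbl=7, disasm=text)
print(out.getvalue(), end="")   # 001000 2 7 beq.s $1000
```

A pair that is executed once and falls straight through never sets the
loop state, so it produces no output, matching the rule above.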


The iteration count plus the address allows calculating min/max/avg
iteration counts in post-processing, and makes sorting & grepping that
info easy.
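
The post-processing could look roughly like the sketch below.  It
assumes the whitespace-separated line format proposed above; the
summarize() helper and the sample lines are hypothetical, not an
existing Hatari tool or real profiler output.

```python
from collections import defaultdict


def summarize(lines):
    """Return {address: (spins, min, max, avg)} of iteration counts per spinloop."""
    counts = defaultdict(list)
    for line in lines:
        # <address> <iterations> <VBLs> <branch instruction, may contain spaces>
        fields = line.split(None, 3)
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        counts[fields[0]].append(int(fields[1]))
    return {
        addr: (len(vals), min(vals), max(vals), sum(vals) / len(vals))
        for addr, vals in counts.items()
    }


# Made-up sample records, for illustration only.
sample = [
    "001000 2 7 beq.s $1000",
    "001000 10 8 beq.s $1000",
    "002000 5 8 bne.s $2000",
]
for addr, (spins, lo, hi, avg) in sorted(summarize(sample).items()):
    print(f"{addr}: spins={spins} min={lo} max={hi} avg={avg:.1f}")
```

A spinloop site with a low average but a large maximum is the
"infrequent but long stall" case Douglas describes.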


The VBL information is useful both for knowing whether spinning (e.g.
on CPU and DSP) happens within the same frame and, when the output is
sorted by iteration count, for seeing in which order the spinloops
were run.


Having each spin listed separately allows checking in detail in which
order things actually happened and what kind of spins there are within
a frame, and even providing per-VBL metrics about spinloops.
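
As an illustration of the per-frame view, the sketch below groups the
records by their VBL field while keeping the order in which they were
written.  Again this assumes the line format proposed above; the
spins_per_vbl() helper and the sample data are hypothetical.

```python
from collections import defaultdict


def spins_per_vbl(lines):
    """Return {vbl: [(address, iterations), ...]} with spins in file order."""
    frames = defaultdict(list)
    for line in lines:
        fields = line.split(None, 3)
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        frames[int(fields[2])].append((fields[0], int(fields[1])))
    return dict(sorted(frames.items()))


# Made-up sample records, for illustration only.
sample = [
    "001000 2 7 beq.s $1000",
    "002000 5 8 bne.s $2000",
    "001000 10 8 beq.s $1000",
]
for vbl, spins in spins_per_vbl(sample).items():
    total = sum(count for _, count in spins)
    print(f"VBL {vbl}: {len(spins)} spins, {total} iterations: {spins}")
```

With separate CPU and DSP files, frames that show up in both outputs
would be the ones where both sides were spinning.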


The contents of this file will be reset each time profiling is
re-started.  This allows using it with Hatari's recent "worst frame"
profiling support, i.e. in the case of Bad Mood, one can get spinloop
information that is specific to the worst frame.


How does that sound?


	- Eero


