[hatari-devel] Re: BM407 + symbols
(CCing this also to hatari-devel as here's some info on how to use profiler
caller info for debugging code behavior and you added some info on DSP :-))
On Tuesday 09 April 2013, Douglas Little wrote:
> > Do you have any idea why my caller tracking code (based on checking
> > for jsr/bsr & returning to next address with rts/rte) would think
> > that DSP "end_addwall" function is called many times, recursively?
> end_addwall is a program flow merge point - multiple functions can jump
> there to terminate. I think one of the variants 'falls through' to it.
> I'll take a look at the code later to see if anything else is involved
> but I think that's all there is to it.
When looking at the generated caller information for that symbol:
callee: caller: calls: calltype:
  |      |      |   /
0x379: 0x155 = 144 r, 0x283 = 112 b, 0x2ef = 112 b, 0x378 = 72 s
583236/359708265/1631189180 72/4419020/19123430, end_addwall
             |                       |                |
      inclusive costs         exclusive costs    callee name
(for calls from 0x378 address)
- b: jump/branch
- n: PC just moved to next address
- r: subroutine return
- s: subroutine call
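
To make the layout of such a caller line concrete, here is a minimal Python
sketch that splits it into callee, callers, call counts and call types.
The names "parse_caller_line" and "CALL_TYPES" are made up for this
illustration, and the regular expression only assumes the layout shown in
the example above; it is not taken from Hatari or its scripts.

```python
import re

# Call type letters as listed in the legend above.
CALL_TYPES = {
    "b": "jump/branch",
    "n": "PC just moved to next address",
    "r": "subroutine return",
    "s": "subroutine call",
}

def parse_caller_line(line):
    """Return (callee_address, [(caller_address, calls, calltype), ...]).

    Hypothetical helper: assumes 'callee: caller = calls type, ...' layout.
    Anything that is not a 'caller = calls type' triplet (such as a
    trailing symbol name) is ignored.
    """
    callee_text, rest = line.split(":", 1)
    callee = int(callee_text, 16)
    callers = []
    for m in re.finditer(r"(0x[0-9a-fA-F]+)\s*=\s*(\d+)\s+([bnrs])", rest):
        callers.append((int(m.group(1), 16), int(m.group(2)), m.group(3)))
    return callee, callers

line = "0x379: 0x155 = 144 r, 0x283 = 112 b, 0x2ef = 112 b, 0x378 = 72 s"
callee, callers = parse_caller_line(line)
# List the callers, most frequent first.
for addr, calls, kind in sorted(callers, key=lambda c: -c[1]):
    print(f"0x{callee:x} <- 0x{addr:x}: {calls:4d} x {CALL_TYPES[kind]}")
```

Sorting by call count puts the 144 returns from 0x155 first, which is
the entry discussed next.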
It claims that most "calls" to "end_addwall" were subroutine call
returns (=r) to it from address 0x155. That is true, as the previous
instruction at 0x378 was a subroutine call, and the return from that
call lands at 0x379:
p:0155 00000c (04 cyc) rts
p:0378 0f7130 (04 cyc) jsgt p:$0130
p:0379 699b00 (02 cyc) move y:$001b,r1
However, from the caller info I can see that the previous address 0x378
can in 72 cases also do a subroutine call to "end_addwall", which on
RTS will obviously return -- to "end_addwall".
Is this how your code is supposed to work?
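
For reference, the kind of caller tracking described earlier (checking
for subroutine calls and for returns to the address following the call)
can be sketched roughly as below. This is only an illustrative Python
model, not the actual profiler code: the "classify" function, the
'call'/'return'/'other' instruction classes and the trace are assumptions
made up for the example. In particular, a conditional call that is not
taken is treated here as a plain move to the next address, which is a
choice of this sketch.

```python
def classify(prev_pc, prev_kind, prev_size, pc, call_stack):
    """Classify how PC got from prev_pc to pc, as b/n/r/s.

    prev_kind is 'call', 'return' or 'other' (hypothetical opcode classes),
    prev_size is the previous instruction's size in program words.
    """
    next_pc = prev_pc + prev_size
    if prev_kind == "call" and pc != next_pc:
        call_stack.append(next_pc)   # address the matching RTS should return to
        return "s"
    if prev_kind == "return" and call_stack and call_stack[-1] == pc:
        call_stack.pop()             # returned to the address after the call
        return "r"
    if pc == next_pc:
        return "n"
    return "b"

# Addresses and instruction sizes follow the disassembly in this mail;
# the path through them is just one possible run.
trace = [
    (0x378, "call",   1, 0x130),  # jsgt taken
    (0x155, "return", 1, 0x379),  # rts back to end_addwall
    (0x379, "other",  1, 0x37a),  # move y:$001b,r1
    (0x37a, "other",  2, 0x37c),  # move #$ffffff,a1 (two words)
    (0x37c, "other",  1, 0x384),  # jmp p:(r1)
]
stack = []
kinds = [classify(p, k, s, pc, stack) for p, k, s, pc in trace]
print(kinds)  # ['s', 'r', 'n', 'n', 'b']
```

In this model the symbol at 0x379 gets an 'r' entry from 0x155 simply
because it sits right after a call instruction.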
When looking at the rest of "end_addwall", and at the "end_normal_addwall"
and "end_dummy_addwall" following it:
p:037a 54f400 ffffff (04 cyc) move #$ffffff,a1
p:037c 0ae180 (04 cyc) jmp p:(r1)
p:037d 0aa981 00037d (06 cyc) jclr #1,x:$ffe9,p:$037d
p:037f 08cc2b (04 cyc) movep a1,x:$ffeb
p:0380 0aa981 000380 (06 cyc) jclr #1,x:$ffe9,p:$0380
p:0382 08f0eb 000031 (06 cyc) movep y:$0031,x:$ffeb
p:0384 0af080 000ac8 (06 cyc) jmp p:$0ac8
And how they're called:
0x37d: 0x37c = 328 b, end_normal_addwall
0x384: 0x382 = 328 n, 0x37c = 112 b, end_dummy_addwall
I can see that from "end_addwall" we always end in "end_dummy_addwall"
which leads to "command_base", and none of them has an RTS:
p:0ac8 67f400 000ad9 (04 cyc) move #$000ad9,r7
p:0aca 68aa00 (02 cyc) move y:$002a,r0
p:0acb 0aa980 000acb (06 cyc) jclr #0,x:$ffe9,p:$0acb
p:0acd 52e000 (02 cyc) move x:(r0),a2
p:0ace 215f00 (02 cyc) move a2,n7
p:0acf 770000 (02 cyc) move n7,x:$0000
p:0ad0 6fef00 (06 cyc) move y:(r7+n7),r7
p:0ad1 6f0000 (02 cyc) move r7,y:$0000
p:0ad2 0ae780 (04 cyc) jmp p:(r7)
From caller info I can see that the last instruction in the above
jumps to many different places.
When exactly will subroutine calls to "end_addwall" actually
lead to an RTS? And will it do that twice when "end_addwall"
gets a subroutine call from the previous instruction?
> BTW is there something I can add to the code for some kind of 'debug
> build' which adds flow verification signals you can compare with your
> profiler results? e.g. make explicit prologue/epilogue signal calls on
> every routine? If this is helpful let me know. Sometimes the easiest way
> to debug a debugger or profiler is to do it in the subjected code :)
Well, I think that would anyway need knowledge of what the code in
general is supposed to do, so it's better to ask the author of the code
if a quick reading of the assembly in the profile disassembly and
the caller info doesn't tell enough.
> > rte is accepted because there could have been an exception on that
> > return address.
> There are no interrupts in the DSP module - no occurrences of rte
> anywhere. It's all 'user-land' code. Even the command system is
> explicitly handled (a word written as host port data is interpreted as a
> command by 'command_base' routine via a table and a JMP made to
> the appropriate sub-routine (albeit, not really a subroutine as it's not
> a real 'call'), followed by a JMP back to the command processor).
> In fact I'll probably convert this into a table of jmp instructions
> later, to save a little time on the lookup. The size and frequency of
> commands is getting more fine-grained recently.
> TRIVIA: The DSP actually has a hardware implementation of this command
> structure but it has some hidden implications and limitations so I
> implement it manually. In fact one of the problems with the HW command
> structure is that it is time critical - it forces an immediate interrupt
> to handle the command whereas 'normal' host data writes are FIFO
> buffered so the command sent by the CPU is decoupled from the time it
> needs to happen on the DSP - a sort of lazy response pipeline, which can
> actually help with performance and soaking up synchronization issues.
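
The scheme described above (command words queued through the buffered
host port, looked up in a table, with control going back to the command
processor after each handler) can be modelled roughly like this. It is
only a Python illustration of the idea, not a translation of the DSP
code: the handler names, the command numbers and the use of a deque as
the FIFO are all assumptions made up for the example.

```python
from collections import deque

host_fifo = deque()  # stands in for the FIFO-buffered host port
log = []             # records which handlers ran, in order

# Hypothetical handlers; on the DSP each would end with a JMP back
# to the command processor instead of a normal return.
def cmd_addwall():
    log.append("addwall")

def cmd_flush():
    log.append("flush")

# Hypothetical command words mapped to handlers (the lookup table).
COMMAND_TABLE = {0: cmd_addwall, 1: cmd_flush}

def command_base():
    """Process queued command words until the FIFO is empty."""
    while host_fifo:
        word = host_fifo.popleft()  # read next command word
        COMMAND_TABLE[word]()       # table lookup + "jump" to handler

# The CPU side can queue several commands before the DSP side gets to
# them; they are then handled in order, decoupled from when they were sent.
host_fifo.extend([0, 0, 1])
command_base()
print(log)  # ['addwall', 'addwall', 'flush']
```

As nothing here is a real call/return pair on the DSP side, caller
tracking based on jsr/rts cannot see this flow as subroutine calls.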
> > > When I've had a chance to try the new changes I'll report back. I may
> > > have some questions too about importing symbols from the .prg? I saw
> > > Laurent was using this on the dev list and it seems appealing!
> > It's dead simple. Just use Devpac D & X linker options to add symbols
> > to the built binary, and when that program is running (or you're at its
> > first instruction after "pc=text" breakpoint triggers), just call
> > "symbols prg" to get symbols loaded from that (unstripped) program.
> Wonderful. This is the next thing I will try because exporting those LST
> files from Devpac is quite annoying and takes a loooong time.