[hatari-devel] Re: BM407 + symbols

(CCing this to hatari-devel too, as it has some info on how to use the
profiler caller info for debugging code behavior, and you added some info
on the DSP :-))

On Tuesday 09 April 2013, Douglas Little wrote:
> > Do you have any idea why my caller tracking code (based on checking
> > for jsr/bsr & returning to next address with rts/rte[1]) would think
> > that DSP "end_addwall" function is called many times, recursively?
> end_addwall is a program flow merge point - multiple functions can jump
> there to terminate. I think one of the variants 'falls through' to it
> instead.
> I'll take a look at the code later to see if anything else is involved
> but I think that's all there is to it.

When looking at the generated caller information for that symbol:

callee: caller: calls: calltype:
  |       |       |   /
0x379:  0x155 = 144 r, 0x283 = 112 b, 0x2ef = 112 b, 0x378 = 72 s
583236/359708265/1631189180 72/4419020/19123430, end_addwall
           |                       |                 |
inclusive costs              exclusive costs     callee name
      (for calls from 0x378 address)

- b: jump/branch
- n: PC just moved to next address
- r: subroutine return
- s: subroutine call

It claims that most "calls" to "end_addwall" were subroutine
returns (=r) to it from address 0x155.  That is true, as the previous
instruction at 0x378 is a subroutine call, and the return from that
call lands on 0x379:
p:0155  00000c         (04 cyc)  rts
p:0378  0f7130         (04 cyc)  jsgt p:$0130
p:0379  699b00         (02 cyc)  move y:$001b,r1
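
To make the legend concrete, here is a minimal sketch of how a tracker
could tag each PC transition.  It assumes the tag is chosen from the kind
of the previous instruction first, and only then from the address; that
ordering, the kind constants and the function names are illustrative
assumptions, not the profiler's actual code:

```python
from collections import Counter

# Hypothetical instruction kinds; a real profiler would decode the opcode.
CALL, RETURN, BRANCH, OTHER = "call", "return", "branch", "other"

def classify(prev_kind, prev_pc, prev_size, pc):
    """Tag the transition prev_pc -> pc with one of the legend letters."""
    if prev_kind == CALL:
        return "s"   # previous instruction was a jsr/bsr-style call
    if prev_kind == RETURN:
        return "r"   # previous instruction was rts/rte
    if prev_kind == BRANCH:
        return "b"   # previous instruction was a jump/branch
    if pc == prev_pc + prev_size:
        return "n"   # PC simply advanced past the previous instruction
    return "?"       # anything else (for example an exception)

def record(counts, prev_kind, prev_pc, prev_size, pc):
    """Count one arrival at pc, keyed by (callee, caller, tag)."""
    counts[(pc, prev_pc, classify(prev_kind, prev_pc, prev_size, pc))] += 1

counts = Counter()
record(counts, RETURN, 0x155, 1, 0x379)  # rts at p:0155 lands on p:0379
record(counts, CALL, 0x378, 1, 0x379)    # jsgt at p:0378, then PC at p:0379
for (callee, caller, tag), n in sorted(counts.items()):
    print(f"0x{callee:x}: 0x{caller:x} = {n} {tag}")
```

Under that assumption, both arrivals at 0x379 are recorded against
"end_addwall", one tagged r and one tagged s, in the same shape as the
caller info line above.
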

However, from the caller info I can see that the previous address 0x378
can, in 72 cases, also do a subroutine call to "end_addwall", which on
RTS will obviously return -- to "end_addwall".

Is this how your code is supposed to work?

When looking at the rest of "end_addwall", and at the
"end_normal_addwall" and "end_dummy_addwall" following it:
p:037a  54f400 ffffff  (04 cyc)  move #$ffffff,a1
p:037c  0ae180         (04 cyc)  jmp p:(r1)
p:037d  0aa981 00037d  (06 cyc)  jclr #1,x:$ffe9,p:$037d
p:037f  08cc2b         (04 cyc)  movep a1,x:$ffeb
p:0380  0aa981 000380  (06 cyc)  jclr #1,x:$ffe9,p:$0380
p:0382  08f0eb 000031  (06 cyc)  movep y:$0031,x:$ffeb
p:0384  0af080 000ac8  (06 cyc)  jmp p:$0ac8

And how they're called:
0x37d: 0x37c = 328 b, end_normal_addwall
0x384: 0x382 = 328 n, 0x37c = 112 b, end_dummy_addwall
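
As a cross-check on these counts, the caller lines can be parsed and
totalled per callee.  This is only a sketch: the regular expression is my
assumption about the line format, and the cost fields of the
"end_addwall" line are dropped so that all three lines have the same
shape:

```python
import re

# Caller lines quoted in this mail; cost fields of the first one removed.
LINES = [
    "0x379: 0x155 = 144 r, 0x283 = 112 b, 0x2ef = 112 b, 0x378 = 72 s, end_addwall",
    "0x37d: 0x37c = 328 b, end_normal_addwall",
    "0x384: 0x382 = 328 n, 0x37c = 112 b, end_dummy_addwall",
]

def parse_callers(line):
    """Return (callee address, symbol name, {(caller, tag): count})."""
    callee_text, rest = line.split(":", 1)
    name = rest.rsplit(",", 1)[1].strip()
    callers = {}
    for addr, count, tag in re.findall(r"(0x[0-9a-fA-F]+) = (\d+) ([a-z])\b", rest):
        callers[(int(addr, 16), tag)] = int(count)
    return int(callee_text, 16), name, callers

totals = {}
for line in LINES:
    _, name, callers = parse_callers(line)
    totals[name] = sum(callers.values())

for name, total in totals.items():
    print(f"{name}: {total}")
```

The totals are 440 arrivals at "end_addwall", 328 at
"end_normal_addwall" and 440 (328 + 112) at "end_dummy_addwall", so the
counts in and out of this block balance.
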

I can see that from "end_addwall" we always end up in "end_dummy_addwall",
which leads to "command_base"; none of them has an RTS:
p:0ac8  67f400 000ad9  (04 cyc)  move #$000ad9,r7
p:0aca  68aa00         (02 cyc)  move y:$002a,r0
p:0acb  0aa980 000acb  (06 cyc)  jclr #0,x:$ffe9,p:$0acb
p:0acd  52e000         (02 cyc)  move x:(r0),a2
p:0ace  215f00         (02 cyc)  move a2,n7
p:0acf  770000         (02 cyc)  move n7,x:$0000
p:0ad0  6fef00         (06 cyc)  move y:(r7+n7),r7
p:0ad1  6f0000         (02 cyc)  move r7,y:$0000
p:0ad2  0ae780         (04 cyc)  jmp p:(r7)

From the caller info I can see that the last instruction above jumps
to many different places.
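
The pattern in the "command_base" disassembly above (read a command
word, look up a handler address in a table, jump to it) can be modelled
roughly as below.  The two addresses 0xac8 and 0xad2 are taken from the
disassembly; the command words and handler addresses in the table are
made up for illustration:

```python
from collections import Counter

COMMAND_BASE = 0x0AC8   # first instruction of the dispatcher (from disassembly)
DISPATCH_JMP = 0x0AD2   # the final "jmp p:(r7)" (from disassembly)

# Hypothetical table: command word -> handler entry address.
JUMP_TABLE = {0: 0x1000, 1: 0x1100, 2: 0x1200}

def run(commands):
    """Dispatch each command word and record the jumps a tracker would see."""
    jumps = Counter()
    stack_depth = 0  # a plain jump pushes no return address
    for word in commands:
        target = JUMP_TABLE[word]
        jumps[(DISPATCH_JMP, target)] += 1    # dispatcher -> handler
        jumps[(target, COMMAND_BASE)] += 1    # handler jumps back, no rts
    return jumps, stack_depth

jumps, depth = run([0, 1, 1, 2])
for (src, dst), n in sorted(jumps.items()):
    print(f"0x{src:x} -> 0x{dst:x}: {n}")
```

In this model every handler is reached by a branch from the same
instruction and leaves by a branch back to the dispatcher, so a tracker
that pairs calls with returns never sees the stack depth change here.
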

When exactly will subroutine calls to "end_addwall" actually
lead to an RTS?  And will it do that twice when "end_addwall"
gets a subroutine call from the previous instruction?

> BTW is there something I can add to the code for some kind of 'debug
> build' which adds flow verification signals you can compare with your
> profiler results? e.g. make explicit prologue/epilogue signal calls on
> every routine? If this is helpful let me know. Sometimes the easiest way
> to debug a debugger or profiler is to do it in the subjected code :)

Well, I think it needs knowledge of what the code in general is
supposed to do anyway, so it's better to ask the author of the code
when a quick reading of the assembly in the profile disassembly and
the caller info doesn't tell enough.

> > [1] rte is accepted because there could have been exception on that
> >     return address.
> There are no interrupts in the DSP module - no occurrences of rte
> anywhere. It's all 'user-land' code. Even the command system is
> explicitly handled: a word written as host port data is interpreted as a
> command by the 'command_base' routine via a table, and a JMP is made to
> the appropriate sub-routine (albeit not really a subroutine, as it's not
> a real 'call'), followed by a JMP back to the command processor.
> In fact I'll probably convert this into a table of jmp instructions
> later, to save a little time on the lookup. The size and frequency of
> commands is getting more fine-grained recently.
> TRIVIA: The DSP actually has a hardware implementation of this command
> structure but it has some hidden implications and limitations so I
> implement it manually. In fact one of the problems with the HW command
> structure is that it is time critical - it forces an immediate interrupt
> to handle the command whereas 'normal' host data writes are FIFO
> buffered so the command sent by the CPU is decoupled from the time it
> needs to happen on the DSP - a sort of lazy response pipeline, which can
> actually help with performance and soaking up synchronization issues.
> > > When I've had a chance to try the new changes I'll report back. I may
> > > have some questions too about importing symbols from the .prg? I saw
> > > Laurent was using this on the dev list and it seems appealing!
> > 
> > It's dead simple.  Just use Devpac D & X linker options to add symbols
> > to the built binary, and when that program is running (or you're at its
> > first instruction after "pc=text" breakpoint triggers), just call
> > "symbols prg" to get symbols loaded from that (unstripped) program.
> Wonderful. This is the next thing I will try because exporting those LST
> files from Devpac is quite annoying and takes a loooong time.

	- Eero
