Re: [hatari-devel] Videl / VBL interrupt issue in Hatari?

And one last test of interest - I set it up to encourage the bug (yielding in TimerA, VBL, plus some extra padding time inserted in VBL) and made sure it did occur quite reliably.

I then removed the C code in the main game code which loads the Videl register group when the view or attract screen is changed. I also removed the physbase register load, and confirmed with '--trace videl' that these loads no longer happen.

The bug still occurs, so it can't have anything to do with Videl register loads. That thought can be dismissed.

I'm not sure what else to do to narrow it down further. The situation doesn't seem possible unless one or other interrupt starts occurring in a continuous burst, or a third interrupt which takes a lot of time is happening in the mix (and somehow not accumulating on the regressed stack once the fault triggers).

For now I have just removed the yield from the VBL until an explanation shows up.


On 20 January 2014 14:41, Douglas Little <doug694@xxxxxxxxxxxxxx> wrote:
Over lunchtime I made a few more tests to rule stuff out.

- removed TimerA, to see what happens
- regress/crash seems to stop
- added some padding time after move.w #$2300,sr, to simulate something like a fake TimerA event at the critical time
- doesn't seem to make any difference
- restored TimerA, but prevented it from doing any work (i.e. early-out), however the padding time was left in the VBL to increase the chance of an interaction.
- problem comes back immediately.
- removed equivalent yield from TimerA, bug goes away again
- i.e. bug only occurs with yields in both VBL and TimerA

So it seems to be important that TimerA occurs inside the VBL after all - even if the TimerA just exits again. Note that TimerA also lowers interrupt mask and clears in-service-a bit 5 very early indeed since the rest is not time-critical and would otherwise block important stuff.

Despite finding that TimerA is somehow involved, I still see no evidence of nesting between these two, when the regress occurs. The duration of TimerA also seems irrelevant - although the 'window of opportunity' inside the VBL does appear to be important.

In summary: the bug only occurs if both interrupts (TimerA, VBL) yield early. This is despite the TimerA duration being short, TimerC being too simple to matter, IKBD being the only other source left (and the keyboard is not being touched), and the VBLs being spaced at a 55Hz period. I don't believe any other interrupts are on, except FDC/HDC, and there are no disk accesses near the crash.

This is a very strange result.

Anyway it's probably best to ignore my concerns about the physbase register for now - there may or may not be a problem with that but it is probably not the same problem, and could just be my fault anyway. The more things I try, the less these two things seem to be related so I'll concentrate on TimerA for now.

I have pasted the relevant interrupt code below. It was slightly simplified from the original release version of the game while investigating this bug, and the code below is what I am currently testing with. Apart from the yielding thing used by both VBL and TimerA, there isn't much interesting going on at all. Most of the tricky areas (TimerB, heavy work on TimerA) have been removed, and this has no influence on the bug. TimerC is too simple to be interesting. Note that the Videl register group updates happening occasionally in the game code are still present though - I have not removed those yet.

I'm still doing tests to narrow things further. Will report again when I run out of stuff to try.

    move.w        #$2700,sr
    move.l        d0,-(sp)

; this part is new - used to live in game loop. but shouldn't matter.

    move.l        _bm_physbase,d0
    beq.s        .npb                ; don't update physbase until valid
    lsr.w        #8,d0
    move.l        d0,$ffff8200.w
    clr.b        $ffff820d.w            ; framebuffers are aligned anyway   
.npb:   move.l        (sp)+,d0 ; label missing from the paste; assumed to sit here so d0 is still restored
    move.w        #$2300,sr ; *** enabling this leads to eventual crash

; TimerA, TimerC, IKBD can occur here (unnecessary, really)
; however VBL seems to occur and recurse, which is scary

; *** fake delay added here to increase chance of interaction

    move.w        #1000,-(sp)
.ll subq.w        #1,(sp)
    bne.s        .ll
    addq.l        #2,sp

; remainder of routine

    tst.w        timer       
    beq.s        .nd
    subq.w        #1,timer
.nd:    addq.l        #1,vbls                ; debug stuff
    addq.w        #1,frame

    addq.l        #1,$462.w
    addq.l        #1,$466.w

    move.w        #$2300,sr ; *** enabling this yield leads to eventual crash
    bclr.b        #5,isra.w ; ***
    movem.l        d0-d7/a0-a6,-(sp)
    move.w        _mux_buffer_index,d0
    move.w        d0,d1
    bchg        #0,d0
    move.w        d0,_mux_buffer_index
    move.l        _mux_buffer_pages,a0
    move.l        (a0,d1.w*4),d1
    lea        $ffff8902.w,a0
    move.l        d1,d2
    add.l        #320*4,d2
    move.b        d1,5(a0)
    lsr.w        #8,d1
    move.l        d1,(a0)
    move.b        d2,17(a0)
    lsr.w        #8,d2
    move.l        d2,12(a0)
;    jsr        _audio_mux_frame ; *** this is expensive, but always < 1 VBL
; disabling this work does not prevent crash
    movem.l        (sp)+,d0-d7/a0-a6
;    bclr.b        #5,isra.w   ; ** replacing yield with this prevents crash

    ifd        bldcfg_profiler ; *** this is disabled
    move.w        d0,-(sp)
    move.w        prf_ctx,d0
    addq.l        #1,(prf_contexts,d0.w*4)
    move.w        (sp)+,d0
    else        ;bldcfg_profiler
    endc        ;bldcfg_profiler
    addq.l        #1,tm_accumulator  ; internal profiling stuff
    addq.l        #1,tm_clock
    addq.l        #1,tm_superticks
    addq.l        #1,$4ba.w
    bclr        #5,isrb.w ; code is tiny - no need to yield early
