Re: [hatari-devel] Linux user-space crashes -> bug in prefetch code when doing bus error for page fault



On 19/06/2024 at 20:44, Eero Tamminen wrote:
Hi,

On 19.6.2024 18.54, Nicolas Pomarède wrote:
I made some tries with --ide-master bb-rootfs.img (CPU MMU is enabled in all cases) :

  - no prefetch, no CE, no data cache : 'ls' works, no segfault

  - prefetch, no CE, no data cache : segfault

  - CE, no data cache : segfault

  - no prefetch, no CE, data cache ON : no segfault

This looks different from what you wrote ("has never behaved correctly when data-cache is enabled"): in my tests it works with data cache on, so the cause of the segfault would be related to the difference between normal and prefetch mode (as CE also includes prefetch, we can leave it aside for now).

Could you check whether you still get a segfault with data cache on and no prefetch / no CE?

If both CE & prefetch are disabled, so is data-cache.

If data-cache _is_ allowed without them (see attached patch), then Linux kernel crashes already when it reaches init:

Hi

Following up on this issue: I looked at the problem in more detail in late June, but I had no time to fix it the proper way, so I'm adding a temporary fix that should do the job in the meantime (see below).

Regarding this issue, by adding the following line to m68000.c:

 changed_prefs.cpu_data_cache = false;

we can force data_cache on the 68030 to be disabled, even when prefetch mode or CE mode is enabled. In that case the Linux kernel still crashes, so my conclusion is that the problem is not in the cache but in the prefetch code.

I added some code to compare the prefetched words with the "real" content of RAM and bingo! This showed that each time the Linux system gave a core dump, we got error messages about a prefetch mismatch (see recent commit 64f88f1548 from 2024/10/18 to enable this at compile time).
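To illustrate the idea of that check, here is a minimal sketch. It is not the code from the commit: the flat RAM buffer and the names ram_read_word() and check_prefetch() are assumptions made only for this example. It compares each prefetched word with the word stored in RAM at the matching address and reports any difference.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define PREFETCH_WORDS 3   /* number of prefetch words compared in this sketch */

/* Hypothetical helper: read a big-endian 16-bit word from a flat RAM buffer. */
static uint16_t ram_read_word(const uint8_t *ram, uint32_t addr)
{
    return (uint16_t)((ram[addr] << 8) | ram[addr + 1]);
}

/*
 * Compare the prefetched words with the words really stored in RAM
 * at pc, pc+2 and pc+4. Returns the number of mismatching words.
 */
static int check_prefetch(const uint8_t *ram, uint32_t pc,
                          const uint16_t prefetch[PREFETCH_WORDS])
{
    int mismatches = 0;

    assert(ram != NULL && prefetch != NULL);

    for (int i = 0; i < PREFETCH_WORDS; i++) {
        uint32_t addr = pc + 2u * (uint32_t)i;
        uint16_t in_ram = ram_read_word(ram, addr);

        if (prefetch[i] != in_ram) {
            fprintf(stderr,
                    "prefetch mismatch at %08lx: prefetch=%04x ram=%04x\n",
                    (unsigned long)addr, (unsigned)prefetch[i],
                    (unsigned)in_ram);
            mismatches++;
        }
    }
    return mismatches;
}
```

Calling check_prefetch() before each instruction is decoded would make a shifted prefetch list show up immediately as a series of mismatch messages, which is the kind of output described above.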

Looking at the corresponding code in the Linux source and using the built symbols shipped with the kernel posted by Eero, we can see that the core dump follows a bus error, and that this bus error is the result of a page fault when the 68030 MMU "detects" that an address is not present.

In that case a bus error is generated, which calls a specific handler that sees it's a page fault and tries to read the missing data (from disk or swap space). At the end of this bus error handler, the RTE has a special behaviour to "replay" the instruction that generated the bus error: now that the page is no longer faulty, replaying the same instruction will work and the program will go on.

After talking about this with Toni (WinUAE) in June, we came to the conclusion that it was a bug in the way the internal prefetch registers are saved/restored in the case of a bus error.

This is where the emulation has a bug: to replay the instruction it uses the information stored in a "frame b" stack frame generated by the bus error. This stack frame contains the 3 words that should restore the prefetch words, but unfortunately it restores a slightly shifted list of prefetch words (from PC+2 or PC+4 instead of PC in that case). The CPU emulation will then decode some wrong instructions from these bad prefetch words, hence the crash.

As the MMU / bus error code is rather complex in that case, I haven't had time to dive deeply into it since June, so I'm adding a temporary fix.

This will force a reload of the 3 prefetch words from RAM in the case of a "frame b" stack frame.

With this fix, Linux now boots correctly and there are no more core dumps (at least not where they were before; I didn't spend hours trying everything in this Linux image :) ). If one leaves the WINUAE_FOR_HATARI_DEBUG_PREFETCH_030 #define enabled, it doesn't show any "printf" error about prefetch mismatch.


For future reference, when the bus error / stack frame code can be improved and retested, this is the command I run to get the crash:

hatari --machine tt --tos tos306fr.img --dsp off --fpu 68882 --mmu on -s 14 --ttram 64 --addr24 off -c lilo.cfg --lilo "debug=nfcon root=/dev/sda ro init=/init" --ide-master bb-rootfs.img


Nicolas




