Re: [hatari-devel] Linux user-space crashes -> bug in prefetch code when doing bus error for page fault

To: hatari-devel@xxxxxxxxxxxxxxxxxxx
Subject: Re: [hatari-devel] Linux user-space crashes -> bug in prefetch code when doing bus error for page fault
From: Eero Tamminen <oak@xxxxxxxxxxxxxx>
Date: Tue, 22 Oct 2024 12:17:55 +0300

Hi,

On 20.10.2024 23.46, Nicolas Pomarède wrote:

Following this issue, I looked at the problem in more detail back in June, but I had no time to fix this the proper way, so I'm adding a temporary fix that should do the job in the meantime (see below).


Thanks, that's great news!

It fixed all the BusyBox issues I had documented, and Linux even boots 2x faster now (in emulated time). :-)


=> updated docs accordingly.

Regarding this issue, by adding the following line to m68000.c:

  changed_prefs.cpu_data_cache = false;
we can force the data cache on the 68030 to be disabled, even when prefetch mode or CE mode is enabled.


Nowadays there's a "--data-cache off" option for that.

That is still needed to get Linux booting with 040 and 060 emulation.


	- Eero

In that case the Linux kernel is still crashing, so my conclusion is that the problem is not in the cache but in the prefetch code.
I added some code to compare the prefetched words with the "real" content of RAM and bingo! This showed that each time the Linux system gave a core dump, we get some error messages about a prefetch mismatch (see recent commit 64f88f1548 from 2024/10/18 to enable this at compile time).
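To make the idea of that check concrete, here is a minimal sketch of comparing prefetched words against RAM. It is not the code from the commit: the flat byte-array RAM model and the names `pf_read_word_be` and `pf_count_mismatches` are assumptions made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical RAM model: a flat byte array holding big-endian 68k words. */
uint16_t pf_read_word_be(const uint8_t *ram, size_t ram_size, uint32_t addr)
{
    assert((size_t)addr + 1 < ram_size);   /* stay inside the toy RAM */
    return (uint16_t)((ram[addr] << 8) | ram[addr + 1]);
}

/* Count how many of the n prefetched words differ from the words
 * actually stored in RAM at pc, pc+2, pc+4, ... */
int pf_count_mismatches(const uint8_t *ram, size_t ram_size, uint32_t pc,
                        const uint16_t *prefetch, int n)
{
    int mismatches = 0;

    for (int i = 0; i < n; i++) {
        uint16_t expected = pf_read_word_be(ram, ram_size,
                                            pc + 2u * (uint32_t)i);
        if (prefetch[i] != expected)
            mismatches++;
    }
    return mismatches;
}
```

A non-zero result right after returning from an exception is the kind of condition that would trigger a "prefetch mismatch" message in a debug build.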
Looking at the corresponding code in the Linux source and using the built symbols shipped with the kernel posted by Eero, we can see that the core dump follows a bus error, and that this bus error is the result of a page fault when the 68030 MMU "detects" that an address is not present.
In that case a bus error is generated, which calls a specific handler that sees it's a page fault and tries to read the missing data (from disk or swap space). At the end of this bus error handler, the RTE has a special behaviour to "replay" the instruction that generated the bus error: now that the page is no longer faulty, replaying the same instruction will work and the program will go on.
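The fault-then-replay sequence described above can be sketched with a toy model. This is only an illustration under simplifying assumptions: `struct toy_mmu`, the 4-page layout and all function names are hypothetical, and a real 68030 resumes the faulted instruction from the state saved in the stack frame rather than re-running it in a loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGES     4
#define TOY_PAGE_SIZE 4096u

/* Hypothetical MMU state: one "present" flag per page. */
struct toy_mmu {
    bool present[TOY_PAGES];
    int  faults;              /* number of page faults serviced */
};

/* Returns true if the access succeeds, false if it raises a bus error. */
bool toy_access(const struct toy_mmu *mmu, uint32_t addr)
{
    uint32_t page = addr / TOY_PAGE_SIZE;

    assert(page < TOY_PAGES);
    return mmu->present[page];
}

/* Stand-in for the kernel handler: "loads" the page, marks it present. */
void toy_page_fault_handler(struct toy_mmu *mmu, uint32_t addr)
{
    mmu->present[addr / TOY_PAGE_SIZE] = true;
    mmu->faults++;
}

/* Execute one instruction touching addr; returns how many attempts it
 * took (1 = no fault, 2 = one fault followed by a successful replay). */
int toy_run_instruction(struct toy_mmu *mmu, uint32_t addr)
{
    int attempts = 1;

    while (!toy_access(mmu, addr)) {
        toy_page_fault_handler(mmu, addr);  /* bus error -> handler */
        attempts++;                         /* RTE -> replay the access */
    }
    return attempts;
}
```

The replay only works if the CPU state restored by the RTE matches the state at the time of the fault, which is exactly where the prefetch words matter.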
After talking about this with Toni (WinUAE) in June, we came to the conclusion that it was a bug in the way the internal prefetch registers are saved/restored in the case of a bus error.
This is where the emulation has a bug: to replay the instruction it uses the information stored in a "frame b" stack frame generated by the bus error. This stack frame contains the 3 words that should restore the prefetch words, but unfortunately it restores a slightly shifted list of prefetch words (from PC+2 or PC+4 instead of PC in that case). The CPU emulation will then decode some wrong instructions from these bad prefetch words, hence the crash.
As the MMU / bus error code is rather complex in that case, I haven't had time to dive deeply into it since June, so I'm adding a temporary fix.
This will force a reload of the 3 prefetch words from RAM in the case of a "frame b" stack frame.
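As a rough sketch of what such a workaround amounts to (not the actual Hatari/WinUAE change), the restore step can ignore the saved words and refetch from RAM whenever the frame format is $B. The `struct toy_frame` layout, the flat RAM array and the function names are hypothetical; a real format $B frame carries much more internal state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FRAME_FORMAT_B 0xB   /* 68030 long bus-fault frame format code */
#define PREFETCH_WORDS 3

/* Hypothetical, heavily simplified view of an exception stack frame. */
struct toy_frame {
    uint8_t  format;                         /* frame format code */
    uint32_t pc;                             /* PC to resume from */
    uint16_t saved_prefetch[PREFETCH_WORDS]; /* words stored in the frame */
};

/* Read one big-endian word from a flat toy RAM array. */
uint16_t toy_read_word(const uint8_t *ram, size_t ram_size, uint32_t addr)
{
    assert((size_t)addr + 1 < ram_size);
    return (uint16_t)((ram[addr] << 8) | ram[addr + 1]);
}

/* Fill prefetch[]: for a format $B frame, distrust the saved words and
 * reload from RAM at pc, pc+2, pc+4; otherwise use the saved words. */
void toy_restore_prefetch(const struct toy_frame *f, const uint8_t *ram,
                          size_t ram_size, uint16_t *prefetch)
{
    for (int i = 0; i < PREFETCH_WORDS; i++) {
        if (f->format == FRAME_FORMAT_B)
            prefetch[i] = toy_read_word(ram, ram_size,
                                        f->pc + 2u * (uint32_t)i);
        else
            prefetch[i] = f->saved_prefetch[i];
    }
}
```

Refetching trades accuracy for robustness: it hides the shifted words in the frame, but it would also hide a case where RAM legitimately changed after the original prefetch.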
With this fix, Linux now boots correctly, there are no more core dumps (at least not where they were before, I didn't spend hours trying everything in this Linux image :) ) and if one leaves the WINUAE_FOR_HATARI_DEBUG_PREFETCH_030 #define enabled, it doesn't show any "printf" error about a prefetch mismatch.
For further reference, for when the bus error / stack frame code can be improved and retested in the future, this is the command I run to get the crash:
hatari --machine tt --tos tos306fr.img --dsp off --fpu 68882 --mmu on -s 14 --ttram 64 --addr24 off -c lilo.cfg --lilo "debug=nfcon root=/dev/sda ro init=/init" --ide-master bb-rootfs.img
Nicolas