Bug 281524 - msdosfs error over USB causes a kernel panic
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: misc
Version: 14.1-RELEASE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-fs (Nobody)
URL:
Keywords: crash
Depends on:
Blocks:
 
Reported: 2024-09-15 22:27 UTC by carlj
Modified: 2024-09-18 10:30 UTC

See Also:


Attachments
The core.txt file from the kernel dump. (48.26 KB, application/gzip)
2024-09-15 22:27 UTC, carlj

Description carlj 2024-09-15 22:27:07 UTC
Created attachment 253591
The core.txt file from the kernel dump.

I was working with an msdosfs from a Raspberry Pi Pico mounted on my system, and some kind of error caused the system to unmount it, and then the system crashed.  This has now happened twice, but a Linux system handles this without any problem.  The Pico is running CircuitPython, which creates its own msdosfs that can be mounted and written to.  I had tried to rename a file on that, and it appears that something about that caused the crash.  This appears similar to Bug ID #185374, but that has been around for years with no resolution.

I have moved the Pico to a Raspberry Pi system running Linux to avoid crashing my main system, but I can move it back if needed for further debugging.
Comment 1 Tomasz "CeDeROM" CEDRO 2024-09-16 01:04:37 UTC
Does the problem occur when you use different mount_msdosfs(8) options for the filesystem? For instance, try -o longnames, -o shortnames, or -o nowin95, and/or the -s / -9 flags.

These embedded FAT12/16 emulation implementations are usually limited and may not support all calls (e.g. metadata access).

For instance, some time ago I was playing with the DAPLink FAT12 implementation. It only supported a single sector due to RAM limitations in the MCU, but that was sufficient to store a firmware file. Its partition table was also not compatible with some FAT implementations (for instance, Android always tries fsck on disk attach, and that failed).

Are you sure nothing in the background (e.g. gvfs / file manager) tries to access the drive, write anything to it (e.g. a trash bin), or check metadata? Can you try using the disk from a plain terminal with no Xorg WM running: mount, read files, write files, and see if that helps?


Maybe this macOS / msdosfs issue is somehow related?

https://github.com/adafruit/circuitpython/issues/8449

They had problems with slow writes and IO errors.. but the solution was not to fix the CircuitPython (CP) FAT emulation implementation but the host side. Looks like something similar happens here?

I am not sure but if FreeBSD msdosfs provided some sort of trace / debug it could be possible to pinpoint the issue and fix the CP?
Comment 2 carlj 2024-09-16 22:31:16 UTC
(In reply to Tomasz "CeDeROM" CEDRO from comment #1)
I tried it with the -9 option and couldn't get it to panic.  I then went back to without it and tried to verify that it still would panic, but it didn't panic.  I tried various other things that I had done and still couldn't get it to panic.  That included mounting it with -r which didn't allow changes, so then I did a 'mount -uw' while I was in /mnt and mv'ed the file, but that still didn't panic it.  I also tried unplugging it while it was still mounted, but even that didn't cause a panic.  In that case I did have to umount/mount to see any changes, which I would expect.  In summary, I haven't been able to repeat the original problem, so unless somebody can see something in the core.txt it looks like this report should be closed.  I won't do that yet to see if anybody thinks that it should be kept open.
Comment 3 Tomasz "CeDeROM" CEDRO 2024-09-16 23:19:18 UTC
Thanks Carl, good to hear a temporary solution was found :-)

The problem is still here, and FreeBSD should handle it gracefully, so a fix is needed :-)

I would also suggest reporting an issue on the CP GitHub. As you have the device, you have the details to fill in the report, and you can point out that the problem is somewhere around the Win95 extended attributes.

https://github.com/adafruit/circuitpython/issues

Someone with a JTAG/SWD debug probe attached to the rPI, with CP running and mounted on FreeBSD, should be able to catch the problem in the CP firmware. I don't have that board, sorry, but I guess an out-of-bounds memory write happens, causing a board panic and then a BSD host panic. It would be interesting to verify that the BSD msdosfs / fs driver cannot be manipulated by a forged FAT device.

One more question, just to eliminate some "background writes": did the problem occur in a terminal with no Xorg running, or were you working on the Xorg desktop?
Comment 4 carlj 2024-09-17 03:52:54 UTC
(In reply to Tomasz "CeDeROM" CEDRO from comment #3)
Sorry I wasn't clear enough, but I haven't actually found a fix because I can't get the original problem to repeat.  I had just tested with the -9 option and it worked, so then I went back without the -9 option and it still worked.  I then went back and tried with everything I could think of that might have caused the original panic but I couldn't get the system to panic again, still without the -9 option.

You also asked if Xorg was involved, and it probably wasn't.  I did everything from the command line, but it was inside of xfce4-terminal.  So Xorg is running but I don't think that it should be involved in command line work from a terminal.  Xfce does have gvfs running, but I have never seen any evidence that it messes with anything that I do.
Comment 5 carlj 2024-09-17 21:04:26 UTC
I haven't been able to reproduce the problem lately, so I am closing it now.
Comment 6 Xin LI freebsd_committer freebsd_triage 2024-09-17 23:32:29 UTC
I think it's reasonable to expect the system not to panic (at least, not to panic with a NULL pointer dereference) when hardware fails. In this particular case it appears that fsync() decided to give up, but it did not do so gracefully (and should). I think this suggests that there is a bug that should be fixed:

(da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 00 00 00 06 00 00 01 00 
(da0:umass-sim0:0:0:0): CAM status: SCSI Status Error
(da0:umass-sim0:0:0:0): SCSI status: Check Condition
(da0:umass-sim0:0:0:0): SCSI sense: DATA PROTECT asc:27,0 (Write protected)
(da0:umass-sim0:0:0:0): Info: 0
(da0:umass-sim0:0:0:0): Error 13, Unretryable error
g_vfs_done():da0s1[WRITE(offset=2560, length=512)]error = 13
(da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 00 00 00 06 00 00 01 00 
(da0:umass-sim0:0:0:0): CAM status: SCSI Status Error
(da0:umass-sim0:0:0:0): SCSI status: Check Condition
(da0:umass-sim0:0:0:0): SCSI sense: DATA PROTECT asc:27,0 (Write protected)
(da0:umass-sim0:0:0:0): Info: 0
(da0:umass-sim0:0:0:0): Error 13, Unretryable error
g_vfs_done():da0s1[WRITE(offset=2560, length=512)]error = 13
(da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 00 00 00 06 00 00 01 00 
(da0:umass-sim0:0:0:0): CAM status: SCSI Status Error
(da0:umass-sim0:0:0:0): SCSI status: Check Condition
(da0:umass-sim0:0:0:0): SCSI sense: DATA PROTECT asc:27,0 (Write protected)
(da0:umass-sim0:0:0:0): Info: 0
(da0:umass-sim0:0:0:0): Error 13, Unretryable error
g_vfs_done():da0s1[WRITE(offset=2560, length=512)]error = 13
fsync: giving up on dirty (error = 13) 0xfffff80246d73e00: type VCHR state VSTATE_CONSTRUCTED op 0xffffffff818abe28
    usecount 1, writecount 0, refcount 8 seqc users 0 rdev 0xfffff80006679000
    hold count flags ()
    flags ()
    v_object 0xfffff800210e9840 ref 0 pages 5 cleanbuf 5 dirtybuf 1
    lock type mntfs: EXCL by thread 0xfffff802183f5000 (pid 2401, umount, tid 102536)


Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03
fault virtual address	= 0x1000102a4
fault code		= supervisor read data, page not present
instruction pointer	= 0x20:0xffffffff80a75445
stack pointer	        = 0x28:0xfffffe016bd30a20
frame pointer	        = 0x28:0xfffffe016bd30a40
code segment		= base rx0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 2401 (umount)
rdi: fffff80246d73ee0 rsi: fffffe0017e2dbe0 rdx: 0000000000000000
rcx: ffffffffffffffe0  r8: fffffe000fe19420  r9: 0000000000000005
rax: 0000000100010204 rbx: fffffe0017e2dbe0 rbp: fffffe016bd30a40
r10: 0000000000000005 r11: fffff800038cb000 r12: 00000000a00010a4
r13: 0000000000000000 r14: fffff80246d73e00 r15: 0000000000000000
trap number		= 12
panic: page fault
cpuid = 3
time = 1726433998
KDB: stack backtrace:
#0 0xffffffff80b7fefd at kdb_backtrace+0x5d
#1 0xffffffff80b32bd1 at vpanic+0x131
#2 0xffffffff80b32a93 at panic+0x43
#3 0xffffffff8100091b at trap_fatal+0x40b
#4 0xffffffff81000966 at trap_pfault+0x46
#5 0xffffffff80fd6d48 at calltrap+0x8
#6 0xffffffff80bedfda at bufwrite+0x1da
#7 0xffffffff80c310b0 at vn_fsync_buf+0x230
#8 0xffffffff80bee16b at bufsync+0x3b
#9 0xffffffff80c14e7f at bufobj_invalbuf+0x19f
#10 0xffffffff80c17e9e at vgonel+0x26e
#11 0xffffffff80c183b1 at vgone+0x31
#12 0xffffffff809c370e at mntfs_freevp+0xe
#13 0xffffffff809ca05e at msdosfs_unmount+0x1de
#14 0xffffffff80c0cb67 at dounmount+0x787
#15 0xffffffff80c0c375 at kern_unmount+0x2f5
#16 0xffffffff810011c0 at amd64_syscall+0x100
#17 0xffffffff80fd765b at fast_syscall_common+0xf8
Uptime: 37m28s
Dumping 880 out of 14229 MB:..2%..11%..22%..31%..42%..51%..62%..71%..82%..91%
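
For readers who want to relate the CAM messages to the g_vfs_done() lines, the failing WRITE(10) CDB from the log above can be decoded by hand. The following is only a minimal sketch: it assumes the standard WRITE(10) layout (opcode, flags, 4-byte big-endian LBA, group number, 2-byte big-endian transfer length, control) and a 512-byte sector size, which is inferred from "length=512" in the log rather than read from the device. The function name decode_write10 and the constants are hypothetical and used for illustration only.

```python
import errno
import struct

SECTOR_SIZE = 512  # assumption: inferred from "length=512" in g_vfs_done()


def decode_write10(cdb_hex):
    """Return (lba, block_count) for a WRITE(10) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a WRITE(10) CDB")
    # opcode, flags, LBA (4 bytes), group, transfer length (2 bytes), control
    _opcode, _flags, lba, _group, blocks, _control = struct.unpack(">BBIBHB", cdb)
    return lba, blocks


# CDB copied from the CAM error lines in the log.
lba, blocks = decode_write10("2a 00 00 00 00 06 00 00 01 00")

# Byte offset of the write relative to the whole disk (da0).
disk_offset = lba * SECTOR_SIZE

# g_vfs_done() reports the offset relative to the slice (da0s1).
slice_offset = 2560

# The difference would be where the slice starts on the disk, if both
# messages describe the same write (an assumption, not shown by the log).
slice_start = disk_offset - slice_offset

print("LBA", lba, "blocks", blocks)
print("disk offset", disk_offset, "implied slice start", slice_start)
print("error 13 is", errno.errorcode[13])
```

Under those assumptions the device rejected a one-sector write at LBA 6 of da0, which lines up with offset 2560 on da0s1 if the slice begins one sector into the disk. The same write is retried and rejected three times before fsync gives up.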
Comment 7 Konstantin Belousov freebsd_committer freebsd_triage 2024-09-18 10:30:20 UTC
(In reply to Xin LI from comment #6)
A backtrace from vmcore, with line numbers and possibly function args, is
needed to start doing meaningful analysis.