Created attachment 165117: Verbose kernel panic textual information
When copying files from an ext4 partition on device ada1, mounted read-only, to a user's home directory on the FreeBSD root (UFS) partition on device ada0, a kernel panic occurs after around 100 files have been copied. At least one other person appears to have hit this recently: http://freebsd.1045724.n5.nabble.com/Ext4-Kernel-Panic-td6025605.html
This kernel panic happens on a fresh install with NO packages or ports installed yet, both before AND after updating the system with freebsd-update.
COMMANDS ISSUED (under the root account):
- - -
# mount -t ext2fs -o ro /dev/ada1p4 /mnt/adisk
# cd /mnt/adisk
# ls
lost+found will
# cp -Rnv will /home/will
(the kernel panic happens after about 100 files have been copied)
- - -
CRASH INFO:
- - -
Dump header from device /dev/ada0p3
  Architecture: amd64
  Architecture Version: 2
  Dump Length: 714723328B (681 MB)
  Blocksize: 512
  Dumptime: Tue Jan 5 10:51:56 2016
  Hostname: will-freebsd
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 10.2-RELEASE #0 r286666: Wed Aug 12 15:26:37 UTC 2015
    root@releng1.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC
  Panic String: __lockmgr_args: recursing on non recursive lockmgr getblk @ (null):0
  Dump Parity: 3650188378
  Bounds: 1
  Dump Status: good
- - -
The more verbose crash text is attached.
Testing the same ext4->UFS copy with FreeBSD 10.2 amd64 in VirtualBox yields the same kernel panic with the same panic string, so hardware appears to be ruled out. I also tried adding 'sync' to the mount options in /etc/fstab, but this made no difference: the kernel panic still happens.
I noticed pfg@ (CC'd) just fixed a panic in reading ext4 recently. He may have some additional insight.
Thank you for the report: it is unclear if this is a new panic. I am CC'ing Zheng Liu as he is the ext4 implementation expert.
sys_write -> write -> ext2_bmap -> ext4_ext_find_extent -> ... -> panic, so it looks like a memory-mapped file is being written, and reading from the mapped memory during that write causes the panic. It seems a specific file is causing this (as opposed to the iteration over the filesystem). You could try to narrow it down: use "find" to locate the 100th or so file, then copy the files you suspect one at a time until you find the single file that actually triggers the panic. I suspect that file is sparse, causing a different manifestation of bug 205816 (a panic instead of garbage data). Could you test the patch from there?
(In reply to Damjan Jovanovic from comment #4) Thank you. The file that it happens on is different each time, and as far as I can tell, they are not sparse files.
(In reply to Will B from comment #5)
In the "cp" tool, function copy_file() in /usr/src/bin/cp/utils.c has 2 ways of copying a file: mmap + write, and read + write. mmap + write is used only when all of the following are true:
* VM_AND_BUFFER_CACHE_SYNCHRONIZED is defined (it is, in the Makefile)
* It's a regular file (according to S_ISREG())
* The file's size is > 0
* The file's size is <= 8 MiB
* Calling mmap() succeeds
You say it happens on different files each time. My guess is that all of the files it happens on are regular, non-empty files <= 8 MiB in size.
There are 2 known bugs where mmap() succeeds but reading the mapped contents can cause problems:
1. The serious bug 205938, where a struct buf leaks from ext4_bmapext(), causing a panic on every attempt to read even 1 byte of the mapped data. Because your trace shows the panic happens with lbn=0 (the very first block of the file), this is probably the problem here. This bug was fixed in CURRENT yesterday, which is newer than your 10.2-RELEASE. You could try the patch from that bug; it applies to 10.2-RELEASE and works for me there (for a different mmap problem).
2. Even with that bug fixed, sparse blocks will at best contain garbage data read from the wrong disk blocks instead of zeroes, corrupting the copy you create (maybe a panic is also possible?). This is bug 205816, which I mentioned previously. It isn't fixed even in CURRENT, but there is a patch on that bug you can try.
(BTW, you could also use ext4fuse from ports instead of the in-kernel ext2fs driver, since it doesn't suffer from problem 1. It does suffer from problem 2, though, and I don't know of or have a patch for it.)
With both of those patches, copying files from EXT4 should be reliable and correct, but very slow in the mmap + write case (about 250 kB/second with 100% CPU usage), which is unbearably slow if you have lots of files <= 8 MiB. I have a 1-line patch that speeds it up about 100-fold, but it only works on CURRENT; 10.2 panics with it.
I will keep trying. This is the sad state of EXT4 support outside Linux at the moment...
(In reply to Damjan Jovanovic from comment #6) Thank you for that explanation, Damjan. Fortunately, Midnight Commander copies the files just fine, so I used it instead of cp to bring over my 160 GB of data from the ext4 partition. Thanks for the patches. I don't think I'll be able to apply them right away, but maybe I'll get FreeBSD running in a VM and do it that way, since my bare-metal machine is used for my busy day-to-day business. Thanks again! :-)
(In reply to Will B from comment #7) Hello; unfortunately, testing the ext4 support is difficult because most test suites out there want to be able to write to the filesystem, so the ext4 support has received a lot less testing than ext2/3. Among the alternatives, I think sysutils/fusefs-lkl may be in a better position functionally. Damjan's patches look good and may help here, but if we can't reproduce this particular bug, we will have to close the issue.
On 10.2 I get a very similar panic to Will B's when reading mmapped files, and the patch from bug 205938 fixes it, so this is probably a duplicate of 205938:
panic: __lockmgr_args: recursing on non recursive lockmgr getblk @ (null):0
And the stack trace (details left out, due to i386 vs amd64 differences):
db_trace_self_wrapper
kdb_backtrace
vpanic
panic
__lockmgr_args
getblk
breadn_flags
ext4_ext_find_extent
ext2_bmap
VOP_BMAP_APV
vnode_pager_generic_getpages
ext2_getpages
VOP_GETPAGES_APV
vnode_pager_getpages
vm_fault_hold
vm_fault
trap_pfault
trap
calltrap
Oh, and sysutils/fusefs-lkl looks awesome, thank you! It should also help other projects that interoperate with Linux drivers, such as webcamd.
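The panic string and trace above are consistent with the bug 205938 explanation: a plausible reading is that a buffer obtained through breadn_flags()/getblk() is never released on some path, so it stays locked by the current thread, and the next getblk() of that block from the same thread recurses on the non-recursive buffer lock. The following is only a toy userland model of that failure mode; toy_buf, toy_getblk(), toy_brelse() and toy_bmap() are hypothetical names, and the real kernel panics where this model returns -1.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model (hypothetical names): one cached disk block guarded by a
 * non-recursive lock that records its owning thread. */
struct toy_buf {
    long blkno;
    int owner;              /* 0 = unlocked, otherwise the owning thread id */
};

/* Like getblk(): lock the buffer for thread tid. Returns -1 where the
 * kernel would panic with "recursing on non recursive lockmgr". */
static int toy_getblk(struct toy_buf *bp, int tid)
{
    assert(bp != NULL && tid != 0);
    if (bp->owner == tid)
        return -1;
    assert(bp->owner == 0); /* single-threaded toy: no waiting */
    bp->owner = tid;
    return 0;
}

/* Like brelse(): drop the lock. */
static void toy_brelse(struct toy_buf *bp)
{
    bp->owner = 0;
}

/* Block-map lookup that reads the extent block; leak != 0 models a
 * return path that forgets to call toy_brelse(). */
static int toy_bmap(struct toy_buf *bp, int tid, int leak)
{
    if (toy_getblk(bp, tid) != 0)
        return -1;
    /* ... the extent entries in *bp would be parsed here ... */
    if (!leak)
        toy_brelse(bp);
    return 0;
}
```

Repeated clean lookups succeed, but once one lookup leaks the buffer, the very next lookup of that block by the same thread fails, which matches the "every attempt to read even 1 byte" behaviour described for bug 205938.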
FWIW, I see this with ext3 as well: I get a repeatable panic when trying to copy files from an ext3 filesystem which is mounted read-only. Listing files works OK.
Details:
root@kg-u35jc# uname -a
FreeBSD kg-u35jc.kg4.no 10.1-STABLE FreeBSD 10.1-STABLE #0 r283269: Fri May 22 09:14:57 CEST 2015 root@kg-u35jc.kg4.no:/usr/obj/usr/src/sys/GENERIC amd64
Steps to reproduce:
# kldload ext2fs
# mount -r -t ext2fs /dev/ada0s5 /mnt
# cp /mnt/whatever-file <somedir>
Results in a nice panic:
root@kg-u35jc# cat /var/crash/info.0
Dump header from device /dev/ada0s3b
  Architecture: amd64
  Architecture Version: 2
  Dump Length: 1075154944B (1025 MB)
  Blocksize: 512
  Dumptime: Mon May 25 20:07:55 2015
  Hostname: kg-u35jc.kg4.no
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 10.1-STABLE #0 r283269: Fri May 22 09:14:57 CEST 2015
    root@kg-u35jc.kg4.no:/usr/obj/usr/src/sys/GENERIC
  Panic String: __lockmgr_args: recursing on non recursive lockmgr getblk @ (null):0
  Dump Parity: 3531684913
  Bounds: 0
  Dump Status: good
which is easy to reproduce:
root@kg-u35jc# cat /var/crash/info.1
Dump header from device /dev/ada0s3b
  Architecture: amd64
  Architecture Version: 2
  Dump Length: 478572544B (456 MB)
  Blocksize: 512
  Dumptime: Mon May 25 20:24:29 2015
  Hostname: kg-u35jc.kg4.no
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 10.1-STABLE #0 r283269: Fri May 22 09:14:57 CEST 2015
    root@kg-u35jc.kg4.no:/usr/obj/usr/src/sys/GENERIC
  Panic String: __lockmgr_args: recursing on non recursive lockmgr getblk @ (null):0
  Dump Parity: 4100829037
  Bounds: 1
  Dump Status: good
Hi Torfinn, please provide the output of "stat -x /mnt/whatever-file" for the file that causes the panic and, if you can, a stack trace of the panic.
OK, this is the file that causes the crash:
root@kg-u35jc# file /mnt/home/tingo/.mozilla/firefox/1xz9ipeh.default/permissions.sqlite
/mnt/home/tingo/.mozilla/firefox/1xz9ipeh.default/permissions.sqlite: SQLite 3.x database, user version 4
and the 'stat -x' output as requested:
root@kg-u35jc# stat -x /mnt/home/tingo/.mozilla/firefox/1xz9ipeh.default/permissions.sqlite
  File: "/mnt/home/tingo/.mozilla/firefox/1xz9ipeh.default/permissions.sqlite"
  Size: 54272 FileType: Regular File
  Mode: (0644/-rw-r--r--) Uid: ( 1000/ (1000)) Gid: ( 1000/ ltingo)
Device: 0,107 Inode: 11272365 Links: 1
Access: Wed Oct 7 21:50:35 2015
Modify: Wed Oct 7 21:50:35 2015
Change: Wed Oct 7 21:50:35 2015
dump info:
root@kg-u35jc# cat /var/crash/info.4
Dump header from device /dev/ada0s3b
  Architecture: amd64
  Architecture Version: 2
  Dump Length: 445444096B (424 MB)
  Blocksize: 512
  Dumptime: Mon Jan 11 19:27:16 2016
  Hostname: kg-u35jc.kg4.no
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 10.1-STABLE #0 r283269: Fri May 22 09:14:57 CEST 2015
    root@kg-u35jc.kg4.no:/usr/obj/usr/src/sys/GENERIC
  Panic String: __lockmgr_args: recursing on non recursive lockmgr getblk @ (null):0
  Dump Parity: 762402152
  Bounds: 4
  Dump Status: good
Created attachment 165409: core dump, text format. The corresponding info file is /var/crash/info.4, identical to the one in the previous comment.
(In reply to Torfinn Ingolfsen from comment #13) Thank you. Firstly, what makes you think that filesystem is EXT3? It's calling ext4_ext_find_extent(), so it looks more like EXT4. Is the inode corrupt, containing the IN_E4EXTENTS flag which is invalid for EXT3 (and which we should perhaps ignore?), or are you mistaken about the version?
(In reply to Damjan Jovanovic from comment #14) I think it was not uncommon, toward the end of ext3's life cycle, for Linux distributions to backport ext4 features to ext3. It now seems more plausible that people are using the ext4 driver on ext3-formatted disks, and the ext4 driver may be taking "liberties". :(
(In reply to Damjan Jovanovic from comment #14) Good question. It turns out it was a false assumption. I assumed that disktype (sysutils/disktype) reported correctly:
root@kg-u35jc# disktype /dev/ada0s5
--- /dev/ada0s5
Character device, size 204.9 GiB (220010119168 bytes)
Ext3 file system
  UUID AE5DE014-E0B5-4045-80CA-D4D6FF37AA79 (DCE, v4)
  Last mounted at "/home/tingo/mpoint"
  Volume size 204.9 GiB (220010119168 bytes, 53713408 blocks of 4 KiB)
(at least it is consistent: it reports exactly the same when I run it on Linux)
I booted the machine into Linux and checked:
[tingo@kg-u35jc ~]$ lsblk -f /dev/sda5
NAME FSTYPE LABEL UUID MOUNTPOINT
sda5 ext4 ae5de014-e0b5-4045-80ca-d4d6ff37aa79 /mnt
So, yes, I was mistaken. The filesystem is ext4. Sorry for my inaccurate reporting.
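Since disktype and lsblk disagreed here, one independent check is to read the superblock feature flags directly, and, for the IN_E4EXTENTS question raised earlier, compare a suspect inode's flags against them. The C sketch below assumes the standard on-disk ext2/3/4 layout (superblock at byte 1024, s_magic at 0x38, s_feature_compat at 0x5C, s_feature_incompat at 0x60, i_flags at inode offset 0x20). The function names are hypothetical, and treating "extents enabled" as the sign of ext4 is a simplification: ext4 has other distinguishing features too.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* On-disk ext2/3/4 layout constants (all fields little-endian). */
#define EXT_SB_OFFSET       1024        /* superblock starts 1024 bytes in */
#define EXT_SB_MAGIC_OFF    0x38        /* s_magic (le16) */
#define EXT_SB_COMPAT_OFF   0x5C        /* s_feature_compat (le32) */
#define EXT_SB_INCOMPAT_OFF 0x60        /* s_feature_incompat (le32) */
#define EXT_MAGIC           0xEF53u
#define COMPAT_HAS_JOURNAL  0x0004u
#define INCOMPAT_EXTENTS    0x0040u
#define INODE_FLAGS_OFF     0x20        /* i_flags (le32) in an on-disk inode */
#define INODE_EXTENTS_FL    0x00080000u /* per-inode "uses extents" flag */

static uint32_t le16(const unsigned char *p) { return (uint32_t)p[0] | ((uint32_t)p[1] << 8); }
static uint32_t le32(const unsigned char *p) { return le16(p) | (le16(p + 2) << 16); }

/* Classify a 1024-byte superblock image: 0 = not ext2/3/4,
 * 2 = no journal and no extents, 3 = journal but no extents,
 * 4 = extents feature enabled. */
int ext_flavour(const unsigned char *sb)
{
    assert(sb != NULL);
    if (le16(sb + EXT_SB_MAGIC_OFF) != EXT_MAGIC)
        return 0;
    if (le32(sb + EXT_SB_INCOMPAT_OFF) & INCOMPAT_EXTENTS)
        return 4;
    if (le32(sb + EXT_SB_COMPAT_OFF) & COMPAT_HAS_JOURNAL)
        return 3;
    return 2;
}

/* 1 if an inode claims extent mapping although the superblock does not
 * enable extents (the "corrupt inode" case), else 0. */
int inode_extent_flag_suspect(const unsigned char *sb, const unsigned char *inode)
{
    assert(sb != NULL && inode != NULL);
    return (le32(inode + INODE_FLAGS_OFF) & INODE_EXTENTS_FL) != 0 &&
           (le32(sb + EXT_SB_INCOMPAT_OFF) & INCOMPAT_EXTENTS) == 0;
}

/* Read the superblock from a device or image file. The first 4096 bytes
 * are read so the I/O stays sector-aligned on raw disk devices.
 * Returns -1 on I/O error, otherwise the ext_flavour() result. */
int ext_flavour_of(const char *path)
{
    unsigned char buf[4096];
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = pread(fd, buf, sizeof buf, 0);
    close(fd);
    if (n != (ssize_t)sizeof buf)
        return -1;
    return ext_flavour(buf + EXT_SB_OFFSET);
}
```

Called from a small main() on /dev/ada0s5, ext_flavour_of() would be expected to return 4 if the volume has extents enabled, as lsblk's ext4 report suggests. Locating a specific inode's on-disk bytes for inode_extent_flag_suspect() requires walking the group descriptor table, which is not shown here.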
(In reply to Torfinn Ingolfsen from comment #16) In that case, your issue is the same as Will B's, and this bug is probably a duplicate of bug 205938, as per comment 9. Please retry with CURRENT or with that patch backported.
(In reply to Damjan Jovanovic from comment #17) You are correct. I applied the patch from bug 205938, and copying files from a read-only mounted ext4 filesystem now works, with no crashes. Thanks!
This appears to be a duplicate of, or at least very closely related to, the issues with sparse files that were recently fixed. 10.3-RELEASE will improve the situation, but the issue is only completely fixed in CURRENT. Thank you for the report! *** This bug has been marked as a duplicate of bug 205816 ***