Bug 205932 - [panic] Kernel panic when copying from ext2fs partition to UFS partition
Status: Closed DUPLICATE of bug 205816
Alias: None
Product: Base System
Classification: Unclassified
Component: misc
Version: 10.2-RELEASE
Hardware: amd64 Any
Importance: --- Affects Many People
Assignee: freebsd-bugs mailing list
Reported: 2016-01-05 19:49 UTC by Will B
Modified: 2016-02-09 03:36 UTC
CC List: 7 users


Attachments
Verbose kernel panic textual information (82.40 KB, text/plain)
2016-01-05 19:49 UTC, Will B
core dump, text format (107.97 KB, text/plain)
2016-01-11 18:46 UTC, Torfinn Ingolfsen

Description Will B 2016-01-05 19:49:15 UTC
Created attachment 165117
Verbose kernel panic textual information

When copying files from an ext4 partition on device ada1 mounted read-only to a user's home directory on the FreeBSD root (UFS) partition on device ada0, a kernel panic occurs after around 100 files are copied.

It appears at least one other person has experienced this recently:
http://freebsd.1045724.n5.nabble.com/Ext4-Kernel-Panic-td6025605.html

This kernel panic happens on a fresh install with NO packages or ports installed yet, both before AND after performing a system update with freebsd-update.


COMMANDS ISSUED (under root account):
- - -
# mount -t ext2fs -o ro /dev/ada1p4 /mnt/adisk

# cd /mnt/adisk

# ls
lost+found will

# cp -Rnv will /home/will
(kernel panic happens after about 100 files are copied)
- - -


CRASH INFO:
- - -
Dump header from device /dev/ada0p3
  Architecture: amd64
  Architecture Version: 2
  Dump Length: 714723328B (681 MB)
  Blocksize: 512
  Dumptime: Tue Jan  5 10:51:56 2016
  Hostname: will-freebsd
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 10.2-RELEASE #0 r286666: Wed Aug 12 15:26:37 UTC 2015
    root@releng1.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC
  Panic String: __lockmgr_args: recursing on non recursive lockmgr getblk @ (null):0

  Dump Parity: 3650188378
  Bounds: 1
  Dump Status: good
- - -

The more verbose crash text is attached.
Comment 1 Will B 2016-01-06 05:18:16 UTC
Testing the same ext4->UFS copy with FreeBSD 10.2 amd64 in VirtualBox yields the same kernel panic with the same panic string, which appears to rule out hardware.

I also tried adding 'sync' to the mount options in /etc/fstab, but this made no difference: the kernel panic still happens.
Comment 2 Jason Unovitch freebsd_committer 2016-01-08 00:21:14 UTC
I noticed pfg@ (CC'd) recently fixed a panic when reading ext4.  He may have some additional insight.
Comment 3 Pedro F. Giffuni freebsd_committer 2016-01-08 02:34:40 UTC
Thank you for the report: it is unclear if this is a new panic.
I am CC'ing Zheng Liu as he is the ext4 implementation expert.
Comment 4 Damjan Jovanovic 2016-01-08 03:53:54 UTC
sys_write -> write -> ext2_bmap -> ext4_ext_find_extent -> ... -> panic, so it looks like a memory-mapped file is being written out, and reading from the mapped memory during that write causes the panic.

It seems a specific file is causing this (as opposed to iteration over the filesystem). You could try to narrow it down: use "find" to locate the 100th or so file, then copy the suspect files individually until you isolate the single file that triggers the panic. I suspect that file is sparse, causing a different manifestation of bug 205816 (a panic instead of garbage data). Could you test the patch from there?
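
One way to do that narrowing, as a minimal sketch: copy the tree one file at a time and record each path in a log before copying it, so that after a panic the last line of the log names the file that was in flight. The function name copy_logged and the log file are hypothetical, the sync before every copy makes this slow on large trees, and it assumes the log lives on a filesystem that survives the crash.

```sh
#!/bin/sh
# Hypothetical helper, not part of FreeBSD: copy a tree one file at a time,
# logging each path before it is copied.
copy_logged() {
    src=$1; dst=$2; log=$3
    # List regular files relative to $src, then copy them one by one.
    (cd "$src" && find . -type f) | while IFS= read -r f; do
        printf '%s\n' "$f" >> "$log"   # record the file about to be copied
        sync                           # flush the log to disk before the risky copy
        mkdir -p "$dst/$(dirname "$f")"
        cp "$src/$f" "$dst/$f"
    done
}
```

With the paths from the report this might be run as `copy_logged /mnt/adisk/will /home/will /root/copy.log` (the log path is an assumption); after the reboot, `tail -n 1 /root/copy.log` shows the last file that was attempted.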
Comment 5 Will B 2016-01-08 05:32:26 UTC
(In reply to Damjan Jovanovic from comment #4)
Thank you.

The file that it happens on is different each time, and as far as I can tell, they are not sparse files.
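
A rough way to check for sparseness, sketched under assumptions: a file is treated as sparse when the space allocated to it is smaller than its length. The function name is_sparse is hypothetical. The check relies on du reporting allocated space correctly for the mounted filesystem, which is an assumption for the ext2fs driver, and very small files that a filesystem stores inline may be misreported as sparse.

```sh
#!/bin/sh
# Hypothetical heuristic: succeed (exit 0) if the file looks sparse,
# i.e. its allocated space is smaller than its length in bytes.
is_sparse() {
    # $(( ... )) strips the leading blanks some wc implementations print.
    size_bytes=$(($(wc -c < "$1")))
    # du -k prints "<allocated KiB><TAB><path>"; keep the first field.
    alloc_kib=$(du -k "$1" | cut -f1)
    [ $((alloc_kib * 1024)) -lt "$size_bytes" ]
}
```

For example, `is_sparse /mnt/adisk/will/somefile && echo sparse` would print "sparse" only for a file with unallocated holes (somefile is a placeholder name).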
Comment 6 Damjan Jovanovic 2016-01-08 17:10:36 UTC
(In reply to Will B from comment #5)

In the "cp" tool, function copy_file() in file /usr/src/bin/cp/utils.c has two ways of copying a file: mmap + write, and read + write.

mmap + write is used only when all of the following are true:
* VM_AND_BUFFER_CACHE_SYNCHRONIZED is defined (it is, in Makefile)
* It's a regular file (according to S_ISREG())
* The file's size is > 0
* The file's size is <= 8 MiB
* Calling mmap() succeeds
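
The file type and size conditions in the list above can be approximated from the shell to see which files would take the mmap + write path. This is only a sketch: the function name mmap_candidates is hypothetical, and it cannot test the compile-time define or whether mmap() itself succeeds.

```sh
#!/bin/sh
# Hypothetical helper: list regular, non-empty files of at most 8 MiB
# under a directory, i.e. the files cp would try to copy via mmap.
mmap_candidates() {
    # find -size counts 512-byte blocks, rounded up:
    #   -size +0      more than 0 blocks       (size > 0)
    #   -size -16385  fewer than 16385 blocks  (size <= 8 MiB = 16384 blocks)
    find "$1" -type f -size +0 -size -16385
}
```

Running `mmap_candidates /mnt/adisk/will` would then print the files in the source tree that are candidates for the mmap code path.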

You say it happens on different files each time. My guess is that all of the files it happens on are regular, non-empty files <= 8 MiB in size.

There are 2 known bugs where mmap() succeeds but reading the mapped contents can cause problems:

1. The serious bug 205938 where struct buf leaks from ext4_bmapext(), causing a panic on every attempt to read even 1 byte of the mapped data. Because your trace shows the panic happens with lbn=0 (the very first block of the file), this is probably the problem here. This bug was fixed in CURRENT yesterday, which is newer than your 10.2-RELEASE. You could try the patch from that bug; it applies and works for me (for a different mmap problem) on 10.2-RELEASE.

2. Even with that bug fixed, sparse blocks will at best contain garbage data read from the wrong disk blocks instead of zeroes, corrupting the copied file you create (maybe a panic is also possible?). This is bug 205816 which I previously mentioned. This isn't even fixed in CURRENT, but there is a patch on that bug you can try.

(BTW you could also use ext4fuse from ports instead of the in-kernel ext2fs driver, since it doesn't suffer from problem 1. It does suffer from problem 2, though, and I don't know of or have a patch for it.)

With both those patches, copying files from EXT4 should be reliable and correct, but very slow in the mmap + write case (about 250 kB/second with 100% CPU usage), which is unbearably slow if you have lots of files <= 8 MiB. I have a one-line patch to speed it up about 100-fold, but it only works on CURRENT; 10.2 panics with it. Will keep trying.

This is the sad state of current EXT4 support outside Linux...
Comment 7 Will B 2016-01-08 18:37:06 UTC
(In reply to Damjan Jovanovic from comment #6)
Thank you for that explanation, Damjan.  

Fortunately Midnight Commander copies the files just fine, so I used it instead of cp to bring over my 160 GB of data from the ext4 partition.

Thanks for the patches.  I don't think I'll be able to apply them right away, but maybe I'll get FreeBSD running in a VM and do it that way as my real metal is used for my busy day-to-day business.

Thanks again! :-)
Comment 8 Pedro F. Giffuni freebsd_committer 2016-01-08 19:40:44 UTC
(In reply to Will B from comment #7)

Hello;

Unfortunately, testing the ext4 support is difficult because most test suites out there want to be able to write to the filesystem, so the ext4 support has received a lot less testing than ext2/3.

I think sysutils/fusefs-lkl may be in a better position functionally among the alternatives.

Damjan's patches look good and may have some effect here, but if we can't reproduce this particular bug we will have to close the issue.
Comment 9 Damjan Jovanovic 2016-01-09 09:38:05 UTC
On 10.2 I get a very similar panic to Will B's when reading mmapped files, and the patch from bug 205938 fixes it, so this is probably a duplicate of 205938:

panic: __lockmgr_args: recursing on non recursive lockmgr getblk @ (null):0

And the stack trace (details left out, due to i386 vs amd64 differences):

db_trace_self_wrapper
kdb_backtrace
vpanic
panic
__lockmgr_args
getblk
breadn_flags
ext4_ext_find_extent
ext2_bmap
VOP_BMAP_APV
vnode_pager_generic_getpages
ext2_getpages
VOP_GETPAGES_APV
vnode_pager_getpages
vm_fault_hold
vm_fault
trap_pfault
trap
calltrap

Oh and sysutils/fusefs-lkl looks awesome, thank you! It should also help other projects interoperating with Linux drivers, such as webcamd.
Comment 10 Torfinn Ingolfsen 2016-01-10 13:06:36 UTC
FWIW, I see this with ext3 as well:
I get a repeatable panic when trying to copy files from an ext3 filesystem which is mounted read-only. Listing files works OK.
Details:
root@kg-u35jc# uname -a
FreeBSD kg-u35jc.kg4.no 10.1-STABLE FreeBSD 10.1-STABLE #0 r283269: Fri May 22 09:14:57 CEST 2015
     root@kg-u35jc.kg4.no:/usr/obj/usr/src/sys/GENERIC  amd64

Steps to reproduce:
# kldload ext2fs
# mount -r -t ext2fs /dev/ada0s5 /mnt
# cp /mnt/whatever-file <somedir>

Results in a nice panic:
root@kg-u35jc# cat /var/crash/info.0
Dump header from device /dev/ada0s3b
  Architecture: amd64
  Architecture Version: 2
  Dump Length: 1075154944B (1025 MB)
  Blocksize: 512
  Dumptime: Mon May 25 20:07:55 2015
  Hostname: kg-u35jc.kg4.no
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 10.1-STABLE #0 r283269: Fri May 22 09:14:57 CEST 2015
    root@kg-u35jc.kg4.no:/usr/obj/usr/src/sys/GENERIC
  Panic String: __lockmgr_args: recursing on non recursive lockmgr getblk @ (null):0

  Dump Parity: 3531684913
  Bounds: 0
  Dump Status: good

which is easy to reproduce:
root@kg-u35jc# cat /var/crash/info.1
Dump header from device /dev/ada0s3b
  Architecture: amd64
  Architecture Version: 2
  Dump Length: 478572544B (456 MB)
  Blocksize: 512
  Dumptime: Mon May 25 20:24:29 2015
  Hostname: kg-u35jc.kg4.no
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 10.1-STABLE #0 r283269: Fri May 22 09:14:57 CEST 2015
    root@kg-u35jc.kg4.no:/usr/obj/usr/src/sys/GENERIC
  Panic String: __lockmgr_args: recursing on non recursive lockmgr getblk @ (null):0

  Dump Parity: 4100829037
  Bounds: 1
  Dump Status: good
Comment 11 Damjan Jovanovic 2016-01-10 17:40:42 UTC
Hi Torfinn

Please provide the output of "stat -x /mnt/whatever-file" for the file that causes the panic, and if you can, a stack trace of the panic.
Comment 12 Torfinn Ingolfsen 2016-01-11 18:40:42 UTC
Ok, this is the file that causes the crash:
root@kg-u35jc# file /mnt/home/tingo/.mozilla/firefox/1xz9ipeh.default/permissions.sqlite
/mnt/home/tingo/.mozilla/firefox/1xz9ipeh.default/permissions.sqlite: SQLite 3.x database, user version 4

and 'stat -x' output as requested:
root@kg-u35jc# stat -x /mnt/home/tingo/.mozilla/firefox/1xz9ipeh.default/permissions.sqlite
  File: "/mnt/home/tingo/.mozilla/firefox/1xz9ipeh.default/permissions.sqlite"
  Size: 54272        FileType: Regular File
  Mode: (0644/-rw-r--r--)         Uid: ( 1000/  (1000))  Gid: ( 1000/  ltingo)
Device: 0,107   Inode: 11272365    Links: 1
Access: Wed Oct  7 21:50:35 2015
Modify: Wed Oct  7 21:50:35 2015
Change: Wed Oct  7 21:50:35 2015

dump info
root@kg-u35jc# cat /var/crash/info.4
Dump header from device /dev/ada0s3b
  Architecture: amd64
  Architecture Version: 2
  Dump Length: 445444096B (424 MB)
  Blocksize: 512
  Dumptime: Mon Jan 11 19:27:16 2016
  Hostname: kg-u35jc.kg4.no
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 10.1-STABLE #0 r283269: Fri May 22 09:14:57 CEST 2015
    root@kg-u35jc.kg4.no:/usr/obj/usr/src/sys/GENERIC
  Panic String: __lockmgr_args: recursing on non recursive lockmgr getblk @ (null):0

  Dump Parity: 762402152
  Bounds: 4
  Dump Status: good
Comment 13 Torfinn Ingolfsen 2016-01-11 18:46:03 UTC
Created attachment 165409
core dump, text format

corresponding info file:
root@kg-u35jc# cat /var/crash/info.4
Dump header from device /dev/ada0s3b
  Architecture: amd64
  Architecture Version: 2
  Dump Length: 445444096B (424 MB)
  Blocksize: 512
  Dumptime: Mon Jan 11 19:27:16 2016
  Hostname: kg-u35jc.kg4.no
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 10.1-STABLE #0 r283269: Fri May 22 09:14:57 CEST 2015
    root@kg-u35jc.kg4.no:/usr/obj/usr/src/sys/GENERIC
  Panic String: __lockmgr_args: recursing on non recursive lockmgr getblk @ (null):0

  Dump Parity: 762402152
  Bounds: 4
  Dump Status: good
Comment 14 Damjan Jovanovic 2016-01-13 00:12:26 UTC
(In reply to Torfinn Ingolfsen from comment #13)

Thank you.

Firstly, what makes you think that the filesystem is EXT3? It's trying to use ext4_ext_find_extent(), so it looks more like EXT4. Is the inode corrupt, containing the IN_E4EXTENTS flag which is invalid for EXT3 (which we should ignore?), or are you mistaken about the version?
Comment 15 Pedro F. Giffuni freebsd_committer 2016-01-13 03:36:45 UTC
(In reply to Damjan Jovanovic from comment #14)

I think it was not uncommon at the end of the ext3 lifecycle for Linux
distributions to backport ext4 features to ext3.

I think now it is more plausible to expect that people can be using
the ext4 driver on ext3 formatted disks and the ext4 driver may be
taking "liberties". :(
Comment 16 Torfinn Ingolfsen 2016-01-13 21:56:10 UTC
(In reply to Damjan Jovanovic from comment #14)
Good question. It turns out I made a false assumption: I assumed that disktype (sysutils/disktype) reported the filesystem type correctly:
root@kg-u35jc# disktype /dev/ada0s5

--- /dev/ada0s5
Character device, size 204.9 GiB (220010119168 bytes)
Ext3 file system
  UUID AE5DE014-E0B5-4045-80CA-D4D6FF37AA79 (DCE, v4)
  Last mounted at "/home/tingo/mpoint"
  Volume size 204.9 GiB (220010119168 bytes, 53713408 blocks of 4 KiB)
(at least it is consistent - it reports exactly the same when I run it on Linux)

I booted the machine in Linux and did a check:
[tingo@kg-u35jc ~]$ lsblk -f /dev/sda5
NAME FSTYPE LABEL UUID                                 MOUNTPOINT
sda5 ext4         ae5de014-e0b5-4045-80ca-d4d6ff37aa79 /mnt

So, yes - I was mistaken. The filesystem is ext4. Sorry for my inaccurate reporting.
Comment 17 Damjan Jovanovic 2016-01-14 00:07:47 UTC
(In reply to Torfinn Ingolfsen from comment #16)

In that case, your issue is the same as Will B's, and this bug is probably a duplicate of bug 205938 as per comment 9. Please retry with CURRENT or with that patch backported.
Comment 18 Torfinn Ingolfsen 2016-01-14 20:21:26 UTC
(In reply to Damjan Jovanovic from comment #17)
You are correct. I implemented the patch from Bug 205938, and when I copy files from a read-only mounted ext4 now, it works - no crashes. Thanks!
Comment 19 Pedro F. Giffuni freebsd_committer 2016-02-09 03:36:45 UTC
This appears to be a duplicate of, or at least very closely related to, the issues with sparse files that were recently fixed.
10.3-RELEASE will improve the situation, but the issue is only completely fixed in CURRENT.

Thank you for the report!

*** This bug has been marked as a duplicate of bug 205816 ***