| Summary: | 11.1-RC1 kernel panic (zfs recv) | | |
|---|---|---|---|
| Product: | Base System | Reporter: | John Kennedy <warlock> |
| Component: | kern | Assignee: | Graham Perrin <grahamperrin> |
| Status: | Closed Overcome By Events | | |
| Severity: | Affects Some People | CC: | re |
| Priority: | --- | Keywords: | crash, regression |
| Version: | 11.0-STABLE | | |
| Hardware: | amd64 | | |
| OS: | Any | | |
| See Also: | https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=161533 | | |
Description
John Kennedy
2017-07-04 20:21:14 UTC
So I've done a bunch of I/O trying to narrow this down. Between 20170526 and 20170528, on a system where I thought some SSD drives were failing (recoverable checksum errors), I `zfs send | zfs recv`'d the root disk (pool zroot) to a different disk, which I then moved to a different (current) box. Now I'm trying to move it again, and running into problems.

It looks like I probably installed from installation media (11.0-RELEASE-p1 #0 r306420) and upgraded to -p10 afterward using beadm (possibly with some CMOS clock offset, looking at the timestamps, but that should all just be relative time). I was running FreeBSD 11.0-RELEASE-p10 #0 r317487+8c96ad701987(releng/11.0) at the time the bad snapshot was taken. I've done similar things before (and since) with no problems.

I can send|recv data from that filesystem up to a point (@20170526) without issues, but get a kernel panic after that. Right now the problem seems to be isolated to "zroot" itself, since I can receive the incrementals below it (ROOT, var, usr, etc.). I'm trying to narrow it down some more.

I've been extremely frustrated trying to get a kernel dump for this panic. Currently:

```
FreeBSD jormungandr.phouka.net 11.1-RC1 FreeBSD 11.1-RC1 #79 r313908+8df37be70f94(releng/11.1): Thu Jul 6 18:20:07 PDT 2017 warlock@jormungandr.phouka.net:/usr/obj/usr/src/sys/GENERIC amd64
```

From rc.conf:

```
dumpdev="/dev/ada0p1"
dumpdir="/var/crash"
savecore_enable="YES"
```

```
# dumpon -l
ada0p1
# savecore -Cv
unable to open bounds file, using 0
checking for kernel dump on device /dev/ada0p1
mediasize = 35433480192 bytes
sectorsize = 512 bytes
magic mismatch on last dump header on /dev/ada0p1
No dump exists
```

I've got a pair of ZFS-mirrored Samsung SSD 960 EVO 250GB M.2 drives (/dev/nvd[01]) as zroot. dumpon didn't seem to like the encrypted swap partitions on them (even with the "late" option), so I threw in an extra drive and created an unencrypted swap partition on it with enough space for all the memory, just in case.
```
# gpart show /dev/nvd0
=>        40  488397088  nvd0  GPT  (233G)
          40       1024     1  freebsd-boot  (512K)
        1064        984        - free -  (492K)
        2048    4194304     2  freebsd-swap  (2.0G)
     4196352  484200448     3  freebsd-zfs  (231G)
   488396800        328        - free -  (164K)

# gpart show /dev/ada0
=>        40  937703008  ada0  GPT  (447G)
          40       4056        - free -  (2.0M)
        4096   69206016     1  freebsd-swap  (33G)
    69210112  868492936        - free -  (414G)

# grep ' memory ' /var/log/dmesg.today
real memory  = 34359738368 (32768 MB)
avail memory = 33191440384 (31653 MB)
```

Nothing is saved to /var/crash. The directory exists, but contains only "minfree" (whose contents are "2048"). Booting to single user after a panic and running savecore by hand doesn't seem to find anything either.

The basic layout of the problem pool is a stock ZFS layout with some additions and a boot environment:

```
# zfs list -r zspin/zroot
NAME                           USED  AVAIL  REFER  MOUNTPOINT
zspin/zroot                   96.6G   370G    96K  /zspin/zroot
zspin/zroot/ROOT              10.6G   370G    96K  none
zspin/zroot/ROOT/11.0-releng    48K   370G  3.22G  /zspin/zroot
zspin/zroot/ROOT/default      10.6G   370G  8.74G  /zspin/zroot
zspin/zroot/aux                200K   370G    96K  /zspin/zroot/aux
zspin/zroot/aux/aux            104K   370G    96K  /zspin/zroot/aux/aux
zspin/zroot/git                136K   370G    96K  /zspin/zroot/git
zspin/zroot/release            136K   370G    96K  /zspin/zroot/release
zspin/zroot/tmp               1004K   370G   400K  /tmp
zspin/zroot/usr               85.9G   370G    96K  /usr
zspin/zroot/usr/home          73.2G   370G  73.1G  /zspin/zroot/usr/home
zspin/zroot/usr/ports         9.98G   370G  6.73G  /usr/ports
zspin/zroot/usr/src           2.77G   370G  2.77G  /usr/src
zspin/zroot/var               2.20M   370G    96K  /var
zspin/zroot/var/audit          136K   370G    96K  /var/audit
zspin/zroot/var/crash          136K   370G    96K  /var/crash
zspin/zroot/var/log           1.00M   370G   336K  /var/log
zspin/zroot/var/mail           564K   370G   132K  /var/mail
zspin/zroot/var/tmp            248K   370G    96K  /var/tmp
```

There are some snapshots.
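As a quick cross-check on the sizing above (my arithmetic, not from the report): ada0p1's sector count times the sector size matches the mediasize that savecore reported, and exceeds real memory, so the replacement swap partition is indeed large enough for a full dump.

```shell
#!/bin/sh
# Numbers taken from the gpart and dmesg output above.
sectors=69206016        # ada0p1, freebsd-swap (33G)
sector_size=512
real_mem=34359738368    # "real memory" (32768 MB)

swap_bytes=$((sectors * sector_size))
echo "ada0p1 capacity: $swap_bytes bytes"   # 35433480192, matching savecore's mediasize

# A full kernel dump needs the dump device to be at least as large as RAM.
if [ "$swap_bytes" -ge "$real_mem" ]; then
  echo "full memory dump fits"
else
  echo "full memory dump does not fit"
fi
```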
Different filesystems have different stamps on them, but my current theory is that "aux" is the problem child:

```
# zfs list -rt all zspin/zroot | fgrep @ | sed -E 's/^[^@]+(@[^ ]+).*$/\1/' | sort | uniq
@2017-05-18-03:55:20
@20170526
@20170527
@20170528
@20170528-2
@backup
@backup2

# zfs list -rt all zspin/zroot
NAME                             USED  AVAIL  REFER  MOUNTPOINT
zspin/zroot                     96.6G   370G    96K  /zspin/zroot
zspin/zroot@backup                 8K      -    96K  -
zspin/zroot@backup2                8K      -    96K  -
zspin/zroot@20170526               8K      -    96K  -
zspin/zroot@20170527               8K      -    96K  -
zspin/zroot@20170528               8K      -    96K  -
zspin/zroot@20170528-2             8K      -    96K  -
...
zspin/zroot/aux                  200K   370G    96K  /zspin/zroot/aux
zspin/zroot/aux@20170528            0      -    96K  -
zspin/zroot/aux/aux              104K   370G    96K  /zspin/zroot/aux/aux
zspin/zroot/aux/aux@20170528       8K      -    96K  -
zspin/zroot/aux/aux@20170528-2      0      -    96K  -
...
```

This is how I can reliably panic my system:

```
zfs destroy -rv zaux/ouroboros/zroot
zfs send -RD zspin/zroot@backup | zfs receive -Fu -dv zaux/ouroboros
zfs send -RD -I zspin/zroot@backup zspin/zroot@20170527 | zfs receive -Fu -dv zaux/ouroboros
# panic during this:
zfs send -RD -I zspin/zroot@20170527 zspin/zroot@20170528 | zfs receive -Fu -dv zaux/ouroboros
```

On the zfs receive, the last status message before the panic is "receiving zspin/zroot/aux/aux@20170528" (#12 below; no summary of received bytes after it).
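As an aside, the fgrep/sed/sort/uniq pipeline shown earlier can be sanity-checked on canned input; the dataset lines below are a hypothetical excerpt, not the full listing from this system.

```shell
#!/bin/sh
# Run a few sample "zfs list -rt all" lines through the same pipeline used
# above to collapse them to unique snapshot suffixes (grep -F == fgrep).
cat <<'EOF' | grep -F @ | sed -E 's/^[^@]+(@[^ ]+).*$/\1/' | sort | uniq
zspin/zroot                     96.6G  370G  96K  /zspin/zroot
zspin/zroot@backup                 8K     -  96K  -
zspin/zroot@20170526               8K     -  96K  -
zspin/zroot/aux/aux@20170528       8K     -  96K  -
zspin/zroot/aux/aux@20170528-2      0     -  96K  -
EOF
# Output:
# @20170526
# @20170528
# @20170528-2
# @backup
```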
```
# zfs send -RD -I zspin/zroot@20170527 zspin/zroot@20170528 | zfs receive -Fu -dvn zaux/ouroboros | cat -n
     1  would receive incremental stream of zspin/zroot@20170528 into zaux/ouroboros/zroot@20170528
     2  would receive incremental stream of zspin/zroot/var@20170528 into zaux/ouroboros/zroot/var@20170528
     3  would receive incremental stream of zspin/zroot/var/audit@20170528 into zaux/ouroboros/zroot/var/audit@20170528
     4  would receive incremental stream of zspin/zroot/var/tmp@20170528 into zaux/ouroboros/zroot/var/tmp@20170528
     5  would receive incremental stream of zspin/zroot/var/crash@20170528 into zaux/ouroboros/zroot/var/crash@20170528
     6  would receive incremental stream of zspin/zroot/var/mail@20170528 into zaux/ouroboros/zroot/var/mail@20170528
     7  would receive incremental stream of zspin/zroot/var/log@20170528 into zaux/ouroboros/zroot/var/log@20170528
     8  would receive incremental stream of zspin/zroot/tmp@20170528 into zaux/ouroboros/zroot/tmp@20170528
     9  would receive incremental stream of zspin/zroot/release@20170528 into zaux/ouroboros/zroot/release@20170528
    10  would receive incremental stream of zspin/zroot/git@20170528 into zaux/ouroboros/zroot/git@20170528
    11  would receive full stream of zspin/zroot/aux@20170528 into zaux/ouroboros/zroot/aux@20170528
    12  would receive full stream of zspin/zroot/aux/aux@20170528 into zaux/ouroboros/zroot/aux@20170528
    13  would receive incremental stream of zspin/zroot/ROOT@20170528 into zaux/ouroboros/zroot/ROOT@20170528
    14  would receive incremental stream of zspin/zroot/ROOT/default@20170528 into zaux/ouroboros/zroot/ROOT/default@20170528
    15  would receive incremental stream of zspin/zroot/usr@20170528 into zaux/ouroboros/zroot/usr@20170528
    16  would receive incremental stream of zspin/zroot/usr/home@20170528 into zaux/ouroboros/zroot/usr/home@20170528
    17  would receive incremental stream of zspin/zroot/usr/ports@20170528 into zaux/ouroboros/zroot/usr/ports@20170528
    18  would receive incremental stream of zspin/zroot/usr/src@20170528 into zaux/ouroboros/zroot/usr/src@20170528
    19  would receive incremental stream of zspin/zroot/ROOT/11.0-releng@20170528 into zaux/ouroboros/zroot/ROOT/11.0-releng@20170528
```

Someone might want to close this because, 4+ years later, I couldn't possibly reproduce it and 11.1 is long out of date.