| Summary: | 11.1-RC1 kernel panic (zfs recv) | | |
|---|---|---|---|
| Product: | Base System | Reporter: | John Kennedy <warlock> |
| Component: | kern | Assignee: | Graham Perrin <grahamperrin> |
| Status: | Closed Overcome By Events | | |
| Severity: | Affects Some People | CC: | re |
| Priority: | --- | Keywords: | crash, regression |
| Version: | 11.0-STABLE | | |
| Hardware: | amd64 | | |
| OS: | Any | | |
| See Also: | https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=161533 | | |
|
Description
John Kennedy
2017-07-04 20:21:14 UTC
So I've done a bunch of I/O trying to narrow this down. Between 20170526 and 20170528, on a system where I thought some SSD drives were failing (recoverable checksum errors), I `zfs send | zfs recv`'d the root disk (pool zroot) to a different disk, which I then moved to a different (current) box. Now I'm trying to move it again and running into problems. It looks like I probably installed from installation media (11.0-RELEASE-p1 #0 r306420) and upgraded to -p10 afterwards using beadm (possibly with some CMOS clock offset, judging by the timestamps, but that should all just be relative time). I was running FreeBSD 11.0-RELEASE-p10 #0 r317487+8c96ad701987(releng/11.0) during the time that the bad snapshot was taken. I've done similar things before (and similar things after) with no problems.

I can send|recv data from that filesystem up to a point (@20170526) without issues, but I get a kernel panic after that. Right now, it seems to be isolated to "zroot" itself, since I can receive incrementals below that (ROOT, var, usr, etc.). I'm trying to narrow it down some more.

I've been extremely frustrated trying to get a kernel dump for this panic. Currently:

```
FreeBSD jormungandr.phouka.net 11.1-RC1 FreeBSD 11.1-RC1 #79 r313908+8df37be70f94(releng/11.1): Thu Jul 6 18:20:07 PDT 2017 warlock@jormungandr.phouka.net:/usr/obj/usr/src/sys/GENERIC amd64
```

From rc.conf:

```
dumpdev="/dev/ada0p1"
dumpdir="/var/crash"
savecore_enable="YES"
```

```
# dumpon -l
ada0p1
# savecore -Cv
unable to open bounds file, using 0
checking for kernel dump on device /dev/ada0p1
mediasize = 35433480192 bytes
sectorsize = 512 bytes
magic mismatch on last dump header on /dev/ada0p1
No dump exists
```

I've got a pair of ZFS-mirrored Samsung SSD 960 EVO 250GB M.2 drives (/dev/nvd[01]) as zroot. dumpon didn't seem to like the encrypted swap partitions on them (even with the "late" option), so I threw in an extra drive and created an unencrypted swap partition on it with enough space for all of memory, just in case.
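[Editor's note: as a sanity check of the dump path itself, separate from the zfs panic, a deliberate test panic should leave a recoverable dump. A sketch, using the same dump device as above; `debug.kdb.panic` is the stock FreeBSD sysctl for forcing a panic on purpose, so this is destructive and belongs on a scratch boot only:]

```shell
# Sketch: verify the crash-dump path end-to-end, independently of the
# zfs bug. WARNING: the sysctl below panics the machine on purpose.
# ada0p1 is the unencrypted spare-disk swap partition described above.

dumpon /dev/ada0p1        # register the dump device
dumpon -l                 # should print: ada0p1
sysctl debug.kdb.panic=1  # force an immediate test panic

# After the reboot, savecore should find and save the dump:
# savecore -v /var/crash /dev/ada0p1
```

If even a forced panic leaves "No dump exists", the problem is in the dump/savecore configuration rather than in the zfs panic being too violent to write a dump.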
```
# gpart show /dev/nvd0
=>       40  488397088  nvd0  GPT  (233G)
         40       1024     1  freebsd-boot  (512K)
       1064        984        - free -  (492K)
       2048    4194304     2  freebsd-swap  (2.0G)
    4196352  484200448     3  freebsd-zfs  (231G)
  488396800        328        - free -  (164K)

# gpart show /dev/ada0
=>       40  937703008  ada0  GPT  (447G)
         40       4056        - free -  (2.0M)
       4096   69206016     1  freebsd-swap  (33G)
   69210112  868492936        - free -  (414G)

# grep ' memory ' /var/log/dmesg.today
real memory  = 34359738368 (32768 MB)
avail memory = 33191440384 (31653 MB)
```

Nothing is saved to /var/crash. The directory exists, but only contains "minfree" (with the contents "2048"). Booting to single user after the panic and running savecore by hand doesn't seem to find anything, either.

The basic layout of the problem pool is a stock ZFS layout with some additions and a boot environment:
```
# zfs list -r zspin/zroot
NAME                           USED  AVAIL  REFER  MOUNTPOINT
zspin/zroot                   96.6G   370G    96K  /zspin/zroot
zspin/zroot/ROOT              10.6G   370G    96K  none
zspin/zroot/ROOT/11.0-releng    48K   370G  3.22G  /zspin/zroot
zspin/zroot/ROOT/default      10.6G   370G  8.74G  /zspin/zroot
zspin/zroot/aux                200K   370G    96K  /zspin/zroot/aux
zspin/zroot/aux/aux            104K   370G    96K  /zspin/zroot/aux/aux
zspin/zroot/git                136K   370G    96K  /zspin/zroot/git
zspin/zroot/release            136K   370G    96K  /zspin/zroot/release
zspin/zroot/tmp               1004K   370G   400K  /tmp
zspin/zroot/usr               85.9G   370G    96K  /usr
zspin/zroot/usr/home          73.2G   370G  73.1G  /zspin/zroot/usr/home
zspin/zroot/usr/ports         9.98G   370G  6.73G  /usr/ports
zspin/zroot/usr/src           2.77G   370G  2.77G  /usr/src
zspin/zroot/var               2.20M   370G    96K  /var
zspin/zroot/var/audit          136K   370G    96K  /var/audit
zspin/zroot/var/crash          136K   370G    96K  /var/crash
zspin/zroot/var/log           1.00M   370G   336K  /var/log
zspin/zroot/var/mail           564K   370G   132K  /var/mail
zspin/zroot/var/tmp            248K   370G    96K  /var/tmp
```
There are some snapshots. Different filesystems have different stamps on them, but my current theory is that "aux" is the problem child:
```
# zfs list -rtall zspin/zroot | fgrep @ | sed -E 's/^[^@]+(@[^ ]+).*$/\1/' | sort | uniq
@2017-05-18-03:55:20
@20170526
@20170527
@20170528
@20170528-2
@backup
@backup2
```
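[Editor's note: the sed/sort/uniq idiom above can be sanity-checked without a pool by feeding it canned `zfs list`-style lines; the sample lines here are made up for illustration:]

```shell
# Strip everything but the @snapshot suffix from zfs-list-style lines,
# then deduplicate -- same pipeline as used on the real pool above.
printf '%s\n' \
    'zspin/zroot@20170526 8K - 96K -' \
    'zspin/zroot/aux@20170528 0 - 96K -' \
    'zspin/zroot/aux/aux@20170528 8K - 96K -' \
    | sed -E 's/^[^@]+(@[^ ]+).*$/\1/' | sort | uniq
# prints:
# @20170526
# @20170528
```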
```
# zfs list -rt all zspin/zroot
NAME                             USED  AVAIL  REFER  MOUNTPOINT
zspin/zroot                     96.6G   370G    96K  /zspin/zroot
zspin/zroot@backup                 8K      -    96K  -
zspin/zroot@backup2                8K      -    96K  -
zspin/zroot@20170526               8K      -    96K  -
zspin/zroot@20170527               8K      -    96K  -
zspin/zroot@20170528               8K      -    96K  -
zspin/zroot@20170528-2             8K      -    96K  -
...
zspin/zroot/aux                  200K   370G    96K  /zspin/zroot/aux
zspin/zroot/aux@20170528            0      -    96K  -
zspin/zroot/aux/aux              104K   370G    96K  /zspin/zroot/aux/aux
zspin/zroot/aux/aux@20170528       8K      -    96K  -
zspin/zroot/aux/aux@20170528-2      0      -    96K  -
...
```
This is how I can reliably panic my system:
```
zfs destroy -rv zaux/ouroboros/zroot
zfs send -RD zspin/zroot@backup | zfs receive -Fu -dv zaux/ouroboros
zfs send -RD -I zspin/zroot@backup zspin/zroot@20170527 | zfs receive -Fu -dv zaux/ouroboros
# panic during this:
zfs send -RD -I zspin/zroot@20170527 zspin/zroot@20170528 | zfs receive -Fu -dv zaux/ouroboros
```
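[Editor's note: the reproduction above replays the incremental chain in two jumps. A small helper like the following (hypothetical; the real send/receive pipeline is stubbed with `echo` here so the sketch runs anywhere) would walk one snapshot at a time, so the panic pinpoints exactly which increment triggers it:]

```shell
# Walk the snapshot chain one increment at a time; on the real machine
# the box panics at the culprit step, identifying the bad increment.
send_recv() {
    # Real version (same pool names as above; may panic the machine):
    # zfs send -RD -I "zspin/zroot$1" "zspin/zroot$2" \
    #     | zfs receive -Fu -dv zaux/ouroboros
    echo "send $1 -> $2"   # stub so this sketch is runnable anywhere
}

prev=@backup
for snap in @20170526 @20170527 @20170528 @20170528-2; do
    send_recv "$prev" "$snap"
    prev=$snap
done
```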
On the zfs receive, the last status message before the panic is "receiving zspin/zroot/aux/aux@20170528" (#12 below; no summary of received bytes is printed afterwards).
```
# zfs send -RD -I zspin/zroot@20170527 zspin/zroot@20170528 | zfs receive -Fu -dvn zaux/ouroboros | cat -n
     1  would receive incremental stream of zspin/zroot@20170528 into zaux/ouroboros/zroot@20170528
     2  would receive incremental stream of zspin/zroot/var@20170528 into zaux/ouroboros/zroot/var@20170528
     3  would receive incremental stream of zspin/zroot/var/audit@20170528 into zaux/ouroboros/zroot/var/audit@20170528
     4  would receive incremental stream of zspin/zroot/var/tmp@20170528 into zaux/ouroboros/zroot/var/tmp@20170528
     5  would receive incremental stream of zspin/zroot/var/crash@20170528 into zaux/ouroboros/zroot/var/crash@20170528
     6  would receive incremental stream of zspin/zroot/var/mail@20170528 into zaux/ouroboros/zroot/var/mail@20170528
     7  would receive incremental stream of zspin/zroot/var/log@20170528 into zaux/ouroboros/zroot/var/log@20170528
     8  would receive incremental stream of zspin/zroot/tmp@20170528 into zaux/ouroboros/zroot/tmp@20170528
     9  would receive incremental stream of zspin/zroot/release@20170528 into zaux/ouroboros/zroot/release@20170528
    10  would receive incremental stream of zspin/zroot/git@20170528 into zaux/ouroboros/zroot/git@20170528
    11  would receive full stream of zspin/zroot/aux@20170528 into zaux/ouroboros/zroot/aux@20170528
    12  would receive full stream of zspin/zroot/aux/aux@20170528 into zaux/ouroboros/zroot/aux@20170528
    13  would receive incremental stream of zspin/zroot/ROOT@20170528 into zaux/ouroboros/zroot/ROOT@20170528
    14  would receive incremental stream of zspin/zroot/ROOT/default@20170528 into zaux/ouroboros/zroot/ROOT/default@20170528
    15  would receive incremental stream of zspin/zroot/usr@20170528 into zaux/ouroboros/zroot/usr@20170528
    16  would receive incremental stream of zspin/zroot/usr/home@20170528 into zaux/ouroboros/zroot/usr/home@20170528
    17  would receive incremental stream of zspin/zroot/usr/ports@20170528 into zaux/ouroboros/zroot/usr/ports@20170528
    18  would receive incremental stream of zspin/zroot/usr/src@20170528 into zaux/ouroboros/zroot/usr/src@20170528
    19  would receive incremental stream of zspin/zroot/ROOT/11.0-releng@20170528 into zaux/ouroboros/zroot/ROOT/11.0-releng@20170528
```
Someone might want to close this: 4+ years later I couldn't possibly reproduce it, and 11.1 is long out of date.