I'm running a root-on-ZFS system and reliably see this panic during boot. It's a four-disk RAIDZ1 pool with no log or cache devices, on an HP MicroServer. Bisection points to r300881 as the likely culprit. The machine is now running r302028 with r300881 backed out and boots fine. The panic:

panic: solaris assert: refcount_count(&spa->spa_refcount) >= spa->spa_minref || MUTEX_HELD(&spa_namespace_lock), file: /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_misc.c, line: 863

Unfortunately I can't get a dump, but here's a picture of the backtrace: https://people.freebsd.org/~kp/zfs_panic.jpg
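For context, the failing assertion is the reference-count sanity check in spa_open_ref(), paraphrased here from the spa_misc.c of that era (the exact line number varies by revision): a reference may only be added when the pool already holds at least its recorded minimum reference count, or when the caller holds spa_namespace_lock.

void
spa_open_ref(spa_t *spa, void *tag)
{
	ASSERT(refcount_count(&spa->spa_refcount) >= spa->spa_minref ||
	    MUTEX_HELD(&spa_namespace_lock));
	(void) refcount_add(&spa->spa_refcount, tag);
}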
Created attachment 171628: Initialize needs_update in vdev_geom_set_physpath
Are your disks SAS or SATA? I can't reproduce this bug with a 4-disk RAIDZ1 SATA pool. Also, could you please try the attached patch?
(In reply to Alan Somers from comment #2) They're all SATA disks: three 4 TB and one 3 TB. The patch appears to work for me; the box boots again.
I'm certain that patch doesn't address the root cause of your panic, but I'll commit it anyway. Thanks for testing it.
Let me know if there's anything else I can test, or any more information that would help.
A commit references this bug:

Author: asomers
Date: Tue Jun 21 15:27:16 UTC 2016
New revision: 302058
URL: https://svnweb.freebsd.org/changeset/base/302058

Log:
  Fix uninitialized variable from r300881

  sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
  	Initialize needs_update in vdev_geom_set_physpath

  PR:		210409
  Reported by:	kp
  Reviewed by:	kp
  Approved by:	re (hrs)
  MFC after:	4 weeks
  X-MFC-With:	300881
  Sponsored by:	Spectra Logic Corp

Changes:
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
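Going by the log message, the change is presumably a one-line initialization in vdev_geom_set_physpath(), along these lines (a sketch inferred from the commit log, not the verbatim diff; the declared type boolean_t and the B_FALSE default are my guesses, the actual change is at the changeset URL above):

-	boolean_t needs_update;
+	boolean_t needs_update = B_FALSE;

With the variable left uninitialized, whether the physpath update logic ran would depend on stack garbage, which fits the erratic behavior seen after r300881.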
Fixed by r302058.
It seems that the problem was neither really caused by r300881 nor really fixed by r302058.
I see this problem from time to time (very rarely) in my test VMs. I suspect that under some conditions there is a race between the thread importing the pool and the txg sync thread it spawns: if spa_minref is recorded while the sync thread is holding a reference to the pool, the recorded value ends up higher than it should be, and the spa_open_ref() assertion can then fire once that extra reference is dropped.
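To make the suspected interleaving concrete, here is a minimal userland sketch of the race using plain pthreads and C11 atomics; this is not the actual ZFS code, and all names are merely illustrative. The "importer" snapshots its minimum refcount while a simulated sync thread transiently holds a reference, so the recorded minimum is one too high and the later spa_open_ref()-style check fails:

#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

static atomic_int spa_refcount;	/* stand-in for spa->spa_refcount */
static atomic_int spa_minref;	/* stand-in for spa->spa_minref */

/* Simulated txg sync thread: briefly takes and drops a pool reference. */
static void *
sync_thread(void *arg)
{
	(void)arg;
	atomic_fetch_add(&spa_refcount, 1);	/* like spa_open_ref() */
	usleep(2000);				/* pretend to sync a txg */
	atomic_fetch_sub(&spa_refcount, 1);	/* like spa_close() */
	return (NULL);
}

int
main(void)
{
	pthread_t tid;

	atomic_fetch_add(&spa_refcount, 1);	/* importer's own reference */
	pthread_create(&tid, NULL, sync_thread, NULL);
	usleep(1000);	/* window: the sync thread now holds its reference */

	/*
	 * The race: the minimum is snapshotted while the sync thread
	 * still holds a reference, so it is recorded as 2 instead of 1.
	 */
	atomic_store(&spa_minref, atomic_load(&spa_refcount));

	pthread_join(tid, NULL);	/* sync thread drops its reference */

	printf("refcount=%d minref=%d\n",
	    atomic_load(&spa_refcount), atomic_load(&spa_minref));
	/* Analog of the spa_open_ref() assertion that panics the box. */
	assert(atomic_load(&spa_refcount) >= atomic_load(&spa_minref));
	return (0);
}

Run as written, the assert fires with refcount=1 and minref=2, which mirrors the failed "refcount_count(&spa->spa_refcount) >= spa->spa_minref" check in the panic.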
I hit this like 7 times yesterday on r330386.

# zpool status
  pool: scratch
 state: ONLINE
  scan: scrub repaired 0 in 0h20m with 0 errors on Mon Jan 29 10:08:25 2018
config:

        NAME          STATE     READ WRITE CKSUM
        scratch       ONLINE       0     0     0
          gpt/disk2   ONLINE       0     0     0
        logs
          gpt/log1    ONLINE       0     0     0
        cache
          gpt/cache1  ONLINE       0     0     0

errors: No known data errors

  pool: zroot
 state: ONLINE
  scan: scrub repaired 0 in 2h43m with 0 errors on Tue Feb 20 06:42:23 2018
config:

        NAME           STATE     READ WRITE CKSUM
        zroot          ONLINE       0     0     0
          mirror-0     ONLINE       0     0     0
            gpt/disk0  ONLINE       0     0     0
            gpt/disk1  ONLINE       0     0     0
        logs
          gpt/log0     ONLINE       0     0     0
        cache
          gpt/cache0   ONLINE       0     0     0

errors: No known data errors

# gpart show
=>        40  1953525088  ada0  GPT  (932G)
          40         256     1  freebsd-boot  (128K)
         296    16777216     2  freebsd-swap  (8.0G)
    16777512  1936747616     3  freebsd-zfs  (924G)

=>        40  1953525088  ada1  GPT  (932G)
          40         256     1  freebsd-boot  (128K)
         296    16777216     2  freebsd-swap  (8.0G)
    16777512  1936747616     3  freebsd-zfs  (924G)

=>        34   250069613  ada2  GPT  (119G)
          34        2014        - free -  (1.0M)
        2048     2097152     1  freebsd-zfs  (1.0G)
     2099200   104857600     2  freebsd-zfs  (50G)
   106956800     2097152     3  freebsd-zfs  (1.0G)
   109053952   104857600     4  freebsd-zfs  (50G)
   213911552    36158095        - free -  (17G)

=>        40  1953525088  ada3  GPT  (932G)
          40         256     1  freebsd-boot  (128K)
         296   960495616     2  freebsd-zfs  (458G)
   960495912   209715200     3  freebsd-swap  (100G)
  1170211112   783314016        - free -  (374G)
(In reply to Bryan Drewery from comment #10) This information doesn't tell us anything in particular... I think the problem might correlate with ZFS restarting some pool activity after a reboot (such as processing the async free list).
Something weird: I've consistently seen that if I drop to the loader prompt for a moment and then boot, the problem does not show up.
*** Bug 223612 has been marked as a duplicate of this bug. ***
Triage: I'm sorry that this PR did not get addressed in a timely fashion. By now, the version it was created against is long out of support, and many newer versions of ZFS have been imported since. Please re-open if this is still a problem on a supported version.