Bug 210409 - zfs: panic during boot, spa_refcount < spa_minref
Summary: zfs: panic during boot, spa_refcount < spa_minref
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: Any
OS: Any
Importance: --- Affects Only Me
Assignee: freebsd-fs mailing list
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-06-20 15:30 UTC by Kristof Provost
Modified: 2018-12-06 13:54 UTC
CC List: 4 users

See Also:


Attachments
Initialize needs_update in vdev_geom_set_physpath (521 bytes, patch)
2016-06-20 23:24 UTC, Alan Somers

Description Kristof Provost freebsd_committer 2016-06-20 15:30:57 UTC
I’m running a root-on-ZFS system and reliably see this panic during boot.
It’s a 4-disk raidz1 with no log or cache devices.

Hardware is an HP Microserver.

Likely culprit (through bisect) is r300881.
It’s now running r302028 with r300881 backed out, and booting fine.

The panic:
panic: solaris assert: refcount_count(&spa->spa_refcount) >= spa->spa_minref ||
MUTEX_HELD(&spa_namespace_lock), file:
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_misc.c, line: 863

Unfortunately I can’t get a dump, but here’s a picture of the backtrace:
https://people.freebsd.org/~kp/zfs_panic.jpg
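For reference, the failing check boils down to: the pool's reference count must never drop below spa_minref unless spa_namespace_lock is held. A hedged, simplified model of the condition (names mirror the panic message, not the real spa_misc.c code):

```c
#include <assert.h>

/* Simplified model of the assertion that panics at spa_misc.c:863.
 * Returns 1 when the solaris assert would hold, 0 when it would panic. */
static int
spa_refcount_check(int spa_refcount, int spa_minref, int namespace_lock_held)
{
	return (spa_refcount >= spa_minref || namespace_lock_held);
}
```

In other words, the panic means the refcount fell below the recorded floor while spa_namespace_lock was not held.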
Comment 1 Alan Somers freebsd_committer 2016-06-20 23:24:15 UTC
Created attachment 171628 [details]
Initialize needs_update in vdev_geom_set_physpath
Comment 2 Alan Somers freebsd_committer 2016-06-20 23:25:07 UTC
Are your disks SAS or SATA?  I can't reproduce this bug using a 4 disk RAIDZ1 SATA pool.  Also, could you please try the attached patch?
Comment 3 Kristof Provost freebsd_committer 2016-06-21 08:06:51 UTC
(In reply to Alan Somers from comment #2)
They're all SATA disks. 3 x 4TB and one 3TB disk.

The patch appears to be working for me, the box boots again.
Comment 4 Alan Somers freebsd_committer 2016-06-21 15:09:30 UTC
I am certain that the patch does not address the root cause of your panic, but I'll commit it anyway.  Thanks for testing it.
Comment 5 Kristof Provost freebsd_committer 2016-06-21 15:10:34 UTC
Let me know if there's anything else I can test, or any more information that would help.
Comment 6 commit-hook freebsd_committer 2016-06-21 15:28:07 UTC
A commit references this bug:

Author: asomers
Date: Tue Jun 21 15:27:16 UTC 2016
New revision: 302058
URL: https://svnweb.freebsd.org/changeset/base/302058

Log:
  Fix uninitialized variable from r300881

  sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
  	Initialize needs_update in vdev_geom_set_physpath

  PR:		210409
  Reported by:	kp
  Reviewed by:	kp
  Approved by:	re (hrs)
  MFC after:	4 weeks
  X-MFC-With:	300881
  Sponsored by:	Spectra Logic Corp

Changes:
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
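The commit message describes the classic uninitialized-stack-variable pattern. A hypothetical reduction of the bug class (the function name, parameters, and body below are illustrative assumptions, not the actual vdev_geom.c code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative reduction: decide whether the vdev config must be
 * dirtied because the physical path appeared or changed. */
static bool
physpath_needs_update(const char *old_physpath, const char *new_physpath)
{
	bool needs_update = false;	/* the fix: start from a defined value */

	if (old_physpath == NULL ||
	    strcmp(old_physpath, new_physpath) != 0)
		needs_update = true;	/* path appeared or changed */
	/* Before the fix, the "paths equal" branch fell through with
	 * needs_update never written, so the caller read stack garbage
	 * and behavior varied from boot to boot. */

	return (needs_update);
}
```

Reading an automatic variable before it is written is undefined behavior in C, which is consistent with a panic that reproduces on some machines and not others.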
Comment 7 Alan Somers freebsd_committer 2016-06-21 15:53:24 UTC
Fixed by 302058
Comment 8 Andriy Gapon freebsd_committer 2016-08-09 13:56:22 UTC
It doesn't seem that the problem was really caused by r300881, nor that it was really fixed by r302058.
Comment 9 Andriy Gapon freebsd_committer 2018-01-09 09:54:29 UTC
I am seeing this problem from time to time (very rarely) in my test VMs.
I suspect that under some conditions there is a race between the thread doing the pool import and the txg sync thread spawned by it. If spa_minref is recorded while the sync thread is holding a reference to the pool, then the recorded value would be higher than it should be.
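That theory can be sketched with a deterministic toy model (the struct, function names, and accounting below are illustrative assumptions, not the real spa code):

```c
#include <assert.h>

/* Toy model of the suspected race: spa_minref is snapshotted from the
 * live refcount at import time, so a transient reference held by the
 * txg sync thread at that instant inflates the floor permanently. */
typedef struct {
	int	spa_refcount;
	int	spa_minref;
} spa_model_t;

/* import-time snapshot of the current reference count */
static void
spa_model_record_minref(spa_model_t *spa)
{
	spa->spa_minref = spa->spa_refcount;
}

/* drop one reference; returns 1 if the spa_misc.c:863 check would fire */
static int
spa_model_close_panics(spa_model_t *spa)
{
	spa->spa_refcount--;
	return (spa->spa_refcount < spa->spa_minref);
}
```

With a clean snapshot the close is fine; if the snapshot races with a transient sync-thread reference, spa_minref ends up one too high and a later, perfectly legitimate close trips the assertion.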
Comment 10 Bryan Drewery freebsd_committer 2018-03-05 16:49:41 UTC
I hit this like 7 times yesterday on r330386.

# zpool status
  pool: scratch
 state: ONLINE
  scan: scrub repaired 0 in 0h20m with 0 errors on Mon Jan 29 10:08:25 2018
config:

        NAME          STATE     READ WRITE CKSUM
        scratch       ONLINE       0     0     0
          gpt/disk2   ONLINE       0     0     0
        logs
          gpt/log1    ONLINE       0     0     0
        cache
          gpt/cache1  ONLINE       0     0     0

errors: No known data errors

  pool: zroot
 state: ONLINE
  scan: scrub repaired 0 in 2h43m with 0 errors on Tue Feb 20 06:42:23 2018
config:

        NAME           STATE     READ WRITE CKSUM
        zroot          ONLINE       0     0     0
          mirror-0     ONLINE       0     0     0
            gpt/disk0  ONLINE       0     0     0
            gpt/disk1  ONLINE       0     0     0
        logs
          gpt/log0     ONLINE       0     0     0
        cache
          gpt/cache0   ONLINE       0     0     0

errors: No known data errors

# gpart show
=>        40  1953525088  ada0  GPT  (932G)
          40         256     1  freebsd-boot  (128K)
         296    16777216     2  freebsd-swap  (8.0G)
    16777512  1936747616     3  freebsd-zfs  (924G)

=>        40  1953525088  ada1  GPT  (932G)
          40         256     1  freebsd-boot  (128K)
         296    16777216     2  freebsd-swap  (8.0G)
    16777512  1936747616     3  freebsd-zfs  (924G)

=>       34  250069613  ada2  GPT  (119G)
         34       2014        - free -  (1.0M)
       2048    2097152     1  freebsd-zfs  (1.0G)
    2099200  104857600     2  freebsd-zfs  (50G)
  106956800    2097152     3  freebsd-zfs  (1.0G)
  109053952  104857600     4  freebsd-zfs  (50G)
  213911552   36158095        - free -  (17G)

=>        40  1953525088  ada3  GPT  (932G)
          40         256     1  freebsd-boot  (128K)
         296   960495616     2  freebsd-zfs  (458G)
   960495912   209715200     3  freebsd-swap  (100G)
  1170211112   783314016        - free -  (374G)
Comment 11 Andriy Gapon freebsd_committer 2018-03-07 08:24:59 UTC
(In reply to Bryan Drewery from comment #10)
This information doesn't tell us anything in particular...
I think that the problem might correlate with ZFS restarting some pool activity after a reboot (like processing async free list).
Comment 12 Bryan Drewery freebsd_committer 2018-03-09 17:32:25 UTC
Something weird: I've consistently seen that if I drop to the loader for a moment and then boot, the problem does not show up.