Bug 267009 - OpenZFS: panic: VERIFY3(0 == nvlist_lookup_uint64(nvl, name, &rv)) failed (0 == 22)
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 13.1-STABLE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: freebsd-fs (Nobody)
URL:
Keywords: crash, needs-qa
Depends on:
Blocks:
 
Reported: 2022-10-13 10:17 UTC by Trond Endrestøl
Modified: 2022-12-02 18:59 UTC
CC List: 3 users

See Also:


Description Trond Endrestøl 2022-10-13 10:17:11 UTC
My zpools looked very much like this to begin with:

NAME                            SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
builder01_zroot                 164G   117G  47.2G        -         -    14%    71%  1.00x    ONLINE  -
  raidz1-0                      164G   117G  47.2G        -         -    14%  71.2%      -    ONLINE
    gpt/builder01_zroot0           -      -      -        -         -      -      -      -    ONLINE
    gpt/builder01_zroot1           -      -      -        -         -      -      -      -    ONLINE
    gpt/builder01_zroot2           -      -      -        -         -      -      -      -    ONLINE
logs                               -      -      -        -         -      -      -      -  -
  mirror-1                     16.5G   204K  16.5G        -         -     0%  0.00%      -    ONLINE
    gpt/builder01_zroot_zlog0      -      -      -        -         -      -      -      -    ONLINE
    gpt/builder01_zroot_zlog1      -      -      -        -         -      -      -      -    ONLINE
builder01_zwork                 374G   237G   137G        -         -    38%    63%  1.00x    ONLINE  -
  raidz1-0                      374G   237G   137G        -         -    38%  63.3%      -    ONLINE
    gpt/builder01_zwork0           -      -      -        -         -      -      -      -    ONLINE
    gpt/builder01_zwork1           -      -      -        -         -      -      -      -    ONLINE
    gpt/builder01_zwork2           -      -      -        -         -      -      -      -    ONLINE
logs                               -      -      -        -         -      -      -      -  -
  mirror-1                     16.5G     0K  16.5G        -         -     0%  0.00%      -    ONLINE
    gpt/builder01_zwork_zlog0      -      -      -        -         -      -      -      -    ONLINE
    gpt/builder01_zwork_zlog1      -      -      -        -         -      -      -      -    ONLINE

I wanted to remove the mirrored slogs, resize them, and re-add them.
Due to an oversight they were originally almost 17 GiB; I wanted them to be 16.0 GiB.
I don't know whether I should have run zpool labelclear on the slog partitions before resizing; I didn't (a sketch of what that would have looked like follows the commands below).

zpool remove builder01_zroot mirror-1
zpool remove builder01_zwork mirror-1

gpart resize -i 1 -s 16G xbd4
gpart resize -i 1 -s 16G xbd5
gpart resize -i 1 -s 16G xbd9
gpart resize -i 1 -s 16G xbd10

zpool add builder01_zroot log mirror gpt/builder01_zroot_zlog0 gpt/builder01_zroot_zlog1
zpool add builder01_zwork log mirror gpt/builder01_zwork_zlog0 gpt/builder01_zwork_zlog1
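
As mentioned above, I did not run labelclear; had I done so, it would have gone between the remove and the resize steps, roughly as sketched below (untested; zpool labelclear operates on the device nodes, hence the /dev/gpt paths):

# Untested sketch: wipe the stale ZFS labels on the freed slog partitions
# before resizing them; -f overrides the "device looks in use" safety check.
zpool labelclear -f /dev/gpt/builder01_zroot_zlog0
zpool labelclear -f /dev/gpt/builder01_zroot_zlog1
zpool labelclear -f /dev/gpt/builder01_zwork_zlog0
zpool labelclear -f /dev/gpt/builder01_zwork_zlog1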

The listing looked very much like this:

NAME                            SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
builder01_zroot                 164G   117G  47.2G        -         -    14%    71%  1.00x    ONLINE  -
  raidz1-0                      164G   117G  47.2G        -         -    14%  71.2%      -    ONLINE
    gpt/builder01_zroot0           -      -      -        -         -      -      -      -    ONLINE
    gpt/builder01_zroot1           -      -      -        -         -      -      -      -    ONLINE
    gpt/builder01_zroot2           -      -      -        -         -      -      -      -    ONLINE
logs                               -      -      -        -         -      -      -      -  -
  mirror-2                     15.5G    68K  15.5G        -         -     0%  0.00%      -    ONLINE
    gpt/builder01_zroot_zlog0      -      -      -        -         -      -      -      -    ONLINE
    gpt/builder01_zroot_zlog1      -      -      -        -         -      -      -      -    ONLINE
builder01_zwork                 374G   237G   137G        -         -    38%    63%  1.00x    ONLINE  -
  raidz1-0                      374G   237G   137G        -         -    38%  63.3%      -    ONLINE
    gpt/builder01_zwork0           -      -      -        -         -      -      -      -    ONLINE
    gpt/builder01_zwork1           -      -      -        -         -      -      -      -    ONLINE
    gpt/builder01_zwork2           -      -      -        -         -      -      -      -    ONLINE
logs                               -      -      -        -         -      -      -      -  -
  mirror-2                     15.5G     0K  15.5G        -         -     0%  0.00%      -    ONLINE
    gpt/builder01_zwork_zlog0      -      -      -        -         -      -      -      -    ONLINE
    gpt/builder01_zwork_zlog1      -      -      -        -         -      -      -      -    ONLINE

I noticed that mirror-1 became mirror-2 for both pools; I had expected the pairs to be named mirror-1 again.

Upon reboot, I got the panic below.

I booted from a DVD, removed the mirrored slogs from both pools, and the system could again boot from the root pool.
I re-added the mirrored slogs to the work pool while the system was running.
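
In terms of commands, the rescue steps from the DVD environment amounted to roughly the following (an approximate reconstruction; the mirror name is whatever zpool status reports for the slog vdev):

# Import without mounting datasets (-N), force past the foreign hostid (-f),
# drop the slog mirror, then export again before rebooting.
zpool import -f -N builder01_zroot
zpool remove builder01_zroot mirror-2
zpool export builder01_zroot

zpool import -f -N builder01_zwork
zpool remove builder01_zwork mirror-2
zpool export builder01_zwork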

The listing now looks like this:

NAME                            SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
builder01_zroot                 164G   117G  47.2G        -         -    14%    71%  1.00x    ONLINE  -
  raidz1-0                      164G   117G  47.2G        -         -    14%  71.2%      -    ONLINE
    gpt/builder01_zroot0           -      -      -        -         -      -      -      -    ONLINE
    gpt/builder01_zroot1           -      -      -        -         -      -      -      -    ONLINE
    gpt/builder01_zroot2           -      -      -        -         -      -      -      -    ONLINE
builder01_zwork                 374G   237G   137G        -         -    38%    63%  1.00x    ONLINE  -
  raidz1-0                      374G   237G   137G        -         -    38%  63.3%      -    ONLINE
    gpt/builder01_zwork0           -      -      -        -         -      -      -      -    ONLINE
    gpt/builder01_zwork1           -      -      -        -         -      -      -      -    ONLINE
    gpt/builder01_zwork2           -      -      -        -         -      -      -      -    ONLINE
logs                               -      -      -        -         -      -      -      -  -
  mirror-3                     15.5G     0K  15.5G        -         -     0%  0.00%      -    ONLINE
    gpt/builder01_zwork_zlog0      -      -      -        -         -      -      -      -    ONLINE
    gpt/builder01_zwork_zlog1      -      -      -        -         -      -      -      -    ONLINE

Maybe the recent OpenZFS commits fix this issue. If not, maybe the test suite should be extended to cover the kernel's ability to mount a root pool whose vdevs are numbered non-contiguously, if that is what triggers the panic.
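
A rough sketch of what such a test could exercise, using throwaway file-backed vdevs; it reproduces the non-contiguous numbering seen above, although not the root-mount path itself, and the pool and file names are made up:

# Untested sketch: the removed log slot is not reused, so the re-added
# mirror should come back with a higher vdev id (mirror-2, not mirror-1).
truncate -s 1G /tmp/d0 /tmp/d1 /tmp/d2 /tmp/log0 /tmp/log1
zpool create testpool raidz1 /tmp/d0 /tmp/d1 /tmp/d2
zpool add testpool log mirror /tmp/log0 /tmp/log1
zpool remove testpool mirror-1
zpool add -f testpool log mirror /tmp/log0 /tmp/log1
zpool list -v testpool
zpool destroy testpool
rm /tmp/d0 /tmp/d1 /tmp/d2 /tmp/log0 /tmp/log1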

Note: local branch commit 7806d3b0243f..., as indicated in the BE's name, corresponds to src/stable/13 commit 3ea8c7ad90f75129c52a2b64213c5578af23dc8d, dated Tue Aug 9 15:47:40 2022 -0400.

Here's the panic message, screenshotted, OCR-ed, and edited by hand:

Trying to mount root from zfs:builder01_zroot/ROOT/20220810-190437-stable-13-local-n252030-7806d3b0243f [] ...
cd0 at ata1 bus 0 scbus1 target 1 lun 0
cd0: <QEMU QEMU DVD-ROM 2.5+> Removable CD-ROM SCSI device
cd0: Serial Number QM00004
cd0: 16.700MB/s transfers (WDMA2, ATAPI 12bytes, PIO 65534bytes)
cd0: Attempt to query device size failed: NOT READY, Medium not present
panic: VERIFY3(0 == nvlist_lookup_uint64(nvl, name, &rv)) failed (0 == 22)

cpuid = 0
time = 1
KDB: stack backtrace:
db_trace_self_wrapper() at 0xffffffff805b804b = db_trace_self_wrapper+0x2b/frame 0xfffffe009cdec580
vpanic() at 0xffffffff80806fb1 = vpanic+0x151/frame 0xfffffe009cdec5d8
spl_panic() at 0xffffffff8036391a = spl_panic+0x3a/frame 0xfffffe009cdec630
fnvlist_lookup_uint64() at 0xffffffff80385ef3 = fnvlist_lookup_uint64+0x43/frame 0xfffffe009cdec650
spa_import_rootpool() at 0xffffffff8038d10e = spa_import_rootpool+0x5e/frame 0xfffffe009cdec6c0
zfs_mount() at 0xffffffff8039aaaf = zfs_mount+0x41f/frame 0xfffffe009cdec850
vfs_domount_first() at 0xffffffff808e1f03 = vfs_domount_first+0x213/frame 0xfffffe009cdec980
vfs_domount() at 0xffffffff808de855 = vfs_domount+0x2b5/frame 0xfffffe009cdecab0
vfs_donmount() at 0xffffffff808ddd85 = vfs_donmount+0x8d5/frame 0xfffffe009cdecb50
kernel_mount() at 0xffffffff808e100d = kernel_mount+0x3d/frame 0xfffffe009cdecba0
parse_mount() at 0xffffffff808e5364 = parse_mount+0x4d4/frame 0xfffffe009cdecce0
vfs_mountroot() at 0xffffffff808e37b3 = vfs_mountroot+0x763/frame 0xfffffe009cdece50
start_init() at 0xffffffff807932c3 = start_init+0x23/frame 0xfffffe009cdecef0
fork_exit() at 0xffffffff807c2a9e = fork_exit+0x7e/frame 0xfffffe009cdecf30
fork_trampoline() at 0xffffffff80baf89e = fork_trampoline+0xe/frame 0xfffffe009cdecf30
--- trap 0x9ce4aa98, rip= 0xffffffff8079288f, rsp = 0, rbp = 0x20014 ---
mi_startup() at 0xffffffff8079200f = mi_startup+0xdf/frame 0x20014
Uptime: 1s
Automatic reboot in 15 seconds - press a key on the console to abort
--> Press a key on the console to reboot,
--> or switch off the system now.
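
For what it is worth, the failed return value 22 is EINVAL, and the backtrace points at spa_import_rootpool() tripping over the config it builds from the on-disk vdev labels. Those labels can be inspected from a live/rescue environment with zdb; a hedged example using the same GPT labels as above:

# Dump the ZFS labels of a data vdev and a slog vdev of the root pool;
# the label nvlist contains the txg, pool GUID and vdev tree that the
# kernel reads when assembling the root pool config.
zdb -l /dev/gpt/builder01_zroot0
zdb -l /dev/gpt/builder01_zroot_zlog0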
Comment 1 tlb 2022-10-16 09:41:10 UTC
This seems to be connected to/similar to bug #256368:  https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=256368

And I have the same problem here; the above bug happened to me a few months back, causing me to remove the ZIL from my bootable zpool. A few months later, having completely forgotten about this, I re-added the ZIL and then rebooted.

Good times ensued.

Booting from a memstick and removing the ZIL devices again allowed the system to boot. The system has been updated to 13.1-p2.
Comment 2 Trond Endrestøl 2022-10-22 16:03:08 UTC
I just upgraded to stable/13 commit f187bd281e52a71e0fe0a6bf963d93ff950fcf7a dated Fri Oct 21 02:13:25 2022 -0400. There's no improvement so far.

A workaround is to replicate the root pool's contents somewhere, recreate the root pool with slogs and everything, transfer the contents to the new pool, and set the bootfs property.
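
A rough sketch of that workaround, with made-up names for the new pool, its GPT labels, and the snapshot (boot blocks and loader configuration are left out):

# Untested sketch; newroot, gpt/new*, and @migrate are placeholders.
zfs snapshot -r builder01_zroot@migrate
zpool create -R /mnt newroot \
    raidz1 gpt/new0 gpt/new1 gpt/new2 \
    log mirror gpt/newlog0 gpt/newlog1
zfs send -R builder01_zroot@migrate | zfs receive -duF newroot
zpool set bootfs=newroot/ROOT/<boot-environment> newroot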
Comment 3 Graham Perrin freebsd_committer freebsd_triage 2022-10-22 16:55:05 UTC
(In reply to tlb from comment #1)

> Good times ensued.

For clarity, please: did that imply a _bad_ time?
Comment 4 titus m 2022-12-02 18:59:33 UTC
I posted a patch/explanation here: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=256368
I assume it is the same problem.