Bug 231251 - [ZFS assfail] 12-ALPHA5 r338529 panic
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-fs mailing list
Keywords: regression
Reported: 2018-09-09 03:40 UTC by Jeremy Faulkner
Modified: 2019-05-03 14:13 UTC
Description Jeremy Faulkner 2018-09-09 03:40:47 UTC
Panic on 12-ALPHA5 r338529 - Don't know how to reproduce at this time.

https://flic.kr/p/29Nr5Wd
Comment 1 Jeremy Faulkner 2018-09-09 04:04:12 UTC
The panic occurs when siad (the Siacoin daemon) is started in an 11.2 jail. GENERIC has COMPAT_FREEBSD11 in it.
Comment 2 Jeremy Faulkner 2018-09-10 14:11:18 UTC
More info: at the time of the panics the system was scrubbing a single-disk zpool. I waited until the scrub had finished and was then able to start the siad application in the 11.2 jail without a panic.
Comment 3 Mark Johnston freebsd_committer 2018-09-26 14:43:49 UTC
What's the panic message? "show panic" at the DDB prompt should print it.
Comment 4 Jeremy Faulkner 2018-09-26 17:52:40 UTC
I'll try to reproduce it. I did make it happen again without scrubbing but it took longer.
Comment 5 Jeremy Faulkner 2018-09-27 00:45:29 UTC
In my attempt to reproduce this panic I've produced a similar panic with a different backtrace: https://flic.kr/p/2bt8hRB

I can leave the machine at the debug console for a while.
Comment 6 Mark Johnston freebsd_committer 2018-09-27 02:45:45 UTC
(In reply to Jeremy Faulkner from comment #5)
This is still on r338529?  Are you able to get a kernel dump ("dump" at the ddb prompt)?
Comment 7 Jeremy Faulkner 2018-09-27 02:56:20 UTC
The system is currently ALPHA7, iirc. 

db> dump
Cannot dump: no dump device specified.
Comment 8 Andriy Gapon freebsd_committer 2018-09-27 06:24:47 UTC
(In reply to Jeremy Faulkner from comment #5)

Quick and dirty OCR for posterity:

KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfff1fe8183d62138
vpanic() at vpanic+0xlaB/frame 0xfffffe8183d62198
panic() at panic+0x43/frame 0xfffffe8188d621f8
assfail() at assfail+0xla/frame 0xfffff98183d62288
arc_release() at arc_release+0x93b/frame 0xfffffe8183d62298
dbuf_redirty() at dbuf_redirty+0x56/frame 0xfffffe8183d622b8
dbuf_dirty() at dbuf_dirty+0x369/frame 0xfffffe8183d62358
dmu_write_uio_dnode() at dmu_write_uio_dnode+0x118/frame 0xfffffe8183d623d8
dmu_write_uio_dbuf() at dmu_write_uio_dbuf+0x42/frame 0xfffffeBlB3d62488
zfs_freebsd_write() at zfs_freebsd_write+0x825/frame 0xfffff98183d62638
VOP_WRITE_APV() at VOP_WRITE_APV+0x11f/frame 0xfffffe8183d62748
vn_write() at vn_write+0x25b/frame 0xfffffe8183d627d8
vn_io_fault_doio() at vn_io_fault_doio+0x43/frame 0xfffffe8183d62838
vn_io_fault1() at vn_io_fault1+0x171/frame 0xfffffe8193d62978
vn_io_fault() at vn_io_fault+0x195/frame 0xfffff98183d62998
dofilewrite() at dofilewrite+0x97/frame 0xfffffe8183d62638
kern_pwritev() at kern_pwritev+0x5f/frame 0xfffffe8183d62678
sys_pwrite() at sys_pwrite+0x8d/frame 0xfffffeBlBBdGZacB
amd64_syscall() at amd64_syscall+0x28o/frame 0xfffffe8183d62bf8
fast_syscall_common() at fast_syscall_common+0x191/frame 0xfffffe8183d62bf9
--- syscall (476, FreeBSD ELF64, sys_pwrite), rip = 0x4837Sa, rsp = 0xc42917ccd8
rbp = 0xc42817cd48

KDB: enter: panic
thread pid 9863 tid 192594
Stopped at kdb_enter+0x3b: movq $0,kdb_why
db> show panic
panic: solaris assert: HDR_EMPTY(hdr), file: /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c, line: 6213
Comment 9 Mark Johnston freebsd_committer 2018-09-27 21:25:10 UTC
I'm trying to reproduce by running siad on a test system.  No crashes so far, but I guess there is some long synchronization step before it starts doing actual work.
Comment 10 Jeremy Faulkner 2018-10-04 12:48:01 UTC
A core dump will be uploaded in about an hour. This panic happened overnight while the pool was scrubbing, siad was synchronizing, and of course the daily scripts were running.

panic: solaris assert: HDR_EMPTY(hdr), file: /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c, line: 6213
cpuid = 1
time = 1538641234
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0398ce0770
vpanic() at vpanic+0x1a3/frame 0xfffffe0398ce07d0
panic() at panic+0x43/frame 0xfffffe0398ce0830
assfail() at assfail+0x1a/frame 0xfffffe0398ce0840
arc_release() at arc_release+0x940/frame 0xfffffe0398ce08d0
dbuf_redirty() at dbuf_redirty+0x56/frame 0xfffffe0398ce08f0
dmu_buf_will_dirty() at dmu_buf_will_dirty+0x120/frame 0xfffffe0398ce0920
dmu_write_uio_dnode() at dmu_write_uio_dnode+0x129/frame 0xfffffe0398ce09a0
zvol_write() at zvol_write+0x151/frame 0xfffffe0398ce0a00
ctl_be_block_dispatch_zvol() at ctl_be_block_dispatch_zvol+0x228/frame 0xfffffe0398ce0a80
ctl_be_block_worker() at ctl_be_block_worker+0x6c/frame 0xfffffe0398ce0b20
taskqueue_run_locked() at taskqueue_run_locked+0x10c/frame 0xfffffe0398ce0b80
taskqueue_thread_loop() at taskqueue_thread_loop+0x88/frame 0xfffffe0398ce0bb0
fork_exit() at fork_exit+0x84/frame 0xfffffe0398ce0bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0398ce0bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic

__curthread () at ./machine/pcpu.h:230
230             __asm("movq %%gs:%1,%0" : "=r" (td)
(kgdb)
Comment 11 Jeremy Faulkner 2018-10-04 13:37:27 UTC
tar zcvf HDR-core-dump.tar.gz /boot/kernel /boot/kernel.old /var/crash/*.2

https://drive.google.com/open?id=13JI8vxg161Iw_UD8b5Nl-PiZ970CknCT
Comment 12 Mark Linimon freebsd_committer freebsd_triage 2018-10-09 15:50:04 UTC
Is this a regression from an earlier install, or is this a new install?
Comment 13 Jeremy Faulkner 2018-10-09 20:49:30 UTC
This system was upgraded from the 11.2-RELENG branch to 12-ALPHA by cloning the boot environment and then building from source. I forget exactly which ALPHA it was, 4 or 5, but it's currently at 8.
Comment 14 Mark Johnston freebsd_committer 2019-05-01 15:15:34 UTC
Does the issue persist in 12.0-RELEASE?  Quite a few bug fixes went in shortly before the release.
Comment 15 Jeremy Faulkner 2019-05-01 16:50:41 UTC
(In reply to Mark Johnston from comment #14)

I don't know if it still occurs. In late October I discovered that some of the drives in the raidz2 were actually SMR drives; Seagate doesn't advertise them as SMR. I reorganized the data and drives to use the SMR drives for WORM data and put the datasets with more churn on the normal PMR drives.

I can try putting siad onto the SMR pool later, when I can tolerate panicking the system (if it's going to do it).
Comment 16 Jeremy Faulkner 2019-05-02 10:21:10 UTC
constans% uname -a
FreeBSD constans 12.0-STABLE FreeBSD 12.0-STABLE #11 r346745M: Fri Apr 26 11:41:18 EDT 2019     gldisater@constans:/usr/obj/usr/src/amd64.amd64/sys/GENERIC  amd64

It panicked last night. Unfortunately none of the swap partitions on that machine is large enough for a dump. I'll add an old drive as the dumpdev and try again.
Comment 17 Mark Johnston freebsd_committer 2019-05-02 12:13:01 UTC
(In reply to Jeremy Faulkner from comment #16)
Note that you can add -Z to dumpon_flags in rc.conf to enable crash-time compression of the dump.  Typically this greatly reduces the amount of disk space required.
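A minimal rc.conf sketch of that suggestion, assuming a swap partition serves as the dump device. The AUTO setting and the commented-out device path are illustrative assumptions, not taken from this system:

```shell
# /etc/rc.conf fragment (sh-style variable assignments)
dumpdev="AUTO"           # let rc pick a swap device for kernel crash dumps
# dumpdev="/dev/ada5p1"  # hypothetical alternative: a partition on a spare drive
dumpon_flags="-Z"        # passed to dumpon(8): compress the dump at crash time
```

With a dump device configured, "dump" at the ddb prompt should have somewhere to write.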
Comment 18 Jeremy Faulkner 2019-05-03 14:13:09 UTC
Panicked during the daily scripts again. The panic is different from earlier and makes more sense in the context of SMR drives.

Loaded symbols for /boot/kernel/fdescfs.ko
#0  doadump () at src/sys/amd64/include/pcpu.h:230
230             __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (OFFSETOF_CURTHREAD));
(kgdb) #0  doadump () at src/sys/amd64/include/pcpu.h:230
#1  0xffffffff80bc19e7 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:451
#2  0xffffffff80bc1e49 in vpanic (fmt=<value optimized out>,
    ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:877
#3  0xffffffff80bc1c43 in panic (fmt=<value optimized out>)
    at /usr/src/sys/kern/kern_shutdown.c:804
#4  0xffffffff827c6fe9 in vdev_deadman (vd=<value optimized out>)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c:4369
#5  0xffffffff827c6ea1 in vdev_deadman (vd=0xfffff80986498000)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c:4345
#6  0xffffffff827c6ea1 in vdev_deadman (vd=0xfffff80986393000)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c:4345
#7  0xffffffff827b7903 in spa_deadman (arg=0xfffffe00c623a000,
    pending=<value optimized out>)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_misc.c:666
#8  0xffffffff80c1ff64 in taskqueue_run_locked (queue=0xfffff80104525100)
    at /usr/src/sys/kern/subr_taskqueue.c:467
#9  0xffffffff80c21288 in taskqueue_thread_loop (arg=<value optimized out>)
    at /usr/src/sys/kern/subr_taskqueue.c:773
#10 0xffffffff80b82dc2 in fork_exit (
    callout=0xffffffff80c211f0 <taskqueue_thread_loop>,
    arg=0xffffffff8200b8b0, frame=0xfffffe0075daac00)
    at /usr/src/sys/kern/kern_fork.c:1060
#11 0xffffffff81075c1e in fork_trampoline ()
    at /usr/src/sys/amd64/amd64/exception.S:995
#12 0x0000000000000000 in ?? ()
Current language:  auto; currently minimal
(kgdb)
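
The vdev_deadman frames in this backtrace belong to the ZFS deadman, which panics when an I/O has been outstanding for too long. A small sketch for listing its tunables follows. It assumes they are exposed through sysctl with "deadman" in the name, and it greps by pattern rather than hard-coding names, since those may differ between versions. show_deadman is a made-up helper name:

```shell
# show_deadman: hypothetical helper that prints any ZFS deadman sysctls.
show_deadman() {
    if ! command -v sysctl >/dev/null 2>&1; then
        echo "sysctl not available"
        return 0
    fi
    # Match by pattern so that no specific tunable name is assumed.
    sysctl -a 2>/dev/null | grep -i 'zfs.*deadman' \
        || echo "no ZFS deadman sysctls found"
}

show_deadman
```

Comparing the reported timeouts with how long the SMR drives stall under sustained writes may help confirm whether slow I/O, rather than a hung device, is tripping the deadman.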