| Summary: | [ZFS assfail] 12-ALPHA5 r338529 panic | | |
|---|---|---|---|
| Product: | Base System | Reporter: | Jeremy Faulkner <gldisater> |
| Component: | kern | Assignee: | freebsd-fs (Nobody) <fs> |
| Status: | Closed DUPLICATE | | |
| Severity: | Affects Only Me | CC: | delphij, linimon, markj |
| Priority: | --- | Keywords: | crash, regression |
| Version: | CURRENT | | |
| Hardware: | amd64 | | |
| OS: | Any | | |
**Description** (Jeremy Faulkner, 2018-09-09 03:40:47 UTC)
The panic occurs when siad (the Sia coin daemon) is started in an 11.2 jail; GENERIC has COMPAT_FREEBSD11 in it. More info: at the time of the panics the system was scrubbing a single-disk zpool. I waited until the scrub was finished and was then able to start the siad application in the 11.2 jail without a panic.

What's the panic message? "show panic" at the DDB prompt should print it.

I'll try to reproduce it. I did make it happen again without scrubbing, but it took longer. In my attempt to reproduce this panic I've produced a similar but different backtrace. https://flic.kr/p/2bt8hRB I can leave the machine at the debug console for a while.

(In reply to Jeremy Faulkner from comment #5) This is still on r338529? Are you able to get a kernel dump ("dump" at the ddb prompt)?

The system is currently ALPHA7, iirc.

db> dump
Cannot dump: no dump device specified.

(In reply to Jeremy Faulkner from comment #5) Quick and dirty OCR for posterity:

KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe8183d62138
vpanic() at vpanic+0x1a3/frame 0xfffffe8183d62198
panic() at panic+0x43/frame 0xfffffe8183d621f8
assfail() at assfail+0x1a/frame 0xfffffe8183d62288
arc_release() at arc_release+0x93b/frame 0xfffffe8183d62298
dbuf_redirty() at dbuf_redirty+0x56/frame 0xfffffe8183d622b8
dbuf_dirty() at dbuf_dirty+0x369/frame 0xfffffe8183d62358
dmu_write_uio_dnode() at dmu_write_uio_dnode+0x118/frame 0xfffffe8183d623d8
dmu_write_uio_dbuf() at dmu_write_uio_dbuf+0x42/frame 0xfffffe8183d62488
zfs_freebsd_write() at zfs_freebsd_write+0x825/frame 0xfffffe8183d62638
VOP_WRITE_APV() at VOP_WRITE_APV+0x11f/frame 0xfffffe8183d62748
vn_write() at vn_write+0x25b/frame 0xfffffe8183d627d8
vn_io_fault_doio() at vn_io_fault_doio+0x43/frame 0xfffffe8183d62838
vn_io_fault1() at vn_io_fault1+0x171/frame 0xfffffe8183d62978
vn_io_fault() at vn_io_fault+0x195/frame 0xfffffe8183d62998
dofilewrite() at dofilewrite+0x97/frame 0xfffffe8183d62a38
kern_pwritev() at kern_pwritev+0x5f/frame 0xfffffe8183d62a78
sys_pwrite() at sys_pwrite+0x8d/frame 0xfffffe8183d62ac8
amd64_syscall() at amd64_syscall+0x280/frame 0xfffffe8183d62bf8
fast_syscall_common() at fast_syscall_common+0x191/frame 0xfffffe8183d62bf8
--- syscall (476, FreeBSD ELF64, sys_pwrite), rip = 0x48375a, rsp = 0xc42917ccd8, rbp = 0xc42817cd48 ---
KDB: enter: panic
[ thread pid 9863 tid 192594 ]
Stopped at      kdb_enter+0x3b: movq    $0,kdb_why
db> show panic
panic: solaris assert: HDR_EMPTY(hdr), file: /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c, line: 6213

I'm trying to reproduce by running siad on a test system. No crashes so far, but I guess there is some long synchronization step before it starts doing actual work.

Core dump will be uploaded in about an hour.

This panic happened overnight while scrubbing and synchronizing siad, and of course with the daily scripts running.

panic: solaris assert: HDR_EMPTY(hdr), file: /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c, line: 6213
cpuid = 1
time = 1538641234
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0398ce0770
vpanic() at vpanic+0x1a3/frame 0xfffffe0398ce07d0
panic() at panic+0x43/frame 0xfffffe0398ce0830
assfail() at assfail+0x1a/frame 0xfffffe0398ce0840
arc_release() at arc_release+0x940/frame 0xfffffe0398ce08d0
dbuf_redirty() at dbuf_redirty+0x56/frame 0xfffffe0398ce08f0
dmu_buf_will_dirty() at dmu_buf_will_dirty+0x120/frame 0xfffffe0398ce0920
dmu_write_uio_dnode() at dmu_write_uio_dnode+0x129/frame 0xfffffe0398ce09a0
zvol_write() at zvol_write+0x151/frame 0xfffffe0398ce0a00
ctl_be_block_dispatch_zvol() at ctl_be_block_dispatch_zvol+0x228/frame 0xfffffe0398ce0a80
ctl_be_block_worker() at ctl_be_block_worker+0x6c/frame 0xfffffe0398ce0b20
taskqueue_run_locked() at taskqueue_run_locked+0x10c/frame 0xfffffe0398ce0b80
taskqueue_thread_loop() at taskqueue_thread_loop+0x88/frame 0xfffffe0398ce0bb0
fork_exit() at fork_exit+0x84/frame 0xfffffe0398ce0bf0
fork_trampoline()
at fork_trampoline+0xe/frame 0xfffffe0398ce0bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic

__curthread () at ./machine/pcpu.h:230
230             __asm("movq %%gs:%1,%0" : "=r" (td)
(kgdb)

tar zcvf HDR-core-dump.tar.gz /boot/kernel /boot/kernel.old /var/crash/*.2
https://drive.google.com/open?id=13JI8vxg161Iw_UD8b5Nl-PiZ970CknCT

Is this a regression from an earlier install, or is this a new install?

This system was upgraded from the 11.2-RELENG branch to 12-ALPHA by cloning the boot environment and then building from source. I forget exactly which ALPHA, 4 or 5, but it's currently at 8.

Does the issue persist in 12.0-RELEASE? Quite a few bug fixes went in shortly before the release.

(In reply to Mark Johnston from comment #14) I don't know if it still occurs. In late October I discovered that some of the drives in the raidz2 were actually SMR drives; Seagate doesn't advertise them as SMR. I reorganized data and drives to use the SMR drives for WORM data and put the datasets with more churn on the normal PMR drives. I can try putting siad onto the SMR pool later, when I can tolerate panicking the system (if it's going to do it).

constans% uname -a
FreeBSD constans 12.0-STABLE FreeBSD 12.0-STABLE #11 r346745M: Fri Apr 26 11:41:18 EDT 2019     gldisater@constans:/usr/obj/usr/src/amd64.amd64/sys/GENERIC  amd64

Panic'd last night. Unfortunately, none of the swap partitions on that machine are large enough for a dump; I'll add an old drive for the dumpdev and try again.

(In reply to Jeremy Faulkner from comment #16) Note that you can add -Z to dumpon_flags in rc.conf to enable crash-time compression of the dump. Typically this greatly reduces the amount of disk space required.

Panic'd during the daily scripts again; the panic is different from the earlier ones and makes more sense in the context of SMR drives.
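A minimal sketch of the dump-device setup suggested above, assuming FreeBSD 12's rc.conf/dumpon mechanism; the partition name `/dev/ada3p2` is a placeholder, not a device from this report:

```shell
# /etc/rc.conf: choose a dump device and enable crash-time compression
dumpdev="/dev/ada3p2"   # placeholder; point at a real swap or dedicated dump partition
dumpon_flags="-Z"       # -Z compresses the dump as it is written, saving disk space

# Apply without a reboot, then verify which device is armed:
# service dumpon restart
# sysctl kern.shutdown.dumpdevname
```

With a dump device armed, the `db> dump` command that earlier failed with "Cannot dump: no dump device specified" should write a compressed core to be extracted by savecore(8) on the next boot.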
Loaded symbols for /boot/kernel/fdescfs.ko
#0  doadump () at src/sys/amd64/include/pcpu.h:230
230             __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (OFFSETOF_CURTHREAD));
(kgdb) #0  doadump () at src/sys/amd64/include/pcpu.h:230
#1  0xffffffff80bc19e7 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:451
#2  0xffffffff80bc1e49 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:877
#3  0xffffffff80bc1c43 in panic (fmt=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:804
#4  0xffffffff827c6fe9 in vdev_deadman (vd=<value optimized out>) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c:4369
#5  0xffffffff827c6ea1 in vdev_deadman (vd=0xfffff80986498000) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c:4345
#6  0xffffffff827c6ea1 in vdev_deadman (vd=0xfffff80986393000) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c:4345
#7  0xffffffff827b7903 in spa_deadman (arg=0xfffffe00c623a000, pending=<value optimized out>) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_misc.c:666
#8  0xffffffff80c1ff64 in taskqueue_run_locked (queue=0xfffff80104525100) at /usr/src/sys/kern/subr_taskqueue.c:467
#9  0xffffffff80c21288 in taskqueue_thread_loop (arg=<value optimized out>) at /usr/src/sys/kern/subr_taskqueue.c:773
#10 0xffffffff80b82dc2 in fork_exit (callout=0xffffffff80c211f0 <taskqueue_thread_loop>, arg=0xffffffff8200b8b0, frame=0xfffffe0075daac00) at /usr/src/sys/kern/kern_fork.c:1060
#11 0xffffffff81075c1e in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:995
#12 0x0000000000000000 in ?? ()
Current language:  auto; currently minimal
(kgdb)

*** This bug has been marked as a duplicate of bug 245683 ***
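The final backtrace (`spa_deadman` → `vdev_deadman` → `panic`) is the ZFS deadman watchdog firing because an I/O to one of the vdevs stalled past its deadline, which fits the slow-write behavior of SMR drives. As a hedged illustration only (sysctl names as shipped with FreeBSD 12's legacy ZFS; the value shown is an example, not a recommendation), the watchdog can be inspected or relaxed at runtime:

```shell
# Inspect the ZFS deadman watchdog (FreeBSD 12, legacy ZFS)
sysctl vfs.zfs.deadman_enabled       # whether hung-I/O detection is active
sysctl vfs.zfs.deadman_synctime_ms   # milliseconds before an I/O is declared hung

# Example only: relax the deadline for pools built on slow SMR disks
# sysctl vfs.zfs.deadman_synctime_ms=3600000
```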