Bug 233997 - Reading ZFS filesystem via NFS causes panic
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 12.0-RELEASE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-fs (Nobody)
Keywords: crash
Reported: 2018-12-13 23:49 UTC by Conor
Modified: 2022-10-12 00:49 UTC

Description Conor 2018-12-13 23:49:28 UTC
Greetings

I'm observing some ZFS-related panics in 12.0-RELEASE.

HW setup:
1) NFS server: 12.0-RELEASE amd64, RAID-Z2 pool (this is the machine that panics)
2) NFS client: 12.0-RELEASE amd64 (its reads trigger the server panic)

When the client (2) reads data from the server (1), the server panics. I have seen several different kernel panics; these are the recurring ones:

Dec 12 16:29:12 nas kernel: Fatal trap 9: general protection fault while in kernel mode
Dec 12 16:29:12 nas kernel: cpuid = 3; apic id = 06
Dec 12 16:29:12 nas kernel: instruction pointer	= 0x20:0xffffffff80f7573e
Dec 12 16:29:12 nas kernel: stack pointer	        = 0x28:0xfffffe0232c44900
Dec 12 16:29:12 nas kernel: frame pointer	        = 0x28:0xfffffe0232c44900
Dec 12 16:29:12 nas kernel: code segment		= base rx0, limit 0xfffff, type 0x1b
Dec 12 16:29:12 nas kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
Dec 12 16:29:12 nas kernel: processor eflags	= interrupt enabled, resume, IOPL = 0
Dec 12 16:29:12 nas kernel: current process		= 17656 (nfsd: master)
Dec 12 16:29:12 nas kernel: trap number		= 9
Dec 12 16:29:12 nas kernel: panic: general protection fault
Dec 12 16:29:12 nas kernel: cpuid = 3
Dec 12 16:29:12 nas kernel: KDB: stack backtrace:
Dec 12 16:29:12 nas kernel: #0 0xffffffff80b3d587 at kdb_backtrace+0x67
Dec 12 16:29:12 nas kernel: #1 0xffffffff80af6b27 at vpanic+0x177
Dec 12 16:29:12 nas kernel: #2 0xffffffff80af69a3 at panic+0x43
Dec 12 16:29:12 nas kernel: #3 0xffffffff80f77fdf at trap_fatal+0x35f
Dec 12 16:29:12 nas kernel: #4 0xffffffff80f7759e at trap+0x5e
Dec 12 16:29:12 nas kernel: #5 0xffffffff80f57fbc at calltrap+0x8
Dec 12 16:29:12 nas kernel: #6 0xffffffff8228a0cd at abd_copy_to_buf_off+0x9d
Dec 12 16:29:12 nas kernel: #7 0xffffffff8228b3ab at arc_buf_fill+0xab
Dec 12 16:29:12 nas kernel: #8 0xffffffff8228ddab at arc_read+0x7ab
Dec 12 16:29:12 nas kernel: #9 0xffffffff8229848e at dbuf_read+0x72e
Dec 12 16:29:12 nas kernel: #10 0xffffffff822a1313 at dmu_buf_hold_array_by_dnode+0x1d3
Dec 12 16:29:12 nas kernel: #11 0xffffffff822a2c37 at dmu_read_uio_dnode+0x37
Dec 12 16:29:12 nas kernel: #12 0xffffffff822a2bdb at dmu_read_uio_dbuf+0x3b
Dec 12 16:29:12 nas kernel: #13 0xffffffff82357dc1 at zfs_freebsd_read+0x5d1
Dec 12 16:29:12 nas kernel: #14 0xffffffff810fad2c at VOP_READ_APV+0x7c
Dec 12 16:29:12 nas kernel: #15 0xffffffff80a32c72 at nfsvno_read+0x332
Dec 12 16:29:12 nas kernel: #16 0xffffffff80a2aa0a at nfsrvd_read+0x56a
Dec 12 16:29:12 nas kernel: #17 0xffffffff80a17ae1 at nfsrvd_dorpc+0x621
Dec 12 16:29:12 nas kernel: Uptime: 3h50m33s
Dec 12 16:29:12 nas kernel: Dumping 672 out of 8050 MB:..3%..12%..22%..31%..41%..53%..62%..72%..81%..91%

Dec 13 18:44:36 nas kernel: Fatal trap 9: general protection fault while in kernel mode
Dec 13 18:44:36 nas kernel: cpuid = 1; apic id = 02
Dec 13 18:44:36 nas kernel: instruction pointer	= 0x20:0xffffffff827252e0
Dec 13 18:44:36 nas kernel: stack pointer	        = 0x28:0xfffffe004b03b840
Dec 13 18:44:36 nas kernel: frame pointer	        = 0x28:0xfffffe004b03b840
Dec 13 18:44:36 nas kernel: code segment		= base rx0, limit 0xfffff, type 0x1b
Dec 13 18:44:36 nas kernel: 			= DPL 0, pres 1, long 1, def32 0, gran 1
Dec 13 18:44:36 nas kernel: processor eflags	= interrupt enabled, resume, IOPL = 0
Dec 13 18:44:36 nas kernel: current process		= 0 (zio_read_intr_0_8)
Dec 13 18:44:36 nas kernel: trap number		= 9
Dec 13 18:44:36 nas kernel: panic: general protection fault
Dec 13 18:44:36 nas kernel: cpuid = 2
Dec 13 18:44:36 nas kernel: time = 1544726596
Dec 13 18:44:36 nas kernel: KDB: stack backtrace:
Dec 13 18:44:36 nas kernel: #0 0xffffffff80be7977 at kdb_backtrace+0x67
Dec 13 18:44:36 nas kernel: #1 0xffffffff80b9b563 at vpanic+0x1a3
Dec 13 18:44:36 nas kernel: #2 0xffffffff80b9b3b3 at panic+0x43
Dec 13 18:44:36 nas kernel: #3 0xffffffff8107496f at trap_fatal+0x35f
Dec 13 18:44:36 nas kernel: #4 0xffffffff81073dbd at trap+0x6d
Dec 13 18:44:36 nas kernel: #5 0xffffffff8104f1d5 at calltrap+0x8
Dec 13 18:44:36 nas kernel: #6 0xffffffff82662378 at abd_iterate_func+0xa8
Dec 13 18:44:36 nas kernel: #7 0xffffffff82721d44 at zio_checksum_error_impl+0xe4
Dec 13 18:44:36 nas kernel: #8 0xffffffff82722179 at zio_checksum_error+0x89
Dec 13 18:44:36 nas kernel: #9 0xffffffff826f7116 at vdev_raidz_io_done+0x216
Dec 13 18:44:36 nas kernel: #10 0xffffffff8271f1b5 at zio_vdev_io_done+0x1d5
Dec 13 18:44:36 nas kernel: #11 0xffffffff8271b2ec at zio_execute+0xbc
Dec 13 18:44:36 nas kernel: #12 0xffffffff80bf9cb4 at taskqueue_run_locked+0x154
Dec 13 18:44:36 nas kernel: #13 0xffffffff80bfae18 at taskqueue_thread_loop+0x98
Dec 13 18:44:36 nas kernel: #14 0xffffffff80b5bf33 at fork_exit+0x83
Dec 13 18:44:36 nas kernel: #15 0xffffffff810501be at fork_trampoline+0xe
Dec 13 18:44:36 nas kernel: Uptime: 5m16s
Dec 13 18:44:36 nas kernel: Dumping 757 out of 8046 MB:..3%..11%..22%..32%..41%..51%..62%..72%..81%..91%

Dec 13 18:57:13 nas kernel: panic: solaris assert: hdr->b_type == type (0x4000001 == 0x1), file: /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c, line: 2140
Dec 13 18:57:13 nas kernel: cpuid = 0
Dec 13 18:57:13 nas kernel: time = 1544727352
Dec 13 18:57:13 nas kernel: KDB: stack backtrace:
Dec 13 18:57:13 nas kernel: #0 0xffffffff80be7977 at kdb_backtrace+0x67
Dec 13 18:57:13 nas kernel: #1 0xffffffff80b9b563 at vpanic+0x1a3
Dec 13 18:57:13 nas kernel: #2 0xffffffff80b9b3b3 at panic+0x43
Dec 13 18:57:13 nas kernel: #3 0xffffffff829e822c at assfail3+0x2c
Dec 13 18:57:13 nas kernel: #4 0xffffffff82669408 at arc_change_state+0x58
Dec 13 18:57:13 nas kernel: #5 0xffffffff826671c9 at arc_access+0x109
Dec 13 18:57:13 nas kernel: #6 0xffffffff82668ea2 at arc_read_done+0xf2
Dec 13 18:57:13 nas kernel: #7 0xffffffff82720f1e at zio_done+0x88e
Dec 13 18:57:13 nas kernel: #8 0xffffffff8271c2ec at zio_execute+0xbc
Dec 13 18:57:13 nas kernel: #9 0xffffffff80bf9cb4 at taskqueue_run_locked+0x154
Dec 13 18:57:13 nas kernel: #10 0xffffffff80bfae18 at taskqueue_thread_loop+0x98
Dec 13 18:57:13 nas kernel: #11 0xffffffff80b5bf33 at fork_exit+0x83
Dec 13 18:57:13 nas kernel: #12 0xffffffff810501be at fork_trampoline+0xe
Dec 13 18:57:13 nas kernel: Uptime: 11m37s
Dec 13 18:57:13 nas kernel: Dumping 765 out of 8046 MB:..3%..11%..21%..32%..42%..51%..61%..72%..82%..92%
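
A note on the two values printed by the assertion above (0x4000001 and 0x1): they differ in exactly one bit. The report does not establish what that means, so the sketch below only shows the arithmetic. The helper name `differing_bits` is introduced here for illustration and is not part of any FreeBSD or ZFS tooling.

```python
def differing_bits(a, b):
    """Return the positions (0 = least significant) of the bits where a and b differ."""
    diff = a ^ b
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


# Values copied from the "solaris assert" panic message.
observed = 0x4000001
expected = 0x1

print(hex(observed ^ expected))            # 0x4000000
print(differing_bits(observed, expected))  # [26]
```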

The array was built under 11.2-RELEASE on the server, and the only ZFS tuning parameter set is:

vfs.zfs.arc_max="6G"

I occasionally saw stability issues on 11.2-RELEASE as well, as reported here: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=227784#c11

To reproduce, put a heavy read load on the server via NFS, as described above; the panic typically happens in under a minute. I haven't had the cycles to do any further investigation yet.
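
For anyone trying to reproduce this, a heavy read load of the kind described above could be generated on the client with something like the following. This is only a sketch, not the exact workload used in this report: the mount point is passed on the command line, and the worker count and chunk size are arbitrary assumptions.

```python
import os
import sys
from concurrent.futures import ThreadPoolExecutor


def read_file(path, chunk_size=1 << 20):
    """Read one file to the end in fixed-size chunks; return the bytes read."""
    total = 0
    try:
        with open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                total += len(chunk)
    except OSError:
        # Unreadable files are skipped; whatever was read so far still counts.
        pass
    return total


def read_tree(root, workers=8):
    """Read every regular file under root concurrently; return total bytes read."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):
                paths.append(path)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(read_file, paths))


if __name__ == "__main__" and len(sys.argv) > 1:
    # The argument is the NFS mount point on the client (site-specific).
    print(read_tree(sys.argv[1]), "bytes read")
```

Run it on the client with the NFS mount point as the only argument. Repeated runs may be served partly from the client's cache, so a data set larger than the client's RAM gives a more realistic load on the server.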