265569 – [panic] Fatal trap 9: general protection fault while in kernel mode arc_reap

Summary: [panic] Fatal trap 9: general protection fault while in kernel mode arc_reap

Status:	New

Alias:	None

Product:	Base System
Classification:	Unclassified
Component:	kern
Version:	13.1-RELEASE
Hardware:	amd64 Any

Importance:	--- Affects Only Me
Assignee:	freebsd-bugs (Nobody)

URL:
Keywords:	crash

Depends on:
Blocks:

Reported:	2022-08-02 06:11 UTC by Charlie Stanley
Modified:	2022-10-17 12:17 UTC
CC List:	1 user

See Also:

Description Charlie Stanley 2022-08-02 06:11:10 UTC

I can consistently trigger this panic on my system by running a userspace backup program (duplicity). The program seems to get to roughly the same place every time, and then the system panics. I am going to guess from the stack trace that perhaps my system has L2ARC corruption? I will disable the L2ARC and see if the problem goes away, but first I wanted to give an opportunity for further investigation.

Other possibly interesting information about my system:

The L2ARC is a pair of NVMe drives that are completely full:
                            capacity     operations     bandwidth
pool                      alloc   free   read  write   read  write
------------------------  -----  -----  -----  -----  -----  -----
spinning                  13.0T  30.5T      2      0  26.9K  4.11K
  raidz1-0                13.0T  30.5T      2      0  26.7K  3.71K
    diskid/DISK-ZJV3CN53      -      -      0      0  10.2K    982
    diskid/DISK-ZJV3CTT4      -      -      0      0  3.20K    945
    diskid/DISK-ZJV3BQLS      -      -      0      0  9.35K    959
    diskid/DISK-ZJV3CM07      -      -      0      0  3.99K    911
logs                          -      -      -      -      -      -
  mirror-1                   1M  15.5G      0      0    250    413
    nvd0p3                    -      -      0      0    125    206
    nvd1p3                    -      -      0      0    125    206
cache                         -      -      -      -      -      -
  nvd0p5                   209G  91.8M      0      0  10.1K  12.0K
  nvd1p5                   209G  35.7M      0      0  10.3K  10.6K
------------------------  -----  -----  -----  -----  -----  -----
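To put a number on "completely full", the alloc and free columns of the two cache rows can be turned into a fill percentage. This is only a rough sketch: the helper names are made up for illustration, and the inputs are the rounded values printed in the table, so the result is approximate.

```python
# Unit multipliers for the binary-prefixed sizes printed in the table above.
UNITS = {"K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40}

def to_bytes(text):
    """Convert a size such as '209G' or '91.8M' to a number of bytes."""
    if text[-1] in UNITS:
        return float(text[:-1]) * UNITS[text[-1]]
    return float(text)

def fill_percent(alloc, free):
    """Percentage of a device that is allocated, given alloc and free sizes."""
    a, f = to_bytes(alloc), to_bytes(free)
    return 100.0 * a / (a + f)

# alloc/free columns copied from the two cache rows of the table
cache_devices = {"nvd0p5": ("209G", "91.8M"), "nvd1p5": ("209G", "35.7M")}
for name, (alloc, free) in cache_devices.items():
    print(f"{name}: {fill_percent(alloc, free):.2f}% allocated")
```

Both cache partitions come out above 99.9% allocated, while the main pool (13.0T of 43.5T) is under a third full.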


Unread portion of the kernel message buffer:


Fatal trap 9: general protection fault while in kernel mode
cpuid = 1; apic id = 02
instruction pointer     = 0x20:0xffffffff80f46649
stack pointer           = 0x28:0xfffffe015668ad70
frame pointer           = 0x28:0xfffffe015668adb0
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 5 (arc_reap)
trap number             = 9
panic: general protection fault
cpuid = 1
time = 1659411380
KDB: stack backtrace:
#0 0xffffffff80c69465 at kdb_backtrace+0x65
#1 0xffffffff80c1bb1f at vpanic+0x17f
#2 0xffffffff80c1b993 at panic+0x43
#3 0xffffffff810afdf5 at trap_fatal+0x385
#4 0xffffffff81087528 at calltrap+0x8
#5 0xffffffff80f4b70a at bucket_drain+0xda
#6 0xffffffff80f4ba7a at bucket_cache_reclaim_domain+0x2da
#7 0xffffffff80f49412 at zone_reclaim+0x192
#8 0xffffffff821a2e59 at arc_reap_cb+0x9
#9 0xffffffff8231121a at zthr_procedure+0xba
#10 0xffffffff80bd8a5e at fork_exit+0x7e
#11 0xffffffff8108859e at fork_trampoline+0xe
Uptime: 1h10m8s
Dumping 5378 out of 65381 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
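For lining the panic up with other logs, the `time =` value in the message buffer can be read as a Unix timestamp and combined with the `Uptime:` line. A minimal sketch, assuming the uptime is counted from boot until the panic, so the derived boot time is approximate:

```python
from datetime import datetime, timedelta, timezone

PANIC_EPOCH = 1659411380  # "time = 1659411380" from the panic message
UPTIME = timedelta(hours=1, minutes=10, seconds=8)  # "Uptime: 1h10m8s"

# Interpret the timestamp in UTC so it can be compared with the report dates.
panic_time = datetime.fromtimestamp(PANIC_EPOCH, tz=timezone.utc)
boot_time = panic_time - UPTIME

print("panic:", panic_time.isoformat())
print("boot: ", boot_time.isoformat())
```

The panic time lands on 2022-08-02 at 03:36:20 UTC, a few hours before this report was filed.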

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
55              __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) bt
#0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:399
#2  0xffffffff80c1b71c in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:487
#3  0xffffffff80c1bb8e in vpanic (fmt=0xffffffff811b4fb9 "%s", ap=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:920
#4  0xffffffff80c1b993 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:844
#5  0xffffffff810afdf5 in trap_fatal (frame=0xfffffe015668acb0, eva=0) at /usr/src/sys/amd64/amd64/trap.c:944
#6  <signal handler called>
#7  0xffffffff80f46649 in slab_free_item (zone=0xfffffe015a7c8000, slab=0xfffff80e00000158, item=<optimized out>) at /usr/src/sys/vm/uma_core.c:4691
#8  zone_release (arg=0xfffffe015a7c8000, bucket=0xfffff80f4784e410, cnt=<optimized out>) at /usr/src/sys/vm/uma_core.c:4730
#9  0xffffffff80f4b70a in bucket_drain (zone=zone@entry=0xfffffe015a7c8000, bucket=bucket@entry=0xfffff80f4784e400) at /usr/src/sys/vm/uma_core.c:1312
#10 0xffffffff80f4ba7a in bucket_free (zone=0xfffffe015a7c8000, bucket=0xfffff80f4784e400, udata=0x0) at /usr/src/sys/vm/uma_core.c:520
#11 bucket_cache_reclaim_domain (zone=zone@entry=0xfffffe015a7c8000, drain=<optimized out>, trim=<optimized out>, domain=<optimized out>, domain@entry=0) at /usr/src/sys/vm/uma_core.c:1509
#12 0xffffffff80f49412 in bucket_cache_reclaim (zone=0xfffffe015a7c8000, drain=<optimized out>, domain=-1) at /usr/src/sys/vm/uma_core.c:1534
#13 zone_reclaim (zone=0xfffffe015a7c8000, zone@entry=0xfffff80003fc5800, domain=<optimized out>, domain@entry=-1, waitok=waitok@entry=1, drain=<optimized out>) at /usr/src/sys/vm/uma_core.c:1674
#14 0xffffffff80f4927c in uma_zone_reclaim_domain (zone=<optimized out>, req=<optimized out>, req@entry=1, domain=-1) at /usr/src/sys/vm/uma_core.c:5232
#15 0xffffffff82158f03 in kmem_cache_reap_soon (cache=<optimized out>) at /usr/src/sys/contrib/openzfs/module/os/freebsd/spl/spl_kmem.c:247
#16 0xffffffff82160451 in abd_cache_reap_now () at /usr/src/sys/contrib/openzfs/module/os/freebsd/zfs/abd_os.c:508
#17 0xffffffff8219a69a in arc_kmem_reap_soon () at /usr/src/sys/contrib/openzfs/module/zfs/arc.c:4874
#18 0xffffffff821a2e59 in arc_reap_cb (arg=0xdffffb000, zthr=0xfffff80005f13a80) at /usr/src/sys/contrib/openzfs/module/zfs/arc.c:5021
#19 0xffffffff8231121a in zthr_procedure (arg=arg@entry=0xfffff80003fc5800) at /usr/src/sys/contrib/openzfs/module/zfs/zthr.c:245
#20 0xffffffff80bd8a5e in fork_exit (callout=0xffffffff82311160 <zthr_procedure>, arg=0xfffff80003fc5800, frame=0xfffffe015668af40) at /usr/src/sys/kern/kern_fork.c:1093
#21 <signal handler called>
#22 mi_startup () at /usr/src/sys/kern/init_main.c:322
#23 0xffffffff80f79189 in swapper () at /usr/src/sys/vm/vm_swapout.c:755
#24 0xffffffff80385022 in btext () at /usr/src/sys/amd64/amd64/locore.S:80
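The KDB trace and the kgdb trace describe the same stack, but KDB prints `symbol+offset` while kgdb also lists inlined frames. One way to match them up is by return address: subtracting the offset from each KDB address gives the start of the symbol. The sketch below does that for a few frames copied from the KDB trace; the function name and regular expression are illustrative only.

```python
import re

# Frames copied from the "KDB: stack backtrace" section of this report.
KDB_TRACE = """\
#5 0xffffffff80f4b70a at bucket_drain+0xda
#6 0xffffffff80f4ba7a at bucket_cache_reclaim_domain+0x2da
#7 0xffffffff80f49412 at zone_reclaim+0x192
#8 0xffffffff821a2e59 at arc_reap_cb+0x9
"""

FRAME_RE = re.compile(r"#(\d+) (0x[0-9a-f]+) at (\w+)\+(0x[0-9a-f]+)")

def parse_frames(text):
    """Return (index, return address, symbol, offset, symbol base) per frame."""
    frames = []
    for m in FRAME_RE.finditer(text):
        addr, off = int(m.group(2), 16), int(m.group(4), 16)
        frames.append((int(m.group(1)), addr, m.group(3), off, addr - off))
    return frames

for idx, addr, sym, off, base in parse_frames(KDB_TRACE):
    print(f"#{idx} {sym} starts at {base:#x}; return address is {off:#x} bytes in")
```

For example, KDB frame #5 (`0xffffffff80f4b70a`) carries the same address as kgdb frame #9, both in `bucket_drain`.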

Comment 1 Charlie Stanley 2022-08-02 06:30:09 UTC

I was doing some more debugging and the call chain looks sane until the slab address is computed. The slab pointer seems to be at an odd offset, and when I dereference it, the data looks like garbage to my limited understanding.

(kgdb) frame 8
#8  zone_release (arg=0xfffffe015a7c8000, bucket=0xfffff80f4784e410, cnt=<optimized out>) at /usr/src/sys/vm/uma_core.c:4730
4730                    slab_free_item(zone, slab, item);
(kgdb) print zone
$19 = (uma_zone_t) 0xfffffe015a7c8000
(kgdb) print slab
$20 = (uma_slab_t) 0xfffff80e00000158
(kgdb) print *zone
$21 = {uz_flags = 10551296, uz_size = 4096, uz_ctor = 0x0, uz_dtor = 0x0, uz_smr = 0x0, uz_max_items = 0, uz_bucket_max = 18446744073709551615, uz_bucket_size = 80, uz_bucket_size_max = 254, uz_sleepers = 0, uz_xdomain = 0xfffffe01d3cb4590, uz_keg = 0xfffff80005f13a80, 
  uz_import = 0xffffffff80f4a370 <zone_import>, uz_release = 0xffffffff80f465e0 <zone_release>, uz_arg = 0xfffffe015a7c8000, uz_init = 0x0, uz_fini = 0x0, uz_items = 0, uz_sleeps = 0, uz_link = {le_next = 0x0, le_prev = 0xfffff80005f13a90}, uz_allocs = 0xfffffe01d3cb45a8, 
  uz_frees = 0xfffffe01d3cb45a0, uz_fails = 0xfffffe01d3cb4598, uz_name = 0xfffff80005f0a180 "abd_chunk", uz_ctlname = 0xfffff80005f18de0 "abd_chunk", uz_namecnt = 0, uz_bucket_size_min = 2, uz_reclaimers = 1, uz_oid = 0xfffff80005f07b80, uz_warning = 0x0, uz_ratecheck = {
    tv_sec = 0, tv_usec = 0}, uz_maxaction = {ta_link = {stqe_next = 0x0}, ta_pending = 0, ta_priority = 0 '\000', ta_flags = 0 '\000', ta_func = 0x0, ta_context = 0x0}, uz_cross_lock = {lock_object = {lo_name = 0xffffffff812a0598 "UMA Cross", lo_flags = 16973824, 
      lo_data = 0, lo_witness = 0x0}, mtx_lock = 0}, uz_cpu = 0xfffffe015a7c8180}
(kgdb) print *slab
$22 = {us_link = {le_next = 0xda39576285989539, le_prev = 0x732ca6f15f8ff3dc}, us_freecount = 4147, us_flags = 27 '\033', us_domain = 120 'x', us_free = {__bits = 0xfffff80e00000170}}
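A quick way to back up the "looks like garbage" impression is to check whether the printed pointers are canonical amd64 addresses. With 48-bit virtual addresses (an assumption here; it is the usual 4-level paging setup), bits 63 down to 47 must all be equal, and an access through a non-canonical address raises a general protection fault rather than a page fault. The sketch below applies that test to the values from the kgdb session and also computes where the faulting instruction pointer sits relative to the `uz_release` address.

```python
def is_canonical(addr, va_bits=48):
    """True if a 64-bit address is canonical for `va_bits` of virtual address.

    Bits 63 down to va_bits-1 must be all zeros or all ones.
    """
    upper = addr >> (va_bits - 1)
    return upper == 0 or upper == (1 << (64 - va_bits + 1)) - 1

# Values copied from the kgdb session above.
pointers = {
    "zone": 0xfffffe015a7c8000,
    "slab": 0xfffff80e00000158,
    "slab->us_link.le_next": 0xda39576285989539,
    "slab->us_link.le_prev": 0x732ca6f15f8ff3dc,
}
for name, value in pointers.items():
    print(f"{name:24} {value:#018x} canonical={is_canonical(value)}")

# Faulting instruction pointer relative to the start of zone_release,
# taken from "uz_release = 0xffffffff80f465e0 <zone_release>".
FAULT_IP = 0xffffffff80f46649
ZONE_RELEASE = 0xffffffff80f465e0
print(f"fault at zone_release+{FAULT_IP - ZONE_RELEASE:#x}")
```

The zone and slab pointers pass the check, but both `us_link` pointers fail it. One plausible reading is that the fault happened when the free path followed one of those list pointers, which would fit trap 9 being reported instead of a page fault. That does not say where the bad slab contents came from.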

Comment 2 Graham Perrin freebsd_committer 2022-10-17 12:17:14 UTC

Keyword: 

    crash

– in lieu of summary line prefix: 

    [panic]

* bulk change for the keyword
* summary lines may be edited manually (not in bulk). 

Keyword descriptions and search interface: 

    <https://bugs.freebsd.org/bugzilla/describekeywords.cgi>