Bug 275063 - kernel using 100% CPU in arc_prune
Summary: kernel using 100% CPU in arc_prune
Status: Closed DUPLICATE of bug 274698
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: Unspecified
Hardware: Any
OS: Any
Importance: --- Affects Only Me
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks: 14.0-erratas
Reported: 2023-11-13 21:10 UTC by Martin Birgmeier
Modified: 2024-04-27 14:33 UTC
CC List: 9 users

See Also:
Flags: martin.dieringer: mfc-stable13?


Description Martin Birgmeier 2023-11-13 21:10:33 UTC
Scenario:
- FreeBSD stable/14 at e4fb49e867ae70412205d1aade0b76fae158b551
- Doing an rsync from UFS to ZFS, where both are on the same physical disks (an example invocation is sketched after this list)
- The scenario always worked flawlessly on releng/12.4
- 4-core CPU
- Running top with H and S keys pressed (show threads, show system processes)
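
An example invocation matching this scenario (the rsync paths are assumptions for illustration, not taken from the report); on FreeBSD, "top -HS" starts top with the same two toggles already enabled:

# rsync -a /ufs/data/ /tank/data/
# top -HS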

Result:
- 200% CPU time is used: 100% by rsync, another 100% by something labeled "arc_prune" in top
- top snapshot:

last pid:  4132;  load averages:  2.04,  1.92,  1.69                                            up 0+13:28:31  22:08:35
369 threads:   7 running, 342 sleeping, 20 waiting
CPU:  0.0% user,  0.0% nice, 49.9% system,  0.0% interrupt, 50.1% idle
Mem: 64M Active, 142M Inact, 222M Laundry, 2632M Wired, 276M Buf, 12G Free
ARC: 661M Total, 153M MFU, 394M MRU, 16M Anon, 14M Header, 68M Other
     380M Compressed, 941M Uncompressed, 2.48:1 Ratio
Swap: 20G Total, 20G Free

  PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
    0 root         -8    -     0B  1376K CPU1     1  28:14 100.00% kernel{arc_prune}
 1491 root        132    0    26M    11M CPU2     2  21:04  96.99% rsync
   10 root        187 ki31     0B    64K CPU0     0 765:33  83.53% idle{idle: cpu0}
   10 root        187 ki31     0B    64K RUN      2 763:29  68.92% idle{idle: cpu2}
   10 root        187 ki31     0B    64K RUN      3 762:42  47.56% idle{idle: cpu3}
   20 root        -16    -     0B    16K vlruwt   3   0:43   3.07% vnlru
    2 root        -60    -     0B    64K WAIT     0   0:54   0.08% clock{clock (0)}
 4132 root         20    0    14M  4188K CPU3     3   0:00   0.06% top
   11 root        -64    -     0B   256K WAIT     0   0:44   0.01% intr{irq29: ahci1}
    4 root        -16    -     0B    48K -        3   0:40   0.01% cam{doneq0}
 2824 root         20    0    25M    13M select   0   0:00   0.01% xterm
   17 root        -16    -     0B    48K psleep   0   0:15   0.01% pagedaemon{dom0}
  779 root         20    0    33M  6976K select   3   0:05   0.01% mountd
  698 root         20    0    13M  1840K select   0   0:04   0.01% powerd
   16 root        -16    -     0B    16K -        0   0:04   0.01% rand_harvestq
 2876 root         20    0    16M  4696K select   0   0:01   0.01% xload
    0 root        -64    -     0B  1376K -        3   0:03   0.00% kernel{fw0_taskq}
   11 root        -64    -     0B   256K WAIT     2   0:53   0.00% intr{irq30: re0}
  695 root         20    0    19M  6028K select   0   0:03   0.00% ntpd
  861 root         20    0    63M    24M select   3   0:01   0.00% httpd
   21 root         16    -     0B    16K syncer   0   0:03   0.00% syncer
   19 root         20    -     0B   144K sdflus   3   0:01   0.00% bufdaemon{/ worker}
   15 root         -8    -     0B  1104K mmp->m   2   0:01   0.00% zfskern{mmp_thread_enter}
   15 root         -8    -     0B  1104K t->zth   0   0:07   0.00% zfskern{arc_reap}
...

Expected result:
- The system should not be slowed considerably by spending a full core on "arc_prune"

-- Martin
Comment 1 Mark Johnston freebsd_committer freebsd_triage 2023-11-13 21:13:03 UTC
This is probably a duplicate of PR 274698, though the fix (commit e4fb49e867ae70412205d1aade0b76fae158b551) has not landed yet in stable/14.
Comment 2 Martin Birgmeier 2023-11-13 21:18:26 UTC
Thanks for the first answer.

After a reboot, the situation is as follows:

last pid:   977;  load averages:  0.25,  0.12,  0.04                            up 0+00:03:11  22:17:10
368 threads:   6 running, 342 sleeping, 20 waiting
CPU:  0.9% user,  0.0% nice,  3.6% system,  0.5% interrupt, 95.0% idle
Mem: 241M Active, 13M Inact, 962M Wired, 95M Buf, 14G Free
ARC: 404M Total, 63M MFU, 224M MRU, 16K Anon, 5968K Header, 104M Other
     51M Compressed, 291M Uncompressed, 5.70:1 Ratio
Swap: 20G Total, 20G Free

  PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
   10 root        187 ki31     0B    64K RUN      2   3:06  95.47% idle{idle: cpu2}
   10 root        187 ki31     0B    64K CPU1     1   3:04  95.24% idle{idle: cpu1}
   10 root        187 ki31     0B    64K CPU3     3   3:07  94.91% idle{idle: cpu3}
   10 root        187 ki31     0B    64K RUN      0   3:06  92.78% idle{idle: cpu0}
  956 root         24    0    51M    11M zio->i   2   0:01   7.21% rsync
  957 root         21    0    23M  8544K select   1   0:01   2.76% rsync
    0 root        -12    -     0B  1376K -        0   0:00   1.25% kernel{z_wr_iss_0}
    4 root        -16    -     0B    48K -        3   0:01   1.23% cam{doneq0}
    0 root        -12    -     0B  1376K -        2   0:00   1.18% kernel{z_wr_iss_1}
    0 root        -12    -     0B  1376K -        3   0:01   1.14% kernel{z_wr_iss_2}
   11 root        -64    -     0B   256K WAIT     1   0:01   1.10% intr{irq29: ahci1}
    0 root        -16    -     0B  1376K -        2   0:00   1.04% kernel{z_wr_int_0}
    0 root        -16    -     0B  1376K -        3   0:00   0.87% kernel{z_wr_int_2}
    0 root        -16    -     0B  1376K -        3   0:00   0.86% kernel{z_wr_int_1}
    0 root        -16    -     0B  1376K RUN      2   0:01   0.79% kernel{z_rd_int_0}
    0 root        -16    -     0B  1376K -        3   0:01   0.63% kernel{z_rd_int_2}
  958 root         20    0    59M    10M select   1   0:00   0.41% rsync
Comment 3 Martin Birgmeier 2023-11-15 18:00:23 UTC
Just for information: This condition is triggered reliably as follows:
- Create a sparse/compressed zvol: zfs create -o compress=on -s -V 20g hal.1/vdisks/903
- In a VM, slice & partition it in the old-fashioned way for carrying UFS filesystems: ada0s4a, ada0s4d (for / and /usr)
- With the VM not running and from the host side, mount one of these filesystems
- Run dd if=/dev/zero of=<mountpoint>/<tmpfile> until the filesystem is full; this is done to reclaim blocks that are free at the UFS level but still allocated in the zvol, because the sparse/compressed zvol discards blocks of zeros
- The dd operation reliably triggers the arc_prune issue (a condensed sketch of these steps follows below)
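
A condensed host-side sketch of these steps (the zvol name matches the report; the slice device node and mount point are assumptions):

# zfs create -o compress=on -s -V 20g hal.1/vdisks/903
# ... boot the VM, slice/partition the zvol, newfs the UFS filesystems, shut the VM down ...
# mount /dev/zvol/hal.1/vdisks/903s4a /mnt
# dd if=/dev/zero of=/mnt/tmpfile bs=1m   # runs until "No space left on device"
# rm /mnt/tmpfile
# umount /mnt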

The draining of taskq_zone (see PR 274698) is so slow that it would take many hours (on an otherwise completely idle system):

[0]# vmstat -z | egrep '^ITEM|taskq' ; sleep 60 ; vmstat -z | egrep '^ITEM|taskq'
ITEM                   SIZE  LIMIT     USED     FREE      REQ  FAIL SLEEP XDOMAIN
taskq_zone:             192,      0, 2221547,   28456, 2980856,   0,   0,   0
ITEM                   SIZE  LIMIT     USED     FREE      REQ  FAIL SLEEP XDOMAIN
taskq_zone:             192,      0, 2218760,   31243, 2980868,   0,   0,   0
[0]# 

A linear extrapolation results in about 13.3 hours.
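
For reference, the arithmetic behind that estimate, using the USED column of the two samples above:

  2221547 - 2218760 = 2787 entries drained in 60 seconds
  2218760 / 2787 ≈ 796 minutes ≈ 13.3 hours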

-- Martin
Comment 4 Mark Johnston freebsd_committer freebsd_triage 2023-11-20 14:42:51 UTC
It's too late to get this fixed in 14.0, but we'll ship an EN for this.
Comment 5 Martin Birgmeier 2023-11-20 18:41:48 UTC
I believe you are taking this bug too lightly. With this bug present, 14.0 may be usable for private home use, but not in any serious setting.

There is no harm in postponing a release if there are good reasons for it.

-- Martin
Comment 6 iron.udjin 2023-12-04 01:09:34 UTC
I hit the same issue on 14.0-RELEASE. When I tried to restart the server, I got a kernel panic in the arc_prune process:

Fatal trap 12: page fault while in kernel mode 
cpuid = 7; apic id = 07 
fault virtual address   = 0x458
fault code      = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80b4e386
stack pointer           = 0x28:0xfffffe0206679d60
frame pointer           = 0x28:0xfffffe0206679e00
code segment        = base 0x0, limit 0xfffff, type 0x1b
            = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags    = interrupt enabled, resume, IOPL = 0
current process     = 0 (arc_prune)
rdi: ffffffff8226f568 rsi: 0000000000000000 rdx: 0000000000000000
rcx: 0000000000000000  r8: 000000000000007e  r9: fffffe02062fd518
rax: 0000000000000000 rbx: ffffffff8226f568 rbp: fffffe0206679e00
r10: fffff8162b935800 r11: fffff8162b935c00 r12: fffffe0206380560
r13: 0000000000000000 r14: 0000000000000000 r15: 0000000000000000
trap number     = 12 
panic: page fault
cpuid = 7
time = 1701648338
KDB: stack backtrace:
#0 0xffffffff80b9002d at kdb_backtrace+0x5d
#1 0xffffffff80b43132 at vpanic+0x132
#2 0xffffffff80b42ff3 at panic+0x43
#3 0xffffffff8100c85c at trap_fatal+0x40c
#4 0xffffffff8100c8af at trap_pfault+0x4f
#5 0xffffffff80fe3828 at calltrap+0x8
#6 0xffffffff81f78fa7 at arc_prune_task+0x87
#7 0xffffffff81f76ddf at taskq_run+0x1f
#8 0xffffffff80ba5922 at taskqueue_run_locked+0x182
#9 0xffffffff80ba6bb2 at taskqueue_thread_loop+0xc2
#10 0xffffffff80afdb0f at fork_exit+0x7f
#11 0xffffffff80fe488e at fork_trampoline+0xe

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
warning: Source file is more recent than executable.
57      __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) #0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
#1  doadump (textdump=<optimized out>)
    at /usr/src/sys/kern/kern_shutdown.c:405
#2  0xffffffff80b42cc7 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:526
#3  0xffffffff80b4319f in vpanic (fmt=0xffffffff81136b3b "%s",
    ap=ap@entry=0xfffffe0206679bb0) at /usr/src/sys/kern/kern_shutdown.c:970
#4  0xffffffff80b42ff3 in panic (fmt=<unavailable>)
    at /usr/src/sys/kern/kern_shutdown.c:894
#5  0xffffffff8100c85c in trap_fatal (frame=0xfffffe0206679ca0, eva=1112)
    at /usr/src/sys/amd64/amd64/trap.c:952
#6  0xffffffff8100c8af in trap_pfault (frame=0xfffffe0206679ca0,
    usermode=false, signo=<optimized out>, ucode=<optimized out>)
    at /usr/src/sys/amd64/amd64/trap.c:760
#7  <signal handler called>
#8  _sx_xlock_hard (sx=0xffffffff8226f568 <callb_table+152>,
    x=<optimized out>, opts=opts@entry=0) at /usr/src/sys/kern/kern_sx.c:684
#9  0xffffffff81f78fa7 in __sx_xlock (td=0xfffffe0206380560, opts=0,
    file=0x0, line=0, sx=<optimized out>) at /usr/src/sys/sys/sx.h:166
#10 arc_prune_task (arg=<optimized out>)
    at /usr/src/sys/contrib/openzfs/module/os/freebsd/zfs/arc_os.c:149
#11 0xffffffff81f76ddf in taskq_run (arg=0xfffff8026e484300,
    pending=<optimized out>) 
    at /usr/src/sys/contrib/openzfs/module/os/freebsd/spl/spl_taskq.c:320
#12 0xffffffff80ba5922 in taskqueue_run_locked (
    queue=queue@entry=0xfffff8010e55ce00) 
    at /usr/src/sys/kern/subr_taskqueue.c:512
#13 0xffffffff80ba6bb2 in taskqueue_thread_loop (
    arg=arg@entry=0xfffff8011e59ba70)
    at /usr/src/sys/kern/subr_taskqueue.c:824
#14 0xffffffff80afdb0f in fork_exit (
    callout=0xffffffff80ba6af0 <taskqueue_thread_loop>,
    arg=0xfffff8011e59ba70, frame=0xfffffe0206679f40)
    at /usr/src/sys/kern/kern_fork.c:1160
#15 <signal handler called>
(kgdb)
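
For anyone repeating this analysis: a kgdb session like the one above is typically obtained by opening the crash dump with the default savecore paths (assumed here; adjust to the actual kernel and dump locations):

# kgdb /boot/kernel/kernel /var/crash/vmcore.last
(kgdb) bt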
Comment 7 Mark Johnston freebsd_committer freebsd_triage 2023-12-05 18:59:37 UTC
Fixed in 14.0-RELEASE-p2.
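
For reference, on a 14.0-RELEASE system the fix can be picked up with the standard binary update procedure (standard freebsd-update usage, shown for completeness):

# freebsd-update fetch
# freebsd-update install
# shutdown -r now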
Comment 8 Maxim Usatov 2024-03-11 20:20:55 UTC
I have kernel{arc_prune} stuck at 100% CPU with the system completely unresponsive on FreeBSD 13.3-RELEASE releng/13.3-n257428-80d2b634ddf0. I can reproduce it every time by doing an rsync from internal ZFS NVMe storage to an external UFS SATA drive attached via USB. Is there any hope of getting this fixed in 13.3 as well?
Comment 9 Trev 2024-03-12 00:08:41 UTC
I still have arc_prune stuck at 100% for days, despite running a recently compiled 13-STABLE (stable/13-8b84d2da9: Fri Mar  8 15:06:13 AEDT 2024).
Comment 10 Graham Perrin 2024-03-15 07:15:35 UTC
(In reply to Maxim Usatov from comment #8)
(In reply to Trev from comment #9)

> 13

For tracking, can you open a new bug for this? 

Whilst the mfc-stable13 request was not specifically rejected before closure, FreeBSD-EN-23:18.openzfs affected FreeBSD 14.0 alone:

<https://www.freebsd.org/security/advisories/FreeBSD-EN-23:18.openzfs.asc>

Thank you
Comment 11 Paolo Tealdi 2024-03-15 07:31:48 UTC
I verified the bug on two of my ten machines updated to FreeBSD 13.3-RELEASE (the two with the most load). On one of them the bug disappeared after updating to FreeBSD 14.0-RELEASE-p5; I'm updating the second one now. :-)
It could be useful to backport the patch to 13.3-RELEASE ...
Comment 12 Maxim Usatov 2024-03-15 10:29:16 UTC
(In reply to Graham Perrin from comment #10)

I have opened bug #277717.
Comment 13 Olivier Certner freebsd_committer freebsd_triage 2024-04-27 14:33:51 UTC
This bug should have been closed as a duplicate of bug 274698.  Please see the latter for more information on the fix.

*** This bug has been marked as a duplicate of bug 274698 ***