Bug 274698 - arc_prune consuming 100% CPU
Summary: arc_prune consuming 100% CPU
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 15.0-CURRENT
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: Alexander Motin
URL:
Keywords:
Duplicates: 275063
Depends on:
Blocks:
 
Reported: 2023-10-24 16:45 UTC by Mark Johnston
Modified: 2024-04-27 14:36 UTC
CC List: 7 users

See Also:
mav: mfc-stable14+
mav: mfc-stable13?


Description Mark Johnston freebsd_committer freebsd_triage 2023-10-24 16:45:40 UTC
Over the past couple of months I have been seeing a new problem: ZFS's arc_prune thread will occasionally start consuming an entire CPU and run for hours.  Looking at "vmstat -z | grep taskq", I can see that a huge number of task allocations are pending, and they are draining slowly.

It looks like back-to-back calls to arc_evict() can queue up a silly amount of work.  The Linux implementation of arc_prune_async() is careful to avoid this problem.
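
To illustrate the failure mode described above, here is a minimal userland sketch in C. It is not the OpenZFS code: the names `work_queue`, `pruner`, `prune_async_naive`, `prune_async_guarded`, and `prune_task_done` are hypothetical, and the "queue" is just a counter. It assumes that the guard is a single "in flight" flag; the real implementation may track pending work differently.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a deferred-work queue; not a kernel taskq. */
#define QUEUE_CAP 1024

struct work_queue {
	size_t pending;		/* tasks queued but not yet run */
};

struct pruner {
	atomic_bool in_flight;	/* true while a prune task is queued or running */
	struct work_queue *queue;
};

/* Naive variant: every eviction pass queues another prune task. */
static void
prune_async_naive(struct pruner *p)
{
	assert(p->queue->pending < QUEUE_CAP);
	p->queue->pending++;
}

/* Guarded variant: queue a task only if none is already in flight. */
static bool
prune_async_guarded(struct pruner *p)
{
	bool expected = false;

	if (!atomic_compare_exchange_strong(&p->in_flight, &expected, true))
		return (false);	/* a task is already pending; skip */
	assert(p->queue->pending < QUEUE_CAP);
	p->queue->pending++;
	return (true);
}

/* Called by the (simulated) worker when a prune task finishes. */
static void
prune_task_done(struct pruner *p)
{
	assert(p->queue->pending > 0);
	p->queue->pending--;
	atomic_store(&p->in_flight, false);
}

/* Simulate back-to-back eviction passes while the worker makes no progress. */
size_t
simulate_backlog(size_t passes, bool guarded)
{
	struct work_queue q = { .pending = 0 };
	struct pruner p = { .queue = &q };

	atomic_init(&p.in_flight, false);
	for (size_t i = 0; i < passes; i++) {
		if (guarded)
			(void)prune_async_guarded(&p);
		else
			prune_async_naive(&p);
	}
	return (q.pending);
}

/* Once the in-flight task completes, a later pass may queue a new one. */
bool
requeue_after_completion(void)
{
	struct work_queue q = { .pending = 0 };
	struct pruner p = { .queue = &q };
	bool first, second;

	atomic_init(&p.in_flight, false);
	first = prune_async_guarded(&p);	/* queued */
	second = prune_async_guarded(&p);	/* skipped: still in flight */
	prune_task_done(&p);
	return (first && !second && prune_async_guarded(&p));
}
```

With the naive variant the backlog grows by one task per eviction pass, whereas the guarded variant keeps at most one task outstanding until the worker reports completion.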
Comment 1 Alexander Motin freebsd_committer freebsd_triage 2023-10-26 17:37:54 UTC
Yes, queuing multiple tasks for arc_prune at the same time makes no sense and, in addition to the CPU usage you see, likely causes over-pruning after the pressure is already resolved.  After considering several possible approaches, I am going to unify the FreeBSD code with the Linux code there, which should be a good step in itself.
Comment 2 Alexander Motin freebsd_committer freebsd_triage 2023-10-26 18:47:09 UTC
Please test/review https://github.com/openzfs/zfs/pull/15456 .
Comment 3 Mark Johnston freebsd_committer freebsd_triage 2023-11-03 13:58:57 UTC
Fixed in main now, commit 799e09f75a31e80a1702a850838c79879af8b917.
Comment 4 Alexander Motin freebsd_committer freebsd_triage 2023-11-03 14:05:14 UTC
I'll make sure it gets included in the OpenZFS 2.2 updates for stable/14.  Closing the PR.
Comment 5 Martin Birgmeier 2023-11-13 21:26:57 UTC
I would appreciate a quick merge to stable/14; see PR #275063.

:-)

Thanks, Martin
Comment 6 Alexander Motin freebsd_committer freebsd_triage 2023-11-13 21:30:29 UTC
It is now part of https://github.com/openzfs/zfs/pull/15498 on its way to ZFS 2.2.1.
Comment 7 Martin Birgmeier 2023-11-16 20:02:57 UTC
For me this (see also bug 275063) turns out to be a real blocker. FreeBSD 14.0 should not be released with this issue unresolved.

-- Martin
Comment 8 commit-hook freebsd_committer freebsd_triage 2024-04-12 13:01:52 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=330954bdb822af6bc07d487b1ecd7f8fda9c4def

commit 330954bdb822af6bc07d487b1ecd7f8fda9c4def
Author:     Alexander Motin <mav@FreeBSD.org>
AuthorDate: 2023-10-30 23:56:04 +0000
Commit:     Olivier Certner <olce@FreeBSD.org>
CommitDate: 2024-04-12 13:00:11 +0000

    Unify arc_prune_async() code, fix excessive ARC pruning

    There is no sense to have separate implementations for FreeBSD and Linux.  Make
    Linux code shared as more functional and just register FreeBSD-specific prune
    callback with arc_add_prune_callback() API.

    Aside of code cleanup this fixes excessive pruning on FreeBSD.

    [olce: This code comes from the OpenZFS pull request:
    https://github.com/openzfs/zfs/pull/16083, vendor-merged into our tree.  Its
    commit message has been slightly adapted to the present context.  The upstream
    pull request has been reviewed and merged into 'zfs-2.1.16-staging' as
    5b81b1bf5e6d6aeb8a87175dcb12b529185cac2f, which should come into our tree at the
    next vendor import.  This is the same code that was merged into stable/14 and
    main as part of vendor merges, and released as an EN (FreeBSD-EN-23:18.openzfs)
    over releng/14.0 by markj@.]

    PR:             275594, 274698
    Reported by:    Seigo Tanimura <seigo.tanimura@gmail.com>, markj, and others
    Tested by:      olce
    Approved by:    emaste (mentor)
    Obtained from:  OpenZFS
    Sponsored by:   iXsystems, Inc.
    Sponsored by:   The FreeBSD Foundation
    Signed-off-by:  Alexander Motin <mav@FreeBSD.org>

 sys/contrib/openzfs/include/os/linux/zfs/sys/zpl.h |  2 +-
 sys/contrib/openzfs/include/sys/arc.h              |  2 +-
 sys/contrib/openzfs/include/sys/arc_impl.h         |  1 -
 sys/contrib/openzfs/module/os/freebsd/zfs/arc_os.c | 62 ----------------------
 .../openzfs/module/os/freebsd/zfs/zfs_vfsops.c     | 32 +++++++++++
 sys/contrib/openzfs/module/os/linux/zfs/arc_os.c   | 51 ------------------
 .../openzfs/module/os/linux/zfs/zpl_super.c        |  2 +-
 sys/contrib/openzfs/module/zfs/arc.c               | 52 ++++++++++++++++++
 8 files changed, 87 insertions(+), 117 deletions(-)
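
The commit message above describes shared code with which a platform registers its own prune callback. A minimal userland sketch of that general pattern follows. It is an illustration under assumptions, not the OpenZFS API: `prune_registry_add`, `prune_registry_remove`, `prune_all`, and the callback signature are hypothetical, and the sketch is single-threaded with no locking or reference counting.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical callback type: asked to release roughly `count` objects. */
typedef void prune_func_t(uint64_t count, void *priv);

struct prune_cb {
	prune_func_t *func;
	void *priv;
	struct prune_cb *next;
};

/* Shared, platform-independent registry (hypothetical, single-threaded). */
static struct prune_cb *prune_list = NULL;

/* Register a callback; returns a handle for later removal, or NULL. */
struct prune_cb *
prune_registry_add(prune_func_t *func, void *priv)
{
	struct prune_cb *cb = malloc(sizeof (*cb));

	if (cb == NULL)
		return (NULL);
	cb->func = func;
	cb->priv = priv;
	cb->next = prune_list;
	prune_list = cb;
	return (cb);
}

/* Unlink and free a previously registered callback. */
void
prune_registry_remove(struct prune_cb *cb)
{
	for (struct prune_cb **pp = &prune_list; *pp != NULL;
	    pp = &(*pp)->next) {
		if (*pp == cb) {
			*pp = cb->next;
			free(cb);
			return;
		}
	}
}

/* Shared entry point: fan the request out to every registered callback. */
void
prune_all(uint64_t count)
{
	for (struct prune_cb *cb = prune_list; cb != NULL; cb = cb->next)
		cb->func(count, cb->priv);
}

/* Platform-specific side (hypothetical): tally what it was asked to free. */
struct platform_state {
	uint64_t requested;
	unsigned calls;
};

static void
platform_prune(uint64_t count, void *priv)
{
	struct platform_state *st = priv;

	st->requested += count;
	st->calls++;
}

/* Register, prune once, unregister, then prune again with no listeners. */
uint64_t
demo_register_and_prune(void)
{
	struct platform_state st = { 0, 0 };
	struct prune_cb *cb = prune_registry_add(platform_prune, &st);

	assert(cb != NULL);
	prune_all(128);
	prune_registry_remove(cb);
	prune_all(64);		/* no callbacks left, so nothing is tallied */
	assert(st.calls == 1);
	return (st.requested);
}
```

The point of the design is that the dispatch logic lives in one shared place, so a fix such as limiting how much prune work is queued only has to be made once, while each platform supplies only its own callback.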
Comment 9 commit-hook freebsd_committer freebsd_triage 2024-04-24 20:21:39 UTC
A commit in branch releng/13.3 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=266b3bd3f26d30f7be56b7ec9d31f3db2285b4ce

commit 266b3bd3f26d30f7be56b7ec9d31f3db2285b4ce
Author:     Alexander Motin <mav@FreeBSD.org>
AuthorDate: 2023-10-30 23:56:04 +0000
Commit:     Gordon Tetlow <gordon@FreeBSD.org>
CommitDate: 2024-04-24 20:06:16 +0000

    Unify arc_prune_async() code, fix excessive ARC pruning

    There is no sense to have separate implementations for FreeBSD and Linux.  Make
    Linux code shared as more functional and just register FreeBSD-specific prune
    callback with arc_add_prune_callback() API.

    Aside of code cleanup this fixes excessive pruning on FreeBSD.

    [olce: This code comes from the OpenZFS pull request:
    https://github.com/openzfs/zfs/pull/16083, vendor-merged into our tree.  Its
    commit message has been slightly adapted to the present context.  The upstream
    pull request has been reviewed and merged into 'zfs-2.1.16-staging' as
    5b81b1bf5e6d6aeb8a87175dcb12b529185cac2f, which should come into our tree at the
    next vendor import.  This is the same code that was merged into stable/14 and
    main as part of vendor merges, and released as an EN (FreeBSD-EN-23:18.openzfs)
    over releng/14.0 by markj@.]

    PR:             275594, 274698
    Reported by:    Seigo Tanimura <seigo.tanimura@gmail.com>, markj, and others
    Tested by:      olce
    Approved by:    emaste (mentor)
    Approved by:    so
    Obtained from:  OpenZFS
    Sponsored by:   iXsystems, Inc.
    Sponsored by:   The FreeBSD Foundation
    Signed-off-by:  Alexander Motin <mav@FreeBSD.org>

    (cherry picked from commit 330954bdb822af6bc07d487b1ecd7f8fda9c4def)

 sys/contrib/openzfs/include/os/linux/zfs/sys/zpl.h |  2 +-
 sys/contrib/openzfs/include/sys/arc.h              |  2 +-
 sys/contrib/openzfs/include/sys/arc_impl.h         |  1 -
 sys/contrib/openzfs/module/os/freebsd/zfs/arc_os.c | 62 ----------------------
 .../openzfs/module/os/freebsd/zfs/zfs_vfsops.c     | 32 +++++++++++
 sys/contrib/openzfs/module/os/linux/zfs/arc_os.c   | 51 ------------------
 .../openzfs/module/os/linux/zfs/zpl_super.c        |  2 +-
 sys/contrib/openzfs/module/zfs/arc.c               | 52 ++++++++++++++++++
 8 files changed, 87 insertions(+), 117 deletions(-)
Comment 10 Olivier Certner freebsd_committer freebsd_triage 2024-04-27 07:15:30 UTC
The commits above correspond to a backport of the fix to stable/13 and then releng/13.3 (the fix appears in 13.3-RELEASE-p2).  This backport was made as part of the investigation in bug 275594, and fixes the most visible behaviors reported there.  It also fixes bug 277717 (a duplicate report of part of bug 275594).

Here is the full chronology of fixes:
- main: the fix initially went in with an import of OpenZFS, as merge f8b1db88b827 (n266198; Wed, 1 Nov 2023 09:13:42 UTC).
- stable/14: it went in as f7f5c2419ea7 (n265783; Wed, 22 Nov 2023 11:43:59 UTC) through the import of OpenZFS 2.2.1, after the release of 14.0.
- releng/14.0: it was then backported and an EN issued (FreeBSD-EN-23:18.openzfs); the merge happened in 64c5eaab835b (n265389; Mon, 4 Dec 2023 14:03:22 UTC).
- stable/13: it was backported in 330954bdb822 (n257698; Fri, 12 Apr 2024 13:00:11 UTC; this is comment #8 above) as a direct import, without an OpenZFS import of 2.1.x; the fix also appears upstream, so it won't be lost at the next import.
- releng/13.3: it was finally backported in 266b3bd3f26d (n257432; Wed, 24 Apr 2024 20:06:16 UTC; this is comment #9 above).
Comment 11 Olivier Certner freebsd_committer freebsd_triage 2024-04-27 14:33:51 UTC
*** Bug 275063 has been marked as a duplicate of this bug. ***