Bug 273610 - Panic (vm_page_assert_xbusied: page 0xfffffe0001beaed8 busy_lock 0xfffffffe not owned by me)
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 15.0-CURRENT
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: Mark Johnston
URL: https://github.com/freebsd/freebsd-sr...
Keywords: crash
Duplicates: 272403
Depends on:
Blocks:
 
Reported: 2023-09-07 06:07 UTC by Graham Perrin
Modified: 2023-10-09 18:15 UTC
Description Graham Perrin 2023-09-07 06:07:11 UTC
Dump header from device: /dev/ada0p2
  Architecture: amd64
  Architecture Version: 2
  Dump Length: 1684303872
  Blocksize: 512
  Compression: none
  Dumptime: 2023-09-06 18:47:12 +0100
  Hostname: mowa219-gjp4-8570p-freebsd
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 15.0-CURRENT amd64 1500000 #10 main-n265135-07bc20e4740d-dirty: Sat Sep  2 17:36:59 BST 2023
    grahamperrin@mowa219-gjp4-8570p-freebsd:/usr/obj/usr/src/amd64.amd64/sys/GENERIC
  Panic String: 
  Dump Parity: 1722260791vm_page_assert_xbusied: page 0xfffffe0001beaed8 busy_lock 0xfffffffe not owned by me @ /usr/src/sys/vm/vm_page.c:1181
  Bounds: 1
  Dump Status: good

----

% bectl list -c creation | tail -n 4
n265053-315ee00fa961-d -      -          74.4M 2023-09-02 07:52
n265135-07bc20e4740d-a -      -          139M  2023-09-02 20:12
n265135-07bc20e4740d-b -      -          211M  2023-09-05 19:49
n265135-07bc20e4740d-c NR     /          468G  2023-09-06 18:33
% 

----

Before the panic: with n265135-07bc20e4740d-b running, I created then mounted n265135-07bc20e4740d-c, upgraded its packages, unmounted then activated the environment, then restarted the OS. 

If I recall correctly: the panic occurred very close to the end of the shutdown routine.

----

sysutils/panicmail

panicmail.1 and panicmail.1.enc exist in /var/crash, but the report has not yet arrived in my Gmail inbox.

----

Unread portion of the kernel message buffer:
panic: vm_page_assert_xbusied: page 0xfffffe0001beaed8 busy_lock 0xfffffffe not owned by me @ /usr/src/sys/vm/vm_page.c:1181
cpuid = 3
time = 1694022432
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0106209ac0
vpanic() at vpanic+0x132/frame 0xfffffe0106209bf0
panic() at panic+0x43/frame 0xfffffe0106209c50
vm_page_xunbusy_hard_unchecked() at vm_page_xunbusy_hard_unchecked/frame 0xfffffe0106209c60
swapoff_one() at swapoff_one+0x3db/frame 0xfffffe0106209d00
kern_swapoff() at kern_swapoff+0x1bc/frame 0xfffffe0106209e00
amd64_syscall() at amd64_syscall+0x138/frame 0xfffffe0106209f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe0106209f30
--- syscall (582, FreeBSD ELF64, swapoff), rip = 0x1aab74d9745a, rsp = 0x1aab73279628, rbp = 0x1aab73279760 ---
KDB: enter: panic
Uptime: 14h51m24s
Dumping 1606 out of 16244 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
57              __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) #0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
#1  doadump (textdump=textdump@entry=1)
    at /usr/src/sys/kern/kern_shutdown.c:405
#2  0xffffffff80b57f60 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:526
#3  0xffffffff80b5845f in vpanic (
    fmt=0xffffffff8127f6c1 "vm_page_assert_xbusied: page %p busy_lock %#x not owned by me @ %s:%d", ap=ap@entry=0xfffffe0106209c30)
    at /usr/src/sys/kern/kern_shutdown.c:970
#4  0xffffffff80b58203 in panic (fmt=<unavailable>)
    at /usr/src/sys/kern/kern_shutdown.c:894
#5  0xffffffff80f0d680 in vm_page_xunbusy_hard (m=<unavailable>)
    at /usr/src/sys/vm/vm_page.c:1181
#6  0xffffffff80ee4f9b in swap_pager_swapoff_object (sp=0xfffff8015fb30980, 
    object=0xfffff8002ffd6420) at /usr/src/sys/vm/swap_pager.c:1895
#7  swap_pager_swapoff (sp=0xfffff8015fb30980)
    at /usr/src/sys/vm/swap_pager.c:1953
#8  swapoff_one (sp=sp@entry=0xfffff8015fb30980, cred=<optimized out>, 
    flags=flags@entry=0) at /usr/src/sys/vm/swap_pager.c:2573
#9  0xffffffff80ee49fc in kern_swapoff (td=0xfffffe01031cf020, 
    name=<optimized out>, name_seg=UIO_USERSPACE, flags=0)
    at /usr/src/sys/vm/swap_pager.c:2505
#10 0xffffffff8105e748 in syscallenter (td=<optimized out>)
    at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:187
#11 amd64_syscall (td=0xfffffe01031cf020, traced=0)
    at /usr/src/sys/amd64/amd64/trap.c:1197
#12 <signal handler called>
#13 0x00001aab74d9745a in ?? ()
Backtrace stopped: Cannot access memory at address 0x1aab73279628
(kgdb) 

------------------------------------------------------------------------
ps -axlww

…
Comment 1 Graham Perrin 2023-09-07 06:14:31 UTC
Duplicate of bug 272403?
Comment 2 Mark Johnston freebsd_committer freebsd_triage 2023-10-01 15:42:36 UTC
Do you still have the kernel dump available?  It would be useful to see the output of

(kgdb) p *(vm_page_t)0xfffffe0001beaed8
Comment 3 Mark Johnston freebsd_committer freebsd_triage 2023-10-01 15:47:12 UTC
*** Bug 272403 has been marked as a duplicate of this bug. ***
Comment 4 Mark Johnston freebsd_committer freebsd_triage 2023-10-01 15:58:13 UTC
I suspect that this would be fixed by https://reviews.freebsd.org/D42029
Comment 5 Graham Perrin 2023-10-02 04:04:48 UTC
(In reply to Mark Johnston from comment #2)

root@mowa219-gjp4-8570p-freebsd:~ # kgdb -c /var/crash/vmcore.1
kgdb: couldn't find a suitable kernel image
root@mowa219-gjp4-8570p-freebsd:~ # 


I guess the command will succeed if I temporarily boot either of the two boot environments mentioned in the opening comment (comment 0). True?
Comment 6 Mark Johnston freebsd_committer freebsd_triage 2023-10-02 11:46:14 UTC
(In reply to Graham Perrin from comment #5)
I'd guess so.  But you don't really need to boot them; you can just mount the BE somewhere and either point kgdb at the kernel there or chroot into the BE.

At this point, though, I don't think it's necessary anymore: I found a bug which can cause this panic.
Comment 7 commit-hook freebsd_committer freebsd_triage 2023-10-02 11:51:16 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=e61568aeeec7667789e6c9d4837e074edecc990e

commit e61568aeeec7667789e6c9d4837e074edecc990e
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2023-10-02 11:49:27 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2023-10-02 11:49:52 +0000

    swap_pager: Fix a race in swap_pager_swapoff_object()

    When we disable swapping to a device, we scan the full VM object list
    looking for objects with swap trie nodes that reference the device in
    question.  The pages corresponding to those nodes are paged in.

    While paging in, we drop the VM object lock.  Moreover, we do not hold a
    reference for the object; swap_pager_swapoff_object() merely bumps the
    paging-in-progress counter.  vm_object_terminate() waits for this
    counter to drain before proceeding and freeing pages.

    However, swap_pager_swapoff_object() decrements the counter before
    re-acquiring the VM object lock, which means that vm_object_terminate()
    can race to acquire the lock and free the pages.  Then,
    swap_pager_swapoff_object() ends up unbusying a freed page.  Fix the
    problem by acquiring the lock before waking up sleepers.

    PR:             273610
    Reported by:    Graham Perrin <grahamperrin@gmail.com>
    Reviewed by:    kib
    MFC after:      1 week
    Differential Revision:  https://reviews.freebsd.org/D42029

 sys/vm/swap_pager.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
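
The lock ordering described in the commit message above can be illustrated with a small user-space analogy. This is only a sketch using pthreads, not the kernel code or the actual one-line change: `demo_object`, `demo_pager`, `demo_terminator` and `swapoff_demo` are hypothetical names, a mutex stands in for the VM object lock, and a plain integer stands in for the paging-in-progress counter. The point it tries to show is that the pager takes the lock before dropping the counter and waking sleepers, so the terminator cannot free the page until the pager has finished with it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a VM object; not the kernel's struct vm_object. */
struct demo_object {
	pthread_mutex_t	lock;		/* plays the role of the VM object lock */
	pthread_cond_t	pip_drained;	/* signalled when pip reaches zero */
	int		pip;		/* paging-in-progress count */
	bool		page_freed;	/* set once the terminator frees the page */
	bool		used_freed_page;/* set if the pager touches a freed page */
};

/* Analogy for the tail of swap_pager_swapoff_object() after the page-in. */
static void *
demo_pager(void *arg)
{
	struct demo_object *obj = arg;

	/* The page-in itself would run here, with the object lock dropped. */

	/* Fixed ordering: take the lock first, then wake up sleepers. */
	pthread_mutex_lock(&obj->lock);
	if (--obj->pip == 0)
		pthread_cond_broadcast(&obj->pip_drained);
	/* Still under the lock, so the terminator cannot have freed the page. */
	if (obj->page_freed)
		obj->used_freed_page = true;
	pthread_mutex_unlock(&obj->lock);
	return (NULL);
}

/* Analogy for vm_object_terminate(): wait for pip to drain, then free. */
static void *
demo_terminator(void *arg)
{
	struct demo_object *obj = arg;

	pthread_mutex_lock(&obj->lock);
	while (obj->pip > 0)
		pthread_cond_wait(&obj->pip_drained, &obj->lock);
	obj->page_freed = true;
	pthread_mutex_unlock(&obj->lock);
	return (NULL);
}

/* Returns 0 if the pager never touched a freed page, 1 otherwise. */
int
swapoff_demo(void)
{
	struct demo_object obj = { .pip = 1 };	/* pager already counted */
	pthread_t pager, terminator;
	int error;

	pthread_mutex_init(&obj.lock, NULL);
	pthread_cond_init(&obj.pip_drained, NULL);

	error = pthread_create(&terminator, NULL, demo_terminator, &obj);
	assert(error == 0);
	error = pthread_create(&pager, NULL, demo_pager, &obj);
	assert(error == 0);

	pthread_join(pager, NULL);
	pthread_join(terminator, NULL);
	pthread_cond_destroy(&obj.pip_drained);
	pthread_mutex_destroy(&obj.lock);
	return (obj.used_freed_page ? 1 : 0);
}
```

In this sketch the buggy ordering would correspond to the pager decrementing the counter and broadcasting, releasing the lock, and only then re-taking it to touch the page; the terminator could run in that gap and free the page first.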
Comment 8 commit-hook freebsd_committer freebsd_triage 2023-10-09 00:59:54 UTC
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=aa229a59adeaf49517183c8117a239e2b68012f5

commit aa229a59adeaf49517183c8117a239e2b68012f5
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2023-10-02 11:49:27 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2023-10-09 00:41:35 +0000

    swap_pager: Fix a race in swap_pager_swapoff_object()

    PR:             273610
    Reported by:    Graham Perrin <grahamperrin@gmail.com>
    Reviewed by:    kib
    MFC after:      1 week
    Differential Revision:  https://reviews.freebsd.org/D42029

    (cherry picked from commit e61568aeeec7667789e6c9d4837e074edecc990e)

 sys/vm/swap_pager.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Comment 9 commit-hook freebsd_committer freebsd_triage 2023-10-09 00:59:58 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=d8cb6c173417f47b2337c12ab662a13c6e147789

commit d8cb6c173417f47b2337c12ab662a13c6e147789
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2023-10-02 11:49:27 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2023-10-09 00:42:30 +0000

    swap_pager: Fix a race in swap_pager_swapoff_object()

    PR:             273610
    Reported by:    Graham Perrin <grahamperrin@gmail.com>
    Reviewed by:    kib
    MFC after:      1 week
    Differential Revision:  https://reviews.freebsd.org/D42029

    (cherry picked from commit e61568aeeec7667789e6c9d4837e074edecc990e)

 sys/vm/swap_pager.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Comment 10 commit-hook freebsd_committer freebsd_triage 2023-10-09 18:13:44 UTC
A commit in branch releng/14.0 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=6f35c2380737fbef590ed48ed0669eebd1656287

commit 6f35c2380737fbef590ed48ed0669eebd1656287
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2023-10-02 11:49:27 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2023-10-09 18:07:02 +0000

    swap_pager: Fix a race in swap_pager_swapoff_object()

    Approved by:    re (gjb)
    PR:             273610
    Reported by:    Graham Perrin <grahamperrin@gmail.com>
    Reviewed by:    kib
    MFC after:      1 week
    Differential Revision:  https://reviews.freebsd.org/D42029

    (cherry picked from commit e61568aeeec7667789e6c9d4837e074edecc990e)
    (cherry picked from commit aa229a59adeaf49517183c8117a239e2b68012f5)

 sys/vm/swap_pager.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)