Bug 282994

Summary: Repeated kernel panics
Product: Base System
Reporter: jSML4ThWwBID69YC
Component: kern
Assignee: Mark Johnston <markj>
Status: Closed FIXED
Severity: Affects Some People
CC: grahamperrin, jSML4ThWwBID69YC, markj, pat
Priority: ---
Keywords: crash
Version: 14.2-RELEASE   
Hardware: amd64   
OS: Any   

Description jSML4ThWwBID69YC 2024-11-27 00:01:51 UTC
Hello, 
 
I've run into an issue where multiple FreeBSD 14.1-p6 and 14.2-RC1 servers are crashing with a kernel panic and rebooting. This happens multiple times a day on each server. Sometimes it takes a few hours; sometimes a server crashes within minutes of startup.

Please see the 14.2-RC1 dump below. 

--------------------------------------------------------------------------------
Reading symbols from /boot/kernel/kernel...
Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug...

Unread portion of the kernel message buffer:

Fatal trap 12: page fault while in kernel mode
cpuid = 6; apic id = 12
fault virtual address	= 0xd0
fault code		= supervisor write data, page not present
instruction pointer	= 0x20:0xffffffff80bdc2f2
stack pointer	        = 0x28:0xfffffe03505a1d30
frame pointer	        = 0x28:0xfffffe03505a1d60
code segment		= base rx0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 81514 (php)
rdi: fffff8010000d900 rsi: fffff810d3d97210 rdx: fffff8150afe4c00
rcx: fffff8010000dd90  r8: 000000000000001a  r9: 0000000000000000
rax: 0000000000000000 rbx: fffff80cc151c500 rbp: fffffe03505a1d60
r10: 0000000000000000 r11: 0000000000000001 r12: 0000000000000180
r13: 00000000f4cf7018 r14: fffff802fc69b800 r15: 0000000000000000
trap number		= 12
panic: page fault
cpuid = 6
time = 1732641372
KDB: stack backtrace:
#0 0xffffffff80b8b89d at kdb_backtrace+0x5d
#1 0xffffffff80b3dc01 at vpanic+0x131
#2 0xffffffff80b3dac3 at panic+0x43
#3 0xffffffff81025a0b at trap_fatal+0x40b
#4 0xffffffff81025a56 at trap_pfault+0x46
#5 0xffffffff80ffc398 at calltrap+0x8
#6 0xffffffff80bdcbe3 at kern_shm_open2+0x443
#7 0xffffffff80bddc31 at sys_shm_open2+0x21
#8 0xffffffff810262c5 at amd64_syscall+0x115
#9 0xffffffff80ffccab at fast_syscall_common+0xf8
Uptime: 1h33m33s
Dumping 8536 out of 163689 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

Reading symbols from /boot/kernel/accf_http.ko...
Reading symbols from /usr/lib/debug//boot/kernel/accf_http.ko.debug...
Reading symbols from /boot/kernel/accf_data.ko...
Reading symbols from /usr/lib/debug//boot/kernel/accf_data.ko.debug...
Reading symbols from /boot/kernel/zfs.ko...
Reading symbols from /usr/lib/debug//boot/kernel/zfs.ko.debug...
Reading symbols from /boot/kernel/accf_dns.ko...
Reading symbols from /usr/lib/debug//boot/kernel/accf_dns.ko.debug...
Reading symbols from /boot/kernel/geom_mirror.ko...
Reading symbols from /usr/lib/debug//boot/kernel/geom_mirror.ko.debug...
Reading symbols from /boot/kernel/cryptodev.ko...
Reading symbols from /usr/lib/debug//boot/kernel/cryptodev.ko.debug...
Reading symbols from /boot/kernel/mac_seeotheruids.ko...
Reading symbols from /usr/lib/debug//boot/kernel/mac_seeotheruids.ko.debug...
Reading symbols from /boot/kernel/coretemp.ko...
Reading symbols from /usr/lib/debug//boot/kernel/coretemp.ko.debug...
Reading symbols from /boot/kernel/mlx4en.ko...
Reading symbols from /usr/lib/debug//boot/kernel/mlx4en.ko.debug...
Reading symbols from /boot/kernel/mlx4.ko...
Reading symbols from /usr/lib/debug//boot/kernel/mlx4.ko.debug...
Reading symbols from /boot/kernel/pf.ko...
Reading symbols from /usr/lib/debug//boot/kernel/pf.ko.debug...
Reading symbols from /boot/kernel/fusefs.ko...
Reading symbols from /usr/lib/debug//boot/kernel/fusefs.ko.debug...
Reading symbols from /boot/kernel/acpi_wmi.ko...
Reading symbols from /usr/lib/debug//boot/kernel/acpi_wmi.ko.debug...
Reading symbols from /boot/kernel/if_lagg.ko...
Reading symbols from /usr/lib/debug//boot/kernel/if_lagg.ko.debug...
Reading symbols from /boot/kernel/if_infiniband.ko...
Reading symbols from /usr/lib/debug//boot/kernel/if_infiniband.ko.debug...
Reading symbols from /boot/kernel/if_bridge.ko...
Reading symbols from /usr/lib/debug//boot/kernel/if_bridge.ko.debug...
Reading symbols from /boot/kernel/bridgestp.ko...
Reading symbols from /usr/lib/debug//boot/kernel/bridgestp.ko.debug...
Reading symbols from /boot/kernel/mac_ntpd.ko...
Reading symbols from /usr/lib/debug//boot/kernel/mac_ntpd.ko.debug...
Reading symbols from /boot/kernel/nullfs.ko...
Reading symbols from /usr/lib/debug//boot/kernel/nullfs.ko.debug...
Reading symbols from /boot/kernel/if_epair.ko...
Reading symbols from /usr/lib/debug//boot/kernel/if_epair.ko.debug...
__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
57		__asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,

(kgdb) frame 8
#8  shm_alloc (ucred=0xfffff80cc151c500, mode=mode@entry=384, largepage=<optimized out>) at /usr/src/sys/kern/uipc_shm.c:954
954			obj->un_pager.swp.swp_priv = shmfd;
(kgdb) p vfs_hash_tbl
$1 = (struct vfs_hash_head *) 0xfffffe0109b12000
(kgdb) p mp
No symbol "mp" in current context.
(kgdb) p hash
No symbol "hash" in current context.
(kgdb) p *mp
No symbol "mp" in current context.
(kgdb) 
------------------

The servers this is happening on all have a few things in common. 

1. They all access data storage over MooseFS. 
2. They all use fusefs-libs3-3.16.2 and moosefs3-client-3.0.117_1
3. They all have many nullfs mounts. 5k+
4. All user processes run inside standard thick jails.
5. They all use RCTL to control resources used by users.
6. They all run many hundreds of processes owned by different users. All the processes access data over the network, including the data files, which are on distributed storage.

Please let me know if there is anything I can do to help.
Comment 1 jSML4ThWwBID69YC 2024-11-27 15:37:55 UTC
Adding to this, I noticed that affected systems show other signs of trouble. 

1. The 'top' command takes thirty to sixty seconds to launch or longer in some cases. 

2. Nullfs filesystem mounts take several minutes to apply. They used to happen in the blink of an eye. 

The systems have plenty of free memory and CPU, and disk I/O is low. I suspect the cause of the slowdown is not resource related, but something else in the kernel.

Of note, the affected systems are running two different CPU types. Two are Intel, and two are AMD Epyc.
Comment 2 Graham Perrin 2024-11-29 03:56:21 UTC
(In reply to jSML4ThWwBID69YC from comment #0)

> 1. They all access data storage over MooseFS. 
> 2. They all use fusefs-libs3-3.16.2 and moosefs3-client-3.0.117_1
> …

Noted: 

FreeBSD ports packaging for MooseFS 4x · Issue #592 · moosefs/moosefs
<https://github.com/moosefs/moosefs/issues/592>
Comment 3 Mark Johnston freebsd_committer freebsd_triage 2024-11-29 15:27:05 UTC
> 5. They all use RCTL to control resources used by users.

The problem is that shm_alloc() assumes that object allocation won't fail, but this is false:
- racct rules which restrict swap usage can cause swap_reserve_by_cred() to fail;
- some overcommit modes limit swap reservations such that swap_reserve_by_cred() can fail.

I suspect that changing your racct rules will work around the problem.  RACCT_SWAP works by limiting the maximum number and size of swap-backed objects, rather than swap device usage.  In particular, it treats swap object allocation as a "reservation" and that counts against the limit before swap space is actually used.  I'm a bit skeptical that this implementation is very useful.
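
The failure mode described above can be modeled in userland. The following is a minimal sketch, not the kernel code: `swap_acct`, `model_swap_reserve()`, `model_object_alloc()` and `model_shm_alloc()` are hypothetical names invented for illustration. It assumes only what this comment states: a reservation is charged against the limit at allocation time, before any swap is used, and object allocation can therefore fail, so the caller has to check the result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-user accounting; not the kernel's real structures. */
struct swap_acct {
	uint64_t limit;		/* e.g. an rctl swapuse limit, in bytes */
	uint64_t reserved;	/* bytes promised to swap-backed objects */
	uint64_t used;		/* bytes actually paged out (never changes here) */
};

struct model_object {
	void *priv;		/* back-pointer, like swp_priv in the backtrace */
};

struct model_shmfd {
	struct model_object *object;
	uint64_t size;
};

/* Charge a reservation; leave the accounting untouched if it does not fit. */
static bool
model_swap_reserve(struct swap_acct *acct, uint64_t size)
{
	assert(acct->reserved <= acct->limit);
	if (size > acct->limit - acct->reserved)
		return (false);
	acct->reserved += size;
	return (true);
}

/* Allocate a swap-backed object; returns NULL when the reservation fails. */
static struct model_object *
model_object_alloc(struct swap_acct *acct, uint64_t size)
{
	struct model_object *obj;

	if (!model_swap_reserve(acct, size))
		return (NULL);
	obj = calloc(1, sizeof(*obj));
	if (obj == NULL)
		acct->reserved -= size;	/* undo the charge on malloc failure */
	return (obj);
}

/* Checked variant: propagate the failure instead of dereferencing NULL. */
static struct model_shmfd *
model_shm_alloc(struct swap_acct *acct, uint64_t size)
{
	struct model_shmfd *shmfd;

	shmfd = calloc(1, sizeof(*shmfd));
	if (shmfd == NULL)
		return (NULL);
	shmfd->object = model_object_alloc(acct, size);
	if (shmfd->object == NULL) {
		/* An unchecked "shmfd->object->priv = shmfd" would fault here. */
		free(shmfd);
		return (NULL);
	}
	shmfd->object->priv = shmfd;
	shmfd->size = size;
	return (shmfd);
}
```

With a 1024MB limit, two 600MB allocations in a row make the second one fail even though `used` is still zero, which is the "reservation" behaviour described above. The fault address in the original report (0xd0, with rax = 0) looks consistent with a store to a field of a NULL object pointer, though that reading is an inference from the backtrace and not something the sketch proves.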

I wrote a patch which prevents the crash, but note that you'll get shm object allocation failures instead, which might cause all kinds of problems: https://reviews.freebsd.org/D47839

(In reply to jSML4ThWwBID69YC from comment #1)
Changing your racct rules might address these problems as well.
Comment 4 jSML4ThWwBID69YC 2024-11-29 16:11:10 UTC
(In reply to Mark Johnston from comment #3)

Thank you! 

There are rctl rules in place for every user account, exempting root and system services. Here's an example. 

# USERID < replace with actual uid.

user:USERID:pcpu:deny=100/user
user:USERID:maxproc:deny=50/user
user:USERID:memoryuse:deny=1024MB/user
user:USERID:swapuse:deny=1024MB/user
user:USERID:readbps:throttle=200MB/user
user:USERID:writebps:throttle=200MB/user

Are the other symptoms also due to this? These happen randomly even when running as root with no rctl rules applied. 

- top takes thirty seconds to several minutes to load. 
- ps -aux takes several minutes to return results. 
- Nullfs mounts take several minutes to apply. 

The above issues started at the same time as the panics. All the affected systems are running 14.2-RC1. They were previously running 14.1-p6 with the same issue. My guess was it had something to do with FreeBSD-SA-24:14.umtx.

> I wrote a patch which prevents the crash, but note that you'll get shm object allocation failures instead, which might cause all kinds of problems: https://reviews.freebsd.org/D47839

`all kinds of problems` sounds scary. I'm getting the impression that rctl is no longer safe to use, but what other options are there for resource control? Is it just the swapuse that's an issue? 

I'll try pulling the swap rules and see if it changes anything.
Comment 5 Mark Johnston freebsd_committer freebsd_triage 2024-11-29 16:19:03 UTC
> `all kinds of problems` sounds scary.

Well, I just meant that without the patch, hitting the swapuse limit could cause a panic; with the patch, it'll cause shm_alloc() to fail.  In the backtrace you pasted, the running process was php, so with the patch applied, either some php script or the runtime would have hit an error instead of triggering a panic.  Maybe it can tolerate such errors just fine, but I don't know - that's all I meant by "all kinds of problems".

> These happen randomly even when running as root with no rctl rules applied. 
> - top takes thirty seconds to several minutes to load. 
> - ps -aux takes several minutes to return results. 
> - Nullfs mounts take several minutes to apply.

Are you sure that no rctl rules are applied there?

If you send Ctrl-T to the terminal while top/ps/mount is running, what gets printed?
Comment 6 jSML4ThWwBID69YC 2024-11-29 16:35:14 UTC
(In reply to Mark Johnston from comment #5)

A user-land program crashing because of not having enough resources is acceptable. I can always add more resources when needed. It's a hosting environment with non-trusted users, so resource controls are pretty important to maintain system stability. 

The rctl rules only apply to user ids 5000+. Everything below that is not covered by rctl. Here's an example of top being slow to launch. 

root@compute03:~ # top
load: 18.74  cmd: top 61547 [sysctl mem] 2.25r 0.00u 0.00s 0% 3212k
load: 18.12  cmd: top 61547 [sysctl mem] 3.21r 0.00u 0.00s 0% 3212k
load: 18.12  cmd: top 61547 [sysctl mem] 3.77r 0.00u 0.00s 0% 3212k
load: 18.12  cmd: top 61547 [sysctl mem] 4.54r 0.00u 0.00s 0% 3212k
load: 18.12  cmd: top 61547 [sysctl mem] 4.88r 0.00u 0.00s 0% 3212k
load: 18.12  cmd: top 61547 [sysctl mem] 5.40r 0.00u 0.00s 0% 3212k
load: 18.12  cmd: top 61547 [sysctl mem] 6.35r 0.00u 0.00s 0% 3212k
load: 18.12  cmd: top 61547 [sysctl mem] 6.57r 0.00u 0.00s 0% 3212k
load: 18.12  cmd: top 61547 [sysctl mem] 7.15r 0.00u 0.00s 0% 3212k
load: 17.63  cmd: top 61547 [sysctl mem] 9.09r 0.00u 0.00s 0% 3212k
load: 17.63  cmd: top 61547 [sysctl mem] 9.34r 0.00u 0.00s 0% 3212k
load: 17.63  cmd: top 61547 [sysctl mem] 9.80r 0.00u 0.00s 0% 3212k
load: 17.63  cmd: top 61547 [sysctl mem] 10.13r 0.00u 0.00s 0% 3212k
load: 17.63  cmd: top 61547 [sysctl mem] 10.69r 0.00u 0.00s 0% 3212k
load: 17.63  cmd: top 61547 [sysctl mem] 11.16r 0.00u 0.00s 0% 3212k
load: 17.63  cmd: top 61547 [sysctl mem] 11.82r 0.00u 0.00s 0% 3212k
load: 17.63  cmd: top 61547 [sysctl mem] 12.06r 0.00u 0.00s 0% 3212k
load: 17.63  cmd: top 61547 [sysctl mem] 12.42r 0.00u 0.00s 0% 3212k
load: 17.63  cmd: top 61547 [sysctl mem] 12.98r 0.00u 0.00s 0% 3212k
load: 17.34  cmd: top 61547 [sysctl mem] 13.61r 0.00u 0.00s 0% 3212k
load: 17.34  cmd: top 61547 [sysctl mem] 14.09r 0.00u 0.00s 0% 3212k
load: 17.34  cmd: top 61547 [sysctl mem] 14.57r 0.00u 0.00s 0% 3212k
load: 17.34  cmd: top 61547 [sysctl mem] 15.84r 0.00u 0.00s 0% 3212k
load: 17.34  cmd: top 61547 [sysctl mem] 18.32r 0.00u 0.00s 0% 3212k
load: 17.34  cmd: top 61547 [sysctl mem] 18.61r 0.00u 0.00s 0% 3212k
load: 16.83  cmd: top 61547 [sysctl mem] 19.69r 0.00u 0.00s 0% 3212k
load: 16.83  cmd: top 61547 [sysctl mem] 20.07r 0.00u 0.00s 0% 3212k
load: 16.83  cmd: top 61547 [sysctl mem] 20.48r 0.00u 0.00s 0% 3212k
load: 16.83  cmd: top 61547 [sysctl mem] 20.89r 0.00u 0.00s 0% 3212k
load: 16.83  cmd: top 61547 [sysctl mem] 21.59r 0.00u 0.00s 0% 3212k
load: 16.83  cmd: top 61547 [sysctl mem] 22.37r 0.00u 0.00s 0% 3212k
load: 16.83  cmd: top 61547 [sysctl mem] 23.09r 0.00u 0.00s 0% 3212k
load: 16.83  cmd: top 61547 [sysctl mem] 23.82r 0.00u 0.00s 0% 3212k
Comment 7 Mark Johnston freebsd_committer freebsd_triage 2024-11-29 17:15:49 UTC
(In reply to jSML4ThWwBID69YC from comment #6)
This is caused by a different thread blocking within a sysctl handler.  It's hard to say what might be causing that.  (Normally, "procstat -kka" would give a clue there, but that is also implemented by sysctl and thus subject to the same problem, so: https://reviews.freebsd.org/D47842)
Comment 8 jSML4ThWwBID69YC 2024-11-29 17:26:37 UTC
(In reply to Mark Johnston from comment #7)

Thank you. I'm rebuilding the affected kernels with both patches.
Comment 9 jSML4ThWwBID69YC 2024-11-29 18:41:29 UTC
https://reviews.freebsd.org/D47839 is causing kernel panics on jail startup. I'm trying to get a crash dump now.
Comment 10 Mark Johnston freebsd_committer freebsd_triage 2024-11-29 18:53:33 UTC
(In reply to jSML4ThWwBID69YC from comment #9)
It is probably best to hold off on testing that patch for now, until the code review is finished.
Comment 11 jSML4ThWwBID69YC 2024-11-29 19:05:01 UTC
(In reply to Mark Johnston from comment #10)

Here's the kernel panic with that patch applied. 


```
Reading symbols from /boot/kernel/kernel...
Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug...

Unread portion of the kernel message buffer:
panic: Duplicate free of 0xfffff8195a354400 from zone 0xfffffe014be04a80(malloc-1024) slab 0xfffff818838c1c58(1)
cpuid = 11
time = 1732905479
KDB: stack backtrace:
#0 0xffffffff80b7e5ed at kdb_backtrace+0x5d
#1 0xffffffff80b31061 at vpanic+0x131
#2 0xffffffff80b30e83 at panic+0x43
#3 0xffffffff80eb8163 at uma_dbg_free+0x103
#4 0xffffffff80eb0c46 at uma_zfree_arg+0x96
#5 0xffffffff80b048b5 at free+0xb5
#6 0xffffffff80bd1215 at sys_shm_unlink+0xc5
#7 0xffffffff8102fa28 at amd64_syscall+0x158
#8 0xffffffff810026cb at fast_syscall_common+0xf8
Uptime: 5m2s
Dumping 6677 out of 196467 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

Reading symbols from /boot/kernel/accf_dns.ko...
Reading symbols from /usr/lib/debug//boot/kernel/accf_dns.ko.debug...
Reading symbols from /boot/kernel/cryptodev.ko...
Reading symbols from /usr/lib/debug//boot/kernel/cryptodev.ko.debug...
Reading symbols from /boot/kernel/accf_data.ko...
Reading symbols from /usr/lib/debug//boot/kernel/accf_data.ko.debug...
Reading symbols from /boot/kernel/mac_seeotheruids.ko...
Reading symbols from /usr/lib/debug//boot/kernel/mac_seeotheruids.ko.debug...
Reading symbols from /boot/kernel/accf_http.ko...
Reading symbols from /usr/lib/debug//boot/kernel/accf_http.ko.debug...
Reading symbols from /boot/kernel/zfs.ko...
Reading symbols from /usr/lib/debug//boot/kernel/zfs.ko.debug...
Reading symbols from /boot/kernel/geom_mirror.ko...
Reading symbols from /usr/lib/debug//boot/kernel/geom_mirror.ko.debug...
Reading symbols from /boot/kernel/geom_eli.ko...
Reading symbols from /usr/lib/debug//boot/kernel/geom_eli.ko.debug...
Reading symbols from /boot/kernel/coretemp.ko...
Reading symbols from /usr/lib/debug//boot/kernel/coretemp.ko.debug...
Reading symbols from /boot/kernel/mlx4en.ko...
Reading symbols from /usr/lib/debug//boot/kernel/mlx4en.ko.debug...
Reading symbols from /boot/kernel/mlx4.ko...
Reading symbols from /usr/lib/debug//boot/kernel/mlx4.ko.debug...
Reading symbols from /boot/kernel/fusefs.ko...
Reading symbols from /usr/lib/debug//boot/kernel/fusefs.ko.debug...
Reading symbols from /boot/kernel/acpi_wmi.ko...
Reading symbols from /usr/lib/debug//boot/kernel/acpi_wmi.ko.debug...
Reading symbols from /boot/kernel/ioat.ko...
Reading symbols from /usr/lib/debug//boot/kernel/ioat.ko.debug...
Reading symbols from /boot/kernel/if_lagg.ko...
Reading symbols from /usr/lib/debug//boot/kernel/if_lagg.ko.debug...
Reading symbols from /boot/kernel/if_infiniband.ko...
Reading symbols from /usr/lib/debug//boot/kernel/if_infiniband.ko.debug...
Reading symbols from /boot/kernel/if_bridge.ko...
Reading symbols from /usr/lib/debug//boot/kernel/if_bridge.ko.debug...
Reading symbols from /boot/kernel/bridgestp.ko...
Reading symbols from /usr/lib/debug//boot/kernel/bridgestp.ko.debug...
Reading symbols from /boot/kernel/uhid.ko...
Reading symbols from /usr/lib/debug//boot/kernel/uhid.ko.debug...
Reading symbols from /boot/kernel/ums.ko...
Reading symbols from /usr/lib/debug//boot/kernel/ums.ko.debug...
Reading symbols from /boot/kernel/wmt.ko...
Reading symbols from /usr/lib/debug//boot/kernel/wmt.ko.debug...
Reading symbols from /boot/kernel/mac_ntpd.ko...
Reading symbols from /usr/lib/debug//boot/kernel/mac_ntpd.ko.debug...
Reading symbols from /boot/kernel/nullfs.ko...
Reading symbols from /usr/lib/debug//boot/kernel/nullfs.ko.debug...
Reading symbols from /boot/kernel/if_epair.ko...
Reading symbols from /usr/lib/debug//boot/kernel/if_epair.ko.debug...
__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
57		__asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,

(kgdb) frame 8
#8  0xffffffff80b048b5 in _free (addr=0xfffff8195a354400, addr@entry=0xffffffff818e5fe0 <M_SHMFD>, mtp=0xffffffff818e5fe0 <M_SHMFD>, dozero=false) at /usr/src/sys/kern/kern_malloc.c:955
955			uma_zfree_arg(zone, addr, slab);
(kgdb) p vfs_hash_tbl
$1 = (struct vfs_hash_head *) 0xfffffe0140280000
(kgdb) p mp
No symbol "mp" in current context.
(kgdb)  p hash
No symbol "hash" in current context.
(kgdb) p *mp
No symbol "mp" in current context.
(kgdb) 
```

I'm more than happy to test any patches, experimental or not. For now, I'll try disabling the rctl swap rules to see if that helps.
Comment 12 commit-hook freebsd_committer freebsd_triage 2024-12-04 18:36:30 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=f3b7dbdad53b31492757417fc1336ed74ec80fd8

commit f3b7dbdad53b31492757417fc1336ed74ec80fd8
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2024-12-04 01:04:33 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2024-12-04 16:22:50 +0000

    shm: Handle swap pager allocation failures

    shm_alloc() can fail if swap reservation fails (i.e., vm.overcommit is
    non-zero) or racct is imposing some limits on swap usage.

    PR:             282994
    MFC after:      2 weeks
    Reviewed by:    olce, kib
    Differential Revision:  https://reviews.freebsd.org/D47839

 sys/kern/kern_umtx.c |  8 +++++-
 sys/kern/uipc_shm.c  | 80 ++++++++++++++++++++++++++++++++--------------------
 2 files changed, 57 insertions(+), 31 deletions(-)
Comment 13 Mark Johnston freebsd_committer freebsd_triage 2024-12-04 20:18:46 UTC
(In reply to jSML4ThWwBID69YC from comment #9)
If you could apply the committed patch and the sysctl patch, then try to get "procstat -kka" output as root while the system is apparently hanging, we can try to further diagnose the cause of the hang.
Comment 14 jSML4ThWwBID69YC 2024-12-06 01:26:21 UTC
(In reply to Mark Johnston from comment #13)

In progress. Thank you.
Comment 15 jSML4ThWwBID69YC 2024-12-13 00:50:12 UTC
The systems are no longer crashing with https://reviews.freebsd.org/D47839 applied. This is on 14.2-RELEASE with a patched generic kernel.

Thank you
Comment 16 commit-hook freebsd_committer freebsd_triage 2024-12-18 13:47:22 UTC
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=9daf6d64192281f8f438d9df770927d2e599a25c

commit 9daf6d64192281f8f438d9df770927d2e599a25c
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2024-12-04 01:04:33 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2024-12-18 13:43:56 +0000

    shm: Handle swap pager allocation failures

    shm_alloc() can fail if swap reservation fails (i.e., vm.overcommit is
    non-zero) or racct is imposing some limits on swap usage.

    PR:             282994
    MFC after:      2 weeks
    Reviewed by:    olce, kib
    Differential Revision:  https://reviews.freebsd.org/D47839

    (cherry picked from commit f3b7dbdad53b31492757417fc1336ed74ec80fd8)

 sys/kern/kern_umtx.c |  8 +++++-
 sys/kern/uipc_shm.c  | 80 ++++++++++++++++++++++++++++++++--------------------
 2 files changed, 57 insertions(+), 31 deletions(-)
Comment 17 commit-hook freebsd_committer freebsd_triage 2024-12-21 19:27:03 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=7d1d9cc440f800858b6ec8dfb5a41c853fc8c36d

commit 7d1d9cc440f800858b6ec8dfb5a41c853fc8c36d
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2024-12-21 19:25:32 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2024-12-21 19:25:32 +0000

    sysctl: Do not serialize requests when running as root

    Bugs or unexpected behaviour can cause a user thread to block in a
    sysctl handler for a long time.  "procstat -kka" is the most useful tool
    to see why this might happen, but it can block on sysctlmemlock too.

    Since the purpose of this lock is merely to ensure userspace can't wire
    too much memory, don't require it for requests from privileged threads.

    PR:             282994
    Reviewed by:    kib, jhb
    MFC after:      2 weeks
    Differential Revision:  https://reviews.freebsd.org/D47842

 sys/kern/kern_sysctl.c | 10 ++++++----
 sys/sys/priv.h         |  1 +
 2 files changed, 7 insertions(+), 4 deletions(-)
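
The idea in the commit message above can be sketched as a small userland model. This is not the code from kern_sysctl.c: `model_memlock`, `model_needs_memlock()`, `model_sysctl_request()` and the `threshold` parameter are hypothetical, and the assumption that only requests with large output buffers take the lock is mine, based on the stated purpose of limiting wired memory. The point it illustrates is that a privileged request skips the serializing lock, so it can make progress while another thread is stuck holding it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum model_result { MODEL_OK, MODEL_WOULD_BLOCK };

/* Hypothetical stand-in for the global lock that serializes requests. */
struct model_memlock {
	bool held;	/* true while some other thread sits in a handler */
};

/*
 * Decide whether a request must take the serializing lock.
 * "threshold" is an assumed cutoff on the output buffer size.
 */
static bool
model_needs_memlock(size_t oldlen, size_t threshold, bool privileged)
{
	if (privileged)
		return (false);	/* privileged threads are exempt */
	return (oldlen > threshold);
}

/* Report whether a request would proceed or wait behind the lock holder. */
static enum model_result
model_sysctl_request(const struct model_memlock *lock, size_t oldlen,
    size_t threshold, bool privileged)
{
	assert(lock != NULL);
	if (model_needs_memlock(oldlen, threshold, privileged) && lock->held)
		return (MODEL_WOULD_BLOCK);
	return (MODEL_OK);
}
```

In this model, while the lock is held, an unprivileged request with a large buffer reports `MODEL_WOULD_BLOCK`, whereas the same request from a privileged caller returns `MODEL_OK`. That mirrors the goal of letting "procstat -kka" run as root during a hang.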
Comment 18 Mark Johnston freebsd_committer freebsd_triage 2024-12-24 16:35:24 UTC
(In reply to jSML4ThWwBID69YC from comment #15)
The sysctl patch is now also committed, so we can try again to debug those hangs using "procstat -kka" (as root).  Since that's a separate issue from the crash, I'll close this bug report - please open a new one to investigate the hangs.
Comment 19 commit-hook freebsd_committer freebsd_triage 2025-01-04 14:08:24 UTC
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=eecdd412ff5b9f2462b9fbad700e301fa420002e

commit eecdd412ff5b9f2462b9fbad700e301fa420002e
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2024-12-21 19:25:32 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2025-01-04 13:56:59 +0000

    sysctl: Do not serialize requests when running as root

    Bugs or unexpected behaviour can cause a user thread to block in a
    sysctl handler for a long time.  "procstat -kka" is the most useful tool
    to see why this might happen, but it can block on sysctlmemlock too.

    Since the purpose of this lock is merely to ensure userspace can't wire
    too much memory, don't require it for requests from privileged threads.

    PR:             282994
    Reviewed by:    kib, jhb
    MFC after:      2 weeks
    Differential Revision:  https://reviews.freebsd.org/D47842

    (cherry picked from commit 7d1d9cc440f800858b6ec8dfb5a41c853fc8c36d)

 sys/kern/kern_sysctl.c | 10 ++++++----
 sys/sys/priv.h         |  1 +
 2 files changed, 7 insertions(+), 4 deletions(-)