Summary: | Repeated kernel panics | ||
---|---|---|---|
Product: | Base System | Reporter: | jSML4ThWwBID69YC |
Component: | kern | Assignee: | Mark Johnston <markj> |
Status: | Closed FIXED | ||
Severity: | Affects Some People | CC: | grahamperrin, jSML4ThWwBID69YC, markj, pat |
Priority: | --- | Keywords: | crash |
Version: | 14.2-RELEASE | ||
Hardware: | amd64 | ||
OS: | Any | ||

Description
jSML4ThWwBID69YC
2024-11-27 00:01:51 UTC
Adding to this, I noticed that affected systems show other signs of trouble.

1. The 'top' command takes thirty to sixty seconds to launch, or longer in some cases.
2. Nullfs filesystem mounts take several minutes to apply. They used to happen in the blink of an eye.

The systems have plenty of free memory, CPU, and low disk I/O. I suspect the cause of the slowdown is not resource related, but something else in the kernel.

Of note, the affected systems are running two different CPU types. Two are Intel, and two are AMD Epyc.

(In reply to jSML4ThWwBID69YC from comment #0)
> 1. They all access data storage over MooseFS.
> 2. They all use fusefs-libs3-3.16.2 and moosefs3-client-3.0.117_1
> …

Noted: FreeBSD ports packaging for MooseFS 4x · Issue #592 · moosefs/moosefs <https://github.com/moosefs/moosefs/issues/592>

> 5. They all use RCTL to control resources used by users.

The problem is that shm_alloc() assumes that object allocation won't fail, but this is false:
- racct rules which restrict swap usage can cause swap_reserve_by_cred() to fail;
- some overcommit modes limit swap reservations such that swap_reserve_by_cred() can fail.

I suspect that changing your racct rules will work around the problem. RACCT_SWAP works by limiting the maximum number and size of swap-backed objects, rather than swap device usage. In particular, it treats swap object allocation as a "reservation" and that counts against the limit before swap space is actually used. I'm a bit skeptical that this implementation is very useful.

I wrote a patch which prevents the crash, but note that you'll get shm object allocation failures instead, which might cause all kinds of problems: https://reviews.freebsd.org/D47839

(In reply to jSML4ThWwBID69YC from comment #1)
Changing your racct rules might address these problems as well.

(In reply to Mark Johnston from comment #3)
Thank you! There are rctl rules in place for every user account, exempting root and system services. Here's an example.
# USERID < replace with actual uid.
user:USERID:pcpu:deny=100/user
user:USERID:maxproc:deny=50/user
user:USERID:memoryuse:deny=1024MB/user
user:USERID:swapuse:deny=1024MB/user
user:USERID:readbps:throttle=200MB/user
user:USERID:writebps:throttle=200MB/user

Are the other symptoms also due to this? These happen randomly even when running as root with no rctl rules applied.
- top takes thirty seconds to several minutes to load.
- ps -aux takes several minutes to return results.
- Nullfs mounts take several minutes to apply.

The above issues started at the same time as the panics. All the affected systems are running 14.2-RC1. They were previously running 14.1-p6 with the same issue. My guess was it had something to do with FreeBSD-SA-24:14.umtx.

> I wrote a patch which prevents the crash, but note that you'll get shm object allocation failures instead, which might cause all kinds of problems: https://reviews.freebsd.org/D47839

`all kinds of problems` sounds scary. I'm getting the impression that rctl is no longer safe to use, but what other options are there for resource control? Is it just the swapuse that's an issue? I'll try pulling the swap rules and see if it changes anything.

> `all kinds of problems` sounds scary.

Well, I just meant that without the patch, hitting the swapuse limit could cause a panic; with the patch, it'll cause shm_alloc() to fail. In the backtrace you pasted, the running process was php, so with the patch applied, either some php script or the runtime would have hit an error instead of triggering a panic. Maybe it can tolerate such errors just fine, but I don't know - that's all I meant by "all kinds of problems".

> These happen randomly even when running as root with no rctl rules applied.
> - top takes thirty seconds to several minutes to load.
> - ps -aux takes several minutes to return results.
> - Nullfs mounts take several minutes to apply.

Are you sure that no rctl rules are applied there?
If you send ctrl-T to the terminal while top/ps/mount are running, what gets printed?

(In reply to Mark Johnston from comment #5)
A user-land program crashing because of not having enough resources is acceptable. I can always add more resources when needed. It's a hosting environment with non-trusted users, so resource controls are pretty important to maintain system stability.

The rctl rules only apply to user ids 5000+. Everything below that is not covered by rctl.

Here's an example of top being slow to launch.

root@compute03:~ # top
load: 18.74 cmd: top 61547 [sysctl mem] 2.25r 0.00u 0.00s 0% 3212k
load: 18.12 cmd: top 61547 [sysctl mem] 3.21r 0.00u 0.00s 0% 3212k
load: 18.12 cmd: top 61547 [sysctl mem] 3.77r 0.00u 0.00s 0% 3212k
load: 18.12 cmd: top 61547 [sysctl mem] 4.54r 0.00u 0.00s 0% 3212k
load: 18.12 cmd: top 61547 [sysctl mem] 4.88r 0.00u 0.00s 0% 3212k
load: 18.12 cmd: top 61547 [sysctl mem] 5.40r 0.00u 0.00s 0% 3212k
load: 18.12 cmd: top 61547 [sysctl mem] 6.35r 0.00u 0.00s 0% 3212k
load: 18.12 cmd: top 61547 [sysctl mem] 6.57r 0.00u 0.00s 0% 3212k
load: 18.12 cmd: top 61547 [sysctl mem] 7.15r 0.00u 0.00s 0% 3212k
load: 17.63 cmd: top 61547 [sysctl mem] 9.09r 0.00u 0.00s 0% 3212k
load: 17.63 cmd: top 61547 [sysctl mem] 9.34r 0.00u 0.00s 0% 3212k
load: 17.63 cmd: top 61547 [sysctl mem] 9.80r 0.00u 0.00s 0% 3212k
load: 17.63 cmd: top 61547 [sysctl mem] 10.13r 0.00u 0.00s 0% 3212k
load: 17.63 cmd: top 61547 [sysctl mem] 10.69r 0.00u 0.00s 0% 3212k
load: 17.63 cmd: top 61547 [sysctl mem] 11.16r 0.00u 0.00s 0% 3212k
load: 17.63 cmd: top 61547 [sysctl mem] 11.82r 0.00u 0.00s 0% 3212k
load: 17.63 cmd: top 61547 [sysctl mem] 12.06r 0.00u 0.00s 0% 3212k
load: 17.63 cmd: top 61547 [sysctl mem] 12.42r 0.00u 0.00s 0% 3212k
load: 17.63 cmd: top 61547 [sysctl mem] 12.98r 0.00u 0.00s 0% 3212k
load: 17.34 cmd: top 61547 [sysctl mem] 13.61r 0.00u 0.00s 0% 3212k
load: 17.34 cmd: top 61547 [sysctl mem] 14.09r 0.00u 0.00s 0% 3212k
load: 17.34 cmd: top 61547 [sysctl mem] 14.57r 0.00u 0.00s 0% 3212k
load: 17.34 cmd: top 61547 [sysctl mem] 15.84r 0.00u 0.00s 0% 3212k
load: 17.34 cmd: top 61547 [sysctl mem] 18.32r 0.00u 0.00s 0% 3212k
load: 17.34 cmd: top 61547 [sysctl mem] 18.61r 0.00u 0.00s 0% 3212k
load: 16.83 cmd: top 61547 [sysctl mem] 19.69r 0.00u 0.00s 0% 3212k
load: 16.83 cmd: top 61547 [sysctl mem] 20.07r 0.00u 0.00s 0% 3212k
load: 16.83 cmd: top 61547 [sysctl mem] 20.48r 0.00u 0.00s 0% 3212k
load: 16.83 cmd: top 61547 [sysctl mem] 20.89r 0.00u 0.00s 0% 3212k
load: 16.83 cmd: top 61547 [sysctl mem] 21.59r 0.00u 0.00s 0% 3212k
load: 16.83 cmd: top 61547 [sysctl mem] 22.37r 0.00u 0.00s 0% 3212k
load: 16.83 cmd: top 61547 [sysctl mem] 23.09r 0.00u 0.00s 0% 3212k
load: 16.83 cmd: top 61547 [sysctl mem] 23.82r 0.00u 0.00s 0% 3212k

(In reply to jSML4ThWwBID69YC from comment #6)
This is caused by a different thread blocking within a sysctl handler. It's hard to say what might be causing that. (Normally, "procstat -kka" would give a clue there, but that is also implemented by sysctl and thus subject to the same problem, so: https://reviews.freebsd.org/D47842)

(In reply to Mark Johnston from comment #7)
Thank you. I'm rebuilding the affected kernels with both patches.

https://reviews.freebsd.org/D47839 is causing kernel panics on jail startup. I'm trying to get a crash dump now.

(In reply to jSML4ThWwBID69YC from comment #9)
It is probably best to hold off on testing that patch for now, until the code review is finished.

(In reply to Mark Johnston from comment #10)
Here's the kernel panic with that patch applied.

```
Reading symbols from /boot/kernel/kernel...
Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug...

Unread portion of the kernel message buffer:
panic: Duplicate free of 0xfffff8195a354400 from zone 0xfffffe014be04a80(malloc-1024) slab 0xfffff818838c1c58(1)
cpuid = 11
time = 1732905479
KDB: stack backtrace:
#0 0xffffffff80b7e5ed at kdb_backtrace+0x5d
#1 0xffffffff80b31061 at vpanic+0x131
#2 0xffffffff80b30e83 at panic+0x43
#3 0xffffffff80eb8163 at uma_dbg_free+0x103
#4 0xffffffff80eb0c46 at uma_zfree_arg+0x96
#5 0xffffffff80b048b5 at free+0xb5
#6 0xffffffff80bd1215 at sys_shm_unlink+0xc5
#7 0xffffffff8102fa28 at amd64_syscall+0x158
#8 0xffffffff810026cb at fast_syscall_common+0xf8
Uptime: 5m2s
Dumping 6677 out of 196467 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

Reading symbols from /boot/kernel/accf_dns.ko...
Reading symbols from /usr/lib/debug//boot/kernel/accf_dns.ko.debug...
Reading symbols from /boot/kernel/cryptodev.ko...
Reading symbols from /usr/lib/debug//boot/kernel/cryptodev.ko.debug...
Reading symbols from /boot/kernel/accf_data.ko...
Reading symbols from /usr/lib/debug//boot/kernel/accf_data.ko.debug...
Reading symbols from /boot/kernel/mac_seeotheruids.ko...
Reading symbols from /usr/lib/debug//boot/kernel/mac_seeotheruids.ko.debug...
Reading symbols from /boot/kernel/accf_http.ko...
Reading symbols from /usr/lib/debug//boot/kernel/accf_http.ko.debug...
Reading symbols from /boot/kernel/zfs.ko...
Reading symbols from /usr/lib/debug//boot/kernel/zfs.ko.debug...
Reading symbols from /boot/kernel/geom_mirror.ko...
Reading symbols from /usr/lib/debug//boot/kernel/geom_mirror.ko.debug...
Reading symbols from /boot/kernel/geom_eli.ko...
Reading symbols from /usr/lib/debug//boot/kernel/geom_eli.ko.debug...
Reading symbols from /boot/kernel/coretemp.ko...
Reading symbols from /usr/lib/debug//boot/kernel/coretemp.ko.debug...
Reading symbols from /boot/kernel/mlx4en.ko...
Reading symbols from /usr/lib/debug//boot/kernel/mlx4en.ko.debug...
Reading symbols from /boot/kernel/mlx4.ko...
Reading symbols from /usr/lib/debug//boot/kernel/mlx4.ko.debug...
Reading symbols from /boot/kernel/fusefs.ko...
Reading symbols from /usr/lib/debug//boot/kernel/fusefs.ko.debug...
Reading symbols from /boot/kernel/acpi_wmi.ko...
Reading symbols from /usr/lib/debug//boot/kernel/acpi_wmi.ko.debug...
Reading symbols from /boot/kernel/ioat.ko...
Reading symbols from /usr/lib/debug//boot/kernel/ioat.ko.debug...
Reading symbols from /boot/kernel/if_lagg.ko...
Reading symbols from /usr/lib/debug//boot/kernel/if_lagg.ko.debug...
Reading symbols from /boot/kernel/if_infiniband.ko...
Reading symbols from /usr/lib/debug//boot/kernel/if_infiniband.ko.debug...
Reading symbols from /boot/kernel/if_bridge.ko...
Reading symbols from /usr/lib/debug//boot/kernel/if_bridge.ko.debug...
Reading symbols from /boot/kernel/bridgestp.ko...
--Type <RET> for more, q to quit, c to continue without paging--
Reading symbols from /usr/lib/debug//boot/kernel/bridgestp.ko.debug...
Reading symbols from /boot/kernel/uhid.ko...
Reading symbols from /usr/lib/debug//boot/kernel/uhid.ko.debug...
Reading symbols from /boot/kernel/ums.ko...
Reading symbols from /usr/lib/debug//boot/kernel/ums.ko.debug...
Reading symbols from /boot/kernel/wmt.ko...
Reading symbols from /usr/lib/debug//boot/kernel/wmt.ko.debug...
Reading symbols from /boot/kernel/mac_ntpd.ko...
Reading symbols from /usr/lib/debug//boot/kernel/mac_ntpd.ko.debug...
Reading symbols from /boot/kernel/nullfs.ko...
Reading symbols from /usr/lib/debug//boot/kernel/nullfs.ko.debug...
Reading symbols from /boot/kernel/if_epair.ko...
Reading symbols from /usr/lib/debug//boot/kernel/if_epair.ko.debug...
__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
57 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) frame 8
#8 0xffffffff80b048b5 in _free (addr=0xfffff8195a354400, addr@entry=0xffffffff818e5fe0 <M_SHMFD>, mtp=0xffffffff818e5fe0 <M_SHMFD>, dozero=false) at /usr/src/sys/kern/kern_malloc.c:955
955 uma_zfree_arg(zone, addr, slab);
(kgdb) p vfs_hash_tbl
$1 = (struct vfs_hash_head *) 0xfffffe0140280000
(kgdb) p mp
No symbol "mp" in current context.
(kgdb) p hash
No symbol "hash" in current context.
(kgdb) p *mp
No symbol "mp" in current context.
(kgdb)
```

I'm more than happy to test any patches, experimental or not. For now, I'll try disabling the rctl swap rules to see if that helps.

A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=f3b7dbdad53b31492757417fc1336ed74ec80fd8

commit f3b7dbdad53b31492757417fc1336ed74ec80fd8
Author: Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2024-12-04 01:04:33 +0000
Commit: Mark Johnston <markj@FreeBSD.org>
CommitDate: 2024-12-04 16:22:50 +0000

shm: Handle swap pager allocation failures

shm_alloc() can fail if swap reservation fails (i.e., vm.overcommit is non-zero) or racct is imposing some limits on swap usage.

PR: 282994
MFC after: 2 weeks
Reviewed by: olce, kib
Differential Revision: https://reviews.freebsd.org/D47839

sys/kern/kern_umtx.c | 8 +++++-
sys/kern/uipc_shm.c | 80 ++++++++++++++++++++++++++++++++--------------------
2 files changed, 57 insertions(+), 31 deletions(-)

(In reply to jSML4ThWwBID69YC from comment #9)
If you could apply the committed patch and the sysctl patch, then try to get "procstat -kka" output as root while the system is apparently hanging, we can try to further diagnose the cause of the hang.

(In reply to Mark Johnston from comment #13)
In progress. Thank you.

The systems are no longer crashing with https://reviews.freebsd.org/D47839 applied. This is on 14.2-RELEASE with a patched generic kernel.
Thank you

A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=9daf6d64192281f8f438d9df770927d2e599a25c

commit 9daf6d64192281f8f438d9df770927d2e599a25c
Author: Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2024-12-04 01:04:33 +0000
Commit: Mark Johnston <markj@FreeBSD.org>
CommitDate: 2024-12-18 13:43:56 +0000

shm: Handle swap pager allocation failures

shm_alloc() can fail if swap reservation fails (i.e., vm.overcommit is non-zero) or racct is imposing some limits on swap usage.

PR: 282994
MFC after: 2 weeks
Reviewed by: olce, kib
Differential Revision: https://reviews.freebsd.org/D47839

(cherry picked from commit f3b7dbdad53b31492757417fc1336ed74ec80fd8)

sys/kern/kern_umtx.c | 8 +++++-
sys/kern/uipc_shm.c | 80 ++++++++++++++++++++++++++++++++--------------------
2 files changed, 57 insertions(+), 31 deletions(-)

A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=7d1d9cc440f800858b6ec8dfb5a41c853fc8c36d

commit 7d1d9cc440f800858b6ec8dfb5a41c853fc8c36d
Author: Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2024-12-21 19:25:32 +0000
Commit: Mark Johnston <markj@FreeBSD.org>
CommitDate: 2024-12-21 19:25:32 +0000

sysctl: Do not serialize requests when running as root

Bugs or unexpected behaviour can cause a user thread to block in a sysctl handler for a long time. "procstat -kka" is the most useful tool to see why this might happen, but it can block on sysctlmemlock too.

Since the purpose of this lock is merely to ensure userspace can't wire too much memory, don't require it for requests from privileged threads.

PR: 282994
Reviewed by: kib, jhb
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D47842

sys/kern/kern_sysctl.c | 10 ++++++----
sys/sys/priv.h | 1 +
2 files changed, 7 insertions(+), 4 deletions(-)

(In reply to jSML4ThWwBID69YC from comment #15)
The sysctl patch is now also committed, so we can try again to debug those hangs using "procstat -kka" (as root).
Since that's a separate issue from the crash, I'll close this bug report - please open a new one to investigate the hangs.

A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=eecdd412ff5b9f2462b9fbad700e301fa420002e

commit eecdd412ff5b9f2462b9fbad700e301fa420002e
Author: Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2024-12-21 19:25:32 +0000
Commit: Mark Johnston <markj@FreeBSD.org>
CommitDate: 2025-01-04 13:56:59 +0000

sysctl: Do not serialize requests when running as root

Bugs or unexpected behaviour can cause a user thread to block in a sysctl handler for a long time. "procstat -kka" is the most useful tool to see why this might happen, but it can block on sysctlmemlock too.

Since the purpose of this lock is merely to ensure userspace can't wire too much memory, don't require it for requests from privileged threads.

PR: 282994
Reviewed by: kib, jhb
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D47842

(cherry picked from commit 7d1d9cc440f800858b6ec8dfb5a41c853fc8c36d)

sys/kern/kern_sysctl.c | 10 ++++++----
sys/sys/priv.h | 1 +
2 files changed, 7 insertions(+), 4 deletions(-)
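
Editorial note for anyone opening the follow-up report about the hangs: the ctrl-T (SIGINFO) status lines pasted earlier in this report can be read mechanically. The sketch below is a minimal, unofficial Python parser. The field layout is inferred only from the samples in this report, the names `Status`, `parse_status` and `looks_blocked` are hypothetical, and treating the trailing `3212k` as resident memory is an assumption. The `sysctl mem` wait channel appears to match the sysctlmemlock named in the commit message above, but that link is an inference, not something the output itself states.

```python
import re
from typing import NamedTuple, Optional, Sequence

# Layout inferred from the samples in this report (an assumption, not a spec):
#   load: 18.74 cmd: top 61547 [sysctl mem] 2.25r 0.00u 0.00s 0% 3212k
SIGINFO_RE = re.compile(
    r"load:\s+(?P<load>\d+\.\d+)\s+"
    r"cmd:\s+(?P<cmd>\S+)\s+(?P<pid>\d+)\s+"
    r"\[(?P<wchan>[^\]]*)\]\s+"
    r"(?P<real>\d+\.\d+)r\s+(?P<user>\d+\.\d+)u\s+(?P<sys>\d+\.\d+)s\s+"
    r"(?P<pcpu>\d+)%\s+(?P<rss_kb>\d+)k"
)


class Status(NamedTuple):
    load: float    # system load average
    cmd: str       # command name
    pid: int       # process id
    wchan: str     # text inside the brackets (wait channel or state)
    real: float    # elapsed wall-clock seconds
    user: float    # user CPU seconds
    sys: float     # system CPU seconds
    pcpu: int      # CPU percentage
    rss_kb: int    # assumed to be resident memory in kilobytes


def parse_status(line: str) -> Optional[Status]:
    """Parse one status line; return None if the line does not match."""
    m = SIGINFO_RE.search(line)
    if m is None:
        return None
    return Status(
        load=float(m["load"]), cmd=m["cmd"], pid=int(m["pid"]),
        wchan=m["wchan"], real=float(m["real"]), user=float(m["user"]),
        sys=float(m["sys"]), pcpu=int(m["pcpu"]), rss_kb=int(m["rss_kb"]),
    )


def looks_blocked(samples: Sequence[Status]) -> bool:
    """True if wall-clock time advances while CPU time stays flat
    and every sample reports the same wait channel."""
    if len(samples) < 2:
        return False
    first, last = samples[0], samples[-1]
    same_wchan = len({s.wchan for s in samples}) == 1
    cpu_flat = (last.user + last.sys) == (first.user + first.sys)
    return same_wchan and cpu_flat and last.real > first.real


if __name__ == "__main__":
    # First and last samples from the top session in this report.
    lines = [
        "load: 18.74 cmd: top 61547 [sysctl mem] 2.25r 0.00u 0.00s 0% 3212k",
        "load: 16.83 cmd: top 61547 [sysctl mem] 23.82r 0.00u 0.00s 0% 3212k",
    ]
    samples = [s for s in map(parse_status, lines) if s is not None]
    print(samples[0].wchan, looks_blocked(samples))
```

In the top session above, every sample shows the same `sysctl mem` wait channel with zero user and system time while the real-time counter climbs from 2.25 to 23.82 seconds, which is the pattern of a thread sleeping on a lock rather than doing work.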