Fatal trap 9: general protection fault while in kernel mode
cpuid = 0; apic id = 00
instruction pointer = 0x20:0xffffffff80c4bff0
stack pointer = 0x28:0xfffffe0013690870
frame pointer = 0x28:0xfffffe0013690870
code segment = base rx0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 41551 (copy_db)
trap number = 9
panic: general protection fault
cpuid = 0
time = 1602757996
KDB: stack backtrace:
#0 0xffffffff80c1d297 at kdb_backtrace+0x67
#1 0xffffffff80bd05cd at vpanic+0x19d
#2 0xffffffff80bd0423 at panic+0x43
#3 0xffffffff810a7d2c at trap_fatal+0x39c
#4 0xffffffff810a713c at trap+0x6c
#5 0xffffffff81081a0c at calltrap+0x8
#6 0xffffffff80c4bb99 at sys_semop+0x729
#7 0xffffffff810a88e4 at amd64_syscall+0x364
#8 0xffffffff81082330 at fast_syscall_common+0x101

Backtrace from kgdb:

(kgdb) bt
#0 __curthread () at /usr/src/sys/amd64/include/pcpu.h:234
#1 doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:371
#2 0xffffffff80bd01c8 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:451
#3 0xffffffff80bd0629 in vpanic (fmt=<optimized out>, ap=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:877
#4 0xffffffff80bd0423 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:804
#5 0xffffffff810a7d2c in trap_fatal (frame=0xfffffe00136907b0, eva=0) at /usr/src/sys/amd64/amd64/trap.c:943
#6 0xffffffff810a713c in trap (frame=0xfffffe00136907b0) at /usr/src/sys/amd64/amd64/trap.c:221
#7 <signal handler called>
#8 0xffffffff80c4bff0 in semu_alloc (td=<optimized out>) at /usr/src/sys/kern/sysv_sem.c:420
#9 semundo_adjust (td=0xfffff8000fe17000, supptr=0xfffffe00136908e0, semid=1, semseq=1, semnum=0, adjval=1) at /usr/src/sys/kern/sysv_sem.c:468
#10 0xffffffff80c4bb99 in sys_semop (td=0xfffff8000fe17000, uap=<optimized out>) at /usr/src/sys/kern/sysv_sem.c:1337
#11 0xffffffff810a88e4 in syscallenter (td=0xfffff8000fe17000) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:135
#12 amd64_syscall (td=0xfffff8000fe17000, traced=0) at /usr/src/sys/amd64/amd64/trap.c:1186
#13 <signal handler called>

If worth mentioning, my /boot/loader.conf contains

kern.ipc.semopm=300
kern.ipc.semume=500

System is running in a VMWare instance, in case this is relevant.
Can you provide a minimal reproducer for the issue?
Hi, I've tried to create something that replicates the behavior, but unfortunately I haven't had any luck with it. What happens internally is that a main process forks off roughly 50 smaller processes that need to do system maintenance on 50 database files, and the fault occurs when they attach to the shared memory.
Without a reproducer I cannot say anything. Perhaps try on 12.2; there was a fix that might be relevant, r358242 (MFC of r357984).
Fair enough. Is there a way to find out what the calling process was actually doing to cause this? kgdb only gives me the kernel fault, but doesn't give me anything on the state of the calling process.
You can try to do something with e.g. ktrace, but this would be hard because the system panics and the records are not written. A sync NFS mount from another machine might help, but I do not expect it to. So are you able to reproduce it at will, even with a complex scenario? Try 12.2 or HEAD; you can install only the kernel. Perhaps enable INVARIANTS when doing so.
I can reproduce it with relative ease; I'm on 7 vmcore files so far. Updating now to 12.2-RC2.
I suspect I figured it out; please try the patch from https://reviews.freebsd.org/D26826. That said, I am curious why you need to adjust semume.
Hi, Thanks, though I did not receive the panic this morning after upgrading to 12.2, will check again tomorrow. If this fault still persists I'll patch in your suggestion. I needed to increase SEMUME as some processes were complaining they could not semget (EINVAL)
(In reply to Olef from comment #8) I am quite sure that the issue I described in the review is real, and since it is a memory-corruption kind of bug, when and how it manifests itself is quite specific to the kernel, machine, and load. I suggest you add the patch to your kernel and manually run the procedure that caused the panic several times.
I will. Would it also manifest itself in 12.2, or shall I create a new VM?
(In reply to Olef from comment #10) The issue that the patch fixes is in HEAD, stable/12, and all 12.x releases. But since it is memory corruption, its specific manifestation can be arbitrary; for instance, you might get data corruption instead of a panic. Yes, you can test with a 12.2 VM.
Hi, So, in 12.2-RC2 I indeed still got the kernel panics after the initial upgrade. After patching the kernel I have not seen them again in the last 3 days, so all seems to work fine. Thanks for your help! PS: Would increasing SEMUSZ also have done the trick?
(In reply to Olef from comment #12) Yes, increasing kern.ipc.semusz would also help, but you need to carefully calculate how large to set it. For instance, it is arch-dependent.
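To illustrate why the value has to be calculated and why it is arch-dependent, here is a minimal sketch of the relationship between the two tunables. It assumes each per-process undo slot is a fixed header followed by semume undo entries. The struct names and field layout below (undo_entry_sketch, sem_undo_sketch) and the helpers semusz_for() and undo_slot_fits() are simplified, hypothetical stand-ins, not the actual definitions from sysv_sem.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one undo entry; the field layout is illustrative. */
struct undo_entry_sketch {
	short          un_adjval;  /* adjustment to apply on process exit */
	short          un_num;     /* semaphore number within the set */
	int            un_id;      /* semaphore set id */
	unsigned short un_seq;     /* sequence number of the set */
};

/* Hypothetical stand-in for the per-process undo slot: header plus entries. */
struct sem_undo_sketch {
	void  *un_next;            /* list linkage (simplified) */
	void  *un_proc;            /* owning process (simplified) */
	short  un_cnt;             /* number of entries in use */
	struct undo_entry_sketch un_ent[1];  /* really semume entries */
};

/*
 * Bytes needed for one undo slot that can hold `semume` entries.
 * Pointer size and padding make the result differ between architectures.
 */
static size_t
semusz_for(size_t semume)
{
	assert(semume > 0);
	return offsetof(struct sem_undo_sketch, un_ent) +
	    semume * sizeof(struct undo_entry_sketch);
}

/* Returns 1 if a slot of `semusz` bytes can hold `semume` entries, else 0. */
static int
undo_slot_fits(size_t semusz, size_t semume)
{
	return semusz >= semusz_for(semume);
}
```

Under these assumptions, a slot sized for a small entry count no longer fits once semume is raised to 500 without semusz being raised to match, so writes to the later entries would land past the end of the slot. That is the kind of out-of-bounds write that can show up as an arbitrary panic or as silent data corruption.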
A commit references this bug:

Author: kib
Date: Thu Oct 22 09:28:12 UTC 2020
New revision: 366932
URL: https://svnweb.freebsd.org/changeset/base/366932

Log:
  sysv_sem: semusz depends on semume.

  Size of the per-process semaphore undo structure (semusz) depends on the number of the per-process undos. If kern.ipc.semume is adjusted, semusz must be adjusted as well, and it makes no sense to delegate adjustment to user. Make it automatic.

  Reported and tested by: Olef <o.vandestadt@gmail.com>
  PR: 250361
  Reviewed by: jhb, markj
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Differential revision: https://reviews.freebsd.org/D26826

Changes:
  head/sys/kern/sysv_sem.c
A commit references this bug:

Author: kib
Date: Thu Oct 29 11:09:48 UTC 2020
New revision: 367128
URL: https://svnweb.freebsd.org/changeset/base/367128

Log:
  MFC r366932:
  sysv_sem: semusz depends on semume.

  PR: 250361

Changes:
  _U stable/12/
  stable/12/sys/kern/sysv_sem.c
A commit references this bug:

Author: kib
Date: Thu Oct 29 11:19:48 UTC 2020
New revision: 367129
URL: https://svnweb.freebsd.org/changeset/base/367129

Log:
  MFC r366932:
  sysv_sem: semusz depends on semume.

  PR: 250361

Changes:
  _U stable/11/
  stable/11/sys/kern/sysv_sem.c
FYI, I was able to reproduce this problem on an Intel NUC; applying the patch solved the problem.