First time this happened to us:

Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03
fault virtual address = 0x20
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80c5c6c0
stack pointer = 0x28:0xfffffe0195222df0
frame pointer = 0x28:0xfffffe0195222e00
code segment = base rx0, limit 0xfffff, type 0x1b
 = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = resume, IOPL = 0
current process = 955 (pf purge)
trap number = 12
panic: page fault
cpuid = 3
time = 1762530441
KDB: stack backtrace:
#0 0xffffffff80c43a35 at kdb_backtrace+0x65
#1 0xffffffff80bf7162 at vpanic+0x182
#2 0xffffffff80bf6fd3 at panic+0x43
#3 0xffffffff810b5169 at trap_fatal+0x389
#4 0xffffffff810b51b6 at trap_pfault+0x46
#5 0xffffffff8108c298 at calltrap+0x8
#6 0xffffffff80bd4037 at __mtx_unlock_sleep+0x77
#7 0xffffffff82f5156f at pf_unlink_state+0x2df
#8 0xffffffff82f50bcd at pf_purge_expired_states+0x14d
#9 0xffffffff82f50a2b at pf_purge_thread+0x13b
#10 0xffffffff80bb29ff at fork_exit+0x7f
#11 0xffffffff8108d30e at fork_trampoline+0xe
That *might* be fixed by bdea9cbcf2decafeb4da5a0280313efccc09e1b3, but given the lack of detail it's impossible to tell. In any event, 13.5 is a legacy release and I don't expect to do any significant debugging on it.
(In reply to Kristof Provost from comment #1) Sorry, I meant 8efd2acf07bc0e1c3ea1f7390e0f1cfb7cf6f86c, not bdea9cbcf2decafeb4da5a0280313efccc09e1b3. The latter is a follow-on cleanup, not the actual fix.
Thanks for your swift response, Kristof. The patch applied cleanly and we were able to build and install a custom kernel with it. Whether it helps, only time will tell: we haven't seen this panic before, and I suspect the node in question was overloaded at the time, which potentially made the issue more likely to occur. I'm not sure I will be able to confirm it as the definitive fix, since the issue is rare for us and I haven't spent time creating a reproducible test case. Do you have a link to the issue that was fixed in 8efd2acf07bc0e1c3ea1f7390e0f1cfb7cf6f86c? I checked the Phabricator review, but it didn't have any details beyond what is in the commit message.
(In reply to Michael Gmelin from comment #3) There was no associated bug report; that fix came about when Mark noticed something odd in the relevant code and then discussed it with me and Gleb.