14-STABLE 935c5a5554e9. Issue was not present as of ff27c3872300. The crash happens pretty reliably within a couple minutes of boot.

#0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
        td = <optimized out>
#1 doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:405
        error = 0
        coredump = <optimized out>
#2 0xffffffff8086b987 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:523
        once = 0
#3 0xffffffff8086be5e in vpanic (fmt=0xffffffff80e7a878 "%s", ap=ap@entry=0xfffffe0090e36c50) at /usr/src/sys/kern/kern_shutdown.c:967
        buf = "page fault", '\000' <repeats 245 times>
        __pc = 0x0
        __pc = 0x0
        __pc = 0x0
        other_cpus = {__bits = {14, 0 <repeats 15 times>}}
        td = 0xfffff800079d6000
        bootopt = <unavailable>
        newpanic = <optimized out>
#4 0xffffffff8086bcb3 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:891
        ap = {{gp_offset = 16, fp_offset = 48, overflow_arg_area = 0xfffffe0090e36c80, reg_save_area = 0xfffffe0090e36c20}}
#5 0xffffffff80d63e2b in trap_fatal (frame=0xfffffe0090e36d30, eva=32) at /usr/src/sys/amd64/amd64/trap.c:952
        __pc = 0x0
        __pc = 0x0
        __pc = 0x0
        softseg = {ssd_base = 0, ssd_limit = 1048575, ssd_type = 27, ssd_dpl = 0, ssd_p = 1, ssd_long = 1, ssd_def32 = 0, ssd_gran = 1}
        code = 0
        ss = 40
        type = <optimized out>
        gdt = <optimized out>
        handled = <optimized out>
#6 0xffffffff80d63e76 in trap_pfault (frame=<unavailable>, usermode=false, signo=<optimized out>, ucode=<optimized out>) at /usr/src/sys/amd64/amd64/trap.c:760
        __pc = 0x0
        __pc = 0x0
        __pc = 0x0
        td = 0xfffff800079d6000
        p = <optimized out>
        eva = <unavailable>
        map = <optimized out>
        ftype = <optimized out>
        rv = <optimized out>
#7 <signal handler called>
No locals.
#8 0xffffffff808d28c0 in turnstile_broadcast (ts=0x0, queue=queue@entry=0) at /usr/src/sys/kern/subr_turnstile.c:900
        td = <optimized out>
        ts1 = <optimized out>
        tc = <optimized out>
#9 0xffffffff80848c63 in __mtx_unlock_sleep (c=<optimized out>, v=<optimized out>) at /usr/src/sys/kern/kern_mutex.c:1056
        tid = <optimized out>
        m = 0xfffffe0091b89548
        ts = 0x0
#10 0xffffffff80b6c268 in pf_unlink_state (s=s@entry=0xfffff801c6a56840) at /usr/src/sys/netpfil/pf/pf.c:2146
        _v = 0
        ih = 0xfffffe0091b89540
#11 0xffffffff80b6b7b8 in pf_purge_expired_states (i=103382, maxcheck=108) at /usr/src/sys/netpfil/pf/pf.c:2206
        count = 0
        ih = 0xfffffe0091af1970
        s = 0xfffff801c6a56840
        mrm = <optimized out>
#12 0xffffffff80b6b5db in pf_purge_thread (unused=<optimized out>) at /usr/src/sys/netpfil/pf/pf.c:1949
        saved_vnet = 0x0
        vnet_iter = 0xfffff800010af9c0
#13 0xffffffff8082677f in fork_exit (callout=0xffffffff80b6b4a0 <pf_purge_thread>, arg=0x0, frame=0xfffffe0090e36f40) at /usr/src/sys/kern/kern_fork.c:1164
        __pc = 0x0
        __pc = 0x0
        td = 0xfffff800079d6000
        p = 0xfffffe0010def5a0
        dtd = <optimized out>
#14 <signal handler called>
(In reply to Daniel Ponte from comment #0) All right, first off: did you mean eff27c3872300 rather than ff27c3872300? Secondly, what is your setup? Post your ruleset and network configuration. Are you using pfsync, pflog, ....? Lastly, build an INVARIANTS kernel and see what assertions you hit. The panic seems to show us locking a hashrow that the state is not in, which explains the panic on mutex unlock (because we're unlocking a different mutex from the one we locked), but there's no obvious way for that to happen. That would be why it's important to provide setup details!
(In reply to Kristof Provost from comment #1) Yes, eff27c3872300, sorry. This machine is primarily a NAT, IPv6 and VPN gateway for my home network, but has some other roles (nginx reverse proxy, xmpp server, asterisk PBX). My ruleset is pretty complex and makes use of policy-based routing. The machine runs a vnet jail. I would rather not post the ruleset but would be happy to email it. I use pflogd, but no pfsync.
(In reply to Kristof Provost from comment #1)
With INVARIANTS:
panic: pfsync_drop: st->sync_state == q
core.txt excerpt attached. uname commit hash differs due to local (non kernel) modifications; this is built from the same eff27c3872300 tree.
Created attachment 251655 INVARIANTS core.txt excerpt
(In reply to Daniel Ponte from comment #4) In comment #2 you stated there's no pfsync, yet this panic is in pfsync. It's very hard to debug things if you're not supplying the requested information. It's downright impossible if the information that is provided is wrong. Provide all of the previously requested information, including the full diff against upstream, and ensure it is accurate.
Is this what it is now? Being harsh to people willing to report and help squash bugs in subsystems you actively maintain? ;) Cheers, Franco
The diff against upstream is a patch to make rtadvd(8) deprecate addresses on shutdown. It has nothing to do with this crash. I am not trying to be difficult, but my ruleset is 327 lines long and I will need to anonymize it, which will take time. I'm not certain your tone is warranted; I know my configuration is complex and tends to uncover bugs like this, which I have tried to report every time I encounter them. I am trying to help the Project.
And yes, pfsync is present in the kernel, but is not being used at all.
Removing pfsync from the kernel config has resolved this crash.
(In reply to Kristof Provost from comment #5)
> (In reply to Daniel Ponte from comment #4)
> In comment #2 you stated there's no pfsync, yet this panic is in pfsync.

The `vnet_pfsync_init()` vnet sysinit routine unconditionally calls `pfsync_pointers_init()` [1]. I think that is why Daniel@ found

> Removing pfsync from the kernel config has resolved this crash

[1] https://cgit.freebsd.org/src/tree/sys/netpfil/pf/if_pfsync.c#n3112
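A rough sketch of the pattern being described, for illustration only: an init routine that installs hook pointers whenever the module is compiled in, so the hooks run even if the feature is never configured. This is not the if_pfsync.c code. The hook type, `sync_module_init()`, `firewall_unlink_state()` and the counter are all hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hook signature; NULL means no sync module is attached. */
typedef void (*state_hook_t)(int state_id);

static state_hook_t delete_state_hook;
static int hook_calls; /* counts how often the sync hook actually ran */

/* Hypothetical sync-module handler invoked through the hook pointer. */
static void sync_delete_state(int state_id)
{
    (void)state_id;
    hook_calls++;
}

/*
 * Stand-in for an init routine that runs whenever the module is built
 * into the kernel, whether or not a sync interface is ever configured.
 */
static void sync_module_init(void)
{
    delete_state_hook = sync_delete_state;
}

/* Firewall side: calls into the sync module only if a hook is installed. */
static void firewall_unlink_state(int state_id)
{
    if (delete_state_hook != NULL)
        delete_state_hook(state_id);
}
```

Before `sync_module_init()` runs, `firewall_unlink_state()` never reaches the sync code; afterwards every call does. Under that assumption, leaving the module out of the kernel config means the hooks are never installed, which would be consistent with the crash disappearing.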