I have a bunch of routers running FreeBSD 10 and 11. They have gif tunnels encrypted with IPsec transport mode between them and this worked fine for years. On each of them there is pf protecting many VLANs and the aforementioned tunnels. Recently I had to connect them to some external cloud service provider and because of that I needed to use IPsec tunnel mode. In order to be able to run pf correctly and without redesigning my ruleset to work on interface facing the Internet and covering internal IP addresses (I understand this is how (de)encapsulation of IPsec tunnel mode would work) I decided to use enc(4) device. Unfortunately as soon as the device is up the system locks up. Sometimes it will work for a minute or two with 100% cpu usage done in bird (which I use to establish BGP sesion to that cloud) or in "intr/swi4: clock". Afterwards it just becomes unresponsive. I built a custom kernel with lock debugging options (https://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug-deadlocks.html) enabled and managed to get the following crash: #0 doadump (textdump=0) at pcpu.h:221 #1 0xffffffff80332a2b in db_dump (dummy=<value optimized out>, dummy2=false, dummy3=0, dummy4=0x0) at /mnt/builder/freebsd.git/sys/ddb/db_command.c:533 #2 0xffffffff80332829 in db_command (cmd_table=<value optimized out>) at /mnt/builder/freebsd.git/sys/ddb/db_command.c:440 #3 0xffffffff80332584 in db_command_loop () at /mnt/builder/freebsd.git/sys/ddb/db_command.c:493 #4 0xffffffff803354bb in db_trap (type=<value optimized out>, code=<value optimized out>) at /mnt/builder/freebsd.git/sys/ddb/db_main.c:251 #5 0xffffffff80646913 in kdb_trap (type=<value optimized out>, code=<value optimized out>, tf=<value optimized out>) at /mnt/builder/freebsd.git/sys/kern/subr_kdb.c:654 #6 0xffffffff808d6a1d in trap (frame=0xfffffe01e47de850) at /mnt/builder/freebsd.git/sys/amd64/amd64/trap.c:556 #7 0xffffffff808b7631 in calltrap () at /mnt/builder/freebsd.git/sys/amd64/amd64/exception.S:236 #8 0xffffffff80645fbb in kdb_enter (why=0xffffffff80abf8f9 "panic", msg=0x80 <Address 0x80 out of bounds>) at cpufunc.h:63 #9 0xffffffff80605ccf in vpanic (fmt=<value optimized out>, ap=0xfffffe01e47de9e0) at /mnt/builder/freebsd.git/sys/kern/kern_shutdown.c:752 #10 0xffffffff80605b26 in kassert_panic (fmt=<value optimized out>) at /mnt/builder/freebsd.git/sys/kern/kern_shutdown.c:649 #11 0xffffffff80601e73 in __rw_rlock (c=0xfffff802449e8e30, file=0xffffffff80ade5f6 "/mnt/builder/freebsd.git/sys/netinet/in_pcb.c", line=1968) at /mnt/builder/freebsd.git/sys/kern/kern_rwlock.c:411 #12 0xffffffff807252d1 in in_pcblookup_hash () at /mnt/builder/freebsd.git/sys/netinet/in_pcb.c:1968 #13 0xffffffff81825495 in pf_socket_lookup (direction=<value optimized out>, pd=0xfffffe01e47df0a8, m=0xfffff80059759600) at /mnt/builder/freebsd.git/sys/modules/pf/../../netpfil/pf/pf.c:2932 #14 0xffffffff8182ad04 in pf_test_rule (rm=0xfffffe01e47df128, sm=0xfffffe01e47df120, direction=<value optimized out>, kif=<value optimized out>, m=0xfffff80059759600, off=<value optimized out>, pd=<value optimized out>, am=0xfffffe01e47df130, rsm=0xfffffe01e47df118, inp=<value optimized out>) at /mnt/builder/freebsd.git/sys/modules/pf/../../netpfil/pf/pf.c:3369 #15 0xffffffff8182705d in pf_test (dir=<value optimized out>, ifp=<value optimized out>, m0=0xfffffe01e47df208, inp=0x0) at /mnt/builder/freebsd.git/sys/modules/pf/../../netpfil/pf/pf.c:5973 #16 0xffffffff81837f0d in pf_check_out (arg=<value optimized out>, m=0xfffffe01e47df208, ifp=0x80, dir=<value optimized out>, inp=0x0) at /mnt/builder/freebsd.git/sys/modules/pf/../../netpfil/pf/pf_ioctl.c:3742 #17 0xffffffff8070a46b in pfil_run_hooks (ph=<value optimized out>, mp=<value optimized out>, ifp=<value optimized out>, dir=<value optimized out>, inp=<value optimized out>) at /mnt/builder/freebsd.git/sys/net/pfil.c:83 #18 0xffffffff8185a4e5 in enc_hhook (hhook_type=4, hhook_id=2, udata=<value optimized out>, ctx_data=0xfffffe01e47df3b0, hdata=<value optimized out>, hosd=0x1d0) at /mnt/builder/freebsd.git/sys/net/if_enc.c:287 #19 0xffffffff805c8f33 in hhook_run_hooks (hhh=<value optimized out>, ctx_data=<value optimized out>, hosd=<value optimized out>) at /mnt/builder/freebsd.git/sys/kern/kern_hhook.c:121 #20 0xffffffff807ebe1f in ipsec_run_hhooks (ctx=0xfffffe01e47df3b0, type=<value optimized out>) at /mnt/builder/freebsd.git/sys/netipsec/ipsec.c:838 #21 0xffffffff807eff05 in ipsec4_process_packet (m=<value optimized out>, isr=0xfffff80008d46080) at /mnt/builder/freebsd.git/sys/netipsec/ipsec_output.c:556 #22 0xffffffff8072b374 in ip_ipsec_output (m=0xfffffe01e47df588, inp=0xfffff802449e8cb0, error=0xfffffe01e47df54c) at /mnt/builder/freebsd.git/sys/netinet/ip_ipsec.c:205 #23 0xffffffff8072d076 in ip_output (m=<value optimized out>, opt=<value optimized out>, ro=<value optimized out>, flags=0, imo=0x0, inp=<value optimized out>) at /mnt/builder/freebsd.git/sys/netinet/ip_output.c:556 #24 0xffffffff807a3ff0 in tcp_output (tp=<value optimized out>) at /mnt/builder/freebsd.git/sys/netinet/tcp_output.c:1422 #25 0xffffffff807af419 in tcp_timer_rexmt (xtp=<value optimized out>) at /mnt/builder/freebsd.git/sys/netinet/tcp_timer.c:812 #26 0xffffffff8061c1cc in softclock_call_cc (c=<value optimized out>, cc=<value optimized out>, direct=<value optimized out>) at /mnt/builder/freebsd.git/sys/kern/kern_timeout.c:729 #27 0xffffffff8061c5c7 in softclock (arg=<value optimized out>) at /mnt/builder/freebsd.git/sys/kern/kern_timeout.c:867 #28 0xffffffff805cb3c6 in intr_event_execute_handlers (p=<value optimized out>, ie=<value optimized out>) at /mnt/builder/freebsd.git/sys/kern/kern_intr.c:1262 #29 0xffffffff805cba46 in ithread_loop (arg=<value optimized out>) at /mnt/builder/freebsd.git/sys/kern/kern_intr.c:1275 #30 0xffffffff805c8ac4 in fork_exit (callout=0xffffffff805cb9a0 <ithread_loop>, arg=0xfffff800034ea360, frame=0xfffffe01e47df9c0) at /mnt/builder/freebsd.git/sys/kern/kern_fork.c:1038 #31 0xffffffff808b7b6e in fork_trampoline () at /mnt/builder/freebsd.git/sys/amd64/amd64/exception.S:611 #32 0x0000000000000000 in ?? () I do have in firewall a rule which allows all connections having a UID, so the ones created by programs on the router. Apart from that most other rules are for forwarding. I'm afraid line numbers in pf.c might be altered, I build this kernel with debug options from the branch I use on my LoadBalancers which contains a few patches for pf, but outside of pf it should be standard kernel. The problem of course exists also on standard FreeBSD 11.0 kernel. I see ticket PR 188063 deals with a very similar issue.
Do you also have the kernel log? I'd be interested in seeing the panic message. pf_socket_lookup() in the backtrace means you'll probably avoid the panic if you disable your UID related rules. Perhaps that can work as a temporary workaround.
Under normal circumstances the kernel does not panic, it just goes 100% usage on bird or on interrupt. The packet which causes it is bird connecting to BGP router on the other side, at least on the dump I obtained from this particular crash. I have the screenshot of crash with debug-enabled kernel but the stack trace is so long, that the original message is not visible anymore. Does kernel debugger console support scrolling? Pressing scroll-lock did not have any effect. The line 411 where the error happens is: 372 __rw_rlock(volatile uintptr_t *c, const char *file, int line) ... 411 KASSERT(rw_wowner(rw) != curthread, 412 ("rw_rlock: wlock already held for %s @ %s:%d", 413 rw->lock_object.lo_name, file, line)); Can I obtain information about the other process holding the lock? (struct thread)*inp->inp_lock->rw_lock points to the same process "swi4: clock (0)". I know the live debugger has some helpers to show locks, but I would need to kill the machine again for that and set up serial console. I would prefer to use existing memory dump if possible.
(In reply to vegeta from comment #2) There is no other thread. We assert that 'rw_wowner(rw) != curthread', so the thread holding the lock must not be the current thread (trying to acquire the lock). That assertion fails, so we are the thread holding the lock, yet we try to lock again. That seems to be in tcp_timer_rexmt(). It holds a write lock on the struct inpcb, so when pf goes to look up the inpcb and tries to get a read lock we run into this assertion failure. (tcp_output() in fact asserts that the inpcb must be write locked when it's called.) I'm not quite sure how to fix this though. In fact, right now I don't understand how this ever works.
(In reply to Kristof Provost from comment #3) > I'm not quite sure how to fix this though. In fact, right now I don't > understand how this ever works. I think we can extend ipsec_ctx_data structure by adding inpcb pointer. It will be initialized for IPSEC_ENC_BEFORE+HHOOK_TYPE_IPSEC_OUT case, and will be NULL for other cases. Then pass this pointer to the pfil_run_hooks(). In this case, I think, pf_test_rule() will not invoke pf_socket_lookup() due to pd->lookup.done = 1. And for other cases pf_socket_lookup() can be called, because we don't hold any inpcbs.
Created attachment 183913 [details] Proposed patch (untested) Patch should be applicable to stable/11 and head/.
I'll try to find a moment to test it, hopefully next week.
(In reply to Andrey V. Elsukov from comment #5) Thanks! I missed the inpcb being passed through netpfil. That explains why it usually works.
I'm sorry, I did not have time to check the patch yet. Maybe next week.
A commit references this bug: Author: ae Date: Mon Jul 31 11:04:36 UTC 2017 New revision: 321779 URL: https://svnweb.freebsd.org/changeset/base/321779 Log: Add inpcb pointer to struct ipsec_ctx_data and pass it to the pfil hook from enc_hhook(). This should solve the problem when pf is used with if_enc(4) interface, and outbound packet with existing PCB checked by pf, and this leads to deadlock due to pf does its own PCB lookup and tries to take rlock when wlock is already held. Now we pass PCB pointer if it is known to the pfil hook, this helps to avoid extra PCB lookup and thus rlock acquiring is not needed. For inbound packets it is safe to pass NULL, because we do not held any PCB locks yet. PR: 220217 MFC after: 3 weeks Sponsored by: Yandex LLC Changes: head/sys/net/if_enc.c head/sys/net/if_enc.h head/sys/netipsec/ipsec.h head/sys/netipsec/ipsec_input.c head/sys/netipsec/ipsec_output.c
A commit references this bug: Author: ae Date: Mon Aug 21 09:03:21 UTC 2017 New revision: 322741 URL: https://svnweb.freebsd.org/changeset/base/322741 Log: MFC r321779: Add inpcb pointer to struct ipsec_ctx_data and pass it to the pfil hook from enc_hhook(). This should solve the problem when pf is used with if_enc(4) interface, and outbound packet with existing PCB checked by pf, and this leads to deadlock due to pf does its own PCB lookup and tries to take rlock when wlock is already held. Now we pass PCB pointer if it is known to the pfil hook, this helps to avoid extra PCB lookup and thus rlock acquiring is not needed. For inbound packets it is safe to pass NULL, because we do not held any PCB locks yet. PR: 220217 Sponsored by: Yandex LLC Changes: _U stable/11/ stable/11/sys/net/if_enc.c stable/11/sys/net/if_enc.h stable/11/sys/netipsec/ipsec.h stable/11/sys/netipsec/ipsec_input.c stable/11/sys/netipsec/ipsec_output.c
Assign to committer that resolved