Bug 220217 - deadlock on enc and pf
Summary: deadlock on enc and pf
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 11.0-RELEASE
Hardware: amd64 Any
: --- Affects Only Me
Assignee: Andrey V. Elsukov
URL:
Keywords: patch
Depends on:
Blocks:
 
Reported: 2017-06-22 23:17 UTC by Kajetan Staszkiewicz
Modified: 2018-10-20 07:09 UTC (History)
3 users (show)

See Also:
koobs: mfc-stable11+


Attachments
Proposed patch (untested) (6.00 KB, patch)
2017-06-29 13:42 UTC, Andrey V. Elsukov
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Kajetan Staszkiewicz 2017-06-22 23:17:49 UTC
I have a bunch of routers running FreeBSD 10 and 11. They have gif tunnels encrypted with IPsec transport mode between them and this worked fine for years. On each of them there is pf protecting many VLANs and the aforementioned tunnels.

Recently I had to connect them to some external cloud service provider and because of that I needed to use IPsec tunnel mode. In order to be able to run pf correctly and without redesigning my ruleset to work on interface facing the Internet and covering internal IP addresses (I understand this is how (de)encapsulation of IPsec tunnel mode would work) I decided to use enc(4) device.

Unfortunately as soon as the device is up the system locks up. Sometimes it will work for a minute or two with 100% cpu usage done in bird (which I use to establish BGP sesion to that cloud) or in "intr/swi4: clock". Afterwards it just becomes unresponsive.

I built a custom kernel with lock debugging options (https://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug-deadlocks.html) enabled and managed to get the following crash:

#0  doadump (textdump=0) at pcpu.h:221                                                                                                                                                                            
#1  0xffffffff80332a2b in db_dump (dummy=<value optimized out>, dummy2=false, dummy3=0, dummy4=0x0) at /mnt/builder/freebsd.git/sys/ddb/db_command.c:533                                                          
#2  0xffffffff80332829 in db_command (cmd_table=<value optimized out>) at /mnt/builder/freebsd.git/sys/ddb/db_command.c:440                                                                                       
#3  0xffffffff80332584 in db_command_loop () at /mnt/builder/freebsd.git/sys/ddb/db_command.c:493                                                                                                                 
#4  0xffffffff803354bb in db_trap (type=<value optimized out>, code=<value optimized out>) at /mnt/builder/freebsd.git/sys/ddb/db_main.c:251                                                                      
#5  0xffffffff80646913 in kdb_trap (type=<value optimized out>, code=<value optimized out>, tf=<value optimized out>) at /mnt/builder/freebsd.git/sys/kern/subr_kdb.c:654                                         
#6  0xffffffff808d6a1d in trap (frame=0xfffffe01e47de850) at /mnt/builder/freebsd.git/sys/amd64/amd64/trap.c:556                                                                                                  
#7  0xffffffff808b7631 in calltrap () at /mnt/builder/freebsd.git/sys/amd64/amd64/exception.S:236                                                                                                                 
#8  0xffffffff80645fbb in kdb_enter (why=0xffffffff80abf8f9 "panic", msg=0x80 <Address 0x80 out of bounds>) at cpufunc.h:63                                                                                       
#9  0xffffffff80605ccf in vpanic (fmt=<value optimized out>, ap=0xfffffe01e47de9e0) at /mnt/builder/freebsd.git/sys/kern/kern_shutdown.c:752                                                                      
#10 0xffffffff80605b26 in kassert_panic (fmt=<value optimized out>) at /mnt/builder/freebsd.git/sys/kern/kern_shutdown.c:649                                                                                      
#11 0xffffffff80601e73 in __rw_rlock (c=0xfffff802449e8e30, file=0xffffffff80ade5f6 "/mnt/builder/freebsd.git/sys/netinet/in_pcb.c", line=1968) at /mnt/builder/freebsd.git/sys/kern/kern_rwlock.c:411            
#12 0xffffffff807252d1 in in_pcblookup_hash () at /mnt/builder/freebsd.git/sys/netinet/in_pcb.c:1968                                                                                                              
#13 0xffffffff81825495 in pf_socket_lookup (direction=<value optimized out>, pd=0xfffffe01e47df0a8, m=0xfffff80059759600) at /mnt/builder/freebsd.git/sys/modules/pf/../../netpfil/pf/pf.c:2932                   
#14 0xffffffff8182ad04 in pf_test_rule (rm=0xfffffe01e47df128, sm=0xfffffe01e47df120, direction=<value optimized out>, kif=<value optimized out>, m=0xfffff80059759600, off=<value optimized out>,                
    pd=<value optimized out>, am=0xfffffe01e47df130, rsm=0xfffffe01e47df118, inp=<value optimized out>) at /mnt/builder/freebsd.git/sys/modules/pf/../../netpfil/pf/pf.c:3369                                     
#15 0xffffffff8182705d in pf_test (dir=<value optimized out>, ifp=<value optimized out>, m0=0xfffffe01e47df208, inp=0x0) at /mnt/builder/freebsd.git/sys/modules/pf/../../netpfil/pf/pf.c:5973                    
#16 0xffffffff81837f0d in pf_check_out (arg=<value optimized out>, m=0xfffffe01e47df208, ifp=0x80, dir=<value optimized out>, inp=0x0)                                                                            
    at /mnt/builder/freebsd.git/sys/modules/pf/../../netpfil/pf/pf_ioctl.c:3742                                                                                                                                   
#17 0xffffffff8070a46b in pfil_run_hooks (ph=<value optimized out>, mp=<value optimized out>, ifp=<value optimized out>, dir=<value optimized out>, inp=<value optimized out>)                                    
    at /mnt/builder/freebsd.git/sys/net/pfil.c:83                                                                                                                                                                 
#18 0xffffffff8185a4e5 in enc_hhook (hhook_type=4, hhook_id=2, udata=<value optimized out>, ctx_data=0xfffffe01e47df3b0, hdata=<value optimized out>, hosd=0x1d0)                                                 
    at /mnt/builder/freebsd.git/sys/net/if_enc.c:287                                                                                                                                                              
#19 0xffffffff805c8f33 in hhook_run_hooks (hhh=<value optimized out>, ctx_data=<value optimized out>, hosd=<value optimized out>) at /mnt/builder/freebsd.git/sys/kern/kern_hhook.c:121
#20 0xffffffff807ebe1f in ipsec_run_hhooks (ctx=0xfffffe01e47df3b0, type=<value optimized out>) at /mnt/builder/freebsd.git/sys/netipsec/ipsec.c:838
#21 0xffffffff807eff05 in ipsec4_process_packet (m=<value optimized out>, isr=0xfffff80008d46080) at /mnt/builder/freebsd.git/sys/netipsec/ipsec_output.c:556
#22 0xffffffff8072b374 in ip_ipsec_output (m=0xfffffe01e47df588, inp=0xfffff802449e8cb0, error=0xfffffe01e47df54c) at /mnt/builder/freebsd.git/sys/netinet/ip_ipsec.c:205
#23 0xffffffff8072d076 in ip_output (m=<value optimized out>, opt=<value optimized out>, ro=<value optimized out>, flags=0, imo=0x0, inp=<value optimized out>)
    at /mnt/builder/freebsd.git/sys/netinet/ip_output.c:556
#24 0xffffffff807a3ff0 in tcp_output (tp=<value optimized out>) at /mnt/builder/freebsd.git/sys/netinet/tcp_output.c:1422
#25 0xffffffff807af419 in tcp_timer_rexmt (xtp=<value optimized out>) at /mnt/builder/freebsd.git/sys/netinet/tcp_timer.c:812
#26 0xffffffff8061c1cc in softclock_call_cc (c=<value optimized out>, cc=<value optimized out>, direct=<value optimized out>) at /mnt/builder/freebsd.git/sys/kern/kern_timeout.c:729
#27 0xffffffff8061c5c7 in softclock (arg=<value optimized out>) at /mnt/builder/freebsd.git/sys/kern/kern_timeout.c:867
#28 0xffffffff805cb3c6 in intr_event_execute_handlers (p=<value optimized out>, ie=<value optimized out>) at /mnt/builder/freebsd.git/sys/kern/kern_intr.c:1262
#29 0xffffffff805cba46 in ithread_loop (arg=<value optimized out>) at /mnt/builder/freebsd.git/sys/kern/kern_intr.c:1275
#30 0xffffffff805c8ac4 in fork_exit (callout=0xffffffff805cb9a0 <ithread_loop>, arg=0xfffff800034ea360, frame=0xfffffe01e47df9c0) at /mnt/builder/freebsd.git/sys/kern/kern_fork.c:1038
#31 0xffffffff808b7b6e in fork_trampoline () at /mnt/builder/freebsd.git/sys/amd64/amd64/exception.S:611
#32 0x0000000000000000 in ?? ()

I do have in firewall a rule which allows all connections having a UID, so the ones created by programs on the router. Apart from that most other rules are for forwarding.

I'm afraid line numbers in pf.c might be altered, I build this kernel with debug options from the branch I use on my LoadBalancers which contains a few patches for pf, but outside of pf it should be standard kernel. The problem of course exists also on standard FreeBSD 11.0 kernel.

I see ticket PR 188063 deals with a very similar issue.
Comment 1 Kristof Provost freebsd_committer freebsd_triage 2017-06-23 10:19:40 UTC
Do you also have the kernel log? I'd be interested in seeing the panic message.

pf_socket_lookup() in the backtrace means you'll probably avoid the panic if you disable your UID related rules. Perhaps that can work as a temporary workaround.
Comment 2 Kajetan Staszkiewicz 2017-06-23 10:58:16 UTC
Under normal circumstances the kernel does not panic, it just goes 100% usage on bird or on interrupt. The packet which causes it is bird connecting to BGP router on the other side, at least on the dump I obtained from this particular crash.

I have the screenshot of crash with debug-enabled kernel but the stack trace is so long, that the original message is not visible anymore. Does kernel debugger console support scrolling? Pressing scroll-lock did not have any effect.

The line 411 where the error happens is:

372 __rw_rlock(volatile uintptr_t *c, const char *file, int line)
...
411         KASSERT(rw_wowner(rw) != curthread,
412             ("rw_rlock: wlock already held for %s @ %s:%d",                                                
413             rw->lock_object.lo_name, file, line));

Can I obtain information about the other process holding the lock? (struct thread)*inp->inp_lock->rw_lock points to the same process "swi4: clock (0)". I know the live debugger has some helpers to show locks, but I would need to kill the machine again for that and set up serial console. I would prefer to use existing memory dump if possible.
Comment 3 Kristof Provost freebsd_committer freebsd_triage 2017-06-24 09:22:53 UTC
(In reply to vegeta from comment #2)
There is no other thread. We assert that 'rw_wowner(rw) != curthread', so the thread holding the lock must not be the current thread (trying to acquire the lock).

That assertion fails, so we are the thread holding the lock, yet we try to lock again.

That seems to be in tcp_timer_rexmt(). It holds a write lock on the struct inpcb, so when pf goes to look up the inpcb and tries to get a read lock we run into this assertion failure. (tcp_output() in fact asserts that the inpcb must be write locked when it's called.)

I'm not quite sure how to fix this though. In fact, right now I don't understand how this ever works.
Comment 4 Andrey V. Elsukov freebsd_committer freebsd_triage 2017-06-29 13:12:55 UTC
(In reply to Kristof Provost from comment #3)
> I'm not quite sure how to fix this though. In fact, right now I don't
> understand how this ever works.

I think we can extend ipsec_ctx_data structure by adding inpcb pointer. It will be initialized for IPSEC_ENC_BEFORE+HHOOK_TYPE_IPSEC_OUT case, and will be NULL for other cases. Then pass this pointer to the pfil_run_hooks(). In this case, I think, pf_test_rule() will not invoke pf_socket_lookup() due to pd->lookup.done = 1. And for other cases pf_socket_lookup() can be called, because we don't hold any inpcbs.
Comment 5 Andrey V. Elsukov freebsd_committer freebsd_triage 2017-06-29 13:42:28 UTC
Created attachment 183913 [details]
Proposed patch (untested)

Patch should be applicable to stable/11 and head/.
Comment 6 Kajetan Staszkiewicz 2017-06-29 17:39:34 UTC
I'll try to find a moment to test it, hopefully next week.
Comment 7 Kristof Provost freebsd_committer freebsd_triage 2017-06-30 07:57:05 UTC
(In reply to Andrey V. Elsukov from comment #5)
Thanks! I missed the inpcb being passed through netpfil. That explains why it usually works.
Comment 8 Kajetan Staszkiewicz 2017-07-12 10:15:46 UTC
I'm sorry, I did not have time to check the patch yet. Maybe next week.
Comment 9 commit-hook freebsd_committer freebsd_triage 2017-07-31 11:05:25 UTC
A commit references this bug:

Author: ae
Date: Mon Jul 31 11:04:36 UTC 2017
New revision: 321779
URL: https://svnweb.freebsd.org/changeset/base/321779

Log:
  Add inpcb pointer to struct ipsec_ctx_data and pass it to the pfil hook
  from enc_hhook().

  This should solve the problem when pf is used with if_enc(4) interface,
  and outbound packet with existing PCB checked by pf, and this leads to
  deadlock due to pf does its own PCB lookup and tries to take rlock when
  wlock is already held.

  Now we pass PCB pointer if it is known to the pfil hook, this helps to
  avoid extra PCB lookup and thus rlock acquiring is not needed.
  For inbound packets it is safe to pass NULL, because we do not held any
  PCB locks yet.

  PR:		220217
  MFC after:	3 weeks
  Sponsored by:	Yandex LLC

Changes:
  head/sys/net/if_enc.c
  head/sys/net/if_enc.h
  head/sys/netipsec/ipsec.h
  head/sys/netipsec/ipsec_input.c
  head/sys/netipsec/ipsec_output.c
Comment 10 commit-hook freebsd_committer freebsd_triage 2017-08-21 09:03:58 UTC
A commit references this bug:

Author: ae
Date: Mon Aug 21 09:03:21 UTC 2017
New revision: 322741
URL: https://svnweb.freebsd.org/changeset/base/322741

Log:
  MFC r321779:
    Add inpcb pointer to struct ipsec_ctx_data and pass it to the pfil hook
    from enc_hhook().

    This should solve the problem when pf is used with if_enc(4) interface,
    and outbound packet with existing PCB checked by pf, and this leads to
    deadlock due to pf does its own PCB lookup and tries to take rlock when
    wlock is already held.

    Now we pass PCB pointer if it is known to the pfil hook, this helps to
    avoid extra PCB lookup and thus rlock acquiring is not needed.
    For inbound packets it is safe to pass NULL, because we do not held any
    PCB locks yet.

    PR:		220217
    Sponsored by:	Yandex LLC

Changes:
_U  stable/11/
  stable/11/sys/net/if_enc.c
  stable/11/sys/net/if_enc.h
  stable/11/sys/netipsec/ipsec.h
  stable/11/sys/netipsec/ipsec_input.c
  stable/11/sys/netipsec/ipsec_output.c
Comment 11 Kubilay Kocak freebsd_committer freebsd_triage 2018-10-20 07:09:51 UTC
Assign to committer that resolved