Bug 228782 - Deadlock on pf
Summary: Deadlock on pf
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: CURRENT
Hardware: amd64 Any
: --- Affects Only Me
Assignee: Kristof Provost
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-06-06 12:07 UTC by Tatiana
Modified: 2018-06-16 11:43 UTC (History)
3 users (show)

See Also:


Attachments
fail script (211 bytes, application/x-shellscript)
2018-06-06 12:07 UTC, Tatiana
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Tatiana 2018-06-06 12:07:05 UTC
Created attachment 194044 [details]
fail script

I believe we've faced a problem similar to bug #220217 (https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220217)

We've configured a machine with FreeBSD-12.0-CURRENT-r334337 software and two interfaces (em0, em1). 
Two pf rules are added:
1) some user rule (it is essential for the crash)
2) a route rule on em1
(The script reproducing the situation is attached -- fail.sh) 

Trying to establish tcp-connection via route leads to the kernel crash:

panic: rw_rlock: wlock already held for tcpinp @ /usr/src/sys/netinet/in_pcb.c:2092

#0  doadump (textdump=0) at pcpu.h:231
#1  0xffffffff8043a51b in db_dump (dummy=<value optimized out>, 
    dummy2=<value optimized out>, dummy3=<value optimized out>, 
    dummy4=<value optimized out>) at /usr/src/sys/ddb/db_command.c:574
#2  0xffffffff8043a2e9 in db_command (cmd_table=<value optimized out>)
    at /usr/src/sys/ddb/db_command.c:481
#3  0xffffffff8043a064 in db_command_loop ()
    at /usr/src/sys/ddb/db_command.c:534
#4  0xffffffff8043d29f in db_trap (type=<value optimized out>, 
    code=<value optimized out>) at /usr/src/sys/ddb/db_main.c:252
#5  0xffffffff80bbcd83 in kdb_trap (type=3, code=0, tf=<value optimized out>)
    at /usr/src/sys/kern/subr_kdb.c:697
#6  0xffffffff810442c8 in trap (frame=0xfffffe0000581c20)
    at /usr/src/sys/amd64/amd64/trap.c:605
#7  0xffffffff8101eddc in calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:230
#8  0xffffffff80bbc45b in kdb_enter (why=0xffffffff812d4f5f "panic", 
    msg=<value optimized out>) at cpufunc.h:65
#9  0xffffffff80b74db0 in vpanic (fmt=<value optimized out>, 
    ap=0xfffffe0000581d90) at /usr/src/sys/kern/kern_shutdown.c:852
#10 0xffffffff80b74b60 in kassert_panic (fmt=<value optimized out>)
    at /usr/src/sys/kern/kern_shutdown.c:749
#11 0xffffffff80b70589 in __rw_rlock_int (rw=0xfffff80003cf91f8, 
    file=0xffffffff812bc2f4 "/usr/src/sys/netinet/in_pcb.c", line=2092)
    at /usr/src/sys/kern/kern_rwlock.c:668
#12 0xffffffff80d04e3c in in_pcblookup_hash ()
    at /usr/src/sys/netinet/in_pcb.c:2092
#13 0xffffffff82821d8f in pf_socket_lookup (direction=<value optimized out>, 
    pd=0xfffffe0000582380, m=0xfffff80003b6a400)
    at /usr/src/sys/netpfil/pf/pf.c:2966
#14 0xffffffff82827b33 in pf_test_rule (rm=0xfffffe0000582428, 
    sm=0xfffffe0000582440, direction=2, kif=0xfffff8000342bb00, 
    m=0xfffff80003b6a400, off=20, pd=0xfffffe0000582380, 
    am=0xfffffe0000582400, rsm=0xfffffe00005823f0, inp=0x0)
    at /usr/src/sys/netpfil/pf/pf.c:3403
#15 0xffffffff828238b6 in pf_test (dir=<value optimized out>, 
    pflags=<value optimized out>, ifp=<value optimized out>, 
    m0=0xfffffe0000582558, inp=0x0) at /usr/src/sys/netpfil/pf/pf.c:5979
#16 0xffffffff82823591 in pf_test (dir=<value optimized out>, 
    pflags=<value optimized out>, ifp=<value optimized out>, 
    m0=0xfffffe0000582670, inp=<value optimized out>)
    at /usr/src/sys/netpfil/pf/pf.c:5513
#17 0xffffffff8283561d in pf_check_out (arg=<value optimized out>, 
    m=0xfffffe0000582670, ifp=<value optimized out>, 
    dir=<value optimized out>, flags=<value optimized out>, 
    inp=<value optimized out>) at /usr/src/sys/netpfil/pf/pf_ioctl.c:3782
#18 0xffffffff80c9cce7 in pfil_run_hooks (ph=<value optimized out>, 
    mp=0xfffffe0000582768, ifp=0xfffff80003468800, dir=2, flags=0, 
    inp=0xfffff80003cf91d8) at /usr/src/sys/net/pfil.c:111
#19 0xffffffff80d0dd7a in ip_output (m=<value optimized out>, 
    opt=<value optimized out>, ro=<value optimized out>, 
    flags=<value optimized out>, imo=0x0, inp=<value optimized out>)
    at /usr/src/sys/netinet/ip_output.c:120
#20 0xffffffff80d92dee in tcp_output (tp=<value optimized out>)
    at /usr/src/sys/netinet/tcp_output.c:1456
#21 0xffffffff80da2f58 in tcp_usr_connect (so=0xfffff80003e23358, 
    nam=0xfffff80003048d60, td=<value optimized out>)
    at /usr/src/sys/netinet/tcp_usrreq.c:547
#22 0xffffffff80c0f958 in soconnectat (fd=-100, so=0xfffff80003e23358, 
    nam=0x300000012, td=0xfffffe0000582a38)
    at /usr/src/sys/kern/uipc_socket.c:1228
#23 0xffffffff80c16e4f in kern_connectat (td=0xfffff8000389d000, dirfd=-100, 
    fd=<value optimized out>, sa=0xfffff80003048d60)
    at /usr/src/sys/kern/uipc_syscalls.c:513
#24 0xffffffff80c16d17 in sys_connect (td=0xfffff8000389d000, 
    uap=0xfffff8000389d3c0) at /usr/src/sys/kern/uipc_syscalls.c:474
#25 0xffffffff8104504c in amd64_syscall (td=0xfffff8000389d000, traced=0)
    at subr_syscall.c:135
#26 0xffffffff8101f6bd in fast_syscall_common ()
    at /usr/src/sys/amd64/amd64/exception.S:493
#27 0x000000080041084a in ?? ()


The bug #220217 is resolved by passing inpcb pointer to the pfil hook from enc_hhook. 
In our case pfil_hook is already called with it (not with NULL). But a further method pf_test is called twice - the second time from pf_route with inpcb=NULL. This eventually leads to the same deadlock as in bug #220217 case.

I suppose that passing inpcb pointer to pf_route and then to the pf_test can fix this, but I would really appreciate hearing experts opinion.
Comment 1 Kristof Provost freebsd_committer freebsd_triage 2018-06-07 18:47:35 UTC
Thank you for an excellent bug report.

I've managed to reproduce this, and at first glance your analysis seems to be correct.
I'm working on this, and hope to have a fix soon. If you don't see updates from me in the next two weeks feel free to remind me.
Comment 2 Kristof Provost freebsd_committer freebsd_triage 2018-06-07 20:20:53 UTC
I have a patch on http://people.freebsd.org/~kp/patches/228782.patch which should fix your problem.

I'm still writing the test cases for this so we include this case in our automated pf tests.
Comment 3 Tatiana 2018-06-08 10:48:09 UTC
(In reply to Kristof Provost from comment #2)
Thanks!

Also while applying the patch to our project I've noticed that you've probably missed changing a call to pf_test6 from pf_route6

Something like:
-		if (pf_test6(PF_FWD, ifp, &m0, NULL) != PF_PASS)
+		if (pf_test6(PF_FWD, ifp, &m0, inp) != PF_PASS)
Comment 4 commit-hook freebsd_committer freebsd_triage 2018-06-09 17:16:35 UTC
A commit references this bug:

Author: kp
Date: Sat Jun  9 14:17:07 UTC 2018
New revision: 334876
URL: https://svnweb.freebsd.org/changeset/base/334876

Log:
  pf: Fix deadlock with route-to

  If a locally generated packet is routed (with route-to/reply-to/dup-to) out of
  a different interface it's passed through the firewall again. This meant we
  lost the inp pointer and if we required the pointer (e.g. for user ID matching)
  we'd deadlock trying to acquire an inp lock we've already got.

  Pass the inp pointer along with pf_route()/pf_route6().

  PR:		228782
  MFC after:	1 week

Changes:
  head/sys/netpfil/pf/pf.c
Comment 5 Kristof Provost freebsd_committer freebsd_triage 2018-06-09 18:42:51 UTC
(In reply to Tatiana from comment #3)
Yes, you're right. That's okay in the committed patch.
Comment 6 commit-hook freebsd_committer freebsd_triage 2018-06-16 09:32:18 UTC
A commit references this bug:

Author: kp
Date: Sat Jun 16 09:32:05 UTC 2018
New revision: 335251
URL: https://svnweb.freebsd.org/changeset/base/335251

Log:
  MFC r334876:

  pf: Fix deadlock with route-to

  If a locally generated packet is routed (with route-to/reply-to/dup-to) out of
  a different interface it's passed through the firewall again. This meant we
  lost the inp pointer and if we required the pointer (e.g. for user ID matching)
  we'd deadlock trying to acquire an inp lock we've already got.

  Pass the inp pointer along with pf_route()/pf_route6().

  PR:		228782

Changes:
_U  stable/11/
  stable/11/sys/netpfil/pf/pf.c
Comment 7 commit-hook freebsd_committer freebsd_triage 2018-06-16 11:42:58 UTC
A commit references this bug:

Author: kp
Date: Sat Jun 16 11:42:27 UTC 2018
New revision: 335252
URL: https://svnweb.freebsd.org/changeset/base/335252

Log:
  MFC r334876:

  pf: Fix deadlock with route-to

  If a locally generated packet is routed (with route-to/reply-to/dup-to) out of
  a different interface it's passed through the firewall again. This meant we
  lost the inp pointer and if we required the pointer (e.g. for user ID matching)
  we'd deadlock trying to acquire an inp lock we've already got.

  Pass the inp pointer along with pf_route()/pf_route6().

  PR:		228782

Changes:
_U  stable/10/
  stable/10/sys/netpfil/pf/pf.c