Bug 254478 - Panic when using ipfw and divert sockets
Summary: Panic when using ipfw and divert sockets
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 12.2-RELEASE
Hardware: amd64 Any
: --- Affects Only Me
Assignee: freebsd-net (Nobody)
URL:
Keywords: crash, needs-qa
Depends on:
Blocks:
 
Reported: 2021-03-22 13:21 UTC by Daniel Kempkens
Modified: 2021-04-06 06:51 UTC (History)
2 users (show)

See Also:
koobs: mfc-stable13?
koobs: mfc-stable12?


Attachments
proposed patch (untested) (525 bytes, patch)
2021-03-24 09:19 UTC, Andrey V. Elsukov
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Daniel Kempkens 2021-03-22 13:21:30 UTC
# freebsd-version
12.2-RELEASE-p4

On some of our production systems, we're seeing rather frequent (up to multiple times per day) kernel panics that appear to be related to our use of divert sockets.

We use ipfw and divert sockets to basically implement DPI for HTTP traffic. We divert all initial packets of a HTTP connection and once we've seen the Host header, the packet is passed to a rule with "keep-state". We stop diverting packets (after the Host header) via a "check-state" rule before the divert rule(s). The (relevant) rules look something like this:

$ipfw add 2000 check-state
$ipfw add 2002 divert 9002 tcp from any to $dpi_dest 80
$ipfw add 2004 skipto 3000 ip from any to any
$ipfw add 2005 skipto 4000 tcp from any to any diverted keep-state
$ipfw add 2006 skipto 4000 tcp from any to any diverted

Our current thinking is that the panic is somehow related to the amount of packets we divert, since we currently only see the panic in production (and not in staging or dev).

We were able to save a crash dump and did an initial analysis, but were unable to figure out what exactly is going wrong (and where).
kgdb presented us with the following backtrace:

(kgdb) bt
#0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:371
#2  0xffffffff80bbec45 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:451
#3  0xffffffff80bbf083 in vpanic (fmt=<optimized out>, ap=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:880
#4  0xffffffff80bbeea3 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:807
#5  0xffffffff8108e911 in trap_fatal (frame=0xfffffe00005f42d0, eva=368) at /usr/src/sys/amd64/amd64/trap.c:921
#6  0xffffffff8108e96f in trap_pfault (frame=0xfffffe00005f42d0, usermode=<optimized out>, signo=<optimized out>, ucode=<optimized out>) at /usr/src/sys/amd64/amd64/trap.c:739
#7  0xffffffff8108dfb6 in trap (frame=0xfffffe00005f42d0) at /usr/src/sys/amd64/amd64/trap.c:405
#8  <signal handler called>
#9  0xffffffff82515392 in atomic_fcmpset_long (dst=<optimized out>, src=18446735296944592704, expect=<optimized out>) at /usr/src/sys/amd64/include/atomic.h:221
#10 divert_packet (m=0xfffff80130033e00, incoming=<optimized out>) at /usr/src/sys/netinet/ip_divert.c:282
#11 0xffffffff8247fc12 in ipfw_divert (m0=0xfffffe00005f4550, incoming=1, rule=<optimized out>, tee=<optimized out>) at /usr/src/sys/netpfil/ipfw/ip_fw_pfil.c:531
#12 ipfw_check_packet (arg=<optimized out>, m0=0xfffffe00005f4550, ifp=<optimized out>, dir=1, inp=0x0) at /usr/src/sys/netpfil/ipfw/ip_fw_pfil.c:285
#13 0xffffffff80ce07b0 in pfil_run_hooks (ph=<optimized out>, mp=0xfffffe00005f45b8, ifp=0xfffff8000466b800, dir=1, flags=0, inp=0x0) at /usr/src/sys/net/pfil.c:117
#14 0xffffffff80d463eb in ip_tryforward (m=0xfffff80130033e00) at /usr/src/sys/netinet/ip_fastfwd.c:234
#15 0xffffffff80d48c74 in ip_input (m=0xfffff80130033e00) at /usr/src/sys/netinet/ip_input.c:575
#16 0xffffffff80cdf98a in netisr_dispatch_src (proto=1, source=<optimized out>, m=0x1) at /usr/src/sys/net/netisr.c:1124
#17 0xffffffff80cc2b68 in ether_demux (ifp=0xfffff8000466b800, m=0xfffff804800ad740) at /usr/src/sys/net/if_ethersubr.c:879
#18 0xffffffff824ee06e in ng_ether_rcv_upper (hook=<optimized out>, item=<optimized out>) at /usr/src/sys/netgraph/ng_ether.c:741
#19 0xffffffff824f457c in ng_apply_item (node=0xfffff80127857e00, item=0xfffff8058854be80, rw=0) at /usr/src/sys/netgraph/ng_base.c:2403
#20 0xffffffff824f42f8 in ng_snd_item (item=0xfffff8058854be80, flags=0) at /usr/src/sys/netgraph/ng_base.c:2320
#21 0xffffffff824f457c in ng_apply_item (node=0xfffff80484ee3700, item=0xfffff8058854be80, rw=0) at /usr/src/sys/netgraph/ng_base.c:2403
#22 0xffffffff824f42f8 in ng_snd_item (item=0xfffff8058854be80, flags=0) at /usr/src/sys/netgraph/ng_base.c:2320
#23 0xffffffff824edcac in ng_ether_input (ifp=<optimized out>, mp=0xfffffe00005f4950) at /usr/src/sys/netgraph/ng_ether.c:255
#24 0xffffffff80cc3cbb in ether_input_internal (ifp=0xfffff8000466b800, m=0xfffff80130033e00) at /usr/src/sys/net/if_ethersubr.c:616
#25 ether_nh_input (m=<optimized out>) at /usr/src/sys/net/if_ethersubr.c:697
#26 0xffffffff80cdf98a in netisr_dispatch_src (proto=5, source=<optimized out>, m=0x1) at /usr/src/sys/net/netisr.c:1124
#27 0xffffffff80cc2f8b in ether_input (ifp=0xfffff8000466b800, m=0xfffff804800ad740) at /usr/src/sys/net/if_ethersubr.c:787
#28 0xffffffff80cdc146 in iflib_rxeof (rxq=<optimized out>, budget=<optimized out>) at /usr/src/sys/net/iflib.c:2945
#29 0xffffffff80cd6652 in _task_fn_rx (context=0xfffffe00a0924900) at /usr/src/sys/net/iflib.c:3868
#30 0xffffffff80c09941 in gtaskqueue_run_locked (queue=0xfffff80480082d00) at /usr/src/sys/kern/subr_gtaskqueue.c:362
#31 0xffffffff80c09606 in gtaskqueue_thread_loop (arg=<optimized out>) at /usr/src/sys/kern/subr_gtaskqueue.c:537
#32 0xffffffff80b8088e in fork_exit (callout=0xffffffff80c09550 <gtaskqueue_thread_loop>, arg=0xfffffe00007890e0, frame=0xfffffe00005f4c00) at /usr/src/sys/kern/kern_fork.c:1080
#33 <signal handler called>

Frame 10 points to the following source code:

(kgdb) list /usr/src/sys/netinet/ip_divert.c:282
277        CK_LIST_FOREACH(inp, &V_divcb, inp_list) {
278            /* XXX why does only one socket match? */
279            if (inp->inp_lport == nport) {
280                INP_RLOCK(inp);
281                sa = inp->inp_socket;
282                SOCKBUF_LOCK(&sa->so_rcv);
283                if (sbappendaddr_locked(&sa->so_rcv,
284                    (struct sockaddr *)&divsrc, m,
285                    (struct mbuf *)0) == 0) {
286                    SOCKBUF_UNLOCK(&sa->so_rcv);

It appears that sa is NULL when it probably shouldn't be:

(kgdb) print *sa
Cannot access memory at address 0x0

(kgdb) print *inp
$6 = {[...], inp_socket = 0x0, [...]}

We can provide more information if needed. If you need the entire crash dump, we might be able to share it privately.
Comment 1 Andrey V. Elsukov freebsd_committer 2021-03-24 09:02:03 UTC
I think such problem can be reproduced if you will do open/close divert socket in a loop. Is it possible that your application sometimes does that?
Can you show what contains inp in the last kgdb command?
Comment 2 Andrey V. Elsukov freebsd_committer 2021-03-24 09:19:57 UTC
Created attachment 223546 [details]
proposed patch (untested)
Comment 3 Daniel Kempkens 2021-03-24 15:42:50 UTC
(In reply to Andrey V. Elsukov from comment #1)

> I think such problem can be reproduced if you will do open/close divert socket in a loop. Is it possible that your application sometimes does that?

We were not able to confirm that this actually happens (yet), but we're fairly certain that under the right conditions what you described can indeed happen.

We have added some logging to confirm our suspicions.

Sadly no ETA on when we can try this in production again. We're still working on a setup to reproduce this in our development environment.

> Can you show what contains inp in the last kgdb command?

Sure!

(kgdb) print *inp
$6 = {inp_hash = {cle_next = 0xfffff804bfdd5b70, cle_prev = 0xfffff8012784ae20}, inp_pcbgrouphash = {cle_next = 0x0, cle_prev = 0x0}, inp_lock = {lock_object = {
      lo_name = 0xffffffff82515931 "divinp", lo_flags = 90898432, lo_data = 0, lo_witness = 0x0}, rw_lock = 33}, inp_hpts = {tqe_next = 0x0, tqe_prev = 0x0},
  inp_hpts_request = 0, inp_in_hpts = 0 '____preserved_4____00', inp_in_input = 0 '____preserved_4____00', inp_hpts_cpu = 0, inp_refcount = 1, inp_flags = 8388616, inp_flags2 = 16, inp_input_cpu = 0,
  inp_hpts_cpu_set = 0 '____preserved_4____00', inp_input_cpu_set = 0 '____preserved_4____00', inp_hpts_calls = 0 '____preserved_4____00', inp_input_calls = 0 '____preserved_4____00', inp_spare_bits2 = 0 '____preserved_4____00', inp_spare_byte = 0 '____preserved_4____00',
  inp_ppcb = 0x0, inp_socket = 0x0, inp_hptsslot = 0, inp_hpts_drop_reas = 0, inp_input = {tqe_next = 0x0, tqe_prev = 0x0}, inp_pcbinfo = 0xfffffe00006f4538,
  inp_pcbgroup = 0x0, inp_pcbgroup_wild = {cle_next = 0x0, cle_prev = 0x0}, inp_cred = 0xfffff801318fa200, inp_flow = 0, inp_vflag = 1 '____preserved_4____01', inp_ip_ttl = 0 '____preserved_4____00',
  inp_ip_p = 2 '____preserved_4____02', inp_ip_minttl = 0 '____preserved_4____00', inp_flowid = 0, inp_snd_tag = 0x0, inp_flowtype = 0, inp_rss_listen_bucket = 0, inp_inc = {inc_flags = 0 '____preserved_4____00',
    inc_len = 0 '____preserved_4____00', inc_fibnum = 0, inc_ie = {ie_fport = 0, ie_lport = 10787, ie_dependfaddr = {id46_addr = {ia46_pad32 = {0, 0, 0}, ia46_addr4 = {s_addr = 0}}, id6_addr = {
          __u6_addr = {__u6_addr8 = '____preserved_4____00' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}}, ie_dependladdr = {id46_addr = {
          ia46_pad32 = {0, 0, 0}, ia46_addr4 = {s_addr = 0}}, id6_addr = {__u6_addr = {__u6_addr8 = '____preserved_4____00' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0},
            __u6_addr32 = {0, 0, 0, 0}}}}, ie6_zoneid = 0}}, inp_label = 0x0, inp_sp = 0xfffff80127913ee0, {inp_ip_tos = 0 '____preserved_4____00', inp_options = 0x0, inp_moptions = 0x0}, {
    in6p_options = 0x0, in6p_outputopts = 0x0, in6p_moptions = 0x0, in6p_icmp6filt = 0x0, in6p_cksum = 0, in6p_hops = 0}, inp_portlist = {cle_next = 0x0,
    cle_prev = 0xfffff80127c450a0}, inp_phd = 0xfffff80127c45080, inp_gencnt = 74, spare_ptr = 0x0, inp_rt_cookie = 0, {inp_route = {ro_rt = 0x0, ro_lle = 0x0,
      ro_prepend = 0x0, ro_plen = 0, ro_flags = 256, ro_mtu = 0, spare = 0, ro_dst = {sa_len = 0 '____preserved_4____00', sa_family = 0 '____preserved_4____00', sa_data = '____preserved_4____00' <repeats 13 times>}},
    inp_route6 = {ro_rt = 0x0, ro_lle = 0x0, ro_prepend = 0x0, ro_plen = 0, ro_flags = 256, ro_mtu = 0, spare = 0, ro_dst = {sin6_len = 0 '____preserved_4____00', sin6_family = 0 '____preserved_4____00',
        sin6_port = 0, sin6_flowinfo = 0, sin6_addr = {__u6_addr = {__u6_addr8 = '____preserved_4____00' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0,
              0}}}, sin6_scope_id = 0}}}, inp_list = {cle_next = 0xfffff804bfdd5b70, cle_prev = 0xfffffe00006f4530}, inp_epoch_ctx = {data = {
      0xffffffff80d43b00 <in_pcbfree_deferred>, 0xfffff80127c45088}}}
Comment 4 Andrey V. Elsukov freebsd_committer 2021-03-24 15:49:21 UTC
(In reply to Daniel Kempkens from comment #3)

inp_flags2 = 16

it looks like my guess was correct, PCB has INP_FREED flag set, so its socket is already closed and should not be used. Proposed patch should help.
Comment 5 commit-hook freebsd_committer 2021-03-30 09:43:05 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=c80a4b76ceacc5aab322e7ac1407eea8c90cb3b1

commit c80a4b76ceacc5aab322e7ac1407eea8c90cb3b1
Author:     Andrey V. Elsukov <ae@FreeBSD.org>
AuthorDate: 2021-03-30 09:31:09 +0000
Commit:     Andrey V. Elsukov <ae@FreeBSD.org>
CommitDate: 2021-03-30 09:31:09 +0000

    ipdivert: check that PCB is still valid after taking INPCB_RLOCK.

    We are inspecting PCBs of divert sockets under NET_EPOCH section,
    but PCB could be already detached and we should check INP_FREED flag
    when we took INP_RLOCK.

    PR:             254478
    MFC after:      1 week
    Differential Revision:  https://reviews.freebsd.org/D29420

 sys/netinet/ip_divert.c | 4 ++++
 1 file changed, 4 insertions(+)
Comment 6 commit-hook freebsd_committer 2021-04-06 06:50:52 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=6b8c65318e81a451b33ed57b84a5495284dcb20f

commit 6b8c65318e81a451b33ed57b84a5495284dcb20f
Author:     Andrey V. Elsukov <ae@FreeBSD.org>
AuthorDate: 2021-03-30 09:31:09 +0000
Commit:     Andrey V. Elsukov <ae@FreeBSD.org>
CommitDate: 2021-04-06 06:47:54 +0000

    ipdivert: check that PCB is still valid after taking INPCB_RLOCK.

    We are inspecting PCBs of divert sockets under NET_EPOCH section,
    but PCB could be already detached and we should check INP_FREED flag
    when we took INP_RLOCK.

    PR:             254478
    Differential Revision:  https://reviews.freebsd.org/D29420

    (cherry picked from commit c80a4b76ceacc5aab322e7ac1407eea8c90cb3b1)

 sys/netinet/ip_divert.c | 4 ++++
 1 file changed, 4 insertions(+)
Comment 7 commit-hook freebsd_committer 2021-04-06 06:51:53 UTC
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=38c299fe856216d6ab38eb5e04d9ee4f8c22995d

commit 38c299fe856216d6ab38eb5e04d9ee4f8c22995d
Author:     Andrey V. Elsukov <ae@FreeBSD.org>
AuthorDate: 2021-03-30 09:31:09 +0000
Commit:     Andrey V. Elsukov <ae@FreeBSD.org>
CommitDate: 2021-04-06 06:50:55 +0000

    ipdivert: check that PCB is still valid after taking INPCB_RLOCK.

    We are inspecting PCBs of divert sockets under NET_EPOCH section,
    but PCB could be already detached and we should check INP_FREED flag
    when we took INP_RLOCK.

    PR:             254478
    Differential Revision:  https://reviews.freebsd.org/D29420

    (cherry picked from commit c80a4b76ceacc5aab322e7ac1407eea8c90cb3b1)

 sys/netinet/ip_divert.c | 4 ++++
 1 file changed, 4 insertions(+)