254478 – Panic when using ipfw and divert sockets

Bug 254478 - Panic when using ipfw and divert sockets

Summary: Panic when using ipfw and divert sockets

Status:	Closed FIXED

Alias:	None

Product:	Base System
Classification:	Unclassified
Component:	kern (show other bugs)
Version:	12.2-RELEASE
Hardware:	amd64 Any

Importance:	--- Affects Only Me
Assignee:	Andrey V. Elsukov

URL:
Keywords:	crash

Depends on:
Blocks:

Reported:	2021-03-22 13:21 UTC by Daniel Kempkens
Modified:	2021-12-01 22:58 UTC (History)
CC List:	6 users (show)

See Also:

Flags:	koobs: mfc-stable13+ koobs: mfc-stable12+

Attachments
proposed patch (untested) (525 bytes, patch) 2021-03-24 09:19 UTC, Andrey V. Elsukov	no flags	Details \| Diff
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Daniel Kempkens 2021-03-22 13:21:30 UTC

# freebsd-version
12.2-RELEASE-p4

On some of our production systems, we're seeing rather frequent (up to multiple times per day) kernel panics that appear to be related to our use of divert sockets.

We use ipfw and divert sockets to basically implement DPI for HTTP traffic. We divert all initial packets of a HTTP connection and once we've seen the Host header, the packet is passed to a rule with "keep-state". We stop diverting packets (after the Host header) via a "check-state" rule before the divert rule(s). The (relevant) rules look something like this:

$ipfw add 2000 check-state
$ipfw add 2002 divert 9002 tcp from any to $dpi_dest 80
$ipfw add 2004 skipto 3000 ip from any to any
$ipfw add 2005 skipto 4000 tcp from any to any diverted keep-state
$ipfw add 2006 skipto 4000 tcp from any to any diverted

Our current thinking is that the panic is somehow related to the amount of packets we divert, since we currently only see the panic in production (and not in staging or dev).

We were able to save a crash dump and did an initial analysis, but were unable to figure out what exactly is going wrong (and where).
kgdb presented us with the following backtrace:

(kgdb) bt
#0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:371
#2  0xffffffff80bbec45 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:451
#3  0xffffffff80bbf083 in vpanic (fmt=<optimized out>, ap=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:880
#4  0xffffffff80bbeea3 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:807
#5  0xffffffff8108e911 in trap_fatal (frame=0xfffffe00005f42d0, eva=368) at /usr/src/sys/amd64/amd64/trap.c:921
#6  0xffffffff8108e96f in trap_pfault (frame=0xfffffe00005f42d0, usermode=<optimized out>, signo=<optimized out>, ucode=<optimized out>) at /usr/src/sys/amd64/amd64/trap.c:739
#7  0xffffffff8108dfb6 in trap (frame=0xfffffe00005f42d0) at /usr/src/sys/amd64/amd64/trap.c:405
#8  <signal handler called>
#9  0xffffffff82515392 in atomic_fcmpset_long (dst=<optimized out>, src=18446735296944592704, expect=<optimized out>) at /usr/src/sys/amd64/include/atomic.h:221
#10 divert_packet (m=0xfffff80130033e00, incoming=<optimized out>) at /usr/src/sys/netinet/ip_divert.c:282
#11 0xffffffff8247fc12 in ipfw_divert (m0=0xfffffe00005f4550, incoming=1, rule=<optimized out>, tee=<optimized out>) at /usr/src/sys/netpfil/ipfw/ip_fw_pfil.c:531
#12 ipfw_check_packet (arg=<optimized out>, m0=0xfffffe00005f4550, ifp=<optimized out>, dir=1, inp=0x0) at /usr/src/sys/netpfil/ipfw/ip_fw_pfil.c:285
#13 0xffffffff80ce07b0 in pfil_run_hooks (ph=<optimized out>, mp=0xfffffe00005f45b8, ifp=0xfffff8000466b800, dir=1, flags=0, inp=0x0) at /usr/src/sys/net/pfil.c:117
#14 0xffffffff80d463eb in ip_tryforward (m=0xfffff80130033e00) at /usr/src/sys/netinet/ip_fastfwd.c:234
#15 0xffffffff80d48c74 in ip_input (m=0xfffff80130033e00) at /usr/src/sys/netinet/ip_input.c:575
#16 0xffffffff80cdf98a in netisr_dispatch_src (proto=1, source=<optimized out>, m=0x1) at /usr/src/sys/net/netisr.c:1124
#17 0xffffffff80cc2b68 in ether_demux (ifp=0xfffff8000466b800, m=0xfffff804800ad740) at /usr/src/sys/net/if_ethersubr.c:879
#18 0xffffffff824ee06e in ng_ether_rcv_upper (hook=<optimized out>, item=<optimized out>) at /usr/src/sys/netgraph/ng_ether.c:741
#19 0xffffffff824f457c in ng_apply_item (node=0xfffff80127857e00, item=0xfffff8058854be80, rw=0) at /usr/src/sys/netgraph/ng_base.c:2403
#20 0xffffffff824f42f8 in ng_snd_item (item=0xfffff8058854be80, flags=0) at /usr/src/sys/netgraph/ng_base.c:2320
#21 0xffffffff824f457c in ng_apply_item (node=0xfffff80484ee3700, item=0xfffff8058854be80, rw=0) at /usr/src/sys/netgraph/ng_base.c:2403
#22 0xffffffff824f42f8 in ng_snd_item (item=0xfffff8058854be80, flags=0) at /usr/src/sys/netgraph/ng_base.c:2320
#23 0xffffffff824edcac in ng_ether_input (ifp=<optimized out>, mp=0xfffffe00005f4950) at /usr/src/sys/netgraph/ng_ether.c:255
#24 0xffffffff80cc3cbb in ether_input_internal (ifp=0xfffff8000466b800, m=0xfffff80130033e00) at /usr/src/sys/net/if_ethersubr.c:616
#25 ether_nh_input (m=<optimized out>) at /usr/src/sys/net/if_ethersubr.c:697
#26 0xffffffff80cdf98a in netisr_dispatch_src (proto=5, source=<optimized out>, m=0x1) at /usr/src/sys/net/netisr.c:1124
#27 0xffffffff80cc2f8b in ether_input (ifp=0xfffff8000466b800, m=0xfffff804800ad740) at /usr/src/sys/net/if_ethersubr.c:787
#28 0xffffffff80cdc146 in iflib_rxeof (rxq=<optimized out>, budget=<optimized out>) at /usr/src/sys/net/iflib.c:2945
#29 0xffffffff80cd6652 in _task_fn_rx (context=0xfffffe00a0924900) at /usr/src/sys/net/iflib.c:3868
#30 0xffffffff80c09941 in gtaskqueue_run_locked (queue=0xfffff80480082d00) at /usr/src/sys/kern/subr_gtaskqueue.c:362
#31 0xffffffff80c09606 in gtaskqueue_thread_loop (arg=<optimized out>) at /usr/src/sys/kern/subr_gtaskqueue.c:537
#32 0xffffffff80b8088e in fork_exit (callout=0xffffffff80c09550 <gtaskqueue_thread_loop>, arg=0xfffffe00007890e0, frame=0xfffffe00005f4c00) at /usr/src/sys/kern/kern_fork.c:1080
#33 <signal handler called>

Frame 10 points to the following source code:

(kgdb) list /usr/src/sys/netinet/ip_divert.c:282
277        CK_LIST_FOREACH(inp, &V_divcb, inp_list) {
278            /* XXX why does only one socket match? */
279            if (inp->inp_lport == nport) {
280                INP_RLOCK(inp);
281                sa = inp->inp_socket;
282                SOCKBUF_LOCK(&sa->so_rcv);
283                if (sbappendaddr_locked(&sa->so_rcv,
284                    (struct sockaddr *)&divsrc, m,
285                    (struct mbuf *)0) == 0) {
286                    SOCKBUF_UNLOCK(&sa->so_rcv);

It appears that sa is NULL when it probably shouldn't be:

(kgdb) print *sa
Cannot access memory at address 0x0

(kgdb) print *inp
$6 = {[...], inp_socket = 0x0, [...]}

We can provide more information if needed. If you need the entire crash dump, we might be able to share it privately.

Comment 1 Andrey V. Elsukov freebsd_committer

2021-03-24 09:02:03 UTC

I think such problem can be reproduced if you will do open/close divert socket in a loop. Is it possible that your application sometimes does that?
Can you show what contains inp in the last kgdb command?

Comment 2 Andrey V. Elsukov freebsd_committer

2021-03-24 09:19:57 UTC

Created attachment 223546 [details]
proposed patch (untested)

Comment 3 Daniel Kempkens 2021-03-24 15:42:50 UTC

(In reply to Andrey V. Elsukov from comment #1)

> I think such problem can be reproduced if you will do open/close divert socket in a loop. Is it possible that your application sometimes does that?

We were not able to confirm that this actually happens (yet), but we're fairly certain that under the right conditions what you described can indeed happen.

We have added some logging to confirm our suspicions.

Sadly no ETA on when we can try this in production again. We're still working on a setup to reproduce this in our development environment.

> Can you show what contains inp in the last kgdb command?

Sure!

(kgdb) print *inp
$6 = {inp_hash = {cle_next = 0xfffff804bfdd5b70, cle_prev = 0xfffff8012784ae20}, inp_pcbgrouphash = {cle_next = 0x0, cle_prev = 0x0}, inp_lock = {lock_object = {
      lo_name = 0xffffffff82515931 "divinp", lo_flags = 90898432, lo_data = 0, lo_witness = 0x0}, rw_lock = 33}, inp_hpts = {tqe_next = 0x0, tqe_prev = 0x0},
  inp_hpts_request = 0, inp_in_hpts = 0 '____preserved_4____00', inp_in_input = 0 '____preserved_4____00', inp_hpts_cpu = 0, inp_refcount = 1, inp_flags = 8388616, inp_flags2 = 16, inp_input_cpu = 0,
  inp_hpts_cpu_set = 0 '____preserved_4____00', inp_input_cpu_set = 0 '____preserved_4____00', inp_hpts_calls = 0 '____preserved_4____00', inp_input_calls = 0 '____preserved_4____00', inp_spare_bits2 = 0 '____preserved_4____00', inp_spare_byte = 0 '____preserved_4____00',
  inp_ppcb = 0x0, inp_socket = 0x0, inp_hptsslot = 0, inp_hpts_drop_reas = 0, inp_input = {tqe_next = 0x0, tqe_prev = 0x0}, inp_pcbinfo = 0xfffffe00006f4538,
  inp_pcbgroup = 0x0, inp_pcbgroup_wild = {cle_next = 0x0, cle_prev = 0x0}, inp_cred = 0xfffff801318fa200, inp_flow = 0, inp_vflag = 1 '____preserved_4____01', inp_ip_ttl = 0 '____preserved_4____00',
  inp_ip_p = 2 '____preserved_4____02', inp_ip_minttl = 0 '____preserved_4____00', inp_flowid = 0, inp_snd_tag = 0x0, inp_flowtype = 0, inp_rss_listen_bucket = 0, inp_inc = {inc_flags = 0 '____preserved_4____00',
    inc_len = 0 '____preserved_4____00', inc_fibnum = 0, inc_ie = {ie_fport = 0, ie_lport = 10787, ie_dependfaddr = {id46_addr = {ia46_pad32 = {0, 0, 0}, ia46_addr4 = {s_addr = 0}}, id6_addr = {
          __u6_addr = {__u6_addr8 = '____preserved_4____00' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}}, ie_dependladdr = {id46_addr = {
          ia46_pad32 = {0, 0, 0}, ia46_addr4 = {s_addr = 0}}, id6_addr = {__u6_addr = {__u6_addr8 = '____preserved_4____00' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0},
            __u6_addr32 = {0, 0, 0, 0}}}}, ie6_zoneid = 0}}, inp_label = 0x0, inp_sp = 0xfffff80127913ee0, {inp_ip_tos = 0 '____preserved_4____00', inp_options = 0x0, inp_moptions = 0x0}, {
    in6p_options = 0x0, in6p_outputopts = 0x0, in6p_moptions = 0x0, in6p_icmp6filt = 0x0, in6p_cksum = 0, in6p_hops = 0}, inp_portlist = {cle_next = 0x0,
    cle_prev = 0xfffff80127c450a0}, inp_phd = 0xfffff80127c45080, inp_gencnt = 74, spare_ptr = 0x0, inp_rt_cookie = 0, {inp_route = {ro_rt = 0x0, ro_lle = 0x0,
      ro_prepend = 0x0, ro_plen = 0, ro_flags = 256, ro_mtu = 0, spare = 0, ro_dst = {sa_len = 0 '____preserved_4____00', sa_family = 0 '____preserved_4____00', sa_data = '____preserved_4____00' <repeats 13 times>}},
    inp_route6 = {ro_rt = 0x0, ro_lle = 0x0, ro_prepend = 0x0, ro_plen = 0, ro_flags = 256, ro_mtu = 0, spare = 0, ro_dst = {sin6_len = 0 '____preserved_4____00', sin6_family = 0 '____preserved_4____00',
        sin6_port = 0, sin6_flowinfo = 0, sin6_addr = {__u6_addr = {__u6_addr8 = '____preserved_4____00' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0,
              0}}}, sin6_scope_id = 0}}}, inp_list = {cle_next = 0xfffff804bfdd5b70, cle_prev = 0xfffffe00006f4530}, inp_epoch_ctx = {data = {
      0xffffffff80d43b00 <in_pcbfree_deferred>, 0xfffff80127c45088}}}

Comment 4 Andrey V. Elsukov freebsd_committer

2021-03-24 15:49:21 UTC

(In reply to Daniel Kempkens from comment #3)

inp_flags2 = 16

it looks like my guess was correct, PCB has INP_FREED flag set, so its socket is already closed and should not be used. Proposed patch should help.

Comment 5 commit-hook freebsd_committer

2021-03-30 09:43:05 UTC

A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=c80a4b76ceacc5aab322e7ac1407eea8c90cb3b1

commit c80a4b76ceacc5aab322e7ac1407eea8c90cb3b1
Author:     Andrey V. Elsukov <ae@FreeBSD.org>
AuthorDate: 2021-03-30 09:31:09 +0000
Commit:     Andrey V. Elsukov <ae@FreeBSD.org>
CommitDate: 2021-03-30 09:31:09 +0000

    ipdivert: check that PCB is still valid after taking INPCB_RLOCK.

    We are inspecting PCBs of divert sockets under NET_EPOCH section,
    but PCB could be already detached and we should check INP_FREED flag
    when we took INP_RLOCK.

    PR:             254478
    MFC after:      1 week
    Differential Revision:  https://reviews.freebsd.org/D29420

 sys/netinet/ip_divert.c | 4 ++++
 1 file changed, 4 insertions(+)

Comment 6 commit-hook freebsd_committer

2021-04-06 06:50:52 UTC

A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=6b8c65318e81a451b33ed57b84a5495284dcb20f

commit 6b8c65318e81a451b33ed57b84a5495284dcb20f
Author:     Andrey V. Elsukov <ae@FreeBSD.org>
AuthorDate: 2021-03-30 09:31:09 +0000
Commit:     Andrey V. Elsukov <ae@FreeBSD.org>
CommitDate: 2021-04-06 06:47:54 +0000

    ipdivert: check that PCB is still valid after taking INPCB_RLOCK.

    We are inspecting PCBs of divert sockets under NET_EPOCH section,
    but PCB could be already detached and we should check INP_FREED flag
    when we took INP_RLOCK.

    PR:             254478
    Differential Revision:  https://reviews.freebsd.org/D29420

    (cherry picked from commit c80a4b76ceacc5aab322e7ac1407eea8c90cb3b1)

 sys/netinet/ip_divert.c | 4 ++++
 1 file changed, 4 insertions(+)

Comment 7 commit-hook freebsd_committer

2021-04-06 06:51:53 UTC

A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=38c299fe856216d6ab38eb5e04d9ee4f8c22995d

commit 38c299fe856216d6ab38eb5e04d9ee4f8c22995d
Author:     Andrey V. Elsukov <ae@FreeBSD.org>
AuthorDate: 2021-03-30 09:31:09 +0000
Commit:     Andrey V. Elsukov <ae@FreeBSD.org>
CommitDate: 2021-04-06 06:50:55 +0000

    ipdivert: check that PCB is still valid after taking INPCB_RLOCK.

    We are inspecting PCBs of divert sockets under NET_EPOCH section,
    but PCB could be already detached and we should check INP_FREED flag
    when we took INP_RLOCK.

    PR:             254478
    Differential Revision:  https://reviews.freebsd.org/D29420

    (cherry picked from commit c80a4b76ceacc5aab322e7ac1407eea8c90cb3b1)

 sys/netinet/ip_divert.c | 4 ++++
 1 file changed, 4 insertions(+)

Comment 8 Jack 2021-07-11 12:15:20 UTC

I am getting an instant kernel panic when try a machine behind the nat tries to access the internet.

I have the following firewall rules in ipfw bge0 being the public interface and bge1 being the private with ip 10.0.0.1

ipfw nat 1 config if bge0 same_ports
ipfw add 100 nat 1 ip4 from any to me in via bge0
ipfw add 101 nat 1 ip4 from 10.0.0.0/24 to any out via bge0
ipfw add allow ip from any to any

Comment 9 Jack 2021-07-11 12:24:53 UTC

-try

Forgot to mention it's happening on the latest git src
13.0-STABLE FreeBSD 13.0-STABLE #0 stable/13-n246228-47c5e288eee9: Sun Jul 11 01:16:08 PDT 2021

natd is also nonstop crashing after the update
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=257079

Comment 10 karl 2021-07-11 12:40:51 UTC

(In reply to Jack from comment #9)

This may be unrelated to the above but rather caused by the recent splay-tree implementation if you have recently pulled source updates.  There is a hotfix on the -STABLE mailing list for it and a review in process at https://reviews.freebsd.org/D31132

If your panic backtrace looks like this it is likely that bug:

HouseKeeping() at HouseKeeping+0x1c/frame 0xfffffe0017b6b320
LibAliasInLocked() at LibAliasInLocked+0x2f/frame 0xfffffe0017b6b3e0
LibAliasIn() at LibAliasIn+0x46/frame 0xfffffe0017b6b410
ipfw_nat() at ipfw_nat+0x234/frame 0xfffffe0017b6b460

Comment 11 Jack 2021-07-11 12:50:27 UTC

(In reply to karl from comment #10)
Yes, that appears to be the similar bt I'm getting.

(gdb) bt full
#0  0x0000000800254e20 in HouseKeeping () from /lib/libalias.so.7
No symbol table info available.
#1  0x00000008002510ca in ?? () from /lib/libalias.so.7
No symbol table info available.
#2  0x0000000000206e0f in ?? ()
No symbol table info available.
#3  0x00000000002053d7 in ?? ()
No symbol table info available.
#4  0x000000000020460f in ?? ()
No symbol table info available.
#5  0x0000000000204510 in ?? ()
No symbol table info available.
#6  0x0000000000000000 in ?? ()
No symbol table info available.

I'm also getting instant kernel panics when I tried doing ipfw nat without natd in https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=257079

Not sure if it's related.

Comment 12 karl 2021-07-11 12:55:25 UTC

(In reply to Jack from comment #11)

Same bug. 

Hotfix here will stop it (note this is 12.2-STABLE, may be slightly different offsets in 13+); I have roughly a day on it now with no panic and previously it was blowing up within minutes:

[karl@NewFS /usr/src]$ git diff .
diff --git a/sys/netinet/libalias/alias_db.c b/sys/netinet/libalias/alias_db.c
index 783a8ca3baa..45faf5a2a40 100644
--- a/sys/netinet/libalias/alias_db.c
+++ b/sys/netinet/libalias/alias_db.c
@@ -1769,7 +1769,8 @@ HouseKeeping(struct libalias *la)
         * Reduce the amount of house keeping work substantially by
         * sampling over the packets.
         */
-       if (packets % packet_limit == 0) {
+       if (packet_limit == 0 || packets % packet_limit == 0) {
+/*     if (packets % packet_limit == 0) {      */
                time_t now;

 #ifdef _KERNEL

Comment 13 Jack 2021-07-11 23:04:34 UTC

How did this bug get into -stable? ;)

Comment 14 Lutz Donnerhacke freebsd_committer

2021-07-12 07:47:54 UTC

(In reply to Jack from comment #13)

The low rate issue was not covered by explicit tests and real world use cases. Shit happens.

Comment 15 Mark Johnston freebsd_committer

2021-12-01 13:15:56 UTC

Is there anything left to do for this bug?

Comment 16 Jack 2021-12-01 22:41:56 UTC

So far so good here on latest 13-stable.

Comment 17 Kubilay Kocak freebsd_committer

2021-12-01 22:58:15 UTC

^Triage: Assign to committer that resolved and track stable merges.

Please re-open this issue with additional information if there are any changes remaining required to resolve this issue.