# freebsd-version 12.2-RELEASE-p4 On some of our production systems, we're seeing rather frequent (up to multiple times per day) kernel panics that appear to be related to our use of divert sockets. We use ipfw and divert sockets to basically implement DPI for HTTP traffic. We divert all initial packets of a HTTP connection and once we've seen the Host header, the packet is passed to a rule with "keep-state". We stop diverting packets (after the Host header) via a "check-state" rule before the divert rule(s). The (relevant) rules look something like this: $ipfw add 2000 check-state $ipfw add 2002 divert 9002 tcp from any to $dpi_dest 80 $ipfw add 2004 skipto 3000 ip from any to any $ipfw add 2005 skipto 4000 tcp from any to any diverted keep-state $ipfw add 2006 skipto 4000 tcp from any to any diverted Our current thinking is that the panic is somehow related to the amount of packets we divert, since we currently only see the panic in production (and not in staging or dev). We were able to save a crash dump and did an initial analysis, but were unable to figure out what exactly is going wrong (and where). kgdb presented us with the following backtrace: (kgdb) bt #0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55 #1 doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:371 #2 0xffffffff80bbec45 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:451 #3 0xffffffff80bbf083 in vpanic (fmt=<optimized out>, ap=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:880 #4 0xffffffff80bbeea3 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:807 #5 0xffffffff8108e911 in trap_fatal (frame=0xfffffe00005f42d0, eva=368) at /usr/src/sys/amd64/amd64/trap.c:921 #6 0xffffffff8108e96f in trap_pfault (frame=0xfffffe00005f42d0, usermode=<optimized out>, signo=<optimized out>, ucode=<optimized out>) at /usr/src/sys/amd64/amd64/trap.c:739 #7 0xffffffff8108dfb6 in trap (frame=0xfffffe00005f42d0) at /usr/src/sys/amd64/amd64/trap.c:405 #8 <signal handler called> #9 0xffffffff82515392 in atomic_fcmpset_long (dst=<optimized out>, src=18446735296944592704, expect=<optimized out>) at /usr/src/sys/amd64/include/atomic.h:221 #10 divert_packet (m=0xfffff80130033e00, incoming=<optimized out>) at /usr/src/sys/netinet/ip_divert.c:282 #11 0xffffffff8247fc12 in ipfw_divert (m0=0xfffffe00005f4550, incoming=1, rule=<optimized out>, tee=<optimized out>) at /usr/src/sys/netpfil/ipfw/ip_fw_pfil.c:531 #12 ipfw_check_packet (arg=<optimized out>, m0=0xfffffe00005f4550, ifp=<optimized out>, dir=1, inp=0x0) at /usr/src/sys/netpfil/ipfw/ip_fw_pfil.c:285 #13 0xffffffff80ce07b0 in pfil_run_hooks (ph=<optimized out>, mp=0xfffffe00005f45b8, ifp=0xfffff8000466b800, dir=1, flags=0, inp=0x0) at /usr/src/sys/net/pfil.c:117 #14 0xffffffff80d463eb in ip_tryforward (m=0xfffff80130033e00) at /usr/src/sys/netinet/ip_fastfwd.c:234 #15 0xffffffff80d48c74 in ip_input (m=0xfffff80130033e00) at /usr/src/sys/netinet/ip_input.c:575 #16 0xffffffff80cdf98a in netisr_dispatch_src (proto=1, source=<optimized out>, m=0x1) at /usr/src/sys/net/netisr.c:1124 #17 0xffffffff80cc2b68 in ether_demux (ifp=0xfffff8000466b800, m=0xfffff804800ad740) at /usr/src/sys/net/if_ethersubr.c:879 #18 0xffffffff824ee06e in ng_ether_rcv_upper (hook=<optimized out>, item=<optimized out>) at /usr/src/sys/netgraph/ng_ether.c:741 #19 0xffffffff824f457c in ng_apply_item (node=0xfffff80127857e00, item=0xfffff8058854be80, rw=0) at /usr/src/sys/netgraph/ng_base.c:2403 #20 0xffffffff824f42f8 in ng_snd_item (item=0xfffff8058854be80, flags=0) at /usr/src/sys/netgraph/ng_base.c:2320 #21 0xffffffff824f457c in ng_apply_item (node=0xfffff80484ee3700, item=0xfffff8058854be80, rw=0) at /usr/src/sys/netgraph/ng_base.c:2403 #22 0xffffffff824f42f8 in ng_snd_item (item=0xfffff8058854be80, flags=0) at /usr/src/sys/netgraph/ng_base.c:2320 #23 0xffffffff824edcac in ng_ether_input (ifp=<optimized out>, mp=0xfffffe00005f4950) at /usr/src/sys/netgraph/ng_ether.c:255 #24 0xffffffff80cc3cbb in ether_input_internal (ifp=0xfffff8000466b800, m=0xfffff80130033e00) at /usr/src/sys/net/if_ethersubr.c:616 #25 ether_nh_input (m=<optimized out>) at /usr/src/sys/net/if_ethersubr.c:697 #26 0xffffffff80cdf98a in netisr_dispatch_src (proto=5, source=<optimized out>, m=0x1) at /usr/src/sys/net/netisr.c:1124 #27 0xffffffff80cc2f8b in ether_input (ifp=0xfffff8000466b800, m=0xfffff804800ad740) at /usr/src/sys/net/if_ethersubr.c:787 #28 0xffffffff80cdc146 in iflib_rxeof (rxq=<optimized out>, budget=<optimized out>) at /usr/src/sys/net/iflib.c:2945 #29 0xffffffff80cd6652 in _task_fn_rx (context=0xfffffe00a0924900) at /usr/src/sys/net/iflib.c:3868 #30 0xffffffff80c09941 in gtaskqueue_run_locked (queue=0xfffff80480082d00) at /usr/src/sys/kern/subr_gtaskqueue.c:362 #31 0xffffffff80c09606 in gtaskqueue_thread_loop (arg=<optimized out>) at /usr/src/sys/kern/subr_gtaskqueue.c:537 #32 0xffffffff80b8088e in fork_exit (callout=0xffffffff80c09550 <gtaskqueue_thread_loop>, arg=0xfffffe00007890e0, frame=0xfffffe00005f4c00) at /usr/src/sys/kern/kern_fork.c:1080 #33 <signal handler called> Frame 10 points to the following source code: (kgdb) list /usr/src/sys/netinet/ip_divert.c:282 277 CK_LIST_FOREACH(inp, &V_divcb, inp_list) { 278 /* XXX why does only one socket match? */ 279 if (inp->inp_lport == nport) { 280 INP_RLOCK(inp); 281 sa = inp->inp_socket; 282 SOCKBUF_LOCK(&sa->so_rcv); 283 if (sbappendaddr_locked(&sa->so_rcv, 284 (struct sockaddr *)&divsrc, m, 285 (struct mbuf *)0) == 0) { 286 SOCKBUF_UNLOCK(&sa->so_rcv); It appears that sa is NULL when it probably shouldn't be: (kgdb) print *sa Cannot access memory at address 0x0 (kgdb) print *inp $6 = {[...], inp_socket = 0x0, [...]} We can provide more information if needed. If you need the entire crash dump, we might be able to share it privately.
I think such problem can be reproduced if you will do open/close divert socket in a loop. Is it possible that your application sometimes does that? Can you show what contains inp in the last kgdb command?
Created attachment 223546 [details] proposed patch (untested)
(In reply to Andrey V. Elsukov from comment #1) > I think such problem can be reproduced if you will do open/close divert socket in a loop. Is it possible that your application sometimes does that? We were not able to confirm that this actually happens (yet), but we're fairly certain that under the right conditions what you described can indeed happen. We have added some logging to confirm our suspicions. Sadly no ETA on when we can try this in production again. We're still working on a setup to reproduce this in our development environment. > Can you show what contains inp in the last kgdb command? Sure! (kgdb) print *inp $6 = {inp_hash = {cle_next = 0xfffff804bfdd5b70, cle_prev = 0xfffff8012784ae20}, inp_pcbgrouphash = {cle_next = 0x0, cle_prev = 0x0}, inp_lock = {lock_object = { lo_name = 0xffffffff82515931 "divinp", lo_flags = 90898432, lo_data = 0, lo_witness = 0x0}, rw_lock = 33}, inp_hpts = {tqe_next = 0x0, tqe_prev = 0x0}, inp_hpts_request = 0, inp_in_hpts = 0 '____preserved_4____00', inp_in_input = 0 '____preserved_4____00', inp_hpts_cpu = 0, inp_refcount = 1, inp_flags = 8388616, inp_flags2 = 16, inp_input_cpu = 0, inp_hpts_cpu_set = 0 '____preserved_4____00', inp_input_cpu_set = 0 '____preserved_4____00', inp_hpts_calls = 0 '____preserved_4____00', inp_input_calls = 0 '____preserved_4____00', inp_spare_bits2 = 0 '____preserved_4____00', inp_spare_byte = 0 '____preserved_4____00', inp_ppcb = 0x0, inp_socket = 0x0, inp_hptsslot = 0, inp_hpts_drop_reas = 0, inp_input = {tqe_next = 0x0, tqe_prev = 0x0}, inp_pcbinfo = 0xfffffe00006f4538, inp_pcbgroup = 0x0, inp_pcbgroup_wild = {cle_next = 0x0, cle_prev = 0x0}, inp_cred = 0xfffff801318fa200, inp_flow = 0, inp_vflag = 1 '____preserved_4____01', inp_ip_ttl = 0 '____preserved_4____00', inp_ip_p = 2 '____preserved_4____02', inp_ip_minttl = 0 '____preserved_4____00', inp_flowid = 0, inp_snd_tag = 0x0, inp_flowtype = 0, inp_rss_listen_bucket = 0, inp_inc = {inc_flags = 0 '____preserved_4____00', inc_len = 0 '____preserved_4____00', inc_fibnum = 0, inc_ie = {ie_fport = 0, ie_lport = 10787, ie_dependfaddr = {id46_addr = {ia46_pad32 = {0, 0, 0}, ia46_addr4 = {s_addr = 0}}, id6_addr = { __u6_addr = {__u6_addr8 = '____preserved_4____00' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}}, ie_dependladdr = {id46_addr = { ia46_pad32 = {0, 0, 0}, ia46_addr4 = {s_addr = 0}}, id6_addr = {__u6_addr = {__u6_addr8 = '____preserved_4____00' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}}, ie6_zoneid = 0}}, inp_label = 0x0, inp_sp = 0xfffff80127913ee0, {inp_ip_tos = 0 '____preserved_4____00', inp_options = 0x0, inp_moptions = 0x0}, { in6p_options = 0x0, in6p_outputopts = 0x0, in6p_moptions = 0x0, in6p_icmp6filt = 0x0, in6p_cksum = 0, in6p_hops = 0}, inp_portlist = {cle_next = 0x0, cle_prev = 0xfffff80127c450a0}, inp_phd = 0xfffff80127c45080, inp_gencnt = 74, spare_ptr = 0x0, inp_rt_cookie = 0, {inp_route = {ro_rt = 0x0, ro_lle = 0x0, ro_prepend = 0x0, ro_plen = 0, ro_flags = 256, ro_mtu = 0, spare = 0, ro_dst = {sa_len = 0 '____preserved_4____00', sa_family = 0 '____preserved_4____00', sa_data = '____preserved_4____00' <repeats 13 times>}}, inp_route6 = {ro_rt = 0x0, ro_lle = 0x0, ro_prepend = 0x0, ro_plen = 0, ro_flags = 256, ro_mtu = 0, spare = 0, ro_dst = {sin6_len = 0 '____preserved_4____00', sin6_family = 0 '____preserved_4____00', sin6_port = 0, sin6_flowinfo = 0, sin6_addr = {__u6_addr = {__u6_addr8 = '____preserved_4____00' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}, sin6_scope_id = 0}}}, inp_list = {cle_next = 0xfffff804bfdd5b70, cle_prev = 0xfffffe00006f4530}, inp_epoch_ctx = {data = { 0xffffffff80d43b00 <in_pcbfree_deferred>, 0xfffff80127c45088}}}
(In reply to Daniel Kempkens from comment #3) inp_flags2 = 16 it looks like my guess was correct, PCB has INP_FREED flag set, so its socket is already closed and should not be used. Proposed patch should help.
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=c80a4b76ceacc5aab322e7ac1407eea8c90cb3b1 commit c80a4b76ceacc5aab322e7ac1407eea8c90cb3b1 Author: Andrey V. Elsukov <ae@FreeBSD.org> AuthorDate: 2021-03-30 09:31:09 +0000 Commit: Andrey V. Elsukov <ae@FreeBSD.org> CommitDate: 2021-03-30 09:31:09 +0000 ipdivert: check that PCB is still valid after taking INPCB_RLOCK. We are inspecting PCBs of divert sockets under NET_EPOCH section, but PCB could be already detached and we should check INP_FREED flag when we took INP_RLOCK. PR: 254478 MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D29420 sys/netinet/ip_divert.c | 4 ++++ 1 file changed, 4 insertions(+)
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=6b8c65318e81a451b33ed57b84a5495284dcb20f commit 6b8c65318e81a451b33ed57b84a5495284dcb20f Author: Andrey V. Elsukov <ae@FreeBSD.org> AuthorDate: 2021-03-30 09:31:09 +0000 Commit: Andrey V. Elsukov <ae@FreeBSD.org> CommitDate: 2021-04-06 06:47:54 +0000 ipdivert: check that PCB is still valid after taking INPCB_RLOCK. We are inspecting PCBs of divert sockets under NET_EPOCH section, but PCB could be already detached and we should check INP_FREED flag when we took INP_RLOCK. PR: 254478 Differential Revision: https://reviews.freebsd.org/D29420 (cherry picked from commit c80a4b76ceacc5aab322e7ac1407eea8c90cb3b1) sys/netinet/ip_divert.c | 4 ++++ 1 file changed, 4 insertions(+)
A commit in branch stable/12 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=38c299fe856216d6ab38eb5e04d9ee4f8c22995d commit 38c299fe856216d6ab38eb5e04d9ee4f8c22995d Author: Andrey V. Elsukov <ae@FreeBSD.org> AuthorDate: 2021-03-30 09:31:09 +0000 Commit: Andrey V. Elsukov <ae@FreeBSD.org> CommitDate: 2021-04-06 06:50:55 +0000 ipdivert: check that PCB is still valid after taking INPCB_RLOCK. We are inspecting PCBs of divert sockets under NET_EPOCH section, but PCB could be already detached and we should check INP_FREED flag when we took INP_RLOCK. PR: 254478 Differential Revision: https://reviews.freebsd.org/D29420 (cherry picked from commit c80a4b76ceacc5aab322e7ac1407eea8c90cb3b1) sys/netinet/ip_divert.c | 4 ++++ 1 file changed, 4 insertions(+)
I am getting an instant kernel panic when try a machine behind the nat tries to access the internet. I have the following firewall rules in ipfw bge0 being the public interface and bge1 being the private with ip 10.0.0.1 ipfw nat 1 config if bge0 same_ports ipfw add 100 nat 1 ip4 from any to me in via bge0 ipfw add 101 nat 1 ip4 from 10.0.0.0/24 to any out via bge0 ipfw add allow ip from any to any
-try Forgot to mention it's happening on the latest git src 13.0-STABLE FreeBSD 13.0-STABLE #0 stable/13-n246228-47c5e288eee9: Sun Jul 11 01:16:08 PDT 2021 natd is also nonstop crashing after the update https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=257079
(In reply to Jack from comment #9) This may be unrelated to the above but rather caused by the recent splay-tree implementation if you have recently pulled source updates. There is a hotfix on the -STABLE mailing list for it and a review in process at https://reviews.freebsd.org/D31132 If your panic backtrace looks like this it is likely that bug: HouseKeeping() at HouseKeeping+0x1c/frame 0xfffffe0017b6b320 LibAliasInLocked() at LibAliasInLocked+0x2f/frame 0xfffffe0017b6b3e0 LibAliasIn() at LibAliasIn+0x46/frame 0xfffffe0017b6b410 ipfw_nat() at ipfw_nat+0x234/frame 0xfffffe0017b6b460
(In reply to karl from comment #10) Yes, that appears to be the similar bt I'm getting. (gdb) bt full #0 0x0000000800254e20 in HouseKeeping () from /lib/libalias.so.7 No symbol table info available. #1 0x00000008002510ca in ?? () from /lib/libalias.so.7 No symbol table info available. #2 0x0000000000206e0f in ?? () No symbol table info available. #3 0x00000000002053d7 in ?? () No symbol table info available. #4 0x000000000020460f in ?? () No symbol table info available. #5 0x0000000000204510 in ?? () No symbol table info available. #6 0x0000000000000000 in ?? () No symbol table info available. I'm also getting instant kernel panics when I tried doing ipfw nat without natd in https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=257079 Not sure if it's related.
(In reply to Jack from comment #11) Same bug. Hotfix here will stop it (note this is 12.2-STABLE, may be slightly different offsets in 13+); I have roughly a day on it now with no panic and previously it was blowing up within minutes: [karl@NewFS /usr/src]$ git diff . diff --git a/sys/netinet/libalias/alias_db.c b/sys/netinet/libalias/alias_db.c index 783a8ca3baa..45faf5a2a40 100644 --- a/sys/netinet/libalias/alias_db.c +++ b/sys/netinet/libalias/alias_db.c @@ -1769,7 +1769,8 @@ HouseKeeping(struct libalias *la) * Reduce the amount of house keeping work substantially by * sampling over the packets. */ - if (packets % packet_limit == 0) { + if (packet_limit == 0 || packets % packet_limit == 0) { +/* if (packets % packet_limit == 0) { */ time_t now; #ifdef _KERNEL
How did this bug get into -stable? ;)
(In reply to Jack from comment #13) The low rate issue was not covered by explicit tests and real world use cases. Shit happens.
Is there anything left to do for this bug?
So far so good here on latest 13-stable.
^Triage: Assign to committer that resolved and track stable merges. Please re-open this issue with additional information if there are any changes remaining required to resolve this issue.