Hi! I am on commit 20f9e2a723f5f560d6219e28f36dd3b8f8561b3a. My installation uses a bridge interface which connects the wired and wireless interfaces. After ~30 seconds of uptime, FreeBSD panics. The backtrace looks like nonsense to me: in frame 10 m != NULL, but in the next frame (in m_dup()) m == NULL. WTF?)

(kgdb) frame 9
#9  0xffffffff80c8ed6a in m_dup (m=0x0, how=<optimized out>, how@entry=1) at /usr/src/sys/kern/uipc_mbuf.c:686
686             remain = m->m_pkthdr.len;
(kgdb) frame 10
#10 0xffffffff82c249d2 in bridge_broadcast (sc=sc@entry=0xfffff8001362a800, src_if=src_if@entry=0xfffff80010a57800, m=0xfffff8002cdbe200, m@entry=0xfffff80003a0ea00, runfilt=runfilt@entry=0) at /usr/src/sys/net/if_bridge.c:2587
2587            mc = m_dup(m, M_NOWAIT);
(kgdb) print m
$1 = (struct mbuf *) 0xfffff8002cdbe200

Here is the saved core + kernel.full (~50Mb compressed, 1Gb uncompressed):
https://drive.google.com/file/d/1O8zmuUuDjRnjcwdBxT7YNH0f-_r6j862/view?usp=sharing

Please share your thoughts on how to fix this.
Here is another backtrace from 13.0-RC3:

(kgdb) bt
#0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:399
#2  0xffffffff80bfd076 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:486
#3  0xffffffff80bfd4f0 in vpanic (fmt=<optimized out>, ap=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:919
#4  0xffffffff80bfd2f3 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:843
#5  0xffffffff810b2187 in trap_fatal (frame=0xfffffe00d70282a0, eva=28) at /usr/src/sys/amd64/amd64/trap.c:915
#6  0xffffffff810b21df in trap_pfault (frame=frame@entry=0xfffffe00d70282a0, usermode=false, signo=<optimized out>, signo@entry=0x0, ucode=<optimized out>, ucode@entry=0x0) at /usr/src/sys/amd64/amd64/trap.c:732
#7  0xffffffff810b183d in trap (frame=0xfffffe00d70282a0) at /usr/src/sys/amd64/amd64/trap.c:398
#8  <signal handler called>
#9  tcp_m_copym (m=m@entry=0x0, off0=0, plen=plen@entry=0xfffffe00d702854c, seglimit=seglimit@entry=0, segsize=segsize@entry=0, sb=0xfffff80206e9a9c0, hw_tls=<optimized out>) at /usr/src/sys/netinet/tcp_output.c:1930
#10 0xffffffff80dbced1 in tcp_output (tp=0xfffffe0121146478) at /usr/src/sys/netinet/tcp_output.c:1078
#11 0xffffffff80db44db in tcp_do_segment (m=0xfffff80029176d00, th=<optimized out>, so=<optimized out>, tp=0xfffffe0121146478, drop_hdrlen=52, tlen=<optimized out>, iptos=0 '\000') at /usr/src/sys/netinet/tcp_input.c:3270
#12 0xffffffff80db15ce in tcp_input (mp=<optimized out>, offp=<optimized out>, proto=<optimized out>) at /usr/src/sys/netinet/tcp_input.c:1382
#13 0xffffffff80da3f85 in ip_input (m=0x0) at /usr/src/sys/netinet/ip_input.c:833
#14 0xffffffff80d327ca in netisr_dispatch_src (proto=1, source=<optimized out>, source@entry=0, m=0x0) at /usr/src/sys/net/netisr.c:1143
#15 0xffffffff80d32abf in netisr_dispatch (proto=0, m=0xfffffe00d702854c) at /usr/src/sys/net/netisr.c:1234
#16 0xffffffff80d16f58 in ether_demux (ifp=ifp@entry=0xfffff80003b51800, m=0x0) at /usr/src/sys/net/if_ethersubr.c:923
#17 0xffffffff80d182dc in ether_input_internal (ifp=0xfffff80003b51800, m=0x0) at /usr/src/sys/net/if_ethersubr.c:709
#18 ether_nh_input (m=<optimized out>) at /usr/src/sys/net/if_ethersubr.c:739
#19 0xffffffff80d327ca in netisr_dispatch_src (proto=proto@entry=5, source=<optimized out>, source@entry=0, m=0x0, m@entry=0xfffff80029176d00) at /usr/src/sys/net/netisr.c:1143
#20 0xffffffff80d32abf in netisr_dispatch (proto=0, proto@entry=5, m=0xfffffe00d702854c, m@entry=0xfffff80029176d00) at /usr/src/sys/net/netisr.c:1234
#21 0xffffffff80d173a9 in ether_input (ifp=<optimized out>, m=0xfffff80029176d00) at /usr/src/sys/net/if_ethersubr.c:830
#22 0xffffffff809175ed in re_rxeof (sc=<optimized out>, sc@entry=0xfffffe00c5444000, rx_npktsp=0x0) at /usr/src/sys/dev/re/if_re.c:2388
#23 0xffffffff80914e40 in re_intr_msi (xsc=0xfffffe00c5444000) at /usr/src/sys/dev/re/if_re.c:2684
#24 0xffffffff80bbe5bd in intr_event_execute_handlers (p=<optimized out>, ie=0xfffff80003b4a600) at /usr/src/sys/kern/kern_intr.c:1168
#25 ithread_execute_handlers (p=<optimized out>, ie=0xfffff80003b4a600) at /usr/src/sys/kern/kern_intr.c:1181
#26 ithread_loop (arg=arg@entry=0xfffff80003b12e40) at /usr/src/sys/kern/kern_intr.c:1269
#27 0xffffffff80bbb3be in fork_exit (callout=0xffffffff80bbe370 <ithread_loop>, arg=0xfffff80003b12e40, frame=0xfffffe00d7028b80) at /usr/src/sys/kern/kern_fork.c:1069
#28 <signal handler called>

It seems the problem is not in if_bridge, but in something called mbuf)

Here are my network interfaces in FreeBSD 12.2:

ifconfig
re0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=82099<RXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC,LINKSTATE>
        ether ac:9e:17:4e:9f:04
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
re1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8209b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC,LINKSTATE>
        ether 10:fe:ed:02:b9:18
        inet 176.193.192.183 netmask 0xffffe000 broadcast 176.193.223.255
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
        inet 127.0.0.1 netmask 0xff000000
        groups: lo
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
wlan0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        ether 30:b5:c2:6b:4a:8e
        groups: wlan
        ssid skynetV25 channel 1 (2412 MHz 11g) bssid 30:b5:c2:6b:4a:8e
        regdomain 32924 country CN indoor ecm authmode WPA2/802.11i
        privacy MIXED deftxkey 2 AES-CCM 2:128-bit txpower 20 scanvalid 60
        protmode CTS wme burst dtimperiod 1 -dfs
        parent interface: ath0
        media: IEEE 802.11 Wireless Ethernet autoselect mode 11g <hostap>
        status: running
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
bridge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        ether 02:27:ad:54:af:00
        inet 192.168.20.1 netmask 0xffffff00 broadcast 192.168.20.255
        id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
        maxage 20 holdcnt 6 proto stp-rstp maxaddr 2000 timeout 1200
        root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
        member: re0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 1 priority 128 path cost 55
        member: wlan0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 4 priority 128 path cost 33333
        groups: bridge
        nd6 options=9<PERFORMNUD,IFDISABLED>
wg0: flags=8080c1<UP,RUNNING,NOARP,MULTICAST> metric 0 mtu 1420
        options=880000<LINKSTATE>
        inet 10.72.53.108 netmask 0xffffffff
        inet6 fe80::ae9e:17ff:fe4e:9f04%wg0 prefixlen 64 scopeid 0x6
        inet6 fc00:bbbb:bbbb:bb01::9:356b prefixlen 128
        groups: wg
        listen-port: 12674
        private-key: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
        public-key: gkF73UggCUanPjFZmznv3BjkiuMxXuXbYiUSbuDJTjw=
        media: Ethernet autoselect (25GBase-ACC <full-duplex>)
        status: active
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>

I backported if_wg from 13.0. When upgrading to 13.0 I do not create it.

Firewall rules:

00100 0 0 reass ip from any to any frag in
00200 131190 110141293 skipto 30000 ip from any to any layer2
00300 0 0 check-state :before-nat
00400 36470 2220647 allow ip from any to any tagged 100 via bridge0
00500 8 2616 allow udp from any 68 to me 67 in via bridge0 keep-state :before-nat
00600 3 144 allow icmp from any to any via bridge0
00700 6 780 deny ip from any to 192.168.20.0/24 in via bridge0
00800 3134 210604 allow ip from any to any via re0
00900 1177 89601 allow ip from any to any via wlan0
01000 79865 101970562 allow ip from any to any via bridge0
01100 16 1664 allow ip from any to any via lo0
01200 0 0 allow ip from any to any via tap0
01300 10428 6033288 allow udp from me to 185.213.155.130 51820 out via re1 keep-state :before-nat
01400 0 0 allow icmp from any to any via re1
01500 287 45445 deny ip from any to any via re1
01600 5501 4584315 nat 1 ip from any to any in via wg0
01700 0 0 check-state :after-nat
01800 0 0 skipto 20000 tcp from me to any 53 out via wg0 setup keep-state :after-nat
01900 96 10310 skipto 20000 udp from me to any 53 out via wg0 keep-state :after-nat
02000 0 0 skipto 20000 tcp from me to any 80,443,22,43,9418 out via wg0 setup keep-state :after-nat
02100 0 0 skipto 20000 udp from any 68 to any 67 out via wg0 keep-state :after-nat
02200 282 21432 skipto 20000 udp from me to any 123 out via wg0 keep-state :after-nat
02300 0 0 skipto 20000 icmp from any to any via wg0
02400 0 0 deny ip from me to any out via wg0
02500 1 48 deny ip from any to me in via wg0
20000 4882 750396 nat 1 ip from any to any out via wg0
20100 10378 5334303 allow ip from any to any via wg0
20200 0 0 deny ip from any to any
30000 6303 0 allow ip from any to any mac-type 0x0806
30100 0 0 allow tag 100 ip from any to any MAC any cc:af:78:58:73:a2 in
30200 0 0 allow tag 100 ip from any to any MAC cc:af:78:58:73:a2 any out
30300 37494 2287884 allow tag 100 ip from any to any MAC any 60:45:cb:64:2a:65 in
30400 67051 97788491 allow tag 100 ip from any to any MAC 60:45:cb:64:2a:65 any out
30500 0 0 allow tag 100 ip from any to any MAC any 3c:7c:3f:3c:52:5b in
30600 0 0 allow tag 100 ip from any to any MAC 3c:7c:3f:3c:52:5b any out
30700 20342 10064918 allow ip from any to any
65535 54 7970 allow ip from any to any

Also there is kernel nat on interface re1.
Given the second backtrace and the other bugs (254309, 254244) without if_bridge showing NULL mbufs I'm fairly confident this isn't a bridge bug. It's certainly worth updating to make sure you have the fix mentioned in those two bugs and trying again.
(In reply to Kristof Provost from comment #2) Thanks Kristof. Any clues as to the responsible component(s) or suggested code owners/maintainers to loop in?
(In reply to Kubilay Kocak from comment #3) It may already be fixed. Reporter needs to re-test with a build that includes e9f029831fa5747ae1b405f5716c52cb4ebf1e04 (or the cherry-picked equivalent on whatever branch they're using).
Bugs 254309 and 254244 are irrelevant to my case. The line net.link.ether.ipfw=1 in /etc/sysctl.conf causes all the trouble. I provided my firewall rules, so you can see what ipfw does to layer2 frames. Does this help to find the problem? I think the bug may be in some changes to ipfw between 12.2 and 13.0.
@Richard Does e9f02983 look relevant/related to this issue?

@Reporter Please test a kernel with e9f02983 applied to confirm that the issue is still reproducible.
> Please test a kernel with e9f02983 applied to confirm that the issue is still reproducible

Yes, it is reproducible.
Rescue Retransmission for SACK is not in releng/13.0, thus D29315 is unlikely to be the problem here. There was another known panic (but with a KASSERT giving it away), fixed with D29083, but that is fixed in RC3.
The two cores look vastly different - and the 2nd one would be interesting to me to see what is going on in detail. There is no trace of TCP stack involvement in the original (BETA3) core dump.
I think the main question is why there are no panics when the net.link.ether.ipfw sysctl is set to 0, but there are panics when it is set to 1. I pasted my ipfw rules earlier, so you can see how I filter layer2 frames. Unfortunately, debugging is even more difficult because I cannot shut down my computer with FreeBSD 13. But this is for another PR.
Can you try the 'net/realtek-re-kmod' port instead of the in-tree re(4) driver? The in-tree driver is missing microcode/PHY fixups that seem mandatory for many Realtek cards.
Can you share the 2nd core dump and kernel?
Can you reproduce it yourself by adding net.link.ether.ipfw = 0 to /etc/sysctl.conf and writing firewall rules like these:

#!/bin/sh
IPFW="/sbin/ipfw -q"
IFACE="wg0"
PUB_IFACE="re1"
SKIP_IP="skipto 20000"
SKIP_ETHER="skipto 30000"

# Ports list:
SSH="22"
TELNET="23"
SMTP="25"
WHOIS="43"
WWW="80"
HTTPS="443"
POP3="110"
SSMTP="465"
POP3S="995"
GIT="9418"
FTPC="21"
FTPD="20"
IRC="6660-7000"
NTP="123"

OPENPORTS="$WWW,$HTTPS"
OPENPORTS="$OPENPORTS,$SSH,$WHOIS,$GIT"

GOODMACS="cc:af:78:58:73:a2 60:45:cb:64:2a:65 3c:7c:3f:3c:52:5b"
GOODMACS_TAG="100"
SUBNET="192.168.20.0/24"
LOCALIFACES="re0 wlan0 bridge0 lo0 tap0"

$IPFW -f flush
$IPFW -f nat flush

# Start NAT
$IPFW nat 1 config if $IFACE log same_ports reset

# Deny fragmented packets
$IPFW add reass ip from any to any frag in
#$IPFW add $SKIP_ETHER ip from any to any layer2
$IPFW add check-state :before-nat

# Drop connections to LAN from untrusted macs
#$IPFW add allow ip from any to any tagged $GOODMACS_TAG via bridge0
# Allow DHCP
#$IPFW add allow udp from any 68 to me dst-port 67 in via bridge0 keep-state :before-nat
# And ICMP
#$IPFW add allow icmp from any to any via bridge0
# Drop everything else
#$IPFW add deny ip from any to $SUBNET in via bridge0

# Enable LAN traffic
for lan_iface in $LOCALIFACES; do
    $IPFW add allow ip from any to any via $lan_iface
done

# Public iface setup
# Wireguard
$IPFW add allow udp from me to 185.213.155.130 dst-port 51820 out via $PUB_IFACE keep-state :before-nat
# OpenVPN
#$IPFW add allow udp from me to any dst-port 1197 out via $PUB_IFACE keep-state :before-nat
$IPFW add allow icmp from any to any via $PUB_IFACE
$IPFW add deny ip from any to any via $PUB_IFACE

$IPFW add nat 1 ip from any to any in via $IFACE
$IPFW add check-state :after-nat

# Allow DNS for this machine
$IPFW add $SKIP_IP tcp from me to any 53 out via $IFACE setup keep-state :after-nat
$IPFW add $SKIP_IP udp from me to any 53 out via $IFACE keep-state :after-nat
# All common open ports
$IPFW add $SKIP_IP tcp from me to any $OPENPORTS out \
    via $IFACE setup keep-state :after-nat
# DHCP
$IPFW add $SKIP_IP udp from any 68 to any dst-port 67 out via $IFACE keep-state :after-nat
# NTP
$IPFW add $SKIP_IP udp from me to any $NTP out via $IFACE keep-state :after-nat
# Allow ICMP
$IPFW add $SKIP_IP icmp from any to any via $IFACE
$IPFW add deny all from me to any out via $IFACE
$IPFW add deny all from any to me in via $IFACE

$IPFW add 20000 nat 1 ip from any to any out via $IFACE
$IPFW add allow ip from any to any via $IFACE
$IPFW add deny ip from any to any

# Ethernet-layer processing
$IPFW add 30000 allow ip from any to any mac-type arp
for mac in $GOODMACS; do
    $IPFW add allow tag $GOODMACS_TAG ip from any to any mac any $mac in
    $IPFW add allow tag $GOODMACS_TAG ip from any to any mac $mac any out
done
$IPFW add allow ip from any to any

You can drop all rules about VPN, home VLAN, etc. Just leave layer2 filtering.
Disregard my last message. I tried layer2 filtering on my second machine, and everything works fine there. It causes trouble only on the first one.
Is this panic still occurring? PR 256439 looks very similar. If you haven't already, are you able to test with

options INVARIANTS
options INVARIANT_SUPPORT

added to the kernel config?
> Is this panic still occurring? PR 256439 looks very similar.

From the top of releng/13.0, yes. I think this is driver-specific, because layer2 filtering works on other machines (turning it off solved the problem with panics). I have the following network hardware:

re1@pci0:4:0:0: class=0x020000 rev=0x06 hdr=0x00 vendor=0x10ec device=0x8168 subvendor=0x7470 subdevice=0x3468
    vendor     = 'Realtek Semiconductor Co., Ltd.'
    device     = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
    class      = network
    subclass   = ethernet
ath0@pci0:5:5:0: class=0x028000 rev=0x01 hdr=0x00 vendor=0x168c device=0x002d subvendor=0x168c subdevice=0x0300
    vendor     = 'Qualcomm Atheros'
    device     = 'AR9227 Wireless Network Adapter'
    class      = network

Tried both realtek-re-kmod and the drivers from base with no result. I will do some more debugging if I can, but I need that machine up and running almost every day, and that hinders my progress.
(In reply to shamaz.mazum from comment #16) Given that the problem appears to be tied to L2 ipfw hooks, please try the patch here: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=256439#c29
(In reply to Mark Johnston from comment #17) The proposed solution works for me, thanks! I was wrong, this is not a bug in the driver ;) Is there a chance that it will be committed to releng/13.0?
(In reply to shamaz.mazum from comment #18) Yes, we will probably release an erratum for this, so it'll be included in 13.0-RELEASE-p3.
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=bc6a2267fffeafd3946637607a74cfd639398f9d

commit bc6a2267fffeafd3946637607a74cfd639398f9d
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2021-06-16 13:46:56 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2021-06-16 13:46:56 +0000

    ipfw: Update the pfil mbuf pointer in ipfw_check_frame()

    ipfw_chk() might call m_pullup() and thus can change the mbuf chain
    head. In this case, the new chain head has to be returned to the pfil
    hook caller, otherwise the pfil hook caller is left with a dangling
    pointer.

    Note that this affects only the link-layer hooks installed when the
    net.link.ether.ipfw sysctl is set to 1.

    PR:             256439, 254015, 255069, 255104
    Fixes:          f355cb3e6
    Reviewed by:    ae
    MFC after:      3 days
    Sponsored by:   The FreeBSD Foundation
    Differential Revision:  https://reviews.freebsd.org/D30764

 sys/netpfil/ipfw/ip_fw_pfil.c | 2 ++
 1 file changed, 2 insertions(+)
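For readers following along, the failure mode described in the commit message can be shown with a small userland sketch. This is not the actual ip_fw_pfil.c change: struct toy_buf, toy_pullup(), hook_buggy(), hook_fixed() and caller_pointer_live() are all hypothetical names made up for illustration, and a static pool stands in for the mbuf allocator so that a "freed" buffer can be inspected safely. The only point is that a hook which replaces the chain head must write the new head back through the pointer it was given.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POOL_SIZE 4
#define BUF_CAP   64

/* Hypothetical stand-in for struct mbuf; not the kernel type. */
struct toy_buf {
	int		in_use;		/* 0 once the buffer has been "freed" */
	size_t		len;
	unsigned char	data[BUF_CAP];
};

/* Static pool so that "freed" buffers stay safely inspectable in the demo. */
static struct toy_buf pool[POOL_SIZE];

static struct toy_buf *
toy_alloc(const void *src, size_t len)
{
	if (len > BUF_CAP)
		return (NULL);
	for (size_t i = 0; i < POOL_SIZE; i++) {
		if (!pool[i].in_use) {
			pool[i].in_use = 1;
			pool[i].len = len;
			memcpy(pool[i].data, src, len);
			return (&pool[i]);
		}
	}
	return (NULL);
}

/*
 * Toy analogue of m_pullup(): copies the data into a new buffer, releases
 * the old head and returns the new one (NULL if the pool is exhausted).
 */
static struct toy_buf *
toy_pullup(struct toy_buf *b)
{
	struct toy_buf *n;

	assert(b != NULL && b->in_use);
	n = toy_alloc(b->data, b->len);
	b->in_use = 0;
	return (n);
}

/* Buggy hook: replaces the head but never tells the caller about it. */
static int
hook_buggy(struct toy_buf **bp)
{
	struct toy_buf *b = *bp;

	b = toy_pullup(b);
	return (b == NULL ? -1 : 0);	/* *bp still names the old head */
}

/* Fixed hook: writes the (possibly new) head back through the pointer. */
static int
hook_fixed(struct toy_buf **bp)
{
	struct toy_buf *b = *bp;

	b = toy_pullup(b);
	*bp = b;
	return (b == NULL ? -1 : 0);
}

/*
 * Plays the hook caller: returns 1 if its pointer is still live after the
 * hook ran, 0 if it is stale, -1 on allocation failure.
 */
static int
caller_pointer_live(int (*hook)(struct toy_buf **))
{
	struct toy_buf *m;

	memset(pool, 0, sizeof(pool));	/* start from an empty pool */
	m = toy_alloc("frame", 5);
	if (m == NULL || hook(&m) != 0)
		return (-1);
	return (m->in_use);
}
```

In this sketch, caller_pointer_live(hook_buggy) returns 0 because the caller keeps using the released head, while caller_pointer_live(hook_fixed) returns 1. In a real kernel the stale head may already have been reused or cleared by the time the caller touches it, which would be consistent with the seemingly impossible m values in the backtraces above.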
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=ed1acef3fe3053b418ce3e41036ccf24957253a4

commit ed1acef3fe3053b418ce3e41036ccf24957253a4
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2021-06-16 13:46:56 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2021-06-19 14:08:49 +0000

    ipfw: Update the pfil mbuf pointer in ipfw_check_frame()

    ipfw_chk() might call m_pullup() and thus can change the mbuf chain
    head. In this case, the new chain head has to be returned to the pfil
    hook caller, otherwise the pfil hook caller is left with a dangling
    pointer.

    Note that this affects only the link-layer hooks installed when the
    net.link.ether.ipfw sysctl is set to 1.

    PR:             256439, 254015, 255069, 255104
    Fixes:          f355cb3e6
    Reviewed by:    ae
    Sponsored by:   The FreeBSD Foundation

    (cherry picked from commit bc6a2267fffeafd3946637607a74cfd639398f9d)

 sys/netpfil/ipfw/ip_fw_pfil.c | 2 ++
 1 file changed, 2 insertions(+)
^Triage: This is the earliest report of the stable/13 bridge/ipfw crash. The patch was provided in bug 256439 (the latest bug id), but this issue has the most analysis/comments/detail.

Accordingly, assign this to the committer resolving it and make this the parent/original (non-dupe) of the other, later issues.

Track merges to stable/* too.

This currently pends an EN (see comment 19); loop in re@.
*** Bug 255069 has been marked as a duplicate of this bug. ***
*** Bug 255104 has been marked as a duplicate of this bug. ***
*** Bug 256439 has been marked as a duplicate of this bug. ***
A commit in branch releng/13.0 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=4647d115ff849534c9d6712cc2da32509721e20e

commit 4647d115ff849534c9d6712cc2da32509721e20e
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2021-06-16 13:46:56 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2021-06-29 17:09:43 +0000

    ipfw: Update the pfil mbuf pointer in ipfw_check_frame()

    ipfw_chk() might call m_pullup() and thus can change the mbuf chain
    head. In this case, the new chain head has to be returned to the pfil
    hook caller, otherwise the pfil hook caller is left with a dangling
    pointer.

    Note that this affects only the link-layer hooks installed when the
    net.link.ether.ipfw sysctl is set to 1.

    Approved by:    so
    Security:       EN-21:21.ipfw
    PR:             256439, 254015, 255069, 255104
    Fixes:          f355cb3e6
    Reviewed by:    ae
    Sponsored by:   The FreeBSD Foundation

    (cherry picked from commit bc6a2267fffeafd3946637607a74cfd639398f9d)
    (cherry picked from commit ed1acef3fe3053b418ce3e41036ccf24957253a4)

 sys/netpfil/ipfw/ip_fw_pfil.c | 2 ++
 1 file changed, 2 insertions(+)
This will appear in 13.0-p3 later today. Thanks to everyone who helped debug and test.