Created attachment 255963 [details] Log files from /var/crash and lspci I had two similar kernel panics in 4 days. It's apparently caused by or triggered by the processing of incoming UDP packets. The source(?) port appears to have been 43780 in both cases. Looking through the backtrace, I cannot easily explain this. # uname -a FreeBSD thinkpad-x270 14.2-RELEASE FreeBSD 14.2-RELEASE releng/14.2-n269506-c8918d6c7412 GENERIC amd64 This bug never ever happened on 14.1, pointing to a regression. I attached `lspci -vvxxx` as well. The network driver is em(4). Next, I will try intel-em-kmod. Since I219-LM is not a particularly rare adapter, I assume this might affect many people. I have the vmcore files as well, but they are large (around 2GB), so I will upload them somewhere only on request.
It cannot have anything to do with the ethernet driver, since I wasn't connected via ethernet, but via wifi (if_iwm), which is also apparent from the backtrace. So it might have something to do with this driver. I doubt so however since it did not change between 14.1 and 14.2.
Hi! Can you please share the core file in a private email to me? You may use my PGP key to encrypt your mail: https://docs.freebsd.org/en/articles/pgpkeys/#_gleb_smirnoff_glebiusfreebsd_org
(In reply to Gleb Smirnoff from comment #2) I am a bit afraid, that these dumps could contain private information. Perhaps you can tell me what to look for in kgdb instead.
Dan Epure submitted similar panic to me, also FreeBSD 14.2-RELEASE. The backtrace is the same, the invalid mbuf without M_PKTHDR set is originating from iwm(4) driver. I'm 99% confident that bug is there and not in IP, UDP or socket buffer code. And indeed high change this is a regression from 14.1-RELEASE.
There is only cosmetic difference between 14.1 and 14.2 in sys/net80211 and there is no difference in the iwm(4) driver. So it is either not a regression, or if it is a regression, then it is not exactly clear which exact kernel module is regressed, could be mbuf allocator KPI. Robin, can you please lookup in your core file in the iwm_handle_rxb() does the original mbuf (pointer 0xfffff80034366100) has M_PKTHDR flag set?
May I have a suggestion: no matter what, can we please fix the panic message in stable/14 directly if this no longer applies to main to at least print m0 %p and some of the mbuf flags; Seems this wasn't done in 2014 but would have probably answered half of the questions here and now. The iwm rx ring is populated from: m = m_getjcl(M_NOWAIT, MT_DATA, M_PKTHDR, IWM_RBUF_SIZE); And according to the offsets (line numbers) what is passed up comes out of: 5321 * We need to start m_copym() at offset 0, to get the 5322 * M_PKTHDR flag preserved. 5323 */ 5324 m1 = m_copym(m, 0, M_COPYALL, M_NOWAIT); 5325 if (m1) { 5326 if (iwm_rx_mpdu(sc, m1, offset, stolen)) So given we assume to have a copy (reference count increased) with M_PKTHDR the question then is where in the stack up could we lose this? ip_input (at least on releng/14.2) does: M_ASSERTPKTHDR() but that's likely compiled out on the release. Assuming we pass that the next question would be: IPsec? Firewall? I don't see much else in the path to lose the M_PKTHDR. The only other difference between (likely 14.1? and 14.2) there is that I moved the epoch from the driver into net80211 but that should not have an effect on that mbuf.
(from comment #0) > The source(?) port appears to have been 43780 in both cases. The port number 43780 in the backtrace is in network byte order, so the actual port is 1195; probably used for OpenVPN. From core.txt.[45] if_ovpn.ko is loaded, I guess if_ovpn is used when the panic occurred. It looks that the problem exists around ovpn_udp_input() in sys/net/if_ovpn.c. The function calls m_unshare() and might return false after m_unshare() to tell the caller that the original mbuf is stil available. But m_unshare() seems to drop M_PKTHDR by calling m_move_pkthdr() (m_move_pkthdr() drops M_PKTHDR). I think the implementation of m_unshare() is not good because it changes the original mbuf, but I'm not sure.
(In reply to Gleb Smirnoff from comment #5) Yes, M_PKTHDR was set! (kgdb) f 21 #21 0xffffffff83ced349 in iwm_handle_rxb (sc=0xfffffe00f916a000, m=0xfffff80034366100) at /usr/src/sys/dev/iwm/if_iwm.c:5326 5326 if (iwm_rx_mpdu(sc, m1, offset, stolen)) (kgdb) p *m $1 = {{m_next = 0x0, m_slist = {sle_next = 0x0}, m_stailq = {stqe_next = 0x0}}, {m_nextpkt = 0x0, m_slistpkt = {sle_next = 0x0}, m_stailqpkt = {stqe_next = 0x0}}, m_data = 0xfffff802ab0a4000 "L", m_len = 4096, m_type = 1, m_flags = 3, {{{m_pkthdr = {{snd_tag = 0x0, rcvif = 0x0, {rcvidx = 0, rcvgen = 0}}, {leaf_rcvif = 0x0, {leaf_rcvidx = 0, leaf_rcvgen = 0}}, tags = {slh_first = 0x0}, len = 4096, flowid = 0, csum_flags = 0, fibnum = 0, numa_domain = 255 '\377', rsstype = 0 '\000', {rcv_tstmp = 0, {l2hlen = 0 '\000', l3hlen = 0 '\000', l4hlen = 0 '\000', l5hlen = 0 '\000', inner_l2hlen = 0 '\000', inner_l3hlen = 0 '\000', inner_l4hlen = 0 '\000', inner_l5hlen = 0 '\000'}}, PH_per = {eight = "\000\000\000\000\000\000\000", sixteen = {0, 0, 0, 0}, thirtytwo = {0, 0}, sixtyfour = {0}, unintptr = {0}, ptr = 0x0}, {PH_loc = {eight = "\000\000\000\000\000\000\000", sixteen = {0, 0, 0, 0}, thirtytwo = {0, 0}, sixtyfour = {0}, unintptr = {0}, ptr = 0x0}, memlen = 0}}, { m_epg_npgs = 0 '\000', m_epg_nrdy = 0 '\000', m_epg_hdrlen = 0 '\000', m_epg_trllen = 0 '\000', m_epg_1st_off = 0, m_epg_last_len = 0, m_epg_flags = 0 '\000', m_epg_record_type = 0 '\000', __spare = "\000", m_epg_enc_cnt = 0, m_epg_tls = 0x0, m_epg_so = 0x1000, m_epg_seqno = 71776119061217280, m_epg_stailq = {stqe_next = 0x0}}}, {m_ext = {{ext_count = 1, ext_cnt = 0x4188000000000001}, ext_size = 4096, ext_type = 3, ext_flags = 1, {{ext_buf = 0xfffff802ab0a4000 "L", ext_arg2 = 0x0}, {extpg_pa = {18446735289076039680, 0, 4831882610, 12234123588679374156, 0}, extpg_trail = "\000\000\000\000\001\000\000\000J\021\231y\246\254\023\334_\331a\352\374\237oF\230\237Gk\231_\033}\202\261\3712\276rU\0268[\347\233$\021\224\177{.\374\301\224\033\236>\274G\223\372\f", <incomplete sequence \356\230>, extpg_hdr = "\3117!\033Y\236l\3721ҕ\232\370\017\273Z\035\201\272\272\327f?"}}, ext_free = 0x0, ext_arg1 = 0x0}, m_pktdat = 0xfffff80034366160 "\001"}}, m_dat = 0xfffff80034366120 ""}} (kgdb) p m->m_flags $2 = 3 M_PKTHDR is defined as 0x00000002.
(In reply to takahiro.kurosawa from comment #7) Yes, I might have been using OpenVPN at the time of the crash. But I don't remember whether I had an active tunnel at the time.
(In reply to Robin Haberkorn from comment #8) You probably want to check m1 as that is the mbuf that goes up the stack.
(In reply to Bjoern A. Zeeb from comment #10) glebius@FreeBSD.org was asking for the original mbuf. The copy doesn't have M_PKTHDR set. But this doesn't surprise, considering that the pointer is passed right up to sbappendaddr_locked(). Do you think, it could have been lost in m_copym()?
(In reply to Robin Haberkorn from comment #11) Yes I understand why the question was for the original one; I just wanted to understand as the "copy" should have it too then and when manually going up the stack I couldn't spot where it might be cleared. I wonder what Dan Epure's report looked like, Glebius? Do you have a backtrace from that? Same code paths? Also UDP? Also ovpn loaded? I don't know how the openvpn kernel module is hooking into the input path; if it's using u_tun_func then that at least would explain why we do not see it in the backtrace. And then Takahiro Kurosawa's analysis from #c7 sounds fairly valid if m_unshare can lose the PKTHDR on the copy. OpenVPN doesn't have to be active; it seems if there's an interface the filtering function will be hooked up. And then it's not a regression in iwm or the network stack; then it's imply the feature. Adding Kristof and bouncing this back to net@.
(In reply to Robin Haberkorn from comment #9) The openvpn process was actually running according to core.txt. This means, yes, there definitely was an active tunnel at the time of the crash.
(In reply to Bjoern A. Zeeb from comment #12) if_ovpn sets u_tun_func, yes, via udp_set_kernel_tunneling(): https://cgit.freebsd.org/src/tree/sys/net/if_ovpn.c#n635 I'm not sure how m_unshare() would feature in this, given that if it succeeds if_ovpn keeps the packet (so we'd see a panic there if it messed up), and if it fails we return 'true' (indicating the filter ate the packet, and no further UDP processing happens).
Created attachment 259236 [details] testcase to reproduce the similar panic I've been able to reproduce a similar panic with the attached patch for /usr/tests/sys/net/if_ovpn/if_ovpn. The testcase uses a broadcast packet for the openvpn port that needs m_copym() calls to deliver it to each socket. m_copym() increases reference counts of mbuf clusters that results in m_unshare() returning a new mbuf and dropping M_PKTHDR of the original mbuf. To reproduce the panic: ---- cd /usr/tests/sys/net cp -pr if_ovpn if_ovpn.283426 cd if_ovpn.283426 patch -p1 < /path/to/repro-283426.diff kldload if_epair if_bridge if_ovpn sync; sync; sync kyua debug if_ovpn:bz283426 ----
Thanks for that test case, it was extremely helpful. I see what's happening here now. The if_ovpn filter function calls m_unshare(), which allocates a new mbuf, but then decides it doesn't want the packet after all. That's normally fine, but we can't tell the caller that we've messed with the m pointer, so the caller continues with a now invalid mbuf pointer. I hope to propose a fix later today.
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=04a7134c1e92c7752ffdc665f99ae26db70866c0 commit 04a7134c1e92c7752ffdc665f99ae26db70866c0 Author: Kristof Provost <kp@FreeBSD.org> AuthorDate: 2025-04-01 13:19:26 +0000 Commit: Kristof Provost <kp@FreeBSD.org> CommitDate: 2025-04-01 14:15:29 +0000 if_ovpn: fix use-after-free of mbuf m_unshare() can return a new mbuf pointer. We update the 'm' pointer in ovpn_udp_input(), but if we decide to pass on the packet (e.g. because it's for an unknown peer) the caller (udp_append()) continues with the old 'm' pointer, eventually resulting in a use-after-free. Re-order operations in ovpn_udp_input() so that we don't modify the 'm' pointer until we're committed to keeping the packet. PR: 283426 Test case by: takahiro.kurosawa@gmail.com MFC after: 2 weeks Sponsored by: Rubicon Communications, LLC ("Netgate") sys/net/if_ovpn.c | 12 +++--- tests/sys/net/if_ovpn/if_ovpn.sh | 81 ++++++++++++++++++++++++++++++++++++++++ 2 files changed, 87 insertions(+), 6 deletions(-)