Bug 283426 - panic in sbappendaddr_locked() - if_ovpn related?
Summary: panic in sbappendaddr_locked() - if_ovpn related?
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 14.2-RELEASE
Hardware: amd64 Any
: --- Affects Many People
Assignee: freebsd-net (Nobody)
URL:
Keywords: crash, regression
Depends on:
Blocks:
 
Reported: 2024-12-20 00:38 UTC by Robin Haberkorn
Modified: 2025-04-01 14:26 UTC (History)
4 users (show)

See Also:


Attachments
Log files from /var/crash and lspci (94.74 KB, application/gzip)
2024-12-20 00:38 UTC, Robin Haberkorn
no flags Details
testcase to reproduce the similar panic (2.06 KB, patch)
2025-04-01 02:53 UTC, takahiro.kurosawa
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Robin Haberkorn 2024-12-20 00:38:56 UTC
Created attachment 255963 [details]
Log files from /var/crash and lspci

I had two similar kernel panics in 4 days. It's apparently caused by or triggered by the processing of incoming UDP packets. The source(?) port appears to have been 43780 in both cases.

Looking through the backtrace, I cannot easily explain this.

# uname -a
FreeBSD thinkpad-x270 14.2-RELEASE FreeBSD 14.2-RELEASE releng/14.2-n269506-c8918d6c7412 GENERIC amd64

This bug never ever happened on 14.1, pointing to a regression.

I attached `lspci -vvxxx` as well. The network driver is em(4).
Next, I will try intel-em-kmod.

Since I219-LM is not a particularly rare adapter, I assume this might affect many people.

I have the vmcore files as well, but they are large (around 2GB), so I will upload them somewhere only on request.
Comment 1 Robin Haberkorn 2024-12-20 19:27:13 UTC
It cannot have anything to do with the ethernet driver, since I wasn't connected via ethernet, but via wifi (if_iwm), which is also apparent from the backtrace. So it might have something to do with this driver. I doubt so however since it did not change between 14.1 and 14.2.
Comment 2 Gleb Smirnoff freebsd_committer freebsd_triage 2024-12-24 18:44:16 UTC
Hi! Can you please share the core file in a private email to me?

You may use my PGP key to encrypt your mail:
https://docs.freebsd.org/en/articles/pgpkeys/#_gleb_smirnoff_glebiusfreebsd_org
Comment 3 Robin Haberkorn 2024-12-25 17:37:34 UTC
(In reply to Gleb Smirnoff from comment #2)

I am a bit afraid, that these dumps could contain private information. Perhaps you can tell me what to look for in kgdb instead.
Comment 4 Gleb Smirnoff freebsd_committer freebsd_triage 2025-03-17 16:44:12 UTC
Dan Epure submitted similar panic to me, also FreeBSD 14.2-RELEASE. The
backtrace is the same, the invalid mbuf without M_PKTHDR set is originating
from iwm(4) driver.  I'm 99% confident that bug is there and not in IP, UDP or
socket buffer code.  And indeed high change this is a regression from
14.1-RELEASE.
Comment 5 Gleb Smirnoff freebsd_committer freebsd_triage 2025-03-17 17:05:32 UTC
There is only cosmetic difference between 14.1 and 14.2 in sys/net80211
and there is no difference in the iwm(4) driver.  So it is either not
a regression, or if it is a regression, then it is not exactly clear
which exact kernel module is regressed, could be mbuf allocator KPI.

Robin, can you please lookup in your core file in the iwm_handle_rxb()
does the original mbuf (pointer 0xfffff80034366100) has M_PKTHDR flag
set?
Comment 6 Bjoern A. Zeeb freebsd_committer freebsd_triage 2025-03-17 19:18:40 UTC
May I have a suggestion:  no matter what, can we please fix the panic message in stable/14 directly if this no longer applies to main to at least print m0 %p and some of the mbuf flags;  Seems this wasn't done in 2014 but would have probably answered half of the questions here and now.



The iwm rx ring is populated from:
   m = m_getjcl(M_NOWAIT, MT_DATA, M_PKTHDR, IWM_RBUF_SIZE);

And according to the offsets (line numbers) what is passed up comes out of:
   5321                          * We need to start m_copym() at offset 0, to get the
   5322                          * M_PKTHDR flag preserved.
   5323                          */
   5324                         m1 = m_copym(m, 0, M_COPYALL, M_NOWAIT);
   5325                         if (m1) {
   5326                                 if (iwm_rx_mpdu(sc, m1, offset, stolen))

So given we assume to have a copy (reference count increased) with M_PKTHDR the question then is where in the stack up could we lose this?

ip_input (at least on releng/14.2) does: M_ASSERTPKTHDR() but that's likely compiled out on the release.

Assuming we pass that the next question would be: IPsec?  Firewall?

I don't see much else in the path to lose the M_PKTHDR.

The only other difference between (likely 14.1? and 14.2) there is that I moved the epoch from the driver into net80211 but that should not have an effect on that mbuf.
Comment 7 takahiro.kurosawa 2025-03-18 02:51:33 UTC
(from comment #0)
> The source(?) port appears to have been 43780 in both cases.

The port number 43780 in the backtrace is in network byte order,
so the actual port is 1195; probably used for OpenVPN.
From core.txt.[45] if_ovpn.ko is loaded, I guess if_ovpn is used when
the panic occurred.

It looks that the problem exists around ovpn_udp_input() in sys/net/if_ovpn.c.
The function calls m_unshare() and might return false after m_unshare()
to tell the caller that the original mbuf is stil available.
But m_unshare() seems to drop M_PKTHDR by calling m_move_pkthdr() 
(m_move_pkthdr() drops M_PKTHDR).
I think the implementation of m_unshare() is not good because it changes
the original mbuf, but I'm not sure.
Comment 8 Robin Haberkorn 2025-03-18 15:43:48 UTC
(In reply to Gleb Smirnoff from comment #5)

Yes, M_PKTHDR was set!

(kgdb) f 21
#21 0xffffffff83ced349 in iwm_handle_rxb (sc=0xfffffe00f916a000, m=0xfffff80034366100) at /usr/src/sys/dev/iwm/if_iwm.c:5326
5326                                    if (iwm_rx_mpdu(sc, m1, offset, stolen))
(kgdb) p *m
$1 = {{m_next = 0x0, m_slist = {sle_next = 0x0}, m_stailq = {stqe_next = 0x0}}, {m_nextpkt = 0x0, m_slistpkt = {sle_next = 0x0}, m_stailqpkt = {stqe_next = 0x0}}, m_data = 0xfffff802ab0a4000 "L", m_len = 4096,
  m_type = 1, m_flags = 3, {{{m_pkthdr = {{snd_tag = 0x0, rcvif = 0x0, {rcvidx = 0, rcvgen = 0}}, {leaf_rcvif = 0x0, {leaf_rcvidx = 0, leaf_rcvgen = 0}}, tags = {slh_first = 0x0}, len = 4096, flowid = 0,
          csum_flags = 0, fibnum = 0, numa_domain = 255 '\377', rsstype = 0 '\000', {rcv_tstmp = 0, {l2hlen = 0 '\000', l3hlen = 0 '\000', l4hlen = 0 '\000', l5hlen = 0 '\000', inner_l2hlen = 0 '\000',
              inner_l3hlen = 0 '\000', inner_l4hlen = 0 '\000', inner_l5hlen = 0 '\000'}}, PH_per = {eight = "\000\000\000\000\000\000\000", sixteen = {0, 0, 0, 0}, thirtytwo = {0, 0}, sixtyfour = {0},
            unintptr = {0}, ptr = 0x0}, {PH_loc = {eight = "\000\000\000\000\000\000\000", sixteen = {0, 0, 0, 0}, thirtytwo = {0, 0}, sixtyfour = {0}, unintptr = {0}, ptr = 0x0}, memlen = 0}}, {
          m_epg_npgs = 0 '\000', m_epg_nrdy = 0 '\000', m_epg_hdrlen = 0 '\000', m_epg_trllen = 0 '\000', m_epg_1st_off = 0, m_epg_last_len = 0, m_epg_flags = 0 '\000', m_epg_record_type = 0 '\000',
          __spare = "\000", m_epg_enc_cnt = 0, m_epg_tls = 0x0, m_epg_so = 0x1000, m_epg_seqno = 71776119061217280, m_epg_stailq = {stqe_next = 0x0}}}, {m_ext = {{ext_count = 1, ext_cnt = 0x4188000000000001},
          ext_size = 4096, ext_type = 3, ext_flags = 1, {{ext_buf = 0xfffff802ab0a4000 "L", ext_arg2 = 0x0}, {extpg_pa = {18446735289076039680, 0, 4831882610, 12234123588679374156, 0},
              extpg_trail = "\000\000\000\000\001\000\000\000J\021\231y\246\254\023\334_\331a\352\374\237oF\230\237Gk\231_\033}\202\261\3712\276rU\0268[\347\233$\021\224\177{.\374\301\224\033\236>\274G\223\372\f", <incomplete sequence \356\230>, extpg_hdr = "\3117!\033Y\236l\3721ҕ\232\370\017\273Z\035\201\272\272\327f?"}}, ext_free = 0x0, ext_arg1 = 0x0}, m_pktdat = 0xfffff80034366160 "\001"}},
    m_dat = 0xfffff80034366120 ""}}
(kgdb) p m->m_flags
$2 = 3

M_PKTHDR is defined as 0x00000002.
Comment 9 Robin Haberkorn 2025-03-18 15:45:37 UTC
(In reply to takahiro.kurosawa from comment #7)

Yes, I might have been using OpenVPN at the time of the crash.
But I don't remember whether I had an active tunnel at the time.
Comment 10 Bjoern A. Zeeb freebsd_committer freebsd_triage 2025-03-18 16:25:06 UTC
(In reply to Robin Haberkorn from comment #8)

You probably want to check m1 as that is the mbuf that goes up the stack.
Comment 11 Robin Haberkorn 2025-03-20 09:19:56 UTC
(In reply to Bjoern A. Zeeb from comment #10)

glebius@FreeBSD.org was asking for the original mbuf. The copy doesn't have M_PKTHDR set. But this doesn't surprise, considering that the pointer is passed right up to sbappendaddr_locked(). Do you think, it could have been lost in m_copym()?
Comment 12 Bjoern A. Zeeb freebsd_committer freebsd_triage 2025-03-20 10:16:58 UTC
(In reply to Robin Haberkorn from comment #11)

Yes I understand why the question was for the original one;  I just wanted to understand as the "copy" should have it too then and when manually going up the stack I couldn't spot where it might be cleared.

I wonder what Dan Epure's report looked like, Glebius?  Do you have a backtrace from that?  Same code paths?  Also UDP? Also ovpn loaded?

I don't know how the openvpn kernel module is hooking into the input path;  if it's using u_tun_func then that at least would explain why we do not see it in the backtrace.  And then Takahiro Kurosawa's analysis from #c7 sounds fairly valid if m_unshare can lose the PKTHDR on the copy.  OpenVPN doesn't have to be active;  it seems if there's an interface the filtering function will be hooked up.  And then it's not a regression in iwm or the network stack; then it's imply the feature.

Adding Kristof and bouncing this back to net@.
Comment 13 Robin Haberkorn 2025-03-20 21:54:56 UTC
(In reply to Robin Haberkorn from comment #9)

The openvpn process was actually running according to core.txt. This means, yes, there definitely was an active tunnel at the time of the crash.
Comment 14 Kristof Provost freebsd_committer freebsd_triage 2025-03-21 01:15:17 UTC
(In reply to Bjoern A. Zeeb from comment #12)
if_ovpn sets u_tun_func, yes, via udp_set_kernel_tunneling():  https://cgit.freebsd.org/src/tree/sys/net/if_ovpn.c#n635

I'm not sure how m_unshare() would feature in this, given that if it succeeds if_ovpn keeps the packet (so we'd see a panic there if it messed up), and if it fails we return 'true' (indicating the filter ate the packet, and no further UDP processing happens).
Comment 15 takahiro.kurosawa 2025-04-01 02:53:11 UTC
Created attachment 259236 [details]
testcase to reproduce the similar panic

I've been able to reproduce a similar panic with the attached patch
for /usr/tests/sys/net/if_ovpn/if_ovpn.
The testcase uses a broadcast packet for the openvpn port
that needs m_copym() calls to deliver it to each socket.
m_copym() increases reference counts of mbuf clusters that results
in m_unshare() returning a new mbuf and dropping M_PKTHDR of the
original mbuf.

To reproduce the panic:

----
cd /usr/tests/sys/net
cp -pr if_ovpn if_ovpn.283426
cd if_ovpn.283426
patch -p1 < /path/to/repro-283426.diff
kldload if_epair if_bridge if_ovpn
sync; sync; sync
kyua debug if_ovpn:bz283426
----
Comment 16 Kristof Provost freebsd_committer freebsd_triage 2025-04-01 12:28:20 UTC
Thanks for that test case, it was extremely helpful.

I see what's happening here now. The if_ovpn filter function calls m_unshare(), which allocates a new mbuf, but then decides it doesn't want the packet after all.
That's normally fine, but we can't tell the caller that we've messed with the m pointer, so the caller continues with a now invalid mbuf pointer.

I hope to propose a fix later today.
Comment 17 commit-hook freebsd_committer freebsd_triage 2025-04-01 14:26:15 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=04a7134c1e92c7752ffdc665f99ae26db70866c0

commit 04a7134c1e92c7752ffdc665f99ae26db70866c0
Author:     Kristof Provost <kp@FreeBSD.org>
AuthorDate: 2025-04-01 13:19:26 +0000
Commit:     Kristof Provost <kp@FreeBSD.org>
CommitDate: 2025-04-01 14:15:29 +0000

    if_ovpn: fix use-after-free of mbuf

    m_unshare() can return a new mbuf pointer. We update the 'm' pointer in
    ovpn_udp_input(), but if we decide to pass on the packet (e.g. because it's for
    an unknown peer) the caller (udp_append()) continues with the old 'm' pointer,
    eventually resulting in a use-after-free.

    Re-order operations in ovpn_udp_input() so that we don't modify the 'm' pointer
    until we're committed to keeping the packet.

    PR:             283426
    Test case by:   takahiro.kurosawa@gmail.com
    MFC after:      2 weeks
    Sponsored by:   Rubicon Communications, LLC ("Netgate")

 sys/net/if_ovpn.c                | 12 +++---
 tests/sys/net/if_ovpn/if_ovpn.sh | 81 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 87 insertions(+), 6 deletions(-)