Bug 272616

Summary: [panic] Reproducible kernel panic related to sendfile and IPSec
Product: Base System
Reporter: Eugene Grosbein <eugen>
Component: kern
Assignee: freebsd-net (Nobody) <net>
Status: Closed FIXED
Severity: Affects Some People
CC: ae, afedorov, franco, glebius, jhb, kib, markj
Priority: ---
Keywords: crash
Version: 13.2-STABLE
Hardware: Any
OS: Any
See Also: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=271393
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=271991
Attachments:
ipsec: ensure that mbufs are mapped if ipsec is enabled

Description Eugene Grosbein 2023-07-20 11:19:33 UTC
This PR is similar to https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=254419, except that pf(4) is not in use.

I can reproduce the panic on every attempt by fetching a small plain-text file (residing on ZFS) over HTTP/1.1 from my Apache httpd server, which uses sendfile().

The traffic in question goes through a gif(4) interface with mtu=1500 over the ixl0 10 Gbps interface, also with mtu=1500, so some IP fragmentation should occur.
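The fragmentation expectation is plain header arithmetic. A minimal sketch, assuming IPv4-in-IPv4 encapsulation by gif(4) with a 20-byte outer header and no IP options; ESP overhead is left as a parameter because it depends on the SA's cipher and mode, which the report does not state. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Length of an IPv4 header without options (assumption: no options). */
#define IPV4_HDR_LEN 20

/*
 * Hypothetical helper: returns true when an inner packet of inner_len
 * bytes no longer fits on a link with link_mtu once gif(4)-style
 * IPv4-in-IPv4 encapsulation adds an outer header plus esp_overhead
 * bytes of ESP framing.
 */
static bool
tunnel_needs_fragmentation(size_t inner_len, size_t esp_overhead,
    size_t link_mtu)
{
	size_t wire_len = inner_len + IPV4_HDR_LEN + esp_overhead;

	assert(link_mtu > IPV4_HDR_LEN);
	return (wire_len > link_mtu);
}
```

For example, a full 1500-byte inner packet becomes 1520 bytes before any ESP overhead, which exceeds ixl0's 1500-byte MTU. Whether a particular sendfile() response hits this depends on its segment sizes.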

The first time it happened, the kernel generated a crash dump just fine, rebooted, and the crash dump was saved. My next attempt reproduced the same panic, but the kernel hung after printing "Uptime: 22m27s". I can experiment with this machine freely, as it is my workstation and not in service. I also have iKVM plus IPMI SOL (serial console) working.

Unread portion of the kernel message buffer:



Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 04
fault virtual address   = 0x0
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff810bad5a
stack pointer           = 0x28:0xfffffe011dd8f4b0
frame pointer           = 0x28:0xfffffe011dd8f4b0
code segment            = base rx0, limit 0xfffff, type 0x1b

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 02
fault virtual address   = 0x0
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff810bad5a
stack pointer           = 0x28:0xfffffe01771db4e0
frame pointer           = 0x28:0xfffffe01771db4e0
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled,                    = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 81478 (httpd)
trap number             = 12
panic: page fault
cpuid = 2
time = 1689822623
KDB: stack backtrace:
#0 0xffffffff80c53f15 at kdb_backtrace+0x65
#1 0xffffffff80c07852 at vpanic+0x152
#2 0xffffffff80c076f3 at panic+0x43
#3 0xffffffff810bede7 at trap_fatal+0x387
#4 0xffffffff810bee3f at trap_pfault+0x4f
#5 0xffffffff81096a78 at calltrap+0x8
#6 0xffffffff80c9c999 at m_unshare+0x3a9
#7 0xffffffff82d19534 at esp_output+0x184
#8 0xffffffff82d15fc6 at ipsec4_perform_request+0x3b6
#9 0xffffffff82d16113 at ipsec4_common_output+0x83
#10 0xffffffff80e3894c at ipsec_kmod_output+0x2c
#11 0xffffffff80dbc6df at ip_output+0xb8f
#12 0xffffffff80dd3a54 at tcp_output+0x1d74
#13 0xffffffff80de599f at tcp_usr_send+0x17f
#14 0xffffffff80c04ff1 at vn_sendfile+0x1251
#15 0xffffffff80c05fa7 at sendfile+0x117
#16 0xffffffff810bf6dc at amd64_syscall+0x10c
#17 0xffffffff8109738b at fast_syscall_common+0xf8
Uptime: 4d5h15m40s
Dumping 2283 out of 16249 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

warning: Could not load shared library symbols for nvidia.ko.
Do you need "set solib-search-path" or "set sysroot"?
__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
55              __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) bt
#0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:396
#2  0xffffffff80c07419 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:484
#3  0xffffffff80c078bf in vpanic (fmt=<optimized out>, ap=ap@entry=0xfffffe011dd8f300)
    at /usr/src/sys/kern/kern_shutdown.c:923
#4  0xffffffff80c076f3 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:847
#5  0xffffffff810bede7 in trap_fatal (frame=0xfffffe011dd8f3f0, eva=0)
    at /usr/src/sys/amd64/amd64/trap.c:942
#6  0xffffffff810bee3f in trap_pfault (frame=0xfffffe011dd8f3f0, usermode=false,
    signo=<optimized out>, ucode=<optimized out>) at /usr/src/sys/amd64/amd64/trap.c:761
#7  <signal handler called>
#8  memcpy_erms () at /usr/src/sys/amd64/amd64/support.S:553
#9  0xffffffff80c9c999 in m_unshare (m0=0xfffff80146cc8200, how=1)
    at /usr/src/sys/kern/uipc_mbuf.c:2047
#10 0xffffffff82d19534 in esp_output () from /boot/kernel/ipsec.ko
#11 0xffffffff82d15fc6 in ipsec4_perform_request () from /boot/kernel/ipsec.ko
#12 0xffffffff82d16113 in ipsec4_common_output () from /boot/kernel/ipsec.ko
#13 0xffffffff80e3894c in ipsec_kmod_output (sc=0xfffff8001828ea00, sc@entry=0x18,
    m=0xfffff8002a388925, inp=0x3f8, inp@entry=0xfffff80133df99b0)
    at /usr/src/sys/netipsec/subr_ipsec.c:369
#14 0xffffffff80dbc6df in ip_output (m=0x0, m@entry=0xfffff80146cc8200, opt=<optimized out>,
    ro=<optimized out>, flags=0, imo=0x10, imo@entry=0x0, inp=0xfffff80133df99b0)
    at /usr/src/sys/netinet/ip_output.c:680
#15 0xffffffff80dd3a54 in tcp_output (tp=0xfffffe011d38d518)
    at /usr/src/sys/netinet/tcp_output.c:1541
#16 0xffffffff80de599f in tcp_usr_send (so=0xfffff8002a50cb10, flags=0, m=0x0, nam=0x0,
    control=<optimized out>, td=0xfffffe0176dcb720) at /usr/src/sys/netinet/tcp_usrreq.c:1178
#17 0xffffffff80c04ff1 in vn_sendfile (fp=<optimized out>, sockfd=22, hdr_uio=0x0, trl_uio=0x0,
    offset=<optimized out>, nbytes=1038, sent=0xfffffe011dd8fdc8, flags=0, td=0xfffffe0176dcb720)
    at /usr/src/sys/kern/kern_sendfile.c:1188
#18 0xffffffff80c05fa7 in fo_sendfile (fp=0xfffff8002a388925, sockfd=0, hdr_uio=0x3f8,
    trl_uio=0x3f8, offset=-2194227530512, nbytes=9, sent=0xfffffe011dd8fdc8, flags=708348197,
    td=0xfffffe0176dcb720) at /usr/src/sys/sys/file.h:416
#19 sendfile (td=0xfffffe0176dcb720, uap=0xfffffe0176dcbb08, compat=<optimized out>)
    at /usr/src/sys/kern/kern_sendfile.c:1326
#20 0xffffffff810bf6dc in syscallenter (td=0xfffffe0176dcb720)
    at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:190
#21 amd64_syscall (td=0xfffffe0176dcb720, traced=0) at /usr/src/sys/amd64/amd64/trap.c:1183
#22 <signal handler called>
#23 0x0000000828695a5a in ?? ()
Backtrace stopped: Cannot access memory at address 0x82077d418

Comment 1 Eugene Grosbein 2023-07-20 11:36:34 UTC
Adding more people to Cc: who may have an opinion. Some say m_unshare() should be extended to process mbufs with M_EXTPG. Should it?

Comment 2 Konstantin Belousov 2023-07-20 12:09:37 UTC
m_unshare() is not enough.  Really software IPSEC requires mapped mbufs.
Even hw inline accel seems to need it, unfortunately.

Try something like the attached patch.

Comment 3 Konstantin Belousov 2023-07-20 12:10:22 UTC
Created attachment 243503
ipsec: ensure that mbufs are mapped if ipsec is enabled

Comment 4 Eugene Grosbein 2023-07-20 12:28:46 UTC
(In reply to Konstantin Belousov from comment #3)

The patch did not apply to stable/13, so I applied it manually, rebuilt and reinstalled GENERIC, and it really helped: no more panics, even with the default kern.ipc.mb_use_ext_pgs=1.

Comment 5 Eugene Grosbein 2023-07-20 13:15:42 UTC
I was too quick. Indeed, I cannot reproduce the panic with the patched kernel, but the machine started to experience sudden resets, with nothing printed to the serial console between the "Login: " prompt after boot and the next BIOS POST messages:

boot time                                  Thu Jul 20 20:10
boot time                                  Thu Jul 20 19:59
boot time                                  Thu Jul 20 19:48
boot time                                  Thu Jul 20 19:37
boot time                                  Thu Jul 20 19:26

I switched to kernel.old for now.

Comment 6 Mark Johnston 2023-07-20 13:44:24 UTC
(In reply to Konstantin Belousov from comment #2)
> Really software IPSEC requires mapped mbufs. Even hw inline accel seems to need it, unfortunately.

Why is that? At least for software IPsec, only the payload is unmapped, and crypto providers can handle that.

Comment 7 Konstantin Belousov 2023-07-20 13:47:18 UTC
(In reply to Mark Johnston from comment #6)
By payload you mean mbuf data, right?
IPSEC needs to match the packet's IP header against the policy to decide
whether it should do anything with the packet at all. Then it needs to
select an SA based on the IP header, the policy, and perhaps system
defaults. All of that requires access to the mbuf data.

After the SA is selected, transformations are applied, which call into OCF.
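The policy-matching step can be illustrated with a toy lookup that reads fields straight out of the IPv4 header bytes. This is a deliberately simplified sketch, not the kernel's SPD code: struct toy_policy and toy_policy_matches() are hypothetical, and real policies also match source addresses, ports and direction. The point is only that the lookup dereferences header bytes, so those bytes must sit in mapped memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified policy: destination prefix + protocol. */
struct toy_policy {
	uint32_t dst_net;	/* network, host byte order */
	uint32_t dst_mask;	/* netmask, host byte order */
	uint8_t  proto;		/* e.g. 6 for TCP; 0 matches any protocol */
};

/* Read a big-endian (network order) 32-bit field from a byte buffer. */
static uint32_t
be32_at(const uint8_t *p)
{
	return (((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	    ((uint32_t)p[2] << 8) | (uint32_t)p[3]);
}

/*
 * Decide whether the policy applies to a packet whose first hdr_len
 * bytes are readable at hdr.  Every check below dereferences header
 * bytes directly, which is only possible if they are mapped.
 */
static bool
toy_policy_matches(const uint8_t *hdr, size_t hdr_len,
    const struct toy_policy *sp)
{
	uint32_t dst;

	assert(hdr != NULL && sp != NULL);
	if (hdr_len < 20 || (hdr[0] >> 4) != 4)
		return (false);		/* not a complete IPv4 header */
	dst = be32_at(hdr + 16);	/* destination address field */
	if ((dst & sp->dst_mask) != sp->dst_net)
		return (false);
	return (sp->proto == 0 || sp->proto == hdr[9]);	/* protocol field */
}
```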

Comment 8 Mark Johnston 2023-07-20 13:51:25 UTC
(In reply to Konstantin Belousov from comment #7)
I mean, protocol headers (IP, TCP, etc.) are still mapped.  More specifically, each mbuf in a chain can be mapped or not, and the IP header will generally be accessible even if the packet data is unmapped.

Comment 9 Konstantin Belousov 2023-07-20 14:07:24 UTC
(In reply to Mark Johnston from comment #8)
Is it guaranteed that all protocol headers are mapped?

Anyway, even a quick look at m_makespace(), which is fundamental to
ESP injection, shows that it is not ready for unmapped mbufs, IMO.

Comment 10 Mark Johnston 2023-07-20 14:23:18 UTC
(In reply to Konstantin Belousov from comment #9)
Well, there is no real guarantee, but if you only need to access the IP header, then mb_unmapped_to_ext() is overkill. In practice, protocol headers generated by the kernel will live in mapped mbufs that are separate from unmapped data. To be safer, we could introduce an mbuf function which guarantees that the first N bytes of the chain are mapped.

m_makespace() needs a bit of work, but fundamentally I don't see any problems with IPSec+unmapped mbufs. Really, the bug here is that m_unshare() operates on the entire mbuf chain instead of stopping once we've gotten far enough to inject an IPSec header.
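The "first N bytes are mapped" helper suggested above can be sketched over a toy mbuf chain. struct toy_mbuf, TOY_EXTPG and toy_chain_prefix_mapped() are hypothetical stand-ins (TOY_EXTPG plays the role of M_EXTPG); this is not an existing kernel API. The walk stops once N bytes are covered, mirroring the point that code touching only headers need not visit the whole chain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_EXTPG 0x1	/* hypothetical stand-in for M_EXTPG: no KVA */

struct toy_mbuf {
	struct toy_mbuf *next;
	int flags;
	size_t len;	/* bytes of data held by this mbuf */
};

/*
 * Returns true if the first n bytes of the chain live in mapped mbufs,
 * so that e.g. a policy lookup may dereference them.  The loop stops
 * as soon as n bytes are covered; unmapped payload further down the
 * chain is never inspected.
 */
static bool
toy_chain_prefix_mapped(const struct toy_mbuf *m, size_t n)
{
	size_t covered = 0;

	for (; m != NULL && covered < n; m = m->next) {
		if (m->flags & TOY_EXTPG)
			return (false);
		covered += m->len;
	}
	return (covered >= n);
}
```

With a mapped 52-byte header mbuf followed by an unmapped payload mbuf, asking for the first 20 bytes (an IPv4 header without options) succeeds, while asking for 53 bytes fails because it would reach into the unmapped payload.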

Comment 11 Konstantin Belousov 2023-07-20 14:31:28 UTC
(In reply to Mark Johnston from comment #10)
Your reply is not much different from my evaluation: IPSEC needs a complete
audit to ensure that it works with unmapped pages in mbufs. Until this is
done, either extpg should be administratively disabled, or the workaround
that I posted in the patch should be used.

Comment 12 Mark Johnston 2023-07-20 14:38:21 UTC
(In reply to Konstantin Belousov from comment #11)
I just wanted to establish the distinction between "IPSec fundamentally cannot work with unmapped mbufs" and "IPSec is not yet ready to handle unmapped mbufs." I wasn't sure which one you meant in the initial comment. I think your patch is reasonable for 14.0.

Comment 13 Eugene Grosbein 2023-07-20 14:58:17 UTC
(In reply to Eugene Grosbein from comment #5)

I realized that I had recently enabled the IPMI watchdog and our watchdogd(8) daemon, but had loaded ipmi.ko manually only once, without enabling it to load at boot. So booting the patched kernel without ipmi.ko resulted in the watchdog resetting the system every 10 minutes (and the same happened after switching to kernel.old). I have fixed this pilot error.

Comment 14 Aleksandr Fedorov 2023-07-20 15:07:04 UTC
The main problem is that we don't know where an mbuf with the M_EXTPG flag will end up. Today it's IPSEC, tomorrow something else. I think all functions that work with mbufs should correctly handle unmapped mbufs.

But as a temporary fix, the solution proposed by kib@ is quite suitable.

And m_unshare() should handle unmapped mbufs correctly.

Comment 15 commit-hook 2023-07-21 19:01:57 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=bc310a95c58a3c570ed7e5103371453881e36ba1

commit bc310a95c58a3c570ed7e5103371453881e36ba1
Author:     Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2023-07-20 12:08:24 +0000
Commit:     Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2023-07-21 18:51:13 +0000

    ip output: ensure that mbufs are mapped if ipsec is enabled

    Ipsec needs access to packet headers to determine if a policy is
    applicable. It seems that typically IP headers are mapped, but the code
    arguably needs to check this before blindly accessing them. Then,
    operations like m_unshare() and m_makespace() are not yet ready for
    unmapped mbufs.

    Ensure that the packet is mapped before calling into IPSEC_OUTPUT().

    PR:     272616
    Reviewed by:    jhb, markj
    Sponsored by:   NVidia networking
    MFC after:      1 week
    Differential revision:  https://reviews.freebsd.org/D41112

 sys/netinet/ip_output.c   | 6 ++++++
 sys/netinet6/ip6_output.c | 6 ++++++
 2 files changed, 12 insertions(+)
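The commit's approach, making the whole chain mapped before IPSEC_OUTPUT() runs, can be modelled in user space as below. This is an illustration under stated assumptions, not the committed code: struct toy_mbuf, its backing field (standing in for the physical pages of an unmapped mbuf), toy_make_chain_mapped() and toy_ip_output() are all hypothetical. In the kernel, mb_unmapped_to_ext() (mentioned in comment 10) converts unmapped mbufs into mapped ones; returning ENOBUFS on allocation failure is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define TOY_EXTPG 0x1	/* hypothetical stand-in for M_EXTPG */

struct toy_mbuf {
	struct toy_mbuf *next;
	int flags;
	char *data;		/* usable only when TOY_EXTPG is clear */
	const char *backing;	/* stands in for an unmapped mbuf's pages */
	size_t len;
};

/*
 * Give every unmapped mbuf in the chain a mapped copy of its data so
 * later code may use m->data directly.  Returns false if an allocation
 * fails; mbufs converted before the failure stay converted.  Freeing
 * the copies is omitted for brevity.
 */
static bool
toy_make_chain_mapped(struct toy_mbuf *m)
{
	for (; m != NULL; m = m->next) {
		if ((m->flags & TOY_EXTPG) == 0)
			continue;
		m->data = malloc(m->len > 0 ? m->len : 1);
		if (m->data == NULL)
			return (false);
		memcpy(m->data, m->backing, m->len);
		m->flags &= ~TOY_EXTPG;
	}
	return (true);
}

/* Hypothetical output hook: only hand a fully mapped chain to IPsec. */
static int
toy_ip_output(struct toy_mbuf *m, bool ipsec_enabled)
{
	assert(m != NULL);
	if (ipsec_enabled && !toy_make_chain_mapped(m))
		return (ENOBUFS);
	/* The IPsec policy lookup and ESP transform would run here. */
	return (0);
}
```

Converting only when IPsec is enabled keeps the unmapped sendfile() pages untouched for traffic that never reaches IPsec, at the cost of a copy for traffic that does.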

Comment 16 commit-hook 2023-07-28 01:27:18 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=7b335e9f690e77841e3eb7dbf3403429b10fe222

commit 7b335e9f690e77841e3eb7dbf3403429b10fe222
Author:     Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2023-07-20 12:08:24 +0000
Commit:     Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2023-07-28 01:14:01 +0000

    ip output: ensure that mbufs are mapped if ipsec is enabled

    PR:     272616

    (cherry picked from commit bc310a95c58a3c570ed7e5103371453881e36ba1)

 sys/netinet/ip_output.c   | 6 ++++++
 sys/netinet6/ip6_output.c | 6 ++++++
 2 files changed, 12 insertions(+)

Comment 17 Eugene Grosbein 2023-08-15 12:40:11 UTC
Fixed in head and merged to stable/13.