This PR is similar to https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=254419 except that pf(4) is not in use. I can reproduce the panic on every attempt by fetching a small plain text file (residing on ZFS) over HTTP/1.1 from my Apache httpd server using sendfile(). The traffic in question goes through a gif(4) interface with mtu=1500 over the ixl0 10Gbps interface with mtu=1500, so some IP fragmentation should occur.

The first time it happened, the kernel generated a crashdump just fine, rebooted, and the crashdump was saved. My next attempt reproduced the same panic, but the kernel hung after printing "Uptime: 22m27s".

I can experiment with this machine freely, as it is my workstation and not in service. I also have iKVM plus IPMI SOL (serial console) working.

Unread portion of the kernel message buffer:

Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 04
fault virtual address = 0x0
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff810bad5a
stack pointer = 0x28:0xfffffe011dd8f4b0
frame pointer = 0x28:0xfffffe011dd8f4b0
code segment = base rx0, limit 0xfffff, type 0x1b

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 02
fault virtual address = 0x0
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff810bad5a
stack pointer = 0x28:0xfffffe01771db4e0
frame pointer = 0x28:0xfffffe01771db4e0
code segment = base rx0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 81478 (httpd)
trap number = 12
panic: page fault
cpuid = 2
time = 1689822623
KDB: stack backtrace:
#0 0xffffffff80c53f15 at kdb_backtrace+0x65
#1 0xffffffff80c07852 at vpanic+0x152
#2 0xffffffff80c076f3 at panic+0x43
#3 0xffffffff810bede7 at trap_fatal+0x387
#4 0xffffffff810bee3f at trap_pfault+0x4f
#5 0xffffffff81096a78 at calltrap+0x8
#6 0xffffffff80c9c999 at m_unshare+0x3a9
#7 0xffffffff82d19534 at esp_output+0x184
#8 0xffffffff82d15fc6 at ipsec4_perform_request+0x3b6
#9 0xffffffff82d16113 at ipsec4_common_output+0x83
#10 0xffffffff80e3894c at ipsec_kmod_output+0x2c
#11 0xffffffff80dbc6df at ip_output+0xb8f
#12 0xffffffff80dd3a54 at tcp_output+0x1d74
#13 0xffffffff80de599f at tcp_usr_send+0x17f
#14 0xffffffff80c04ff1 at vn_sendfile+0x1251
#15 0xffffffff80c05fa7 at sendfile+0x117
#16 0xffffffff810bf6dc at amd64_syscall+0x10c
#17 0xffffffff8109738b at fast_syscall_common+0xf8
Uptime: 4d5h15m40s
Dumping 2283 out of 16249 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

warning: Could not load shared library symbols for nvidia.ko.
Do you need "set solib-search-path" or "set sysroot"?
__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
55 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) bt
#0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1 doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:396
#2 0xffffffff80c07419 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:484
#3 0xffffffff80c078bf in vpanic (fmt=<optimized out>, ap=ap@entry=0xfffffe011dd8f300) at /usr/src/sys/kern/kern_shutdown.c:923
#4 0xffffffff80c076f3 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:847
#5 0xffffffff810bede7 in trap_fatal (frame=0xfffffe011dd8f3f0, eva=0) at /usr/src/sys/amd64/amd64/trap.c:942
#6 0xffffffff810bee3f in trap_pfault (frame=0xfffffe011dd8f3f0, usermode=false, signo=<optimized out>, ucode=<optimized out>) at /usr/src/sys/amd64/amd64/trap.c:761
#7 <signal handler called>
#8 memcpy_erms () at /usr/src/sys/amd64/amd64/support.S:553
#9 0xffffffff80c9c999 in m_unshare (m0=0xfffff80146cc8200, how=1) at /usr/src/sys/kern/uipc_mbuf.c:2047
#10 0xffffffff82d19534 in esp_output () from /boot/kernel/ipsec.ko
#11 0xffffffff82d15fc6 in ipsec4_perform_request () from /boot/kernel/ipsec.ko
#12 0xffffffff82d16113 in ipsec4_common_output () from /boot/kernel/ipsec.ko
#13 0xffffffff80e3894c in ipsec_kmod_output (sc=0xfffff8001828ea00, sc@entry=0x18, m=0xfffff8002a388925, inp=0x3f8, inp@entry=0xfffff80133df99b0) at /usr/src/sys/netipsec/subr_ipsec.c:369
#14 0xffffffff80dbc6df in ip_output (m=0x0, m@entry=0xfffff80146cc8200, opt=<optimized out>, ro=<optimized out>, flags=0, imo=0x10, imo@entry=0x0, inp=0xfffff80133df99b0) at /usr/src/sys/netinet/ip_output.c:680
#15 0xffffffff80dd3a54 in tcp_output (tp=0xfffffe011d38d518) at /usr/src/sys/netinet/tcp_output.c:1541
#16 0xffffffff80de599f in tcp_usr_send (so=0xfffff8002a50cb10, flags=0, m=0x0, nam=0x0, control=<optimized out>, td=0xfffffe0176dcb720) at /usr/src/sys/netinet/tcp_usrreq.c:1178
#17 0xffffffff80c04ff1 in vn_sendfile (fp=<optimized out>, sockfd=22, hdr_uio=0x0, trl_uio=0x0, offset=<optimized out>, nbytes=1038, sent=0xfffffe011dd8fdc8, flags=0, td=0xfffffe0176dcb720) at /usr/src/sys/kern/kern_sendfile.c:1188
#18 0xffffffff80c05fa7 in fo_sendfile (fp=0xfffff8002a388925, sockfd=0, hdr_uio=0x3f8, trl_uio=0x3f8, offset=-2194227530512, nbytes=9, sent=0xfffffe011dd8fdc8, flags=708348197, td=0xfffffe0176dcb720) at /usr/src/sys/sys/file.h:416
#19 sendfile (td=0xfffffe0176dcb720, uap=0xfffffe0176dcbb08, compat=<optimized out>) at /usr/src/sys/kern/kern_sendfile.c:1326
#20 0xffffffff810bf6dc in syscallenter (td=0xfffffe0176dcb720) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:190
#21 amd64_syscall (td=0xfffffe0176dcb720, traced=0) at /usr/src/sys/amd64/amd64/trap.c:1183
#22 <signal handler called>
#23 0x0000000828695a5a in ?? ()
Backtrace stopped: Cannot access memory at address 0x82077d418
Adding more people to Cc: who may have an opinion. Some say m_unshare() should be extended to handle mbufs with M_EXTPG. Should it?
m_unshare() is not enough. Really software IPSEC requires mapped mbufs. Even hw inline accel seems to need it, unfortunately. Try something like the attached patch.
Created attachment 243503: ipsec: ensure that mbufs are mapped if ipsec is enabled
(In reply to Konstantin Belousov from comment #3) The patch did not apply to stable/13, so I applied it manually, then rebuilt and reinstalled GENERIC. It really helped: no more panics, even with the default kern.ipc.mb_use_ext_pgs=1.
I was too quick... Indeed, I cannot reproduce the panic with the patched kernel, but the machine started to experience sudden resets, with nothing printed to the serial console between "Login: " after boot and the next BIOS POST messages:

boot time Thu Jul 20 20:10
boot time Thu Jul 20 19:59
boot time Thu Jul 20 19:48
boot time Thu Jul 20 19:37
boot time Thu Jul 20 19:26

I switched to kernel.old for now.
(In reply to Konstantin Belousov from comment #2)
> Really software IPSEC requires mapped mbufs. Even hw inline accel seems to need it, unfortunately.

Why is that? At least for sw it's only the payload that is unmapped, and crypto providers can handle that.
(In reply to Mark Johnston from comment #6) By payload you mean mbuf data, right? IPSEC needs to match the packet's IP header against policy to decide whether it should do anything with the packet at all. Then it needs to select an SA based on the IP header, the policy, and perhaps system defaults. All of that requires access to the mbuf data. After the SA is selected, transformations are applied, which call into OCF.
(In reply to Konstantin Belousov from comment #7) I mean, protocol headers (IP, TCP, etc.) are still mapped. More specifically, each mbuf in a chain can be mapped or not, and the IP header will generally be accessible even if the packet data is unmapped.
(In reply to Mark Johnston from comment #8) Is it guaranteed that all protocol headers are mapped? Anyway, even a quick look at m_makespace(), which is fundamental to ESP injection, shows that it is not ready for unmapped mbufs, IMO.
(In reply to Konstantin Belousov from comment #9) Well, there is no real guarantee, but if you only need to access the IP header, then mb_unmapped_to_ext() is overkill. In practice, protocol headers generated by the kernel will live in mapped mbufs that are separate from unmapped data. To be safer, we could introduce an mbuf function which guarantees that the first N bytes of the chain are mapped. m_makespace() needs a bit of work, but fundamentally I don't see any problems with IPSec+unmapped mbufs. Really the bug here is that m_unshare() operates on the entire mbuf chain instead of stopping once we've gotten far enough to inject an IPSec header.
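To make the "first N bytes of the chain are mapped" idea concrete, here is a minimal user-space sketch. It is not kernel code: struct toy_mbuf, toy_prefix_is_mapped() and toy_map_prefix() are hypothetical names invented for illustration, and the "mapped" flag only models the difference between an ordinary mbuf and an M_EXTPG one. The point it shows is that only the buffers covering the header prefix are touched, and the walk stops there instead of processing the whole chain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct mbuf; NOT the kernel definition. */
struct toy_mbuf {
	struct toy_mbuf *next;	/* next buffer in the chain */
	size_t len;		/* bytes of packet data held here */
	bool mapped;		/* false models an M_EXTPG (unmapped) mbuf */
};

/*
 * Return true if the first n bytes of the chain all live in mapped
 * buffers. Returns false if the chain is shorter than n bytes.
 */
static bool
toy_prefix_is_mapped(const struct toy_mbuf *m, size_t n)
{
	while (n > 0 && m != NULL) {
		if (m->len > 0 && !m->mapped)
			return (false);
		n -= (m->len < n) ? m->len : n;
		m = m->next;
	}
	return (n == 0);
}

/*
 * Convert only the buffers that cover the first n bytes and stop there.
 * Returns the number of buffers converted. Flipping the flag stands in
 * for whatever copy or mapping work a real kernel would have to do.
 */
static int
toy_map_prefix(struct toy_mbuf *m, size_t n)
{
	int converted = 0;

	while (n > 0 && m != NULL) {
		if (m->len > 0 && !m->mapped) {
			m->mapped = true;
			converted++;
		}
		n -= (m->len < n) ? m->len : n;
		m = m->next;
	}
	return (converted);
}
```

With a 40-byte mapped header buffer followed by a 1038-byte unmapped data buffer, toy_map_prefix(chain, 40) converts nothing and leaves the data buffer unmapped, while asking for 41 bytes converts exactly one buffer.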
(In reply to Mark Johnston from comment #10) Your reply is not much different from my evaluation: IPSEC needs a complete audit to ensure that it works with unmapped pages in mbufs. Until this is done, either extpg should be administratively disabled, or the workaround I posted in the patch should be used.
(In reply to Konstantin Belousov from comment #11) I just wanted to establish the distinction between, "IPSec fundamentally cannot work with unmapped mbufs," and "IPSec is not yet ready to handle unmapped mbufs." I wasn't sure which one you meant with the initial comment. I think your patch is reasonable for 14.0.
(In reply to Eugene Grosbein from comment #5) I realized that I had recently enabled the IPMI watchdog and our watchdogd(8) daemon, but had loaded ipmi.ko only once, manually, without enabling it to load at boot. So booting the patched kernel without ipmi.ko resulted in a system reset by the watchdog every 10 minutes (and the same happened after switching to kernel.old). I have fixed this pilot error.
The main problem is that we don't know where an mbuf with the M_EXTPG flag will turn up next. Today it's IPSEC, tomorrow it will be something else. I think all functions that work with mbufs should handle unmapped mbufs correctly. But as a temporary fix, the solution proposed by kib@ is quite suitable. And m_unshare() should handle unmapped mbufs correctly.
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=bc310a95c58a3c570ed7e5103371453881e36ba1

commit bc310a95c58a3c570ed7e5103371453881e36ba1
Author: Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2023-07-20 12:08:24 +0000
Commit: Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2023-07-21 18:51:13 +0000

ip output: ensure that mbufs are mapped if ipsec is enabled

Ipsec needs access to packet headers to determine if a policy is applicable. It seems that typically IP headers are mapped, but the code is arguably needs to check this before blindly accessing them. Then, operations like m_unshare() and m_makespace() are not yet ready for unmapped mbufs.

Ensure that the packet is mapped before calling into IPSEC_OUTPUT().

PR: 272616
Reviewed by: jhb, markj
Sponsored by: NVidia networking
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D41112

sys/netinet/ip_output.c | 6 ++++++
sys/netinet6/ip6_output.c | 6 ++++++
2 files changed, 12 insertions(+)
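The ordering this commit message describes (map the packet first, only then call into IPsec) can be sketched in user space as follows. This is an illustration under stated assumptions and not the committed diff: struct sk_mbuf and the sk_* functions are hypothetical stand-ins, the simulate_nomem parameter exists only to model an allocation failure, and returning ENOBUFS on that failure is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct mbuf; NOT the kernel definition. */
struct sk_mbuf {
	struct sk_mbuf *next;	/* next buffer in the chain */
	size_t len;		/* bytes of packet data held here */
	bool mapped;		/* false models an unmapped (M_EXTPG) mbuf */
};

/* Hypothetical: convert every buffer in the chain; may fail. */
static int
sk_map_all(struct sk_mbuf *m, bool simulate_nomem)
{
	if (simulate_nomem)
		return (ENOBUFS);	/* assumed error code for the sketch */
	for (; m != NULL; m = m->next)
		m->mapped = true;
	return (0);
}

/* Hypothetical IPsec hook: it touches every buffer, so all must be mapped. */
static int
sk_ipsec_output(const struct sk_mbuf *m)
{
	for (; m != NULL; m = m->next)
		assert(m->mapped);
	return (0);
}

/* Output path: map the whole packet before handing it to IPsec. */
static int
sk_ip_output(struct sk_mbuf *m, bool ipsec_enabled, bool simulate_nomem)
{
	int error;

	if (ipsec_enabled) {
		error = sk_map_all(m, simulate_nomem);
		if (error != 0)
			return (error);	/* fail the send rather than crash */
		return (sk_ipsec_output(m));
	}
	return (0);	/* unmapped data is left alone when IPsec is off */
}
```

The design trade-off is the one discussed in the thread: mapping the whole chain is heavier than strictly needed for a header lookup, but it keeps code such as m_unshare() and m_makespace() from ever seeing an unmapped buffer.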
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=7b335e9f690e77841e3eb7dbf3403429b10fe222

commit 7b335e9f690e77841e3eb7dbf3403429b10fe222
Author: Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2023-07-20 12:08:24 +0000
Commit: Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2023-07-28 01:14:01 +0000

ip output: ensure that mbufs are mapped if ipsec is enabled

PR: 272616

(cherry picked from commit bc310a95c58a3c570ed7e5103371453881e36ba1)

sys/netinet/ip_output.c | 6 ++++++
sys/netinet6/ip6_output.c | 6 ++++++
2 files changed, 12 insertions(+)
Fixed in head and merged to stable/13.