| Summary: | accessing freed inpcb in udp6_bind |
|---|---|
| Product: | Base System |
| Component: | kern |
| Status: | Closed FIXED |
| Severity: | Affects Only Me |
| Priority: | --- |
| Version: | 14.0-RELEASE |
| Hardware: | amd64 |
| OS: | Any |
| Reporter: | Weldon Godfrey <weldon> |
| Assignee: | Gleb Smirnoff <glebius> |
| CC: | dpetrov67, glebius, grahamperrin, markj, rmf, rscheff, takeda, tuexen, vedran |
| Keywords: | crash |
| Flags: | markj: needs_errata+ |
| Attachments: | |
Description
Weldon Godfrey
2023-09-17 11:02:52 UTC
Do you have a backtrace with line number information? This can be obtained from the full dump, in core.*.txt.

(In reply to Mark Johnston from comment #1) I got this error on 14.0-RC4:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 10
fault virtual address = 0x8
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80be4343
stack pointer = 0x28:0xfffffe0080454ae0
frame pointer = 0x28:0xfffffe0080454b10
code segment = base rx0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 12 (swi1: netisr 0)
rdi: fffff801d713c290 rsi: fffff8018fed5100 rdx: ffffffff82a10090
rcx: 0000000000001000 r8: ffffffff82a10090 r9: fffffe0080454b50
rax: 0000000000000001 rbx: fffff801d713c290 rbp: fffffe0080454b10
r10: 000000000000000a r11: fffffe0080454740 r12: 0000000000000000
r13: 0000000000000000 r14: fffff8018fed5100 r15: 0000000000001000
trap number = 12
panic: page fault
cpuid = 0
time = 1699775882
KDB: stack backtrace:
#0 0xffffffff80b9002d at kdb_backtrace+0x5d
#1 0xffffffff80b43132 at vpanic+0x132
#2 0xffffffff80b42ff3 at panic+0x43
#3 0xffffffff8100c85c at trap_fatal+0x40c
#4 0xffffffff8100c8af at trap_pfault+0x4f
#5 0xffffffff80fe3828 at calltrap+0x8
#6 0xffffffff80be454b at sbdrop+0x3b
#7 0xffffffff80d11c34 at tcp_do_segment+0x2e04
#8 0xffffffff80d0e4e5 at tcp_input_with_port+0xfe5
#9 0xffffffff80d0ee1b at tcp_input+0xb
#10 0xffffffff80cfe40d at ip_input+0x23d
#11 0xffffffff80c845b8 at swi_net+0x128
#12 0xffffffff80b01497 at ithread_loop+0x257
#13 0xffffffff80afdb0f at fork_exit+0x7f
#14 0xffffffff80fe488e at fork_trampoline+0xe

Where are the core files?

(In reply to Vedran Miletic from comment #2) Assuming you have coredumps enabled (e.g., "dumpdev" is set in /etc/rc.conf), they should appear in /var/crash after a reboot.

Created attachment 246274: xz compressed core.txt
(In reply to Vedran Miletic from comment #4) Thanks. There is no need to compress core.txt. Are you able to test FreeBSD-CURRENT? It has assertions enabled which may catch the problem more usefully.

CC a couple of TCP folks. PR 268699 looks similar.

A full core would certainly be helpful to see the mbufs in play at the time of the panic.

(In reply to Mark Johnston from comment #5) Unfortunately, not really, and this is the first time that it happened so I am not that worried. I'll CC to bug 268699.

It has happened to me 3 times since the original report, on various releases. Not all that often, but it is a lightly used webserver. I am just now installing gdb; I should have done that earlier, sorry. I do have vmcore files, but they are too large to attach. If someone wants one and has another way for me to send it, let me know. Otherwise, I will attach core.txt on the next crash, since I should have gdb installed by then.

Last crash was yesterday; this is the version info. I will be upgrading to the latest RELENG shortly.

FreeBSD venom-f5 14.0-RELEASE FreeBSD 14.0-RELEASE #2 releng/14.0-n265380-f9716eee8ab: Tue Nov 21 13:04:48 CST 2023 root@venom-f5:/usr/obj/usr/src/amd64.amd64/sys/VENOM-F5 amd64

I ran kgdb on the vmcore file and got this. I don't know if this helps.

root@venom-f5:/var/crash # kgdb /boot/kernel/kernel /var/crash/vmcore.3
GNU gdb (GDB) 13.2 [GDB v13.2 for FreeBSD]
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd14.0".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /boot/kernel/kernel...
Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug...
Unread portion of the kernel message buffer:
cpuid = 1; apic id = 01
fault virtual address = 0xb8
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80d4e890
stack pointer = 0x28:0xfffffe01017ddc70
frame pointer = 0x28:0xfffffe01017ddcf0
code segment = base rx0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 676 (isc-net-0005)
rdi: ffffffff816c7a30 rsi: ffffffff816c7a30 rdx: 0000000000010200
rcx: 0000000000000000 r8: fffff8001b0c1400 r9: 0000000000000000
rax: fffff8014f794a80 rbx: fffff8010c069540 rbp: fffffe01017ddcf0
r10: 0000000000000000 r11: fffffe0103a59700 r12: fffff80245b80e20
r13: 000000000000c2e5 r14: 000000000000c2e5 r15: fffff8001b0c1400
trap number = 12
panic: page fault
cpuid = 1
time = 1702438402
KDB: stack backtrace:
#0 0xffffffff80b9302d at kdb_backtrace+0x5d
#1 0xffffffff80b46132 at vpanic+0x132
#2 0xffffffff80b45ff3 at panic+0x43
#3 0xffffffff8103285c at trap_fatal+0x40c
#4 0xffffffff810328af at trap_pfault+0x4f
#5 0xffffffff810098e8 at calltrap+0x8
#6 0xffffffff80d708ac at udp6_bind+0x13c
#7 0xffffffff80be9132 at sobind+0x32
#8 0xffffffff80bf05c5 at kern_bindat+0xc5
#9 0xffffffff80bf045b at sys_bind+0x9b
#10 0xffffffff81033119 at amd64_syscall+0x109
#11 0xffffffff8100a1fb at fast_syscall_common+0xf8
Uptime: 19d16h43m33s
Dumping 1669 out of 16338 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
57 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,

Adding Gleb; the latest backtrace has no hint of TCP in it, only UDP. Maybe the mbuf field reuse we've been discussing this week has something to do with this?

Thanks for adding me, Richard!
This is unlikely to be related to the mbuf field reuse that we discussed. However, it is very likely that the problem is in the area I recently worked on.

Weldon, I'm interested in the vmcore file + contents of /boot/kernel + contents of /usr/lib/debug/boot/kernel. Do you have an opportunity to put them on the web? Please compress them aggressively, and you can use my PGP key to encrypt them. Key id is 5D05CC22 and you can obtain or verify it here: https://docs.freebsd.org/en/articles/pgpkeys/#_gleb_smirnoff_glebiusfreebsd_org

(In reply to Gleb Smirnoff from comment #11) Thanks, an email with a link to the file has been sent to you.

Getting very similar crashes (with udp6_bind) on multiple machines (VMware guests) every few days after upgrading to recent 14/stable. The kernel was compiled with "nooptions SCTP"; otherwise it is generic.

Weldon gave me access to the core. Writing up a summary for Mark and other interested parties.

The panic happens at in6_pcb.c:257:

(t->inp_socket->so_options & SO_REUSEPORT) ||

The temporary inpcb t has a NULL inp_socket. It also has the INP_FREED flag set. The inpcb had been found with in6_pcblookup_local(), which neither checks INP_FREED nor acquires the inpcb lock. It relies on the hash lock, which we hold. And the freed inpcb has the INP_INHASHLIST flag set, which is definitely a problem. These two flags should be mutually exclusive.

Unless Mark or I quickly find a problem in the code by eye, we would need somebody, e.g. Weldon Godfrey or Dmitry Petrov, to run a kernel compiled with the INVARIANTS option and probably with an additional patch that would catch creation of an invalid inpcb. Please let me know if you can assist with this.

I see the problem. The inpcb destruction order has a flaw. We first clear inp_socket, then set the INP_FREED flag, then call in_pcbremhash(). This isn't compatible with in6_pcblookup_local(), which doesn't use the inpcb lock. Coming with a patch soon.

Weldon, Dmitry, please test the patch from this review. Let me know if you have any problems with applying it.
https://reviews.freebsd.org/D43122

Applied the patch. It updated the intended files successfully. Recompiled and installed the kernel and rebooted. Looks good so far.

Same here: no issues so far with the patch on top of the latest 14/stable on one of the machines. It can take a while for me to be confident that it fixes the issue. In my case it was typically crashing after 2-5 days, and I never managed to reproduce it on demand. Note that I do not really use IPv6 on my local network. Could it be that the CARP protocol triggers the issue (as it sends some traffic to ff02::12)?

Dmitry, if you see udp6_bind() in the panic trace, then some process definitely requested to bind to an IPv6 address. You can see the process name in the panic banner in the core.

A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=a13039e2709277b1c3b159e694cc909a5e044151

commit a13039e2709277b1c3b159e694cc909a5e044151
Author: Gleb Smirnoff <glebius@FreeBSD.org>
AuthorDate: 2023-12-27 16:34:37 +0000
Commit: Gleb Smirnoff <glebius@FreeBSD.org>
CommitDate: 2023-12-27 16:34:37 +0000

inpcb: reoder inpcb destruction

First, merge in_pcbdetach() with in_pcbfree(). The comment for in_pcbdetach() was no longer correct. Then, make sure we remove the inpcb from the hash before we commit any destructive actions on it. There are couple functions that rely on the hash lock skipping SMR + inpcb lock to lookup an inpcb. Although there are no known functions that similarly rely on the global inpcb list lock, also do list removal before destructive actions.

PR: 273890
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D43122

sys/netinet/in_pcb.c | 39 +++++++++++++++------------------------
sys/netinet/in_pcb.h | 1 -
sys/netinet/raw_ip.c | 1 -
sys/netinet/tcp_syncache.c | 2 --
sys/netinet/tcp_usrreq.c | 2 --
sys/netinet/udp_usrreq.c | 1 -
sys/netinet6/raw_ip6.c | 1 -
sys/netinet6/udp6_usrreq.c | 1 -
8 files changed, 15 insertions(+), 33 deletions(-)

I encountered a crash today. I'm wondering, based on the subject line, whether this is the same issue or another bug.
It's from 14.0-RELEASE-p3 #6:
> Backtrace:
> Reading symbols from /boot/kernel/kernel...
> Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug...
> __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
> 57 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
> (kgdb) #0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:57
> #1 doadump (textdump=textdump@entry=1)
> at /usr/src/sys/kern/kern_shutdown.c:405
> #2 0xffffffff8078cdec in kern_reboot (howto=260)
> at /usr/src/sys/kern/kern_shutdown.c:526
> #3 0xffffffff8078d2df in vpanic (fmt=0xffffffff80c16199 "%s",
> ap=ap@entry=0xfffffe00b4558ac0) at /usr/src/sys/kern/kern_shutdown.c:970
> #4 0xffffffff8078d133 in panic (fmt=<unavailable>)
> at /usr/src/sys/kern/kern_shutdown.c:894
> #5 0xffffffff80bb70dc in trap_fatal (frame=0xfffffe00b4558bb0, eva=184)
> at /usr/src/sys/amd64/amd64/trap.c:952
> #6 0xffffffff80bb712f in trap_pfault (frame=0xfffffe00b4558bb0,
> usermode=false, signo=<optimized out>, ucode=<optimized out>)
> at /usr/src/sys/amd64/amd64/trap.c:760
> #7 <signal handler called>
> #8 0xffffffff809a0820 in in6_pcbbind (inp=inp@entry=0xfffff800cf7dde00,
> sin6=sin6@entry=0xfffff801d552db60, cred=0xfffff800920c9b00)
> at /usr/src/sys/netinet6/in6_pcb.c:257
> #9 0xffffffff809c283c in udp6_bind (so=<optimized out>,
> nam=0xfffff801d552db60, td=<optimized out>)
> at /usr/src/sys/netinet6/udp6_usrreq.c:1059
> #10 0xffffffff80832992 in sobind (so=0xffffffff80e6c220 <prison0>,
> so@entry=0xfffff80138f1ab40, nam=0xffffffff80e6c220 <prison0>,
> nam@entry=0xfffff801d552db60, td=0x10200, td@entry=0xfffffe00b42a2900)
> at /usr/src/sys/kern/uipc_socket.c:940
> #11 0xffffffff80839e25 in kern_bindat (td=td@entry=0xfffffe00b42a2900,
> dirfd=dirfd@entry=-100, fd=<optimized out>,
> sa=sa@entry=0xfffff801d552db60) at /usr/src/sys/kern/uipc_syscalls.c:223
> #12 0xffffffff80839cbb in sys_bind (td=0xfffffe00b42a2900,
> uap=0xfffffe00b42a2d00) at /usr/src/sys/kern/uipc_syscalls.c:190
> #13 0xffffffff80bb7fec in syscallenter (td=0xfffffe00b42a2900)
> at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:159
> #14 amd64_syscall (td=0xfffffe00b42a2900, traced=0)
> at /usr/src/sys/amd64/amd64/trap.c:1197
> #15 <signal handler called>
> #16 0x0000000827adb5aa in ?? ()
> Backtrace stopped: Cannot access memory at address 0x82c3d5af8
> (kgdb)
On Tue Jan 2 21:46:25 2024 UTC, takeda@takeda.tk wrote:
> I encountered a crash today. I'm wondering (based on subject line if this is the
> same issue, or this is another bug?).

This looks similar. Please try out the patch. I plan to merge the fix to stable/14 this week, but extra testing of the patch before I merge is much appreciated.

I have had no issues with the original patch (as of 2023-12-19) so far.

OK, so I just rebuilt the kernel and restarted the machine. Please keep in mind that the previous kernel was built on Dec 17th and this is the first time it crashed. I will update this bug if it crashes again, though.

A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=2bfe735277b8858dd7ad937e0bf2286bdfb45182

commit 2bfe735277b8858dd7ad937e0bf2286bdfb45182
Author: Gleb Smirnoff <glebius@FreeBSD.org>
AuthorDate: 2023-12-27 16:34:37 +0000
Commit: Gleb Smirnoff <glebius@FreeBSD.org>
CommitDate: 2024-01-09 00:29:38 +0000

inpcb: reoder inpcb destruction

First, merge in_pcbdetach() with in_pcbfree(). The comment for in_pcbdetach() was no longer correct. Then, make sure we remove the inpcb from the hash before we commit any destructive actions on it. There are couple functions that rely on the hash lock skipping SMR + inpcb lock to lookup an inpcb. Although there are no known functions that similarly rely on the global inpcb list lock, also do list removal before destructive actions.

PR: 273890
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D43122
(cherry picked from commit a13039e2709277b1c3b159e694cc909a5e044151)

sys/netinet/in_pcb.c | 39 +++++++++++++++------------------------
sys/netinet/in_pcb.h | 1 -
sys/netinet/raw_ip.c | 1 -
sys/netinet/tcp_syncache.c | 2 --
sys/netinet/tcp_usrreq.c | 2 --
sys/netinet/udp_usrreq.c | 1 -
sys/netinet6/raw_ip6.c | 1 -
sys/netinet6/udp6_usrreq.c | 1 -
8 files changed, 15 insertions(+), 33 deletions(-)

Merged to stable/14. Will be fixed in FreeBSD 14.1-RELEASE.

*** Bug 276270 has been marked as a duplicate of this bug. ***

Since this has come up several times, let's reopen this until an erratum is released.

A commit in branch releng/14.0 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=9db5ae3ec45f069f48808a2916ea1c5374e75e6c

commit 9db5ae3ec45f069f48808a2916ea1c5374e75e6c
Author: Gleb Smirnoff <glebius@FreeBSD.org>
AuthorDate: 2023-12-27 16:34:37 +0000
Commit: Gordon Tetlow <gordon@FreeBSD.org>
CommitDate: 2024-02-14 05:40:23 +0000

inpcb: reoder inpcb destruction

First, merge in_pcbdetach() with in_pcbfree(). The comment for in_pcbdetach() was no longer correct. Then, make sure we remove the inpcb from the hash before we commit any destructive actions on it. There are couple functions that rely on the hash lock skipping SMR + inpcb lock to lookup an inpcb. Although there are no known functions that similarly rely on the global inpcb list lock, also do list removal before destructive actions.

PR: 273890
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D43122
Approved by: so
Security: FreeBSD-EN-24:04.ip
(cherry picked from commit a13039e2709277b1c3b159e694cc909a5e044151)
(cherry picked from commit 2bfe735277b8858dd7ad937e0bf2286bdfb45182)

sys/netinet/in_pcb.c | 39 +++++++++++++++------------------------
sys/netinet/in_pcb.h | 1 -
sys/netinet/raw_ip.c | 1 -
sys/netinet/tcp_syncache.c | 2 --
sys/netinet/tcp_usrreq.c | 2 --
sys/netinet/udp_usrreq.c | 1 -
sys/netinet6/raw_ip6.c | 1 -
sys/netinet6/udp6_usrreq.c | 1 -
8 files changed, 15 insertions(+), 33 deletions(-)

An EN for 14.0 was released.