Summary: | 11.1-R crashing in sendfile syscall, as used by a uwsgi process | ||||||
---|---|---|---|---|---|---|---|
Product: | Base System | Reporter: | Mark.Martinec | ||||
Component: | kern | Assignee: | Gleb Smirnoff <glebius> | ||||
Status: | Closed FIXED | ||||||
Severity: | Affects Some People | CC: | chris, glebius, smh | ||||
Priority: | --- | Keywords: | crash | ||||
Version: | 11.1-RELEASE | Flags: | koobs:
mfc-stable11+
|
||||
Hardware: | amd64 | ||||||
OS: | Any | ||||||
See Also: | https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=241162 | ||||||
Attachments: |
|
Description
Mark.Martinec
2017-09-12 14:26:28 UTC
Unfortunately it doesn't look like the attached crash info is usable: skoll.ijs.si dumped core - see /var/crash/vmcore.4 Tue Sep 12 16:05:13 CEST 2017 (no debugging symbols found)... skoll.ijs.si (no debugging symbols found)... (no debugging symbols found)... (no debugging symbols found)... panic: page fault GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "amd64-marcel-freebsd"...(no debugging symbols found)... Attempt to extract a component of a value that is not a structure pointer. Attempt to extract a component of a value that is not a structure pointer. #0 0xffffffff80a6b95a in doadump () (kgdb) #0 0xffffffff80a6b95a in doadump () #1 0xffffffff80a6b983 in doadump () #2 0xfffffe02397da180 in ?? () #3 0xffffffff80a6b6f1 in kern_reboot () Previous frame identical to this frame (corrupt stack?) Was the kernel changed or debug files deleted? Can kgdb read the crash.4? > Unfortunately it doesn't look like the attached crash info is usable > Was the kernel changed or debug files deleted? Hmm, not that I would know. The date of /boot/kernel/kernel matches the time of the upgrade. Upgrade was by freebsd-upgrade, the currently running kernel looks like a genuine GENERIC 11.1-RELEASE-p1 : # uname -a FreeBSD skoll.ijs.si 11.1-RELEASE-p1 FreeBSD 11.1-RELEASE-p1 #0: Wed Aug 9 11:55:48 UTC 2017 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64 # ll /boot/kernel/kernel -r-xr-xr-x 1 root wheel 26852232 Sep 9 22:33 /boot/kernel/kernel # file /boot/kernel/kernel /boot/kernel/kernel: ELF 64-bit LSB executable, x86-64, version 1 (FreeBSD), dynamically linked, interpreter /red/herring, not stripped # cat /var/crash/info.4 Dump header from device: /dev/gpt/swap0 Architecture: amd64 Architecture Version: 2 Dump Length: 1031667712 Blocksize: 512 Dumptime: Tue Sep 12 15:00:21 2017 Hostname: skoll.ijs.si Magic: FreeBSD Kernel Dump Version String: FreeBSD 11.1-RELEASE-p1 #0: Wed Aug 9 11:55:48 UTC 2017 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC Panic String: page fault Dump Parity: 568441422 Bounds: 4 Dump Status: good Swap partition is on gpt and is not encrypted: # swapinfo Device 1K-blocks Used Avail Capacity /dev/gpt/swap0 8388608 0 8388608 0% /dev/gpt/swap1 8388608 0 8388608 0% > Can kgdb read the crash.4? Apparently not: # kgdb -c /var/crash/vmcore.4 /boot/kernel/kernel GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "amd64-marcel-freebsd"...(no debugging symbols found)... Attempt to extract a component of a value that is not a structure pointer. Attempt to extract a component of a value that is not a structure pointer. #0 0xffffffff80a6b95a in doadump () (kgdb) (kgdb) bt #0 0xffffffff80a6b95a in doadump () #1 0xffffffff80a6b983 in doadump () #2 0xfffffe02397da180 in ?? () #3 0xffffffff80a6b6f1 in kern_reboot () Previous frame identical to this frame (corrupt stack?) The devel/gdb is even more specific: # gdb -c /var/crash/vmcore.4 /boot/kernel/kernel GNU gdb (GDB) 8.0 [GDB v8.0 for FreeBSD] Copyright (C) 2017 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-portbld-freebsd11.0". Type "show configuration" for configuration details. For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>. Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from /boot/kernel/kernel...(no debugging symbols found)...done. "/var/crash/vmcore.4" is not a core dump: File format not recognized Rebuilding kernel now with "makeoptions DEBUG=-g" ... > Rebuilding kernel now with "makeoptions DEBUG=-g" ... Looks like an improvement. Tonight there were four more crashes like this. # ll /boot/kernel/kernel /var/crash/vmcore.8 -r-xr-xr-x 1 root wheel 26852240 Sep 13 00:26 /boot/kernel/kernel -rw------- 1 root wheel 1039286272 Sep 13 08:00 /var/crash/vmcore.8 # kgdb /boot/kernel/kernel /var/crash/vmcore.8 GNU gdb (GDB) 8.0 [GDB v8.0 for FreeBSD] Copyright (C) 2017 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-portbld-freebsd11.0". Type "show configuration" for configuration details. For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>. Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from /boot/kernel/kernel...Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug...done. done. Unread portion of the kernel message buffer: [15738] [15738] [15738] Fatal trap 12: page fault while in kernel mode [15738] cpuid = 1; apic id = 01 [15738] fault virtual address = 0xe8 [15738] fault code = supervisor write data, page not present [15738] instruction pointer = 0x20:0xffffffff80afefb2 [15738] stack pointer = 0x28:0xfffffe02391355a0 [15738] frame pointer = 0x28:0xfffffe02391355e0 [15738] code segment = base rx0, limit 0xfffff, type 0x1b [15738] = DPL 0, pres 1, long 1, def32 0, gran 1 [15738] processor eflags = interrupt enabled, resume, IOPL = 0 [15738] current process = 90843 (uwsgi) [15738] trap number = 12 [15738] panic: page fault [15738] cpuid = 1 [15738] KDB: stack backtrace: [15738] #0 0xffffffff80aada97 at kdb_backtrace+0x67 [15738] #1 0xffffffff80a6bb76 at vpanic+0x186 [15738] #2 0xffffffff80a6b9e3 at panic+0x43 [15738] #3 0xffffffff80edf832 at trap_fatal+0x322 [15738] #4 0xffffffff80edf889 at trap_pfault+0x49 [15738] #5 0xffffffff80edf0c6 at trap+0x286 [15738] #6 0xffffffff80ec3641 at calltrap+0x8 [15738] #7 0xffffffff80a6a2af at sendfile_iodone+0xbf [15738] #8 0xffffffff80a69eae at vn_sendfile+0x124e [15738] #9 0xffffffff80a6a4dd at sendfile+0x13d [15738] #10 0xffffffff80ee0394 at amd64_syscall+0x6c4 [15738] #11 0xffffffff80ec392b at Xfast_syscall+0xfb [15738] Uptime: 4h22m18s [15738] Dumping 991 out of 8129 MB:..2%..12%..21%..31%..41%..51%..62%..72%..81%..91% __curthread () at ./machine/pcpu.h:222 222 __asm("movq %%gs:%1,%0" : "=r" (td) (kgdb) bt #0 __curthread () at ./machine/pcpu.h:222 #1 doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:298 #2 0xffffffff80a6b6f1 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:366 #3 0xffffffff80a6bbb0 in vpanic (fmt=<optimized out>, ap=0xfffffe0239135240) at /usr/src/sys/kern/kern_shutdown.c:759 #4 0xffffffff80a6b9e3 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:690 #5 0xffffffff80edf832 in trap_fatal (frame=0xfffffe02391354e0, eva=232) at /usr/src/sys/amd64/amd64/trap.c:801 #6 0xffffffff80edf889 in trap_pfault (frame=0xfffffe02391354e0, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:655 #7 0xffffffff80edf0c6 in trap (frame=0xfffffe02391354e0) at /usr/src/sys/amd64/amd64/trap.c:421 #8 <signal handler called> #9 0xffffffff80afefb2 in atomic_fcmpset_long (dst=0xe8, expect=<optimized out>, src=<optimized out>) at ./machine/atomic.h:224 #10 uipc_ready (so=<optimized out>, m=0xfffff80014fb8800, count=4) at /usr/src/sys/kern/uipc_usrreq.c:1075 #11 0xffffffff80a6a2af in sendfile_iodone (arg=0xfffff800344f5c00, pg=<optimized out>, count=<optimized out>, error=0) at /usr/src/sys/kern/kern_sendfile.c:293 #12 0xffffffff80a69eae in vn_sendfile (fp=<optimized out>, sockfd=<optimized out>, hdr_uio=0x0, trl_uio=<optimized out>, offset=<optimized out>, nbytes=<optimized out>, sent=<optimized out>, flags=<optimized out>, td=<optimized out>) at /usr/src/sys/kern/kern_sendfile.c:851 #13 0xffffffff80a6a4dd in fo_sendfile (fp=0xffffffff81d1d388 <unp_link_rwlock+24>, sockfd=88170496, hdr_uio=0x1, trl_uio=0x1, offset=0, nbytes=18446735281999667200, sent=0x1fffffff8, flags=4, td=0xfffff80105416000) at /usr/src/sys/sys/file.h:378 #14 sendfile (td=0xfffff80105416000, uap=0xfffffe0239135a30, compat=0) at /usr/src/sys/kern/kern_sendfile.c:977 #15 0xffffffff80ee0394 in syscallenter (td=<optimized out>, sa=<optimized out>) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:135 #16 amd64_syscall (td=0xfffff80105416000, traced=0) at /usr/src/sys/amd64/amd64/trap.c:902 #17 <signal handler called> #18 0x000000080221761a in ?? () Backtrace stopped: Cannot access memory at address 0x7fffffffc868 Could you try compiling with WITNESS? I'm wondering if it might be related to: http://freebsd.1045724.x6.nabble.com/r315684-panic-sleepq-add-td-0xfffff80003c01a40-to-sleep-on-wchan-0xfffff80006f0873c-with-sleeping-prd-td6176029.html A commit references this bug: Author: glebius Date: Wed Sep 13 16:47:23 UTC 2017 New revision: 323552 URL: https://svnweb.freebsd.org/changeset/base/323552 Log: Fix two issues with not ready data in sockets (read: sendfile) in UNIX sockets. o Check that socket is still connected in uipc_ready(). If not we are responsible to free mbufs. o In uipc_send() if socket appears to be disconnected, but we are sending data with pending I/Os, don't free mbufs. Reported by: Kevin Bowling <kbowling llnw.com> Tested by: Kevin Bowling <kbowling llnw.com> PR: 222259 Reported by: Mark Martinec <Mark.Martinec ijs.si> MFC after: 3 days Changes: head/sys/kern/uipc_usrreq.c Mark, can you please test the patch that I just committed to the head branch? > Mark, can you please test the patch that I just committed to the head branch?
Great, thanks for the patch!
Running it now, will report how it fares tomorrow...
So far so good. Looks like a success! A commit references this bug: Author: glebius Date: Sat Sep 16 02:32:00 UTC 2017 New revision: 323634 URL: https://svnweb.freebsd.org/changeset/base/323634 Log: Merge r323552: Fix two issues with not ready data in sockets (read: sendfile) in UNIX sockets. o Check that socket is still connected in uipc_ready(). If not we are responsible to free mbufs. o In uipc_send() if socket appears to be disconnected, but we are sending data with pending I/Os, don't free mbufs. PR: 222259 Changes: _U stable/11/ stable/11/sys/kern/uipc_usrreq.c |