FreeBSD ihor-4.amdmi3.ru 12.0-RELEASE-p10 FreeBSD 12.0-RELEASE-p10 GENERIC amd64 running under KVM just panicked after 110d uptime. Reading symbols from /boot/kernel/kernel... Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug... Unread portion of the kernel message buffer: Fatal trap 12: page fault while in kernel mode cpuid = 0; apic id = 00 fault virtual address = 0x8 fault code = supervisor read data, page not present instruction pointer = 0x20:0xffffffff80c2f0f0 stack pointer = 0x28:0xfffffe0022b15790 frame pointer = 0x28:0xfffffe0022b157c0 code segment = base rx0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 64606 (nginx) trap number = 12 panic: page fault cpuid = 0 time = 1570625343 KDB: stack backtrace: #0 0xffffffff80be7977 at kdb_backtrace+0x67 #1 0xffffffff80b9b563 at vpanic+0x1a3 #2 0xffffffff80b9b3b3 at panic+0x43 #3 0xffffffff81074bff at trap_fatal+0x35f #4 0xffffffff81074c59 at trap_pfault+0x49 #5 0xffffffff8107427e at trap+0x29e #6 0xffffffff8104f705 at calltrap+0x8 #7 0xffffffff80c2e1f8 at sbdestroy+0x28 #8 0xffffffff80c30a55 at sofree+0x285 #9 0xffffffff80c31600 at soclose+0x330 #10 0xffffffff80b4363a at _fdrop+0x1a #11 0xffffffff80b466e4 at closef+0x244 #12 0xffffffff80b43b69 at closefp+0x99 #13 0xffffffff810756d9 at amd64_syscall+0x369 #14 0xffffffff8104ffed at fast_syscall_common+0x101 Uptime: 110d17h32m42s Dumping 394 out of 3033 MB:..5%..13%..21%..33%..41%..53%..61%..74%..82%..94% 0xffffffff80b9b32b in __curthread () at ./machine/pcpu.h:234 234 ./machine/pcpu.h: No such file or directory. (kgdb) bt #0 0xffffffff80b9b32b in __curthread () at ./machine/pcpu.h:234 #1 vpanic (fmt=0x0, ap=0x0) at /usr/src/sys/kern/kern_shutdown.c:808 #2 0xffffffff80b9b14b in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:449 #3 0xffffffff80b9b5c3 in kthread_shutdown (arg=0x100, howto=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:940 #4 0xffffffff80b9b3b3 in vpanic (fmt=0xffffffff8127ebc7 "supervisor", ap=0xfffffe0022b154e0) at /usr/src/sys/kern/kern_shutdown.c:823 #5 0xffffffff81074bff in trap_fatal (frame=0xfffffe0022b156d0, eva=18446741875268343056) at /usr/src/sys/amd64/amd64/trap.c:929 #6 0xffffffff81074c59 in trap_pfault (frame=0xfffffe0022b156d0, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:765 #7 0xffffffff8107427e in trap (frame=0xfffffe0022b156d0) at /usr/src/sys/amd64/amd64/trap.c:441 #8 <signal handler called> #9 0xffffffff80c2f0f0 in sbcut_internal (sb=0xfffff800b351a828, len=<optimized out>) at /usr/src/sys/kern/uipc_sockbuf.c:1163 #10 0xffffffff80c2e1f8 in sbappend_locked (sb=0xfffff800b351a828, m=0x156f, flags=<optimized out>) at /usr/src/sys/kern/uipc_sockbuf.c:668 #11 0xffffffff80c30a55 in sofree (so=0xfffff800b351a828) at /usr/src/sys/kern/uipc_socket.c:1005 #12 0xffffffff80c31600 in atomic_fcmpset_long (dst=<optimized out>, src=0, expect=<optimized out>) at ./machine/atomic.h:221 #13 soclose (so=<optimized out>) at /usr/src/sys/kern/uipc_socket.c:1141 #14 0xffffffff80b4363a in ia32_pause () at /usr/src/sys/sys/seq.h:130 #15 seq_read (seqp=0x0) at /usr/src/sys/sys/seq.h:131 #16 fget_unlocked (fdp=0xfffff8002268e4b0, fd=5487, needrightsp=0xfffff800aef35000, fpp=0xfffff8002268e4b0, seqp=0xfffff8008a975000) at /usr/src/sys/kern/kern_descrip.c:2652 #17 0xffffffff80b466e4 in atomic_fcmpset_long (src=1, dst=<optimized out>, expect=<optimized out>) at ./machine/atomic.h:221 #18 __sx_xunlock (line=0, sx=<optimized out>, td=<optimized out>, file=<optimized out>) at /usr/src/sys/sys/sx.h:179 #19 fdclose (td=<optimized out>, fp=<optimized out>, idx=-1969795008) at /usr/src/sys/kern/kern_descrip.c:2386 #20 0xffffffff80b43b69 in closefp (fdp=0xfffff8008a975000, fd=<optimized out>, fp=<optimized out>, td=0xfffff800aef35000, holdleaders=0) at /usr/src/sys/kern/kern_descrip.c:1202 #21 0xffffffff810756d9 in syscallenter (td=<optimized out>) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:135 #22 amd64_syscall (td=0xfffff800aef35000, traced=0) at /usr/src/sys/amd64/amd64/trap.c:1076 #23 <signal handler called> #24 0x000000080097009a in ?? () Backtrace stopped: Cannot access memory at address 0x7fffffffe6e8 (kgdb) bt full #0 0xffffffff80b9b32b in __curthread () at ./machine/pcpu.h:234 td = <optimized out> #1 vpanic (fmt=0x0, ap=0x0) at /usr/src/sys/kern/kern_shutdown.c:808 buf = "page fault", '\000' <repeats 245 times> td = <optimized out> bootopt = <optimized out> newpanic = <optimized out> other_cpus = <optimized out> #2 0xffffffff80b9b14b in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:449 _ep = 0x0 _t = <optimized out> _el = <optimized out> once = <error reading variable once (Cannot access memory at address 0x0)> #3 0xffffffff80b9b5c3 in kthread_shutdown (arg=0x100, howto=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:940 td = 0x100 error = <optimized out> #4 0xffffffff80b9b3b3 in vpanic (fmt=0xffffffff8127ebc7 "supervisor", ap=0xfffffe0022b154e0) at /usr/src/sys/kern/kern_shutdown.c:823 buf = "page fault", '\000' <repeats 245 times> td = 0xfffff800aef35000 bootopt = <optimized out> newpanic = <optimized out> other_cpus = <optimized out> #5 0xffffffff81074bff in trap_fatal (frame=0xfffffe0022b156d0, eva=18446741875268343056) at /usr/src/sys/amd64/amd64/trap.c:929 softseg = {ssd_base = 0, ssd_limit = 1048575, ssd_type = 27, ssd_dpl = 0, ssd_p = 1, ssd_long = 1, ssd_def32 = 0, ssd_gran = 1} msg = <optimized out> ss = 16 code = <optimized out> type = <optimized out> handled = <optimized out> #6 0xffffffff81074c59 in trap_pfault (frame=0xfffffe0022b156d0, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:765 td = 0xfffff800aef35000 p = <optimized out> eva = 8 va = <optimized out> map = <optimized out> ftype = <optimized out> rv = <optimized out> #7 0xffffffff8107427e in trap (frame=0xfffffe0022b156d0) at /usr/src/sys/amd64/amd64/trap.c:441 td = 0xfffff800aef35000 dr6 = <error reading variable dr6 (Cannot access memory at address 0x0)> addr = -8795947097504 ucode = <error reading variable ucode (Cannot access memory at address 0x3)> signo = <error reading variable signo (Cannot access memory at address 0xa)> p = <optimized out> type = 12 ksi = <optimized out> #8 <signal handler called> No locals. #9 0xffffffff80c2f0f0 in sbcut_internal (sb=0xfffff800b351a828, len=<optimized out>) at /usr/src/sys/kern/uipc_sockbuf.c:1163 m = 0x0 next = <optimized out> mfree = <optimized out> #10 0xffffffff80c2e1f8 in sbappend_locked (sb=0xfffff800b351a828, m=0x156f, flags=<optimized out>) at /usr/src/sys/kern/uipc_sockbuf.c:668 n = 0x1ff #11 0xffffffff80c30a55 in sofree (so=0xfffff800b351a828) at /usr/src/sys/kern/uipc_socket.c:1005 _tid = <optimized out> _v = <error reading variable _v (Cannot access memory at address 0x0)> sol = <optimized out> pr = 0xffffffff81b23290 <localsw> #12 0xffffffff80c31600 in atomic_fcmpset_long (dst=<optimized out>, src=0, expect=<optimized out>) at ./machine/atomic.h:221 res = <optimized out> #13 soclose (so=<optimized out>) at /usr/src/sys/kern/uipc_socket.c:1141 _v = <optimized out> sp = <optimized out> error = <error reading variable error (Cannot access memory at address 0x0)> saved_vnet = <optimized out> lqueue = <optimized out> listening = <optimized out> #14 0xffffffff80b4363a in ia32_pause () at /usr/src/sys/sys/seq.h:130 No locals. #15 seq_read (seqp=0x0) at /usr/src/sys/sys/seq.h:131 ret = 577299632 #16 fget_unlocked (fdp=0xfffff8002268e4b0, fd=5487, needrightsp=0xfffff800aef35000, fpp=0xfffff8002268e4b0, seqp=0xfffff8008a975000) at /usr/src/sys/kern/kern_descrip.c:2652 fdt = 0xfffff8008a975000 count = <optimized out> haverights = {cr_rights = {18446735280551710720, 18446735278193829040}} fp = <optimized out> seq = <optimized out> error = <optimized out> fde = <optimized out> #17 0xffffffff80b466e4 in atomic_fcmpset_long (src=1, dst=<optimized out>, expect=<optimized out>) at ./machine/atomic.h:221 res = 255 '\377' #18 __sx_xunlock (line=0, sx=<optimized out>, td=<optimized out>, file=<optimized out>) at /usr/src/sys/sys/sx.h:179 x = 18446744071573878330 #19 fdclose (td=<optimized out>, fp=<optimized out>, idx=-1969795008) at /usr/src/sys/kern/kern_descrip.c:2386 fdp = 0xfffff800aef35000 #20 0xffffffff80b43b69 in closefp (fdp=0xfffff8008a975000, fd=<optimized out>, fp=<optimized out>, td=0xfffff800aef35000, holdleaders=0) at /usr/src/sys/kern/kern_descrip.c:1202 error = 577299632 #21 0xffffffff810756d9 in syscallenter (td=<optimized out>) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:135 p = <optimized out> error = <optimized out> sa = 0xfffff800aef353b0 traced = <optimized out> _v = <optimized out> _tid = <optimized out> _v = <optimized out> _v = <optimized out> _tid = <optimized out> _v = <optimized out> _v = <optimized out> _tid = <optimized out> _v = <optimized out> _v = <optimized out> _tid = <optimized out> _v = <optimized out> #22 amd64_syscall (td=0xfffff800aef35000, traced=0) at /usr/src/sys/amd64/amd64/trap.c:1076 ksi = <optimized out> error = <optimized out> #23 <signal handler called> No locals. #24 0x000000080097009a in ?? () No symbol table info available.
I think there is some chance that this problem can be related to https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=239893
Looks like exactly the same stack trace I got here: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=237568 Also a crash triggered by Nginx. The bug linked by Andrey does seem related except that I don't see `soabort` anywhere in our stack traces. Is that fix deployed in 12.0-RELEASE-p10?
rlwestlund@gmail.com : I had some crashes in the past with servers running Nginx, but I am not sure if it's the same panic as 2 of them had bad RAM modules so possibly some crashes were related to hardware. The fix described in https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=239893 is in stable/12 and 12.1. Can you upgrade your system to 12.1-RC1 to see if it fixes the issue? Also, do you have sendfile enabled in Nginx?
I've built 12.0-RELEASE kernel with the patch from 239893, but this didn't help. However, I've discovered what has triggered the panic - I have uwsgi behing nginx setup on that box, and the panic appears every ~1-3 hours if sendfile is enabled in uwsgi (sendfile is disabled in nginx). I'll try updating to 12.1-RC now
Similar to https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=222259 (also nginx+uwsgi+sendfile), although that was fixed in 2017.
(In reply to Dmitry Marakasov from comment #4) @Dmitry Could you attach the nginx / uwsgi configurations, as an attachment, sanitized if necessary, that reproduce the issue
12.1-RC1 also panics > Could you attach the nginx / uwsgi configurations, as an attachment That'd be quite heavy, as nginx here serves 10 sites with a wide tree of included configs. I'll try to minimize it. Reproducing is not quite easy too - it happens on production every several hours, not even sure which request leads to it.
(In reply to Dmitry Marakasov from comment #7) Could you provide a backtrace from the 12.1 panic? The line numbers in the initial backtrace don't line up for me.
Is this still a problem in 12.1 or 12.2?
I've enabled sendfile in nginx and uwsgi on my 13.1 prod. If the panic doesn't reproduce in a week I guess this can be closed.
Let's close this, I'm not seeing any panics on 13.1.