Bug 241162 - Panic in closefp() triggered by nginx (uwsgi with sendfile(2) enabled)
Summary: Panic in closefp() triggered by nginx (uwsgi with sendfile(2) enabled)
Status: Closed Overcome By Events
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 12.0-RELEASE
Hardware: Any Any
: --- Affects Some People
Assignee: freebsd-net (Nobody)
URL:
Keywords: crash, needs-patch, needs-qa
Depends on:
Blocks:
 
Reported: 2019-10-09 17:39 UTC by Dmitry Marakasov
Modified: 2022-11-01 13:30 UTC (History)
11 users (show)

See Also:
koobs: mfc-stable12?
koobs: mfc-stable11?


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Dmitry Marakasov freebsd_committer freebsd_triage 2019-10-09 17:39:50 UTC
FreeBSD ihor-4.amdmi3.ru 12.0-RELEASE-p10 FreeBSD 12.0-RELEASE-p10 GENERIC  amd64
running under KVM just panicked after 110d uptime.

Reading symbols from /boot/kernel/kernel...
Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug...

Unread portion of the kernel message buffer:


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x8
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80c2f0f0
stack pointer           = 0x28:0xfffffe0022b15790
frame pointer           = 0x28:0xfffffe0022b157c0
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 64606 (nginx)
trap number             = 12
panic: page fault
cpuid = 0
time = 1570625343
KDB: stack backtrace:
#0 0xffffffff80be7977 at kdb_backtrace+0x67
#1 0xffffffff80b9b563 at vpanic+0x1a3
#2 0xffffffff80b9b3b3 at panic+0x43
#3 0xffffffff81074bff at trap_fatal+0x35f
#4 0xffffffff81074c59 at trap_pfault+0x49
#5 0xffffffff8107427e at trap+0x29e
#6 0xffffffff8104f705 at calltrap+0x8
#7 0xffffffff80c2e1f8 at sbdestroy+0x28
#8 0xffffffff80c30a55 at sofree+0x285
#9 0xffffffff80c31600 at soclose+0x330
#10 0xffffffff80b4363a at _fdrop+0x1a
#11 0xffffffff80b466e4 at closef+0x244
#12 0xffffffff80b43b69 at closefp+0x99
#13 0xffffffff810756d9 at amd64_syscall+0x369
#14 0xffffffff8104ffed at fast_syscall_common+0x101
Uptime: 110d17h32m42s
Dumping 394 out of 3033 MB:..5%..13%..21%..33%..41%..53%..61%..74%..82%..94%

0xffffffff80b9b32b in __curthread () at ./machine/pcpu.h:234
234     ./machine/pcpu.h: No such file or directory.
(kgdb) bt
#0  0xffffffff80b9b32b in __curthread () at ./machine/pcpu.h:234
#1  vpanic (fmt=0x0, ap=0x0) at /usr/src/sys/kern/kern_shutdown.c:808
#2  0xffffffff80b9b14b in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:449
#3  0xffffffff80b9b5c3 in kthread_shutdown (arg=0x100, howto=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:940
#4  0xffffffff80b9b3b3 in vpanic (fmt=0xffffffff8127ebc7 "supervisor", ap=0xfffffe0022b154e0) at /usr/src/sys/kern/kern_shutdown.c:823
#5  0xffffffff81074bff in trap_fatal (frame=0xfffffe0022b156d0, eva=18446741875268343056) at /usr/src/sys/amd64/amd64/trap.c:929
#6  0xffffffff81074c59 in trap_pfault (frame=0xfffffe0022b156d0, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:765
#7  0xffffffff8107427e in trap (frame=0xfffffe0022b156d0) at /usr/src/sys/amd64/amd64/trap.c:441
#8  <signal handler called>
#9  0xffffffff80c2f0f0 in sbcut_internal (sb=0xfffff800b351a828, len=<optimized out>) at /usr/src/sys/kern/uipc_sockbuf.c:1163
#10 0xffffffff80c2e1f8 in sbappend_locked (sb=0xfffff800b351a828, m=0x156f, flags=<optimized out>) at /usr/src/sys/kern/uipc_sockbuf.c:668
#11 0xffffffff80c30a55 in sofree (so=0xfffff800b351a828) at /usr/src/sys/kern/uipc_socket.c:1005
#12 0xffffffff80c31600 in atomic_fcmpset_long (dst=<optimized out>, src=0, expect=<optimized out>) at ./machine/atomic.h:221
#13 soclose (so=<optimized out>) at /usr/src/sys/kern/uipc_socket.c:1141
#14 0xffffffff80b4363a in ia32_pause () at /usr/src/sys/sys/seq.h:130
#15 seq_read (seqp=0x0) at /usr/src/sys/sys/seq.h:131
#16 fget_unlocked (fdp=0xfffff8002268e4b0, fd=5487, needrightsp=0xfffff800aef35000, fpp=0xfffff8002268e4b0, seqp=0xfffff8008a975000) at /usr/src/sys/kern/kern_descrip.c:2652
#17 0xffffffff80b466e4 in atomic_fcmpset_long (src=1, dst=<optimized out>, expect=<optimized out>) at ./machine/atomic.h:221
#18 __sx_xunlock (line=0, sx=<optimized out>, td=<optimized out>, file=<optimized out>) at /usr/src/sys/sys/sx.h:179
#19 fdclose (td=<optimized out>, fp=<optimized out>, idx=-1969795008) at /usr/src/sys/kern/kern_descrip.c:2386
#20 0xffffffff80b43b69 in closefp (fdp=0xfffff8008a975000, fd=<optimized out>, fp=<optimized out>, td=0xfffff800aef35000, holdleaders=0) at /usr/src/sys/kern/kern_descrip.c:1202
#21 0xffffffff810756d9 in syscallenter (td=<optimized out>) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:135
#22 amd64_syscall (td=0xfffff800aef35000, traced=0) at /usr/src/sys/amd64/amd64/trap.c:1076
#23 <signal handler called>
#24 0x000000080097009a in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffffffe6e8
(kgdb) bt full
#0  0xffffffff80b9b32b in __curthread () at ./machine/pcpu.h:234
        td = <optimized out>
#1  vpanic (fmt=0x0, ap=0x0) at /usr/src/sys/kern/kern_shutdown.c:808
        buf = "page fault", '\000' <repeats 245 times>
        td = <optimized out>
        bootopt = <optimized out>
        newpanic = <optimized out>
        other_cpus = <optimized out>
#2  0xffffffff80b9b14b in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:449
        _ep = 0x0
        _t = <optimized out>
        _el = <optimized out>
        once = <error reading variable once (Cannot access memory at address 0x0)>
#3  0xffffffff80b9b5c3 in kthread_shutdown (arg=0x100, howto=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:940
        td = 0x100
        error = <optimized out>
#4  0xffffffff80b9b3b3 in vpanic (fmt=0xffffffff8127ebc7 "supervisor", ap=0xfffffe0022b154e0) at /usr/src/sys/kern/kern_shutdown.c:823
        buf = "page fault", '\000' <repeats 245 times>
        td = 0xfffff800aef35000
        bootopt = <optimized out>
        newpanic = <optimized out>
        other_cpus = <optimized out>
#5  0xffffffff81074bff in trap_fatal (frame=0xfffffe0022b156d0, eva=18446741875268343056) at /usr/src/sys/amd64/amd64/trap.c:929
        softseg = {ssd_base = 0, ssd_limit = 1048575, ssd_type = 27, ssd_dpl = 0, ssd_p = 1, ssd_long = 1, ssd_def32 = 0, ssd_gran = 1}
        msg = <optimized out>
        ss = 16
        code = <optimized out>
        type = <optimized out>
        handled = <optimized out>
#6  0xffffffff81074c59 in trap_pfault (frame=0xfffffe0022b156d0, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:765
        td = 0xfffff800aef35000
        p = <optimized out>
        eva = 8
        va = <optimized out>
        map = <optimized out>
        ftype = <optimized out>
        rv = <optimized out>
#7  0xffffffff8107427e in trap (frame=0xfffffe0022b156d0) at /usr/src/sys/amd64/amd64/trap.c:441
        td = 0xfffff800aef35000
        dr6 = <error reading variable dr6 (Cannot access memory at address 0x0)>
        addr = -8795947097504
        ucode = <error reading variable ucode (Cannot access memory at address 0x3)>
        signo = <error reading variable signo (Cannot access memory at address 0xa)>
        p = <optimized out>
        type = 12
        ksi = <optimized out>
#8  <signal handler called>
No locals.
#9  0xffffffff80c2f0f0 in sbcut_internal (sb=0xfffff800b351a828, len=<optimized out>) at /usr/src/sys/kern/uipc_sockbuf.c:1163
        m = 0x0
        next = <optimized out>
        mfree = <optimized out>
#10 0xffffffff80c2e1f8 in sbappend_locked (sb=0xfffff800b351a828, m=0x156f, flags=<optimized out>) at /usr/src/sys/kern/uipc_sockbuf.c:668
        n = 0x1ff
#11 0xffffffff80c30a55 in sofree (so=0xfffff800b351a828) at /usr/src/sys/kern/uipc_socket.c:1005
        _tid = <optimized out>
        _v = <error reading variable _v (Cannot access memory at address 0x0)>
        sol = <optimized out>
        pr = 0xffffffff81b23290 <localsw>
#12 0xffffffff80c31600 in atomic_fcmpset_long (dst=<optimized out>, src=0, expect=<optimized out>) at ./machine/atomic.h:221
        res = <optimized out>
#13 soclose (so=<optimized out>) at /usr/src/sys/kern/uipc_socket.c:1141
        _v = <optimized out>
        sp = <optimized out>
        error = <error reading variable error (Cannot access memory at address 0x0)>
        saved_vnet = <optimized out>
        lqueue = <optimized out>
        listening = <optimized out>
#14 0xffffffff80b4363a in ia32_pause () at /usr/src/sys/sys/seq.h:130
No locals.
#15 seq_read (seqp=0x0) at /usr/src/sys/sys/seq.h:131
        ret = 577299632
#16 fget_unlocked (fdp=0xfffff8002268e4b0, fd=5487, needrightsp=0xfffff800aef35000, fpp=0xfffff8002268e4b0, seqp=0xfffff8008a975000) at /usr/src/sys/kern/kern_descrip.c:2652
        fdt = 0xfffff8008a975000
        count = <optimized out>
        haverights = {cr_rights = {18446735280551710720, 18446735278193829040}}
        fp = <optimized out>
        seq = <optimized out>
        error = <optimized out>
        fde = <optimized out>
#17 0xffffffff80b466e4 in atomic_fcmpset_long (src=1, dst=<optimized out>, expect=<optimized out>) at ./machine/atomic.h:221
        res = 255 '\377'
#18 __sx_xunlock (line=0, sx=<optimized out>, td=<optimized out>, file=<optimized out>) at /usr/src/sys/sys/sx.h:179
        x = 18446744071573878330
#19 fdclose (td=<optimized out>, fp=<optimized out>, idx=-1969795008) at /usr/src/sys/kern/kern_descrip.c:2386
        fdp = 0xfffff800aef35000
#20 0xffffffff80b43b69 in closefp (fdp=0xfffff8008a975000, fd=<optimized out>, fp=<optimized out>, td=0xfffff800aef35000, holdleaders=0) at /usr/src/sys/kern/kern_descrip.c:1202
        error = 577299632
#21 0xffffffff810756d9 in syscallenter (td=<optimized out>) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:135
        p = <optimized out>
        error = <optimized out>
        sa = 0xfffff800aef353b0
        traced = <optimized out>
        _v = <optimized out>
        _tid = <optimized out>
        _v = <optimized out>
        _v = <optimized out>
        _tid = <optimized out>
        _v = <optimized out>
        _v = <optimized out>
        _tid = <optimized out>
        _v = <optimized out>
        _v = <optimized out>
        _tid = <optimized out>
        _v = <optimized out>
#22 amd64_syscall (td=0xfffff800aef35000, traced=0) at /usr/src/sys/amd64/amd64/trap.c:1076
        ksi = <optimized out>
        error = <optimized out>
#23 <signal handler called>
No locals.
#24 0x000000080097009a in ?? ()
No symbol table info available.
Comment 1 Andrey V. Elsukov freebsd_committer freebsd_triage 2019-10-11 09:09:30 UTC
I think there is some chance that this problem can be related to https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=239893
Comment 2 rlwestlund 2019-10-11 12:12:56 UTC
Looks like exactly the same stack trace I got here: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=237568

Also a crash triggered by Nginx.

The bug linked by Andrey does seem related except that I don't see `soabort` anywhere in our stack traces. Is that fix deployed in 12.0-RELEASE-p10?
Comment 3 Christos Chatzaras 2019-10-12 12:06:13 UTC
rlwestlund@gmail.com :

I had some crashes in the past with servers running Nginx, but I am not sure if it's the same panic as 2 of them had bad RAM modules so possibly some crashes were related to hardware.

The fix described in https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=239893 is in stable/12 and 12.1.

Can you upgrade your system to 12.1-RC1 to see if it fixes the issue?

Also, do you have sendfile enabled in Nginx?
Comment 4 Dmitry Marakasov freebsd_committer freebsd_triage 2019-10-15 14:22:44 UTC
I've built 12.0-RELEASE kernel with the patch from 239893, but this didn't help. However, I've discovered what has triggered the panic - I have uwsgi behing nginx setup on that box, and the panic appears every ~1-3 hours if sendfile is enabled in uwsgi (sendfile is disabled in nginx). I'll try updating to 12.1-RC now
Comment 5 Mark.Martinec 2019-10-15 15:21:59 UTC
Similar to https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=222259
(also nginx+uwsgi+sendfile), although that was fixed in 2017.
Comment 6 Kubilay Kocak freebsd_committer freebsd_triage 2019-10-16 01:10:36 UTC
(In reply to Dmitry Marakasov from comment #4)

@Dmitry Could you attach the nginx / uwsgi configurations, as an attachment, sanitized if necessary, that reproduce the issue
Comment 7 Dmitry Marakasov freebsd_committer freebsd_triage 2019-10-19 11:35:32 UTC
12.1-RC1 also panics

> Could you attach the nginx / uwsgi configurations, as an attachment

That'd be quite heavy, as nginx here serves 10 sites with a wide tree of included configs. I'll try to minimize it. Reproducing is not quite easy too - it happens on production every several hours, not even sure which request leads to it.
Comment 8 Mark Johnston freebsd_committer freebsd_triage 2020-01-06 14:59:35 UTC
(In reply to Dmitry Marakasov from comment #7)
Could you provide a backtrace from the 12.1 panic?  The line numbers in the initial backtrace don't line up for me.
Comment 9 Mark Johnston freebsd_committer freebsd_triage 2020-11-24 21:20:07 UTC
Is this still a problem in 12.1 or 12.2?
Comment 10 Dmitry Marakasov freebsd_committer freebsd_triage 2022-10-12 20:28:52 UTC
I've enabled sendfile in nginx and uwsgi on my 13.1 prod. If the panic doesn't reproduce in a week I guess this can be closed.
Comment 11 Dmitry Marakasov freebsd_committer freebsd_triage 2022-11-01 13:30:41 UTC
Let's close this, I'm not seeing any panics on 13.1.