| Summary: | Fatal trap 12: page fault while in kernel mode, nginx + sendfile on | ||
|---|---|---|---|
| Product: | Base System | Reporter: | Igor Valkov <viaprog> |
| Component: | kern | Assignee: | Mark Johnston <markj> |
| Status: | Closed FIXED | ||
| Severity: | Affects Only Me | CC: | chris, emaste, markj, martin, rick, zlei |
| Priority: | --- | Keywords: | crash |
| Version: | 13.0-STABLE | ||
| Hardware: | amd64 | ||
| OS: | Any | ||
| See Also: | https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=259645 | ||
|
Description
Igor Valkov
2021-03-20 01:19:44 UTC
From the backtrace pf is also involved

Are you able to test patches? Based on what you wrote it should be fixed by https://reviews.freebsd.org/D29378

(In reply to Mark Johnston from comment #2)
I have applied this patch D29378.id86147.diff
nginx + sendfile=on now is working fine without fatal trap 12 some hours.
Thanks!

(In reply to Igor A. Valkov from comment #3)
Thanks for testing.

A commit in branch main references this bug:
URL: https://cgit.FreeBSD.org/src/commit/?id=b93a796b06ec013a75a08ac43d8acf6aa94aa970
commit b93a796b06ec013a75a08ac43d8acf6aa94aa970
Author: Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2021-03-23 13:38:59 +0000
Commit: Mark Johnston <markj@FreeBSD.org>
CommitDate: 2021-03-23 14:04:31 +0000
pf: Handle unmapped mbufs when computing checksums
PR: 254419
Reviewed by: gallatin, kp
Tested by: Igor A. Valkov <viaprog@gmail.com>
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29378
sys/netpfil/pf/pf.c | 9 +++++++++
1 file changed, 9 insertions(+)

A commit in branch releng/13.0 references this bug:
URL: https://cgit.FreeBSD.org/src/commit/?id=fa6d101e5f67246a6804577a9532676eae64c049
commit fa6d101e5f67246a6804577a9532676eae64c049
Author: Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2021-03-23 13:38:59 +0000
Commit: Mark Johnston <markj@FreeBSD.org>
CommitDate: 2021-03-26 16:33:12 +0000
pf: Handle unmapped mbufs when computing checksums
Approved by: re (cperciva)
PR: 254419
Reviewed by: gallatin, kp
Tested by: Igor A. Valkov <viaprog@gmail.com>
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29378
(cherry picked from commit b93a796b06ec013a75a08ac43d8acf6aa94aa970)
(cherry picked from commit 5fcab6fbcf8b99d1420e681731a07670c38defe3)
sys/netpfil/pf/pf.c | 9 +++++++++
1 file changed, 9 insertions(+)

A commit in branch stable/13 references this bug:
URL: https://cgit.FreeBSD.org/src/commit/?id=41a8dc361969629706827fb867cedaec3c270e68
commit 41a8dc361969629706827fb867cedaec3c270e68
Author: Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2021-03-23 13:38:59 +0000
Commit: Mark Johnston <markj@FreeBSD.org>
CommitDate: 2021-03-28 00:23:57 +0000
pf: Handle unmapped mbufs when computing checksums
PR: 254419
Reviewed by: gallatin, kp
Tested by: Igor A. Valkov <viaprog@gmail.com>
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29378
(cherry picked from commit b93a796b06ec013a75a08ac43d8acf6aa94aa970)
sys/netpfil/pf/pf.c | 9 +++++++++
1 file changed, 9 insertions(+)

I've possibly encountered the same/similar bug.
(With help of @_martin, we found the cause of it)
See coredump below.
(Discussion : https://forums.freebsd.org/threads/random-crash.82385/page-3#post-540831)
When using nginx,sendfile (in a jail) via optimization=aggressive in the pf firewall.
The mbuf==NULL check fails because mbuf isn't NULL but invalid!
--
Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03
fault virtual address = 0x520
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff810659dd
stack pointer = 0x28:0xfffffe0051351f80
frame pointer = 0x28:0xfffffe0051351f90
code segment = base rx0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 12 (swi4: clock (0))
trap number = 12
panic: page fault
cpuid = 3
time = 1636304186
KDB: stack backtrace:
#0 0xffffffff80c57345 at kdb_backtrace+0x65
#1 0xffffffff80c09d21 at vpanic+0x181
#2 0xffffffff80c09b93 at panic+0x43
#3 0xffffffff8108b187 at trap_fatal+0x387
#4 0xffffffff8108b1df at trap_pfault+0x4f
#5 0xffffffff8108a83d at trap+0x27d
#6 0xffffffff810617a8 at calltrap+0x8
#7 0xffffffff81065907 at in_cksum_skip+0x77
#8 0xffffffff82956329 at in4_cksum+0x59
#9 0xffffffff829373d0 at pf_return+0x270
#10 0xffffffff82931351 at pf_test_rule+0x1d71
#11 0xffffffff8292cd11 at pf_test+0x17c1
#12 0xffffffff82945bff at pf_check_out+0x1f
#13 0xffffffff80d41f87 at pfil_run_hooks+0x97
#14 0xffffffff80db2d71 at ip_output+0xb61
#15 0xffffffff80dc94b4 at tcp_output+0x1b04
#16 0xffffffff80dd7f2f at tcp_timer_rexmt+0x59f
#17 0xffffffff80c2598d at softclock_call_cc+0x13d
Uptime: 4m56s

I'm able to trigger a panic that seems to be related to this PR. Please let me
know if you think new PR should have been opened instead.
Test machine: amd64 13.0-RELEASE-p4 with GENERIC kernel, PF and nginx in jail.
The number of CPUs and the hypervisor don't play a role (tested on VirtualBox/VMware).
A few things had to be set to trigger the bug: nginx had to have sendfile on, and the
PF config had to be set a certain way. The panic occurs almost immediately.
/etc/rc.conf:
cloned_interfaces="lo1"
ipv4_addrs_lo1="10.0.2.100/24"
pf_enable="YES"
pflog_enable="YES"
pf_rules="/etc/pf.conf"
iocage_enable="YES"
/etc/pf.conf:
ext_if="vtnet0"
jail_if="lo1"
wan_ip4="172.20.1.200"
jail_net = "10.0.2.0/24"
ip_webproxy = "10.0.2.103"
webserver_sto = "(max-src-conn 50, overload <overloadlist> flush global)"
tcp_state ="flags S/SAFR modulate state"
table <overloadlist> persist
set block-policy return
set skip on lo0
set optimization aggressive
scrub in all
nat on $ext_if inet from $jail_net to any -> $wan_ip4
rdr on $ext_if inet proto tcp from any to $wan_ip4 port 80 -> $ip_webproxy
block in all
block out all
pass in quick proto tcp from any to any port 22
pass in inet proto tcp from any to $ip_webproxy port 80 $tcp_state $webserver_sto
pass out quick
nginx was installed in the jail; active config:
(jail)# grep -vE '^$|^[ ]*#' /usr/local/etc/nginx/nginx.conf
worker_processes 1;
events {
worker_connections 1024;
}
http {
include mime.types;
default_type application/octet-stream;
sendfile on;
keepalive_timeout 65;
server {
listen 80;
server_name localhost;
location / {
root /usr/local/www/nginx;
index index.html index.htm;
}
error_page 500 502 503 504 /50x.html;
location = /50x.html {
root /usr/local/www/nginx-dist;
}
}
access_log off;
error_log off;
}
#
I'm triggering the bug outside of the host by running:
ab -n 999999999 -c 10 http://172.20.1.200/sample250K.bin
The system page-faults, usually on address 0 or a small address such as 0x4f0.
In all 12 tests I ran, the system crashed in the same function/instruction.
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x4f0
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff81065b8d
stack pointer = 0x28:0xfffffe0051351f80
frame pointer = 0x28:0xfffffe0051351f90
code segment = base rx0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 12 (swi4: clock (0))
trap number = 12
panic: page fault
cpuid = 0
time = 1636488387
KDB: stack backtrace:
#0 0xffffffff80c574c5 at kdb_backtrace+0x65
#1 0xffffffff80c09ea1 at vpanic+0x181
#2 0xffffffff80c09d13 at panic+0x43
#3 0xffffffff8108b1b7 at trap_fatal+0x387
#4 0xffffffff8108b20f at trap_pfault+0x4f
#5 0xffffffff8108a86d at trap+0x27d
#6 0xffffffff81061958 at calltrap+0x8
#7 0xffffffff81065ab7 at in_cksum_skip+0x77
#8 0xffffffff82956329 at in4_cksum+0x59
#9 0xffffffff829373d0 at pf_return+0x270
#10 0xffffffff82931351 at pf_test_rule+0x1d71
#11 0xffffffff8292cd11 at pf_test+0x17c1
#12 0xffffffff82945bff at pf_check_out+0x1f
#13 0xffffffff80d42137 at pfil_run_hooks+0x97
#14 0xffffffff80db2f21 at ip_output+0xb61
#15 0xffffffff80dc9664 at tcp_output+0x1b04
#16 0xffffffff80dd80df at tcp_timer_rexmt+0x59f
#17 0xffffffff80c25b0d at softclock_call_cc+0x13d
(kgdb) bt
#0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1 doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:399
#2 0xffffffff80c09a96 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:486
#3 0xffffffff80c09f10 in vpanic (fmt=<optimized out>, ap=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:919
#4 0xffffffff80c09d13 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:843
#5 0xffffffff8108b1b7 in trap_fatal (frame=0xfffffe0051351ec0, eva=1264) at /usr/src/sys/amd64/amd64/trap.c:915
#6 0xffffffff8108b20f in trap_pfault (frame=frame@entry=0xfffffe0051351ec0, usermode=false, signo=<optimized out>, signo@entry=0x0, ucode=<optimized out>, ucode@entry=0x0) at /usr/src/sys/amd64/amd64/trap.c:732
#7 0xffffffff8108a86d in trap (frame=0xfffffe0051351ec0) at /usr/src/sys/amd64/amd64/trap.c:398
#8 <signal handler called>
#9 0xffffffff81065b8d in in_cksumdata (buf=<optimized out>, len=len@entry=1448) at /usr/src/sys/amd64/amd64/in_cksum.c:111
#10 0xffffffff81065ab7 in in_cksum_skip (m=0xfffff80036cca700, len=1448, skip=<optimized out>) at /usr/src/sys/amd64/amd64/in_cksum.c:224
#11 0xffffffff82956329 in in4_cksum (m=0x4f0, nxt=<optimized out>, nxt@entry=6 '\006', off=3, len=<optimized out>) at /usr/src/sys/netpfil/pf/in4_cksum.c:117
#12 0xffffffff829373d0 in pf_check_proto_cksum (m=0xfffff80034b5ee00, off=<optimized out>, len=3, p=6 '\006', af=2 '\002') at /usr/src/sys/netpfil/pf/pf.c:5844
#13 pf_return (r=r@entry=0xfffff800360ec800, nr=<optimized out>, nr@entry=0xfffff800033a1800, pd=pd@entry=0xfffffe0051352590, sk=<optimized out>, off=<optimized out>, off@entry=20, m=<optimized out>, m@entry=0xfffff80034b5ee00,
th=0xfffffe0051352660, kif=0xfffff8003649f500, bproto_sum=24767, bip_sum=0, hdrlen=20, reason=0xfffffe005135241e) at /usr/src/sys/netpfil/pf/pf.c:2654
#14 0xffffffff82931351 in pf_test_rule (rm=rm@entry=0xfffffe0051352630, sm=sm@entry=0xfffffe0051352648, direction=direction@entry=2, kif=kif@entry=0xfffff8003649f500, m=m@entry=0xfffff80034b5ee00, off=20, pd=0xfffffe0051352590,
am=0xfffffe0051352620, rsm=0xfffffe0051352610, inp=0xfffff80034e803d0) at /usr/src/sys/netpfil/pf/pf.c:3641
#15 0xffffffff8292cd11 in pf_test (dir=<optimized out>, dir@entry=2, pflags=<optimized out>, ifp=<optimized out>, m0=<optimized out>, m0@entry=0xfffffe0051352808, inp=0xfffff80034e803d0) at /usr/src/sys/netpfil/pf/pf.c:6005
#16 0xffffffff82945bff in pf_check_out (m=0xfffffe0051352808, ifp=0x3, flags=1448, ruleset=<optimized out>, inp=0xff000000) at /usr/src/sys/netpfil/pf/pf_ioctl.c:4516
#17 0xffffffff80d42137 in pfil_run_hooks (head=<optimized out>, p=..., ifp=0xfffff80003656800, flags=flags@entry=131072, inp=inp@entry=0xfffff80034e803d0) at /usr/src/sys/net/pfil.c:187
#18 0xffffffff80db2f21 in ip_output_pfil (mp=0xfffffe0051352808, ifp=0xfffff80003656800, flags=0, inp=0xfffff80034e803d0, dst=0xfffff80034e80578, fibnum=<optimized out>, error=<optimized out>)
at /usr/src/sys/netinet/ip_output.c:130
#19 ip_output (m=m@entry=0xfffff80034b5ee00, opt=<optimized out>, ro=<optimized out>, flags=0, imo=imo@entry=0x0, inp=<optimized out>) at /usr/src/sys/netinet/ip_output.c:705
#20 0xffffffff80dc9664 in tcp_output (tp=0xfffffe00980a68f0) at /usr/src/sys/netinet/tcp_output.c:1492
#21 0xffffffff80dd80df in tcp_timer_rexmt (xtp=0xfffffe00980a68f0) at /usr/src/sys/netinet/tcp_timer.c:879
#22 0xffffffff80c25b0d in softclock_call_cc (c=0xfffffe00980a6b78, cc=cc@entry=0xffffffff81ca8200 <cc_cpu>, direct=direct@entry=0) at /usr/src/sys/kern/kern_timeout.c:696
#23 0xffffffff80c25f99 in softclock (arg=0xffffffff81ca8200 <cc_cpu>) at /usr/src/sys/kern/kern_timeout.c:816
#24 0xffffffff80bcafdd in intr_event_execute_handlers (p=<optimized out>, ie=0xfffff80003412700) at /usr/src/sys/kern/kern_intr.c:1168
#25 ithread_execute_handlers (p=<optimized out>, ie=0xfffff80003412700) at /usr/src/sys/kern/kern_intr.c:1181
#26 ithread_loop (arg=arg@entry=0xfffff800033efd20) at /usr/src/sys/kern/kern_intr.c:1269
#27 0xffffffff80bc7dde in fork_exit (callout=0xffffffff80bcad90 <ithread_loop>, arg=0xfffff800033efd20, frame=0xfffffe0051352c00) at /usr/src/sys/kern/kern_fork.c:1069
(kgdb) f 9
#9 0xffffffff81065b8d in in_cksumdata (buf=<optimized out>, len=len@entry=1448) at /usr/src/sys/amd64/amd64/in_cksum.c:111
111 /usr/src/sys/amd64/amd64/in_cksum.c: No such file or directory.
(kgdb) x/4i $pc
=> 0xffffffff81065b8d <in_cksumdata+109>: and (%rdi),%r8d
0xffffffff81065b90 <in_cksumdata+112>: add %ecx,%esi
0xffffffff81065b92 <in_cksumdata+114>: add $0xfffffffc,%esi
0xffffffff81065b95 <in_cksumdata+117>: test %esi,%esi
(kgdb) i r $rdi
rdi 0x4f0 1264
(kgdb)
(In reply to martin from comment #9)
I'm working on a patch for this now. See PR 259645. A workaround in the meantime that doesn't require sendfile to be disabled is to set the kern.ipc.mb_use_ext_pgs sysctl to 0.

(In reply to Mark Johnston from comment #10)
Thank you. As probably expected I can't trigger the panic with the disabled sysctl. It's probably irrelevant to the issue but 259645 states this issue occurred after p5 update. That's not the case, I was able to trigger this in older versions too.

(In reply to martin from comment #11)
Indeed, I am sure that this problem exists in the original 13.0 release. |