| Summary: | www/nginx: 13.0-p11 crashes after 12 > 13 upgrade: m_pullup -> ipfw_chk -> ipfw_check_frame -> ... -> vn_sendfile | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Base System | Reporter: | BB Lister <bblister> | ||||
| Component: | kern | Assignee: | freebsd-bugs (Nobody) <bugs> | ||||
| Status: | Closed Overcome By Events | ||||||
| Severity: | Affects Only Me | CC: | bblister, chris, cy, joneum, lwhsu, rob2g2-freebsd, zlei | ||||
| Priority: | --- | Keywords: | crash, needs-qa | ||||
| Version: | 13.0-RELEASE | ||||||
| Hardware: | Any | ||||||
| OS: | Any | ||||||
Description
BB Lister
2022-05-27 05:26:36 UTC
Setting sysctl net.inet.tcp.tso=0 did not help. Another, slightly different panic occurred, as shown below.
I reverted the change
sysctl net.inet.tcp.tso=1
and I have now disabled sendfile in nginx (sendfile off; in the http{} block of the nginx configuration).
The new panic message, with net.inet.tcp.tso=0 and sendfile enabled, is:
May 27 11:12:33 arch kernel: Fatal trap 12: page fault while in kernel mode
May 27 11:12:33 arch kernel: cpuid = 0; apic id = 00
May 27 11:12:33 arch kernel: fault virtual address = 0x148
May 27 11:12:33 arch kernel: fault code = supervisor read data, page not present
May 27 11:12:33 arch kernel: instruction pointer = 0x20:0xffffffff81086c80
May 27 11:12:33 arch kernel: stack pointer = 0x28:0xfffffe0073be2060
May 27 11:12:33 arch kernel: frame pointer = 0x28:0xfffffe0073be2060
May 27 11:12:33 arch kernel: code segment = base rx0, limit 0xfffff, type 0x1b
May 27 11:12:33 arch kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
May 27 11:12:33 arch kernel: processor eflags = interrupt enabled, resume, IOPL = 0
May 27 11:12:33 arch kernel: current process = 12 (swi1: netisr 0)
May 27 11:12:33 arch kernel: trap number = 12
May 27 11:12:33 arch kernel: panic: page fault
May 27 11:12:33 arch kernel: cpuid = 0
May 27 11:12:33 arch kernel: time = 1653634930
May 27 11:12:33 arch kernel: KDB: stack backtrace:
May 27 11:12:33 arch kernel: #0 0xffffffff80c57535 at kdb_backtrace+0x65
May 27 11:12:33 arch kernel: #1 0xffffffff80c09f11 at vpanic+0x181
May 27 11:12:33 arch kernel: #2 0xffffffff80c09d83 at panic+0x43
May 27 11:12:33 arch kernel: #3 0xffffffff8108b1a7 at trap_fatal+0x387
May 27 11:12:33 arch kernel: #4 0xffffffff8108b1ff at trap_pfault+0x4f
May 27 11:12:33 arch kernel: #5 0xffffffff8108a85d at trap+0x27d
May 27 11:12:33 arch kernel: #6 0xffffffff81061f08 at calltrap+0x8
May 27 11:12:33 arch kernel: #7 0xffffffff80c9c38f at m_pullup+0x1af
May 27 11:12:33 arch kernel: #8 0xffffffff821172bf at ipfw_chk+0x3fcf
May 27 11:12:33 arch kernel: #9 0xffffffff8211905c at ipfw_check_frame+0x13c
May 27 11:12:33 arch kernel: #10 0xffffffff80d422c7 at pfil_run_hooks+0x97
May 27 11:12:33 arch kernel: #11 0xffffffff80d23bf4 at ether_output_frame+0x94
May 27 11:12:33 arch kernel: #12 0xffffffff80d23b08 at ether_output+0x6b8
May 27 11:12:33 arch kernel: #13 0xffffffff80db3ca5 at ip_output_send+0x75
May 27 11:12:33 arch kernel: #14 0xffffffff80db3ac2 at ip_output+0x12b2
May 27 11:12:33 arch kernel: #15 0xffffffff80dc9ab4 at tcp_output+0x1b04
May 27 11:12:33 arch kernel: #16 0xffffffff80dc127b at tcp_do_segment+0x2c9b
May 27 11:12:33 arch kernel: #17 0xffffffff80dbd81e at tcp_input+0xabe
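The call chain in the backtrace above can be pulled out of the syslog lines mechanically, which makes separate panics easier to compare. A minimal sketch in Python, assuming the log lines keep the `#N 0xADDR at symbol+0xOFF` shape shown above; the helper name `call_chain` is hypothetical and not part of any FreeBSD tool:

```python
import re

# Matches KDB frames such as "#7 0xffffffff80c9c38f at m_pullup+0x1af",
# with or without a leading syslog prefix.
FRAME_RE = re.compile(r"#(\d+)\s+0x([0-9a-f]+)\s+at\s+(\w+)\+0x([0-9a-f]+)")


def call_chain(log_text):
    """Return (frame_no, symbol) pairs in the order they appear in the log."""
    return [(int(m.group(1)), m.group(3)) for m in FRAME_RE.finditer(log_text)]


# A few frames copied from the panic above.
sample = """\
May 27 11:12:33 arch kernel: #7 0xffffffff80c9c38f at m_pullup+0x1af
May 27 11:12:33 arch kernel: #8 0xffffffff821172bf at ipfw_chk+0x3fcf
May 27 11:12:33 arch kernel: #9 0xffffffff8211905c at ipfw_check_frame+0x13c
May 27 11:12:33 arch kernel: #10 0xffffffff80d422c7 at pfil_run_hooks+0x97
"""

chain = call_chain(sample)
# Frames are listed innermost first, so the joined string reads from the
# faulting function outwards to its callers.
print(" -> ".join(sym for _, sym in chain))
```

For the four sample frames this prints the same `m_pullup -> ipfw_chk -> ipfw_check_frame -> pfil_run_hooks` chain that appears in the summary line of this report.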
@Reporter Can you confirm:
- For nginx, what is the package version? Please include `pkg info nginx` output as an attachment.
- For the upgrade, were packages updated after the base upgrade, or left at the same version?
- Is the panic reproducible without ipfw enabled?
If you are able to enable kernel crash processing, that would be great: https://docs.freebsd.org/en/books/developers-handbook/kerneldebug/
After I disabled sendfile in nginx, the machine has been stable for 24 hours without any panic.
After the binary upgrade of the base system, I performed a binary update of the packages using:
pkg bootstrap -f -y
pkg-static upgrade -f -y
Concerning the version of nginx:
#pkg info | grep nginx
nginx-1.20.2_9,2 Robust and small WWW server
I will upload the nginx info as an attachment.
I cannot disable ipfw, because this is a production server in the cloud and I won't leave it without a firewall. If there is a tool to automatically convert the ipfw rules (over 1500) to pf or another firewall, I could do it.
I also enabled dumpdev=AUTO in rc.conf. Is this enough for the RELEASE kernel that I am using? Should I set sendfile back to on in nginx to cause a panic?
Created attachment 234282 [details]
the pkg info of the nginx that is causing kernel panics
pkg info nginx
After an upgrade to another release, it is necessary to rebuild all ports: https://docs.freebsd.org/en/books/handbook/cutting-edge/#updating-upgrading-freebsdupdate (section 24.2.3.2).
Nginx is now at version 1.22.x in the ports tree.
I only use binary packages.
I performed the binary upgrade of the package using:
pkg bootstrap -f -y
pkg-static upgrade -f -y
#more /etc/pkg/FreeBSD.conf
FreeBSD: {
url: "pkg+http://pkg.FreeBSD.org/${ABI}/quarterly",
mirror_type: "srv",
signature_type: "fingerprints",
fingerprints: "/usr/share/keys/pkg",
enabled: yes
}
Perhaps the nginx binary package has not been updated in http://pkg.FreeBSD.org/${ABI}/quarterly
I will change this line to latest and upgrade the packages again.
Nevertheless, I find it strange for a userspace application to cause a kernel panic, even when running as root. I believe this application only makes system calls into the kernel, and under normal circumstances that should not cause a kernel panic.
I enabled sendfile again one more time, and within 1 hour the server panicked with:
Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address = 0x580
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff81086c80
stack pointer = 0x28:0xfffffe00bc3dff50
frame pointer = 0x28:0xfffffe00bc3dff50
code segment = base rx0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 67343 (nginx)
trap number = 12
panic: page fault
cpuid = 2
time = 1653866395
KDB: stack backtrace:
#0 0xffffffff80c57535 at kdb_backtrace+0x65
#1 0xffffffff80c09f11 at vpanic+0x181
#2 0xffffffff80c09d83 at panic+0x43
#3 0xffffffff8108b1a7 at trap_fatal+0x387
#4 0xffffffff8108b1ff at trap_pfault+0x4f
#5 0xffffffff8108a85d at trap+0x27d
#6 0xffffffff81061f08 at calltrap+0x8
#7 0xffffffff80c9c38f at m_pullup+0x1af
#8 0xffffffff8211f2bf at ipfw_chk+0x3fcf
#9 0xffffffff8212105c at ipfw_check_frame+0x13c
#10 0xffffffff80d422c7 at pfil_run_hooks+0x97
#11 0xffffffff80d23bf4 at ether_output_frame+0x94
#12 0xffffffff80d23b08 at ether_output+0x6b8
#13 0xffffffff80db3ca5 at ip_output_send+0x75
#14 0xffffffff80db3ac2 at ip_output+0x12b2
#15 0xffffffff80dc9ab4 at tcp_output+0x1b04
#16 0xffffffff80ddb189 at tcp_usr_send+0x229
#17 0xffffffff80c07e2a at vn_sendfile+0x197a
This time I had enabled the dumpdev and I got a vmcore.0 file in /var/crash. This file seems to contain information about my system, so I cannot upload it, but I can send it by email to any developer.
I installed devel/gdb to use kgdb, but I got an error that crashed kgdb and offered to create a core file.
# kgdb /boot/kernel/kernel /var/crash/vmcore.0
....
Reading symbols from /boot/kernel/kernel...
(No debugging symbols found in /boot/kernel/kernel)
/wrkdirs/usr/ports/devel/gdb/work-py38/gdb-12.1/gdb/thread.c:1328: internal-error: switch_to_thread: Assertion `thr != NULL' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n)
/wrkdirs/usr/ports/devel/gdb/work-py38/gdb-12.1/gdb/thread.c:1328: internal-error: switch_to_thread: Assertion `thr != NULL' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Create a core file of GDB? (y or n) n
Command aborted.
(kgdb) bt
No thread selected.
(kgdb) info threads
No threads.
(kgdb)
I figured out that I should be using another version of the kernel. I downloaded the packaged kernel-dbg.txz for 13.0 and extracted it to /tmp
kgdb /tmp/kernel/kernel.debug /var/crash/vmcore.0
but now I got
Reading symbols from /tmp/kernel/kernel.debug...
Failed to open vmcore: not a minidump for this platform
(kgdb)
Which means, I assume, that I should boot with kernel.debug if I would like to perform debugging, which is difficult at this time.
Concluding: as soon as I enable sendfile on; in nginx, I get a kernel panic within 1 hour, every time, on FreeBSD 13.0-RELEASE-p11.
I think I'm the wrong person for this. The kernel problems don't really have anything to do with nginx. I'm afraid I won't be able to help you there.
(In reply to BB Lister from comment #7)
Hi, I moved this ticket to base/kern and hope more people can join to debug. In the meanwhile, can you check the status with 13.1-RELEASE?
I binary upgraded the machine to:
13.1-RELEASE FreeBSD 13.1-RELEASE releng/13.1-n250148-fc952ac2212 GENERIC amd64
and enabled sendfile on; in nginx.
The system has remained stable for 50 hours. No reboots.
In FreeBSD 13.0 I had a reboot within hours, but in 13.1 the server seems stable. I will keep the sendfile option on and report anything unusual.
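For anyone triaging the two 13.0 panics in this report, the backtraces can be compared frame by frame. The following is only a sketch in Python: the frame data is copied from frames #7 to #17 of each panic, while the names `TRACE_TSO_OFF`, `TRACE_SENDFILE`, `common_prefix` and `shifted_frames` are hypothetical helpers introduced here for illustration.

```python
# Frames #7..#17 as (symbol, return address), copied from the two panics.
TRACE_TSO_OFF = [  # panic of May 27, current process "swi1: netisr 0"
    ("m_pullup", 0xffffffff80c9c38f),
    ("ipfw_chk", 0xffffffff821172bf),
    ("ipfw_check_frame", 0xffffffff8211905c),
    ("pfil_run_hooks", 0xffffffff80d422c7),
    ("ether_output_frame", 0xffffffff80d23bf4),
    ("ether_output", 0xffffffff80d23b08),
    ("ip_output_send", 0xffffffff80db3ca5),
    ("ip_output", 0xffffffff80db3ac2),
    ("tcp_output", 0xffffffff80dc9ab4),
    ("tcp_do_segment", 0xffffffff80dc127b),
    ("tcp_input", 0xffffffff80dbd81e),
]
TRACE_SENDFILE = [  # later panic, current process "nginx"
    ("m_pullup", 0xffffffff80c9c38f),
    ("ipfw_chk", 0xffffffff8211f2bf),
    ("ipfw_check_frame", 0xffffffff8212105c),
    ("pfil_run_hooks", 0xffffffff80d422c7),
    ("ether_output_frame", 0xffffffff80d23bf4),
    ("ether_output", 0xffffffff80d23b08),
    ("ip_output_send", 0xffffffff80db3ca5),
    ("ip_output", 0xffffffff80db3ac2),
    ("tcp_output", 0xffffffff80dc9ab4),
    ("tcp_usr_send", 0xffffffff80ddb189),
    ("vn_sendfile", 0xffffffff80c07e2a),
]


def common_prefix(a, b):
    """Symbols shared by both traces, starting from the faulting frame."""
    shared = []
    for (sym_a, _), (sym_b, _) in zip(a, b):
        if sym_a != sym_b:
            break
        shared.append(sym_a)
    return shared


def shifted_frames(a, b):
    """Same symbol at the same depth, but a different return address."""
    return {
        sym_a: addr_b - addr_a
        for (sym_a, addr_a), (sym_b, addr_b) in zip(a, b)
        if sym_a == sym_b and addr_a != addr_b
    }


shared = common_prefix(TRACE_TSO_OFF, TRACE_SENDFILE)
shifts = shifted_frames(TRACE_TSO_OFF, TRACE_SENDFILE)
print(len(shared), "shared frames, outermost:", shared[-1])
print({sym: hex(delta) for sym, delta in shifts.items()})
```

Both panics share the nine innermost frames, from m_pullup out to tcp_output, and differ only in how tcp_output was reached (tcp_input in one case, vn_sendfile in the other). The two ipfw frames differ by the same 0x8000 between the traces, which would be consistent with ipfw being a kernel module loaded at a different address on each boot, although nothing in this report confirms that.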