20190612 11:51:26 all (1/1): sctp.sh

Fatal trap 9: general protection fault while in kernel mode
cpuid = 0; apic id = 00
instruction pointer     = 0x20:0xffffffff80ba77a8
stack pointer           = 0x28:0xfffffe00ae6275f0
frame pointer           = 0x28:0xfffffe00ae627670
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 69451 (server)
trap number             = 9
panic: general protection fault
cpuid = 2
time = 1560333091
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00ae627300
vpanic() at vpanic+0x19d/frame 0xfffffe00ae627350
panic() at panic+0x43/frame 0xfffffe00ae6273b0
trap_fatal() at trap_fatal+0x39c/frame 0xfffffe00ae627410
trap() at trap+0x6c/frame 0xfffffe00ae627520
calltrap() at calltrap+0x8/frame 0xfffffe00ae627520
--- trap 0x9, rip = 0xffffffff80ba77a8, rsp = 0xfffffe00ae6275f0, rbp = 0xfffffe00ae627670 ---
__mtx_lock_sleep() at __mtx_lock_sleep+0xf8/frame 0xfffffe00ae627670
__mtx_lock_flags() at __mtx_lock_flags+0xee/frame 0xfffffe00ae6276c0
sctp_accept() at sctp_accept+0xa6e/frame 0xfffffe00ae627840
soaccept() at soaccept+0x174/frame 0xfffffe00ae627890
kern_accept4() at kern_accept4+0x26e/frame 0xfffffe00ae627930
accept1() at accept1+0xe8/frame 0xfffffe00ae627990
amd64_syscall() at amd64_syscall+0x291/frame 0xfffffe00ae627ab0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe00ae627ab0

Details @ https://people.freebsd.org/~pho/stress/log/sctp.txt
Created attachment 205007: sctp panic debug

Attach backtrace to the issue; external URL references tend to go stale or go missing over time.
Is there a way to reproduce the issue locally?
To reproduce the problem, run the https://people.freebsd.org/~pho/setup.sh script like this:

root@t2:~pho/stress2/tools # ./setup.sh
Enter non-root test user name: stress
Extracting stress2 to /tmp/work
Tests to run are in /tmp/work/stress2/misc.
To run all tests, type ./all.sh -on
To run for example all tmpfs tests, type ./all.sh -on `grep -l tmpfs *.sh`
To run fdatasync.sh for one hour, type ./all.sh -m 60 fdatasync.sh
root@t2:~pho/stress2/tools # cd /tmp/work/stress2/misc
root@t2:/tmp/work/stress2/misc # ./all.sh -a sctp.sh
Note: including known problem tests.
20190805 07:18:26 all: sctp.sh
20190805 07:18:26 all (1/1): sctp.sh
swap: run time 0+00:01:00, incarnations 16, load 100, verbose 1
20190805 07:19:40 all: sctp.sh
:
20190805 08:35:17 all: sctp.sh
20190805 08:35:17 all (1/1): sctp.sh
swap: run time 0+00:01:00, incarnations 17, load 100, verbose 1

Fatal trap 9: general protection fault while in kernel mode
cpuid = 22; apic id = 2a
instruction pointer     = 0x20:0xffffffff80bae898
stack pointer           = 0x28:0xfffffe00cd308710
frame pointer           = 0x28:0xfffffe00cd308790
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 42445 (server)
trap number             = 9
panic: general protection fault
cpuid = 22
time = 1564986950
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00cd308420
vpanic() at vpanic+0x19d/frame 0xfffffe00cd308470
panic() at panic+0x43/frame 0xfffffe00cd3084d0
trap_fatal() at trap_fatal+0x39c/frame 0xfffffe00cd308530
trap() at trap+0x6c/frame 0xfffffe00cd308640
calltrap() at calltrap+0x8/frame 0xfffffe00cd308640
--- trap 0x9, rip = 0xffffffff80bae898, rsp = 0xfffffe00cd308710, rbp = 0xfffffe00cd308790 ---
__mtx_lock_sleep() at __mtx_lock_sleep+0xf8/frame 0xfffffe00cd308790
__mtx_lock_flags() at __mtx_lock_flags+0xee/frame 0xfffffe00cd3087e0
sctp_accept() at sctp_accept+0x517/frame 0xfffffe00cd308850
soaccept() at soaccept+0xa3/frame 0xfffffe00cd308880
kern_accept4() at kern_accept4+0x27a/frame 0xfffffe00cd308920
accept1() at accept1+0xf1/frame 0xfffffe00cd308980
amd64_syscall() at amd64_syscall+0x2d6/frame 0xfffffe00cd308ab0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe00cd308ab0
--- syscall (30, FreeBSD ELF64, sys_accept), rip = 0x8003a4b0a, rsp = 0x7fffffffe318, rbp = 0x7fffffffe800 ---
KDB: enter: panic
[ thread pid 42445 tid 100596 ]
Stopped at      kdb_enter+0x3b: movq    $0,kdb_why
db> x/s version
version:        FreeBSD 13.0-CURRENT #1 r350555: Sat Aug  3 15:50:17 CEST 2019\012    pho@t2.osted.lan:/usr/src/sys/amd64/compile/PHO\012
db>
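For anyone comparing the panic reports in this PR, here is a minimal Python sketch that pulls the frames below the `--- trap` marker out of a KDB backtrace, i.e. the call path that was executing when the fault hit. The names `parse_kdb_backtrace`, `TOKEN_RE` and `SAMPLE` are made up for illustration and are not part of FreeBSD or stress2. The regular expression only assumes the frame layout printed in this report.

```python
import re

# Matches either a frame ("func() at func+0xOFF/frame 0xADDR")
# or the trap marker ("--- trap 0x9, rip = 0x...").
TOKEN_RE = re.compile(
    r"(?P<func>\w+)\(\)\s+at\s+(?P=func)\+(?P<off>0x[0-9a-f]+)/frame\s+0x[0-9a-f]+"
    r"|--- trap (?P<trap>0x[0-9a-f]+), rip = (?P<rip>0x[0-9a-f]+)"
)

def parse_kdb_backtrace(text):
    """Return (frames, trap_at).

    frames is a list of (function name, offset) in the order printed.
    trap_at is the index of the first frame after the trap marker,
    or None if no trap marker was found.
    """
    frames = []
    trap_at = None
    for m in TOKEN_RE.finditer(text):
        if m.group("func"):
            frames.append((m.group("func"), int(m.group("off"), 16)))
        else:
            # Frames printed after this marker were running when the trap hit.
            trap_at = len(frames)
    return frames, trap_at

# Excerpt copied from the first backtrace in this report.
SAMPLE = """\
trap() at trap+0x6c/frame 0xfffffe00ae627520
calltrap() at calltrap+0x8/frame 0xfffffe00ae627520
--- trap 0x9, rip = 0xffffffff80ba77a8, rsp = 0xfffffe00ae6275f0, rbp = 0xfffffe00ae627670 ---
__mtx_lock_sleep() at __mtx_lock_sleep+0xf8/frame 0xfffffe00ae627670
__mtx_lock_flags() at __mtx_lock_flags+0xee/frame 0xfffffe00ae6276c0
sctp_accept() at sctp_accept+0xa6e/frame 0xfffffe00ae627840
soaccept() at soaccept+0x174/frame 0xfffffe00ae627890
"""

if __name__ == "__main__":
    frames, trap_at = parse_kdb_backtrace(SAMPLE)
    for name, off in frames[trap_at:]:
        print(f"{name}+{off:#x}")
```

Run against both traces, this gives the same function sequence below the trap marker (`__mtx_lock_sleep`, `__mtx_lock_flags`, `sctp_accept`, `soaccept`, ...); only the offsets differ between the two kernel builds.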
Thanks for the script. I was able to reproduce it.
A commit references this bug:

Author: tuexen
Date: Tue Aug 6 10:29:20 UTC 2019
New revision: 350626
URL: https://svnweb.freebsd.org/changeset/base/350626

Log:
  Fix a locking issue in sctp_accept.

  PR: 238520
  Reported by: pho@
  MFC after: 1 week

Changes:
  head/sys/netinet/sctp_usrreq.c
@pho: Could you retest with the fix from base r350626 included? I think it should fix the issue; I was not able to reproduce it anymore.
r350626 LGTM.
(In reply to Peter Holm from comment #7) Thanks!
A commit references this bug:

Author: tuexen
Date: Sat Sep 7 11:58:32 UTC 2019
New revision: 352001
URL: https://svnweb.freebsd.org/changeset/base/352001

Log:
  MFC r350626:
  Fix a locking issue in sctp_accept.

  PR: 238520
  Reported by: pho@

Changes:
_U  stable/12/
  stable/12/sys/netinet/sctp_usrreq.c
@Michael Does that mean stable/11 isn't affected or shouldn't get a merge?
(In reply to Kubilay Kocak from comment #10) I haven't changed anything. Not sure how the flags were changed. I only MFCed the fix to stable/12. I'm still planning to MFC it to stable/11.
(In reply to Michael Tuexen from comment #11) Gotcha! The Closed->FIXED threw me off :)
A commit references this bug:

Author: tuexen
Date: Thu May 7 00:50:51 UTC 2020
New revision: 360727
URL: https://svnweb.freebsd.org/changeset/base/360727

Log:
  MFC r350626: Fix a locking issue in SCTP

  Fix a locking issue in sctp_accept.

  PR: 238520
  Reported by: pho

Changes:
_U  stable/11/
  stable/11/sys/netinet/sctp_usrreq.c