Crashed while doing UDP test via iperf3:

root@n1_iwl_vm:~ # iperf3 -B 192.168.0.190 -c 192.168.0.169 -V -t 10 -i 1 --udp --length 16 --bitrate 5m
iperf 3.15
FreeBSD n1_iwl_vm 15.0-CURRENT FreeBSD 15.0-CURRENT #21 main-7df526eb10: Mon Dec 11 14:39:56 EST 2023 root@n1_iwl_vm:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64
Control connection MSS 1460
Time: Mon, 11 Dec 2023 23:04:02 UTC
Connecting to host 192.168.0.169, port 5201
Cookie: mjfz4h377cfl7oddwdzwz6dbbxysqaidl2rq
Target Bitrate: 5000000
[ 5] local 192.168.0.190 port 22727 connected to 192.168.0.169 port 5201
Starting Test: protocol: UDP, 1 streams, 16 byte blocks, omitting 0 seconds, 10 second test, tos 0
[ ID] Interval Transfer Bitrate Total Datagrams
[ 5] 0.00-1.00 sec 610 KBytes 5.00 Mbits/sec 39038

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address = 0x8
fault code = supervisor write data, page not present
instruction pointer = 0x20:0xffffffff80dd9e11
stack pointer = 0x0:0xfffffe007ebeea70
frame pointer = 0x0:0xfffffe007ebeea70
code segment = base rx0, limit 0xfffff, type 0x1b
 = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 0 (ndev napi taskq)
rdi: fffffe00805f9380 rsi: fffff800057b7400 rdx: fffff800057b7418
rcx: fffff8017b795000 r8: ffffffff8268a3ab r9: 0000000000000460
rax: 0000000000000000 rbx: fffff800057b7480 rbp: fffffe007ebeea70
r10: 0000000000000000 r11: 0000000000000062 r12: fffff8017b796000
r13: fffffe00805f9440 r14: fffffe00805f9380 r15: fffffe00805f9448
trap number = 12
panic: page fault
cpuid = 1
time = 1702335843
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe007ebee740
vpanic() at vpanic+0x132/frame 0xfffffe007ebee870
panic() at panic+0x43/frame 0xfffffe007ebee8d0
trap_fatal() at trap_fatal+0x40c/frame 0xfffffe007ebee930
trap_pfault() at trap_pfault+0xae/frame 0xfffffe007ebee9a0
calltrap() at calltrap+0x8/frame 0xfffffe007ebee9a0
--- trap 0xc, rip = 0xffffffff80dd9e11, rsp = 0xfffffe007ebeea70, rbp = 0xfffffe007ebeea70 ---
linuxkpi_ieee80211_tx_dequeue() at linuxkpi_ieee80211_tx_dequeue+0x51/frame 0xfffffe007ebeea70
iwl_mvm_mac_itxq_xmit() at iwl_mvm_mac_itxq_xmit+0xc2/frame 0xfffffe007ebeeac0
iwl_mvm_queue_state_change() at iwl_mvm_queue_state_change+0x1ef/frame 0xfffffe007ebeeb10
iwl_txq_reclaim() at iwl_txq_reclaim+0x7ef/frame 0xfffffe007ebeebd0
iwl_mvm_rx_tx_cmd() at iwl_mvm_rx_tx_cmd+0x14e/frame 0xfffffe007ebeeca0
iwl_mvm_rx_common() at iwl_mvm_rx_common+0x1dc/frame 0xfffffe007ebeece0
iwl_pcie_rx_handle() at iwl_pcie_rx_handle+0x47f/frame 0xfffffe007ebeede0
iwl_pcie_napi_poll_msix() at iwl_pcie_napi_poll_msix+0x2d/frame 0xfffffe007ebeee20
lkpi_napi_task() at lkpi_napi_task+0x1f/frame 0xfffffe007ebeee40
taskqueue_run_locked() at taskqueue_run_locked+0xab/frame 0xfffffe007ebeeec0
taskqueue_thread_loop() at taskqueue_thread_loop+0xd3/frame 0xfffffe007ebeeef0
fork_exit() at fork_exit+0x82/frame 0xfffffe007ebeef30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe007ebeef30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 0 tid 100190 ]
Stopped at kdb_enter+0x32: movq $0,0xe3c023(%rip)
db> dump
Dumping 516 out of 6111 MB:..4%..13%..22%..31%..41%..53%..62%..72%..81%..93%
Dump complete
db>

root@n1_iwl_vm:~ # sysctl hw.ncpu
hw.ncpu: 10

Please let me know if the core file is needed, and I can upload it to freefall.
Given you ask if someone wants the core file, I'll take the PR. Can you test: https://people.freebsd.org/~bz/wireless/20231212-02-lkpi-txq.diff (sorry there is some other stuff in there too).
(In reply to Bjoern A. Zeeb from comment #2)

Do you know why "git apply" does not work from the patch directly?

root@n1_iwl_vm:/usr/src # git apply --check /usr/patches/20231212-02-lkpi-txq.diff
error: compat/linuxkpi/common/include/net/mac80211.h: No such file or directory
error: compat/linuxkpi/common/src/linux_80211.c: No such file or directory
error: compat/linuxkpi/common/src/linux_80211.h: No such file or directory

But the interactive "patch" works.

root@n1_iwl_vm:/usr/src # patch -p1 < /usr/patches/20231212-02-lkpi-txq.diff
Hmm... Looks like a unified diff to me...
The text leading up to this was:
--------------------------
|diff --git sys/compat/linuxkpi/common/include/net/mac80211.h sys/compat/linuxkpi/common/include/net/mac80211.h
|index fa36bd84ac6e..c4d001b3a7e8 100644
|--- sys/compat/linuxkpi/common/include/net/mac80211.h
|+++ sys/compat/linuxkpi/common/include/net/mac80211.h
--------------------------
File to patch:
No file found--skip this patch? [y] n
File to patch: sys/compat/linuxkpi/common/include/net/mac80211.h
Patching file sys/compat/linuxkpi/common/include/net/mac80211.h using Plan A...
Hunk #1 succeeded at 1117.
Hunk #2 succeeded at 1683.
Hunk #3 succeeded at 1708.
Hunk #4 succeeded at 2199.
Hunk #5 succeeded at 2470.
...
(In reply to Cheng Cui from comment #3)

It's a -p0 diff: the paths are not prefixed with a/ and b/, so -p1 is the wrong strip level. Your interactive patch run did not work automatically either.
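To illustrate the strip-level point, here is a minimal sketch using a throwaway tree. The directory sys/demo, the file file.txt and the diff demo.diff are made up for the example and have nothing to do with the real patch. git apply takes the same -p<n> option, so "git apply -p0 --check" would be the analogous thing to try from /usr/src.

```shell
#!/bin/sh
# Hypothetical scratch tree; paths and contents are invented for illustration.
set -e
work=$(mktemp -d)
cd "$work"
mkdir -p sys/demo
printf 'old\n' > sys/demo/file.txt

# A unified diff whose paths carry no a/ b/ prefix, like the one in this PR.
cat > demo.diff <<'EOF'
--- sys/demo/file.txt
+++ sys/demo/file.txt
@@ -1 +1 @@
-old
+new
EOF

# -p0 keeps the path exactly as written in the diff.
# -p1 would strip the leading "sys/" and look for demo/file.txt instead.
patch -p0 < demo.diff
cat sys/demo/file.txt
```

With -p1 on the same diff, patch cannot find the stripped path and falls back to asking "File to patch:", which matches what comment #3 shows.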
(In reply to Bjoern A. Zeeb from comment #2)

> Can you test:
> https://people.freebsd.org/~bz/wireless/20231212-02-lkpi-txq.diff

The patch works fine for me. After multiple iperf3 tests over TCP and UDP, there were no more crashes of this kind. The only crash I hit was the "ifconfig wlan0 destroy" one from Bug #273985, while reconfiguring wlan0 with this patch applied.
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=eac3646fcdd445297cade756630335e23e92ea13

commit eac3646fcdd445297cade756630335e23e92ea13
Author: Bjoern A. Zeeb <bz@FreeBSD.org>
AuthorDate: 2023-12-12 01:59:17 +0000
Commit: Bjoern A. Zeeb <bz@FreeBSD.org>
CommitDate: 2023-12-19 00:50:49 +0000

    LinuxKPI: 802.11: more TXQ implementation and locking

    Implement ieee80211_handle_wake_tx_queue() and ieee80211_tx_dequeue_ni()
    while looking at the code. They are needed by various wireless drivers.

    Introduce an ltxq lock and protect the skbq by that.
    This prevents panics due to a race between a driver upcall and
    the net80211 tx downcall. While the former should be rcu protected we
    cannot rely on that.
    It remains questionable if we need to protect further fields there
    (with a different lock?).

    Also introduce a txq_mtx on the lhw which needs to be further deployed
    but we need to come up with a good strategy to not end up with 7
    different locks.

    Sponsored by: The FreeBSD Foundation
    PR: 274178, 275710
    Tested by: cc
    MFC after: 3 days

 sys/compat/linuxkpi/common/include/net/mac80211.h | 27 +++++----
 sys/compat/linuxkpi/common/src/linux_80211.c | 67 +++++++++++++++++++++--
 sys/compat/linuxkpi/common/src/linux_80211.h | 29 +++++++++-
 3 files changed, 107 insertions(+), 16 deletions(-)
It looks like this still needs an MFC?
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=1c7be8ecaddfac2b412244e91f924bf73f95658a

commit 1c7be8ecaddfac2b412244e91f924bf73f95658a
Author: Bjoern A. Zeeb <bz@FreeBSD.org>
AuthorDate: 2023-12-12 01:59:17 +0000
Commit: Bjoern A. Zeeb <bz@FreeBSD.org>
CommitDate: 2024-02-18 18:31:14 +0000

    LinuxKPI: 802.11: more TXQ implementation and locking

    Implement ieee80211_handle_wake_tx_queue() and ieee80211_tx_dequeue_ni()
    while looking at the code. They are needed by various wireless drivers.

    Introduce an ltxq lock and protect the skbq by that.
    This prevents panics due to a race between a driver upcall and
    the net80211 tx downcall. While the former should be rcu protected we
    cannot rely on that.
    It remains questionable if we need to protect further fields there
    (with a different lock?).

    Also introduce a txq_mtx on the lhw which needs to be further deployed
    but we need to come up with a good strategy to not end up with 7
    different locks.

    Sponsored by: The FreeBSD Foundation
    PR: 274178, 275710
    Tested by: cc

    (cherry picked from commit eac3646fcdd445297cade756630335e23e92ea13)

 sys/compat/linuxkpi/common/include/net/mac80211.h | 27 +++++----
 sys/compat/linuxkpi/common/src/linux_80211.c | 67 +++++++++++++++++++++--
 sys/compat/linuxkpi/common/src/linux_80211.h | 29 +++++++++-
 3 files changed, 107 insertions(+), 16 deletions(-)
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=3df959638baa60c1c88e9ac66289502f99ad8418

commit 3df959638baa60c1c88e9ac66289502f99ad8418
Author: Bjoern A. Zeeb <bz@FreeBSD.org>
AuthorDate: 2023-12-12 01:59:17 +0000
Commit: Bjoern A. Zeeb <bz@FreeBSD.org>
CommitDate: 2024-02-19 08:02:00 +0000

    LinuxKPI: 802.11: more TXQ implementation and locking

    Implement ieee80211_handle_wake_tx_queue() and ieee80211_tx_dequeue_ni()
    while looking at the code. They are needed by various wireless drivers.

    Introduce an ltxq lock and protect the skbq by that.
    This prevents panics due to a race between a driver upcall and
    the net80211 tx downcall. While the former should be rcu protected we
    cannot rely on that.
    It remains questionable if we need to protect further fields there
    (with a different lock?).

    Also introduce a txq_mtx on the lhw which needs to be further deployed
    but we need to come up with a good strategy to not end up with 7
    different locks.

    Sponsored by: The FreeBSD Foundation
    PR: 274178, 275710
    Tested by: cc

    (cherry picked from commit eac3646fcdd445297cade756630335e23e92ea13)

 sys/compat/linuxkpi/common/include/net/mac80211.h | 27 +++++----
 sys/compat/linuxkpi/common/src/linux_80211.c | 67 +++++++++++++++++++++--
 sys/compat/linuxkpi/common/src/linux_80211.h | 29 +++++++++-
 3 files changed, 107 insertions(+), 16 deletions(-)
A commit in branch releng/13.3 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=804a4c1c7b8fe00a6924fa5e4ae27a487bdc2337

commit 804a4c1c7b8fe00a6924fa5e4ae27a487bdc2337
Author: Bjoern A. Zeeb <bz@FreeBSD.org>
AuthorDate: 2023-12-12 01:59:17 +0000
Commit: Bjoern A. Zeeb <bz@FreeBSD.org>
CommitDate: 2024-02-19 16:06:43 +0000

    LinuxKPI: 802.11: more TXQ implementation and locking

    Implement ieee80211_handle_wake_tx_queue() and ieee80211_tx_dequeue_ni()
    while looking at the code. They are needed by various wireless drivers.

    Introduce an ltxq lock and protect the skbq by that.
    This prevents panics due to a race between a driver upcall and
    the net80211 tx downcall. While the former should be rcu protected we
    cannot rely on that.
    It remains questionable if we need to protect further fields there
    (with a different lock?).

    Also introduce a txq_mtx on the lhw which needs to be further deployed
    but we need to come up with a good strategy to not end up with 7
    different locks.

    Approved by: re (cperciva)
    Sponsored by: The FreeBSD Foundation
    PR: 274178, 275710
    Tested by: cc

    (cherry picked from commit eac3646fcdd445297cade756630335e23e92ea13)
    (cherry picked from commit 3df959638baa60c1c88e9ac66289502f99ad8418)

 sys/compat/linuxkpi/common/include/net/mac80211.h | 27 +++++----
 sys/compat/linuxkpi/common/src/linux_80211.c | 67 +++++++++++++++++++++--
 sys/compat/linuxkpi/common/src/linux_80211.h | 29 +++++++++-
 3 files changed, 107 insertions(+), 16 deletions(-)