Bug 275710 - iwlwifi: linuxkpi_ieee80211_tx_dequeue() page fault while in kernel mode
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: wireless
Version: 15.0-CURRENT
Hardware: Any Any
Importance: --- Affects Some People
Assignee: Bjoern A. Zeeb
URL:
Keywords: crash
Depends on:
Blocks: iwlwifi
Reported: 2023-12-11 23:13 UTC by Cheng Cui
Modified: 2024-02-19 16:11 UTC
Description Cheng Cui freebsd_committer freebsd_triage 2023-12-11 23:13:46 UTC
The kernel crashed while running a UDP test with iperf3:

root@n1_iwl_vm:~ # iperf3 -B 192.168.0.190 -c 192.168.0.169 -V -t 10 -i 1 --udp --length 16 --bitrate 5m
iperf 3.15
FreeBSD n1_iwl_vm 15.0-CURRENT FreeBSD 15.0-CURRENT #21 main-7df526eb10: Mon Dec 11 14:39:56 EST 2023     root@n1_iwl_vm:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64
Control connection MSS 1460
Time: Mon, 11 Dec 2023 23:04:02 UTC
Connecting to host 192.168.0.169, port 5201
      Cookie: mjfz4h377cfl7oddwdzwz6dbbxysqaidl2rq
      Target Bitrate: 5000000
[  5] local 192.168.0.190 port 22727 connected to 192.168.0.169 port 5201
Starting Test: protocol: UDP, 1 streams, 16 byte blocks, omitting 0 seconds, 10 second test, tos 0
[ ID] Interval           Transfer     Bitrate         Total Datagrams
[  5]   0.00-1.00   sec   610 KBytes  5.00 Mbits/sec  39038  


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address	= 0x8
fault code		= supervisor write data, page not present
instruction pointer	= 0x20:0xffffffff80dd9e11
stack pointer	        = 0x0:0xfffffe007ebeea70
frame pointer	        = 0x0:0xfffffe007ebeea70
code segment		= base rx0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 0 (ndev napi taskq)
rdi: fffffe00805f9380 rsi: fffff800057b7400 rdx: fffff800057b7418
rcx: fffff8017b795000  r8: ffffffff8268a3ab  r9: 0000000000000460
rax: 0000000000000000 rbx: fffff800057b7480 rbp: fffffe007ebeea70
r10: 0000000000000000 r11: 0000000000000062 r12: fffff8017b796000
r13: fffffe00805f9440 r14: fffffe00805f9380 r15: fffffe00805f9448
trap number		= 12
panic: page fault
cpuid = 1
time = 1702335843
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe007ebee740
vpanic() at vpanic+0x132/frame 0xfffffe007ebee870
panic() at panic+0x43/frame 0xfffffe007ebee8d0
trap_fatal() at trap_fatal+0x40c/frame 0xfffffe007ebee930
trap_pfault() at trap_pfault+0xae/frame 0xfffffe007ebee9a0
calltrap() at calltrap+0x8/frame 0xfffffe007ebee9a0
--- trap 0xc, rip = 0xffffffff80dd9e11, rsp = 0xfffffe007ebeea70, rbp = 0xfffffe007ebeea70 ---
linuxkpi_ieee80211_tx_dequeue() at linuxkpi_ieee80211_tx_dequeue+0x51/frame 0xfffffe007ebeea70
iwl_mvm_mac_itxq_xmit() at iwl_mvm_mac_itxq_xmit+0xc2/frame 0xfffffe007ebeeac0
iwl_mvm_queue_state_change() at iwl_mvm_queue_state_change+0x1ef/frame 0xfffffe007ebeeb10
iwl_txq_reclaim() at iwl_txq_reclaim+0x7ef/frame 0xfffffe007ebeebd0
iwl_mvm_rx_tx_cmd() at iwl_mvm_rx_tx_cmd+0x14e/frame 0xfffffe007ebeeca0
iwl_mvm_rx_common() at iwl_mvm_rx_common+0x1dc/frame 0xfffffe007ebeece0
iwl_pcie_rx_handle() at iwl_pcie_rx_handle+0x47f/frame 0xfffffe007ebeede0
iwl_pcie_napi_poll_msix() at iwl_pcie_napi_poll_msix+0x2d/frame 0xfffffe007ebeee20
lkpi_napi_task() at lkpi_napi_task+0x1f/frame 0xfffffe007ebeee40
taskqueue_run_locked() at taskqueue_run_locked+0xab/frame 0xfffffe007ebeeec0
taskqueue_thread_loop() at taskqueue_thread_loop+0xd3/frame 0xfffffe007ebeeef0
fork_exit() at fork_exit+0x82/frame 0xfffffe007ebeef30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe007ebeef30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 0 tid 100190 ]
Stopped at      kdb_enter+0x32: movq    $0,0xe3c023(%rip)
db> dump
Dumping 516 out of 6111 MB:..4%..13%..22%..31%..41%..53%..62%..72%..81%..93%
Dump complete
db>
Comment 1 Cheng Cui freebsd_committer freebsd_triage 2023-12-11 23:17:43 UTC
root@n1_iwl_vm:~ # sysctl hw.ncpu
hw.ncpu: 10

Please let me know if the core file is needed, and I can upload it to freefall.
Comment 2 Bjoern A. Zeeb freebsd_committer freebsd_triage 2023-12-12 01:14:14 UTC
Given you ask if someone wants the core file, I'll take the PR.

Can you test:
https://people.freebsd.org/~bz/wireless/20231212-02-lkpi-txq.diff

(sorry there is some other stuff in there too).
Comment 3 Cheng Cui freebsd_committer freebsd_triage 2023-12-12 14:53:02 UTC
(In reply to Bjoern A. Zeeb from comment #2)
Do you know why "git apply" does not work with the patch directly?

root@n1_iwl_vm:/usr/src # git apply --check /usr/patches/20231212-02-lkpi-txq.diff
error: compat/linuxkpi/common/include/net/mac80211.h: No such file or directory
error: compat/linuxkpi/common/src/linux_80211.c: No such file or directory
error: compat/linuxkpi/common/src/linux_80211.h: No such file or directory

But the interactive "patch" works.

root@n1_iwl_vm:/usr/src # patch -p1 < /usr/patches/20231212-02-lkpi-txq.diff
Hmm...  Looks like a unified diff to me...
The text leading up to this was:
--------------------------
|diff --git sys/compat/linuxkpi/common/include/net/mac80211.h sys/compat/linuxkpi/common/include/net/mac80211.h
|index fa36bd84ac6e..c4d001b3a7e8 100644
|--- sys/compat/linuxkpi/common/include/net/mac80211.h
|+++ sys/compat/linuxkpi/common/include/net/mac80211.h
--------------------------
File to patch: 
No file found--skip this patch? [y] n
File to patch: sys/compat/linuxkpi/common/include/net/mac80211.h
Patching file sys/compat/linuxkpi/common/include/net/mac80211.h using Plan A...
Hunk #1 succeeded at 1117.
Hunk #2 succeeded at 1683.
Hunk #3 succeeded at 1708.
Hunk #4 succeeded at 2199.
Hunk #5 succeeded at 2470.
...
Comment 4 Bjoern A. Zeeb freebsd_committer freebsd_triage 2023-12-12 16:05:27 UTC
(In reply to Cheng Cui from comment #3)

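It's a -p0 diff (the paths are not prefixed with a/ and b/), so it must not be applied with -p1; -p1 strips the leading sys/ component, which is why the files were not found.
Your interactive patch run did not apply automatically either; it had to prompt for the file name.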
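
For reference, a minimal sketch of the -p0 versus -p1 difference, run in a scratch directory. The tree, file name, and diff content below are made up for illustration and are not part of the actual patch; only the path layout (no a/ b/ prefix) mirrors it.

```shell
#!/bin/sh
# Scratch-directory sketch; all paths and file names here are hypothetical.
set -eu

work=$(mktemp -d)
cd "$work"

# A tiny tree that mimics the sys/... layout of the source tree.
mkdir -p sys/demo
printf 'old line\n' > sys/demo/file.h

# A unified diff whose paths carry no a/ or b/ prefix, i.e. a -p0 diff.
cat > fix.diff <<'EOF'
--- sys/demo/file.h
+++ sys/demo/file.h
@@ -1 +1 @@
-old line
+new line
EOF

# -p0 keeps the path exactly as written in the diff, so the file is found
# without prompting. With -p1 the leading "sys/" would be stripped and
# patch would look for demo/file.h, which does not exist here.
patch -p0 < fix.diff

# Show the patched content.
cat sys/demo/file.h
```

For the diff in this PR the equivalent would presumably be "patch -p0 < /usr/patches/20231212-02-lkpi-txq.diff" run from /usr/src. "git apply" also accepts a -p0 option, though that has not been verified against this particular file.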
Comment 5 Cheng Cui freebsd_committer freebsd_triage 2023-12-12 17:42:14 UTC
(In reply to Bjoern A. Zeeb from comment #2)
> Can you test:
> https://people.freebsd.org/~bz/wireless/20231212-02-lkpi-txq.diff

The patch works fine for me. After multiple iperf3 tests over TCP and UDP, there were no more crashes of this kind. The only crash I hit was the "ifconfig wlan0 destroy" one from Bug #273985, while reconfiguring wlan0 with this patch applied.
Comment 6 commit-hook freebsd_committer freebsd_triage 2023-12-19 00:54:02 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=eac3646fcdd445297cade756630335e23e92ea13

commit eac3646fcdd445297cade756630335e23e92ea13
Author:     Bjoern A. Zeeb <bz@FreeBSD.org>
AuthorDate: 2023-12-12 01:59:17 +0000
Commit:     Bjoern A. Zeeb <bz@FreeBSD.org>
CommitDate: 2023-12-19 00:50:49 +0000

    LinuxKPI: 802.11: more TXQ implementation and locking

    Implement ieee80211_handle_wake_tx_queue() and ieee80211_tx_dequeue_ni()
    while looking at the code.  They are needed by various wireless drivers.

    Introduce an ltxq lock and protect the skbq by that.
    This prevents panics due to a race between a driver upcall and
    the net80211 tx downcall.  While the former should be rcu protected we
    cannot rely on that.
    It remains questionable if we need to protect further fields there
    (with a different lock?).

    Also introduce a txq_mtx on the lhw which needs to be further deployed
    but we need to come up with a good strategy to not end up with 7 different
    locks.

    Sponsored by:   The FreeBSD Foundation
    PR:             274178, 275710
    Tested by:      cc
    MFC after:      3 days

 sys/compat/linuxkpi/common/include/net/mac80211.h | 27 +++++----
 sys/compat/linuxkpi/common/src/linux_80211.c      | 67 +++++++++++++++++++++--
 sys/compat/linuxkpi/common/src/linux_80211.h      | 29 +++++++++-
 3 files changed, 107 insertions(+), 16 deletions(-)
Comment 7 Ed Maste freebsd_committer freebsd_triage 2024-01-10 15:16:36 UTC
It looks like this still needs an MFC?
Comment 8 commit-hook freebsd_committer freebsd_triage 2024-02-18 21:12:27 UTC
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=1c7be8ecaddfac2b412244e91f924bf73f95658a

commit 1c7be8ecaddfac2b412244e91f924bf73f95658a
Author:     Bjoern A. Zeeb <bz@FreeBSD.org>
AuthorDate: 2023-12-12 01:59:17 +0000
Commit:     Bjoern A. Zeeb <bz@FreeBSD.org>
CommitDate: 2024-02-18 18:31:14 +0000

    LinuxKPI: 802.11: more TXQ implementation and locking

    Implement ieee80211_handle_wake_tx_queue() and ieee80211_tx_dequeue_ni()
    while looking at the code.  They are needed by various wireless drivers.

    Introduce an ltxq lock and protect the skbq by that.
    This prevents panics due to a race between a driver upcall and
    the net80211 tx downcall.  While the former should be rcu protected we
    cannot rely on that.
    It remains questionable if we need to protect further fields there
    (with a different lock?).

    Also introduce a txq_mtx on the lhw which needs to be further deployed
    but we need to come up with a good strategy to not end up with 7 different
    locks.

    Sponsored by:   The FreeBSD Foundation
    PR:             274178, 275710
    Tested by:      cc

    (cherry picked from commit eac3646fcdd445297cade756630335e23e92ea13)

 sys/compat/linuxkpi/common/include/net/mac80211.h | 27 +++++----
 sys/compat/linuxkpi/common/src/linux_80211.c      | 67 +++++++++++++++++++++--
 sys/compat/linuxkpi/common/src/linux_80211.h      | 29 +++++++++-
 3 files changed, 107 insertions(+), 16 deletions(-)
Comment 9 commit-hook freebsd_committer freebsd_triage 2024-02-19 08:09:34 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=3df959638baa60c1c88e9ac66289502f99ad8418

commit 3df959638baa60c1c88e9ac66289502f99ad8418
Author:     Bjoern A. Zeeb <bz@FreeBSD.org>
AuthorDate: 2023-12-12 01:59:17 +0000
Commit:     Bjoern A. Zeeb <bz@FreeBSD.org>
CommitDate: 2024-02-19 08:02:00 +0000

    LinuxKPI: 802.11: more TXQ implementation and locking

    Implement ieee80211_handle_wake_tx_queue() and ieee80211_tx_dequeue_ni()
    while looking at the code.  They are needed by various wireless drivers.

    Introduce an ltxq lock and protect the skbq by that.
    This prevents panics due to a race between a driver upcall and
    the net80211 tx downcall.  While the former should be rcu protected we
    cannot rely on that.
    It remains questionable if we need to protect further fields there
    (with a different lock?).

    Also introduce a txq_mtx on the lhw which needs to be further deployed
    but we need to come up with a good strategy to not end up with 7 different
    locks.

    Sponsored by:   The FreeBSD Foundation
    PR:             274178, 275710
    Tested by:      cc

    (cherry picked from commit eac3646fcdd445297cade756630335e23e92ea13)

 sys/compat/linuxkpi/common/include/net/mac80211.h | 27 +++++----
 sys/compat/linuxkpi/common/src/linux_80211.c      | 67 +++++++++++++++++++++--
 sys/compat/linuxkpi/common/src/linux_80211.h      | 29 +++++++++-
 3 files changed, 107 insertions(+), 16 deletions(-)
Comment 10 commit-hook freebsd_committer freebsd_triage 2024-02-19 16:11:06 UTC
A commit in branch releng/13.3 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=804a4c1c7b8fe00a6924fa5e4ae27a487bdc2337

commit 804a4c1c7b8fe00a6924fa5e4ae27a487bdc2337
Author:     Bjoern A. Zeeb <bz@FreeBSD.org>
AuthorDate: 2023-12-12 01:59:17 +0000
Commit:     Bjoern A. Zeeb <bz@FreeBSD.org>
CommitDate: 2024-02-19 16:06:43 +0000

    LinuxKPI: 802.11: more TXQ implementation and locking

    Implement ieee80211_handle_wake_tx_queue() and ieee80211_tx_dequeue_ni()
    while looking at the code.  They are needed by various wireless drivers.

    Introduce an ltxq lock and protect the skbq by that.
    This prevents panics due to a race between a driver upcall and
    the net80211 tx downcall.  While the former should be rcu protected we
    cannot rely on that.
    It remains questionable if we need to protect further fields there
    (with a different lock?).

    Also introduce a txq_mtx on the lhw which needs to be further deployed
    but we need to come up with a good strategy to not end up with 7 different
    locks.

    Approved by:    re (cperciva)
    Sponsored by:   The FreeBSD Foundation
    PR:             274178, 275710
    Tested by:      cc

    (cherry picked from commit eac3646fcdd445297cade756630335e23e92ea13)
    (cherry picked from commit 3df959638baa60c1c88e9ac66289502f99ad8418)

 sys/compat/linuxkpi/common/include/net/mac80211.h | 27 +++++----
 sys/compat/linuxkpi/common/src/linux_80211.c      | 67 +++++++++++++++++++++--
 sys/compat/linuxkpi/common/src/linux_80211.h      | 29 +++++++++-
 3 files changed, 107 insertions(+), 16 deletions(-)