Bug 283015 - iwlwifi: kernel panic: sleeping thread holds lvif – lock leak in lkpi_sta_scan_to_auth()
Status: In Progress
Alias: None
Product: Base System
Classification: Unclassified
Component: wireless
Version: 15.0-CURRENT
Hardware: Any Any
Importance: --- Affects Some People
Assignee: Mark Johnston
URL: https://reviews.freebsd.org/D47949
Keywords: crash
Depends on:
Blocks: iwlwifi
 
Reported: 2024-11-28 00:31 UTC by Graham Perrin
Modified: 2024-12-13 20:31 UTC (History)
CC List: 3 users

See Also:
grahamperrin: maintainer-feedback? (bz)


Attachments
A photograph of the primary screen during the panic, frozen at 09:39:04 (796.43 KB, image/png)
2024-11-28 00:47 UTC, Graham Perrin
An extract from core.txt.10 (5.69 KB, text/plain)
2024-12-01 10:22 UTC, Graham Perrin

Description Graham Perrin 2024-11-28 00:31:11 UTC
base 3f0289ea7f66c82656a43edf6527055fd27d225d (2024-11-26 21:56:45 +0000) at the time of the panic. 

root@mowa219-gjp4-zbook-freebsd:~ # cd /var/crash
root@mowa219-gjp4-zbook-freebsd:/var/crash # cat info.10
Dump header from device: /dev/ada1p2
  Architecture: amd64
  Architecture Version: 2
  Dump Length: 3464708096
  Blocksize: 512
  Compression: none
  Dumptime: 2024-11-27 09:39:05 +0000
  Hostname: mowa219-gjp4-zbook-freebsd
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 15.0-CURRENT main-n273879-3f0289ea7f66 GENERIC-NODEBUG
  Panic String: sleeping thread holds lvif
  Dump Parity: 2290504797
  Bounds: 10
  Dump Status: good
root@mowa219-gjp4-zbook-freebsd:/var/crash # exit
logout
% date ; uptime
Thu 28 Nov 2024 00:13:56 GMT
12:13a.m.  up  1:43, 6 users, load averages: 1.21, 1.64, 1.68
% uname -abvKU
FreeBSD mowa219-gjp4-zbook-freebsd 15.0-CURRENT FreeBSD 15.0-CURRENT main-n273880-fd630ae93634 GENERIC-NODEBUG amd64 1500028 1500028 2a9bd20e13e0fd863895717d95733ba722474482
%
Comment 1 Graham Perrin 2024-11-28 00:47:41 UTC
Created attachment 255501
A photograph of the primary screen during the panic, frozen at 09:39:04

Greyscale. The original colour version (too large) failed to attach at comment 0.

<https://bsd-hardware.info/?probe=c8d95da1f8> was the result of a probe at 06:02 UTC the day before the panic (2024-11-26).

Notable, before the panic on 2024-11-27: 

- at home, I would have routinely
  ifconfig gif0 down && ifconfig em0 down
  because (a) I rarely require the gif0 tunnel for IPv6 with em0; 
  and (b) usually I prefer em(4), but recently I'm testing iwlwifi(4)

- sleep of the OS at home, whilst connected to ssid 'piano5'

- wake of the OS at work, miles away, apparently still connected to the 
  home network

ifconfig wlan1 down

ifconfig wlan1 up

- not connected to the required network (ssid eduroam)

route delete default ; ifconfig gif0 down ; service netif stop em0 > & /dev/null ; ifconfig wlan0 destroy ; sleep 1 ; service netif start em0 > & /dev/null ; sleep 15 ; resolvconf -i ; route show default ; ping -4 -c 2 freshports.org
Comment 2 Graham Perrin 2024-11-28 01:05:27 UTC
Björn, might this help with bug 263632?

Would you like information from core.txt.10 (below) for the panic with GENERIC-NODEBUG, or will it be more useful to reproduce the panic with GENERIC?

It's now 01:05 at home, I can boot GENERIC and then aim to reproduce the panic a few hours from now, shortly after I wake the OS at work. 


% ls -hlnrt /var/crash | tail -n 7
-rw-r--r--  1 0 0  4.8M 15 Nov 21:35 core.txt.9
-rw-r--r--  1 0 0    3B 27 Nov 10:13 bounds
-rw-------  1 0 0  437B 27 Nov 10:13 info.10
-rw-------  1 0 0  3.2G 27 Nov 10:14 vmcore.10
lrwxr-xr-x  1 0 0    7B 27 Nov 10:14 info.last -> info.10
lrwxr-xr-x  1 0 0    9B 27 Nov 10:14 vmcore.last -> vmcore.10
-rw-r--r--  1 0 0  954K 27 Nov 10:15 core.txt.10
%
Comment 3 Graham Perrin 2024-11-28 17:35:35 UTC
bug 283015 comment 2
Comment 4 Graham Perrin 2024-11-28 17:40:50 UTC
Sorry, ignore comment 3, I was previewing the usability of link text (can't be previewed when opening a report). I forgot to remove the test text when adding the see also.
Comment 5 Graham Perrin 2024-12-01 10:08:01 UTC
8086:08b1:8086:c060 is at <https://bsd-hardware.info/?probe=c8d95da1f8#pci:8086-08b1-8086-c060>.

As far as I can tell, iwlwifi detects this as: 

Dual Band Wireless N 7260
Comment 6 Graham Perrin 2024-12-01 10:22:14 UTC
Created attachment 255561
An extract from core.txt.10

Re: comment 2

…
Sleeping thread (tid 100602, pid 0) owns a non-sleepable lock
KDB: stack backtrace of thread 100602:
sched_switch() at sched_switch+0x829/frame ⋯
mi_switch() at mi_switch+0xbc/frame ⋯
_sleep() at _sleep+0x19e/frame ⋯
taskqueue_thread_loop() at taskqueue_thread_loop+0xb1/frame ⋯
fork_exit() at fork_exit+0x7b/frame ⋯
fork_trampoline() at fork_trampoline+0xe/frame ⋯
--- trap 0xc, rip = ⋯, rsp = ⋯, rbp = ⋯ ---
panic: sleeping thread holds lvif
cpuid = 6
…

The attachment includes a little more context.
Comment 7 Mark Johnston freebsd_committer freebsd_triage 2024-12-06 17:48:34 UTC
It looks like lkpi_sta_scan_to_auth() is missing a LKPI_80211_LVIF_UNLOCK(lvif) in the path which returns EBUSY.
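
For illustration only, the shape of such a lock leak can be sketched in userland C. This is not the code from linux_80211.c: struct fake_lvif, the fake_lvif_lock()/fake_lvif_unlock() helpers, the busy flag and both scan_to_auth_*() functions are hypothetical stand-ins, and the "lock" is a plain counter rather than a kernel mutex. The sketch assumes only what the comment above states: an early return with EBUSY that skips the unlock.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-vif state; not the LinuxKPI type. */
struct fake_lvif {
	int  lock_depth;	/* 1 while "locked", 0 otherwise */
	bool busy;		/* simulates the condition that yields EBUSY */
};

static void
fake_lvif_lock(struct fake_lvif *lvif)
{
	assert(lvif->lock_depth == 0);	/* non-recursive lock */
	lvif->lock_depth++;
}

static void
fake_lvif_unlock(struct fake_lvif *lvif)
{
	assert(lvif->lock_depth == 1);	/* must be held by the caller */
	lvif->lock_depth--;
}

/* Leaky shape: the early return leaves the lock held. */
static int
scan_to_auth_leaky(struct fake_lvif *lvif)
{
	fake_lvif_lock(lvif);
	if (lvif->busy)
		return (EBUSY);		/* lock is still held here */
	fake_lvif_unlock(lvif);
	return (0);
}

/* Fixed shape: every return path drops the lock first. */
static int
scan_to_auth_fixed(struct fake_lvif *lvif)
{
	fake_lvif_lock(lvif);
	if (lvif->busy) {
		fake_lvif_unlock(lvif);
		return (EBUSY);
	}
	fake_lvif_unlock(lvif);
	return (0);
}
```

In the leaky variant the caller gets EBUSY back while the lock remains held. In the kernel, a thread that later sleeps in that state would match the "sleeping thread holds lvif" panic string reported above.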
Comment 8 Bjoern A. Zeeb freebsd_committer freebsd_triage 2024-12-07 02:00:30 UTC
(In reply to Mark Johnston from comment #7)

Good catch and thanks for D47949

This seems to be a problem given the wifi device is not destroyed before suspend (at least this is what I get from the description); also see the second half of my comment on the other bug [1] and the see-also there (or mentioned in here by Graham himself).

[1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=283027#c6
Comment 9 commit-hook freebsd_committer freebsd_triage 2024-12-13 20:31:10 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=926905796749750da6464b97ec4f8eec0882cc0e

commit 926905796749750da6464b97ec4f8eec0882cc0e
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2024-12-13 20:28:13 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2024-12-13 20:28:13 +0000

    linuxkpi: Fix a lock leak in lkpi_sta_scan_to_auth()

    PR:             283015
    Reviewed by:    bz
    MFC after:      1 week
    Fixes:          0936c648ad0e ("LinuxKPI: 802.11: update the ni/lsta reference cycle")
    Differential Revision:  https://reviews.freebsd.org/D47949

 sys/compat/linuxkpi/common/src/linux_80211.c | 2 ++
 1 file changed, 2 insertions(+)