Bug 255465 - Kernel panic with Intel Wireless 4965AGN chip
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: wireless
Version: CURRENT
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-wireless (Nobody)
Depends on:
Reported: 2021-04-28 14:22 UTC by Radosław Chmielarz
Modified: 2021-05-07 13:29 UTC

[PATCH] Adjust EEPROM read timeout for Intel 4965AGN M2 (1.13 KB, patch)
2021-05-07 13:29 UTC, Radosław Chmielarz
radoslaw.chmielarz: maintainer-approval+

Description Radosław Chmielarz 2021-04-28 14:22:38 UTC

After attaching an Intel Wireless WiFi Link 4965AGN through a PCIe slot, I experienced a kernel panic on startup, with the stack trace pointing to ieee80211_chan_init() originating from iwn_attach(). The kernel is probably pointing at a more precise piece of code, but I'm using a USB keyboard and the keyboard driver is not yet loaded at that point, so I can't type anything on the DDB console.

Despite the kernel panic, no core dump was created. This is most probably because this device has only a limited amount of disk space (4 GB).

If there is a way to use devctl rescan to hotplug this device after system startup, that would be great. The only other idea I had is to modify iwn.c and remove code from it piece by piece to narrow down the line that triggers the panic. But this is very time-consuming because of all the recompiles.

I've made a low-tech picture with my phone.
Comment 1 Mark Johnston freebsd_committer 2021-05-03 17:15:13 UTC
The picture doesn't appear to be attached.

If there's not enough disk space for a dump, try adding "-Z" to a dumpon(8) invocation, or setting dumpon_flags="-Z" in /etc/rc.conf, to enable in-kernel compression.
Comment 2 Radosław Chmielarz 2021-05-04 08:21:23 UTC

Sorry for this, it seems I forgot to add it.

While looking into the code to figure out where the problem originates, I found that the actual cause was the "timeout reading ROM" error messages I was getting at startup. These timeouts left all the values at 0. Because the code in ieee80211_get_ratetable() (called from ieee80211_chan_init()) assumes that the channel value passed in is valid, it called panic() to indicate a missing implementation for this device.

I then drilled down to the code that reads the EEPROM and compared it with Linux (where the device works). I did not see any significant differences apart from the timeout handling. After modifying the timeout, the device initialized successfully.

I own an Intel 4965AGN MM2 with TA: D74676-004. The measured EEPROM read time for it is 60 us with a 5 us delay and 25 us with a 1 us delay. This is longer than what the current code supports. It's also strange, since the code is already several years old and this particular chip is quite popular (at least from what I have read). Either this is specific to my hardware setup or the hardware is not used that often.
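
To illustrate why the measured read times matter, here is a minimal user-space model of a bounded polling loop. It is only a sketch: the names (fake_eeprom, poll_eeprom) and the poll counts tried against it are hypothetical and are not taken from iwn.c. It only shows that a poll budget of N tries with a D us delay can never succeed if the device needs N*D us or more.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an EEPROM whose "valid" bit is set only
 * after ready_after_us microseconds have elapsed. */
struct fake_eeprom {
	unsigned elapsed_us;     /* simulated time spent waiting so far */
	unsigned ready_after_us; /* time the device needs, e.g. 60 */
};

static bool
fake_eeprom_valid(const struct fake_eeprom *e)
{
	return (e->elapsed_us >= e->ready_after_us);
}

/* Stand-in for a busy-wait delay: only advances simulated time. */
static void
fake_delay(struct fake_eeprom *e, unsigned us)
{
	e->elapsed_us += us;
}

/* Poll up to max_tries times, waiting delay_us between polls.
 * Returns 0 once the device reports valid data, -1 on timeout. */
static int
poll_eeprom(struct fake_eeprom *e, unsigned max_tries, unsigned delay_us)
{
	for (unsigned i = 0; i < max_tries; i++) {
		if (fake_eeprom_valid(e))
			return (0);
		fake_delay(e, delay_us);
	}
	return (-1);
}
```

In this model, a device that needs 60 us times out with an assumed budget of 10 polls at 5 us, while 13 polls at 5 us are enough. With a 1 us delay and a 25 us read time, 26 polls are needed.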

I will post a separate email with a patch for adjusted read timeout and timeout error handling.
Comment 3 Radosław Chmielarz 2021-05-07 13:29:01 UTC
Created attachment 224750
[PATCH] Adjust EEPROM read timeout for Intel 4965AGN M2

I'm sorry for the very simplistic change, but propagating the ETIMEDOUT error upwards from iwn_read_prom_data still causes a kernel panic, because iwn_detach assumes that the device data structure is complete.

I will work on this problem (error handling) in the future, but since time is scarce it's better to have at least something working than nothing.
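
The error-handling problem mentioned above follows a common pattern: an attach routine fails partway through, and the cleanup routine has to cope with a partially initialized structure. Below is a minimal user-space sketch of one way to make that safe. All names (fake_softc, fake_attach, fake_detach) are hypothetical and do not reflect the actual iwn driver code; the assumption is simply that every teardown step checks whether its resource was ever set up.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-device state; a zeroed struct means "nothing set up". */
struct fake_softc {
	void *rx_ring;
	void *tx_ring;
	bool  net_attached;
};

/* Teardown that tolerates partial initialization: each step is a
 * no-op if the corresponding resource was never allocated. */
static void
fake_detach(struct fake_softc *sc)
{
	if (sc->net_attached)
		sc->net_attached = false;
	free(sc->tx_ring);	/* free(NULL) is a no-op */
	sc->tx_ring = NULL;
	free(sc->rx_ring);
	sc->rx_ring = NULL;
}

/* Attach that propagates an EEPROM timeout instead of continuing
 * with zeroed data. eeprom_times_out simulates the failing read. */
static int
fake_attach(struct fake_softc *sc, bool eeprom_times_out)
{
	int error;

	sc->rx_ring = malloc(64);
	if (sc->rx_ring == NULL) {
		error = ENOMEM;
		goto fail;
	}
	if (eeprom_times_out) {
		error = ETIMEDOUT;
		goto fail;
	}
	sc->tx_ring = malloc(64);
	if (sc->tx_ring == NULL) {
		error = ENOMEM;
		goto fail;
	}
	sc->net_attached = true;
	return (0);
fail:
	fake_detach(sc);	/* safe even though tx_ring was never set */
	return (error);
}
```

Here a simulated timeout returns ETIMEDOUT to the caller and leaves the structure fully cleaned up, rather than relying on the detach path to assume that every field is valid.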