Created attachment 233540
I've spent a few days reading _a lot_ of documentation and experimenting with getting FreeBSD running on my Framework Laptop on 13.0-RELEASE, 13.1-RC4, and 14-CURRENT (just to see how well each branch works on it; originally I wanted to just run 13.1 since that's the next RELEASE, but I was getting too many issues and crashes with it, and I'd been avoiding jumping into STABLE/CURRENT, but c'est la vie haha). I've definitely noticed a bunch of issues that have either already been fixed or still remain (I haven't gathered exact repro scenarios for those yet). However, for this particular issue (I'm not 100% sure it's related, but I think it might be), the Intel AX210 card in the Framework Laptop seems to associate with my AP properly, but no DHCPOFFERs are received. I think there was only one time, a few days ago on an older build of CURRENT, that I was able to get an IP, but I haven't been able to replicate that working state.
Also, this is on 14-CURRENT sources (both world and kernel) as of today, 2022-04-27. I'm motivated to help the devs test any code needed, not only for this issue but for anything that makes the Framework Laptop work as an actual desktop productivity machine. I'm not a FreeBSD pro, but I've been using it on my home server for about the past 2 years, so apologies if there are any concepts or things I don't know yet; feel free to point me in the right direction though :).
I've attached various parts of my system config and dmesg output to help debug this.
I've noticed that I sometimes get the following errors as well (I've also looked at what the Linux community said about them, and it seems they were fixed in Linux 5.13):
iwlwifi0: iwl_trans_send_cmd bad_state = 0
iwlwifi0: Failed to remove MAC context: -5
Given the above situation, where the system brought up the wlan0 interface (parent: iwlwifi0) and it was associated with the AP but did not receive any DHCP offers:
1. ifconfig wlan0 down
2. ifconfig wlan0 192.168.1.105/24 netmask 255.255.255.0
> The command runs, and a few seconds later the system hard crashes. I've attached a camera shot of this, but the following may be relevant:
iwlwifi0: Failed to send binding (action:1): -5
iwlwifi0: PHY ctxt cmd error. ret=-5
iwlwifi0: lkpi_iv_newstate: error -5 during state transition 1 (SCAN) -> 2 (AUTH)
iwlwifi0: No queue was found. Dropping TX
iwlwifi0: Failed to trigger RX queues sync (-5)
panic: lkpi_sta_auth_to_scan: lsta 0xfffff8012f0c5800 state not NONE: 0, nstate 1 arg 1
Overall, support for this laptop seems to have slowly gotten better, which is definitely encouraging, and I also read the recent news about FreeBSD and the work going on to improve Framework Laptop support. The wifi issues caused hard reboots on 13.1-RC4 (it would just shut off; I believe that's what Kyle Evans meant by the "firmware dying in a fire" he mentioned here: https://lists.freebsd.org/archives/freebsd-wireless/2021-October/000099.html). This may or may not be the same bug I'm experiencing here, but maybe since I'm now running -CURRENT with the default debug configuration, "hard crash, then reboot" turns into "hard crash, no reboot"?
Anyway, let me know if there's anything else I can do to help debug.
Created attachment 233541
Created attachment 233542
Created attachment 233543
Created attachment 233544
Created attachment 233545
camera photo of crash
Created attachment 233548
rtwn0 external usb 2.0 wifi adapter info
I just tested an external USB 2.0 Wifi adapter that I had, and it is detected by FreeBSD, lights up, and is able to associate, but didn't receive an IP. This could indicate that I've just misconfigured something somewhere when it comes to IP assignment (I did follow all of the Advanced Wireless Configuration steps in the handbook). However, the hard crash I detected is still a bug. Attaching output for the 'rtwn0' wifi adapter.
(In reply to Jonathan Vasquez from comment #2)
Not sure if it matters, but in my case I need to blacklist
then I add if_iwlwifi to my kld_list.
Maybe this might help?
I don't crash. :-)
Thanks for that :). So I've been playing around with the system all day today, trying to understand how this machine behaves with FreeBSD and its drivers. There are just too many variables to explain everything, but the good news is that with Chris' suggestion, DHCP now works on the iwlwifi0 adapter. DHCP doesn't work on the rtwn0 device though (I'm testing both to better understand the interaction between the drivers and the hardware). All the devices worked on Linux the last time I tested them. Static IP assignment still crashes the iwlwifi0 device. I was able to get a static IP working on the rtwn0 device, but it doesn't work consistently across immediate reboots (I haven't tested cold boots much since, as I said, I was playing around with it all day lol). Sometimes it takes 15 seconds to work, sometimes it takes 45+ seconds, and sometimes it just doesn't work.
I was also missing a setting for static IP assignment, specifically the defaultrouter="<ip>" field. I tested this through /etc/rc.conf + service netif restart, and I also tried to set up the device manually (steps below) to see if it worked. Both basically yielded the results described above.
ifconfig wlan0 create wlandev rtwn0
ifconfig wlan0 up
ifconfig wlan0 scan (Shows the AP)
wpa_supplicant -i wlan0 -c /etc/wpa_supplicant.conf
On another terminal (tmux):
ifconfig wlan0 inet 192.168.1.101 netmask 255.255.255.0
route add default 192.168.1.1
echo "nameserver 192.168.1.1" > /etc/resolv.conf
ping 192.168.1.1 (no response; non-deterministic, as described above)
netstat -rn and arp -a showed what you would expect on a correctly working network (e.g. the default gateway set to 192.168.1.1, using wlan0 as the interface).
So yeah, lots of different interactions here. I also noticed that if I left the ifconfig_ue0="DHCP" line in the file, sometimes the DHCP request on the wireless card side wouldn't go through... but many hours later I tried leaving both enabled and it was still working fine, so it could have just been some weirdness. I even rebooted my router just to make sure (even though everything else was working, including multiple other machines connected to the same router).
This is my current /etc/rc.conf, for further info:
root@leslie:~ # cat /etc/rc.conf
create_args_wlan0="regdomain FCC country US"
#ifconfig_wlan0="inet 192.168.1.101 netmask 255.255.255.0 ssid Summerland WPA"
Either way, at least we were able to find another bug (static IP assignment on the iwlwifi0 card causes a crash).
(1) For the AX210, blocklisting iwm(4) should make zero difference, as the device IDs aren't in that driver, so it won't probe or try to attach. Something else funky is going on, and it smells very much like "timing".
(2) ifconfig wlan0 down is currently known to leave state behind in iwlwifi, which then results in the software crash on the follow-up. I was hoping to fix that while I was on the road, but will try to do so when I'm back in the office next week.
(3) I'd highly appreciate it if the bug report weren't convoluted with too many different issues, as it'll be hard for me to follow otherwise.
(4) If DHCP is the issue: ifconfig wlan0 list scan showing the AP doesn't mean you are associated. You need to check for "status: associated" in the ifconfig wlan0 output, or otherwise possibly in wpa_supplicant. Likewise for manual configuration.
Thanks for taking a look into this.
1. Yup that may be the case.
2. Sounds good.
3. I only posted particular pieces of information (all networking related) that I thought might affect this particular PR, primarily for the sake of completeness, to make sure everything is accounted for. I know the FreeBSD community is highly RTFM, so I wanted to make sure I covered as much as I could regarding the networking situation.
4. Yup, I'm aware of that. I mentioned that ifconfig wlan0 scan displayed the SSID because instructions I'd read said to first check whether your wifi adapter displays the SSID (or any wifi networks at all). If you check my ifconfig output, you'll see that it is in fact associated.
Let me know if there is anything else you want me to test out in the meantime that can further assist you.
I just retested without devlist_blacklist="if_iwm" and without "if_iwlwifi" in kld_list. It now seems to make no difference (as you said, Bjoern). I noticed that if_iwlwifi gets autoloaded a bit before the firmware is used either way, so that line probably doesn't have much effect in this case.
This is probably related, but I noticed that if I did multiple reboots in a row, the wifi card's ability to re-establish a connection diminished. I'm guessing this has to do with state, not OS state but firmware state; maybe there's some sort of SRAM in there that retains state across multiple back-to-back reboots. That may explain why, if I wait a bit (I kinda have to, right lol), it's eventually able to connect again.
(In reply to Jonathan Vasquez from comment #12)
Do you have the ability to check what your AP is thinking?
(In reply to Bjoern A. Zeeb from comment #13)
Also, reading back to the beginning of the PR, this sounds very much like a wpa_supplicant issue that was fixed a few weeks ago.
In addition, LinuxKPI for iwlwifi has moved forward since the PR was opened.
Can you update the PR with the latest status?
I am very sorry; I apparently hadn't assigned the PR to myself and completely lost track of it later on.
My crash on a recent CURRENT with an Intel(R) Dual Band Wireless AC 7265, which seems closely related:
The crash happens on "service netif start", presumably when the IP address is being set.
(In reply to Gleb Popov from comment #15)
Do not run service netif restart wlan0 for the moment.
That'll destroy and re-create your wlan0 interface, and while we tear down the state and build it back up, the firmware does not seem to like that.
Please try ifconfig wlan0 down && sleep 1 && ifconfig wlan0 up if you have to.
No prob, Bjoern :). I'm currently on 13-STABLE (stable/13-n252739-af335a43669), but I'm building the latest 13-STABLE as of ef2aa77530127f, which includes your recent wireless changes. I'll report back once it's built.
As for checking what my AP is thinking, I'm not sure how to do that, but I am running DD-WRT on my Linksys WRT3200ACM, if that helps.
Created attachment 237438
I finished testing again on FreeBSD 13.1-STABLE #0 stable/13-n252783-ef2aa775301. I was able to associate the wireless card with my AP, but it failed to get an address via DHCP (this worked before, though very flakily; this time, nothing). There are no MAC address conflicts on my router, and my router is working fine and handing out DHCP addresses properly. I can also see all of the DHCP clients connected to my AP from the DD-WRT admin menu, and I can see that the MAC address of my wireless card is indeed connected and associated with the AP. I did notice that in the "Info" column of the "Wireless Nodes" section in DD-WRT, the iwlwifi card is reported as "LEGACY", whereas the other wireless nodes are either "VHT20SGI" or "HT20".
I also re-tested assigning a static IP to the card to see what happens. The same thing as before occurred: immediately upon restarting the card, it crashed the entire system. A subsequent reboot crashed the system again during boot (since the static IP assignment was still in /etc/rc.conf), and another reboot after that (so two reboots back to back) let the system boot up properly and assign the IP to the card, but it failed to associate with my AP (it actually associated with another random AP in the area, since I hadn't specified "WPA" in the config, so it wasn't limited to the AP defined in /etc/wpa_supplicant.conf).
I've attached some of the relevant info from various parts of the system in the attached document (stable-13-n252783-ef2aa775301-20221018.txt).
Created attachment 237491
wpa_supplicant and dhcp static lease
Good news. After some time experimenting with the wireless card and the AP, I was able to try something "clever" in order to get an IP from the router.
After some experiments, I kept noticing that the router didn't want to send a DHCPOFFER to the wireless card, even though the router has no issues giving the card an IP (the same 192.168.1.136 address) when using wifibox on FreeBSD (which uses iwlwifi directly on Linux via bhyve's PCI passthrough); we all know this workaround. That made me think that maybe if I just tell the router's DHCP server to always give this IP address to X MAC address, there -will- be a DHCPOFFER available. Maybe the router is getting confused because the FreeBSD iwlwifi driver communicates with it in a way it doesn't understand. Fast forward: after adding my wlan card's MAC address directly to my router's DHCP server's Static Leases table, I restarted the wlan0 interface on FreeBSD, re-associated with wpa_supplicant (it happened to be on 5 GHz at this stage, but earlier it was on 2.4 GHz, since I have identical SSIDs for both bands and the AP/card decide what's best for them), killed any old dhclient processes, and re-ran dhclient wlan0, and it got the static IP offer immediately over DHCP. This means I can start testing this card on a more day-to-day basis now.
This obviously raises the question of why the AP and card failed to complete a proper DHCP exchange (there are plenty of free addresses in my DHCP range). Originally I thought there might be a conflict with some old lease file in /var/db/dhclient.*.*, given that I'm switching between wifibox and the native driver on the same device. But this didn't have an effect.
I've attached the logs from my debugging and successful connection. I'll be continuing to experiment with some wpa_supplicant settings and doing some more reboots to test the performance of the reassociation and make sure that the driver can continue to reconnect. I also want to remove the DHCP static lease and see if it "works" automatically afterwards.
Created attachment 237492
wpa_supplicant and dhcp static lease Part 2
Just finished my testing. More good news.
1. I was able to wipe out any DHCP configs on my laptop (/var/db/dhclient*) and cleanly re-request the static lease from the AP. This worked fine, and the card was given the 192.168.1.140 static address.
2. After this, I wiped out the wireless leases on the laptop again and removed the 192.168.1.140 static lease from the router, then rebooted the router. IIRC I also rebooted the laptop. Once rebooted, I attempted to get a DHCP address again. This actually worked this time, and to make things more interesting, it gave me the old IP address it had been giving the wifibox bhyve VM, 192.168.1.136. So the router definitely did remember this MAC address somewhere in its DB, but it didn't want to give it to me before.
3. I wiped out the configs again and restored my original wifi settings:
re-enabled the `ifconfig_wlan0="WPA SYNCDHCP"` line so that upon reboot it automatically starts wpa_supplicant, associates, starts dhclient, and requests an IP.
After that I rebooted the machine, and the system booted up perfectly fine and retrieved the 192.168.1.136 offer again and was able to connect to the internet.
A few months ago I went to a friend's house and their AP did not want to give me an IP address (I believe it did associate, though), so I'm wondering if that was the same bug.
Created attachment 237594
dmesg (DHCPOFFER on loop)
Some minor updates:
1. Sometimes the card takes a while to get a DHCPOFFER, or stays in the DHCPREQUEST stage for a while. It does seem to get one eventually, though (maybe after many minutes).
2. Today the card decided to go a little crazy while booting up (it had been off for more than 12 hours). The card did receive a DHCPOFFER, but then kept failing to accept it and kept receiving further DHCPOFFERs. I've attached a picture and the dmesg output.
After pressing Ctrl+C a bunch of times at boot, I eventually got to the login prompt. I logged in and ran `service netif restart wlan0`; after that it connected and got an IP. Seems I had to kick it once (maybe to clear some buggy state) haha.
Created attachment 237595
DHCPOFFER on loop - Picture
Created attachment 237862
core.txt.1 (setting ip)
Good news! I finally fixed my issue with saving crash dumps, and I now have a dump to show. I was using encrypted swap (with the .eli extension in /etc/fstab), which prevented my cores from being saved; for whatever reason I thought the system was "smart enough" to extract the core before re-encrypting that swap partition on boot. Anyway, I've attached the extract as core.txt.1 (setting ip). These are the network-related settings in /etc/rc.conf when it crashes:
ifconfig_wlan0="WPA inet 192.168.1.135/24"
create_args_wlan0="country US regdomain FCC"
I'm not familiar with kernel debugging, but since I got crash dumps and gdb working, I'll see if I can poke around and figure out what happens. I do know a bit of basic pdb (for Python development), and I know pdb was supposed to be inspired by gdb, so maybe that will help haha.
Working my way through this material, I've come to the conclusion that my bug #266887 might actually be a duplicate of this one.
From the initial description here, the basic reproducible behaviour is to
ifconfig down; ifconfig up
(after the iface was already up).
And this is exactly what I found to be my core issue:
Doing a sequence of ifconfig up/down/up will always crash the system, either immediately or after a few seconds.
It doesn't matter what is done with the interface: whether a connection is established, DHCP is working, or whatever. We can do it in single-user mode with no networking at all and get the same result:
ifconfig wlan0 create wlandev iwlwifi0
ifconfig wlan0 up -> *KAPUTT*
Bug #267029 also talks about *re*establishing a network, so the underlying cause there might also be this same up/down/up sequence. Possibly some others, too.
As long as the interface is just brought up and used, it works, which may mislead people, because there are system programs (e.g. devd) that may bring the interface down and up under certain conditions, and a crash after that may come as a surprise.
@Bjoern: in comment 9 you talk about this being a known issue that will be fixed.
That was half a year ago, and in my experience this is still the basic issue (in stable/13). What's the state now?
I'm running the latest stable/13 and it still happens; it should be the same in main as well (14-CURRENT).
(In reply to Jonathan Vasquez from comment #23)
Thank you, Jonathan. Looking at your attachment, I see this:
#10 __mtx_lock_sleep (c=0xfffffe0156b201b0, v=<optimized out>)
#11 0xffffffff80d88d43 in psq_drain (psq=0xfffffe0156b20198)
#12 ieee80211_node_psq_drain (ni=ni@entry=0xfffffe0156b19000)
And this is exactly what I am currently looking into, because my kernel crashes right there. (I'm on a Fujitsu A3511 with an AX201, but this looks nearly identical to my issue.)
Background: since ifconfig up/down/up crashes here, I resorted to kldunload/kldload instead. That didn't work, and I had to fix issue #267869 first.
And now, when doing kldunload, I get this crash in mtx_lock() from ieee80211_node_psq_drain(); not always, but often.
The backtrace is interesting, as the ni still seems to be valid.
I am currently trying to hunt down what looks like a node reference count problem, as at times I get a 0xdeadc0dedeadcXXX situation, which indicates that the node was freed and is still being used.
I have some extra local [wlan]debug code to give extra "landmarks"; I'll try to put it into main and follow-up here with what to run.
Thanks Bjoern ;).
Bjoern, the lock seems to have a problem.
I have no idea what we are doing here, but I can code K&R C and I can learn, so I littered my kernel with printf()s. Specifically, I put one at the entry of psq_drain():
printf("entering psq_drain psq=%lld, lock=%lu\n",
(long long)psq, psq->psq_lock.mtx_lock);
Now when everything goes well, it works like this:
# ifconfig wlan0 down
 entering psq_drain psq=-2195130981992, lock=0
<6> wlan0: link state changed to DOWN
# kldunload if_iwlwifi
 entering psq_drain psq=-2195461594728, lock=0
And when it crashes, it looks like this:
# ifconfig wlan0 down
 entering psq_drain psq=-2195141467752, lock=0
<6> wlan0: link state changed to DOWN
# kldunload if_iwlwifi
 entering psq_drain psq=-2195120246376, lock=0
 entering psq_drain psq=-2195120246376, lock=4
 Fatal trap 12: page fault while in kernel mode
Tentatively, as far as I currently understand, a mutex with mtx_lock=4 is not supposed to end up in __mtx_lock_sleep().
In my case, if_iwlwifi.ko is loaded and not crashing, but I can't create the interface; more details here: https://forums.freebsd.org/threads/wi-fi-6-ax200-iwlwifi0-siocifcreate2-wlan0.88000/#post-598051
Something I noticed is that if I unload the module, it never gets removed; it keeps adding it:
pci7: <network> at device 0.0 (no driver attached)
Warning: memory type lkpikmalloc leaked memory on destroy (1 allocations, 64 bytes leaked).
Intel(R) Wireless WiFi based driver for FreeBSD
iwlwifi0: <iwlwifi> mem 0xfc600000-0xfc603fff at device 0.0 on pci7
iwlwifi0: successfully loaded firmware image 'iwlwifi-cc-a0-73.ucode'
iwlwifi0: api flags index 2 larger than supported by driver
iwlwifi0: TLV_FW_FSEQ_VERSION: FSEQ Version: 22.214.171.124
iwlwifi0: loaded firmware version 73.35c0a2c6.0 cc-a0-73.ucode op_mode iwlmvm
iwlwifi0: Detected Intel(R) Wi-Fi 6 AX200 160MHz, REV=0x340
iwlwifi0: Detected RF HR B3, rfid=0x10a100
iwlwifi0: base HW address: 50:e0:85:87:b5:18
Any ideas on how I could create the interface?