Hello, I've been debugging an issue with the creator if wifibox ( PÁLI Gábor János) that I've been experiencing on my Thinkpad X1 Carbon Gen 7 on 13.1-STABLE (n251926-488f9d85278, but I also reproduced it in 13.1-RELEASE - n250148-fc952ac2212 - while seeing if this was a regression, it is not). My particular scenario is using 'wifibox' to pass my wifi card to an Alpine Linux VM using bhyve. Most of the details and logs during our debugging have been posted in this bug report: https://github.com/pgj/freebsd-wifibox/issues/31 It seems that when I shut down my laptop (host) while the bhyve VM is running, it would non-deterministically (but very soon, around 1-5 times of doing suspend/resume) would resume the system successfully, but the wifi would stop working. Upon further investigation, we noticed that the wireless interface and the entire PCI card was completely missing from within the VM. The only way to "fix" it was to reboot. Upon further experimentation, I noticed that if I stopped the VM, then suspended/resumed, and then started the VM again, the PCI card would be successfully available again. This workaround so far hasn't failed for me (but I'm still testing it throughout the day). My suspicion is that when bhyve has passed down a PCI device and the host is suspended, the resources for that handle aren't being released correctly and the machine may be in an inconsistent state in regards to this specific passthrough device. To post some logs, we can see the following before and after inside the bhyve VM: Before ----------- wifibox:~# lspci 00:1f.0 Class 0601: 8086:7000 00:04.2 Class 0100: 1af4:1009 00:04.0 Class 0100: 1af4:1001 00:00.0 Class 0600: 1275:1275 00:04.3 Class 0100: 1af4:1009 00:06.0 Class 0280: 8086:02f0 00:04.1 Class 0100: 1af4:1009 00:05.0 Class 0200: 8086:100f wifibox:~# ifconfig eth0 Link encap:Ethernet HWaddr 00:A0:98:8A:05:71 inet addr:10.1.0.1 Bcast:0.0.0.0 Mask:255.0.0.0 inet6 addr: fe80::2a0:98ff:fe8a:571/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:20396 errors:0 dropped:0 overruns:0 frame:0 TX packets:33557 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:4197622 (4.0 MiB) TX bytes:27342203 (26.0 MiB) lo Link encap:Local Loopback inet addr:127.0.0.1 Mask:255.0.0.0 inet6 addr: ::1/128 Scope:Host UP LOOPBACK RUNNING MTU:65536 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:0 (0.0 B) TX bytes:0 (0.0 B) wlan0 Link encap:Ethernet HWaddr F8:E4:E3:EB:35:02 inet addr:192.168.1.139 Bcast:192.168.1.255 Mask:255.255.255.0 inet6 addr: fe80::fae4:e3ff:feeb:3502/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:34620 errors:0 dropped:0 overruns:0 frame:0 TX packets:20133 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:27437272 (26.1 MiB) TX bytes:4448962 (4.2 MiB) wifibox:~# dmesg | grep iwlwifi [ 0.869316] iwlwifi 0000:00:06.0: can't derive routing for PCI INT A [ 0.869318] iwlwifi 0000:00:06.0: PCI INT A: not connected [ 0.870101] iwlwifi 0000:00:06.0: Failed to set affinity mask for IRQ 41 [ 0.926465] iwlwifi 0000:00:06.0: api flags index 2 larger than supported by driver [ 0.926473] iwlwifi 0000:00:06.0: TLV_FW_FSEQ_VERSION: FSEQ Version: 89.3.35.37 [ 0.926601] iwlwifi 0000:00:06.0: loaded firmware version 66.f1c864e0.0 QuZ-a0-jf-b0-66.ucode op_mode iwlmvm [ 0.946738] iwlwifi 0000:00:06.0: Detected Intel(R) Wireless-AC 9560 160MHz, REV=0x354 [ 1.061975] iwlwifi 0000:00:06.0: Detected RF JF, rfid=0x105110 [ 1.118696] iwlwifi 0000:00:06.0: base HW address: f8:e4:e3:eb:35:02 [ 5.207325] iwlwifi 0000:00:06.0: Unhandled alg: 0x3f0707 After ------ wifibox:~# lspci 00:1f.0 Class 0601: 8086:7000 00:04.2 Class 0100: 1af4:1009 00:04.0 Class 0100: 1af4:1001 00:00.0 Class 0600: 1275:1275 00:04.3 Class 0100: 1af4:1009 00:04.1 Class 0100: 1af4:1009 00:05.0 Class 0200: 8086:100f wifibox:~# ifconfig eth0 Link encap:Ethernet HWaddr 00:A0:98:8A:05:71 inet addr:10.1.0.1 Bcast:0.0.0.0 Mask:255.0.0.0 inet6 addr: fe80::2a0:98ff:fe8a:571/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:145 errors:0 dropped:0 overruns:0 frame:0 TX packets:17 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:143406 (140.0 KiB) TX bytes:1594 (1.5 KiB) lo Link encap:Local Loopback inet addr:127.0.0.1 Mask:255.0.0.0 inet6 addr: ::1/128 Scope:Host UP LOOPBACK RUNNING MTU:65536 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:0 (0.0 B) TX bytes:0 (0.0 B) wifibox:~# dmesg | grep iwlwifi <nothing> Here we can see that the '00:06.0 Class 0280: 8086:02f0' has completely dissapeared. Using my "stop wifibox before sleep to release the PCI resource + start it back up later" approach in this way: root@leslie:~ # cat /usr/local/etc/devd/wifibox.conf # This is a `devd(8)` configuration file to run the resume action of # wifibox on the ACPI resume event. Review the contents and create a # copy of it without the `.sample` extension to use it. Restart the # `devd` service once the file has been created. notify 11 { match "system" "ACPI"; match "subsystem" "Suspend"; action "logger 'Stopping wifibox before suspend' && /usr/local/sbin/wifibox stop && /etc/rc.suspend acpi $notify"; }; notify 11 { match "system" "ACPI"; match "subsystem" "Resume"; action "/etc/rc.resume acpi $notify && logger 'Starting wifibox after resume and getting IP via DHCP' && /usr/local/sbin/wifibox start && /sbin/dhclient wifibox0"; }; we can see the following in the logs (correct flow on the host): bridge0: Ethernet address: 58:9c:fc:10:ff:99 bridge0: changing name to 'wifibox0' tap0: Ethernet address: 58:9c:fc:10:c0:3f tap0: promiscuous mode enabled wifibox0: link state changed to DOWN ppt0 mem 0xea238000-0xea23bfff at device 20.3 on pci0 tap0: link state changed to UP wifibox0: link state changed to UP lo0: link state changed to UP ... tap0: link state changed to DOWN wifibox0: link state changed to DOWN ppt0: detached pci0: <network> at device 20.3 (no driver attached) ue0: link state changed to UP tap0: promiscuous mode disabled ... ppt0: detached pci0: <network> at device 20.3 (no driver attached) ppt0 mem 0xea238000-0xea23bfff at device 20.3 on pci0 bridge0: Ethernet address: 58:9c:fc:10:ff:99 bridge0: changing name to 'wifibox0' tap0: Ethernet address: 58:9c:fc:10:c0:3f tap0: promiscuous mode enabled wifibox0: link state changed to DOWN tap0: link state changed to UP wifibox0: link state changed to UP tap0: link state changed to DOWN wifibox0: link state changed to DOWN ppt0: detached pci0: <network> at device 20.3 (no driver attached) ppt0 mem 0xea238000-0xea23bfff at device 20.3 on pci0 tap0: link state changed to UP wifibox0: link state changed to UP tap0: link state changed to DOWN wifibox0: link state changed to DOWN ppt0: detached pci0: <network> at device 20.3 (no driver attached) tap0: promiscuous mode disabled Regarding the devd rule for wifibox, notice that in the Suspend action, the "/etc/rc.suspend acpi $notify" call is happening at the end. I'm suspecting this is behaving like a stack (and makes sense since we are basically reversing stuff when we suspend/resume). When I made that call in the beginning, the PCI passthrough issue arose, but when I placed it at the end it was fine. I'm guessing this is due to the suspend command being triggered and the wifibox section didn't have enough time to fully release the resources, thus it would yield the same issue that I mentioned before. Perhaps that rule can help further narrowing this issue down. At the moment I'm testing some slight improvements to the devd workaround that PÁLI suggested which will help speed up the suspend/resume times and not need me to explicitly call the dhclient command. Ultimately, we'll want to fix the bhyve issue. I don't have any experience with bhyve or its internals but I can help with debugging across RELEASE/STABLE/CURRENT. - Jonathan
I was able to reproduce this same issue on my Framework Laptop (Batch 6). The same workaround I described above also worked on it.