Bug 265487 - bhyve drops the PCI passthrough device in a VM if the host is suspended/resumed
Summary: bhyve drops the PCI passthrough device in a VM if the host is suspended/resumed
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: bhyve (show other bugs)
Version: 13.1-RELEASE
Hardware: Any Any
: --- Affects Only Me
Assignee: freebsd-virtualization (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-07-28 21:58 UTC by Jonathan Vasquez
Modified: 2022-08-06 13:43 UTC (History)
2 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Jonathan Vasquez 2022-07-28 21:58:10 UTC
Hello,

I've been debugging an issue with the creator if wifibox (
PÁLI Gábor János) that I've been experiencing on my Thinkpad X1 Carbon Gen 7 on 13.1-STABLE (n251926-488f9d85278, but I also reproduced it in 13.1-RELEASE - n250148-fc952ac2212 - while seeing if this was a regression, it is not).

My particular scenario is using 'wifibox' to pass my wifi card to an Alpine Linux VM using bhyve. Most of the details and logs during our debugging have been posted in this bug report: https://github.com/pgj/freebsd-wifibox/issues/31

It seems that when I shut down my laptop (host) while the bhyve VM is running, it would non-deterministically (but very soon, around 1-5 times of doing suspend/resume) would resume the system successfully, but the wifi would stop working. Upon further investigation, we noticed that the wireless interface and the entire PCI card was completely missing from within the VM. The only way to "fix" it was to reboot.

Upon further experimentation, I noticed that if I stopped the VM, then suspended/resumed, and then started the VM again, the PCI card would be successfully available again. This workaround so far hasn't failed for me (but I'm still testing it throughout the day). My suspicion is that when bhyve has passed down a PCI device and the host is suspended, the resources for that handle aren't being released correctly and the machine may be in an inconsistent state in regards to this specific passthrough device.

To post some logs, we can see the following before and after inside the bhyve VM:

Before
-----------

wifibox:~# lspci
00:1f.0 Class 0601: 8086:7000
00:04.2 Class 0100: 1af4:1009
00:04.0 Class 0100: 1af4:1001
00:00.0 Class 0600: 1275:1275
00:04.3 Class 0100: 1af4:1009
00:06.0 Class 0280: 8086:02f0
00:04.1 Class 0100: 1af4:1009
00:05.0 Class 0200: 8086:100f

wifibox:~# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:A0:98:8A:05:71  
          inet addr:10.1.0.1  Bcast:0.0.0.0  Mask:255.0.0.0
          inet6 addr: fe80::2a0:98ff:fe8a:571/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:20396 errors:0 dropped:0 overruns:0 frame:0
          TX packets:33557 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:4197622 (4.0 MiB)  TX bytes:27342203 (26.0 MiB)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

wlan0     Link encap:Ethernet  HWaddr F8:E4:E3:EB:35:02  
          inet addr:192.168.1.139  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::fae4:e3ff:feeb:3502/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:34620 errors:0 dropped:0 overruns:0 frame:0
          TX packets:20133 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:27437272 (26.1 MiB)  TX bytes:4448962 (4.2 MiB)

wifibox:~# dmesg | grep iwlwifi
[    0.869316] iwlwifi 0000:00:06.0: can't derive routing for PCI INT A
[    0.869318] iwlwifi 0000:00:06.0: PCI INT A: not connected
[    0.870101] iwlwifi 0000:00:06.0: Failed to set affinity mask for IRQ 41
[    0.926465] iwlwifi 0000:00:06.0: api flags index 2 larger than supported by driver
[    0.926473] iwlwifi 0000:00:06.0: TLV_FW_FSEQ_VERSION: FSEQ Version: 89.3.35.37
[    0.926601] iwlwifi 0000:00:06.0: loaded firmware version 66.f1c864e0.0 QuZ-a0-jf-b0-66.ucode op_mode iwlmvm
[    0.946738] iwlwifi 0000:00:06.0: Detected Intel(R) Wireless-AC 9560 160MHz, REV=0x354
[    1.061975] iwlwifi 0000:00:06.0: Detected RF JF, rfid=0x105110
[    1.118696] iwlwifi 0000:00:06.0: base HW address: f8:e4:e3:eb:35:02
[    5.207325] iwlwifi 0000:00:06.0: Unhandled alg: 0x3f0707


After
------
wifibox:~# lspci
00:1f.0 Class 0601: 8086:7000
00:04.2 Class 0100: 1af4:1009
00:04.0 Class 0100: 1af4:1001
00:00.0 Class 0600: 1275:1275
00:04.3 Class 0100: 1af4:1009
00:04.1 Class 0100: 1af4:1009
00:05.0 Class 0200: 8086:100f

wifibox:~# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:A0:98:8A:05:71  
          inet addr:10.1.0.1  Bcast:0.0.0.0  Mask:255.0.0.0
          inet6 addr: fe80::2a0:98ff:fe8a:571/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:145 errors:0 dropped:0 overruns:0 frame:0
          TX packets:17 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:143406 (140.0 KiB)  TX bytes:1594 (1.5 KiB)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

wifibox:~# dmesg | grep iwlwifi
<nothing>


Here we can see that the '00:06.0 Class 0280: 8086:02f0' has completely dissapeared.

Using my "stop wifibox before sleep to release the PCI resource + start it back up later" approach in this way:


root@leslie:~ # cat /usr/local/etc/devd/wifibox.conf

# This is a `devd(8)` configuration file to run the resume action of
# wifibox on the ACPI resume event.  Review the contents and create a
# copy of it without the `.sample` extension to use it.  Restart the
# `devd` service once the file has been created.

notify 11 {
        match "system"          "ACPI";
        match "subsystem"       "Suspend";
        action "logger 'Stopping wifibox before suspend' && /usr/local/sbin/wifibox stop && /etc/rc.suspend acpi $notify";
};

notify 11 {
        match "system"          "ACPI";
        match "subsystem"       "Resume";
        action "/etc/rc.resume acpi $notify && logger 'Starting wifibox after resume and getting IP via DHCP' && /usr/local/sbin/wifibox start && /sbin/dhclient wifibox0";
};


we can see the following in the logs (correct flow on the host):

bridge0: Ethernet address: 58:9c:fc:10:ff:99
bridge0: changing name to 'wifibox0'
tap0: Ethernet address: 58:9c:fc:10:c0:3f
tap0: promiscuous mode enabled
wifibox0: link state changed to DOWN
ppt0 mem 0xea238000-0xea23bfff at device 20.3 on pci0
tap0: link state changed to UP
wifibox0: link state changed to UP
lo0: link state changed to UP


...

tap0: link state changed to DOWN
wifibox0: link state changed to DOWN
ppt0: detached
pci0: <network> at device 20.3 (no driver attached)
ue0: link state changed to UP
tap0: promiscuous mode disabled

...

ppt0: detached
pci0: <network> at device 20.3 (no driver attached)
ppt0 mem 0xea238000-0xea23bfff at device 20.3 on pci0
bridge0: Ethernet address: 58:9c:fc:10:ff:99
bridge0: changing name to 'wifibox0'
tap0: Ethernet address: 58:9c:fc:10:c0:3f
tap0: promiscuous mode enabled
wifibox0: link state changed to DOWN
tap0: link state changed to UP
wifibox0: link state changed to UP
tap0: link state changed to DOWN
wifibox0: link state changed to DOWN
ppt0: detached
pci0: <network> at device 20.3 (no driver attached)
ppt0 mem 0xea238000-0xea23bfff at device 20.3 on pci0
tap0: link state changed to UP
wifibox0: link state changed to UP
tap0: link state changed to DOWN
wifibox0: link state changed to DOWN
ppt0: detached
pci0: <network> at device 20.3 (no driver attached)
tap0: promiscuous mode disabled

Regarding the devd rule for wifibox, notice that in the Suspend action, the "/etc/rc.suspend acpi $notify" call is happening at the end. I'm suspecting this is behaving like a stack (and makes sense since we are basically reversing stuff when we suspend/resume). When I made that call in the beginning, the PCI passthrough issue arose, but when I placed it at the end it was fine. I'm guessing this is due to the suspend command being triggered and the wifibox section didn't have enough time to fully release the resources, thus it would yield the same issue that I mentioned before. Perhaps that rule can help further narrowing this issue down.

At the moment I'm testing some slight improvements to the devd workaround that 
PÁLI suggested which will help speed up the suspend/resume times and not need me to explicitly call the dhclient command. Ultimately, we'll want to fix the bhyve issue. I don't have any experience with bhyve or its internals but I can help with debugging across RELEASE/STABLE/CURRENT.

- Jonathan
Comment 1 Jonathan Vasquez 2022-08-06 13:43:46 UTC
I was able to reproduce this same issue on my Framework Laptop (Batch 6). The same workaround I described above also worked on it.