fwupd project is using libgusb, which is a Glib wrapper for libusb.
libgusb uses asynchronous API of libusb with events handled in a separate thread.
fwupd works with devices by opening them, performing operations and then closing.
How the failure occurs
fwupd closes a device before all events of the operation have been processed. This means that when libusb10_handle_events_sub() eventually processes related event, the device gets marked as gone (device_is_gone field). On all future calls libusb_submit_transfer() fails with LIBUSB_ERROR_NO_DEVICE because that flag is set.
Effectively this means that a device can be used only once in such a way.
As I understand libgusb uses the API as intended and it is implementation of libusb which shouldn't prevent repeated use of a device. Maybe it should wait before closing or just reset the flag on opening the device.
Forgot to say that this happens on 12.2, but I don't think it has changed since.
I think this is a bug in the the application.
If you close the USB device while transfers are pending, the completion events won't be delivered. I guess you need to wait for the pending transfers to complete before you close the device.
I don't know what kind of event comes and might need to investigate callbacks more, but why mark device as gone in this case? It's not gone, it's just closed.
Hey Sergii, HPS :-)
Sergii is it possible to open device again after closing? It should not be possible to transfer to a device that is not opened.. so LIBUSB_ERROR_NO_DEVICE may indicate that device is not opened and there is nowhere to tranfer to. Asynchronous transfers may be put on some sort of list and close function should verify the list before closing..? Can you verify with the application? :-)
In order to analyse situation in depth Wireshark can dump the USB traffic. You may want to compare Linux and FreeBSD behaviour if you suspect (low pobable) problem in the stack :-)
Created attachment 225431 [details]
> Sergii is it possible to open device again after closing? It should not be possible to transfer to a device that is not opened.. so LIBUSB_ERROR_NO_DEVICE may indicate that device is not opened and there is nowhere to tranfer to.
Yes, device can be opened. Transfer is done after opening the device, that's not the issue.
> Asynchronous transfers may be put on some sort of list and close function should verify the list before closing..? Can you verify with the application? :-)
Actually looks like the first time the error happens even without any transactions (no libusb_submit_transfer() calls). The device is opened, some info calls are done, it's closed, second thread wakes up, finds that 32 bytes are available to read and marks the device as gone.
I've made a simple example where the behaviour manifests itself in debugger almost every time. Setting breakpoints helps with that. In fwupd breakpoint on libusb_close() isn't necessary.
Created attachment 225579 [details]
Better bug reproducer
Made a better example that showcases the bug. No help from debugger is necessary. Demonstrates the issue 100% runs for me. Device is opened and closed twice. Second time its opened there is a transfer attempt which always fails (returns "no device" error).
The API is used properly as far as I can tell and the code "works" if you remove sleep() calls, but doesn't if you leave them. I'm pretty confident this is a bug in libusb.
Another suggestion for a fix: before handling an event check whether device is still in the polling queue and if not, ignore the event. This might be the best fix because the issue is that device is being polled after it was closed (background thread was blocked on poll() at that time), so any events from this device should be rightfully discarded as if it wasn't on the polling queue when the event has arrived.
Created attachment 225640 [details]
libusb patch that fixes the race
Adding a simple condition that the device must be opened to be considered gone fixes the issue. I might be missing something about detecting unexpectedly detached devices, but this way firmware update succeeds instead of failing right at the start.
A commit in branch main references this bug:
Author: Hans Petter Selasky <hselasky@FreeBSD.org>
AuthorDate: 2021-06-11 15:06:10 +0000
Commit: Hans Petter Selasky <hselasky@FreeBSD.org>
CommitDate: 2021-06-11 15:06:44 +0000
Improve handling of USB device re-open in the LibUSB v1.x API.
Make sure the "device_is_gone" flag is cleared after every successful open,
so that the "device_is_gone" flag doesn't persist forever.
Found by: firstname.lastname@example.org
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
lib/libusb/libusb10.c | 9 +++++++++
lib/libusb/libusb10_io.c | 12 ++++++++++--
lib/libusb/libusb20.c | 5 +++++
3 files changed, 24 insertions(+), 2 deletions(-)
OK, got it.
Please verify that the submitted patch works for you!
Yes, patch from the commit works fine.