Created attachment 232775 [details] dmesg This happens repeatedly until the keyboard and mouse are lost completely. The keyboard doesn't respond at all (the numlock light doesn't toggle when pressed). Remote access via ssh still works; I will see if I can get USB traces somehow. A reboot is required to fix it. Very reproducible :-( It did not happen under 13.0-RELEASE, only in 13.1-BETA2 and later. ... [4556] uhub0: <(0x1b21) XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus3 [4556] uhub0: 4 ports with 4 removable, self powered [4557] xhci2: Resetting controller [4557] usb_alloc_device: set address 2 failed (USB_ERR_TIMEOUT, ignored) [4584] usbd_setup_device_desc: getting device descriptor at addr 2 failed, USB_ERR_TIMEOUT [4585] usbd_req_re_enumerate: addr=2, set address failed! (USB_ERR_TIMEOUT, ignored) [4611] usbd_setup_device_desc: getting device descriptor at addr 2 failed, USB_ERR_TIMEOUT [4613] usbd_req_re_enumerate: addr=2, set address failed! (USB_ERR_TIMEOUT, ignored) [4632] usbd_setup_device_desc: getting device descriptor at addr 2 failed, USB_ERR_TIMEOUT [4633] usbd_req_re_enumerate: addr=2, set address failed! (USB_ERR_TIMEOUT, ignored) [4657] usbd_setup_device_desc: getting device descriptor at addr 2 failed, USB_ERR_TIMEOUT [4659] usbd_req_re_enumerate: addr=2, set address failed! (USB_ERR_TIMEOUT, ignored) [4683] usbd_setup_device_desc: getting device descriptor at addr 2 failed, USB_ERR_TIMEOUT [4683] ugen3.2: <Unknown > at usbus3 (disconnected) [4684] uhub_reattach_port: could not allocate new device [4685] usb_alloc_device: device init 2 failed (USB_ERR_TIMEOUT, ignored) [4685] ugen3.2: <Unknown > at usbus3 (disconnected) [4685] uhub_reattach_port: could not allocate new device [4685] uhub0: at usbus3, port 1, addr 1 (disconnected) [4685] uhub0: detached [4686] xhci2: Controller halt timeout. 
[4686] uhub0 numa-domain 0 on usbus3 [4686] uhub0: <(0x1b21) XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus3 [4686] uhub0: 4 ports with 4 removable, self powered [4687] xhci2: Resetting controller [4687] usb_alloc_device: set address 2 failed (USB_ERR_TIMEOUT, ignored) [4714] usbd_setup_device_desc: getting device descriptor at addr 2 failed, USB_ERR_TIMEOUT [4715] usbd_req_re_enumerate: addr=2, set address failed! (USB_ERR_TIMEOUT, ignored) [4741] usbd_setup_device_desc: getting device descriptor at addr 2 failed, USB_ERR_TIMEOUT [4743] usbd_req_re_enumerate: addr=2, set address failed! (USB_ERR_TIMEOUT, ignored) [4762] usbd_setup_device_desc: getting device descriptor at addr 2 failed, USB_ERR_TIMEOUT [4763] usbd_req_re_enumerate: addr=2, set address failed! (USB_ERR_TIMEOUT, ignored) [4787] usbd_setup_device_desc: getting device descriptor at addr 2 failed, USB_ERR_TIMEOUT [4789] usbd_req_re_enumerate: addr=2, set address failed! (USB_ERR_TIMEOUT, ignored) [4814] usbd_setup_device_desc: getting device descriptor at addr 2 failed, USB_ERR_TIMEOUT [4814] ugen3.2: <Unknown > at usbus3 (disconnected) [4814] uhub_reattach_port: could not allocate new device [4815] usb_alloc_device: device init 2 failed (USB_ERR_TIMEOUT, ignored) [4815] ugen3.2: <Unknown > at usbus3 (disconnected) [4815] uhub_reattach_port: could not allocate new device [4815] uhub0: at usbus3, port 1, addr 1 (disconnected) [4815] uhub0: detached [4816] xhci2: Controller halt timeout. [4816] uhub0 numa-domain 0 on usbus3 [4816] uhub0: <(0x1b21) XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus3 [4816] uhub0: 4 ports with 4 removable, self powered [4818] xhci2: Resetting controller [4818] usb_alloc_device: set address 2 failed (USB_ERR_TIMEOUT, ignored) dch@akai /t/dmesg>
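For triaging a dmesg excerpt like the one above, a minimal Python sketch can tally which kernel functions report which USB_ERR_* codes. It assumes every entry is prefixed with a bracketed uptime stamp such as [4557], as in this log; the name tally_usb_errors is made up for illustration and is not part of any FreeBSD tool.

```python
import re
from collections import Counter

# One entry runs from a "[seconds]" stamp up to the next stamp (or end of text).
ENTRY = re.compile(r"\[(\d+)\]\s+(.*?)(?=\s*\[\d+\]\s|\Z)", re.S)


def tally_usb_errors(dmesg_text):
    """Count (reporting function, USB_ERR_* code) pairs in dmesg text."""
    counts = Counter()
    for _stamp, message in ENTRY.findall(dmesg_text):
        # The text before the first colon is the function or device name.
        source = message.split(":", 1)[0]
        for err in re.findall(r"USB_ERR_[A-Z_]+", message):
            counts[(source, err)] += 1
    return counts


if __name__ == "__main__":
    import sys
    for (source, err), n in tally_usb_errors(sys.stdin.read()).most_common():
        print(f"{n:5d}  {source}  {err}")
```

Run as "dmesg | python3 tally.py" (the script name is arbitrary). On the excerpt above it should attribute every error to usb_alloc_device, usbd_setup_device_desc or usbd_req_re_enumerate, all with USB_ERR_TIMEOUT.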
Does entering: set hw.usb.xhci.dcepquirk=1 in the loader help? --HPS
Created attachment 232776 [details] output of usbconfig dump_all_desc (after reboot) I've tried setting hw.usb.xhci.debug=1 but the system is somewhat unusable after that, video/audio/keyboard suffer huge lag. Is there something more granular / less intrusive I can try?
set, thanks. I'll report back tomorrow on progress.
I had a longer period of stability with the tunable, but still a hang soon after a video conference call in firefox this morning. I didn't grab the logs, but I can still see the disconnects after reboot happening: [2579] ugen0.4: <Apple Inc. iPhone> at usbus0 (disconnected) [2579] ipheth0: at uhub1, port 11, addr 9 (disconnected) [2579] ipheth0: detached [2580] ugen0.4: <Apple Inc. iPhone> at usbus0 [2580] ipheth0 numa-domain 0 on uhub1 [2580] ipheth0: <Apple Inc. iPhone, class 0/0, rev 2.00/8.02, addr 10> on usbus0 [2580] ue0: <USB Ethernet> on ipheth0 [2580] ue0: bpf attached [2580] ue0: Ethernet address: 82:ed:2c:45:8e:f7 [3138] ugen0.4: <Apple Inc. iPhone> at usbus0 (disconnected) [3138] ipheth0: at uhub1, port 11, addr 10 (disconnected) [3138] ipheth0: detached [10567] ugen0.4: <Apple Inc. iPhone> at usbus0 [10567] ipheth0 numa-domain 0 on uhub1 [10567] ipheth0: <Apple Inc. iPhone, class 0/0, rev 2.00/8.02, addr 11> on usbus0 [10567] ue0: <USB Ethernet> on ipheth0 [10567] ue0: bpf attached [10567] ue0: Ethernet address: 82:ed:2c:45:8e:f7 [11031] ugen0.4: <Apple Inc. iPhone> at usbus0 (disconnected) [11031] ipheth0: at uhub1, port 11, addr 11 (disconnected) [11031] ipheth0: detached [11057] ugen0.4: <Apple Inc. iPhone> at usbus0 [11057] ipheth0 numa-domain 0 on uhub1 [11057] ipheth0: <Apple Inc. 
iPhone, class 0/0, rev 2.00/8.02, addr 12> on usbus0 [11057] ue0: <USB Ethernet> on ipheth0 [11057] ue0: bpf attached [11057] ue0: Ethernet address: 82:ed:2c:45:8e:f7 [11132] ugen3.3: <vendor 0x04d9 USB Keyboard> at usbus3 (disconnected) [11132] ukbd0: at uhub3, port 4, addr 2 (disconnected) [11132] ukbd0: detached [11132] uhid0: at uhub3, port 4, addr 2 (disconnected) [11132] uhid0: detached [11132] ugen3.3: <vendor 0x04d9 USB Keyboard> at usbus3 [11132] ukbd0 numa-domain 0 on uhub3 [11132] ukbd0: <vendor 0x04d9 USB Keyboard, class 0/0, rev 1.10/12.09, addr 2> on usbus3 [11132] kbd2 at ukbd0 [11132] kbd2: ukbd0, generic (0), config:0x0, flags:0x3d0000 [11132] uhid0 numa-domain 0 on uhub3 [11132] uhid0: <vendor 0x04d9 USB Keyboard, class 0/0, rev 1.10/12.09, addr 2> on usbus3 [11637] ugen3.3: <vendor 0x04d9 USB Keyboard> at usbus3 (disconnected) [11637] ukbd0: at uhub3, port 4, addr 2 (disconnected) [11637] ukbd0: detached [11637] uhid0: at uhub3, port 4, addr 2 (disconnected) [11637] uhid0: detached [11638] ugen3.3: <vendor 0x04d9 USB Keyboard> at usbus3 [11638] ukbd0 numa-domain 0 on uhub3 [11638] ukbd0: <vendor 0x04d9 USB Keyboard, class 0/0, rev 1.10/12.09, addr 2> on usbus3 [11638] kbd2 at ukbd0 [11638] kbd2: ukbd0, generic (0), config:0x0, flags:0x3d0000 [11638] uhid0 numa-domain 0 on uhub3 [11638] uhid0: <vendor 0x04d9 USB Keyboard, class 0/0, rev 1.10/12.09, addr 2> on usbus3 [11639] ugen3.3: <vendor 0x04d9 USB Keyboard> at usbus3 (disconnected) [11639] ukbd0: at uhub3, port 4, addr 2 (disconnected) [11639] ukbd0: detached [11639] uhid0: at uhub3, port 4, addr 2 (disconnected) [11639] uhid0: detached [11639] ugen3.3: <vendor 0x04d9 USB Keyboard> at usbus3 [11639] ukbd0 numa-domain 0 on uhub3 [11639] ukbd0: <vendor 0x04d9 USB Keyboard, class 0/0, rev 1.10/12.09, addr 2> on usbus3 [11639] kbd2 at ukbd0 [11639] kbd2: ukbd0, generic (0), config:0x0, flags:0x3d0000 [11639] uhid0 numa-domain 0 on uhub3 [11639] uhid0: <vendor 0x04d9 USB Keyboard, class 0/0, rev 
1.10/12.09, addr 2> on usbus3 [11665] ugen0.4: <Apple Inc. iPhone> at usbus0 (disconnected) [11665] ipheth0: at uhub1, port 11, addr 12 (disconnected) [11665] ipheth0: detached [12062] ugen3.3: <vendor 0x04d9 USB Keyboard> at usbus3 (disconnected) [12062] ukbd0: at uhub3, port 4, addr 2 (disconnected) [12062] ukbd0: detached [12062] uhid0: at uhub3, port 4, addr 2 (disconnected) [12062] uhid0: detached [12062] ugen3.3: <vendor 0x04d9 USB Keyboard> at usbus3 [12062] ukbd0 numa-domain 0 on uhub3 [12062] ukbd0: <vendor 0x04d9 USB Keyboard, class 0/0, rev 1.10/12.09, addr 2> on usbus3 [12062] kbd2 at ukbd0 [12062] kbd2: ukbd0, generic (0), config:0x0, flags:0x3d0000 [12062] uhid0 numa-domain 0 on uhub3 [12062] uhid0: <vendor 0x04d9 USB Keyboard, class 0/0, rev 1.10/12.09, addr 2> on usbus3 [12926] ugen0.4: <Apple Inc. iPhone> at usbus0
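To see at a glance which devices are flapping in a log like the one above, here is a small Python sketch that counts "(disconnected)" events per ugen address. It assumes the ugenX.Y line format shown in this dmesg; disconnects_per_device is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches e.g. "[11132] ugen3.3: <vendor 0x04d9 USB Keyboard> at usbus3 (disconnected)"
DISCONNECT = re.compile(
    r"\[\d+\]\s+(ugen\d+\.\d+): <([^>]*)> at usbus\d+ \(disconnected\)"
)


def disconnects_per_device(dmesg_text):
    """Count disconnect events per (ugen address, device description)."""
    return Counter(
        (m.group(1), m.group(2).strip()) for m in DISCONNECT.finditer(dmesg_text)
    )
```

Attach lines carry no "(disconnected)" suffix, so they are not counted.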
Could you enable: sysctl hw.usb.uhub.debug=17 when this happens, and also capture the resulting prints? This assumes you have "options USB_DEBUG" in the kernel configuration file. --HPS
Created attachment 232856 [details] after issue recurred, without debug flags
Created attachment 232857 [details] with debugging flag enabled, & unplugging all the USB peripherals, finally only re-adding the mouse/keyboard
Comment on attachment 232857 [details] with debugging flag enabled, & unplugging all the USB peripherals, finally only re-adding the mouse/keyboard "wPortChange=0x0020" might indicate a "Warm Port Reset Change (WRC)". Could you also enable: sysctl hw.usb.debug=17 and sysctl hw.usb.xhci.debug=17 --HPS
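As a reading aid for the wPortChange value mentioned above, a minimal Python sketch of decoding the field. The bit names and values are written from memory of the USB 3.0 hub class port change bits, so treat the table as an assumption to check against the specification or sys/dev/usb/usb.h; decode_port_change is a hypothetical helper.

```python
# Port change bits for USB 3.0 hubs (assumed values, verify before relying on them).
USB3_PORT_CHANGE_BITS = {
    0x0001: "C_PORT_CONNECTION",
    0x0008: "C_PORT_OVER_CURRENT",
    0x0010: "C_PORT_RESET",
    0x0020: "C_BH_PORT_RESET",  # warm ("BH") port reset change
    0x0040: "C_PORT_LINK_STATE",
    0x0080: "C_PORT_CONFIG_ERROR",
}


def decode_port_change(value):
    """Return the names of the change bits set in a wPortChange value."""
    names = [
        name for bit, name in sorted(USB3_PORT_CHANGE_BITS.items()) if value & bit
    ]
    # The bits are disjoint, so summing the keys gives the mask of known bits.
    unknown = value & ~sum(USB3_PORT_CHANGE_BITS)
    if unknown:
        names.append(f"unknown(0x{unknown:04x})")
    return names
```

Under that table, 0x0020 decodes to the single warm reset change bit, which matches the reading given above.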
Is this something I can do after the issue occurs? The system is already unusable with these flags enabled.
Setting: sysctl kern.consmute=1 Might also help. Yes, you can try enabling only when the issue appears.
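Since the debug knobs make the machine nearly unusable, one way to follow the advice above is to switch them on only for a short window once the problem has appeared. The Python sketch below just builds that command sequence; the 10 second window, the log path, and resetting the knobs to 0 afterwards are assumptions, and capture_plan/run_plan are hypothetical names.

```python
import subprocess

# Debug sysctls named in this thread; they need "options USB_DEBUG" in the kernel.
DEBUG_KNOBS = ("hw.usb.debug", "hw.usb.xhci.debug", "hw.usb.uhub.debug")


def capture_plan(level=17, logfile="/var/tmp/usb-debug.txt", seconds=10):
    """Return the ordered shell commands for one short debug capture."""
    enable = [f"sysctl {knob}={level}" for knob in DEBUG_KNOBS]
    disable = [f"sysctl {knob}=0" for knob in DEBUG_KNOBS]
    return (
        ["sysctl kern.consmute=1"]  # mute the console while debug prints flood in
        + enable
        + [f"sleep {seconds}", f"dmesg > {logfile}"]
        + disable
        + ["sysctl kern.consmute=0"]
    )


def run_plan(commands):
    """Run the commands in order as root, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, shell=True, check=True)
```

Calling run_plan(capture_plan()) over ssh after the keyboard has gone away keeps the noisy period short.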
Created attachment 232880 [details] before setting both sysctls but after keyboard went awol
Created attachment 232881 [details] debug with both sysctls enabled=17
Created attachment 232882 [details] after switching debugging off again
this was all done under 13.1-RC1 already.
From my quick glimpse at the logs, I see something has gone wrong at the XHCI hardware level! Now we need to figure out what commands your XHCI controller rejects. Oouch! > xhci_do_command: Command timeout! --HPS
The mainboard is a Supermicro: http://www.supermicro.com/products/motherboard/Xeon/C600/X10SRA-F.cfm I do actually have a PCIe USB card I can drop in. I can try moving everything over to that and see whether the problem recurs?
Please do!
Well, some good news: with the additional PCI card, at least I can move the keyboard from the mainboard USB ports to the PCI card USB ports and get the keyboard back! I still get a lockup soon after a WebRTC session starts; more logs are available if that helps. I don't have any dmesg disconnects listed since the quirk setting was enabled. Of note, the webcam stays blocked until it is physically disconnected; a normal FreeBSD reboot of the box itself doesn't free up the webcam again. The PCI card has the keyboard, USB audio and webcam on it; the other devices are on the mainboard. I will see how things go with the keyboard on the mainboard next time round.
Created attachment 233314 [details] usb devices (audio this time) disconnected, switch to debug 17 per history before rebooting now on 13.1-RC3 and still full lockups requiring reboot.
still on 13.1-RC5.
Wild guess, adding John Baldwin: May there be some PCI changes related to quick system startup causing this? --HPS
Hello world :-) I have the same issue when connecting a USB 3.0 hub to a 3.0 port on my desktop. When connected to a 2.0 port it works fine. It has been like this since 13.0 (when I switched to a desktop). This is a Unitek 7-port USB 3.0 hub with an external power supply, using a 3.0 A-B cable.
There are very few PCI changes in 13.1 relative to 13.0 and I don't think any of them would be relevant to this. Dave, have you tried bisecting the kernel on stable/13 to see when it starts failing?
After reading the whole thread, it seems that in this case the problem is intermittent, while for me the hub fails at connect. I had considered this to be a faulty hub/port. But if I could also use it on a USB 3.0 port, that would be great :-) Anyway, I have another hub connected to a USB 3.0 port, so it is not a big deal for me to have another on a 2.0 port :-)
I am having the same issue. For me it happened between: releng/13.1-n250134-6b642cf5c87 # good releng/13.1-n250141-2e9ad6042be # bad
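The good/bad pair above already narrows the search considerably. As a worked example, the Python sketch below estimates how many commits remain to bisect, assuming the nNNNNNN field is the branch's commit count as in FreeBSD's git-based version strings; commit_count and bisect_budget are hypothetical helper names.

```python
import math
import re


def commit_count(version):
    """Extract NNNNNN from a string like 'releng/13.1-n250134-6b642cf5c87'."""
    match = re.search(r"-n(\d+)-[0-9a-f]+", version)
    if match is None:
        raise ValueError(f"no commit count found in {version!r}")
    return int(match.group(1))


def bisect_budget(good, bad):
    """Return (candidate commits, rough number of bisection steps)."""
    candidates = commit_count(bad) - commit_count(good)
    steps = math.ceil(math.log2(candidates)) if candidates > 1 else 0
    return candidates, steps
```

For the two versions quoted above this gives 7 candidate commits, so roughly 3 kernel builds would be enough to pin down the change.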
Emanuel Haupt: There were a few relevant changes in that delta. Can you provide the output from dmesg with "sysctl hw.usb.xhci.debug=16" set when the issue occurs? You need "options USB_DEBUG" in the kernel configuration. Does reverting/applying the commit below change anything? Do you know if your device is attached via Thunderbolt? --HPS commit 245d5a65f5805864881e2601190e7783057d2768 Author: Hans Petter Selasky <hselasky@FreeBSD.org> Date: Thu Apr 21 16:59:09 2022 +0200 xhci(4): Ensure the so-called data toggle gets properly reset. Use the drop and enable endpoint context commands to force a reset of the data toggle for USB 2.0 and USB 3.0 after: - clear endpoint halt command (when the driver wishes). - set config command (when the kernel or user-space wants). - set alternate setting command (only affected endpoints). Some XHCI HW implementations may not allow the endpoint reset command when the endpoint context is not in the halted state. Reported by: Juniper and Gary Jennejohn Approved by: re (gjb) Sponsored by: NVIDIA Networking (cherry picked from commit cda31e734925346328fd2369585ab3f6767ec225)
Please specify the XHCI PCI ID, as shown by "pciconf -lv". --HPS
Created attachment 233622 [details] USB_DEBUG with hw.usb.xhci.debug=16
Created attachment 233623 [details] pciconf -lv
Both mouse and keyboard are attached to a Level 1 KVM switch that is connected via USB3 (not thunderbolt). It happens immediately, 100% reproducible once xorg starts. Here is how I collected the output: - Added hw.usb.xhci.debug=16 to /etc/sysctl.conf - Rebooted machine - Made sure the KVM Switch is active on this machine - System boots up and I have not even the chance to switch to console (CTRL-ALT-F1) - ssh into machine and collect output from /var/log/messages Since this might be nvidia related, here is the driver I am using: nvidia-driver-510.60.02
The same happens if xorg is disabled at startup. The keyboard remains 100% unresponsive.
Created attachment 233625 [details] usbconfig dump_device_desc of mouse/keyboard/hub
FWIW, same thing happens with the latest nvidia driver # https://www.nvidia.com/en-us/drivers/unix/ FreeBSD x64 Latest Production Branch Version: 510.68.02
Emanuel Haupt, can you also get me: procstat -akk when the issue happens? I cannot find any errors in the log. Maybe the log was truncated, but I can see the XHCI is working on some USB transfers. Maybe in some kind of a loop ... Sometimes you need to do: sysctl kern.consmute=1 to get all prints in /var/log/messages . --HPS
> When the issue happens? Always, as in: I boot the system and the keyboard never works. I see the login prompt and the keyboard is unresponsive.
Created attachment 233647 [details] procstat -akk captured via ssh into the machine
/var/log/messages after sysctl kern.consmute=1: https://critical.ch/people/262882/messages.txt (external link because of size)
Emanuel Haupt: What does "usbconfig" output when this issue happens? --HPS
Emanuel Haupt: The only thing I see in the logs is a mass storage device. Maybe it is an auto-installer disk? Can you try to run: cdcontrol -f /dev/xxx eject on this device? --HPS
The USB device you're seeing is the USB boot device I've created. It's plugged to a different USB 2.0 port. I wouldn't want to eject it :-) You keep writing "when it happens". I want to stress this once again. It does not suddenly "happen" the keyboard does not work from the start when I boot the system. All the output I'm providing here is obtained via SSH into the machine: root@pr262882:~ # usbconfig ugen2.1: <Intel XHCI root HUB> at usbus2, cfg=0 md=HOST spd=SUPER (5.0Gbps) pwr=SAVE (0mA) ugen4.1: <(0x1b21) XHCI root HUB> at usbus4, cfg=0 md=HOST spd=SUPER (5.0Gbps) pwr=SAVE (0mA) ugen5.1: <Intel EHCI root HUB> at usbus5, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=SAVE (0mA) ugen1.1: <(0x1b21) XHCI root HUB> at usbus1, cfg=0 md=HOST spd=SUPER (5.0Gbps) pwr=SAVE (0mA) ugen3.1: <Intel EHCI root HUB> at usbus3, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=SAVE (0mA) ugen0.1: <(0x10de) XHCI root HUB> at usbus0, cfg=0 md=HOST spd=SUPER (5.0Gbps) pwr=SAVE (0mA) ugen4.2: <GenesysLogic USB3.0 Hub> at usbus4, cfg=0 md=HOST spd=SUPER (5.0Gbps) pwr=SAVE (0mA) ugen5.2: <vendor 0x8087 product 0x0024> at usbus5, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=SAVE (0mA) ugen3.2: <vendor 0x8087 product 0x0024> at usbus3, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=SAVE (0mA) ugen4.3: <GenesysLogic USB2.0 Hub> at usbus4, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=SAVE (100mA) ugen2.2: <Genesys USB Reader> at usbus2, cfg=0 md=HOST spd=SUPER (5.0Gbps) pwr=ON (224mA) ugen3.3: <vendor 0x05e3 USB2.0 Hub> at usbus3, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=SAVE (100mA) ugen5.3: <vendor 0x05e3 USB2.0 Hub> at usbus5, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=SAVE (100mA) ugen3.4: <USB SanDisk 3.2Gen1> at usbus3, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=ON (224mA) ugen5.4: <Logitech Logitech Wireless Headset> at usbus5, cfg=0 md=HOST spd=FULL (12Mbps) pwr=ON (144mA) ugen3.5: <vendor 0x05e3 USB2.0 Hub> at usbus3, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=SAVE (100mA) ugen5.5: <vendor 0x05e3 USB2.0 Hub> at usbus5, cfg=0 md=HOST spd=HIGH 
(480Mbps) pwr=SAVE (100mA) ugen3.6: <Corsair Memory, Inc. Integrated USB Bridge> at usbus3, cfg=0 md=HOST spd=FULL (12Mbps) pwr=ON (100mA) root@pr262882:~ # uname -a FreeBSD pr262882.local 13.1-RC5 FreeBSD 13.1-RC5 releng/13.1-n250141-2e9ad6042be PR262882 amd64
Hi Emanuel, ugen4.1: <(0x1b21) XHCI root HUB> at usbus4, cfg=0 md=HOST spd=SUPER (5.0Gbps) ugen4.2: <GenesysLogic USB3.0 Hub> at usbus4, cfg=0 md=HOST spd=SUPER (5.0Gbps) ugen4.3: <GenesysLogic USB2.0 Hub> at usbus4, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=SAVE (100mA) I suspect the KVM USB keyboard and mouse are supposed to reside under ugen4.3 . What happens if you run this command: usbconfig -d ugen4.3 reset Do any more devices show up? You can also try: usbconfig -d ugen4.1 reset --HPS
After: usbconfig -d ugen4.3 reset Mouse and keyboard work again. Here is what I captured in /var/log/messages: https://critical.ch/people/262882/messages-after-reset
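For others hitting this, a minimal Python sketch that picks the hubs on one bus out of plain "usbconfig" output and prints the matching reset commands, external hubs first and the root hub last. It only parses the ugenX.Y line format shown earlier in this thread and runs nothing itself; hub_reset_commands is a hypothetical helper name.

```python
import re

# Matches e.g. "ugen4.3: <GenesysLogic USB2.0 Hub> at usbus4, cfg=0 ..."
DEVICE = re.compile(r"(ugen(\d+)\.(\d+)): <([^>]*)> at usbus\d+")


def hub_reset_commands(usbconfig_output, bus):
    """Return 'usbconfig -d ugenX.Y reset' commands for hubs on the given bus."""
    hubs = []
    for name, bus_no, addr, desc in DEVICE.findall(usbconfig_output):
        if int(bus_no) == bus and "hub" in desc.lower():
            hubs.append((int(addr), name))
    # Address 1 is the root hub in this output; try it only after the others.
    hubs.sort(key=lambda item: (item[0] == 1, item[0]))
    return [f"usbconfig -d {name} reset" for _addr, name in hubs]
```

Which hub actually needs the reset still has to be worked out by hand; in this case it was ugen4.3.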
Emanuel Haupt: Now disable the XHCI debugging. And set: hw.usb.uhub.debug=16 In /boot/loader.conf (I think that will work) Then reboot and capture all messages (dmesg), and then run that usbconfig command I gave you, if no USB mouse and keyboard shows up. I think we are seeing some kind of timing race, with regards to enumerating the USB 2.0 HUB there. I'm not sure why. Thank you! --HPS
dmesg -a: https://critical.ch/people/262882/dmesg_a /var/log/messages before usb reset: https://critical.ch/people/262882/messages-before-reset /var/log/messages after usb reset: https://critical.ch/people/262882/messages-after-reset
Here is dmesg -a with a higher kern.msgbufsize: https://critical.ch/people/262882/dmesg_a_big_kern_msgbufsize
Created attachment 233659 [details] Patch to test Hi Emanuel, can you test this patch? Please provide the USB HUB debug messages whether it works or not. --HPS
Unfortunately with the patch the boot process loops forever and never gets to a login prompt.
Created attachment 233669 [details] Patch to test (v2) Can you try this new patch as well?
Same with this patch. It loops forever.
And you reverted the previous one I guess. I'll need to think a bit more about this. It is possible for the kernel to reset the "Virtual HUB", but then I need a failsafe test which doesn't cause the looping. It looks to me like trying to touch any of the ports without the USB reset is a no-go. Basically the USB port status is saying there is a device there. It would be very nice to see some looping debug prints somehow. --HPS
Dave: Can you try the same thing? usbconfig show_ifdrv Then figure out where (ugenX.Y) the uhub<N> for the keyboard is located and reset it using: usbconfig -d ugenX.Y reset Does it help? --HPS
Dave: Does loading: /boot/kernel/uacpi.ko from the loader make any changes? --HPS
(In reply to Hans Petter Selasky from comment #50) > And you reverted the previous one I guess. Correct. > It would be very nice to see some looping debug prints somehow. Would making a video with my phone of the scrolling messages help?
> Would making a video with my phone of the scrolling messages help? Yes, you can send it to me privately if you like: hselasky@freebsd.org Try to get it from the start. --HPS
How widespread is this issue?
(In reply to Hans Petter Selasky from comment #54) > Yes, you can send it to me privately if you like: Sent.
(In reply to Glen Barber from comment #55) > How widespread is this issue? It's hard to tell. I am using a fairly high quality KVM switch (https://store.level1techs.com/products/14-kvm-switch-dual-monitor-2computer). I currently do not have another USB3 hub that I could use to test. It might be prudent to revert 245d5a65f5805864881e2601190e7783057d2768 for the upcoming release.
Created attachment 233691 [details] Patch to test (v3) Emanuel, can you test this patch? --HPS
> It might be prudent to revert 245d5a65f5805864881e2601190e7783057d2768 for the upcoming release. Maybe from the release branch for now and leave it in -stable? I won't object to that, but let's first see if this issue is fixable, because the patch I made really fixes an issue, and the old behaviour is not really desirable with regard to mass storage. Making one fix makes another issue pop up! How fun :-) --HPS
(In reply to Glen Barber from comment #55) I walked down to the ThreadRipper 1950X system and plugged in a RPi USB keyboard into a USB3 port and got: ugen2.2: <vendor 0x05e3 USB2.0 Hub> at usbus2 uhub6 on uhub1 uhub6: <vendor 0x05e3 USB2.0 Hub, class 9/0, rev 2.00/32.98, addr 1> on usbus2 uhub6: MTT enabled uhub_attach: port 1 power on or off failed, USB_ERR_IOERROR uhub_attach: port 2 power on or off failed, USB_ERR_IOERROR uhub_attach: port 3 power on or off failed, USB_ERR_IOERROR uhub_attach: port 4 power on or off failed, USB_ERR_IOERROR uhub6: 4 ports with 4 removable, self powered uhub_reattach_port: device problem (USB_ERR_IOERROR), disabling port 1 uhub_reattach_port: device problem (USB_ERR_IOERROR), disabling port 2 uhub_reattach_port: device problem (USB_ERR_IOERROR), disabling port 3 uhub_reattach_port: device problem (USB_ERR_IOERROR), disabling port 4 ugen2.2: <vendor 0x05e3 USB2.0 Hub> at usbus2 (disconnected) uhub6: at uhub1, port 1, addr 1 (disconnected) uhub6: detached ugen2.2: <vendor 0x05e3 USB2.0 Hub> at usbus2 uhub6 on uhub1 uhub6: <vendor 0x05e3 USB2.0 Hub, class 9/0, rev 2.00/32.98, addr 1> on usbus2 uhub6: MTT enabled uhub6: 4 ports with 4 removable, self powered ugen2.3: <vendor 0x04d9 RPI Wired Keyboard 4> at usbus2 ukbd2 on uhub6 ukbd2: <vendor 0x04d9 RPI Wired Keyboard 4, class 0/0, rev 2.00/1.40, addr 2> on usbus2 kbd4 at ukbd2 uhid1 on uhub6 uhid1: <vendor 0x04d9 RPI Wired Keyboard 4, class 0/0, rev 2.00/1.40, addr 2> on usbus2 (The keyboard I normally use is not a USB one.)
(In reply to Hans Petter Selasky from comment #58) Your patch (v3) does not apply. Do I have to apply it on top of v2?
Glen: The initial issue was reported on RC1 and the patch was made on RC5, so reverting won't solve this one. --HPS
Created attachment 233692 [details] Patch to test (v3 for 13-stable)
Emanuel: > Your patch (v3) does not apply. Do I have to apply it on top of v2? No, apply it to a clean 13-stable. I patched it on -14 and realized that there is a commit missing, which is not relevant to this bug. I looked at your video, but can't see clearly where it is going wrong, mostly because it is scrolling very fast.
Emanuel: Are you on IRC or Slack? --HPS
(In reply to Mark Millard from comment #60) That was main [so: 14]: (output line split some for better readability) # uname -apKU FreeBSD amd64_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #34 main-n255108-9fb40baf6043-dirty: Thu Apr 28 19:42:46 PDT 2022 root@amd64_ZFS:/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/sys/GENERIC-NODBG amd64 amd64 1400057 1400057 Other keyboards with hubs got similar. One keyboard has no hub and it got no odd messages. For reference, the RPi keyboard plugged into a RPi4B got normal output: ugen0.5: <vendor 0x05e3 USB2.0 Hub> at usbus0 uhub2 on uhub1 uhub2: <vendor 0x05e3 USB2.0 Hub, class 9/0, rev 2.00/32.98, addr 4> on usbus0 uhub2: MTT enabled uhub2: 4 ports with 4 removable, self powered ugen0.6: <vendor 0x04d9 RPI Wired Keyboard 4> at usbus0 ukbd0 on uhub2 ukbd0: <vendor 0x04d9 RPI Wired Keyboard 4, class 0/0, rev 2.00/1.40, addr 5> on usbus0 kbd1 at ukbd0 uhid0 on uhub2 uhid0: <vendor 0x04d9 RPI Wired Keyboard 4, class 0/0, rev 2.00/1.40, addr 5> on usbus0 # uname -apKU FreeBSD CA72_UFS 14.0-CURRENT FreeBSD 14.0-CURRENT #41 main-n255108-9fb40baf6043-dirty: Thu Apr 28 20:43:22 PDT 2022 root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72 arm64 aarch64 1400057 1400057
Hi Mark: Does applying this patch on 14-current: https://bugs.freebsd.org/bugzilla/attachment.cgi?id=233691&action=diff Make those USB_ERR_IOERROR go away on your threadripper? --HPS
(In reply to Hans Petter Selasky from comment #67) Sorry it took so long to get back to this. The patched kernel results in the output for the RPi keyboard: ugen2.2: <vendor 0x05e3 USB2.0 Hub> at usbus2 uhub6 on uhub1 uhub6: <vendor 0x05e3 USB2.0 Hub, class 9/0, rev 2.00/32.98, addr 1> on usbus2 uhub6: MTT enabled uhub6: 4 ports with 4 removable, self powered ugen2.3: <vendor 0x04d9 RPI Wired Keyboard 4> at usbus2 ukbd2 on uhub6 ukbd2: <vendor 0x04d9 RPI Wired Keyboard 4, class 0/0, rev 2.00/1.40, addr 2> on usbus2 kbd4 at ukbd2 uhid1 on uhub6 uhid1: <vendor 0x04d9 RPI Wired Keyboard 4, class 0/0, rev 2.00/1.40, addr 2> on usbus2 Looks good to me. On a RPi4B the patched kernel continued to work.
(In reply to Hans Petter Selasky from comment #67) FYI, I just tried plugging in a USB3 NVMe SSD and got: uhub_reattach_port: port 2 U1 timeout failed, error=USB_ERR_IOERROR uhub_reattach_port: port 2 U2 timeout failed, error=USB_ERR_IOERROR usb_msc_auto_quirk: UQ_MSC_NO_GETMAXLUN set for USB mass storage device Samsung PSSD T7 Touch (0x04e8:0x4001) ugen0.9: <Samsung PSSD T7 Touch> at usbus0 umass2 on uhub0 umass2: <Samsung PSSD T7 Touch, class 0/0, rev 3.20/1.00, addr 9> on usbus0 umass2: SCSI over Bulk-Only; quirks = 0x0100 umass2:12:2: Attached to scbus12 da5 at umass-sim2 bus 2 scbus12 target 0 lun 0 da5: <Samsung PSSD T7 Touch 0> Fixed Direct Access SPC-4 SCSI device da5: Serial Number REPLACED da5: 400.000MB/s transfers da5: 953869MB (1953525168 512 byte sectors) da5: quirks=0x2<NO_6_BYTE> By contrast the patched RPi4B got only: usb_msc_auto_quirk: UQ_MSC_NO_GETMAXLUN set for USB mass storage device Samsung PSSD T7 Touch (0x04e8:0x4001) ugen0.4: <Samsung PSSD T7 Touch> at usbus0 umass1 on uhub0 umass1: <Samsung PSSD T7 Touch, class 0/0, rev 3.20/1.00, addr 3> on usbus0 umass1: SCSI over Bulk-Only; quirks = 0x0100 umass1:1:1: Attached to scbus1 da1 at umass-sim1 bus 1 scbus1 target 0 lun 0 da1: <Samsung PSSD T7 Touch 0> Fixed Direct Access SPC-4 SCSI device da1: Serial Number S5K5NJ0R107444J da1: 400.000MB/s transfers da1: 953869MB (1953525168 512 byte sectors) da1: quirks=0x2<NO_6_BYTE> So the ThreadRipper 1950X gets the extra lines: uhub_reattach_port: port 2 U1 timeout failed, error=USB_ERR_IOERROR uhub_reattach_port: port 2 U2 timeout failed, error=USB_ERR_IOERROR
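The comparison above was done by eye; a minimal Python sketch of the same idea follows. It assumes each log has one message per line (as in real dmesg output) and masks all digits so that unit numbers such as umass2/da5 versus umass1/da1 do not count as differences. extra_lines is a hypothetical helper name.

```python
import re


def normalize(line):
    """Mask every run of digits so differing unit numbers compare equal."""
    return re.sub(r"\d+", "N", line.strip())


def extra_lines(log_a, log_b):
    """Return lines of log_a with no digit-masked counterpart in log_b."""
    seen = {normalize(line) for line in log_b.splitlines() if line.strip()}
    return [
        line.strip()
        for line in log_a.splitlines()
        if line.strip() and normalize(line) not in seen
    ]
```

Lines that differ in more than digits, such as serial numbers containing letters, will also be reported, so the result still needs a quick manual review.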
(In reply to Mark Millard from comment #69) More detail: the 2 extra messages only happen for plugging into the USB 3.1 ports, not the USB 3.0 ports. (RPi4B's do not have 3.1.)
(In reply to Hans Petter Selasky from comment #67) So I tried plugging in the RPi keyboard into a USB 3.1 port and got: ugen0.9: <vendor 0x05e3 USB2.0 Hub> at usbus0 uhub6 on uhub0 uhub6: <vendor 0x05e3 USB2.0 Hub, class 9/0, rev 2.00/32.98, addr 11> on usbus0 uhub6: MTT enabled uhub6: 4 ports with 4 removable, self powered usb_alloc_device: device init 10 failed (USB_ERR_IOERROR, ignored) ugen0.10: <Unknown > at usbus0 (disconnected) uhub_reattach_port: could not allocate new device So: alloc/reattach notices.
> So the ThreadRipper 1950X gets the extra lines: > uhub_reattach_port: port 2 U1 timeout failed, error=USB_ERR_IOERROR > uhub_reattach_port: port 2 U2 timeout failed, error=USB_ERR_IOERROR Was this with or without the v3 patch? --HPS
Patch doesn't help me at all (14-CURRENT), but resetting the USB hub makes the devices appear.
(In reply to Hans Petter Selasky from comment #62) Ok, thank you for the feedback. I understand there is still some unclear behavior, but what are we looking at, timeframe-wise, for a resolution? I'm asking because -RC6, which is apparently warranted now, will be built in roughly 36 hours. So, I need to plan accordingly if it needs to be delayed, or if we will have an -RC7.
I need the next 24 hours for debugging at least. I'll keep you posted Glen. --HPS
(In reply to Hans Petter Selasky from comment #75) Thank you very much.
(In reply to Hans Petter Selasky from comment #72) The only patch was the main [so: 14] one from: https://bugs.freebsd.org/bugzilla/attachment.cgi?id=233691&action=diff
(In reply to Mark Millard from comment #77) Note: At this point I've only had about half a night's sleep, so I may not respond quickly.
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=09dd1adfa4c9bb1b49f4ef5524a308732883e132 commit 09dd1adfa4c9bb1b49f4ef5524a308732883e132 Author: Hans Petter Selasky <hselasky@FreeBSD.org> AuthorDate: 2022-05-03 16:10:49 +0000 Commit: Hans Petter Selasky <hselasky@FreeBSD.org> CommitDate: 2022-05-03 16:13:53 +0000 xhci(4): Always add and evaluate the slot context. Because the maximum number of endpoint contexts is stored there. Tested by: ehaupt@ PR: 262882 MFC after: 3 hours Sponsored by: NVIDIA Networking sys/dev/usb/controller/xhci.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=e276d281503160ba3648bd394cde95736ee53329 commit e276d281503160ba3648bd394cde95736ee53329 Author: Hans Petter Selasky <hselasky@FreeBSD.org> AuthorDate: 2022-05-03 16:09:17 +0000 Commit: Hans Petter Selasky <hselasky@FreeBSD.org> CommitDate: 2022-05-03 16:13:53 +0000 xhci(4): Only drop BULK and INTERRUPT endpoints to reset data toggle. Only drop BULK and INTERRUPT endpoints, to reset the data toggle, because for other endpoint types this is not critical. Tested by: ehaupt@ PR: 262882 MFC after: 3 hours Sponsored by: NVIDIA Networking sys/dev/usb/controller/xhci.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-)
Hi Glen: After several hours of debugging with Emanuel today, I found two issues, and fixing them resolved his USB problems. I plan to merge these as soon as possible to 13-stable and ask for an MFC to 13.1 as well. I will ask the people subscribed there to test those patches. They will apply cleanly on top of 13-stable and 13.1, so just do git cherry-pick xxxxx . The problems are due to different XHCI firmware designs from what I can see, which is not so easy to know about. --HPS
(In reply to Hans Petter Selasky from comment #81) Thank you for the update, and thank you and manu@ for resolving this so quickly. Let's let this sit in -CURRENT for at least a few hours, or maybe until tomorrow, then please send a request for approval against releng/13.1 to re@. For the early merge to stable/13, please use 'Approved by: re (gjb, early MFC)' in the commit log. I would like to hear back from Nathan as well, if this addresses the issue he had hit.
(In reply to Glen Barber from comment #82) > Thank you for the update, and thank you and manu@ for resolving this so quickly. Did you mean ehaupt@? I'm the other Emanuel with one 'm' :-) Also, thank you from my side for Hans's tireless help.
(In reply to Hans Petter Selasky from comment #81) Looks like the patch I tested is not being included. I'll revert and update.
(In reply to Emanuel Haupt from comment #83) Oops, sorry. I thought I got all of the people involved. Sorry. :( But yes, thank you as well.
(In reply to Hans Petter Selasky from comment #81) The ThreadRipper 1950X is now based on main-n255153-7ac164dc8e2e . The RPi keyboard connection tests are not producing odd messages on USB 3.0 or USB 3.1 ports. But the USB3 NVMe SSD USB 3.1 port connection test is still producing the reattach timeout failed notices: uhub_reattach_port: port 2 U1 timeout failed, error=USB_ERR_IOERROR uhub_reattach_port: port 2 U2 timeout failed, error=USB_ERR_IOERROR usb_msc_auto_quirk: UQ_MSC_NO_GETMAXLUN set for USB mass storage device Samsung PSSD T7 Touch (0x04e8:0x4001) ugen0.9: <Samsung PSSD T7 Touch> at usbus0 umass2 on uhub3 umass2: <Samsung PSSD T7 Touch, class 0/0, rev 3.20/1.00, addr 10> on usbus0 umass2: SCSI over Bulk-Only; quirks = 0x0100 umass2:12:2: Attached to scbus12 da5 at umass-sim2 bus 2 scbus12 target 0 lun 0 da5: <Samsung PSSD T7 Touch 0> Fixed Direct Access SPC-4 SCSI device da5: Serial Number S5K5NJ0R107157Z da5: 400.000MB/s transfers da5: 953869MB (1953525168 512 byte sectors) da5: quirks=0x2<NO_6_BYTE>
(In reply to Hans Petter Selasky from comment #81) Not that I expect it fits here, but in the spirit of reporting all oddities during testing . . . I got out 2 USB3 media readers. On both the updated ThreadRipper 1950X and the non-updated HoneyComb I got things like the following when they were plugged in:
usb_msc_auto_quirk: UQ_MSC_NO_TEST_UNIT_READY set for USB mass storage device Kingston Multi-Reader (0x11b0:0x6368)
usb_msc_auto_quirk: UQ_MSC_NO_PREVENT_ALLOW set for USB mass storage device Kingston Multi-Reader (0x11b0:0x6368)
usb_msc_auto_quirk: UQ_MSC_NO_SYNC_CACHE set for USB mass storage device Kingston Multi-Reader (0x11b0:0x6368)
usb_msc_auto_quirk: UQ_MSC_NO_START_STOP set for USB mass storage device Kingston Multi-Reader (0x11b0:0x6368)
ugen3.3: <Kingston Multi-Reader> at usbus3
umass2 on uhub1
umass2: <Bulk-In, Bulk-Out, Interface> on usbus3
umass2: SCSI over Bulk-Only; quirks = 0xc005
umass2:12:2: Attached to scbus12
(probe0:umass-sim2:2:0:0): REPORT LUNS. CDB: a0 00 00 00 00 00 00 00 00 10 00 00
(probe0:umass-sim2:2:0:0): CAM status: SCSI Status Error
(probe0:umass-sim2:2:0:0): SCSI status: Check Condition
(probe0:umass-sim2:2:0:0): SCSI sense: ILLEGAL REQUEST asc:24,0 (Invalid field in CDB)
(probe0:umass-sim2:2:0:0): Error 22, Unretryable error
da5 at umass-sim2 bus 2 scbus12 target 0 lun 0
da5: < Multi-Reader -0 1.00> Removable Direct Access SPC-4 SCSI device
. . .
and:
umass2: detached
usb_msc_auto_quirk: UQ_MSC_NO_TEST_UNIT_READY set for USB mass storage device Kingston USB3.0 Media Reader (0x11b0:0x6348)
usb_msc_auto_quirk: UQ_MSC_NO_PREVENT_ALLOW set for USB mass storage device Kingston USB3.0 Media Reader (0x11b0:0x6348)
usb_msc_auto_quirk: UQ_MSC_NO_SYNC_CACHE set for USB mass storage device Kingston USB3.0 Media Reader (0x11b0:0x6348)
usb_msc_auto_quirk: UQ_MSC_NO_START_STOP set for USB mass storage device Kingston USB3.0 Media Reader (0x11b0:0x6348)
ugen3.3: <Kingston USB3.0 Media Reader> at usbus3
umass2 on uhub1
umass2: <Bulk-In, Bulk-Out, Interface> on usbus3
umass2: SCSI over Bulk-Only; quirks = 0xc005
umass2:12:2: Attached to scbus12
(probe0:umass-sim2:2:0:0): REPORT LUNS. CDB: a0 00 00 00 00 00 00 00 00 10 00 00
(probe0:umass-sim2:2:0:0): CAM status: SCSI Status Error
(probe0:umass-sim2:2:0:0): SCSI status: Check Condition
(probe0:umass-sim2:2:0:0): SCSI sense: ILLEGAL REQUEST asc:24,0 (Invalid field in CDB)
(probe0:umass-sim2:2:0:0): Error 22, Unretryable error
da5 at umass-sim2 bus 2 scbus12 target 0 lun 0
da5: < FCR-HS3 -0 1.00> Removable Direct Access SPC-4 SCSI device
. . .
Hi Glen, I can hold the MFC + re@ e-mail till tomorrow, given that you hold the next 13.1 RC for that. It will be 10 hours from now approx. Then more people can test. --HPS
(In reply to Hans Petter Selasky from comment #88) Thank you for the update. Please keep re@ informed (ideally via this ticket or direct email to re@) if anything changes, regresses, etc., and you need more time. Your help is very much appreciated.
(In reply to Hans Petter Selasky from comment #88) Actually, I think merging to stable/13 early would be good, if you have the time to do so. Then we can find out if users tracking 13.1-STABLE still hit issues or not, after which we can get this into releng/13.1 before the next RC build (which is now necessary).
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=6d8c6b24ee0a0416204356a98e4e7606489894c5 commit 6d8c6b24ee0a0416204356a98e4e7606489894c5 Author: Hans Petter Selasky <hselasky@FreeBSD.org> AuthorDate: 2022-05-03 16:10:49 +0000 Commit: Hans Petter Selasky <hselasky@FreeBSD.org> CommitDate: 2022-05-03 19:43:13 +0000 xhci(4): Always add and evaluate the slot context. Because the maximum number of endpoint contexts is stored there. Tested by: ehaupt@ PR: 262882 Approved by: re (gjb, early MFC) Sponsored by: NVIDIA Networking (cherry picked from commit 09dd1adfa4c9bb1b49f4ef5524a308732883e132) sys/dev/usb/controller/xhci.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
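The commit message above gives the reason for always adding and evaluating the slot context: the maximum number of endpoint contexts is stored there. The following is a minimal sketch of that idea only, not the code from the commit. It assumes the xHCI specification's layout, where bit 0 (A0) of the Add Context flags selects the Slot Context and the Context Entries field holds the index of the last valid endpoint context. All function names are hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helpers, not taken from sys/dev/usb/controller/xhci.c.
 * Bit positions follow the xHCI specification.
 */

/* Device Context Index: endpoint number * 2, plus 1 for IN or control. */
static inline unsigned
ep_to_dci(unsigned ep_num, int is_in_or_control)
{
	return (ep_num * 2u + (is_in_or_control ? 1u : 0u));
}

/*
 * Add Context flags for an Input Control Context.
 * Bit 0 (A0) selects the Slot Context and is always set here, so the
 * controller always re-evaluates the slot context.
 */
static inline uint32_t
build_add_flags(uint32_t ep_dci_mask)
{
	return (ep_dci_mask | 1u);
}

/* Context Entries: the highest DCI present in the mask (0 if none). */
static inline unsigned
context_entries(uint32_t ep_dci_mask)
{
	unsigned last = 0;

	for (unsigned dci = 1; dci < 32; dci++) {
		if (ep_dci_mask & (1u << dci))
			last = dci;
	}
	return (last);
}
```

For example, with EP0 (DCI 1), EP1 IN (DCI 3) and EP2 OUT (DCI 4) configured, context_entries() yields 4, and that value is only communicated to the controller if the slot context is part of the command.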
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=610528736f3f0bf51f990dd93c5061a7a437e519 commit 610528736f3f0bf51f990dd93c5061a7a437e519 Author: Hans Petter Selasky <hselasky@FreeBSD.org> AuthorDate: 2022-05-03 16:09:17 +0000 Commit: Hans Petter Selasky <hselasky@FreeBSD.org> CommitDate: 2022-05-03 19:41:51 +0000 xhci(4): Only drop BULK and INTERRUPT endpoints to reset data toggle. Only drop BULK and INTERRUPT endpoints, to reset the data toggle, because for other endpoint types this is not critical. While at it fix some whitespace. Tested by: ehaupt@ PR: 262882 Approved by: re (gjb, early MFC) Sponsored by: NVIDIA Networking (cherry picked from commit e276d281503160ba3648bd394cde95736ee53329) sys/dev/usb/controller/xhci.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-)
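The rule described in the commit message above (drop only BULK and INTERRUPT endpoints to reset the data toggle) can be sketched as a small predicate on the endpoint descriptor's bmAttributes field. This is an illustration under stated assumptions, not the driver code: the transfer-type values are the standard USB encoding (bits 1:0 of bmAttributes), and the function and constant names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard USB transfer-type encoding, bmAttributes bits 1:0. */
enum {
	XFER_CONTROL = 0,
	XFER_ISOCHRONOUS = 1,
	XFER_BULK = 2,
	XFER_INTERRUPT = 3
};
#define	XFERTYPE_MASK	0x03u

/*
 * Hypothetical predicate: true when the endpoint should be dropped and
 * re-added so that its data toggle is reset. Per the commit message,
 * this is not critical for the other endpoint types.
 */
static inline bool
needs_drop_for_toggle_reset(uint8_t bm_attributes)
{
	switch (bm_attributes & XFERTYPE_MASK) {
	case XFER_BULK:
	case XFER_INTERRUPT:
		return (true);
	default:
		return (false);
	}
}
```

The mask matters because isochronous endpoints carry synchronization and usage bits in the upper part of bmAttributes, which must not influence the decision.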
Glen: OK. stable/13 is updated now. --HPS
Thank you. Provided there is no obvious fallout, let's continue with your original plan to include this in releng/13.1 in 10-ish hours. Please remember to send the request for approval to re@ following the change request guidelines.
(In reply to Mark Millard from comment #86) The messages that I've reported getting for USB 3.1 ports when the NVMe SSD USB3 devices are plugged in:
uhub_reattach_port: port 2 U1 timeout failed, error=USB_ERR_IOERROR
uhub_reattach_port: port 2 U2 timeout failed, error=USB_ERR_IOERROR
seem to be tied to code that looks like (U1 case shown):
    case C(UR_SET_FEATURE, UT_WRITE_CLASS_OTHER):
        i = index >> 8;
        index &= 0x00FF;
        if ((index < 1) || (index > sc->sc_noport)) {
            err = USB_ERR_IOERROR;
            goto done;
        }
        port = XHCI_PORTSC(index);
        v = XREAD4(sc, oper, port) & ~XHCI_PS_CLEAR;
        switch (value) {
        case UHF_PORT_U1_TIMEOUT:
            if (XHCI_PS_SPEED_GET(v) != 4) {
                err = USB_ERR_IOERROR;
                goto done;
            }
So it seems to be not getting the speed-mode it expects and it treats that as an error status. I've no clue if the speed-mode should be guaranteed as the code suggests at the point of plugging an NVMe SSD into a USB 3.1 port or not. But it sure looks like a distinct issue from the original Bugzilla submittal.
Mark: Could you print XHCI_PS_SPEED_GET(v) ? Likely it the check should be < 4 . --HPS
s/ it //
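To make the suggested change concrete, here is a minimal sketch comparing the strict check quoted above (speed must equal 4) with a relaxed one that also accepts higher speed IDs. It assumes the xHCI specification's PORTSC layout, with the Port Speed field in bits 13:10, and the default speed IDs where 4 is SuperSpeed and 5 is SuperSpeedPlus. The macro and function names are local to this sketch and are not the driver's own.

```c
#include <stdbool.h>
#include <stdint.h>

/* Port Speed field of PORTSC, bits 13:10 (assumed per the xHCI spec). */
#define	PORTSC_SPEED_GET(v)	(((v) >> 10) & 0xFu)
#define	SPEED_ID_SUPER		4u

/* Strict form: only speed ID 4 may set U1/U2 timeouts. */
static inline bool
u1u2_allowed_strict(uint32_t portsc)
{
	return (PORTSC_SPEED_GET(portsc) == SPEED_ID_SUPER);
}

/* Relaxed form: speed ID 4 and anything newer may set U1/U2 timeouts. */
static inline bool
u1u2_allowed_relaxed(uint32_t portsc)
{
	return (PORTSC_SPEED_GET(portsc) >= SPEED_ID_SUPER);
}
```

Under these assumptions, a port reporting speed ID 5 is rejected by the strict form and accepted by the relaxed one, which would be consistent with the IOERROR appearing only on the USB 3.1 ports.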
Created attachment 233705 [details] Patch to fix U1/U2 IOERROR issue Mark: Can you test this patch, and see if the U1/U2 port timeout errors go away? --HPS
(In reply to Hans Petter Selasky from comment #98) ThreadRipper 1950X updated to be based on main-n255160-9a3583bfbd17 with the U1/U2 IOERROR related patch: I got no odd messages from testing this context. Things look to be working.
*** Bug 263661 has been marked as a duplicate of this bug. ***
A commit in branch stable/12 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=11a732b280319e2babb5d575d14e89e12127d06a commit 11a732b280319e2babb5d575d14e89e12127d06a Author: Hans Petter Selasky <hselasky@FreeBSD.org> AuthorDate: 2022-05-03 16:10:49 +0000 Commit: Hans Petter Selasky <hselasky@FreeBSD.org> CommitDate: 2022-05-04 07:28:46 +0000 xhci(4): Always add and evaluate the slot context. Because the maximum number of endpoint contexts is stored there. Tested by: ehaupt@ PR: 262882 Sponsored by: NVIDIA Networking (cherry picked from commit 09dd1adfa4c9bb1b49f4ef5524a308732883e132) sys/dev/usb/controller/xhci.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
A commit in branch stable/12 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=473c925e4359f79224374911cdeb1477bf1ef939 commit 473c925e4359f79224374911cdeb1477bf1ef939 Author: Hans Petter Selasky <hselasky@FreeBSD.org> AuthorDate: 2022-05-03 16:09:17 +0000 Commit: Hans Petter Selasky <hselasky@FreeBSD.org> CommitDate: 2022-05-04 07:28:41 +0000 xhci(4): Only drop BULK and INTERRUPT endpoints to reset data toggle. Only drop BULK and INTERRUPT endpoints, to reset the data toggle, because for other endpoint types this is not critical. Tested by: ehaupt@ PR: 262882 Sponsored by: NVIDIA Networking (cherry picked from commit e276d281503160ba3648bd394cde95736ee53329) sys/dev/usb/controller/xhci.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-)
A commit in branch stable/11 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=a1ec8baee5dd0cb8e344ab0e2feafcf49f4a802a commit a1ec8baee5dd0cb8e344ab0e2feafcf49f4a802a Author: Hans Petter Selasky <hselasky@FreeBSD.org> AuthorDate: 2022-05-03 16:09:17 +0000 Commit: Hans Petter Selasky <hselasky@FreeBSD.org> CommitDate: 2022-05-04 07:30:07 +0000 xhci(4): Only drop BULK and INTERRUPT endpoints to reset data toggle. Only drop BULK and INTERRUPT endpoints, to reset the data toggle, because for other endpoint types this is not critical. Tested by: ehaupt@ PR: 262882 Sponsored by: NVIDIA Networking (cherry picked from commit e276d281503160ba3648bd394cde95736ee53329) sys/dev/usb/controller/xhci.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-)
A commit in branch stable/11 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=cacb5f3ea5d39d9ee02e6f278993fb1b308ca9ba commit cacb5f3ea5d39d9ee02e6f278993fb1b308ca9ba Author: Hans Petter Selasky <hselasky@FreeBSD.org> AuthorDate: 2022-05-03 16:10:49 +0000 Commit: Hans Petter Selasky <hselasky@FreeBSD.org> CommitDate: 2022-05-04 07:30:12 +0000 xhci(4): Always add and evaluate the slot context. Because the maximum number of endpoint contexts is stored there. Tested by: ehaupt@ PR: 262882 Sponsored by: NVIDIA Networking (cherry picked from commit 09dd1adfa4c9bb1b49f4ef5524a308732883e132) sys/dev/usb/controller/xhci.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
(In reply to Mark Millard from comment #99) I have updated the ThreadRipper 1950X bectl environment for main to:
# ~/fbsd-based-on-what-commit.sh -C /usr/main-src/
branch: main
merge-base: a1c0442b418b39e57d287750147b0aeae5140766
merge-base: CommitDate: 2022-05-04 07:26:39 +0000
a1c0442b418b (HEAD -> main, freebsd/main, freebsd/HEAD) xhci(4): Tweak USB port speed checks to allow newer super speed generations.
n255163 (--first-parent --count for merge-base)
in order to pick up the commits for the U1/U2 IOERROR issue. (Looks like it will be a week or so for the stable/13 context to have commits available.)
A commit in branch releng/13.1 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=340ed8ccb576e74e0cc8e5f1e8e3bbabbe53f090 commit 340ed8ccb576e74e0cc8e5f1e8e3bbabbe53f090 Author: Hans Petter Selasky <hselasky@FreeBSD.org> AuthorDate: 2022-05-03 16:09:17 +0000 Commit: Hans Petter Selasky <hselasky@FreeBSD.org> CommitDate: 2022-05-04 07:20:46 +0000 xhci(4): Only drop BULK and INTERRUPT endpoints to reset data toggle. Only drop BULK and INTERRUPT endpoints, to reset the data toggle, because for other endpoint types this is not critical. While at it fix some whitespace. Tested by: ehaupt@ PR: 262882 Approved by: re (gjb, early MFC) Sponsored by: NVIDIA Networking (cherry picked from commit e276d281503160ba3648bd394cde95736ee53329) (cherry picked from commit 610528736f3f0bf51f990dd93c5061a7a437e519) sys/dev/usb/controller/xhci.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-)
A commit in branch releng/13.1 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=465c5bd88e64852b39d711ea3e565edc90d65210 commit 465c5bd88e64852b39d711ea3e565edc90d65210 Author: Hans Petter Selasky <hselasky@FreeBSD.org> AuthorDate: 2022-05-03 16:10:49 +0000 Commit: Hans Petter Selasky <hselasky@FreeBSD.org> CommitDate: 2022-05-04 07:20:54 +0000 xhci(4): Always add and evaluate the slot context. Because the maximum number of endpoint contexts is stored there. Tested by: ehaupt@ PR: 262882 Approved by: re (gjb, early MFC) Sponsored by: NVIDIA Networking (cherry picked from commit 09dd1adfa4c9bb1b49f4ef5524a308732883e132) (cherry picked from commit 6d8c6b24ee0a0416204356a98e4e7606489894c5) sys/dev/usb/controller/xhci.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
(In reply to Hans Petter Selasky from comment #98) Hi Hans, Thanks for all the bug fixes. Mark.
Thank You HPS!! :-)
I'm seeing the same behavior on my T480s with a ThinkPad Thunderbolt 3 Dock Gen 2. It's running a very new 14-CURRENT: FreeBSD geroi 14.0-CURRENT FreeBSD 14.0-CURRENT #1 main-n255577-586ed321068: Wed May 11 19:48:41 CEST 2022 debdrup@geroi:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64. I'll be attaching a log file with USB_DEBUG in GENERIC (it's there by default), and hw.usb.debug=1, in the hopes that can help shed some light on things. I don't mind testing patches, if it helps this get resolved. :)
Whoops, I hadn't realised there was a limit for attachments on BugZilla, so the log file is here instead: https://people.freebsd.org/~debdrup/usb-hub-falloff.txt
(In reply to Daniel Ebdrup Jensen from comment #110) It may be that anything involving Thunderbolt capable hardware is considered a separate type of issue. It might also be that anything involving Thunderbolt capable hardware that appears to work is not by design at this point for FreeBSD. In other words, so far as I know, FreeBSD does not claim to support Thunderbolt capable hardware at this time, not that I'm an expert on such FreeBSD issues or anything.
(In reply to Mark Millard from comment #112) Thunderbolt, in so far as presenting a display device, works (both according to the log mentioned above, and Wayland/Sway automatically outputting to it when the dock is connected):
May 11 22:15:09 geroi kernel: [352.126842] <6>[drm] Connector DP-3: get mode from tunables:
May 11 22:15:09 geroi kernel: [352.126874] <6>[drm] - kern.vt.fb.modes.DP-3
May 11 22:15:09 geroi kernel: [352.126897] <6>[drm] - kern.vt.fb.default_mode
May 11 22:15:09 geroi kernel: [352.127023] <6>[drm] Connector DP-4: get mode from tunables:
May 11 22:15:09 geroi kernel: [352.127048] <6>[drm] - kern.vt.fb.modes.DP-4
May 11 22:15:09 geroi kernel: [352.127066] <6>[drm] - kern.vt.fb.default_mode
The log is also full of USB device attaches, so clearly that part is working too. I've also had several other FreeBSD developers tell me that USB and video work via Thunderbolt. It should also be mentioned that sometimes USB devices briefly appear to work (i.e. I can move the mouse and type on the keyboard, but they stop working when the USB hub disconnects), but I can't find a consistent pattern in making it work for a brief amount of time, so it's hard to replicate. During one of these times when it was working for a little while, I was even able to set hw.snd.default_unit=3 and get audio playing on the speakers connected to the dock. Question for those that know better than I: Does it matter that it's a series of chained hubs, and/or could this be part of the problem?
(In reply to Daniel Ebdrup Jensen from comment #113) I was implicitly assuming USB4/Thunderbolt4, where the likes of USB3.2 is tunneled, rather than direct, if I understand right. Nothing analogous to DisplayPort Alt Mode for USB3.2 so far as I can tell. It may well be that one needs to make clear distinctions about which one(s) of Thunderbolt 1, 2, 3, or 4 is expected as being supported or under discussion. I'm not sure generic "Thunderbolt" references work all that well. Sorry for not being more explicit (even if I was wrong anyway). If I understand right, Thunderbolt 3 is like Thunderbolt 4 for the likes of USB 3.2 (well, at the time 3.1): tunneled. But all of this is just from a little reading, not any implementation involvement. I've been guessing that various issues would be visible to FreeBSD and have to be managed somewhat explicitly for Thunderbolt 3, 4, and USB4 relative to handling the likes of USB3.2 . But I'd be happy to be guessing incorrectly.
Hello everyone. Do you think my problem is related to this bug? My USB driver is ehci, not xhci. 13.0-p11 running on a Dell r720xd. APC UPS connected via USB cable. During boot this device always fails to init:
=====
uhub4 numa-domain 0 on uhub2
uhub4: <vendor 0x0424 product 0x2512, class 9/0, rev 2.00/b.b3, addr 3> on usbus0
uhub4: MTT enabled
uhub4: 1 port with 1 removable, self powered
usb_alloc_device: set address 4 failed (USB_ERR_STALLED, ignored)
usbd_setup_device_desc: getting device descriptor at addr 4 failed, USB_ERR_STALLED
usbd_req_re_enumerate: addr=4, set address failed! (USB_ERR_STALLED, ignored)
usbd_setup_device_desc: getting device descriptor at addr 4 failed, USB_ERR_STALLED
usbd_req_re_enumerate: addr=4, set address failed! (USB_ERR_STALLED, ignored)
usbd_setup_device_desc: getting device descriptor at addr 4 failed, USB_ERR_STALLED
usbd_req_re_enumerate: addr=4, set address failed! (USB_ERR_STALLED, ignored)
usbd_setup_device_desc: getting device descriptor at addr 4 failed, USB_ERR_STALLED
usbd_req_re_enumerate: addr=4, set address failed! (USB_ERR_STALLED, ignored)
Root mount waiting for: usbus0
usbd_setup_device_desc: getting device descriptor at addr 4 failed, USB_ERR_STALLED
ugen0.4: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device
Root mount waiting for: usbus0
ugen0.4: <no manufacturer Gadget USB HUB> at usbus0
=====
But after boot, if I re-insert the cable, it works OK:
====
ugen0.7: <American Power Conversion Back-UPS CS 650 FW:915.R1 .I USB FW:R1> at usbus0
====
Can you try a 13-stable kernel as well?
# Use:
usbconfig -d X.Y reset
# to reset the parent USB HUB, to see if it gets recognized.
--HPS
(In reply to Hans Petter Selasky from comment #116) No, I'm planning to try 13.1. Does that make sense? It is a production server and I'm afraid to use non-release branches on it. Also, I see in the discussion that the problem may be related to the non-standard speed of the device. This device has a non-standard speed as well. ugen0.7: <American Power Conversion Back-UPS CS 650 FW:915.R1 .I USB FW:R1> at usbus0, cfg=0 md=HOST spd=FULL (12Mbps) pwr=ON (24mA)
(In reply to Hans Petter Selasky from comment #116) I'm sorry, I misread your comment. I will try 13-stable tomorrow. I meant that I won't be able to put 13-stable on as the main system.
Ivan: Just build and install a 13-stable kernel as of today. Reboot and see what happens. Then you can restore the old kernel from /boot/kernel.old . Or copy it somewhere else to be safe. There are also some debug symbols /usr/lib/debug/boot/ which might need to be restored the same way. --HPS
(In reply to Hans Petter Selasky from comment #119) Building world & kernel. But this morning I rebooted the system without changing anything, and it worked during the boot.
(In reply to Hans Petter Selasky from comment #119) I can't reproduce it anymore. It failed consistently after each reboot for at least a few months. Now two reboots in a row work fine. Nothing changed in the system config. Thank you for this magic :)
Triage: de-tag the summary line. <https://wiki.freebsd.org/Bugzilla/DosAndDonts>
To dch: is this still a problem?