237666 – repeated messages of "uhub_reattach_port: giving up port reset - device vanished"

Bug 237666 - repeated messages of "uhub_reattach_port: giving up port reset - device vanished"

Summary: repeated messages of "uhub_reattach_port: giving up port reset - device vanis...

Status:	In Progress

Alias:	None

Product:	Base System
Classification:	Unclassified
Component:	usb (show other bugs)
Version:	13.2-RELEASE
Hardware:	Any Any

Importance:	--- Affects Some People
Assignee:	freebsd-usb (Nobody)

URL:
Keywords:	needs-qa, regression

Depends on:
Blocks:

Reported:	2019-04-30 08:22 UTC by schmidt
Modified:	2024-02-29 14:47 UTC (History)
CC List:	32 users (show)

See Also:	235625

Attachments
dmesg.boot from the maschin (91.15 KB, text/plain) 2019-04-30 08:22 UTC, schmidt	no flags	Details
dmesg.boot with stable kernel an verbose on (91.66 KB, text/plain) 2019-04-30 10:28 UTC, schmidt	no flags	Details
dmesg.boot with stable kernel an verbose off (86.24 KB, text/plain) 2019-04-30 10:29 UTC, schmidt	no flags	Details
dmesg.boot from the same machine running 11.2 (54.29 KB, text/plain) 2019-05-15 06:07 UTC, schmidt	no flags	Details
dmesg.boot 11.2 with Lagacy USB Off an hw.usb.uhub.debug=17 set (93.10 KB, text/plain) 2019-05-15 09:27 UTC, schmidt	no flags	Details
12.0 boot messages on Gigabyte motherboard (38.60 KB, text/plain) 2019-08-07 08:18 UTC, Mike Karels	no flags	Details
Patch for XHCI (1.64 KB, patch) 2019-08-09 16:49 UTC, Hans Petter Selasky	no flags	Details \| Diff
devinfo -r (9.33 KB, text/plain) 2019-08-11 17:47 UTC, Mike Karels	no flags	Details
dmesg from FreeBSD 11.3 on Dell 8930 PC (10.63 KB, text/plain) 2019-11-22 16:26 UTC, Gerard Seibert	no flags	Details
pciconf -lv from Gigabyte system (19.98 KB, text/plain) 2019-12-08 21:58 UTC, Mike Karels	no flags	Details
Dmesg with hw.usb.xhci.debug=1 (93.26 KB, text/plain) 2019-12-23 05:44 UTC, Felix Johnson	no flags	Details
Complete 'dmidecode' output for Dell XPS 8930 (28.34 KB, text/plain) 2020-04-04 13:27 UTC, Gerard Seibert	no flags	Details
FreeBSD 11.3 boot log (20.40 KB, text/plain) 2020-04-26 18:37 UTC, Gerard Seibert	no flags	Details
PCI debug patch (587 bytes, patch) 2020-12-10 12:03 UTC, Hans Petter Selasky	no flags	Details \| Diff
usb: uhub return error from uhub_explore (5.05 KB, patch) 2021-05-07 22:11 UTC, Bjoern A. Zeeb	no flags	Details \| Diff
dmesg.boot from Marco (17.85 KB, text/plain) 2023-05-19 10:43 UTC, Marco Beishuizen	no flags	Details
Show Obsolete (1) View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description schmidt 2019-04-30 08:22:45 UTC

Created attachment 204107 [details]
dmesg.boot from the maschin

I've just updated my System from FreeBSD 11.1-Release to FreeBSD 12.0-Release. 

After Booting i get every second the message 

uhub_reattach_port: giving up port reset - device vanished

I've disconnected all external usb devices (including mouse and keyboard) the messages continue. 

I've use the Hardware for at least 4 years now. No such problem in the past.

Comment 1 Hans Petter Selasky freebsd_committer

2019-04-30 09:06:54 UTC

Can you try a 12-stable based kernel?

I suspect APCI changes might have turned on something in the hardware which was off before. USB changes are down to a minimum.

The only relevant change I can think of is this one:
https://svnweb.freebsd.org/changeset/base/343134

--HPS

Comment 2 schmidt 2019-04-30 10:27:58 UTC

I've build a stable kernel and booted with it. 

The Problems are still there. 


If it's only the messages i could live with that but my keyboard and mouse behave strangely. 

I about 30 sec after boot mouse and keyboard are stuck and only disconnecting the keyboard revive keyboard and mouse.

X11 lags sometimes. I think it's the usb resets the freeze everything. 

I have the dmesg from the stable kernel attached. 

If anything under stable the problems with keyboard are more pronounced. Keys repeating or missing without reason.

So I'm back with the release kernel.

Regards 
  Estartu

Comment 3 schmidt 2019-04-30 10:28:51 UTC

Created attachment 204110 [details]
dmesg.boot with stable kernel an verbose on

Comment 4 schmidt 2019-04-30 10:29:24 UTC

Created attachment 204111 [details]
dmesg.boot with stable kernel an verbose off

Comment 5 srlemke 2019-05-14 14:43:37 UTC

Just to pump this thread a little more:

<slemke@besta>/home/slemke/dropbox # dmesg |grep vanished |wc -l
815

<root@besta>/root # pciconf -lv |grep -i usb
    device     = '100 Series/C230 Series Chipset Family USB 3.0 xHCI Controller'
    subclass   = USB
    device     = 'DSL6340 USB 3.1 Controller [Alpine Ridge]'
    subclass   = USB
<root@besta>/root # 

So this is a very interesting scenario and will probably very difficult to debug and solve:

I have one of these portable docking station(works fine on linux):
https://www.czc.cz/dell-dokovaci-stanice-wd15-130w/187601/produkt - its and usb-c docking station.

On system boot it boot-locks with the message:
"uhub_reattach_port: giving up port reset - device vanished" - if I quickly unplug and replug the usb-c the boot proceeds and images appears on the 2 screens attached to the docking station. No regular USB2 are working on the docking station (only the 2 monitors) - Only the screns are correctly detected.

<root@besta>/home/slemke # uname -a
FreeBSD besta 12.0-RELEASE-p3 FreeBSD 12.0-RELEASE-p3 GENERIC  amd64
<root@besta>/home/slemke #

Comment 6 schmidt 2019-05-15 06:07:46 UTC

Created attachment 204382 [details]
dmesg.boot from the same machine running 11.2

There was no working with 12.0 and this bug. So i downgraded back to 11.2. 

There are some USB errors there as well but no repeated errors and no ill effects.

Comment 7 schmidt 2019-05-15 06:17:09 UTC

I've the same USB Controller chip as srlemke@protonmail.com but mine is on the Mainboard, because the error is still there when all USB devices are disconnected. 

xhci0@pci0:6:0:0:       class=0x0c0330 card=0x11112222 chip=0x15b68086 rev=0x00 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'DSL6540 USB 3.1 Controller [Alpine Ridge]'
    class      = serial bus
    subclass   = USB

Comment 8 Hans Petter Selasky freebsd_committer

2019-05-15 07:43:11 UTC

Can you try to figure out if anything is connected to the failing port?

Comment 9 schmidt 2019-05-15 08:00:11 UTC

The error is there even if nothing is connected to any USB-Port.

Comment 10 Hans Petter Selasky freebsd_committer

2019-05-15 08:20:11 UTC

And you're sure this is no internal USB device?

--HPS

Comment 11 schmidt 2019-05-15 08:37:44 UTC

I have tested is without any device connected. No keyboard, no mouse, no external HUB. Nothing and the error is still there.

Comment 12 Hans Petter Selasky freebsd_committer

2019-05-15 08:47:45 UTC

I understand, but sometimes the vendor wire internal devices to the USB ports, like bluetooth and BIOS related features.

BTW: Have you tried playing with BIOS legacy USB support ON/OFF?

Comment 13 Hans Petter Selasky freebsd_committer

2019-05-15 09:12:18 UTC

You might want to set:
sysctl hw.usb.uhub.debug=17

When collecting dmesg from the failing case.

Comment 14 schmidt 2019-05-15 09:27:49 UTC

Created attachment 204388 [details]
dmesg.boot 11.2 with Lagacy USB Off an hw.usb.uhub.debug=17 set

Disabling Legacy USB support didn't change anything. Therefore i booted with hw.usb.uhub.debug=17 set. The Buffer for dmesg isn't large enough for the whole kernel messages. so only the last part are here but the USB Errors are in the file.

Comment 15 Mike Karels freebsd_committer

2019-08-07 08:18:33 UTC

Created attachment 206324 [details]
12.0 boot messages on Gigabyte motherboard

I just switched from 11.3 to 12.0, and also have this problem since upgrade on a system with a Gigabyte motherboard.  I attached a dmesg.  I note that I am also losing communication with a UPS (via USB) every 10 minutes, which is also new.  I have lots of USB devices attached; I have not attempted to remove them, etc, but can do so. I can also provide 11.3 dmesgs.

Comment 16 Hans Petter Selasky freebsd_committer

2019-08-07 08:52:49 UTC

Does it help to set:

sysctl hw.usb.disable_enumeration=1

After boot. Does the UPS still work?

I suspect the issue has nothing to do with USB, but likely early AP startup (timing issue, can be disabled in kernel build) or ACPI's recent USB support.

1) Try removing this option from kernel config:
sys/amd64/conf/GENERIC:options 	EARLY_AP_STARTUP

2) Try booting with the following line added to /boot/loader.conf:

debug.acpi.disabled="usb"

--HPS

Comment 17 Mike Karels freebsd_committer

2019-08-07 12:21:49 UTC

Setting sysctl hw.usb.disable_enumeration=1 makes the messages stop. The UPS does not appear to work; I stopped apcupsd, and apctest cannot open the device. Clearing the sysctl makes the messages start again, but apctest does not work. Unfortunately, I did not try apctest before setting the sysctl.

Booting with debug.acpi.disabled="usb" did not seem to change anything: the messages continued, apcupsd complained about losing communication, and apctest didn't work.  hw.usb.disable_enumeration=1 still stopped the messages, but apcupsd still complains about lost communiction.  I can test the rest this evening.

Comment 18 Mike Karels freebsd_committer

2019-08-08 05:00:37 UTC

Commenting out EARLY_AP_STARTUP from the kernel config did not seem to change anything: the error message still repeats, and apcupsd loses contact with the UPS (probably never has it). I forgot to change the ident, but the APs were not started until nearly the end of autoconfiguration, just before attaching cd0 and mounting root.  The error messages start after that, after finding several more USB devices.  From messages:

Aug  7 21:43:31 pughole kernel: Launching APs: 1 10 3 8 9 11 5 7 4 2 6
Aug  7 21:43:31 pughole kernel: cd0 at ahcich1 bus 0 scbus0 target 0 lun 0
Aug  7 21:43:31 pughole kernel: cd0: <HL-DT-ST BD-RE  WH14NS40 1.00> Removable C
D-ROM SCSI device
Aug  7 21:43:31 pughole kernel: cd0: Serial Number SIK9IFALJ462
Aug  7 21:43:31 pughole kernel: cd0: 150.000MB/s transfers (SATA 1.x, UDMA6, ATA
PI 12bytes, PIO 8192bytes)
Aug  7 21:43:31 pughole kernel: cd0: Attempt to query device size failed: NOT RE
ADY, Medium not present - tray closed
Aug  7 21:43:31 pughole kernel: Trying to mount root from ufs:/dev/nvd0p5 [rw]..
.
Aug  7 21:43:31 pughole kernel: uhub0: 4 ports with 4 removable, self powered
Aug  7 21:43:31 pughole kernel: ZFS filesystem version: 5
Aug  7 21:43:31 pughole kernel: ZFS storage pool version: features support (5000
)
Aug  7 21:43:31 pughole kernel: uhub2: 2 ports with 2 removable, self powered
Aug  7 21:43:31 pughole kernel: uhub3: 2 ports with 2 removable, self powered
Aug  7 21:43:31 pughole kernel: uhub1: 21 ports with 21 removable, self powered
Aug  7 21:43:31 pughole kernel: uhub_reattach_port: giving up port reset - device vanished
Aug  7 21:43:31 pughole kernel: ugen1.2: <vendor 0x045b product 0x0209> at usbus1
Aug  7 21:43:31 pughole kernel: uhub4 numa-domain 0 on uhub1
Aug  7 21:43:31 pughole kernel: uhub4: <vendor 0x045b product 0x0209, class 9/0, rev 2.10/1.00, addr 1> on usbus1

Any reason to try these in combination?

Comment 19 Hans Petter Selasky freebsd_committer

2019-08-08 06:33:40 UTC

Hi,

Can you collect logs from FreeBSD 12 with:

sysctl hw.usb.uhub.debug=17

Just let it run for 12 minutes. According to your logs it takes around 10 minutes for the cycle before UPS detaches.

--HPS

Comment 20 Hans Petter Selasky freebsd_committer

2019-08-08 06:37:06 UTC

Aug  6 18:07:20 pughole kernel: usbus0 numa-domain 0 on xhci0
Aug  6 18:07:20 pughole kernel: usbus0: 5.0Gbps Super Speed USB v3.0

I'm a little surprised that your xhci has 21 ports on the root-HUB. That seems a lot. Maybe 5 is more correct. Can you check this?

--HPS

Comment 21 Mike Karels freebsd_committer

2019-08-08 07:11:51 UTC

I'll collect the logs tomorrow.  About the 21 ports: 11.3 reported the same thing, but I agree that the number sounds suspicious.  Does a verbose boot provide any information about that?

Comment 22 Hans Petter Selasky freebsd_committer

2019-08-08 07:20:13 UTC

Hi,

I think bootverbose won't help.

hw.usb.uhub.debug will reveal more information about which ports are active and which are causing problems. Currently xhci.debug is a last resort, because it is very verbose.

Thanks for testing. Lets try to figure out the reason for this "giving up port reset" failure, which also happens on 11.x - right.

--HPS

Comment 23 Mike Karels freebsd_committer

2019-08-08 23:39:20 UTC

The output from the time with debug enabled is too big to attach. ftp://ftp.karels.net/outgoing/messages.usb_debug should work (normal anonymous access).

Comment 24 Hans Petter Selasky freebsd_committer

2019-08-09 07:29:20 UTC

Can you repeat with hw.usb.xhci.debug=17. It looks like the XHCI port status register is returning all ones, which is strange.

Comment 25 Mike Karels freebsd_committer

2019-08-09 13:26:50 UTC

I wasn't sure about hub_debug, so I did both.  ftp://ftp.karels.net/outgoing/messages.xhci_debug.gz (it's 14M without compression).

Comment 26 Hans Petter Selasky freebsd_committer

2019-08-09 16:38:52 UTC

Aug  9 07:47:30 pughole kernel: xhci_roothub_exec: port status=0x000802a0
Aug  9 07:47:30 pughole kernel: uhub_read_port_status: port 21, wPortStatus=0x07a0, wPortChange=0x0020, err=USB_ERR_NORMAL_COMPLETION
Aug  9 07:47:31 pughole kernel: xhci_roothub_exec: type=0xa3 request=0x00 wLen=0x0004 wValue=0x0000 wIndex=0x0003
Aug  9 07:47:31 pughole kernel: xhci_roothub_exec: UR_GET_STATUS i=3
Aug  9 07:47:31 pughole kernel: xhci_roothub_exec: port status=0xffffffff
Aug  9 07:47:31 pughole kernel: uhub_read_port_status: port 3, wPortStatus=0x07fb, wPortChange=0x00fb, err=USB_ERR_NORMAL_COMPLETION
Aug  9 07:47:31 pughole kernel: uhub_reattach_port: giving up port reset - device vanished
Aug  9 07:47:31 pughole kernel: xhci_roothub_exec: type=0x23 request=0x01 wLen=0x0000 wValue=0x0001 wIndex=0x0003
Aug  9 07:47:31 pughole kernel: xhci_roothub_exec: UR_CLEAR_PORT_FEATURE


Here it is "port status=0xffffffff" which is a totally off-track XHCI PCI read value.

I'll try to make a patch you can try. This looks like a low-level hardware PCI issue.

Comment 27 Hans Petter Selasky freebsd_committer

2019-08-09 16:49:00 UTC

Created attachment 206399 [details]
Patch for XHCI

Can you repeat XHCI logging with this patch?

Comment 28 Mike Karels freebsd_committer

2019-08-10 16:55:14 UTC

ftp://ftp.karels.net/outgoing/messages.xhci_retry.gz.  Oddly, port status is still sometimes -1, sometimes a more reasonable value, and they are in clusters.

Comment 29 Mike Karels freebsd_committer

2019-08-10 17:16:27 UTC

Hmm, I see that there are two xhci controllers.  One is on pci7, and I see this:

pcib7: <PCI-PCI bridge> at device 2.0 numa-domain 0 on pci4
pcib7: failed to allocate initial I/O port window: 0-0xfff
pcib7: failed to allocate initial memory window: 0-0xfffff
pcib7: failed to allocate initial prefetch window: 0-0xfffff
pci7: <PCI bus> numa-domain 0 on pcib7
xhci0: <XHCI (generic) USB 3.0 controller> mem 0xd8000000-0xd800ffff at device 0
.0 numa-domain 0 on pci7
xhci0: 32 bytes context size, 64-bit DMA
usbus0 numa-domain 0 on xhci0

and

pcib1: <ACPI Host-PCI bridge> port 0xcf8-0xcff numa-domain 0 on acpi0
pci1: <ACPI PCI bus> numa-domain 0 on pcib1
...
pci1: <unknown> at device 17.0 (no driver attached)
xhci1: <Intel Wellsburg USB 3.0 controller> mem 0xf6120000-0xf612ffff irq 19 at device 20.0 numa-domain 0 on pci1
xhci1: 32 bytes context size, 64-bit DMA
usbus1 numa-domain 0 on xhci1
pci1: <simple comms> at device 22.0 (no driver attached)

I'm also not using any USB3 device regularly, so I don't know if the misbehaving controller is functioning at all.

The messages above are similar with 11.3.

Comment 30 Hans Petter Selasky freebsd_committer

2019-08-10 17:46:21 UTC

Hi Konstatin,

We are seeing a 32-bit PCI read resulting in -1U. This is a regression issue in more recent versions of FreeBSD. Can you comment on these failure messages:

--HPS

> Hmm, I see that there are two xhci controllers.  One is on pci7, and I see this:
> 
> pcib7: <PCI-PCI bridge> at device 2.0 numa-domain 0 on pci4
> pcib7: failed to allocate initial I/O port window: 0-0xfff
> pcib7: failed to allocate initial memory window: 0-0xfffff
> pcib7: failed to allocate initial prefetch window: 0-0xfffff
> pci7: <PCI bus> numa-domain 0 on pcib7
> xhci0: <XHCI (generic) USB 3.0 controller> mem 0xd8000000-0xd800ffff at device 0
> .0 numa-domain 0 on pci7
> xhci0: 32 bytes context size, 64-bit DMA
> usbus0 numa-domain 0 on xhci0
> 
> and
> 
> pcib1: <ACPI Host-PCI bridge> port 0xcf8-0xcff numa-domain 0 on acpi0
> pci1: <ACPI PCI bus> numa-domain 0 on pcib1
> ...
> pci1: <unknown> at device 17.0 (no driver attached)
> xhci1: <Intel Wellsburg USB 3.0 controller> mem 0xf6120000-0xf612ffff irq 19 at device 20.0 numa-domain 0 on pci1
> xhci1: 32 bytes context size, 64-bit DMA
> usbus1 numa-domain 0 on xhci1
> pci1: <simple comms> at device 22.0 (no driver attached)
> 
> I'm also not using any USB3 device regularly, so I don't know if the misbehaving controller is functioning at all.
> 
> The messages above are similar with 11.3.

Comment 31 Hans Petter Selasky freebsd_committer

2019-08-10 17:47:10 UTC

Mike, can you repeat when you started seeing this bug?

Comment 32 Mike Karels freebsd_committer

2019-08-10 18:27:49 UTC

Yes, I started seeing these when I upgraded from FreeBSD 11.3 (RELEASE) to 12.0-RELEASE-p9 about a week ago.  I have /var/log/messages from 11.3 for comparison. btw, I added your xhci patch to the system with no EARLY_AP_STARTUP, although it probably doesn't matter.  I'll note that 12.0 is older than 11.3 in various ways; I should probably try 12-stable.

fwiw, this is a Gigabyte GA-X99P-SLI motherboard (rev. 1.0) with Intel Core i7-5820K, 6 cores plus hyperthreading, several years old, and has run many versions of FreeBSD.  I believe the chipset is an Intel X99; I could research the details.  I might be able to reboot 11.3 if it mattered (it's on an alternate partition, but I don't know how to boot it using UEFI).

Thanks for looking into this!

Comment 33 Mike Karels freebsd_committer

2019-08-10 20:21:48 UTC

An up-to-date GENERIC 12-stable kernel has the problem also.

Comment 34 Konstantin Belousov freebsd_committer

2019-08-11 08:52:45 UTC

(In reply to Hans Petter Selasky from comment #30)
Minus one reads mean that there is no more a device to respond at the address read.  I have no idea why, and there seems to be the mix of several scenarios of failure.  At least I would not be suprised to see that if the docking station was detached, assuming the controller is inside it.

Comment 35 Mike Karels freebsd_committer

2019-08-11 13:51:32 UTC

This is not a laptop; it's a tower computer.  Also, note that the messages start even before autoconfiguration is complete.

Comment 36 Hans Petter Selasky freebsd_committer

2019-08-11 14:58:06 UTC

@kib: What about these PCI errors?

> pcib7: failed to allocate initial I/O port window: 0-0xfff
> pcib7: failed to allocate initial memory window: 0-0xfffff
> pcib7: failed to allocate initial prefetch window: 0-0xfffff

Comment 37 Hans Petter Selasky freebsd_committer

2019-08-11 15:13:44 UTC

Hi,

Can you attach output from:

devinfo -r

and

acpidump -dt

--HPS

Comment 38 Hans Petter Selasky freebsd_committer

2019-08-11 15:15:27 UTC

Adding John Baldwin aswell after searching for similar issues.

Comment 39 Konstantin Belousov freebsd_committer

2019-08-11 15:37:14 UTC

(In reply to Hans Petter Selasky from comment #36)
These 'failed to allocate' messages are the secondary effects of reading all ones.  Values of the bridge' registers to specify io and memory window to forward to subordinary buses are all ones (AKA 0xfff...), and that range conflicts with already allocated resources.

So the question is how it comes that bridges were enumerated, but accessing their PCI config space reads failing.  I am curious if manually reading e.g. pci vendor or device id using pciconf would show sane values or 1111...s.

Comment 40 Mike Karels freebsd_committer

2019-08-11 17:47:57 UTC

Created attachment 206443 [details]
devinfo -r

devinfo -r is attached; acpdidump -dt is ftp://ftp.karels.net/outgoing/acpidump-dt.

Comment 41 John Baldwin freebsd_committer

2019-08-12 16:47:54 UTC

If PCI messed up you would get -1 constantly, not sporadically.  FWIW, I have the exact same problem on my desktop since I upgraded it to 12.0.  I also use apcupsd and usually get a couple of e-mails a day due to "losing communications", but apcupsd is able to reattach when the device comes back so it still works overall aside from sending spammy e-mails.

Mine is also not a laptop, it's an Ivy Bridge desktop motherboard:

smbios.planar.maker="MSI"
smbios.planar.product="Z77A-G45 (MS-7752)"

xhci0: <Intel Panther Point USB 3.0 controller> mem 0xf7100000-0xf710ffff irq 16 at device 20.0 on pci0
xhci0: 32 bytes context size, 64-bit DMA
usbus0: waiting for BIOS to give up control
xhci0: Port routing mask set to 0xffffffff
usbus0 on xhci0
usbus0: 5.0Gbps Super Speed USB v3.0
ehci0: <Intel Panther Point USB 2.0 controller> mem 0xf7118000-0xf71183ff irq 16 at device 26.0 on pci0
usbus1: EHCI version 1.0
usbus1 on ehci0
usbus1: 480Mbps High Speed USB v2.0
...
ehci1: <Intel Panther Point USB 2.0 controller> mem 0xf7117000-0xf71173ff irq 23 at device 29.0 on pci0
usbus2: EHCI version 1.0
usbus2 on ehci1
usbus2: 480Mbps High Speed USB v2.0
...
ugen1.1: <Intel EHCI root HUB> at usbus1
ugen2.1: <Intel EHCI root HUB> at usbus2
ugen0.1: <0x8086 XHCI root HUB> at usbus0
...
uhub0: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus1
uhub1: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus2
uhub2: <0x8086 XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
uhub2: 8 ports with 8 removable, self powered
uhub0: 2 ports with 2 removable, self powered
uhub1: 2 ports with 2 removable, self powered
ugen1.2: <vendor 0x8087 product 0x0024> at usbus1
uhub3 on uhub0
uhub3: <vendor 0x8087 product 0x0024, class 9/0, rev 2.00/0.00, addr 2> on usbus1
ugen2.2: <vendor 0x8087 product 0x0024> at usbus2
uhub4 on uhub1
uhub4: <vendor 0x8087 product 0x0024, class 9/0, rev 2.00/0.00, addr 2> on usbus2
uhub3: 6 ports with 6 removable, self powered
uhub4: 8 ports with 8 removable, self powered
ugen1.3: <American Power Conversion Back-UPS RS 1000G FW:868.L3 .D USB FW:L3> at usbus1
ugen1.4: <vendor 0x0451 product 0x8043> at usbus1
uhub5 on uhub3
uhub5: <vendor 0x0451 product 0x8043, class 9/0, rev 2.10/1.00, addr 4> on usbus1
uhub5: MTT enabled
uhub5: 4 ports with 4 removable, self powered
ugen1.5: <Logitech Trackball> at usbus1
ums0 on uhub5
ums0: <Logitech Trackball, class 0/0, rev 1.10/2.20, addr 5> on usbus1
ums0: 3 buttons and [XYZ] coordinates ID=0
ugen1.6: <LITE-ON Technology USB NetVista Full Width Keyboard.> at usbus1
ukbd0 on uhub5
ukbd0: <EP1 Interrupt> on usbus1
kbd2 at ukbd0

This is a typical set of logs from /var/log/messages when the UPS bounces:

Aug  9 17:35:01 ralph kernel: ugen1.3: <American Power Conversion Back-UPS RS 10
00G FW:868.L3 .D USB FW:L3> at usbus1 (disconnected)
Aug  9 17:35:01 ralph kernel: ugen1.3: <American Power Conversion Back-UPS RS 10
00G FW:868.L3 .D USB FW:L3> at usbus1
Aug  9 17:35:01 ralph kernel: interface uhid.1 already present in the KLD 'kerne
l'!
Aug  9 17:35:01 ralph kernel: linker_load_file: /boot/kernel/uhid.ko - unsupport
ed file type
Aug  9 17:35:07 ralph apcupsd[1967]: Communications with UPS restored.

Comment 42 John Baldwin freebsd_committer

2019-08-12 16:50:06 UTC

(In reply to Hans Petter Selasky from comment #36)
Those are a red herring.  The next few lines down in dmesg show that a child device allocated resources ok which meant that pcib7 allocated fresh resources for its I/O windows.  You could verify this with 'pciconf -lB pcib7' and see that it has valid ranges for I/O and memory decoding, but the child device (xhci0) would not be able to attach successfully if we hadn't allocated valid windows in the parent bridge.

Comment 43 Konstantin Belousov freebsd_committer

2019-08-12 17:46:16 UTC

(In reply to John Baldwin from comment #42)
apcupsd loosing ugen's is nothing new.  I have this since 9.x times on very different machines, from old cedarview atoms up to the Skylake-X.  I do not think it is related.

Comment 44 John Baldwin freebsd_committer

2019-08-12 18:02:11 UTC

(In reply to Konstantin Belousov from comment #43)
Yes, but I never had a single instance while running 10.x and 11.x on this desktop.  The day I upgraded to 12.0 I started getting multiple alerts a day.

Comment 45 Mike Karels freebsd_committer

2019-09-08 20:09:40 UTC

Any additional thoughts on this?  I rebooted with my UPS disconnected, and I still get constant messages about device vanished.  Then I rebooted with *no* external USB devices connected, and I *still* get these messages immediately and constantly.  I never saw this with 11.3, and I may have to revert to it.  I'm happy to run experiments to test any theories.

Comment 46 Hans Petter Selasky freebsd_committer

2019-09-09 07:14:34 UTC

Maybe you could try to do some bisecting with the FreeBSD master branch? To find the offending commit?

--HPS

Comment 47 Hans Petter Selasky freebsd_committer

2019-09-09 07:31:16 UTC

Start with r308158 of FreeBSD head branch.

Then build kernels and add revision steps like 1 2 4 8 16 32 and so on until the issue appears. Then repeat like in binary search to find the exact revision.

--HPS

Comment 48 Mike Karels freebsd_committer

2019-09-10 01:13:49 UTC

I'll give it a try soon.

Comment 49 Hans Petter Selasky freebsd_committer

2019-09-17 14:06:27 UTC

Ping - any progress?

Comment 50 Mike Karels freebsd_committer

2019-09-17 23:21:05 UTC

I've been working on this off and on.  It looks like the repeated messages started with r330957, an ACPI update.  However, the problem with the UPS started much earlier.  I haven't bracketed that, as lots of the early systems won't boot, at least multi-user, which makes it hard to test.  It seems to be before r315272.

Comment 51 Hans Petter Selasky freebsd_committer

2019-09-18 07:17:14 UTC

You should be able to identify the ACPI update in the stable/12 branch aswell. Then use that to bisect, because head might be a bit unstable as you see :-)

--HPS

Comment 52 Hans Petter Selasky freebsd_committer

2019-09-18 09:18:56 UTC

commit 48286dea101b3aad1e566952e372a646c87a8970
Author: jkim <jkim@FreeBSD.org>
Date:   Wed Mar 14 23:45:48 2018 +0000

    Merge ACPICA 20180313.


HEAD r330957 appears to be before 12.0 was released and was never MFC'ed to FreeBSD 11. I'm adding jkim, in case he has any input on this.

--HPS

Comment 53 Hans Petter Selasky freebsd_committer

2019-09-18 09:23:39 UTC

jkim: What is the right way to add ACPI debugging?

Comment 54 Mike Karels freebsd_committer

2019-09-19 03:31:01 UTC

(In reply to Hans Petter Selasky from comment #52)
In an "svn log" that goes far enough back to see this ACPI update on stable/12, I see the same revision number as on head.  "svn update -r330957" in that tree deletes most of the kernel.  I'm not sure what this means, but I know I can't compile it.

Comment 55 Jung-uk Kim freebsd_committer

2019-09-20 20:12:40 UTC

(In reply to Hans Petter Selasky from comment #53)
Add "options ACPI_DEBUG" to kernel configuration and set debug.acpi.layer and debug.acpi.level as described in acpi(4).

https://www.freebsd.org/cgi/man.cgi?query=acpi&format=html#DEBUGGING_OUTPUT

Comment 56 Jung-uk Kim freebsd_committer

2019-09-20 22:13:35 UTC

(In reply to Mike Karels from comment #54)
I just updated stable/12 to ACPICA 20190703 (r352566).

https://svnweb.freebsd.org/changeset/base/352566

Comment 57 Mike Karels freebsd_committer

2019-09-21 01:25:38 UTC

(In reply to Jung-uk Kim from comment #56)
I built a kernel at r352566.  It still prints this error message once per second.  However, I did not get the messages from apcupsd about losing contact.  I don't know if that means if it is working better or worse.  As I am running 12.0 userland, I went back to my previous kernel.

btw, I had to add "device iflib" to GENERIC.  I'll check whether anyone has commented on that.

Comment 58 Gerard Seibert 2019-11-15 21:35:01 UTC

I am getting this exact same message when I try to install FreeBSD 12.1 onto a new Dell XPS 8930 PC. I am unable to complete the installation. This was while using a "Boot Only" CD.

Comment 59 Hans Petter Selasky freebsd_committer

2019-11-16 09:13:11 UTC

Is this a regression issue?

Comment 60 Hans Petter Selasky freebsd_committer

2019-11-16 09:15:46 UTC

Hi,

This issue likely has to do with some PCI register spuriously returning -1U . Someone has to bisect the revision history to figure out the commit that introduced it!

--HPS

Comment 61 Mike Karels freebsd_committer

2019-11-16 13:22:52 UTC

Answering questions in the last two comments:

- Yes, this is a regression.  See the description and comment 15.

- I did the bisection on head, see Comment 50.  I started to look at bisection on the 12 branch; I don't remember how far I got.

Comment 62 Mike Karels freebsd_committer

2019-11-16 19:20:37 UTC

If I'm interpreting things right, r330957 was committed on head before releng/12.0 was branched.  If I try to update releng/12.0 or stable/12 with that revision (actually the one before, r330956), svn says that the target path does not exist.

Comment 63 Gerard Seibert 2019-11-18 17:57:07 UTC

I am just wondering if there has been any movement on this issue. At this point, I am stuck with a new computer that I cannot install FreeBSD 12.1 on, which effectively leaves me with an expensive paperweight that, unfortunately, I cannot get my money back on.

Comment 64 Hans Petter Selasky freebsd_committer

2019-11-19 06:48:36 UTC

Can you try to install FreeBSD 11?

Comment 65 Gerard Seibert 2019-11-19 10:38:53 UTC

(In reply to Hans Petter Selasky from comment #64)

I hate to go backward. It goes against my basic instincts. However, when I get a chance, I will download the "FreeBSD-11.3-RELEASE-amd64-bootonly.iso" and try to install it from there. Would that be acceptable?

Comment 66 Gerard Seibert 2019-11-21 15:55:29 UTC

I can confirm that FreeBSD 11.3 installs and works fine on this PC. Obviously, the problem lies with the 12 branch. I am not a skilled programmer, but what else can I do to help? I really want to squash this bug so I can install the 12 branch.

Comment 67 Hans Petter Selasky freebsd_committer

2019-11-21 15:58:46 UTC

It is likely a regression in sys/dev/pci or ACPI code.

We need to identify the exact commit that breaks it. It would be a great help.

If you are on IRC and can offer you to build the needed kernels, and then you have to install/boot them from a FreeBSD 11 install as base.

--HPS

Comment 68 Gerard Seibert 2019-11-22 16:26:55 UTC

Created attachment 209346 [details]
dmesg from FreeBSD 11.3 on Dell 8930 PC

This is the dmesg displayed when FreeBSD 11.3 loads up. I hope it will be of some use.

Comment 69 Hans Petter Selasky freebsd_committer

2019-12-04 12:13:59 UTC

Ping - any news here? Did anyone complete bisecting?

Comment 70 Gerard Seibert 2019-12-04 13:35:10 UTC

(In reply to Hans Petter Selasky from comment #69)

I don't believe that a solution has been discovered as of yet. The really bad news is that this "bug" apparently also infects the FreeBSD 13.x branch. I am still stuck on the 11.x branch with no avenue available to upgrade my OS.

I have faith that they will find the bug and squash it.

Thought of the Day:
"Real stupidity beats artificial intelligence every time.”

Comment 71 Hans Petter Selasky freebsd_committer

2019-12-04 14:23:07 UTC

Is this hardware something that can be bought cheaply?

--HPS

Comment 72 Gerard Seibert 2019-12-04 15:55:16 UTC

(In reply to Hans Petter Selasky from comment #71)

I cannot speak for anyone else; however, in my case, this was a brand new Dell XPS 8930 PC. I purchased it as a cheap replacement for one of my old (2007) PCs that was barely limping along. Everything was stock. No fancy add-ons, etcetera. I have posted the DMESG, comment #68, previously. For the record, my "OLD" unit was running FreeBSD 12.

Comment 73 schmidt 2019-12-04 16:01:51 UTC

(In reply to Hans Petter Selasky from comment #71)
My Mainboad is a servermainboard from supermicro. So nothing realy cheap there.

Comment 74 Gerard Seibert 2019-12-04 16:21:39 UTC

While I do not believe this to be an "edge" case, I don't think it will get any serious attention until more users speak up. I know of at least one other user who has the same problem and has not posted it. I will encourage them to do so.

I do know that this problem has been reported on the Dell Community Forum. They are claiming it is FreeBSD's problem since there have been no reports of issues from users of other operating systems.

Until enough users file a complaint against this bug, I doubt that it will get any serious attention.

Comment 75 Mike Karels freebsd_committer

2019-12-05 01:46:18 UTC

(In reply to Hans Petter Selasky from comment #71)

My hardware is not especially cheap, but is also not new and might not be readily available.  It's a Gigabyte motherboard with a Haswell-E CPU.

About bisection: I bisected on head, and ended up with r330957 as the commit that started the problem (see comment #50).  It appears that that revision was not merged separately to 12, so there is nothing to bisect there.  That commit was an ACPI update.

Comment 76 Hans Petter Selasky freebsd_committer

2019-12-06 09:59:39 UTC

OK,

Can you add "options ACPI_DEBUG" to your kernel configuration file?

Then in /boot/loader.conf add:

debug.acpi.layer="ACPI_ALL_COMPONENTS ACPI_ALL_DRIVERS"
debug.acpi.level="ACPI_LV_ERROR"

The list of valid keywords are given by:
sys/dev/acpica/acpi.c

Try to correlate an ACPI event with the enumeration failure!

If no ACPI events, try to enable more ACPI debug layers/levels until you find such an event or error message.

--HPS

Comment 77 Hans Petter Selasky freebsd_committer

2019-12-06 10:16:37 UTC

If that fails, try to compare ACPI debug output before and after r330957.

--HPS

Comment 78 Mike Karels freebsd_committer

2019-12-07 15:19:15 UTC

I built a stable/12 kernel with ACPI debug.  ACPI_DEBUG by itself didn't compile; it used global variables conditional on ACPI_DEBUGGER.  I added ACPI_DEBUGGER, and added that to conf/options.

Using the suggested loader variables, there was no new output.  I don't see ACPI_LV_ERROR in acpi.c.  I tried verbosity1, and that spewed messages so fast it prevented a successful boot in 20 minutes or so.  I then tried ACPI_LV_INFO.  Here is what is logged in about a second, from one "vanished" message to the next:

Dec  7 09:09:35 pughole kernel: uhub_reattach_port: giving up port reset - devic
e vanished
Dec  7 09:09:35 pughole kernel:  exregion-0527 ExPciConfigSpaceHandle: Pci-Confi
g 0 (8) Seg(0000) Bus(0000) Dev(0003) Func(0002) Reg(00b2)
Dec  7 09:09:35 pughole kernel:  exregion-0527 ExPciConfigSpaceHandle: Pci-Confi
g 0 (8) Seg(0000) Bus(0000) Dev(0003) Func(0002) Reg(00aa)
Dec  7 09:09:35 pughole syslogd: last message repeated 3 times
Dec  7 09:09:35 pughole kernel:  exregion-0527 ExPciConfigSpaceHandle: Pci-Confi
g 1 (8) Seg(0000) Bus(0000) Dev(0003) Func(0002) Reg(00aa)
Dec  7 09:09:35 pughole kernel:  exregion-0527 ExPciConfigSpaceHandle: Pci-Confi
g 0 (8) Seg(0000) Bus(0000) Dev(0003) Func(0002) Reg(00aa)
Dec  7 09:09:35 pughole kernel:  exregion-0527 ExPciConfigSpaceHandle: Pci-Confi
g 0 (8) Seg(0000) Bus(0000) Dev(0003) Func(0003) Reg(00b2)
Dec  7 09:09:35 pughole kernel:  exregion-0527 ExPciConfigSpaceHandle: Pci-Confi
g 0 (8) Seg(0000) Bus(0000) Dev(0003) Func(0003) Reg(00aa)
Dec  7 09:09:35 pughole syslogd: last message repeated 3 times
Dec  7 09:09:35 pughole kernel:  exregion-0527 ExPciConfigSpaceHandle: Pci-Confi
g 1 (8) Seg(0000) Bus(0000) Dev(0003) Func(0003) Reg(00aa)
Dec  7 09:09:35 pughole kernel:  exregion-0527 ExPciConfigSpaceHandle: Pci-Confi
g 0 (8) Seg(0000) Bus(0000) Dev(0003) Func(0003) Reg(00aa)
Dec  7 09:09:35 pughole kernel:  exregion-0527 ExPciConfigSpaceHandle: Pci-Confi
g 0 (8) Seg(0000) Bus(0080) Dev(0000) Func(0000) Reg(00b2)
Dec  7 09:09:35 pughole kernel:  exregion-0527 ExPciConfigSpaceHandle: Pci-Confi
g 0 (8) Seg(0000) Bus(0080) Dev(0000) Func(0000) Reg(00aa)
Dec  7 09:09:35 pughole syslogd: last message repeated 3 times
Dec  7 09:09:35 pughole kernel:  exregion-0527 ExPciConfigSpaceHandle: Pci-Confi
g 1 (8) Seg(0000) Bus(0080) Dev(0000) Func(0000) Reg(00aa)
Dec  7 09:09:35 pughole kernel:  exregion-0527 ExPciConfigSpaceHandle: Pci-Confi
g 0 (8) Seg(0000) Bus(0080) Dev(0000) Func(0000) Reg(00aa)
Dec  7 09:09:36 pughole kernel:  exregion-0527 ExPciConfigSpaceHandle: Pci-Confi
g 0 (8) Seg(0000) Bus(0080) Dev(0002) Func(0000) Reg(00b2)
Dec  7 09:09:36 pughole kernel:  exregion-0527 ExPciConfigSpaceHandle: Pci-Confi
g 0 (8) Seg(0000) Bus(0080) Dev(0002) Func(0000) Reg(00aa)
Dec  7 09:09:36 pughole syslogd: last message repeated 3 times

ec  7 09:09:36 pughole kernel:  exregion-0527 ExPciConfigSpaceHandle: Pci-Confi
g 1 (8) Seg(0000) Bus(0080) Dev(0002) Func(0000) Reg(00aa)
Dec  7 09:09:36 pughole kernel:  exregion-0527 ExPciConfigSpaceHandle: Pci-Confi
g 0 (8) Seg(0000) Bus(0080) Dev(0002) Func(0000) Reg(00aa)
Dec  7 09:09:36 pughole kernel:  exregion-0527 ExPciConfigSpaceHandle: Pci-Confi
g 0 (8) Seg(0000) Bus(0080) Dev(0002) Func(0001) Reg(00b2)
Dec  7 09:09:36 pughole kernel:  exregion-0527 ExPciConfigSpaceHandle: Pci-Confi
g 0 (8) Seg(0000) Bus(0080) Dev(0002) Func(0001) Reg(00aa)
Dec  7 09:09:36 pughole syslogd: last message repeated 3 times
Dec  7 09:09:36 pughole kernel:  exregion-0527 ExPciConfigSpaceHandle: Pci-Confi
g 1 (8) Seg(0000) Bus(0080) Dev(0002) Func(0001) Reg(00aa)
Dec  7 09:09:36 pughole kernel:  exregion-0527 ExPciConfigSpaceHandle: Pci-Confi
g 0 (8) Seg(0000) Bus(0080) Dev(0002) Func(0001) Reg(00aa)
Dec  7 09:09:36 pughole kernel:  exregion-0527 ExPciConfigSpaceHandle: Pci-Confi
g 0 (8) Seg(0000) Bus(0080) Dev(0002) Func(0002) Reg(00b2)
Dec  7 09:09:36 pughole kernel:  exregion-0527 ExPciConfigSpaceHandle: Pci-Confi
g 0 (8) Seg(0000) Bus(0080) Dev(0002) Func(0002) Reg(00aa)
Dec  7 09:09:36 pughole syslogd: last message repeated 3 times
Dec  7 09:09:36 pughole kernel:  exregion-0527 ExPciConfigSpaceHandle: Pci-Confi
g 1 (8) Seg(0000) Bus(0080) Dev(0002) Func(0002) Reg(00aa)
Dec  7 09:09:36 pughole kernel:  exregion-0527 ExPciConfigSpaceHandle: Pci-Confi
g 0 (8) Seg(0000) Bus(0080) Dev(0002) Func(0002) Reg(00aa)
Dec  7 09:09:36 pughole kernel: uhub_reattach_port: giving up port reset - devic
e vanished

I have no idea if this includes any useful information.

Comment 79 Hans Petter Selasky freebsd_committer

2019-12-08 21:02:08 UTC

Can you show the output of "pciconf -lv" ?

Comment 80 Hans Petter Selasky freebsd_committer

2019-12-08 21:17:06 UTC

Are you familiar with dtrace. Could you get the backtrace of these function calls?

kldload dtraceall

cat << EOF > dtrace.d 
#!/usr/sbin/dtrace -s

fbt::$1:entry
{
        @[probefunc, stack()] = count();
}
EOF

chmod +x dtrace.d

./dtrace.d ExPciConfigSpaceHandle

Wait, then CTRL+C

Comment 81 Mike Karels freebsd_committer

2019-12-08 21:58:45 UTC

Created attachment 209793 [details]
pciconf -lv from Gigabyte system

Comment 82 Mike Karels freebsd_committer

2019-12-08 22:07:38 UTC

pciconf-lv is attached.

When I tried the dtrace script after kldloading dtraceall, I got this:

pughole# ./dtrace.d ExPciConfigSpaceHandle
dtrace: failed to compile script ./dtrace.d: line 3: probe description fbt:ExPciConfigSpaceHandle:filec:entry does not match any probes

Comment 83 Hans Petter Selasky freebsd_committer

2019-12-08 22:21:30 UTC

Try:

AcpiExPciConfigSpaceHandler

Instead.

Comment 84 Hans Petter Selasky freebsd_committer

2019-12-08 22:27:55 UTC

@kib: It looks like ACPI is addressing a device at PCI address 128, which is not in the list of PCI devices on the given machine. Is this allowed?

Comment 85 Mike Karels freebsd_committer

2019-12-08 23:48:35 UTC

Same result with dtrace.d:

pughole# ./dtrace.d AcpiExPciConfigSpaceHandler
dtrace: failed to compile script ./dtrace.d: line 3: probe description fbt:AcpiExPciConfigSpaceHandler:filec:entry does not match any probes

This is the kernel with ACPI debug but no other options added to GENERIC.

Comment 86 Hans Petter Selasky freebsd_committer

2019-12-11 14:29:01 UTC

Did you do:

kldload dtraceall

first?

--HPS

dtrace -l | grep -i ACPI

Comment 87 Mike Karels freebsd_committer

2019-12-12 01:25:02 UTC

Yes, I loaded dtraceall.

Derived from the command you sent, and copying/pasting the function name:

pughole# dtrace -l | grep -i acpiexpci
30153        fbt            kernel          AcpiExPciBarSpaceHandler entry
30154        fbt            kernel          AcpiExPciBarSpaceHandler return
30155        fbt            kernel       AcpiExPciConfigSpaceHandler entry
30156        fbt            kernel       AcpiExPciConfigSpaceHandler return
30418        fbt            kernel              AcpiExPciClsToString entry
30419        fbt            kernel              AcpiExPciClsToString return
pughole# ./dtrace.d AcpiExPciConfigSpaceHandler
dtrace: failed to compile script ./dtrace.d: line 3: probe description fbt:AcpiExPciConfigSpaceHandler:filec:entry does not match any probes
pughole# cat dtrace.d
#!/usr/sbin/dtrace -s

fbt:$1:filec:entry
{
        @[probefunc, stack()] = count();
}
pughole#

Comment 88 Mike Karels freebsd_committer

2019-12-12 04:06:02 UTC

Followup on previous comment: I'm not sure why, but the system is not logging the continuous acpi messages that it did previously.  It has been rebooted several times, including other kernels, but is back to the ACPI_DEBUG configuration.  Perhaps the error means the function has not been called?

Comment 89 Hans Petter Selasky freebsd_committer

2019-12-12 07:51:18 UTC

Can you remove "filec:" from the dtrace script?

--HPS

Comment 90 Mike Karels freebsd_committer

2019-12-12 12:13:48 UTC

Similar result:

pughole# ./dtrace.d AcpiExPciConfigSpaceHandler
dtrace: failed to compile script ./dtrace.d: line 3: probe description :fbt:AcpiExPciConfigSpaceHandler:entry does not match any probes

Comment 91 Hans Petter Selasky freebsd_committer

2019-12-12 12:17:14 UTC

#!/usr/sbin/dtrace -s

fbt::$1:entry
{
        @[probefunc, stack()] = count();
}

I think the $1 was expanded when you created dtrace.d!

Can you type this script verbatim and try again?

It basically should work, tested over here!

Comment 92 Mike Karels freebsd_committer

2019-12-14 02:57:47 UTC

The $1 was expanded (to nothing) when I ran your script to cat it to a file, but I fixed it at the time.  The latest version runs, but produces no output in a minute or two.  I am also not getting the debug messages, even after a reboot.  The current settings from "sysctl debug.acpi" are:

debug.acpi.level: ACPI_LV_INFO
debug.acpi.layer: ACPI_UTILITIES ACPI_HARDWARE ACPI_EVENTS ACPI_TABLES ACPI_NAMESPACE ACPI_PARSER ACPI_DISPATCHER ACPI_EXECUTER ACPI_RESOURCES AC

I also tried "debug.acpi.level=ACPI_LV_VERBOSITY1"; I'm still getting nothing.

Comment 93 Hans Petter Selasky freebsd_committer

2019-12-16 08:54:07 UTC

Can you retry enabling full ACPI debug messages, and also set:
kern.consmute=1

In /boot/loader.conf or try via the loader first.

You will need to log in via SSH.

That way the excessive debug prints should not halt the booting of the OS.

Comment 94 Mike Karels freebsd_committer

2019-12-17 02:49:01 UTC

Looks like kern.consmute is not a loader tunable, just a sysctl.  It didn't work, and I don't see that string in the kernel.  I let it scroll messages madly for about 30 minutes, but didn't get to the point of being able to ping it.  Of course, 99% of the messages would be gone by the time I got logged in.  Any thoughts on narrowing the scope of the messages?

Comment 95 Felix Johnson freebsd_triage

2019-12-23 05:44:32 UTC

Created attachment 210170 [details]
Dmesg with hw.usb.xhci.debug=1

Verbose boot for similar error with Mediasonic Probox (external USB3 hard drive enclosure).

Comment 96 Gerard Seibert 2020-01-13 11:15:58 UTC

Has there been any movement on this issue? FreeBSD 12.2 is on the horizon, and I and others, are stuck with FreeBSD 11.x because we cannot get FreeBSD 12.x to install and run satisfactorily. This is happening with new, modern PCs, not some dinosaurs leftover from the '90s.

Comment 97 Hans Petter Selasky freebsd_committer

2020-01-13 11:22:36 UTC

There has been one patch for XHCI:

https://svnweb.freebsd.org/changeset/base/356545

The issue is believed to be ACPI related.

--HPS

Comment 98 Gerard Seibert 2020-01-13 11:36:55 UTC

(In reply to Hans Petter Selasky from comment #97)

Will this fix be applied to FreeBSD 12.1, or will I have to wait until 12.2 is released so that I can try it and see if it works? I am not familiar with how FreeBSD backports their corrections/updates/patches.

Comment 99 Hans Petter Selasky freebsd_committer

2020-01-13 11:39:56 UTC

This fix will only be applied to 12-stable, next week, and will be part of the next stable release.

Else you need to build a custom kernel with this patch applied, in order to test.

Comment 100 Mike Karels freebsd_committer

2020-01-25 16:32:41 UTC

I am now running a stable-12 kernel at r356868, which includes this change.  The problem is unchanged (also my problem of disconnects by the UPS).  I also tried booting with no USB devices attached; the problem still occurs.

Comment 101 Gerard Seibert 2020-02-06 12:21:09 UTC

Apparently, this problem has existed since at least April 2019. That seems, at least to me, to be an awfully long time, at least at this level, for a bug to go unsquashed. I am preparing to purchase another PC similar to the one I now have to replace an older unit. It is apparent that FreeBSD 12.x will not operate on it correctly.

I am starting to wonder if FreeBSD is the best platform for my use. If it is unable to run modern hardware, then perhaps it is time for me to investigate other operating systems. I would hate to start over with a new OS, but I don't seem to be left with too many other options. While 12.2 is still a long way off, 11.4 is due this summer. I am not even sure if that will work on my PC or simple make the system unbootable.

I guess my quest is if there is anyone actively working on this bug and what the prognosis is for this problem.

I don't mean to sound crass, but this problem has gone on, in my opinion, far too long. With newer systems coming on the market daily, I honestly believe that this problem will manifest into a much larger problem and potential embarrassment for FreeBSD.

Comment 102 Hans Petter Selasky freebsd_committer

2020-02-06 13:18:23 UTC

Hi,

I have a no-name Skylake based laptop and USB XHCI works fine.

You might find the following read interesting:

http://freebsd.1045724.x6.nabble.com/Can-t-reattach-USB-devices-Lenovo-bug-td6369656.html

If I'm right, this issue has nothing to do with the generic USB/XHCI, it is entirely ACPI/BIOS related and you should bring it up with the vendor of your computer.

--HPS

Comment 103 Gerard Seibert 2020-02-06 16:48:31 UTC

I just updated the BIOS on my system. That version was only one version older than the new one. Anyway, this is the output from 'dmidecode'.

# dmidecode 3.2
# SMBIOS entry point at 0x6fe8d000
Found SMBIOS entry point in EFI, reading table from /dev/mem.
SMBIOS 3.1 present.

Handle 0x0000, DMI type 0, 24 bytes
BIOS Information
Vendor: Dell Inc.
Version: 1.1.10
Release Date: 10/31/2019
Address: 0xF0000
Runtime Size: 64 kB
ROM Size: 16 MB
Characteristics:
PCI is supported
PNP is supported
BIOS is upgradeable
BIOS shadowing is allowed
Boot from CD is supported
Selectable boot is supported
EDD is supported
5.25"/1.2 MB floppy services are supported (int 13h)
3.5"/720 kB floppy services are supported (int 13h)
3.5"/2.88 MB floppy services are supported (int 13h)
Print screen service is supported (int 5h)
Serial services are supported (int 14h)
Printer services are supported (int 17h)
ACPI is supported
USB legacy is supported
BIOS boot specification is supported
Function key-initiated network boot is supported
Targeted content distribution is supported
UEFI is supported
BIOS Revision: 1.1

Does anyone have any idea if this will correct the problem? I would really hate to trash the whole system again if this doesn't correct the problem. Also, why would this problem NOT exist on a FreeBSD 11.x (and perhaps older) system? It only started with 12.x. I find it somewhat difficult to believe it is solely a BIOS version problem.

Comment 104 Hans Petter Selasky freebsd_committer

2020-02-06 16:58:50 UTC

> Does anyone have any idea if this will correct the problem? I would really hate to trash the whole system again if this doesn't correct the problem. Also, why would this problem NOT exist on a FreeBSD 11.x (and perhaps older) system? It only started with 12.x. I find it somewhat difficult to believe it is solely a BIOS version problem.

I think for the kind of debugging which is needed, physical access to your machine will be required. How much are you willing to pay to have this issue resolved? Can you for example donate or ship a machine to me (Norway) which have these problems and I can probably look into it in a 1-2 months time frame.

--HPS

Comment 105 Gerard Seibert 2020-02-16 19:14:41 UTC

(In reply to Hans Petter Selasky from comment #104)

I wish I had a spare machine to ship off to you; however, I don't. I still find it hard to believe that this bug was not caught during the initial beta testing of version 12.x. I did contact Dell again, and they insisted the problem is with FreeBSD. They have not had a single report of a failure like this with another OS. That is probably true. Even FreeBSD 11.x works fine.

Comment 106 Gerard Seibert 2020-04-04 13:27:13 UTC

Created attachment 213055 [details]
Complete 'dmidecode' output for Dell XPS 8930

Comment 107 Gerard Seibert 2020-04-26 18:37:15 UTC

Created attachment 213822 [details]
FreeBSD 11.3 boot log

Comment 108 Gerard Seibert 2020-04-26 18:39:00 UTC

Hans Petter Selasky, since I don't have a spare machine to send you, I thought that sending the bootup log from FreeBSD 11.3-RELEASE-p8, which does work on this machine, might be of some use.

Comment 109 Mike Karels freebsd_committer

2020-05-16 20:23:45 UTC

Note that FreeBSD 11.4-BETA2 does not have this problem either.

Comment 110 Gerard Seibert 2020-05-16 23:49:05 UTC

(In reply to Mike Karels from comment #109)

Thanks, Mike. I was somewhat hesitant about moving up to 11.4 because of this bug. It would seem that someone "screwed the pooch" with version 12.x, and no one seems particularly interested in tracking it down. In my humble opinion, this bug will not be squashed until enough newer or similar PCs come on-line.

Comment 111 Gerard Seibert 2020-06-18 00:01:14 UTC

Can anyone confirm that this bug has been squashed? Aryeh Friedman <aryeh.friedman@gmail.com> claims that this bug has been fixed eons ago and that this incident should be closed. I would like some confirmation on this, if possible.

Thanks!

Comment 112 Mike Karels freebsd_committer

2020-06-18 01:25:13 UTC

This bug is not fixed in 12.1-STABLE as of April 26 (r360337).  If there is reason to think that it has been fixed since then, I'm happy to test.

Comment 113 Ihor Antonov 2020-07-03 23:57:18 UTC

The bug is alive. Just had it today on FreeBSD 13.0-CURRENT r362882

ThinkPad T470

The exact circumstances under which it happened are not clear. I noticed it when I detached external USB-C monitor and USB2 keyboard and took my laptop to another room.

I will try to figure out the exact sequence to reproduce the bug.

Comment 114 Ihor Antonov 2020-07-04 03:09:38 UTC

Yes, I can reproduce the bug by unplugging my external monitor.
Monitor is connected via USB-C cable. As soon as I unplug it I see this message in dmesg appearing roughly every second.

uhub_reattach_port: giving up port reset - device vanished


If I plug the monitor back - the message stops.

Comment 115 Gerard Seibert 2020-07-04 09:17:48 UTC

(In reply to Ihor Antonov from comment #114)

My monitor is connected in the usual manner; however, I am utilizing a keyboard & mouse connected via USB. The next time I attempt to install 12.x, I'll use a hardwired keyboard and mouse and see if the error still exists.

Comment 116 Gerard Seibert 2020-08-03 16:10:09 UTC

Hi, I am just doing my 'beginning of the month' request to see if any headway had been made on discovering the cause and hopefully the cure for this bug. With the release of FreeBSD 12.2 expected on or about the end of October and version 13.0 only five months later, it would really be nice if I, and others whose PCs are suffering from this affliction, could update our systems to something <= 11.x

Thank you for your efforts in this matter.

Comment 117 Kubilay Kocak freebsd_committer

2020-08-05 09:13:30 UTC

Idea: Might be worth someone (who is affected) explicitly testing reverting base r343134 to see if the issue is still reproducible

Comment 118 Gerard Seibert 2020-08-05 10:28:16 UTC

(In reply to Kubilay Kocak from comment #117)

I tried installing FreeBSD 12.1-RELEASE a couple of months ago, and the bug still existed in that version. Are you inferring that I should try the STABLE version instead?

Comment 119 Mike Karels freebsd_committer

2020-08-17 02:06:36 UTC

I built a fresh kernel(r364289) with r343134 reverted; the problem still occurs.

Comment 120 Gerard Seibert 2020-08-17 10:17:47 UTC

(In reply to Mike Karels from comment #119)

This bug has obviously not been given the attention it deserves. With the planned release of versions 12.2 in October of this year and 13.0 next March, it will certainly not bode well for FreeBSD or its user base if they are released with this bug still intact.

Since this bug did not exist in versions prior to 12.x, there should exist some mechanism to backtrack and discover what went wrong.

Comment 121 Bjoern A. Zeeb freebsd_committer

2020-08-21 18:13:08 UTC

Hi,

I just unplugged a USB hub (cascade) USB-C (usb3 port) -> usb2 hub -> usb2 hub -> 3 uftdis and another uftdi at the first usb2 hub.

After that console endlessly started to scroll with:

uhub_reattach_port: giving up port reset - device vanished
uhub_reattach_port: giving up port reset - device vanished
uhub_reattach_port: giving up port reset - device vanished
uhub_reattach_port: giving up port reset - device vanished
uhub_reattach_port: giving up port reset - device vanished
uhub_reattach_port: giving up port reset - device vanished
uhub_reattach_port: giving up port reset - device vanished
uhub_reattach_port: giving up port reset - device vanished


as well and didn't clear up.  It is clear the device is gone and it is correct, why do we not clean this up?

I did run usbconfig in another shell and it is locked:

# usbconfig
load: 0.03  cmd: usbconfig 5017 [USB config SX lock] 1.36r 0.00u 0.00s 0% 2496k
mi_switch+0xc1 sleepq_catch_signals+0x3d1 sleepq_wait_sig+0x9 _sx_xlock_hard+0x451 usbd_enum_lock_sig+0xa1 usb_ref_device+0x221 usb_open+0x106 devfs_open+0x145 VOP_OPEN_APV+0x1c vn_open_vnode+0x1eb vn_open_cred+0x3b7 kern_openat+0x249 amd64_syscall+0x119 fast_syscall_common+0xf8

(coming back to this now after applying the two sysctls mentioned in this thread I see):

ugen0.1: <0x8086 XHCI root HUB> at usbus0, cfg=0 md=HOST spd=SUPER (5.0Gbps) pwr=SAVE (0mA)

Which is not all that is there on this either...


I have saved some logs but I don't know how long I can keep the device in this state (probably another hour or so).  I'll save as much dmesg/log I can.
In case you want me to quickly test something let me know.

Comment 122 Hans Petter Selasky freebsd_committer

2020-08-21 18:19:50 UTC

Try if this unlocks usbconfig:

sysctl hw.usb.disable_enumeration=1
sleep 10
usbconfig

--HPS

Comment 123 Bjoern A. Zeeb freebsd_committer

2020-08-21 20:07:59 UTC

(In reply to Hans Petter Selasky from comment #122)

usbconfig had already returned (maybe by setting the uhub or xhci sysctl to 0x17 I think it was).

I had in the mean time just plugged the USB-C multi-port adapter back in (without anything connected to it) and that stopped the repeated message.

If you want some of the uhub/xhci sysctl dmesg I can email it to you.
It seems the USB-C disconnect was detected but not fully happening?

Comment 124 Hans Petter Selasky freebsd_committer

2020-08-21 20:15:23 UTC

Can you try to reach out to Scott Long about USB-C ?

Comment 125 Bjoern A. Zeeb freebsd_committer

2020-08-22 03:24:28 UTC

(In reply to Hans Petter Selasky from comment #122)

Hmm, I applied the sysctl and removed the USB-C connection again and:

uhub0: detached
ugen0.1: <0x8086 XHCI root HUB> at usbus0 (disconnected)
unknown: at usbus0, port 1, addr 1 (disconnected)
usbus0: detached
xhci0: Controller reset timeout.
xhci0: detached
pci8: detached
pcib8: detached
pci7: detached
pcib7: detached
pci6: detached
pcib6: detached
pci5: detached
pcib5: detached


All the stuff went away :-)

Set the sysctl hw.usb.disable_enumeration=0

and plugged it back in:

pcib5: <ACPI PCI-PCI bridge> at device 0.0 on pci4
pci5: <ACPI PCI bus> on pcib5
pcib6: <PCI-PCI bridge> at device 0.0 on pci5
pci6: <PCI bus> on pcib6
pcib7: <PCI-PCI bridge> at device 1.0 on pci5
pci7: <PCI bus> on pcib7
pcib8: <ACPI PCI-PCI bridge> at device 2.0 on pci5
pci8: <ACPI PCI bus> on pcib8
xhci0: <XHCI (generic) USB 3.0 controller> mem 0xd9f00000-0xd9f0ffff at device 0.0 on pci8
xhci0: 32 bytes context size, 64-bit DMA
usbus0 on xhci0
usbus0: 5.0Gbps Super Speed USB v3.0
ugen0.1: <0x8086 XHCI root HUB> at usbus0
uhub0 on usbus0
uhub0: <0x8086 XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
uhub0: 4 ports with 4 removable, self powered
xhci0: Resetting controller
usb_alloc_device: device init 2 failed (USB_ERR_TIMEOUT, ignored)
ugen0.2: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device
usb_alloc_device: device init 2 failed (USB_ERR_TIMEOUT, ignored)
ugen0.2: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device
uhub0: at usbus0, port 1, addr 1 (disconnected)
uhub0: detached
uhub0 on usbus0
uhub0: <0x8086 XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
uhub0: 4 ports with 4 removable, self powered
xhci0: Resetting controller
usb_alloc_device: device init 2 failed (USB_ERR_TIMEOUT, ignored)
ugen0.2: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device
usb_alloc_device: device init 2 failed (USB_ERR_TIMEOUT, ignored)
ugen0.2: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device
uhub0: at usbus0, port 1, addr 1 (disconnected)
uhub0: detached
uhub0 on usbus0
uhub0: <0x8086 XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
uhub0: 4 ports with 4 removable, self powered
xhci0: Resetting controller
usb_alloc_device: device init 2 failed (USB_ERR_TIMEOUT, ignored)
ugen0.2: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device
usb_alloc_device: device init 2 failed (USB_ERR_TIMEOUT, ignored)
ugen0.2: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device
uhub0: at usbus0, port 1, addr 1 (disconnected)
uhub0: detached
uhub0 on usbus0
uhub0: <0x8086 XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
...
(and so on)

Comment 126 Mark Millard 2020-08-24 20:18:14 UTC

(In reply to Bjoern A. Zeeb from comment #125)

Your repeating sequence with the "(USB_ERR_TIMEOUT, ignored)"
text prompts the following note for a RPi4B context (edk2
UEFI/ACPI based boot):

I found that trying to boot with a kernel that had been built
with -mcpu=cortex-a72 the RPi4B would not boot and was reporting
such a repeating sequence.  (No microsd card is present for the
configuration: direct USB3 SSD based booting.)

But when the kernel is instead build with -mcpu=cortex-a53 the
kernel boots, getting only one:

xhci0: Resetting controller

The two -mcpu's select the same -march value but different -mtune
values. To my knowledge, the only differences should be performance
related as far as operation goes. So timing sensitive for the
problem.

For reference for the failed boot (cortex-a72 tuning) reports:
(I later show a success from cortex-a53 tuning.)

ugen0.1: <Generic XHCI root HUB> at usbus0
ugen1.1: <DWCOTG OTG Root HUB> at usbus1
uhub0 on usbus0
uhub1 on usbus1
uhub1: <DWCOTG OTG Root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus1
uhub0: <Generic XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
uhub1: 1 port with 1 removable, self powered
uhub0: 5 ports with 4 removable, self powered
Root mount waiting for: usbus0 CAM
xhci0: Resetting controller
usb_alloc_device: device init 2 failed (USB_ERR_TIMEOUT, ignored)
ugen0.2: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device
Root mount waiting for: usbus0 CAM
usb_alloc_device: device init 2 failed (USB_ERR_TIMEOUT, ignored)
ugen0.2: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device
uhub0: at usbus0, port 1, addr 1 (disconnected)
uhub0: detached
uhub0 on usbus0
uhub0: <Generic XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
uhub0: 5 ports with 4 removable, self powered
Root mount waiting for: CAM
xhci0: Resetting controller
usb_alloc_device: device init 2 failed (USB_ERR_TIMEOUT, ignored)
ugen0.2: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device
Root mount waiting for: CAM
usb_alloc_device: device init 2 failed (USB_ERR_TIMEOUT, ignored)
ugen0.2: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device
uhub0: at usbus0, port 1, addr 1 (disconnected)
uhub0: detached
uhub0 on usbus0
uhub0: <Generic XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
uhub0: 5 ports with 4 removable, self powered
Root mount waiting for: CAM
xhci0: Resetting controller
usb_alloc_device: device init 2 failed (USB_ERR_TIMEOUT, ignored)
ugen0.2: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device
Root mount waiting for: CAM
usb_alloc_device: device init 2 failed (USB_ERR_TIMEOUT, ignored)
Root mount waiting for:ugen0.2: <Unknown > at usbus0 (disconnected)
CAM
uhub_reattach_port: could not allocate new device
uhub0: at usbus0, port 1, addr 1 (disconnected)
uhub0: detached
uhub0 on usbus0
uhub0: <Generic XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
uhub0: 5 ports with 4 removable, self powered
Root mount waiting for: CAM
xhci0: Resetting controller
usb_alloc_device: device init 2 failed (USB_ERR_TIMEOUT, ignored)
ugen0.2: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device
Root mount waiting for: CAM
Mounting from ufs:/dev/gpt/RPi4Broot failed with error 2; retrying for 10 more seconds
usb_alloc_device: device init 2 failed (USB_ERR_TIMEOUT, ignored)
ugen0.2: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device
uhub0: at usbus0, port 1, addr 1 (disconnected)
uhub0: detached
uhub0 on usbus0
uhub0: <Generic XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
uhub0: 5 ports with 4 removable, self powered
xhci0: Resetting controller
usb_alloc_device: device init 2 failed (USB_ERR_TIMEOUT, ignored)
ugen0.2: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device
usb_alloc_device: device init 2 failed (USB_ERR_TIMEOUT, ignored)
ugen0.2: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device
uhub0: at usbus0, port 1, addr 1 (disconnected)
uhub0: detached
uhub0 on usbus0
uhub0: <Generic XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
uhub0: 5 ports with 4 removable, self powered
xhci0: Resetting controller
usb_alloc_device: device init 2 failed (USB_ERR_TIMEOUT, ignored)
ugen0.2: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device
usb_alloc_device: device init 2 failed (USB_ERR_TIMEOUT, ignored)
ugen0.2: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device
uhub0: at usbus0, port 1, addr 1 (disconnected)
uhub0: detached
uhub0 on usbus0
uhub0: <Generic XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
uhub0: 5 ports with 4 removable, self powered
xhci0: Resetting controller
usb_alloc_device: device init 2 failed (USB_ERR_TIMEOUT, ignored)
ugen0.2: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device
usb_alloc_device: device init 2 failed (USB_ERR_TIMEOUT, ignored)
ugen0.2: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device
uhub0: at usbus0, port 1, addr 1 (disconnected)
uhub0: detached
uhub0 on usbus0
uhub0: <Generic XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
uhub0: 5 ports with 4 removable, self powered
xhci0: Resetting controller
usb_alloc_device: device init 2 failed (USB_ERR_TIMEOUT, ignored)
ugen0.2: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device
Mounting from ufs:/dev/gpt/RPi4Broot failed with error 2.

Loader variables:
 vfs.root.mountfrom=ufs:/dev/gpt/RPi4Broot

Manual root filesystem specification:
 <fstype>:<device> [options]
     Mount <device> using filesystem <fstype>
     and with the specified (optional) option list.

   eg. ufs:/dev/da0s1a
       zfs:zroot/ROOT/default
       cd9660:/dev/cd0 ro
         (which is equivalent to: mount -t cd9660 -o ro /dev/cd0 /)

 ?               List valid disk boot devices
 .               Yield 1 second (for background tasks)
 <empty line>    Abort manual input

mountroot> usb_alloc_device: device init 2 failed (USB_ERR_TIMEOUT, ignored)
ugen0.2: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device
uhub0: at usbus0, port 1, addr 1 (disconnected)
uhub0: detached
uhub0 on usbus0
uhub0: <Generic XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
uhub0: 5 ports with 4 removable, self powered
. . .

(It keeps going.)

For reference the successful boot (cortex-a53 tuning) reports:

ugen0.1: <Generic XHCI root HUB> at usbus0
ugen1.1: <DWCOTG OTG Root HUB> at usbus1
uhub0 on usbus0
uhub1 on usbus1
uhub0: <Generic XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
uhub1: <DWCOTG OTG Root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus1
uhub1: 1 port with 1 removable, self powered
uhub0: 5 ports with 4 removable, self powered
Root mount waiting for: usbus0 CAM
xhci0: Resetting controller
usb_alloc_device: device init 2 failed (USB_ERR_TIMEOUT, ignored)
ugen0.2: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device
Root mount waiting for: usbus0 CAM
usb_alloc_device: device init 2 failed (USB_ERR_TIMEOUT, ignored)
ugen0.2: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device
uhub0: at usbus0, port 1, addr 1 (disconnected)
uhub0: detached
uhub0 on usbus0
uhub0: <Generic XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
uhub0: 5 ports with 4 removable, self powered
Root mount waiting for: CAM
ugen0.2: <vendor 0x2109 USB2.0 Hub> at usbus0
uhub2 on uhub0
uhub2: <vendor 0x2109 USB2.0 Hub, class 9/0, rev 2.10/4.21, addr 1> on usbus0
uhub2: 4 ports with 4 removable, self powered
Root mount waiting for: CAM
Root mount waiting for: CAM
ugen0.3: <OWC Envoy Pro mini> at usbus0
umass0 on uhub0
umass0: <OWC Envoy Pro mini, class 0/0, rev 3.00/1.00, addr 2> on usbus0
umass0:  SCSI over Bulk-Only; quirks = 0x0100
umass0:0:0: Attached to scbus0
Root mount waiting for: CAM
Root mount waiting for: CAM
Root mount waiting for: CAM
da0 at umass-sim0 bus 0 scbus0 target 0 lun 0
da0: <OWC Envoy Pro mini 0> Fixed Direct Access SPC-4 SCSI device
da0: Serial Number 000000000011
da0: 400.000MB/s transfers
da0: 228936MB (468862128 512 byte sectors)
da0: quirks=0x2<NO_6_BYTE>
Setting hostuuid: a4f7fbeb-f668-11de-b280-ebb65474e619.
Setting hostid: 0xcd8e9e25.
Starting file system checks:
/dev/gpt/RPi4Broot: FILE SYSTEM CLEAN; SKIPPING CHECKS
. . .


head -r363590 was in use for the experiments.

I only noticed this issue when I stopped using the same
media to boot a Rock64 and the RPi4B: The Rock64 is a
cortex-a53 context and I'd been using only that kind of
kernel build until I made separate RPi4B media. I
reverted to building with cortex-a53 for the RPi4B on
the separate media in order to have it operational.

Comment 127 Bawdy 2020-09-10 20:28:31 UTC

Hi, I want to report having duplicated this exact same error on BHYVE virtual machine with PCI passthrough for the USB.

HARDWARE:  TRX40 Tachi with Ryzen 3960x (amd64)
HOST SYSTEM:  12.1-RELEASE-p8
BHYVE Versions tried:  11.4-RELEASE, 12.1-RELEASE, 12.1-STABLE, 13.0-CURRENT

BEHAVIOR:  
Without PCI passthrough, the host xhci controller operates normally with no errors.  After rebooting host, the bhyve VM can be started, only once, with no USB xhci errors.

However, if the VM is power cycled without cycling the host, I get the same sequence of errors reported by Mark, in a continuous loop, and inserted USB devices are undetected. 

Although the screen is impractical to use due to error messages, it is possible to login.  I can SSH into the VM and otherwise use it normally.  After 3-5 minutes, the loop stops, the VM detects devices, and I can access them.

SPECULATION
Problems with PCI reset?  Perhaps my hardware doesn't support reset or re-initialization of the device without reseting all devices?

Comment 128 Mark Millard 2020-09-14 03:41:59 UTC

(In reply to bawdyanarchist@disroot.org from comment #127)

I'm not sure how your speculation would fit with Cortex-A72 vs.
Cortex-A53 making a difference in my context. One area where
A72 vs. A53 might have a relevant difference is:

Cortex-A72 has speculative execution
Cortex-A53 does not have speculative execution

The Cortex-A72 needs to have some types cache content explicitly
managed in places where speculative execution can have cache
consequences visible if not handled correctly. I've seen code
patched for what was reported to be such an issue for code that
was starting to support A72's. (Not necessarily a FreeBSD
context, just general looking around.)

It leads me to wonder if something is not being properly forced
to be up to date when used. Differing code sequences from tuning
for different targets might have differing properties for such
at some places in the code.

Outside the A72 vs. 53 context, some machines might have similar
variations in behavior, leading some to show problems and others
not.

But, I've still no solid evidence of a specific type of error in
any specific place. The above is just an example of speculation
on my part.

Comment 129 Gerard Seibert 2020-09-14 11:18:16 UTC

(In reply to Mark Millard from comment #128)

As far as I can ascertain, this problem is localized to FreeBSD. I have friends that use other operating systems like Linux on equipment similar to mine that are not experiencing this phenomenon. I am no genius, but perhaps it might be worth investigating what other operating systems are doing or not doing that might be causing this problem. Furthermore, since this problem does not exist in FreeBSD versions prior to 12.x, I would have thought that it would be somewhat easier to find the culprit in this matter. I am not trying to infer that the solution is axiomatic; however, I do find it rather odd that it has persisted for such an extended time and now is evident in at least two versions of FreeBSD -- 12.x and 13.x.

And in the immortal words of Elvira Hancock's advice to Tony Montana, "Don't get high on your own supply."

Comment 130 Bawdy 2020-09-14 23:50:57 UTC

(In reply to Mark Millard from comment #128)

I see your point about the ARM devices.  If I read correctly, this is a problem you didn't have until upgrading to 12.1.  Could be that we have different root causes.  On my Qubes machine, I have to set an option for pci "no-strict-reset" when starting the USB VM, and is quite common with x86 architecture.  

At any rate, I just wanted to add my 0.02 just in case it helps devs troubleshoot.

Comment 131 Mark Millard 2020-09-15 00:36:14 UTC

(In reply to bawdyanarchist@disroot.org from comment #130)

Just to avoid misinterpretations of what I've reported:

I run head/CURRENT, not 12.x . It was not an FreeBSD version
upgrade context at all.

What I did that got the problem in an cortex-a72 context
was to build and install the kernel based on -mcpu=cortex-a72
instead of -mcpu=cortex-a53 (same arch, different tunings).
This was without changing versions (still head -r363590 based).
What I did that got rid of the problem was to revert to
my -mcpu=cortex-a53 kernel build (also used on an cortex-a53
system).

So the -mcpu=??? use was sufficient to vary/control the
behavior, other things unchanged.

Comment 132 Mark Millard 2020-09-15 05:12:04 UTC

(In reply to Mark Millard from comment #126, #128, #131)

Interesting finding? . . .

I've tested putting the -mcpu=cortex-a72 based kernel
on the cortext-a53 system and . . .

The result boots and operates fine. (It is also a USB3
SSD based root file system.)

But the xhci context (from the cortext-a53 system) is:

# dmesg -a | grep -i xhci
xhci0: <Synopsys Designware DWC3> mem 0xff600000-0xff6fffff irq 57 on rk_dwc30
xhci0: 64 bytes context size, 32-bit DMA
usbus0 on xhci0
ugen0.1: <Synopsys XHCI root HUB> at usbus0
uhub0: <Synopsys XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0

Instead of (from the cortext-a53 kernel on the cortext-a72 system):

# dmesg -a | grep -i xhci | more
xhci0: <Generic USB 3.0 controller> iomem 0x600000000-0x600000fff irq 0 on acpi0
xhci0: 32 bytes context size, 32-bit DMA
usbus0 on xhci0
ugen0.1: <Generic XHCI root HUB> at usbus0
uhub0: <Generic XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
xhci0: Resetting controller
uhub0: <Generic XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0

So more than the cpu type and its properties provide distinctions.

Comment 133 Mark Millard 2020-09-16 06:12:20 UTC

The message:

xhci0: Resetting controller

comes via code in xhci_do_command that has the following comment:

                /*
                 * After some weeks of continuous operation, it has
                 * been observed that the ASMedia Technology, ASM1042
                 * SuperSpeed USB Host Controller can suddenly stop
                 * accepting commands via the command queue. Try to
                 * first reset the command queue. If that fails do a
                 * host controller reset.
                 */

The code involves the only call to usb_bus_reset_async_locked
(in an else) --and that routine produces the message. The special
code for the ASM1042 seems to be the source of the host controller
reset operation sequences in what I and Bjoern reported. (This code
does not check if the device involved is a ASM1042.)

In what I report in comment #126 the cortex-A72 system always gets
just one of those messages for the -mcpu=cortex-a53 based kernel
--but for the -mcpu=cortex-a72 based kernel the A72 overall reruns
the host controller reset indefinitely.

May be the above observation can help narrow down what needs to be
looked at for at least some of what is being reported.

Comment 134 Mark Millard 2020-09-18 05:04:35 UTC

(In reply to Mark Millard from comment #133)

The code in xhci_do_command does a doorbell/mailbox style:

        XWRITE4(sc, door, XHCI_DOORBELL(0), 0);

which seems to get to:

ffff0000007c4b60 <generic_bs_w_4>     str   w3, [x1, x2]
ffff0000007c4b64 <generic_bs_w_4+0x4> ret

But I have not been able to track down there being the
apparently-architecturally-required DSB ST (or DSB)
after prior memory updates but before the write to the
doorbell/mailbox location (execution order for after
then before).

I'm using Barrier_Litmus_Tests_and_Cookbook_A08.pdf for
reference, section 8.1. But I'm not an aarch64 expert
so my interpretation of the material could be at issue.

Comment 135 Mark Millard 2020-09-19 02:51:11 UTC

(In reply to Mark Millard from comment #134)

A cortex-a72 success! (In the form of investigatory code, not
code to check-in as is.)

Just by adding some "dsb st" commands and a "dsb ld" command in
places in the xhci code (just for __arach64__), I've gotten the
cortext-A72 to boot from the USB3 SSD via a -mcpu=cortex-a72
based kernel build. No more indefinitely repeating "Resetting
controller" problem.

I do not claim the additions are the minimal ones. I know one place
is required for sure: the last one added enabled the boot (given
the others were already present). Prior to that I still had the
indefinitely repeating "Resetting controller" problem.

There could be more places that should have dsb st or dsb ld for
overall correctness.

This evidence may suggest somewhat analogous problems for other
platforms than aarch64 that are seeing problems.

For the aarch64 context, the evidence is (the changes are)
as follows. The first "dsb st" textually is the last one I
added.

# svnlite diff /usr/src/sys/dev/usb/controller/xhci.c
Index: /usr/src/sys/dev/usb/controller/xhci.c
===================================================================
--- /usr/src/sys/dev/usb/controller/xhci.c	(revision 363590)
+++ /usr/src/sys/dev/usb/controller/xhci.c	(working copy)
@@ -431,6 +431,9 @@
 
 	phwr->hwr_ring_seg[0].qwEvrsTablePtr = htole64(addr);
 	phwr->hwr_ring_seg[0].dwEvrsTableSize = htole32(XHCI_MAX_EVENTS);
+#ifdef __aarch64__
+__asm __volatile("dsb st" : : : "memory");
+#endif
 
 	DPRINTF("ERDP(0)=0x%016llx\n", (unsigned long long)addr);
 
@@ -1098,6 +1101,9 @@
 
 	while (1) {
 
+#ifdef __aarch64__
+__asm __volatile("dsb ld" : : : "memory");
+#endif
 		temp = le32toh(phwr->hwr_events[i].dwTrb3);
 
 		k = (temp & XHCI_TRB_3_CYCLE_BIT) ? 1 : 0;
@@ -1107,6 +1113,9 @@
 
 		event = XHCI_TRB_3_TYPE_GET(temp);
 
+#ifdef __aarch64__
+__asm __volatile("dsb ld" : : : "memory");
+#endif
 		DPRINTFN(10, "event[%u] = %u (0x%016llx 0x%08lx 0x%08lx)\n",
 		    i, event, (long long)le64toh(phwr->hwr_events[i].qwTrb0),
 		    (long)le32toh(phwr->hwr_events[i].dwTrb2),
@@ -1239,6 +1248,9 @@
 	sc->sc_command_idx = i;
 	sc->sc_command_ccs = j;
 
+#ifdef __aarch64__
+__asm __volatile("dsb st" : : : "memory");
+#endif
 	XWRITE4(sc, door, XHCI_DOORBELL(0), 0);
 
 	err = cv_timedwait(&sc->sc_cmd_cv, &sc->sc_bus.bus_mtx,
@@ -2855,6 +2867,9 @@
 	index = xfer->xroot->udev->controller_slot_id;
 
 	if (xfer->xroot->udev->flags.self_suspended == 0) {
+#ifdef __aarch64__
+__asm __volatile("dsb st" : : : "memory");
+#endif
 		XWRITE4(sc, door, XHCI_DOORBELL(index),
 		    epno | XHCI_DB_SID_SET(xfer->stream_id));
 	}
@@ -4180,6 +4195,9 @@
 
 	for (n = 1; n != XHCI_MAX_ENDPOINTS; n++) {
 		for (p = 0; p != XHCI_MAX_STREAMS; p++) {
+#ifdef __aarch64__
+__asm __volatile("dsb st" : : : "memory");
+#endif
 			XWRITE4(sc, door, XHCI_DOORBELL(index),
 			    n | XHCI_DB_SID_SET(p));
 		}


I'm clearly just "evidence hacking" here. I've no clue what a
good design for allowing aarch64 build to supply having any of
those "dsb st"s or the "dsb ld".

Hopefully someone that knows what they are doing can figure out
if any of the other examples on other platforms are analogous
--and ,if they are, provide some hook for platform to contribute
such code to the xhci code.

Comment 136 Hans Petter Selasky freebsd_committer

2020-09-19 08:04:53 UTC

Mark,

Your finding indicate a problem in usb_pc_cpu_flush() and 

bus_dmamap_sync(pc->tag, pc->map, BUS_DMASYNC_PREWRITE);

Try to put the dsb only after dmamap_sync.

void
usb_pc_cpu_flush(struct usb_page_cache *pc)
{
        if (pc->page_offset_end == pc->page_offset_buf) {
                /* nothing has been loaded into this page cache! */
                return;
        }
        bus_dmamap_sync(pc->tag, pc->map, BUS_DMASYNC_PREWRITE);
}

The PCI I/O memory should be coherent and should not need any DSB's.

--HPS

Comment 137 commit-hook freebsd_committer

2020-09-19 22:38:07 UTC

A commit references this bug:

Author: hselasky
Date: Sat Sep 19 22:37:46 UTC 2020
New revision: 365918
URL: https://svnweb.freebsd.org/changeset/base/365918

Log:
  Fix for use of the XHCI driver on Cortex-A72 by adding a missing cache
  flush operation before writing to the XHCI_ERSTBA_LO/HI register(s).

  PR:		237666
  Discussed with:	Mark Millard <marklmi@yahoo.com>
  MFC after:	1 week
  Sponsored by:	Mellanox Technologies // Nvidia

Changes:
  head/sys/dev/usb/controller/xhci.c

Comment 138 Mark Millard 2020-09-19 22:47:25 UTC

(In reply to commit-hook from comment #137)

Folks with other platforms that show problems under this
bugzilla id (237666) may want to check if the update removes
their problems in this area. I had only the one example type
of context that showed behavior that seemed to fit with
237666. So my testing is limited compared to the variety of
reports that are listed in the comments.

Comment 139 Mike Karels freebsd_committer

2020-09-20 14:56:50 UTC

(In reply to Mark Millard from comment #138)
The change in xhci.c does not fix the repeating "device vanished" messages on my amd64; tested with releng/12.1 plus this fix.  I suspect that Mark tripped over a different bug on arm with similar symptoms (but not quite the same).

Comment 140 Mark Millard 2020-09-20 20:47:28 UTC

(In reply to Mike Karels from comment #139)

Drat. Well, at least Bjoern A. Zeeb's comment #125 might also
be covered by the patch involved for changeset/base/365918 ,
given how closely the console log contents match to what I
reported in response to his report.

Hopefully at some point Bjoern can reported if he sees a
behavior change and if it is an improvement for whatever
type of system #125 was reporting about.

Comment 141 Bjoern A. Zeeb freebsd_committer

2020-09-20 21:00:25 UTC

(In reply to Mark Millard from comment #140)

"ugen0.1: <0x8086 XHCI root HUB> at usbus0 (disconnected)"

is a huge clue that it wasn't arm64 ;-)  It was on an Intel laptop with external USB-C multi-port adapter.

Comment 142 Mark Millard 2020-09-20 21:57:01 UTC

(In reply to Bjoern A. Zeeb from comment #141)

Good to know.

If you repeat the #125 experiment based on the patch,
does it still end up stuck in the repeating sequence
that involves "Resetting controller"? (It seems that
is the only aspect likely to have been changed.) Or
does that part now do something different?

Comment 143 commit-hook freebsd_committer

2020-09-26 08:45:15 UTC

A commit references this bug:

Author: hselasky
Date: Sat Sep 26 08:44:16 UTC 2020
New revision: 366182
URL: https://svnweb.freebsd.org/changeset/base/366182

Log:
  MFC r365918:
  Fix for use of the XHCI driver on Cortex-A72 by adding a missing cache
  flush operation before writing to the XHCI_ERSTBA_LO/HI register(s).

  PR:		237666
  Discussed with:	Mark Millard <marklmi@yahoo.com>
  Sponsored by:	Mellanox Technologies // Nvidia

Changes:
_U  stable/12/
  stable/12/sys/dev/usb/controller/xhci.c

Comment 144 commit-hook freebsd_committer

2020-09-26 08:45:21 UTC

A commit references this bug:

Author: hselasky
Date: Sat Sep 26 08:45:09 UTC 2020
New revision: 366183
URL: https://svnweb.freebsd.org/changeset/base/366183

Log:
  MFC r365918:
  Fix for use of the XHCI driver on Cortex-A72 by adding a missing cache
  flush operation before writing to the XHCI_ERSTBA_LO/HI register(s).

  PR:		237666
  Discussed with:	Mark Millard <marklmi@yahoo.com>
  Sponsored by:	Mellanox Technologies // Nvidia

Changes:
_U  stable/11/
  stable/11/sys/dev/usb/controller/xhci.c

Comment 145 Gerard Seibert 2020-10-27 14:30:36 UTC

I admit to not being the sharpest knife in the drawer, so bear with me here.

1) Will the fix mentioned above correct the problem on a Dell XPS 8930 PC? I have never been able to run anything newer than 11.x.

2) Is/will this fix be incorporated into the FreeBSD 12.2-RELEASE?

Thanks!

Comment 146 Mark Millard 2020-10-27 16:56:16 UTC

(In reply to Gerard Seibert from comment #145)

Comment 139 makes it seem unlikely that the fix above to my context
would be the fix to yours or most of the reports here. (No one, besides
me, commented that there context was fixed and 139 comments that the
context involved was not fixed.)

https://svnweb.freebsd.org/base/releng/12.2/sys/dev/usb/controller/xhci.c?view=log stops at -r365618 from 2020-Sep-10, which is before -r366182 from 2020-Sep-26 .
So currently it does not look like 12.2 is including what fixed my context.

Comment 147 Rpapousek 2020-11-01 14:21:30 UTC

Greetings to everyone at this forum. I'm not good at Linux, but I've figured out how to solve this problem. Each time you see an error message, remove the USB memory and insert it into another USB port, one is correct and then error message will never show up again.
Have a nice day to all.

Comment 148 Gerard Seibert 2020-11-01 18:01:01 UTC

(In reply to Rpapousek from comment #147)
I don't have a machine set up to try this on presently. However, I question why it would work though. I don't have a great knowledge of how USB works exactly on FreeBSD. Can anyone with more knowledge comment?

Rpapousek, are you inferring that the error 'never' shows up again or just doesn't appear until a new boot is started?

Comment 149 Mike Karels freebsd_committer

2020-11-01 18:55:00 UTC

In my case, and in the original bug description, the problem happened with *no* USB devices attached; nothing to unplug.  The reported solution is not a general solution.

Comment 150 Rpapousek 2020-11-01 20:11:16 UTC

(In reply to Mike Karels from comment #149)
I don´t know how did you install your system but in my case when I try install from DVD-Rom or USB DVD-Rom error still occurred. Same result when I chose other usb ports then as the chosen one. After that when I found right one , system was installed without any problems and that stupid error messaga never shown up again. My advice for all users, make Bootable USB then trayng port by port. That will take some time but after that system will working and how I said that stupid message you will newer seen again anymore. That is really easy way but working. I tryed on 2 PC´s and it is OK.

Comment 151 Mike Karels freebsd_committer

2020-11-01 20:46:10 UTC

(In reply to Rpapousek from comment #150)
I think we are talking about different problems, or different manifestations.  My problem had nothing to do with install.  I had installed long ago on a much earlier release, and upgraded numerous times.  When I got to FreeBSD 12 (12.1, I think), I started to see these messages once per second after *every* boot even with no USB devices plugged in.  That was still true with 12.2.  I have since upgraded the motherboard, so the problem went with it.

Comment 152 Rpapousek 2020-11-01 20:59:23 UTC

(In reply to Mike Karels from comment #151)
OK that was my help for users who have has problems as mine with installation. Just by the way I solved problems with Truenas ver.12

Comment 153 Bjoern A. Zeeb freebsd_committer

2020-11-05 16:17:58 UTC

Ok, got it on an arm64 on HEAD which also shows:

uhub_reattach_port: giving up port reset - device vanished
uhub_reattach_port: giving up port reset - device vanished
uhub_reattach_port: giving up port reset - device vanished
uhub_reattach_port: giving up port reset - device vanished
uhub_reattach_port: giving up port reset - device vanished
uhub_reattach_port: giving up port reset - device vanished
uhub_reattach_port: giving up port reset - device vanished
[... for ever ...]

after having added USB_DEBUG to the kernel config.
I assume it is the UBC-C port which I recently added in host mode in the dts.

Comment 154 Bjoern A. Zeeb freebsd_committer

2020-11-05 23:11:24 UTC

I found myself trying to see what is going on (given sometimes it does happen to me on x86 when I plug in/unplug USB[-C] hubs and USB devices.

By the time it started turning logging on is possibly too late.

So I started to log the two values based on which this log message is printed;  I just opened a review at https://reviews.freebsd.org/D27108 for everyone who wants to add it to their self-compiled kernel.

Comment 155 commit-hook freebsd_committer

2020-11-06 22:40:31 UTC

A commit references this bug:

Author: bz
Date: Fri Nov  6 22:40:00 UTC 2020
New revision: 367435
URL: https://svnweb.freebsd.org/changeset/base/367435

Log:
  usb_hub: giving up port reset - device vanished

  Improve the output of the recently often experienced debug message in order
  to gather further data.

  PR:		237666
  Reviewed by:	hselasky
  MFC after:	3 days
  Differential Revision:	https://reviews.freebsd.org/D27108

Changes:
  head/sys/dev/usb/usb_hub.c

Comment 156 commit-hook freebsd_committer

2020-11-09 23:39:05 UTC

A commit references this bug:

Author: bz
Date: Mon Nov  9 23:38:39 UTC 2020
New revision: 367540
URL: https://svnweb.freebsd.org/changeset/base/367540

Log:
  MFC r367435,367539:

    usb_hub: giving up port reset - device vanished

    Improve the output of the recently often experienced debug message in order
    to gather further data.

  PR:		237666

Changes:
_U  stable/12/
  stable/12/sys/dev/usb/usb_hub.c

Comment 157 Gerard Seibert 2020-12-10 11:25:15 UTC

(In reply to commit-hook from comment #156)

I was unable to install FreeBSD 12.x due to this bug. Does this commit correct it, or does it simply change the way debug info for the problem is disclosed? I don't have the time to waste attempting to install this version if it is simply going to fail again.

Thanks!

Comment 158 Hans Petter Selasky freebsd_committer

2020-12-10 12:03:55 UTC

Created attachment 220423 [details]
PCI debug patch

Comment 159 Hans Petter Selasky freebsd_committer

2020-12-10 12:07:05 UTC

Hi Gerard,

There is no news here basically.

However, can you try installing the latest 11 release.

Then checkout sources for FreeBSD 12-stable.

Then do two experiments:

a) Apply the PCI debug patch to an 11-stable kernel. Reboot. Is any output produced?

b) Apply the PCI debug patch to an 12-stable kernel. Reboot. Is any output produced?

I have a suspicion something in ACPI is using input from an invalid PCI register and writing to a valid one.

--HPS

Comment 160 Gerard Seibert 2020-12-10 13:03:35 UTC

(In reply to Hans Petter Selasky from comment #159)

Unfortunately, I only have one system running FreeBSD at present. To do as you ask would require me to take that system offline, and I cannot afford to do it at this time. I am planning on getting another system up and running in the next quarter, so hopefully, I could try your suggestion then. This bug has gone 'unsquashed' for nearly two years now. Believe me when I say I really do want to find a way to fix it permanently. Honestly, I cannot figure out why it apparently only affects FreeBSD systems and not Linux, etcetera.

Comment 161 Eirik Oeverby 2020-12-31 01:28:26 UTC

Thinkpad X1 Carbon gen6, current as of a couple days ago:

uhub_reattach_port: giving up port 1 reset - device vanished: change 0xfb status 0x7fb
uhub_reattach_port: giving up port 2 reset - device vanished: change 0xfb status 0x7fb
uhub_reattach_port: giving up port 3 reset - device vanished: change 0xfb status 0x7fb
uhub_reattach_port: giving up port 4 reset - device vanished: change 0xfb status 0x7fb

Repeating forever. And right now I need to reboot since usbconfig just hangs (when run as root). It does show my Yubikey if I run usbconfig as my own user (limited USB privileges), but pcscd does not start up (hangs). 

This most commonly happens when I plug in/remove something via USB-C. I am not aware of it happening in any other situations.

I agree with everyone who says this is ... for the lack of a better term, annoying.

Comment 162 tech-lists 2021-02-25 17:58:15 UTC

Hi,

I get this error with a supermicro-based server on stable/12 
(stable/12-n232694-f2a88e74470) built on 2012-02-12

My "fix" previously was to strip usb functionality from the kernel and rebuild, 
which did not impact functionality for my use case.

I noticed the problem again after updating stable/12 and installing
a generic kernel:

% dmesg |grep vanished |wc -l
1129

There's nothing attached internally or externally on any usb port.

dmidecode shows:

Base Board Information
        Manufacturer: Supermicro
        Product Name: X9DRE-TF+/X9DR7-TF+

pciconf -lv |grep -i usb

   device     = 'C600/X79 series chipset USB2 Enhanced Host Controller'                                            
    subclass   = USB
    device     = 'C600/X79 series chipset USB2 Enhanced Host Controller'
    subclass   = USB
    device     = 'ASM1142 USB 3.1 Host Controller'
    subclass   = USB

In the meantime, need to upgrade this hardware to stable/13

Comment 163 Gerard Seibert 2021-02-25 21:01:04 UTC

(In reply to tech-lists from comment #162)
I am in the same boat. Presently, I am stuck on FreeBSD 11. I cannot move past this version due to this bug. Considering that it was not present in versions <= 11 and has not been corrected in version 12, I doubt that it is of any priority to anyone. At the rate this is going, I will have to invest in a new distro.

# dmidecode 3.3
# SMBIOS entry point at 0x6fe8d000
Found SMBIOS entry point in EFI, reading table from /dev/mem.
SMBIOS 3.1 present.
89 structures occupying 5312 bytes.

Handle 0x0000, DMI type 0, 24 bytes
BIOS Information
     Vendor: Dell Inc.
     Version: 1.1.15
     Release Date: 07/01/2020
     Address: 0xF0000
     Runtime Size: 64 kB
     ROM Size: 16 MB
     Characteristics:
          PCI is supported
          PNP is supported
          BIOS is upgradeable
          BIOS shadowing is allowed
          Boot from CD is supported
          Selectable boot is supported
          EDD is supported
          5.25"/1.2 MB floppy services are supported (int 13h)
          3.5"/720 kB floppy services are supported (int 13h)
          3.5"/2.88 MB floppy services are supported (int 13h)
          Print screen service is supported (int 5h)
          Serial services are supported (int 14h)
          Printer services are supported (int 17h)
          ACPI is supported
          USB legacy is supported
          BIOS boot specification is supported
          Function key-initiated network boot is supported
          Targeted content distribution is supported
          UEFI is supported
     BIOS Revision: 1.1

Handle 0x0001, DMI type 1, 27 bytes
System Information
     Manufacturer: Dell Inc.
     Product Name: XPS 8930
     Version: 1.1.15
     Serial Number: 19WS903
     UUID: 4c4c4544-0039-5710-8053-b1c04f393033
     Wake-up Type: Power Switch
     SKU Number: 0859
     Family: XPS

Handle 0x0002, DMI type 2, 15 bytes
Base Board Information
     Manufacturer: Dell Inc.
     Product Name: 0T2HR0
     Version: A01
     Serial Number: .19WS903.CNPE10098R08VP.
     Asset Tag:
     Features:
          Board is a hosting board
          Board is replaceable
     Location In Chassis: Default string
     Chassis Handle: 0x0003
     Type: Motherboard
     Contained Object Handles: 0

pciconf -lv | grep -i "usb"
    device     = '200 Series/Z370 Chipset Family USB 3.0 xHCI Controller'
    subclass   = USB
    device     = 'ASM1142 USB 3.1 Host Controller'
    subclass   = USB
    device     = 'JHL6540 Thunderbolt 3 USB Controller (C step) [Alpine Ridge 4C 2016]'
    subclass   = USB

Comment 164 tech-lists 2021-02-26 16:43:08 UTC

(In reply to Gerard Seibert from comment #163)

Do you absolutely require usb3?

We have in common this hardware:

device     = 'ASM1142 USB 3.1 Host Controller'

I'm going to build a GENERIC kernel but without xhci to see if the issue goes away. I'll post results here.

Comment 165 Gerard Seibert 2021-02-26 17:56:05 UTC

(In reply to tech-lists from comment #164)

Nice catch. I want to find out if others experiencing this problem are also using the same hardware.

The question should not be if I really require usb3, but assuming that is the problem, why everyone else except apparently FreeBSD can support it, and why did this problem not manifest itself until FreeBSD version => 12?

Comment 166 tech-lists 2021-02-27 15:09:57 UTC

(In reply to Gerard Seibert from comment #165)

OK. Commenting out device xhci from the generic config "fixes" the problem, in that the "device vanished" messages no longer appear at the console/dmesg.[1]

So this narrows it down to a problem between 'ASM1142 USB 3.1 Host Controller' and xhci

[1] ofc theres no usb3 connectivity now, either.

This is on stable/12-n232774-d88ac710ce5 built today (new world+kernel)

So there needs to be either a fix for the issue or a couple of workarounds depending on your use case:

1. presumably your usb3 is built into the motherboard. switch it off in the bios and install a usb3 card known to work if you really need usb3.

2. just do what i did and disable xhci and recompile if you can do without usb3 functionality. I can do this because the machine in question is a headless server rather than a desktop.

Comment 167 tech-lists 2021-02-27 18:06:49 UTC

(In reply to Gerard Seibert from comment #165)

Just for curiosity, I tried commenting out ehci (i.e. disabling usb2) and uncommented xhci (so enabling usb3) and then rebuilding kernel, and it *worked* with no errors on usb3.

The usb3-connected disk is Seagate ST4000LM016 4TB SATA. Transferred large files across, no errors, tested with sha512

Comment 168 Gerard Seibert 2021-03-04 18:16:13 UTC

(In reply to tech-lists from comment #167)

I really wish that this could be corrected and included with the release of FreeBSD version 13. I know it will never happen; however, it would be nice to at least hope that it gets into 13.1 anyway. I am stuck at version 11. I don't build custom kernels. I download the 'bootonly' disc and start from there on a clean HD. I hate having a lot of leftover clutter hanging around.

Comment 169 Hans Petter Selasky freebsd_committer

2021-03-04 18:33:12 UTC

Is it possible to get SSH root access to a machine exhibiting this problem?

Need to be able to install a new kernel there at least.

--HPS

Comment 170 Mohammad Noureldin 2021-03-24 10:05:48 UTC

(In reply to Hans Petter Selasky from comment #169)
Hi HPS,

I came to know about this issue from this mail thread [1]. I went through all the comments mentioned here and given the comments form both tech-lists and Gerard Seibert, starting from here [2], I did this analysis:


Mike Karels => Gigabyte MB => GA-X99P-SLI (rev. 1.0)
Gerard Seibert => Dell XPS 8930 => 0T2HR0
tech-lists => Supermincro MB => X9DRE-TF+/X9DR7-TF+

Possible Root Causes:
---------------------
- Having a mix of USB 2.0, USB 3.0 and USB 3.x ?

- Chipsets ?
-# Gigabyte (GA-X99P-SLI (rev. 1.0))
-#- C610: Server and Workstation/X99 Highend Desktop (HEDT)

-# Dell XPS 8930 (0T2HR0, Version: A01)
-#- 200 Series (Including X299 Successor of X99) (HEDT) /Z370 (DT)

-# Supermicro X9DRE-TF+
-#- Intel C602J chipset a variant of C600 (Server and Workstation)/X79 (Predecessor of X99)

-# Supermicro X9DR7-TF+
-#- Intel C602J chipset a variant of C600 (Server and Workstation)/X79 (Predecessor of X99)


From what I could notice (given my limited experience I have to say), these are the common details amongst all reported cases so far.

Trying to test the 1st possible root cause (Having a mix of USB 2.0, USB 3.0 and USB 3.x ?), I ran tests on:

Lenovo Thinkpad W540:
---------------------
Chipset: Mobile Intel QM87 Express (having USB 2.0 and USB 3.0)


I tried different USB UEFI Setup Configurations, but I didn't see any of the errors mentioned.

That lead me to the path that it could be a mix of both Possible Root Causes, hence I did a bit of research and I've found these 2 machines available for purchase:


Dell XPS 8920:
--------------
Chipset: Intel Z270
    => https://www.dell.com/support/manuals/nl-nl/xps-8920-desktop/xps-8920-setup-and-specifications/system-information?guid=guid-eb144d9a-c8d8-45b8-bab5-9399c3eafb16&lang=en-us
    => https://www.dell.com/support/manuals/nl-nl/xps-8920-desktop/xps-8920-setup-and-specifications/poorten-en-connectoren?guid=guid-881045b1-98e0-4e6d-a2bc-6478dc7ea6ce&lang=en-us

Dell Precision T5810:
---------------------
Chipset: Intel C612
    => https://www.dell.com/support/manuals/nl-nl/precision-t5810-workstation/precision_t5810_om_pub/technical-specifications?guid=guid-cb1a5aa6-1e70-44b9-b690-59507a3a9f31&lang=en-us


I guess that the Dell XPS 8920 one looks like a better test case, but your input would be highly appreciated. 

Also, I want to take the chance and learn more as I want to contribute to the FreeBSD project, so it would be great if you can provide me with guidance and mentorship.


Looking forward to your reply


[1] - https://lists.freebsd.org/pipermail/freebsd-questions/2021-March/293390.html
[2] - https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=237666#c164

Comment 171 Gerard Seibert 2021-05-02 13:01:18 UTC

Just checking in. Has a fix been found for this problem? It doesn't appear that there has been any movement on it for quite a while.

Comment 172 tech-lists 2021-05-03 02:10:09 UTC

(In reply to Gerard Seibert from comment #171)
Hi

as far as I can tell, the issue is still present in a world built 28th March 2021 from stable/13-n245213-f28241ca134

Comment 173 Gerard Seibert 2021-05-03 10:05:40 UTC

Well, considering that this bug is over two years old now, and apparently with no one actively researching the problem, that I can safely say it will never get fixed. It is a shame since the bug did not exist in versions 11 and older.

Time to talk Fedora f r a ride. :(

Comment 174 Graham Perrin freebsd_committer

2021-05-03 10:23:09 UTC

(In reply to Gerard Seibert from comment #173)

Kindly refrain. Observe and understand the comments from people who contribute positively towards progress.

Comment 175 Kurt Jaeger freebsd_committer

2021-05-03 19:31:27 UTC

(In reply to Gerard Seibert from comment #173)

The problem is not that no-one is working on it, the problem is that
hsp, who might be able to fix it, does not have a hardware where he can reproduce the problem.

So normally, users bisect in the commits until they find the commit in question or hsp@ somehow gets access to a system where it occurs to debug it.

Comment 176 Bjoern A. Zeeb freebsd_committer

2021-05-07 22:11:38 UTC

Created attachment 224756 [details]
usb: uhub return error from uhub_explore

! usb: uhub return error from uhub_explore
!
! uhub_explore would always return no-error even if
! and error occured in the loop over all ports.
! The loop properly terminates in case of error
! (or continues to the next port in which case the error will
! be reset).
! Propagate the possible error to the caller.
! This seems to fix the continous
!   "uhub_reattach_port: giving up port reset - device vanished"
! messages people are seeing (in theory and based on my own testing).
! XXX-BZ ^^^^^^^^^^^^^^^^^^^^ remove for final commit message.
!
! PR:           237666
! Submitted-by: phk
! Tested-by:    bz, ...
! Reviewed-by:  ...
! Diff-Re..:
!

Comment 177 Bjoern A. Zeeb freebsd_committer

2021-05-07 22:12:52 UTC

(In reply to Bjoern A. Zeeb from comment #176)

If people can test this 3-line change and report back that would be awesome.
If more people find this to be the solution I'll put it up in Phabricator for oficcial review unless @hsealsky wants to approve here.

Comment 178 Hans Petter Selasky freebsd_committer

2021-05-07 23:25:52 UTC

Patch looks good. Strange HUB with zero ports :-)

--HPS

Comment 179 Hans Petter Selasky freebsd_committer

2021-05-18 13:57:46 UTC

@bz: I'm taking your patch upstream with one minor modification.

The return value from the sub-explore function, must not be propagated down, only checked :-)

Comment 180 commit-hook freebsd_committer

2021-05-18 14:16:43 UTC

A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=e5ff940a81b56cb236795e0059c44981053f8404

commit e5ff940a81b56cb236795e0059c44981053f8404
Author:     Hans Petter Selasky <hselasky@FreeBSD.org>
AuthorDate: 2021-05-18 13:52:00 +0000
Commit:     Hans Petter Selasky <hselasky@FreeBSD.org>
CommitDate: 2021-05-18 14:11:35 +0000

    Propagate down USB explore error codes, so that failures to enumerate USB HUBs
    behind USB HUBs are detected and the USB reset counter logic will kick in
    preventing enumeration of continuously failing ports.

    Submitted by:   phk@
    Tested by:      bz@
    PR:             237666
    MFC after:      1 week
    Sponsored by:   Mellanox Technologies // NVIDIA Networking

 sys/dev/usb/usb_hub.c | 23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)

Comment 181 Bjoern A. Zeeb freebsd_committer

2021-05-19 08:58:50 UTC

Thanks Hans for picking this up.  I was waiting to hear back from anyone here.  I'll pass the PR to you and you can probably close it after MFCs.

For what is worth I think I do understand what triggered the problem for me a lot better these days.  I wrote some notes here https://wiki.freebsd.org/BjoernZeeb/USB  ;  if needed we can take that offline.

Comment 182 tech-lists 2021-05-20 03:51:48 UTC

(In reply to Bjoern A. Zeeb from comment #181)

Hi,

If you need another data point, I'll patch and rebuild an affected stable/13 system later this evening UTC and report back here

Comment 183 Hans Petter Selasky freebsd_committer

2021-05-20 08:29:48 UTC

Please apply both of these patches when testing:

Patch 1:
https://cgit.freebsd.org/src/commit/?id=70ffaaa69c830d26b59136d0b0447ab2f8683db8

Patch 2:
https://cgit.freebsd.org/src/commit/?id=e5ff940a81b56cb236795e0059c44981053f8404

--HPS

Comment 184 tech-lists 2021-05-20 19:00:01 UTC

(In reply to Hans Petter Selasky from comment #183)

Hi,

ok have applied those commits and rebuilt/rebooted. Seems to be no console noise from usb3. I just need to wait a few hrs to get someone to plug something usb3 into the usb3 interface ;)

Comment 185 tech-lists 2021-05-20 20:46:57 UTC

(In reply to Hans Petter Selasky from comment #183)

Hi,

Seems those patches fix the issue permanently, thank you

Comment 186 Hans Petter Selasky freebsd_committer

2021-05-20 21:31:21 UTC

Nice to know.

Anyone else can confirm these fixes?

--HPS

Comment 187 Mike Karels freebsd_committer

2021-05-20 21:37:55 UTC

fwiw, I can't easily confirm whether this fixes the problem on my system, as that motherboard is no longer in a FreeBSD system.

Comment 188 commit-hook freebsd_committer

2021-06-02 12:25:25 UTC

A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=ad9d0df7c66a02db53f102e87e6f8414e8da67fc

commit ad9d0df7c66a02db53f102e87e6f8414e8da67fc
Author:     Hans Petter Selasky <hselasky@FreeBSD.org>
AuthorDate: 2021-05-18 13:52:00 +0000
Commit:     Hans Petter Selasky <hselasky@FreeBSD.org>
CommitDate: 2021-06-02 11:25:20 +0000

    Propagate down USB explore error codes, so that failures to enumerate USB HUBs
    behind USB HUBs are detected and the USB reset counter logic will kick in
    preventing enumeration of continuously failing ports.

    Submitted by:   phk@
    Tested by:      bz@
    PR:             237666
    Sponsored by:   Mellanox Technologies // NVIDIA Networking

    (cherry picked from commit e5ff940a81b56cb236795e0059c44981053f8404)

 sys/dev/usb/usb_hub.c | 23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)

Comment 189 Gerard Seibert 2021-06-02 16:26:32 UTC

(In reply to commit-hook from comment #188)
I assume that this will be part of the 13.1 release and not backported to the present version "13" available for download.

Comment 190 Ulrich Spörlein freebsd_committer

2021-08-10 16:41:43 UTC

So this was MFCed to stable/13 and I've rebuild at 
13.0-STABLE #1 n246608-efe3bf0fb092: Fri Jul 30 18:59:02 CEST 2021

but I still get an endless stream of messages after a while, potentially related to plugging and unplugging devices and hubs.

% dmesg|tail
uhub_reattach_port: giving up port 4 reset - device vanished: change 0xfb status 0x7fb
uhub_reattach_port: giving up port 1 reset - device vanished: change 0xfb status 0x7fb
uhub_reattach_port: giving up port 2 reset - device vanished: change 0xfb status 0x7fb
uhub_reattach_port: giving up port 3 reset - device vanished: change 0xfb status 0x7fb
uhub_reattach_port: giving up port 4 reset - device vanished: change 0xfb status 0x7fb
uhub_reattach_port: giving up port 1 reset - device vanished: change 0xfb status 0x7fb
uhub_reattach_port: giving up port 2 reset - device vanished: change 0xfb status 0x7fb
uhub_reattach_port: giving up port 3 reset - device vanished: change 0xfb status 0x7fb
uhub_reattach_port: giving up port 4 reset - device vanished: change 0xfb status 0x7fb
uhub_reattach_port: giving up port 1 reset - device vanished: change 0xfb status 0x7fb


Is this supposed to fully work on stable/13 or am I running into a different issue here?

Comment 191 tech-lists 2021-08-17 12:20:34 UTC

(In reply to Ulrich Spörlein from comment #190)

Hi,

Can you paste the output from the following:

pciconf -lv | grep -i "usb"

thanks

Comment 192 Ulrich Spörlein freebsd_committer

2021-08-17 12:45:47 UTC

% pciconf -lv | grep -i "usb"
    device     = 'Cannon Point-LP USB 3.1 xHCI Controller'
    subclass   = USB
    device     = 'JHL6240 Thunderbolt 3 USB 3.1 Controller (Low Power) [Alpine Ridge LP 2016]'
    subclass   = USB

This is on a Thinkpad T490

Comment 193 Gerard Seibert 2021-08-17 21:07:26 UTC

(In reply to Ulrich Spörlein from comment #192)

I have experienced similar problems.

pciconf -lv | grep -i "usb"
    device     = '200 Series/Z370 Chipset Family USB 3.0 xHCI Controller'
    subclass   = USB
    device     = 'ASM1142 USB 3.1 Host Controller'
    subclass   = USB
    device     = 'JHL6540 Thunderbolt 3 USB Controller (C step) [Alpine Ridge 4C 2016]'
    subclass   = USB

Comment 194 Bjoern A. Zeeb freebsd_committer

2021-08-17 21:23:16 UTC

(In reply to Ulrich Spörlein from comment #190)

I believe that commit does not fix our problems on TB ports.
I found a manual workaround for now.

See some more notes via the link from comment #181
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=237666#c181

Comment 195 Ulrich Spörlein freebsd_committer

2021-09-27 19:02:06 UTC

Hmmm, I'm not sure that applies to my system, tbh. You write "the USB-C port is TB", but I have 2 of those on the left side (T490).

The one in the back is where a pure power supply is connected. The one in the front sometimes (but not often, not necessarily when the problem happens) has a USB-C to HDMI adapter hooked up.

The only plugging and unplugging happens on the USB-A port on the right side of the laptop, as a USB switch is connected there, which I use to switch keyboard/mouse over.

Due to other bugs I no longer suspend/resume that laptop, and, well, I think the problem also happens when the HDMI adapter is not plugged in. I'll keep an eye out, just in case though.

Comment 196 Gerard Seibert 2021-10-28 11:55:33 UTC

I am still stuck on FreeBSD 11.4-RELEASE (Now EOL'd) because a fix for this bug is not included in the FreeBSD 13.0-RELEASE. Can anyone tell me if a fix for this bug has been incorporated into the upcoming FreeBSD 12.3 Release?

Comment 197 Graham Perrin freebsd_committer

2021-10-31 06:58:16 UTC

(In reply to Gerard Seibert from comment #196)

> … 𡀦upcoming FreeBSD 12.3 Release?

Where cgit lacks clarity e.g. <https://cgit.freebsd.org/src/commit/?id=ad9d0df7c66a02db53f102e87e6f8414e8da67fc&h=stable%2F13>, 
GitHub <https://github.com/freebsd/freebsd-src/commit/ad9d0df7c66a02db53f102e87e6f8414e8da67fc> and GitLab <https://gitlab.com/FreeBSD/freebsd-src/-/commit/ad9d0df7c66a02db53f102e87e6f8414e8da67fc> offer a clearer view. Cherry-picked for: 

stable/13

Comment 198 Gerard Seibert 2021-11-26 12:02:18 UTC

(In reply to Graham Perrin from comment #197)
FreeBSD 11.4-RELEASE-p9

FreeBSD-version: 11.4-RELEASE-p13

I have spent two years attempting to update my system. FreeBSD 11.x works perfectly. At least it used to. Now, because it is EOL’s, several things are starting to break down.

I recently downloaded “FreeBSD-13.0-STABLE-amd64-20211118-3a15ccadf87-248076-memstick.img” and installed it on a USB drive and booted up the system. The following error message was then prominently and repeatedly displayed on the screen.

“uhub_reattach_port: giving up port 1 reset-device vanished: change 0xfb status 0x7fb”

The port number cycles from 1 to 4 and then the cycle restarts.

The machine works perfectly with Windows, so the problem obviously lies with FreeBSD.

Since this machine worked fine with FreeBSD prior to version 12.x, the only conclusion I can come up with is that someone fixed something that wasn’t broken. Is anyone actually working on this problem, or has it been put aside as unfixable?

Since I am not able to update several packages any longer with the older 11.x OS installed, I am assuming my only option is to investigate a Linux OS. I liked FreeBSD, but obviously, they are not interested in keeping up with current technology.

Comment 199 tech-lists 2021-11-27 03:14:09 UTC

(In reply to Gerard Seibert from comment #198)

My system was affected by the bug; it's been "fixed" [1] for a while, with my chipset at least, on this motherboard.

The affected system runs stable/13-n245706-526441c080e with GENERIC kernel.
It seems the usb3 controller card is no longer detected by freebsd:

$ pciconf -lv |grep -i usb
    device     = 'C600/X79 series chipset USB2 Enhanced Host Controller'
    subclass   = USB
    device     = 'C600/X79 series chipset USB2 Enhanced Host Controller'
    subclass   = USB

$ cat /usr/src/sys/amd64/conf/GENERIC | grep xhci
device		xhci			# XHCI PCI->USB interface (USB 3.0)

It was detected as 'ASM1142 USB 3.1 Host Controller'

Personally, i'm happy that this seemingly problematic card first threw errors and then was eventually undetectable (and therefore unusable). It means that if i really need a working usb3 then i'll slot in a different card known to work. The worst is when something errors undetected.

[1] the way I "fixed" it before was to just build a kernel without xhci. But now with or witout xhci in the kernel, this hardware is undetected.

Comment 200 Gerard Seibert 2021-11-27 11:55:34 UTC

(In reply to tech-lists from comment #199)
~ $ pciconf -lv | grep -i usb
    device     = '200 Series/Z370 Chipset Family USB 3.0 xHCI Controller'
    subclass   = USB
    device     = 'ASM1142 USB 3.1 Host Controller'
    subclass   = USB
    device     = 'JHL6540 Thunderbolt 3 USB Controller (C step) [Alpine Ridge 4C 2016]'
    subclass   = USB
~ $
~ $ cat /usr/src/sys/amd64/conf/GENERIC | grep -i xhci
device          xhci                    # XHCI PCI->USB interface (USB 3.0)

Comment 201 tech-lists 2021-11-27 18:13:28 UTC

(In reply to Gerard Seibert from comment #200)

In contrast to the controller/board I was having problems with, here's another board controller running nearly the same software version:

[...]
% uname -prsvKU
FreeBSD 13.0-STABLE FreeBSD 13.0-STABLE #0 stable/13-n248055-81eae114459: Sat Nov 13 12:38:33 GMT 2021     root@desktop:/usr/obj/usr/src/amd64.amd64/sys/DESKTOP  amd64 1300520 1300520
[...]

where usb3 is fully functional:

[...]
% pciconf -lv |grep -i usb
    device     = '8 Series/C220 Series Chipset Family USB xHCI'
    subclass   = USB
    device     = '8 Series/C220 Series Chipset Family USB EHCI'
    subclass   = USB
    device     = '8 Series/C220 Series Chipset Family USB EHCI'
    subclass   = USB

% cat /sys/amd64/conf/DESKTOP | grep xhci
device		xhci			# XHCI PCI->USB interface (USB 3.0)

[ouput of dmidecode]

Base Board Information
        Manufacturer: ASUSTeK COMPUTER INC.
        Product Name: Z87-PRO
        Version: Rev 1.xx

Comment 202 tech-lists 2021-11-27 18:23:08 UTC

(In reply to Gerard Seibert from comment #200)
The reldate of the bios with the problematic chipset is 2012-08-20

System Information
        Manufacturer: Supermicro
        Product Name: X9DRE-TF+/X9DR7-TF+

reldate of the one that works fine: 28-01-2014

Base Board Information
        Manufacturer: ASUSTeK COMPUTER INC.
        Product Name: Z87-PRO

Comment 203 tech-lists 2021-11-27 18:24:21 UTC

(i mean 2014-01-28)

Comment 204 Tomoaki AOKI 2021-12-04 07:08:32 UTC

I have the same issue on my ThinkPad P52, but only on USB type-C / Thunderbolt (TB) 3 dual mode port.
If problematic-on-typeC device is attached to type A port, this doesn't happen unless accicentally detach while transferring.

The difference seems to be...

 *xhci1 and uhub1 only appears when connected to type C /TB3 port.

 *When a device exposing this issue having power switch is attached to
  type A port while power switch on it off, nothing happens. Powering on
  the device after that, it appears without new xhci device and uhub device.

If the attached device is a type C hub, this doesn't seem to happen.
Attaching type C device though USB type C hub is not yet tested.
I'll report back when I could take some time to try.

Will check which patch here is committed / MFC'ed and try not-yet-committed ones later.

Comment 205 Tomoaki AOKI 2021-12-05 02:54:07 UTC

(In reply to Tomoaki AOKI from comment #204)

Just a quick correction. :-(
USB type-C hub exposed the issue, too.
At the time I posted the comment #204 above, I've tried detaching the hub just before the previous shutting down. So there was too few time to expose the issue.

Comment 206 Tomoaki AOKI 2021-12-05 07:24:42 UTC

Additional info.
My HW probe: See [1] at the bottom.

Tried bz@'s workaround introduced at Comment 181 (for me, `devctl detach xhci1` BEFORE detaching) worked fine for me. Thanks, bz@!

The TB3 controller seems to behave as follows:
 *On boot, standby and expose nothing.

 *If any device is attached to TB3 / USB type-C port, sniff it and decide
  which controller device (XHCI or TB3) to expose.

 *Hide controller device when attached external device is detached and standby.

 *If detached device is re-attached just after the detach, and it is USB one,
  silently re-route to other XHCI-only controller. Observed once for now, by 
  trying bz@'s workaround and re-attaching the same device (this time, uaudio),
  xhci1 didn't re-appear and uaudio appeared.
  If attached device was different one, xhci1 re-appeared.
  This behavior would be useful for not-enough-stable cable or connector.

Maybe the third behavior is breaking things for us.


Differences between before attaching and detaching the type-C hub.
Others appearing while the hub is attached are sanely dissappeard.

devinfo -r :
           pci2
               pcib2 bus numbers:
                   4
+            pcib5
+                pcib2 bus numbers:
+                    5-110
+                pcib2 memory window:
+                    0x74000000-0xa20fffff
+                pcib2 prefetch window:
+                    0x4000000000-0x4049ffffff
+              pci5
+                  pcib5 bus numbers:
+                      5
+                pcib6
+                    pcib5 bus numbers:
+                        6
+                    pcib5 memory window:
+                        0xa2000000-0xa20fffff
+                  pci6
+                      pcib6 bus numbers:
+                          6
+                pcib7
+                    pcib5 bus numbers:
+                        7-57
+                    pcib5 memory window:
+                        0x74000000-0x8befffff
+                    pcib5 prefetch window:
+                        0x4000000000-0x401fffffff
+                  pci7
+                      pcib7 bus numbers:
+                          7
+                pcib8
+                    pcib5 bus numbers:
+                        58
+                    pcib5 memory window:
+                        0x8bf00000-0x8bffffff
+                  pci8
+                      pcib8 bus numbers:
+                          58
+                    xhci1
+                        Interrupt request lines:
+                            0x8f
+                        pcib8 memory window:
+                            0x8bf00000-0x8bf0ffff
+                      usbus1
+                        uhub2
         pcib3
             I/O memory addresses:
                 0xa4200000-0xa42fffff


pciconf -lv :

     device     = 'NVMe SSD Controller SM981/PM981/PM983'
     class      = mass storage
     subclass   = NVM
+pcib5@pci0:4:0:0:	class=0x060400 rev=0x06 hdr=0x01 vendor=0x8086 device=0x15ea subvendor=0x17aa subdevice=0x225f
+    vendor     = 'Intel Corporation'
+    device     = 'JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018]'
+    class      = bridge
+    subclass   = PCI-PCI
+pcib6@pci0:5:0:0:	class=0x060400 rev=0x06 hdr=0x01 vendor=0x8086 device=0x15ea subvendor=0x17aa subdevice=0x225f
+    vendor     = 'Intel Corporation'
+    device     = 'JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018]'
+    class      = bridge
+    subclass   = PCI-PCI
+pcib7@pci0:5:1:0:	class=0x060400 rev=0x06 hdr=0x01 vendor=0x8086 device=0x15ea subvendor=0x17aa subdevice=0x225f
+    vendor     = 'Intel Corporation'
+    device     = 'JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018]'
+    class      = bridge
+    subclass   = PCI-PCI
+pcib8@pci0:5:2:0:	class=0x060400 rev=0x06 hdr=0x01 vendor=0x8086 device=0x15ea subvendor=0x17aa subdevice=0x225f
+    vendor     = 'Intel Corporation'
+    device     = 'JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018]'
+    class      = bridge
+    subclass   = PCI-PCI
+none6@pci0:5:4:0:	class=0x060400 rev=0x06 hdr=0x01 vendor=0x8086 device=0x15ea subvendor=0x17aa subdevice=0x225f
+    vendor     = 'Intel Corporation'
+    device     = 'JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018]'
+    class      = bridge
+    subclass   = PCI-PCI
+xhci1@pci0:58:0:0:	class=0x0c0330 rev=0x06 hdr=0x00 vendor=0x8086 device=0x15ec subvendor=0x17aa subdevice=0x225f
+    vendor     = 'Intel Corporation'
+    device     = 'JHL7540 Thunderbolt 3 USB Controller [Titan Ridge 4C 2018]'
+    class      = serial bus
+    subclass   = USB




[1] http://bsd-hardware.info/?probe=676f16ac86

Comment 207 Bjoern A. Zeeb freebsd_committer

2021-12-05 22:01:18 UTC

I don't know where the information went;  here, on the wiki in an email or a private conversation so I'll repeat from the top of my head after months.

The problem is when the XHCI on the USB-C port gets unplugged, that XHCI internally becomes a TB again and we don't have the proper code to unfold it all;  usually there is no "hot-plug" on the PCI set on my laptops either;  the problem hence is there is a PCI device again and XHCI wasn't detached.

The bar of the PCI device is mostly bogus but the vendor information is still there and hence XHCI will stay (as the PCI didn't go away).

It will then re-add a UHUB which will start enumeration and we do hold the SX lock over that and obviously the enumeration will not go well (but if I am not misremebering the number of ports was still known somehow) and we will keep iterating, never giving up and never dropping the SX lock.

A possible fix for this would be (in the right place) XHCI to know itself that the PCI device it is attached to being bogus and detaching itself which to my understanding back then wasn't easily doable as the cleanup needed from the place of calling didn't make sense).

There may be a chance;  I'll talk to imp to see if I misunderstand something.  But essentially until we'll have TB support, the only way would be to nuke the XHCI away forcefully if we detect that situation and for that also prevent further enumeration on the uhub at the right moment;  possibly hide it behind a systctl or tuneable ...

I'll try to find my logs and emails from back then and inquire with people as to what can be done and report back (within the week or two).

Comment 208 Gerard Seibert 2021-12-06 20:28:46 UTC

(In reply to Bjoern A. Zeeb from comment #207)

The device works flawlessly on MS-Windows; no surprise there, and I have seen it work on at least on *nix distro. Unfortunately, I don't remember which one.

So, my question is, why has no one found a cure after all this time? I suppose using the code from Windows is not an option, but why not employ code from a Linux distro that does work correctly?

An open bug of over two years on a commonly available device does not bode well for potential users of FreeBSD.

Comment 209 Bjoern A. Zeeb freebsd_committer

2021-12-06 21:16:09 UTC

(In reply to Gerard Seibert from comment #208)

The answer might be that you'll likely need BSD licensed ThunderBolt support.

Comment 210 Gerard Seibert 2021-12-09 19:13:48 UTC

(In reply to Bjoern A. Zeeb from comment #209)

It makes me wonder if anyone from FreeBSD has tried contacting ThunderBolt regarding this situation. It has been going on for over two years with no apparent fix in sight.

I still do not understand why this problem existed before to ver 12.x. It just doesn't make sense to me. It is almost like FreeBSD is going backward in terms of hardware support.

Comment 211 Bjoern A. Zeeb freebsd_committer

2021-12-09 23:13:08 UTC

(In reply to Gerard Seibert from comment #210)

I understand your frustration.

People have.  You probably want to watch the entire video.  Please do not ask me what happened to it ..  This is a year and a half old ...

https://www.youtube.com/watch?v=VbAJf2PBE-M&t=1400s

Comment 212 Gerard Seibert 2021-12-10 19:59:07 UTC

(In reply to Bjoern A. Zeeb from comment #211)
The video does point out some fascinating points. It also helps to highlight the fallacy of some of FreeBSD’s embedded methods of operation.

The refusal to use GPL or other licensed code is ridiculous. It reminds me of a man who falls off a boat and then refuses to use a life preserver thrown at him because it was made in China, or it was the wrong color, or whatever. If code is available that works, then use the dam thing. Worry about modifying it at a later date. Would you let millions of people die because your favorite pharmaceutical company did not manufacture the vaccine that could cure a disease? If so, then someone has a severe problem.

FreeBSD’s long-held reputation for failure to support modern standards is well known. I work in a municipality. I could never in good conscience recommend using FreeBSD in our environment. Most of our equipment is less than five years old, which means it probably is not fully supported by FreeBSD. I know for a fact that at least five units here employ Thunderbolt. I have not checked all of the units, but I would be surprised to find that few are using that technology, if not all.

This is not the first time FreeBSD has fallen short of the mark. Hardware support is seriously lacking in the FreeBSD world. Perhaps before they push out a new version, they should first sit back and concentrate on fixing all the broken hardware issues.

Even if they did fix the Thunderbolt issue, they would never backport it. They would hold it for ransom until they released another version of FreeBSD. Thank GOD Microsoft and other OS systems treat their users with more respect.

Comment 213 Tomoaki AOKI 2021-12-10 23:35:33 UTC

(In reply to Gerard Seibert from comment #212)
GPL'ed or any other BSD-incompatible-licensed drivers can live in ports. :-)
But TB driver would be tricky, as it would have TB mode and already-existing XHCI mode.

The best way is that Intel provides TB driver with BSD-compatible license.
(Or original author of Linux TB driver, if any, kindly make it dual (or more) license including BSD-compatible one. So someone can port it in-kernel.

Comment 214 omars 2022-07-29 23:47:04 UTC

Can someone verify the fix in (In reply to Hans Petter Selasky from comment #186)

Was the change actually merged for 13.1-Release? I can't determine if it was.
If it was, I still see the same problem after booting from the 13.1 installer.

Same as others, it was totally fine for all releases of FB10 and FB11 (and maybe last FB9), but scrolling console message on all release versions of FB12 and FB13.

At this point I'm contemplating just retiring the hardware (or not using FreeBSD for this workstation). It's unfortunate that existing functionality has remained broken for the 3+ years of this bug being open.

Comment 215 Gerard Seibert 2022-07-30 10:21:43 UTC

(In reply to omars from comment #214)

It didn't work for me. I have lost faith in FreeBSD being able to stay current with modern hardware. It continually falls behind other OSs in supporting newer, more efficient, and safe hardware. Now, it can not even support the latest versions of Samba. Maybe it is time to turn off the light and close the door. 😞

Comment 216 Hans Petter Selasky freebsd_committer

2022-07-30 10:48:03 UTC

Hi,

Recently I bought some new personal and server hardware, and I cannot find any traces of the behaviour you are reporting. Likely it is a local fault / BIOS / electrical / non-USB compliant device and so on. Try booting both legacy mode and UEFI mode. Any difference?

The only issue I've seen so far is a failure on an external 16-port USB 3.x HUB, which again I think is not related to FreeBSD.

FreeBSD is provided AS-IS. Feel free to use something else if you are not happy.

--HPS

Comment 217 Gerard Seibert 2022-07-30 17:31:23 UTC

(In reply to Hans Petter Selasky from comment #216)

You have shown no proof that you are employing the Intel Thunderbolt technology.

Comment 218 Eirik Oeverby 2022-07-30 17:44:46 UTC

(In reply to Gerard Seibert from comment #217)
You are of course aware that, unlike certain other UNIX-like operating systems, FreeBSD is largely developed by volunteers on their spare time or in some lucky cases tolerated/supported by their employers?

And being aware of that, perhaps a less confrontational and demanding tone would be a more logical (and polite!) approach?

/Eirik

Comment 219 Graham Perrin freebsd_committer

2022-12-29 15:32:25 UTC

Re: comment 197

Branches shown at <https://github.com/freebsd/freebsd-src/commit/ad9d0df7c66a02db53f102e87e6f8414e8da67fc> now include: 

    releng/13.1

Other commits that went through routine release engineering included: 

* <https://github.com/freebsd/freebsd-src/commit/4cb98e0bea637b18e196015291441feb1c5107fe>

* <https://github.com/freebsd/freebsd-src/commit/a29c0348f0653c25adb65e340e6f5a7edc86b8a2>

* <https://github.com/freebsd/freebsd-src/commit/2144eb756895d97f8a09961b270f08b166a71d65>


Triage
======

Affected: 13.0-RELEASE (based partly on comments that related to 13.0-STABLE). Please note, this is not intended to be definitive; available versions in this menu may change through bug 262494, and so on. 

Hardware: any (based partly on comment 153).

With <https://www.freebsd.org/security/#sup> and no plan for 12.5, a cherry-pick to stable/12 (leading to a release engineered patch for 12.4-RELEASE) seems unlikely. 

CC: usb@

Given the diversity of USB hardware – and the current length and breadth of this bug report – here's a single question for usb@ or any other committer with a good overview: 

* should tracking (the keyword) apply to this report, or maybe to a 
  new report that can be _solely_ for tracking? 

(A newcomer to this report will probably find it overwhelming, and I'm not the person to summarise. I'm more double-decker than universal serial.)

That's not the only question …

<https://wiki.freebsd.org/Discord> #kernel or nearby should be ideal to kick the ball around without adding noise to Bugzilla. Either that, or <https://wiki.freebsd.org/IRC/Channels#Official_Channels> #freebsd-dev (not logged). 

HTH

Graham (triage trainee without a ^ hat of officialdom)

Comment 220 Gerard Seibert 2023-04-05 20:38:53 UTC

According to my trusty abacus, it has now been 3 years, 11 months, and 6 days since this bug report was opened. Out of morbid curiosity, why is it taking so long to squash this bug?

Comment 221 Mark Millard 2023-04-05 20:48:37 UTC

(In reply to Gerard Seibert from comment #220)

The answer to your question will likely only be fully known
once the bug has actually been squashed. Such is the nature
problems that prove difficult to fix.

Comment 222 Marco Beishuizen 2023-05-19 10:43:42 UTC

Created attachment 242272 [details]
dmesg.boot from Marco

Comment 223 Marco Beishuizen 2023-05-19 10:45:00 UTC

I'm having these exact USB problems recently, since I'm using Gnome. When using a windowmanager like Fluxbox or FVWM there are no problems. In Gnome, the machine is not waking up completely from sleeping, and the USB stops working. Only remedy then is a hard reset.

Comment 224 Marco Beishuizen 2023-05-29 19:16:43 UTC

After plugging my USB webcam in a different port the messages just stopped.
Really weird. Hope it stays like this.

Comment 225 omars 2023-07-07 08:36:50 UTC

(In reply to Gerard Seibert from comment #220)
After all this time, I think users still experiencing the issue would have mostly either replaced the hardware or migrated operating systems. Personally, I retired the system showing this bug after several iterations of fixes hadn't worked and being so far out of date was no longer viable (and neither was running without USB3.x, of course).

Comment 226 Mark Linimon freebsd_committer

2024-01-02 20:03:57 UTC

^Triage: FreeBSD 12 is now out of support.

Comment 227 Sergey A. Osokin freebsd_committer

2024-02-28 20:22:05 UTC

I have found tons or new error messages after I unplug and plug in a video cable in my laptop:

Feb 25 15:16:57 x1c kernel: drmn0: [drm] *ERROR* Fault errors on pipe B: 0x00000800
Feb 25 15:16:57 x1c syslogd: last message repeated 163 times
Feb 25 15:17:22 x1c kernel: 0000800
Feb 25 15:17:22 x1c kernel: drmn0: [drm] *ERROR* Fault errors on pipe B: 0x00000800
Feb 25 15:17:22 x1c syslogd: last message repeated 1753 times
Feb 25 15:17:52 x1c kernel: 0000800
Feb 25 15:17:52 x1c kernel: drmn0: [drm] *ERROR* Fault errors on pipe B: 0x00000800
Feb 25 15:17:52 x1c syslogd: last message repeated 1755 times
Feb 25 15:18:22 x1c kernel: 0000800
Feb 25 15:18:22 x1c kernel: drmn0: [drm] *ERROR* Fault errors on pipe B: 0x00000800
Feb 25 15:18:22 x1c syslogd: last message repeated 1755 times
Feb 25 15:18:52 x1c syslogd: last message repeated 1681 times

Comment 228 Ed Maste freebsd_committer

2024-02-28 22:07:42 UTC

(In reply to Sergey A. Osokin from comment #2)
Those errors appear to have nothing to do with USB?

Comment 229 Sergey A. Osokin freebsd_committer

2024-02-29 14:47:32 UTC

(In reply to Ed Maste from comment #228)

Well, this is the USB-C/DP port, next to the power supply on my ThinkPad X1 Carbon Gen5.

When I remove cable from that port I have these messages:
Feb 29 07:43:46 x1c kernel: uhub_reattach_port: giving up port 1 reset - device vanished: change 0xfb status 0x7fb
Feb 29 07:43:47 x1c kernel: uhub_reattach_port: giving up port 2 reset - device vanished: change 0xfb status 0x7fb
Feb 29 07:43:47 x1c kernel: uhub_reattach_port: giving up port 3 reset - device vanished: change 0xfb status 0x7fb
Feb 29 07:43:48 x1c kernel: uhub_reattach_port: giving up port 4 reset - device vanished: change 0xfb status 0x7fb
Feb 29 07:43:50 x1c kernel: uhub_reattach_port: giving up port 1 reset - device vanished: change 0xfb status 0x7fb

When I return the cable back into the port I saw reported messages, however that's been fixed (no more DRM-specific messages in /var/log/messages) by rebuilding graphics/drm-510-kmod.

amurphy
bawdyanarchist
bz
chris
ebay
emaste
felix.the.red
gerard_seibert
grahamperrin
ihor
jhb
jkim
junchoon
karels
kib
ltning-freebsd
marklmi26-fbsd
mbeis
mohammad
oleg.cherkasov
oleg.nauman
oleglelchuk
omars1234
osa
pi
re
rob2g2-freebsd
rpapousek
ruben
tech-lists
uqs
usb