Bug 237666 - repeated messages of "uhub_reattach_port: giving up port reset - device vanished"
Summary: repeated messages of "uhub_reattach_port: giving up port reset - device vanis...
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: usb (show other bugs)
Version: 12.1-STABLE
Hardware: amd64 Any
: --- Affects Some People
Assignee: freebsd-usb (Nobody)
URL:
Keywords: needs-qa, regression
Depends on:
Blocks:
 
Reported: 2019-04-30 08:22 UTC by schmidt
Modified: 2020-10-27 16:56 UTC (History)
16 users (show)

See Also:
koobs: mfc-stable12?
koobs: mfc-stable11-


Attachments
dmesg.boot from the maschin (91.15 KB, text/plain)
2019-04-30 08:22 UTC, schmidt
no flags Details
dmesg.boot with stable kernel an verbose on (91.66 KB, text/plain)
2019-04-30 10:28 UTC, schmidt
no flags Details
dmesg.boot with stable kernel an verbose off (86.24 KB, text/plain)
2019-04-30 10:29 UTC, schmidt
no flags Details
dmesg.boot from the same machine running 11.2 (54.29 KB, text/plain)
2019-05-15 06:07 UTC, schmidt
no flags Details
dmesg.boot 11.2 with Lagacy USB Off an hw.usb.uhub.debug=17 set (93.10 KB, text/plain)
2019-05-15 09:27 UTC, schmidt
no flags Details
12.0 boot messages on Gigabyte motherboard (38.60 KB, text/plain)
2019-08-07 08:18 UTC, Mike Karels
no flags Details
Patch for XHCI (1.64 KB, patch)
2019-08-09 16:49 UTC, Hans Petter Selasky
no flags Details | Diff
devinfo -r (9.33 KB, text/plain)
2019-08-11 17:47 UTC, Mike Karels
no flags Details
dmesg from FreeBSD 11.3 on Dell 8930 PC (10.63 KB, text/plain)
2019-11-22 16:26 UTC, Gerard Seibert
no flags Details
pciconf -lv from Gigabyte system (19.98 KB, text/plain)
2019-12-08 21:58 UTC, Mike Karels
no flags Details
Dmesg with hw.usb.xhci.debug=1 (93.26 KB, text/plain)
2019-12-23 05:44 UTC, Felix Guest
no flags Details
Complete 'dmidecode' output for Dell XPS 8930 (28.34 KB, text/plain)
2020-04-04 13:27 UTC, Gerard Seibert
no flags Details
FreeBSD 11.3 boot log (20.40 KB, text/plain)
2020-04-26 18:37 UTC, Gerard Seibert
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description schmidt 2019-04-30 08:22:45 UTC
Created attachment 204107 [details]
dmesg.boot from the maschin

I've just updated my System from FreeBSD 11.1-Release to FreeBSD 12.0-Release. 

After Booting i get every second the message 

uhub_reattach_port: giving up port reset - device vanished

I've disconnected all external usb devices (including mouse and keyboard) the messages continue. 

I've use the Hardware for at least 4 years now. No such problem in the past.
Comment 1 Hans Petter Selasky freebsd_committer 2019-04-30 09:06:54 UTC
Can you try a 12-stable based kernel?

I suspect APCI changes might have turned on something in the hardware which was off before. USB changes are down to a minimum.

The only relevant change I can think of is this one:
https://svnweb.freebsd.org/changeset/base/343134

--HPS
Comment 2 schmidt 2019-04-30 10:27:58 UTC
I've build a stable kernel and booted with it. 

The Problems are still there. 


If it's only the messages i could live with that but my keyboard and mouse behave strangely. 

I about 30 sec after boot mouse and keyboard are stuck and only disconnecting the keyboard revive keyboard and mouse.

X11 lags sometimes. I think it's the usb resets the freeze everything. 

I have the dmesg from the stable kernel attached. 

If anything under stable the problems with keyboard are more pronounced. Keys repeating or missing without reason.

So I'm back with the release kernel.

Regards 
  Estartu
Comment 3 schmidt 2019-04-30 10:28:51 UTC
Created attachment 204110 [details]
dmesg.boot with stable kernel an verbose on
Comment 4 schmidt 2019-04-30 10:29:24 UTC
Created attachment 204111 [details]
dmesg.boot with stable kernel an verbose off
Comment 5 srlemke 2019-05-14 14:43:37 UTC
Just to pump this thread a little more:

<slemke@besta>/home/slemke/dropbox # dmesg |grep vanished |wc -l
815

<root@besta>/root # pciconf -lv |grep -i usb
    device     = '100 Series/C230 Series Chipset Family USB 3.0 xHCI Controller'
    subclass   = USB
    device     = 'DSL6340 USB 3.1 Controller [Alpine Ridge]'
    subclass   = USB
<root@besta>/root # 

So this is a very interesting scenario and will probably very difficult to debug and solve:

I have one of these portable docking station(works fine on linux):
https://www.czc.cz/dell-dokovaci-stanice-wd15-130w/187601/produkt - its and usb-c docking station.

On system boot it boot-locks with the message:
"uhub_reattach_port: giving up port reset - device vanished" - if I quickly unplug and replug the usb-c the boot proceeds and images appears on the 2 screens attached to the docking station. No regular USB2 are working on the docking station (only the 2 monitors) - Only the screns are correctly detected.

<root@besta>/home/slemke # uname -a
FreeBSD besta 12.0-RELEASE-p3 FreeBSD 12.0-RELEASE-p3 GENERIC  amd64
<root@besta>/home/slemke #
Comment 6 schmidt 2019-05-15 06:07:46 UTC
Created attachment 204382 [details]
dmesg.boot from the same machine running 11.2

There was no working with 12.0 and this bug. So i downgraded back to 11.2. 

There are some USB errors there as well but no repeated errors and no ill effects.
Comment 7 schmidt 2019-05-15 06:17:09 UTC
I've the same USB Controller chip as srlemke@protonmail.com but mine is on the Mainboard, because the error is still there when all USB devices are disconnected. 

xhci0@pci0:6:0:0:       class=0x0c0330 card=0x11112222 chip=0x15b68086 rev=0x00 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'DSL6540 USB 3.1 Controller [Alpine Ridge]'
    class      = serial bus
    subclass   = USB
Comment 8 Hans Petter Selasky freebsd_committer 2019-05-15 07:43:11 UTC
Can you try to figure out if anything is connected to the failing port?
Comment 9 schmidt 2019-05-15 08:00:11 UTC
The error is there even if nothing is connected to any USB-Port.
Comment 10 Hans Petter Selasky freebsd_committer 2019-05-15 08:20:11 UTC
And you're sure this is no internal USB device?

--HPS
Comment 11 schmidt 2019-05-15 08:37:44 UTC
I have tested is without any device connected. No keyboard, no mouse, no external HUB. Nothing and the error is still there.
Comment 12 Hans Petter Selasky freebsd_committer 2019-05-15 08:47:45 UTC
I understand, but sometimes the vendor wire internal devices to the USB ports, like bluetooth and BIOS related features.

BTW: Have you tried playing with BIOS legacy USB support ON/OFF?
Comment 13 Hans Petter Selasky freebsd_committer 2019-05-15 09:12:18 UTC
You might want to set:
sysctl hw.usb.uhub.debug=17

When collecting dmesg from the failing case.
Comment 14 schmidt 2019-05-15 09:27:49 UTC
Created attachment 204388 [details]
dmesg.boot 11.2 with Lagacy USB Off an hw.usb.uhub.debug=17 set

Disabling Legacy USB support didn't change anything. Therefore i booted with hw.usb.uhub.debug=17 set. The Buffer for dmesg isn't large enough for the whole kernel messages. so only the last part are here but the USB Errors are in the file.
Comment 15 Mike Karels freebsd_committer 2019-08-07 08:18:33 UTC
Created attachment 206324 [details]
12.0 boot messages on Gigabyte motherboard

I just switched from 11.3 to 12.0, and also have this problem since upgrade on a system with a Gigabyte motherboard.  I attached a dmesg.  I note that I am also losing communication with a UPS (via USB) every 10 minutes, which is also new.  I have lots of USB devices attached; I have not attempted to remove them, etc, but can do so. I can also provide 11.3 dmesgs.
Comment 16 Hans Petter Selasky freebsd_committer 2019-08-07 08:52:49 UTC
Does it help to set:

sysctl hw.usb.disable_enumeration=1

After boot. Does the UPS still work?

I suspect the issue has nothing to do with USB, but likely early AP startup (timing issue, can be disabled in kernel build) or ACPI's recent USB support.

1) Try removing this option from kernel config:
sys/amd64/conf/GENERIC:options 	EARLY_AP_STARTUP

2) Try booting with the following line added to /boot/loader.conf:

debug.acpi.disabled="usb"

--HPS
Comment 17 Mike Karels freebsd_committer 2019-08-07 12:21:49 UTC
Setting sysctl hw.usb.disable_enumeration=1 makes the messages stop. The UPS does not appear to work; I stopped apcupsd, and apctest cannot open the device. Clearing the sysctl makes the messages start again, but apctest does not work. Unfortunately, I did not try apctest before setting the sysctl.

Booting with debug.acpi.disabled="usb" did not seem to change anything: the messages continued, apcupsd complained about losing communication, and apctest didn't work.  hw.usb.disable_enumeration=1 still stopped the messages, but apcupsd still complains about lost communiction.  I can test the rest this evening.
Comment 18 Mike Karels freebsd_committer 2019-08-08 05:00:37 UTC
Commenting out EARLY_AP_STARTUP from the kernel config did not seem to change anything: the error message still repeats, and apcupsd loses contact with the UPS (probably never has it). I forgot to change the ident, but the APs were not started until nearly the end of autoconfiguration, just before attaching cd0 and mounting root.  The error messages start after that, after finding several more USB devices.  From messages:

Aug  7 21:43:31 pughole kernel: Launching APs: 1 10 3 8 9 11 5 7 4 2 6
Aug  7 21:43:31 pughole kernel: cd0 at ahcich1 bus 0 scbus0 target 0 lun 0
Aug  7 21:43:31 pughole kernel: cd0: <HL-DT-ST BD-RE  WH14NS40 1.00> Removable C
D-ROM SCSI device
Aug  7 21:43:31 pughole kernel: cd0: Serial Number SIK9IFALJ462
Aug  7 21:43:31 pughole kernel: cd0: 150.000MB/s transfers (SATA 1.x, UDMA6, ATA
PI 12bytes, PIO 8192bytes)
Aug  7 21:43:31 pughole kernel: cd0: Attempt to query device size failed: NOT RE
ADY, Medium not present - tray closed
Aug  7 21:43:31 pughole kernel: Trying to mount root from ufs:/dev/nvd0p5 [rw]..
.
Aug  7 21:43:31 pughole kernel: uhub0: 4 ports with 4 removable, self powered
Aug  7 21:43:31 pughole kernel: ZFS filesystem version: 5
Aug  7 21:43:31 pughole kernel: ZFS storage pool version: features support (5000
)
Aug  7 21:43:31 pughole kernel: uhub2: 2 ports with 2 removable, self powered
Aug  7 21:43:31 pughole kernel: uhub3: 2 ports with 2 removable, self powered
Aug  7 21:43:31 pughole kernel: uhub1: 21 ports with 21 removable, self powered
Aug  7 21:43:31 pughole kernel: uhub_reattach_port: giving up port reset - device vanished
Aug  7 21:43:31 pughole kernel: ugen1.2: <vendor 0x045b product 0x0209> at usbus1
Aug  7 21:43:31 pughole kernel: uhub4 numa-domain 0 on uhub1
Aug  7 21:43:31 pughole kernel: uhub4: <vendor 0x045b product 0x0209, class 9/0, rev 2.10/1.00, addr 1> on usbus1

Any reason to try these in combination?
Comment 19 Hans Petter Selasky freebsd_committer 2019-08-08 06:33:40 UTC
Hi,

Can you collect logs from FreeBSD 12 with:

sysctl hw.usb.uhub.debug=17

Just let it run for 12 minutes. According to your logs it takes around 10 minutes for the cycle before UPS detaches.

--HPS
Comment 20 Hans Petter Selasky freebsd_committer 2019-08-08 06:37:06 UTC
Aug  6 18:07:20 pughole kernel: usbus0 numa-domain 0 on xhci0
Aug  6 18:07:20 pughole kernel: usbus0: 5.0Gbps Super Speed USB v3.0

I'm a little surprised that your xhci has 21 ports on the root-HUB. That seems a lot. Maybe 5 is more correct. Can you check this?

--HPS
Comment 21 Mike Karels freebsd_committer 2019-08-08 07:11:51 UTC
I'll collect the logs tomorrow.  About the 21 ports: 11.3 reported the same thing, but I agree that the number sounds suspicious.  Does a verbose boot provide any information about that?
Comment 22 Hans Petter Selasky freebsd_committer 2019-08-08 07:20:13 UTC
Hi,

I think bootverbose won't help.

hw.usb.uhub.debug will reveal more information about which ports are active and which are causing problems. Currently xhci.debug is a last resort, because it is very verbose.

Thanks for testing. Lets try to figure out the reason for this "giving up port reset" failure, which also happens on 11.x - right.

--HPS
Comment 23 Mike Karels freebsd_committer 2019-08-08 23:39:20 UTC
The output from the time with debug enabled is too big to attach. ftp://ftp.karels.net/outgoing/messages.usb_debug should work (normal anonymous access).
Comment 24 Hans Petter Selasky freebsd_committer 2019-08-09 07:29:20 UTC
Can you repeat with hw.usb.xhci.debug=17. It looks like the XHCI port status register is returning all ones, which is strange.
Comment 25 Mike Karels freebsd_committer 2019-08-09 13:26:50 UTC
I wasn't sure about hub_debug, so I did both.  ftp://ftp.karels.net/outgoing/messages.xhci_debug.gz (it's 14M without compression).
Comment 26 Hans Petter Selasky freebsd_committer 2019-08-09 16:38:52 UTC
Aug  9 07:47:30 pughole kernel: xhci_roothub_exec: port status=0x000802a0
Aug  9 07:47:30 pughole kernel: uhub_read_port_status: port 21, wPortStatus=0x07a0, wPortChange=0x0020, err=USB_ERR_NORMAL_COMPLETION
Aug  9 07:47:31 pughole kernel: xhci_roothub_exec: type=0xa3 request=0x00 wLen=0x0004 wValue=0x0000 wIndex=0x0003
Aug  9 07:47:31 pughole kernel: xhci_roothub_exec: UR_GET_STATUS i=3
Aug  9 07:47:31 pughole kernel: xhci_roothub_exec: port status=0xffffffff
Aug  9 07:47:31 pughole kernel: uhub_read_port_status: port 3, wPortStatus=0x07fb, wPortChange=0x00fb, err=USB_ERR_NORMAL_COMPLETION
Aug  9 07:47:31 pughole kernel: uhub_reattach_port: giving up port reset - device vanished
Aug  9 07:47:31 pughole kernel: xhci_roothub_exec: type=0x23 request=0x01 wLen=0x0000 wValue=0x0001 wIndex=0x0003
Aug  9 07:47:31 pughole kernel: xhci_roothub_exec: UR_CLEAR_PORT_FEATURE


Here it is "port status=0xffffffff" which is a totally off-track XHCI PCI read value.

I'll try to make a patch you can try. This looks like a low-level hardware PCI issue.
Comment 27 Hans Petter Selasky freebsd_committer 2019-08-09 16:49:00 UTC
Created attachment 206399 [details]
Patch for XHCI

Can you repeat XHCI logging with this patch?
Comment 28 Mike Karels freebsd_committer 2019-08-10 16:55:14 UTC
ftp://ftp.karels.net/outgoing/messages.xhci_retry.gz.  Oddly, port status is still sometimes -1, sometimes a more reasonable value, and they are in clusters.
Comment 29 Mike Karels freebsd_committer 2019-08-10 17:16:27 UTC
Hmm, I see that there are two xhci controllers.  One is on pci7, and I see this:

pcib7: <PCI-PCI bridge> at device 2.0 numa-domain 0 on pci4
pcib7: failed to allocate initial I/O port window: 0-0xfff
pcib7: failed to allocate initial memory window: 0-0xfffff
pcib7: failed to allocate initial prefetch window: 0-0xfffff
pci7: <PCI bus> numa-domain 0 on pcib7
xhci0: <XHCI (generic) USB 3.0 controller> mem 0xd8000000-0xd800ffff at device 0
.0 numa-domain 0 on pci7
xhci0: 32 bytes context size, 64-bit DMA
usbus0 numa-domain 0 on xhci0

and

pcib1: <ACPI Host-PCI bridge> port 0xcf8-0xcff numa-domain 0 on acpi0
pci1: <ACPI PCI bus> numa-domain 0 on pcib1
...
pci1: <unknown> at device 17.0 (no driver attached)
xhci1: <Intel Wellsburg USB 3.0 controller> mem 0xf6120000-0xf612ffff irq 19 at device 20.0 numa-domain 0 on pci1
xhci1: 32 bytes context size, 64-bit DMA
usbus1 numa-domain 0 on xhci1
pci1: <simple comms> at device 22.0 (no driver attached)

I'm also not using any USB3 device regularly, so I don't know if the misbehaving controller is functioning at all.

The messages above are similar with 11.3.
Comment 30 Hans Petter Selasky freebsd_committer 2019-08-10 17:46:21 UTC
Hi Konstatin,

We are seeing a 32-bit PCI read resulting in -1U. This is a regression issue in more recent versions of FreeBSD. Can you comment on these failure messages:

--HPS

> Hmm, I see that there are two xhci controllers.  One is on pci7, and I see this:
> 
> pcib7: <PCI-PCI bridge> at device 2.0 numa-domain 0 on pci4
> pcib7: failed to allocate initial I/O port window: 0-0xfff
> pcib7: failed to allocate initial memory window: 0-0xfffff
> pcib7: failed to allocate initial prefetch window: 0-0xfffff
> pci7: <PCI bus> numa-domain 0 on pcib7
> xhci0: <XHCI (generic) USB 3.0 controller> mem 0xd8000000-0xd800ffff at device 0
> .0 numa-domain 0 on pci7
> xhci0: 32 bytes context size, 64-bit DMA
> usbus0 numa-domain 0 on xhci0
> 
> and
> 
> pcib1: <ACPI Host-PCI bridge> port 0xcf8-0xcff numa-domain 0 on acpi0
> pci1: <ACPI PCI bus> numa-domain 0 on pcib1
> ...
> pci1: <unknown> at device 17.0 (no driver attached)
> xhci1: <Intel Wellsburg USB 3.0 controller> mem 0xf6120000-0xf612ffff irq 19 at device 20.0 numa-domain 0 on pci1
> xhci1: 32 bytes context size, 64-bit DMA
> usbus1 numa-domain 0 on xhci1
> pci1: <simple comms> at device 22.0 (no driver attached)
> 
> I'm also not using any USB3 device regularly, so I don't know if the misbehaving controller is functioning at all.
> 
> The messages above are similar with 11.3.
Comment 31 Hans Petter Selasky freebsd_committer 2019-08-10 17:47:10 UTC
Mike, can you repeat when you started seeing this bug?
Comment 32 Mike Karels freebsd_committer 2019-08-10 18:27:49 UTC
Yes, I started seeing these when I upgraded from FreeBSD 11.3 (RELEASE) to 12.0-RELEASE-p9 about a week ago.  I have /var/log/messages from 11.3 for comparison. btw, I added your xhci patch to the system with no EARLY_AP_STARTUP, although it probably doesn't matter.  I'll note that 12.0 is older than 11.3 in various ways; I should probably try 12-stable.

fwiw, this is a Gigabyte GA-X99P-SLI motherboard (rev. 1.0) with Intel Core i7-5820K, 6 cores plus hyperthreading, several years old, and has run many versions of FreeBSD.  I believe the chipset is an Intel X99; I could research the details.  I might be able to reboot 11.3 if it mattered (it's on an alternate partition, but I don't know how to boot it using UEFI).

Thanks for looking into this!
Comment 33 Mike Karels freebsd_committer 2019-08-10 20:21:48 UTC
An up-to-date GENERIC 12-stable kernel has the problem also.
Comment 34 Konstantin Belousov freebsd_committer 2019-08-11 08:52:45 UTC
(In reply to Hans Petter Selasky from comment #30)
Minus one reads mean that there is no more a device to respond at the address read.  I have no idea why, and there seems to be the mix of several scenarios of failure.  At least I would not be suprised to see that if the docking station was detached, assuming the controller is inside it.
Comment 35 Mike Karels freebsd_committer 2019-08-11 13:51:32 UTC
This is not a laptop; it's a tower computer.  Also, note that the messages start even before autoconfiguration is complete.
Comment 36 Hans Petter Selasky freebsd_committer 2019-08-11 14:58:06 UTC
@kib: What about these PCI errors?

> pcib7: failed to allocate initial I/O port window: 0-0xfff
> pcib7: failed to allocate initial memory window: 0-0xfffff
> pcib7: failed to allocate initial prefetch window: 0-0xfffff
Comment 37 Hans Petter Selasky freebsd_committer 2019-08-11 15:13:44 UTC
Hi,

Can you attach output from:

devinfo -r

and

acpidump -dt

--HPS
Comment 38 Hans Petter Selasky freebsd_committer 2019-08-11 15:15:27 UTC
Adding John Baldwin aswell after searching for similar issues.
Comment 39 Konstantin Belousov freebsd_committer 2019-08-11 15:37:14 UTC
(In reply to Hans Petter Selasky from comment #36)
These 'failed to allocate' messages are the secondary effects of reading all ones.  Values of the bridge' registers to specify io and memory window to forward to subordinary buses are all ones (AKA 0xfff...), and that range conflicts with already allocated resources.

So the question is how it comes that bridges were enumerated, but accessing their PCI config space reads failing.  I am curious if manually reading e.g. pci vendor or device id using pciconf would show sane values or 1111...s.
Comment 40 Mike Karels freebsd_committer 2019-08-11 17:47:57 UTC
Created attachment 206443 [details]
devinfo -r

devinfo -r is attached; acpdidump -dt is ftp://ftp.karels.net/outgoing/acpidump-dt.
Comment 41 John Baldwin freebsd_committer freebsd_triage 2019-08-12 16:47:54 UTC
If PCI messed up you would get -1 constantly, not sporadically.  FWIW, I have the exact same problem on my desktop since I upgraded it to 12.0.  I also use apcupsd and usually get a couple of e-mails a day due to "losing communications", but apcupsd is able to reattach when the device comes back so it still works overall aside from sending spammy e-mails.

Mine is also not a laptop, it's an Ivy Bridge desktop motherboard:

smbios.planar.maker="MSI"
smbios.planar.product="Z77A-G45 (MS-7752)"

xhci0: <Intel Panther Point USB 3.0 controller> mem 0xf7100000-0xf710ffff irq 16 at device 20.0 on pci0
xhci0: 32 bytes context size, 64-bit DMA
usbus0: waiting for BIOS to give up control
xhci0: Port routing mask set to 0xffffffff
usbus0 on xhci0
usbus0: 5.0Gbps Super Speed USB v3.0
ehci0: <Intel Panther Point USB 2.0 controller> mem 0xf7118000-0xf71183ff irq 16 at device 26.0 on pci0
usbus1: EHCI version 1.0
usbus1 on ehci0
usbus1: 480Mbps High Speed USB v2.0
...
ehci1: <Intel Panther Point USB 2.0 controller> mem 0xf7117000-0xf71173ff irq 23 at device 29.0 on pci0
usbus2: EHCI version 1.0
usbus2 on ehci1
usbus2: 480Mbps High Speed USB v2.0
...
ugen1.1: <Intel EHCI root HUB> at usbus1
ugen2.1: <Intel EHCI root HUB> at usbus2
ugen0.1: <0x8086 XHCI root HUB> at usbus0
...
uhub0: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus1
uhub1: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus2
uhub2: <0x8086 XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
uhub2: 8 ports with 8 removable, self powered
uhub0: 2 ports with 2 removable, self powered
uhub1: 2 ports with 2 removable, self powered
ugen1.2: <vendor 0x8087 product 0x0024> at usbus1
uhub3 on uhub0
uhub3: <vendor 0x8087 product 0x0024, class 9/0, rev 2.00/0.00, addr 2> on usbus1
ugen2.2: <vendor 0x8087 product 0x0024> at usbus2
uhub4 on uhub1
uhub4: <vendor 0x8087 product 0x0024, class 9/0, rev 2.00/0.00, addr 2> on usbus2
uhub3: 6 ports with 6 removable, self powered
uhub4: 8 ports with 8 removable, self powered
ugen1.3: <American Power Conversion Back-UPS RS 1000G FW:868.L3 .D USB FW:L3> at usbus1
ugen1.4: <vendor 0x0451 product 0x8043> at usbus1
uhub5 on uhub3
uhub5: <vendor 0x0451 product 0x8043, class 9/0, rev 2.10/1.00, addr 4> on usbus1
uhub5: MTT enabled
uhub5: 4 ports with 4 removable, self powered
ugen1.5: <Logitech Trackball> at usbus1
ums0 on uhub5
ums0: <Logitech Trackball, class 0/0, rev 1.10/2.20, addr 5> on usbus1
ums0: 3 buttons and [XYZ] coordinates ID=0
ugen1.6: <LITE-ON Technology USB NetVista Full Width Keyboard.> at usbus1
ukbd0 on uhub5
ukbd0: <EP1 Interrupt> on usbus1
kbd2 at ukbd0

This is a typical set of logs from /var/log/messages when the UPS bounces:

Aug  9 17:35:01 ralph kernel: ugen1.3: <American Power Conversion Back-UPS RS 10
00G FW:868.L3 .D USB FW:L3> at usbus1 (disconnected)
Aug  9 17:35:01 ralph kernel: ugen1.3: <American Power Conversion Back-UPS RS 10
00G FW:868.L3 .D USB FW:L3> at usbus1
Aug  9 17:35:01 ralph kernel: interface uhid.1 already present in the KLD 'kerne
l'!
Aug  9 17:35:01 ralph kernel: linker_load_file: /boot/kernel/uhid.ko - unsupport
ed file type
Aug  9 17:35:07 ralph apcupsd[1967]: Communications with UPS restored.
Comment 42 John Baldwin freebsd_committer freebsd_triage 2019-08-12 16:50:06 UTC
(In reply to Hans Petter Selasky from comment #36)
Those are a red herring.  The next few lines down in dmesg show that a child device allocated resources ok which meant that pcib7 allocated fresh resources for its I/O windows.  You could verify this with 'pciconf -lB pcib7' and see that it has valid ranges for I/O and memory decoding, but the child device (xhci0) would not be able to attach successfully if we hadn't allocated valid windows in the parent bridge.
Comment 43 Konstantin Belousov freebsd_committer 2019-08-12 17:46:16 UTC
(In reply to John Baldwin from comment #42)
apcupsd loosing ugen's is nothing new.  I have this since 9.x times on very different machines, from old cedarview atoms up to the Skylake-X.  I do not think it is related.
Comment 44 John Baldwin freebsd_committer freebsd_triage 2019-08-12 18:02:11 UTC
(In reply to Konstantin Belousov from comment #43)
Yes, but I never had a single instance while running 10.x and 11.x on this desktop.  The day I upgraded to 12.0 I started getting multiple alerts a day.
Comment 45 Mike Karels freebsd_committer 2019-09-08 20:09:40 UTC
Any additional thoughts on this?  I rebooted with my UPS disconnected, and I still get constant messages about device vanished.  Then I rebooted with *no* external USB devices connected, and I *still* get these messages immediately and constantly.  I never saw this with 11.3, and I may have to revert to it.  I'm happy to run experiments to test any theories.
Comment 46 Hans Petter Selasky freebsd_committer 2019-09-09 07:14:34 UTC
Maybe you could try to do some bisecting with the FreeBSD master branch? To find the offending commit?

--HPS
Comment 47 Hans Petter Selasky freebsd_committer 2019-09-09 07:31:16 UTC
Start with r308158 of FreeBSD head branch.

Then build kernels and add revision steps like 1 2 4 8 16 32 and so on until the issue appears. Then repeat like in binary search to find the exact revision.

--HPS
Comment 48 Mike Karels freebsd_committer 2019-09-10 01:13:49 UTC
I'll give it a try soon.
Comment 49 Hans Petter Selasky freebsd_committer 2019-09-17 14:06:27 UTC
Ping - any progress?
Comment 50 Mike Karels freebsd_committer 2019-09-17 23:21:05 UTC
I've been working on this off and on.  It looks like the repeated messages started with r330957, an ACPI update.  However, the problem with the UPS started much earlier.  I haven't bracketed that, as lots of the early systems won't boot, at least multi-user, which makes it hard to test.  It seems to be before r315272.
Comment 51 Hans Petter Selasky freebsd_committer 2019-09-18 07:17:14 UTC
You should be able to identify the ACPI update in the stable/12 branch aswell. Then use that to bisect, because head might be a bit unstable as you see :-)

--HPS
Comment 52 Hans Petter Selasky freebsd_committer 2019-09-18 09:18:56 UTC
commit 48286dea101b3aad1e566952e372a646c87a8970
Author: jkim <jkim@FreeBSD.org>
Date:   Wed Mar 14 23:45:48 2018 +0000

    Merge ACPICA 20180313.


HEAD r330957 appears to be before 12.0 was released and was never MFC'ed to FreeBSD 11. I'm adding jkim, in case he has any input on this.

--HPS
Comment 53 Hans Petter Selasky freebsd_committer 2019-09-18 09:23:39 UTC
jkim: What is the right way to add ACPI debugging?
Comment 54 Mike Karels freebsd_committer 2019-09-19 03:31:01 UTC
(In reply to Hans Petter Selasky from comment #52)
In an "svn log" that goes far enough back to see this ACPI update on stable/12, I see the same revision number as on head.  "svn update -r330957" in that tree deletes most of the kernel.  I'm not sure what this means, but I know I can't compile it.
Comment 55 Jung-uk Kim freebsd_committer 2019-09-20 20:12:40 UTC
(In reply to Hans Petter Selasky from comment #53)
Add "options ACPI_DEBUG" to kernel configuration and set debug.acpi.layer and debug.acpi.level as described in acpi(4).

https://www.freebsd.org/cgi/man.cgi?query=acpi&format=html#DEBUGGING_OUTPUT
Comment 56 Jung-uk Kim freebsd_committer 2019-09-20 22:13:35 UTC
(In reply to Mike Karels from comment #54)
I just updated stable/12 to ACPICA 20190703 (r352566).

https://svnweb.freebsd.org/changeset/base/352566
Comment 57 Mike Karels freebsd_committer 2019-09-21 01:25:38 UTC
(In reply to Jung-uk Kim from comment #56)
I built a kernel at r352566.  It still prints this error message once per second.  However, I did not get the messages from apcupsd about losing contact.  I don't know if that means if it is working better or worse.  As I am running 12.0 userland, I went back to my previous kernel.

btw, I had to add "device iflib" to GENERIC.  I'll check whether anyone has commented on that.
Comment 58 Gerard Seibert 2019-11-15 21:35:01 UTC
I am getting this exact same message when I try to install FreeBSD 12.1 onto a new Dell XPS 8930 PC. I am unable to complete the installation. This was while using a "Boot Only" CD.
Comment 59 Hans Petter Selasky freebsd_committer 2019-11-16 09:13:11 UTC
Is this a regression issue?
Comment 60 Hans Petter Selasky freebsd_committer 2019-11-16 09:15:46 UTC
Hi,

This issue likely has to do with some PCI register spuriously returning -1U . Someone has to bisect the revision history to figure out the commit that introduced it!

--HPS
Comment 61 Mike Karels freebsd_committer 2019-11-16 13:22:52 UTC
Answering questions in the last two comments:

- Yes, this is a regression.  See the description and comment 15.

- I did the bisection on head, see Comment 50.  I started to look at bisection on the 12 branch; I don't remember how far I got.
Comment 62 Mike Karels freebsd_committer 2019-11-16 19:20:37 UTC
If I'm interpreting things right, r330957 was committed on head before releng/12.0 was branched.  If I try to update releng/12.0 or stable/12 with that revision (actually the one before, r330956), svn says that the target path does not exist.
Comment 63 Gerard Seibert 2019-11-18 17:57:07 UTC
I am just wondering if there has been any movement on this issue. At this point, I am stuck with a new computer that I cannot install FreeBSD 12.1 on, which effectively leaves me with an expensive paperweight that, unfortunately, I cannot get my money back on.
Comment 64 Hans Petter Selasky freebsd_committer 2019-11-19 06:48:36 UTC
Can you try to install FreeBSD 11?
Comment 65 Gerard Seibert 2019-11-19 10:38:53 UTC
(In reply to Hans Petter Selasky from comment #64)

I hate to go backward. It goes against my basic instincts. However, when I get a chance, I will download the "FreeBSD-11.3-RELEASE-amd64-bootonly.iso" and try to install it from there. Would that be acceptable?
Comment 66 Gerard Seibert 2019-11-21 15:55:29 UTC
I can confirm that FreeBSD 11.3 installs and works fine on this PC. Obviously, the problem lies with the 12 branch. I am not a skilled programmer, but what else can I do to help? I really want to squash this bug so I can install the 12 branch.
Comment 67 Hans Petter Selasky freebsd_committer 2019-11-21 15:58:46 UTC
It is likely a regression in sys/dev/pci or ACPI code.

We need to identify the exact commit that breaks it. It would be a great help.

If you are on IRC and can offer you to build the needed kernels, and then you have to install/boot them from a FreeBSD 11 install as base.

--HPS
Comment 68 Gerard Seibert 2019-11-22 16:26:55 UTC
Created attachment 209346 [details]
dmesg from FreeBSD 11.3 on Dell 8930 PC

This is the dmesg displayed when FreeBSD 11.3 loads up. I hope it will be of some use.
Comment 69 Hans Petter Selasky freebsd_committer 2019-12-04 12:13:59 UTC
Ping - any news here? Did anyone complete bisecting?
Comment 70 Gerard Seibert 2019-12-04 13:35:10 UTC
(In reply to Hans Petter Selasky from comment #69)

I don't believe that a solution has been discovered as of yet. The really bad news is that this "bug" apparently also infects the FreeBSD 13.x branch. I am still stuck on the 11.x branch with no avenue available to upgrade my OS.

I have faith that they will find the bug and squash it.

Thought of the Day:
"Real stupidity beats artificial intelligence every time.”
Comment 71 Hans Petter Selasky freebsd_committer 2019-12-04 14:23:07 UTC
Is this hardware something that can be bought cheaply?

--HPS
Comment 72 Gerard Seibert 2019-12-04 15:55:16 UTC
(In reply to Hans Petter Selasky from comment #71)

I cannot speak for anyone else; however, in my case, this was a brand new Dell XPS 8930 PC. I purchased it as a cheap replacement for one of my old (2007) PCs that was barely limping along. Everything was stock. No fancy add-ons, etcetera. I have posted the DMESG, comment #68, previously. For the record, my "OLD" unit was running FreeBSD 12.
Comment 73 schmidt 2019-12-04 16:01:51 UTC
(In reply to Hans Petter Selasky from comment #71)
My Mainboad is a servermainboard from supermicro. So nothing realy cheap there.
Comment 74 Gerard Seibert 2019-12-04 16:21:39 UTC
While I do not believe this to be an "edge" case, I don't think it will get any serious attention until more users speak up. I know of at least one other user who has the same problem and has not posted it. I will encourage them to do so.

I do know that this problem has been reported on the Dell Community Forum. They are claiming it is FreeBSD's problem since there have been no reports of issues from users of other operating systems.

Until enough users file a complaint against this bug, I doubt that it will get any serious attention.
Comment 75 Mike Karels freebsd_committer 2019-12-05 01:46:18 UTC
(In reply to Hans Petter Selasky from comment #71)

My hardware is not especially cheap, but is also not new and might not be readily available.  It's a Gigabyte motherboard with a Haswell-E CPU.

About bisection: I bisected on head, and ended up with r330957 as the commit that started the problem (see comment #50).  It appears that that revision was not merged separately to 12, so there is nothing to bisect there.  That commit was an ACPI update.
Comment 76 Hans Petter Selasky freebsd_committer 2019-12-06 09:59:39 UTC
OK,

Can you add "options ACPI_DEBUG" to your kernel configuration file?

Then in /boot/loader.conf add:

debug.acpi.layer="ACPI_ALL_COMPONENTS ACPI_ALL_DRIVERS"
debug.acpi.level="ACPI_LV_ERROR"

The list of valid keywords are given by:
sys/dev/acpica/acpi.c

Try to correlate an ACPI event with the enumeration failure!

If no ACPI events, try to enable more ACPI debug layers/levels until you find such an event or error message.

--HPS
Comment 77 Hans Petter Selasky freebsd_committer 2019-12-06 10:16:37 UTC
If that fails, try to compare ACPI debug output before and after r330957.

--HPS
Comment 78 Mike Karels freebsd_committer 2019-12-07 15:19:15 UTC
I built a stable/12 kernel with ACPI debug.  ACPI_DEBUG by itself didn't compile; it used global variables conditional on ACPI_DEBUGGER.  I added ACPI_DEBUGGER, and added that to conf/options.

Using the suggested loader variables, there was no new output.  I don't see ACPI_LV_ERROR in acpi.c.  I tried verbosity1, and that spewed messages so fast it prevented a successful boot in 20 minutes or so.  I then tried ACPI_LV_INFO.  Here is what is logged in about a second, from one "vanished" message to the next:

Dec  7 09:09:35 pughole kernel: uhub_reattach_port: giving up port reset - devic
e vanished
Dec  7 09:09:35 pughole kernel:  exregion-0527 ExPciConfigSpaceHandle: Pci-Confi
g 0 (8) Seg(0000) Bus(0000) Dev(0003) Func(0002) Reg(00b2)
Dec  7 09:09:35 pughole kernel:  exregion-0527 ExPciConfigSpaceHandle: Pci-Confi
g 0 (8) Seg(0000) Bus(0000) Dev(0003) Func(0002) Reg(00aa)
Dec  7 09:09:35 pughole syslogd: last message repeated 3 times
Dec  7 09:09:35 pughole kernel:  exregion-0527 ExPciConfigSpaceHandle: Pci-Confi
g 1 (8) Seg(0000) Bus(0000) Dev(0003) Func(0002) Reg(00aa)
Dec  7 09:09:35 pughole kernel:  exregion-0527 ExPciConfigSpaceHandle: Pci-Confi
g 0 (8) Seg(0000) Bus(0000) Dev(0003) Func(0002) Reg(00aa)
Dec  7 09:09:35 pughole kernel:  exregion-0527 ExPciConfigSpaceHandle: Pci-Confi
g 0 (8) Seg(0000) Bus(0000) Dev(0003) Func(0003) Reg(00b2)
Dec  7 09:09:35 pughole kernel:  exregion-0527 ExPciConfigSpaceHandle: Pci-Confi
g 0 (8) Seg(0000) Bus(0000) Dev(0003) Func(0003) Reg(00aa)
Dec  7 09:09:35 pughole syslogd: last message repeated 3 times
Dec  7 09:09:35 pughole kernel:  exregion-0527 ExPciConfigSpaceHandle: Pci-Confi
g 1 (8) Seg(0000) Bus(0000) Dev(0003) Func(0003) Reg(00aa)
Dec  7 09:09:35 pughole kernel:  exregion-0527 ExPciConfigSpaceHandle: Pci-Confi
g 0 (8) Seg(0000) Bus(0000) Dev(0003) Func(0003) Reg(00aa)
Dec  7 09:09:35 pughole kernel:  exregion-0527 ExPciConfigSpaceHandle: Pci-Confi
g 0 (8) Seg(0000) Bus(0080) Dev(0000) Func(0000) Reg(00b2)
Dec  7 09:09:35 pughole kernel:  exregion-0527 ExPciConfigSpaceHandle: Pci-Confi
g 0 (8) Seg(0000) Bus(0080) Dev(0000) Func(0000) Reg(00aa)
Dec  7 09:09:35 pughole syslogd: last message repeated 3 times
Dec  7 09:09:35 pughole kernel:  exregion-0527 ExPciConfigSpaceHandle: Pci-Confi
g 1 (8) Seg(0000) Bus(0080) Dev(0000) Func(0000) Reg(00aa)
Dec  7 09:09:35 pughole kernel:  exregion-0527 ExPciConfigSpaceHandle: Pci-Confi
g 0 (8) Seg(0000) Bus(0080) Dev(0000) Func(0000) Reg(00aa)
Dec  7 09:09:36 pughole kernel:  exregion-0527 ExPciConfigSpaceHandle: Pci-Confi
g 0 (8) Seg(0000) Bus(0080) Dev(0002) Func(0000) Reg(00b2)
Dec  7 09:09:36 pughole kernel:  exregion-0527 ExPciConfigSpaceHandle: Pci-Confi
g 0 (8) Seg(0000) Bus(0080) Dev(0002) Func(0000) Reg(00aa)
Dec  7 09:09:36 pughole syslogd: last message repeated 3 times

ec  7 09:09:36 pughole kernel:  exregion-0527 ExPciConfigSpaceHandle: Pci-Confi
g 1 (8) Seg(0000) Bus(0080) Dev(0002) Func(0000) Reg(00aa)
Dec  7 09:09:36 pughole kernel:  exregion-0527 ExPciConfigSpaceHandle: Pci-Confi
g 0 (8) Seg(0000) Bus(0080) Dev(0002) Func(0000) Reg(00aa)
Dec  7 09:09:36 pughole kernel:  exregion-0527 ExPciConfigSpaceHandle: Pci-Confi
g 0 (8) Seg(0000) Bus(0080) Dev(0002) Func(0001) Reg(00b2)
Dec  7 09:09:36 pughole kernel:  exregion-0527 ExPciConfigSpaceHandle: Pci-Confi
g 0 (8) Seg(0000) Bus(0080) Dev(0002) Func(0001) Reg(00aa)
Dec  7 09:09:36 pughole syslogd: last message repeated 3 times
Dec  7 09:09:36 pughole kernel:  exregion-0527 ExPciConfigSpaceHandle: Pci-Confi
g 1 (8) Seg(0000) Bus(0080) Dev(0002) Func(0001) Reg(00aa)
Dec  7 09:09:36 pughole kernel:  exregion-0527 ExPciConfigSpaceHandle: Pci-Confi
g 0 (8) Seg(0000) Bus(0080) Dev(0002) Func(0001) Reg(00aa)
Dec  7 09:09:36 pughole kernel:  exregion-0527 ExPciConfigSpaceHandle: Pci-Confi
g 0 (8) Seg(0000) Bus(0080) Dev(0002) Func(0002) Reg(00b2)
Dec  7 09:09:36 pughole kernel:  exregion-0527 ExPciConfigSpaceHandle: Pci-Confi
g 0 (8) Seg(0000) Bus(0080) Dev(0002) Func(0002) Reg(00aa)
Dec  7 09:09:36 pughole syslogd: last message repeated 3 times
Dec  7 09:09:36 pughole kernel:  exregion-0527 ExPciConfigSpaceHandle: Pci-Confi
g 1 (8) Seg(0000) Bus(0080) Dev(0002) Func(0002) Reg(00aa)
Dec  7 09:09:36 pughole kernel:  exregion-0527 ExPciConfigSpaceHandle: Pci-Confi
g 0 (8) Seg(0000) Bus(0080) Dev(0002) Func(0002) Reg(00aa)
Dec  7 09:09:36 pughole kernel: uhub_reattach_port: giving up port reset - devic
e vanished

I have no idea if this includes any useful information.
Comment 79 Hans Petter Selasky freebsd_committer 2019-12-08 21:02:08 UTC
Can you show the output of "pciconf -lv" ?
Comment 80 Hans Petter Selasky freebsd_committer 2019-12-08 21:17:06 UTC
Are you familiar with dtrace. Could you get the backtrace of these function calls?

kldload dtraceall

cat << EOF > dtrace.d 
#!/usr/sbin/dtrace -s

fbt::$1:entry
{
        @[probefunc, stack()] = count();
}
EOF

chmod +x dtrace.d

./dtrace.d ExPciConfigSpaceHandle

Wait, then CTRL+C
Comment 81 Mike Karels freebsd_committer 2019-12-08 21:58:45 UTC
Created attachment 209793 [details]
pciconf -lv from Gigabyte system
Comment 82 Mike Karels freebsd_committer 2019-12-08 22:07:38 UTC
pciconf-lv is attached.

When I tried the dtrace script after kldloading dtraceall, I got this:

pughole# ./dtrace.d ExPciConfigSpaceHandle
dtrace: failed to compile script ./dtrace.d: line 3: probe description fbt:ExPciConfigSpaceHandle:filec:entry does not match any probes
Comment 83 Hans Petter Selasky freebsd_committer 2019-12-08 22:21:30 UTC
Try:

AcpiExPciConfigSpaceHandler

Instead.
Comment 84 Hans Petter Selasky freebsd_committer 2019-12-08 22:27:55 UTC
@kib: It looks like ACPI is addressing a device at PCI address 128, which is not in the list of PCI devices on the given machine. Is this allowed?
Comment 85 Mike Karels freebsd_committer 2019-12-08 23:48:35 UTC
Same result with dtrace.d:

pughole# ./dtrace.d AcpiExPciConfigSpaceHandler
dtrace: failed to compile script ./dtrace.d: line 3: probe description fbt:AcpiExPciConfigSpaceHandler:filec:entry does not match any probes

This is the kernel with ACPI debug but no other options added to GENERIC.
Comment 86 Hans Petter Selasky freebsd_committer 2019-12-11 14:29:01 UTC
Did you do:

kldload dtraceall

first?

--HPS

dtrace -l | grep -i ACPI
Comment 87 Mike Karels freebsd_committer 2019-12-12 01:25:02 UTC
Yes, I loaded dtraceall.

Derived from the command you sent, and copying/pasting the function name:

pughole# dtrace -l | grep -i acpiexpci
30153        fbt            kernel          AcpiExPciBarSpaceHandler entry
30154        fbt            kernel          AcpiExPciBarSpaceHandler return
30155        fbt            kernel       AcpiExPciConfigSpaceHandler entry
30156        fbt            kernel       AcpiExPciConfigSpaceHandler return
30418        fbt            kernel              AcpiExPciClsToString entry
30419        fbt            kernel              AcpiExPciClsToString return
pughole# ./dtrace.d AcpiExPciConfigSpaceHandler
dtrace: failed to compile script ./dtrace.d: line 3: probe description fbt:AcpiExPciConfigSpaceHandler:filec:entry does not match any probes
pughole# cat dtrace.d
#!/usr/sbin/dtrace -s

fbt:$1:filec:entry
{
        @[probefunc, stack()] = count();
}
pughole#
Comment 88 Mike Karels freebsd_committer 2019-12-12 04:06:02 UTC
Followup on previous comment: I'm not sure why, but the system is not logging the continuous acpi messages that it did previously.  It has been rebooted several times, including other kernels, but is back to the ACPI_DEBUG configuration.  Perhaps the error means the function has not been called?
Comment 89 Hans Petter Selasky freebsd_committer 2019-12-12 07:51:18 UTC
Can you remove "filec:" from the dtrace script?

--HPS
Comment 90 Mike Karels freebsd_committer 2019-12-12 12:13:48 UTC
Similar result:

pughole# ./dtrace.d AcpiExPciConfigSpaceHandler
dtrace: failed to compile script ./dtrace.d: line 3: probe description :fbt:AcpiExPciConfigSpaceHandler:entry does not match any probes
Comment 91 Hans Petter Selasky freebsd_committer 2019-12-12 12:17:14 UTC
#!/usr/sbin/dtrace -s

fbt::$1:entry
{
        @[probefunc, stack()] = count();
}

I think the $1 was expanded when you created dtrace.d!

Can you type this script verbatim and try again?

It basically should work, tested over here!
Comment 92 Mike Karels freebsd_committer 2019-12-14 02:57:47 UTC
The $1 was expanded (to nothing) when I ran your script to cat it to a file, but I fixed it at the time.  The latest version runs, but produces no output in a minute or two.  I am also not getting the debug messages, even after a reboot.  The current settings from "sysctl debug.acpi" are:

debug.acpi.level: ACPI_LV_INFO
debug.acpi.layer: ACPI_UTILITIES ACPI_HARDWARE ACPI_EVENTS ACPI_TABLES ACPI_NAMESPACE ACPI_PARSER ACPI_DISPATCHER ACPI_EXECUTER ACPI_RESOURCES AC

I also tried "debug.acpi.level=ACPI_LV_VERBOSITY1"; I'm still getting nothing.
Comment 93 Hans Petter Selasky freebsd_committer 2019-12-16 08:54:07 UTC
Can you retry enabling full ACPI debug messages, and also set:
kern.consmute=1

In /boot/loader.conf or try via the loader first.

You will need to log in via SSH.

That way the excessive debug prints should not halt the booting of the OS.
Comment 94 Mike Karels freebsd_committer 2019-12-17 02:49:01 UTC
Looks like kern.consmute is not a loader tunable, just a sysctl.  It didn't work, and I don't see that string in the kernel.  I let it scroll messages madly for about 30 minutes, but didn't get to the point of being able to ping it.  Of course, 99% of the messages would be gone by the time I got logged in.  Any thoughts on narrowing the scope of the messages?
Comment 95 Felix Guest 2019-12-23 05:44:32 UTC
Created attachment 210170 [details]
Dmesg with hw.usb.xhci.debug=1

Verbose boot for similar error with Mediasonic Probox (external USB3 hard drive enclosure).
Comment 96 Gerard Seibert 2020-01-13 11:15:58 UTC
Has there been any movement on this issue? FreeBSD 12.2 is on the horizon, and I and others, are stuck with FreeBSD 11.x because we cannot get FreeBSD 12.x to install and run satisfactorily. This is happening with new, modern PCs, not some dinosaurs leftover from the '90s.
Comment 97 Hans Petter Selasky freebsd_committer 2020-01-13 11:22:36 UTC
There has been one patch for XHCI:

https://svnweb.freebsd.org/changeset/base/356545

The issue is believed to be ACPI related.

--HPS
Comment 98 Gerard Seibert 2020-01-13 11:36:55 UTC
(In reply to Hans Petter Selasky from comment #97)

Will this fix be applied to FreeBSD 12.1, or will I have to wait until 12.2 is released so that I can try it and see if it works? I am not familiar with how FreeBSD backports their corrections/updates/patches.
Comment 99 Hans Petter Selasky freebsd_committer 2020-01-13 11:39:56 UTC
This fix will only be applied to 12-stable, next week, and will be part of the next stable release.

Else you need to build a custom kernel with this patch applied, in order to test.
Comment 100 Mike Karels freebsd_committer 2020-01-25 16:32:41 UTC
I am now running a stable-12 kernel at r356868, which includes this change.  The problem is unchanged (also my problem of disconnects by the UPS).  I also tried booting with no USB devices attached; the problem still occurs.
Comment 101 Gerard Seibert 2020-02-06 12:21:09 UTC
Apparently, this problem has existed since at least April 2019. That seems, at least to me, to be an awfully long time, at least at this level, for a bug to go unsquashed. I am preparing to purchase another PC similar to the one I now have to replace an older unit. It is apparent that FreeBSD 12.x will not operate on it correctly.

I am starting to wonder if FreeBSD is the best platform for my use. If it is unable to run modern hardware, then perhaps it is time for me to investigate other operating systems. I would hate to start over with a new OS, but I don't seem to be left with too many other options. While 12.2 is still a long way off, 11.4 is due this summer. I am not even sure if that will work on my PC or simple make the system unbootable.

I guess my quest is if there is anyone actively working on this bug and what the prognosis is for this problem.

I don't mean to sound crass, but this problem has gone on, in my opinion, far too long. With newer systems coming on the market daily, I honestly believe that this problem will manifest into a much larger problem and potential embarrassment for FreeBSD.
Comment 102 Hans Petter Selasky freebsd_committer 2020-02-06 13:18:23 UTC
Hi,

I have a no-name Skylake based laptop and USB XHCI works fine.

You might find the following read interesting:

http://freebsd.1045724.x6.nabble.com/Can-t-reattach-USB-devices-Lenovo-bug-td6369656.html

If I'm right, this issue has nothing to do with the generic USB/XHCI, it is entirely ACPI/BIOS related and you should bring it up with the vendor of your computer.

--HPS
Comment 103 Gerard Seibert 2020-02-06 16:48:31 UTC
I just updated the BIOS on my system. That version was only one version older than the new one. Anyway, this is the output from 'dmidecode'.

# dmidecode 3.2
# SMBIOS entry point at 0x6fe8d000
Found SMBIOS entry point in EFI, reading table from /dev/mem.
SMBIOS 3.1 present.

Handle 0x0000, DMI type 0, 24 bytes
BIOS Information
        Vendor: Dell Inc.
        Version: 1.1.10
        Release Date: 10/31/2019
        Address: 0xF0000
        Runtime Size: 64 kB
        ROM Size: 16 MB
        Characteristics:
                PCI is supported
                PNP is supported
                BIOS is upgradeable
                BIOS shadowing is allowed
                Boot from CD is supported
                Selectable boot is supported
                EDD is supported
                5.25"/1.2 MB floppy services are supported (int 13h)
                3.5"/720 kB floppy services are supported (int 13h)
                3.5"/2.88 MB floppy services are supported (int 13h)
                Print screen service is supported (int 5h)
                Serial services are supported (int 14h)
                Printer services are supported (int 17h)
                ACPI is supported
                USB legacy is supported
                BIOS boot specification is supported
                Function key-initiated network boot is supported
                Targeted content distribution is supported
                UEFI is supported
        BIOS Revision: 1.1

Does anyone have any idea if this will correct the problem? I would really hate to trash the whole system again if this doesn't correct the problem. Also, why would this problem NOT exist on a FreeBSD 11.x (and perhaps older) system? It only started with 12.x. I find it somewhat difficult to believe it is solely a BIOS version problem.
Comment 104 Hans Petter Selasky freebsd_committer 2020-02-06 16:58:50 UTC
> Does anyone have any idea if this will correct the problem? I would really hate to trash the whole system again if this doesn't correct the problem. Also, why would this problem NOT exist on a FreeBSD 11.x (and perhaps older) system? It only started with 12.x. I find it somewhat difficult to believe it is solely a BIOS version problem.

I think for the kind of debugging which is needed, physical access to your machine will be required. How much are you willing to pay to have this issue resolved? Can you for example donate or ship a machine to me (Norway) which have these problems and I can probably look into it in a 1-2 months time frame.

--HPS
Comment 105 Gerard Seibert 2020-02-16 19:14:41 UTC
(In reply to Hans Petter Selasky from comment #104)

I wish I had a spare machine to ship off to you; however, I don't. I still find it hard to believe that this bug was not caught during the initial beta testing of version 12.x. I did contact Dell again, and they insisted the problem is with FreeBSD. They have not had a single report of a failure like this with another OS. That is probably true. Even FreeBSD 11.x works fine.
Comment 106 Gerard Seibert 2020-04-04 13:27:13 UTC
Created attachment 213055 [details]
Complete 'dmidecode' output for Dell XPS 8930
Comment 107 Gerard Seibert 2020-04-26 18:37:15 UTC
Created attachment 213822 [details]
FreeBSD 11.3 boot log
Comment 108 Gerard Seibert 2020-04-26 18:39:00 UTC
Hans Petter Selasky, since I don't have a spare machine to send you, I thought that sending the bootup log from FreeBSD 11.3-RELEASE-p8, which does work on this machine, might be of some use.
Comment 109 Mike Karels freebsd_committer 2020-05-16 20:23:45 UTC
Note that FreeBSD 11.4-BETA2 does not have this problem either.
Comment 110 Gerard Seibert 2020-05-16 23:49:05 UTC
(In reply to Mike Karels from comment #109)

Thanks, Mike. I was somewhat hesitant about moving up to 11.4 because of this bug. It would seem that someone "screwed the pooch" with version 12.x, and no one seems particularly interested in tracking it down. In my humble opinion, this bug will not be squashed until enough newer or similar PCs come on-line.
Comment 111 Gerard Seibert 2020-06-18 00:01:14 UTC
Can anyone confirm that this bug has been squashed? Aryeh Friedman <aryeh.friedman@gmail.com> claims that this bug has been fixed eons ago and that this incident should be closed. I would like some confirmation on this, if possible.

Thanks!
Comment 112 Mike Karels freebsd_committer 2020-06-18 01:25:13 UTC
This bug is not fixed in 12.1-STABLE as of April 26 (r360337).  If there is reason to think that it has been fixed since then, I'm happy to test.
Comment 113 Ihor Antonov 2020-07-03 23:57:18 UTC
The bug is alive. Just had it today on FreeBSD 13.0-CURRENT r362882

ThinkPad T470

The exact circumstances under which it happened are not clear. I noticed it when I detached external USB-C monitor and USB2 keyboard and took my laptop to another room.

I will try to figure out the exact sequence to reproduce the bug.
Comment 114 Ihor Antonov 2020-07-04 03:09:38 UTC
Yes, I can reproduce the bug by unplugging my external monitor.
Monitor is connected via USB-C cable. As soon as I unplug it I see this message in dmesg appearing roughly every second.

uhub_reattach_port: giving up port reset - device vanished


If I plug the monitor back - the message stops.
Comment 115 Gerard Seibert 2020-07-04 09:17:48 UTC
(In reply to Ihor Antonov from comment #114)

My monitor is connected in the usual manner; however, I am utilizing a keyboard & mouse connected via USB. The next time I attempt to install 12.x, I'll use a hardwired keyboard and mouse and see if the error still exists.
Comment 116 Gerard Seibert 2020-08-03 16:10:09 UTC
Hi, I am just doing my 'beginning of the month' request to see if any headway had been made on discovering the cause and hopefully the cure for this bug. With the release of FreeBSD 12.2 expected on or about the end of October and version 13.0 only five months later, it would really be nice if I, and others whose PCs are suffering from this affliction, could update our systems to something <= 11.x

Thank you for your efforts in this matter.
Comment 117 Kubilay Kocak freebsd_committer freebsd_triage 2020-08-05 09:13:30 UTC
Idea: Might be worth someone (who is affected) explicitly testing reverting base r343134 to see if the issue is still reproducible
Comment 118 Gerard Seibert 2020-08-05 10:28:16 UTC
(In reply to Kubilay Kocak from comment #117)

I tried installing FreeBSD 12.1-RELEASE a couple of months ago, and the bug still existed in that version. Are you inferring that I should try the STABLE version instead?
Comment 119 Mike Karels freebsd_committer 2020-08-17 02:06:36 UTC
I built a fresh kernel(r364289) with r343134 reverted; the problem still occurs.
Comment 120 Gerard Seibert 2020-08-17 10:17:47 UTC
(In reply to Mike Karels from comment #119)

This bug has obviously not been given the attention it deserves. With the planned release of versions 12.2 in October of this year and 13.0 next March, it will certainly not bode well for FreeBSD or its user base if they are released with this bug still intact.

Since this bug did not exist in versions prior to 12.x, there should exist some mechanism to backtrack and discover what went wrong.
Comment 121 Bjoern A. Zeeb freebsd_committer 2020-08-21 18:13:08 UTC
Hi,

I just unplugged a USB hub (cascade) USB-C (usb3 port) -> usb2 hub -> usb2 hub -> 3 uftdis and another uftdi at the first usb2 hub.

After that console endlessly started to scroll with:

uhub_reattach_port: giving up port reset - device vanished
uhub_reattach_port: giving up port reset - device vanished
uhub_reattach_port: giving up port reset - device vanished
uhub_reattach_port: giving up port reset - device vanished
uhub_reattach_port: giving up port reset - device vanished
uhub_reattach_port: giving up port reset - device vanished
uhub_reattach_port: giving up port reset - device vanished
uhub_reattach_port: giving up port reset - device vanished


as well and didn't clear up.  It is clear the device is gone and it is correct, why do we not clean this up?

I did run usbconfig in another shell and it is locked:

# usbconfig
load: 0.03  cmd: usbconfig 5017 [USB config SX lock] 1.36r 0.00u 0.00s 0% 2496k
mi_switch+0xc1 sleepq_catch_signals+0x3d1 sleepq_wait_sig+0x9 _sx_xlock_hard+0x451 usbd_enum_lock_sig+0xa1 usb_ref_device+0x221 usb_open+0x106 devfs_open+0x145 VOP_OPEN_APV+0x1c vn_open_vnode+0x1eb vn_open_cred+0x3b7 kern_openat+0x249 amd64_syscall+0x119 fast_syscall_common+0xf8

(coming back to this now after applying the two sysctls mentioned in this thread I see):

ugen0.1: <0x8086 XHCI root HUB> at usbus0, cfg=0 md=HOST spd=SUPER (5.0Gbps) pwr=SAVE (0mA)

Which is not all that is there on this either...


I have saved some logs but I don't know how long I can keep the device in this state (probably another hour or so).  I'll save as much dmesg/log I can.
In case you want me to quickly test something let me know.
Comment 122 Hans Petter Selasky freebsd_committer 2020-08-21 18:19:50 UTC
Try if this unlocks usbconfig:

sysctl hw.usb.disable_enumeration=1
sleep 10
usbconfig

--HPS
Comment 123 Bjoern A. Zeeb freebsd_committer 2020-08-21 20:07:59 UTC
(In reply to Hans Petter Selasky from comment #122)

usbconfig had already returned (maybe by setting the uhub or xhci sysctl to 0x17 I think it was).

I had in the mean time just plugged the USB-C multi-port adapter back in (without anything connected to it) and that stopped the repeated message.

If you want some of the uhub/xhci sysctl dmesg I can email it to you.
It seems the USB-C disconnect was detected but not fully happening?
Comment 124 Hans Petter Selasky freebsd_committer 2020-08-21 20:15:23 UTC
Can you try to reach out to Scott Long about USB-C ?
Comment 125 Bjoern A. Zeeb freebsd_committer 2020-08-22 03:24:28 UTC
(In reply to Hans Petter Selasky from comment #122)

Hmm, I applied the sysctl and removed the USB-C connection again and:

uhub0: detached
ugen0.1: <0x8086 XHCI root HUB> at usbus0 (disconnected)
unknown: at usbus0, port 1, addr 1 (disconnected)
usbus0: detached
xhci0: Controller reset timeout.
xhci0: detached
pci8: detached
pcib8: detached
pci7: detached
pcib7: detached
pci6: detached
pcib6: detached
pci5: detached
pcib5: detached


All the stuff went away :-)

Set the sysctl hw.usb.disable_enumeration=0

and plugged it back in:

pcib5: <ACPI PCI-PCI bridge> at device 0.0 on pci4
pci5: <ACPI PCI bus> on pcib5
pcib6: <PCI-PCI bridge> at device 0.0 on pci5
pci6: <PCI bus> on pcib6
pcib7: <PCI-PCI bridge> at device 1.0 on pci5
pci7: <PCI bus> on pcib7
pcib8: <ACPI PCI-PCI bridge> at device 2.0 on pci5
pci8: <ACPI PCI bus> on pcib8
xhci0: <XHCI (generic) USB 3.0 controller> mem 0xd9f00000-0xd9f0ffff at device 0.0 on pci8
xhci0: 32 bytes context size, 64-bit DMA
usbus0 on xhci0
usbus0: 5.0Gbps Super Speed USB v3.0
ugen0.1: <0x8086 XHCI root HUB> at usbus0
uhub0 on usbus0
uhub0: <0x8086 XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
uhub0: 4 ports with 4 removable, self powered
xhci0: Resetting controller
usb_alloc_device: device init 2 failed (USB_ERR_TIMEOUT, ignored)
ugen0.2: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device
usb_alloc_device: device init 2 failed (USB_ERR_TIMEOUT, ignored)
ugen0.2: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device
uhub0: at usbus0, port 1, addr 1 (disconnected)
uhub0: detached
uhub0 on usbus0
uhub0: <0x8086 XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
uhub0: 4 ports with 4 removable, self powered
xhci0: Resetting controller
usb_alloc_device: device init 2 failed (USB_ERR_TIMEOUT, ignored)
ugen0.2: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device
usb_alloc_device: device init 2 failed (USB_ERR_TIMEOUT, ignored)
ugen0.2: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device
uhub0: at usbus0, port 1, addr 1 (disconnected)
uhub0: detached
uhub0 on usbus0
uhub0: <0x8086 XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
uhub0: 4 ports with 4 removable, self powered
xhci0: Resetting controller
usb_alloc_device: device init 2 failed (USB_ERR_TIMEOUT, ignored)
ugen0.2: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device
usb_alloc_device: device init 2 failed (USB_ERR_TIMEOUT, ignored)
ugen0.2: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device
uhub0: at usbus0, port 1, addr 1 (disconnected)
uhub0: detached
uhub0 on usbus0
uhub0: <0x8086 XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
...
(and so on)
Comment 126 Mark Millard 2020-08-24 20:18:14 UTC
(In reply to Bjoern A. Zeeb from comment #125)

Your repeating sequence with the "(USB_ERR_TIMEOUT, ignored)"
text prompts the following note for a RPi4B context (edk2
UEFI/ACPI based boot):

I found that trying to boot with a kernel that had been built
with -mcpu=cortex-a72 the RPi4B would not boot and was reporting
such a repeating sequence.  (No microsd card is present for the
configuration: direct USB3 SSD based booting.)

But when the kernel is instead build with -mcpu=cortex-a53 the
kernel boots, getting only one:

xhci0: Resetting controller

The two -mcpu's select the same -march value but different -mtune
values. To my knowledge, the only differences should be performance
related as far as operation goes. So timing sensitive for the
problem.

For reference for the failed boot (cortex-a72 tuning) reports:
(I later show a success from cortex-a53 tuning.)

ugen0.1: <Generic XHCI root HUB> at usbus0
ugen1.1: <DWCOTG OTG Root HUB> at usbus1
uhub0 on usbus0
uhub1 on usbus1
uhub1: <DWCOTG OTG Root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus1
uhub0: <Generic XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
uhub1: 1 port with 1 removable, self powered
uhub0: 5 ports with 4 removable, self powered
Root mount waiting for: usbus0 CAM
xhci0: Resetting controller
usb_alloc_device: device init 2 failed (USB_ERR_TIMEOUT, ignored)
ugen0.2: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device
Root mount waiting for: usbus0 CAM
usb_alloc_device: device init 2 failed (USB_ERR_TIMEOUT, ignored)
ugen0.2: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device
uhub0: at usbus0, port 1, addr 1 (disconnected)
uhub0: detached
uhub0 on usbus0
uhub0: <Generic XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
uhub0: 5 ports with 4 removable, self powered
Root mount waiting for: CAM
xhci0: Resetting controller
usb_alloc_device: device init 2 failed (USB_ERR_TIMEOUT, ignored)
ugen0.2: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device
Root mount waiting for: CAM
usb_alloc_device: device init 2 failed (USB_ERR_TIMEOUT, ignored)
ugen0.2: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device
uhub0: at usbus0, port 1, addr 1 (disconnected)
uhub0: detached
uhub0 on usbus0
uhub0: <Generic XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
uhub0: 5 ports with 4 removable, self powered
Root mount waiting for: CAM
xhci0: Resetting controller
usb_alloc_device: device init 2 failed (USB_ERR_TIMEOUT, ignored)
ugen0.2: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device
Root mount waiting for: CAM
usb_alloc_device: device init 2 failed (USB_ERR_TIMEOUT, ignored)
Root mount waiting for:ugen0.2: <Unknown > at usbus0 (disconnected)
CAM
uhub_reattach_port: could not allocate new device
uhub0: at usbus0, port 1, addr 1 (disconnected)
uhub0: detached
uhub0 on usbus0
uhub0: <Generic XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
uhub0: 5 ports with 4 removable, self powered
Root mount waiting for: CAM
xhci0: Resetting controller
usb_alloc_device: device init 2 failed (USB_ERR_TIMEOUT, ignored)
ugen0.2: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device
Root mount waiting for: CAM
Mounting from ufs:/dev/gpt/RPi4Broot failed with error 2; retrying for 10 more seconds
usb_alloc_device: device init 2 failed (USB_ERR_TIMEOUT, ignored)
ugen0.2: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device
uhub0: at usbus0, port 1, addr 1 (disconnected)
uhub0: detached
uhub0 on usbus0
uhub0: <Generic XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
uhub0: 5 ports with 4 removable, self powered
xhci0: Resetting controller
usb_alloc_device: device init 2 failed (USB_ERR_TIMEOUT, ignored)
ugen0.2: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device
usb_alloc_device: device init 2 failed (USB_ERR_TIMEOUT, ignored)
ugen0.2: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device
uhub0: at usbus0, port 1, addr 1 (disconnected)
uhub0: detached
uhub0 on usbus0
uhub0: <Generic XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
uhub0: 5 ports with 4 removable, self powered
xhci0: Resetting controller
usb_alloc_device: device init 2 failed (USB_ERR_TIMEOUT, ignored)
ugen0.2: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device
usb_alloc_device: device init 2 failed (USB_ERR_TIMEOUT, ignored)
ugen0.2: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device
uhub0: at usbus0, port 1, addr 1 (disconnected)
uhub0: detached
uhub0 on usbus0
uhub0: <Generic XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
uhub0: 5 ports with 4 removable, self powered
xhci0: Resetting controller
usb_alloc_device: device init 2 failed (USB_ERR_TIMEOUT, ignored)
ugen0.2: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device
usb_alloc_device: device init 2 failed (USB_ERR_TIMEOUT, ignored)
ugen0.2: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device
uhub0: at usbus0, port 1, addr 1 (disconnected)
uhub0: detached
uhub0 on usbus0
uhub0: <Generic XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
uhub0: 5 ports with 4 removable, self powered
xhci0: Resetting controller
usb_alloc_device: device init 2 failed (USB_ERR_TIMEOUT, ignored)
ugen0.2: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device
Mounting from ufs:/dev/gpt/RPi4Broot failed with error 2.

Loader variables:
 vfs.root.mountfrom=ufs:/dev/gpt/RPi4Broot

Manual root filesystem specification:
 <fstype>:<device> [options]
     Mount <device> using filesystem <fstype>
     and with the specified (optional) option list.

   eg. ufs:/dev/da0s1a
       zfs:zroot/ROOT/default
       cd9660:/dev/cd0 ro
         (which is equivalent to: mount -t cd9660 -o ro /dev/cd0 /)

 ?               List valid disk boot devices
 .               Yield 1 second (for background tasks)
 <empty line>    Abort manual input

mountroot> usb_alloc_device: device init 2 failed (USB_ERR_TIMEOUT, ignored)
ugen0.2: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device
uhub0: at usbus0, port 1, addr 1 (disconnected)
uhub0: detached
uhub0 on usbus0
uhub0: <Generic XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
uhub0: 5 ports with 4 removable, self powered
. . .

(It keeps going.)

For reference the successful boot (cortex-a53 tuning) reports:

ugen0.1: <Generic XHCI root HUB> at usbus0
ugen1.1: <DWCOTG OTG Root HUB> at usbus1
uhub0 on usbus0
uhub1 on usbus1
uhub0: <Generic XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
uhub1: <DWCOTG OTG Root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus1
uhub1: 1 port with 1 removable, self powered
uhub0: 5 ports with 4 removable, self powered
Root mount waiting for: usbus0 CAM
xhci0: Resetting controller
usb_alloc_device: device init 2 failed (USB_ERR_TIMEOUT, ignored)
ugen0.2: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device
Root mount waiting for: usbus0 CAM
usb_alloc_device: device init 2 failed (USB_ERR_TIMEOUT, ignored)
ugen0.2: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device
uhub0: at usbus0, port 1, addr 1 (disconnected)
uhub0: detached
uhub0 on usbus0
uhub0: <Generic XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
uhub0: 5 ports with 4 removable, self powered
Root mount waiting for: CAM
ugen0.2: <vendor 0x2109 USB2.0 Hub> at usbus0
uhub2 on uhub0
uhub2: <vendor 0x2109 USB2.0 Hub, class 9/0, rev 2.10/4.21, addr 1> on usbus0
uhub2: 4 ports with 4 removable, self powered
Root mount waiting for: CAM
Root mount waiting for: CAM
ugen0.3: <OWC Envoy Pro mini> at usbus0
umass0 on uhub0
umass0: <OWC Envoy Pro mini, class 0/0, rev 3.00/1.00, addr 2> on usbus0
umass0:  SCSI over Bulk-Only; quirks = 0x0100
umass0:0:0: Attached to scbus0
Root mount waiting for: CAM
Root mount waiting for: CAM
Root mount waiting for: CAM
da0 at umass-sim0 bus 0 scbus0 target 0 lun 0
da0: <OWC Envoy Pro mini 0> Fixed Direct Access SPC-4 SCSI device
da0: Serial Number 000000000011
da0: 400.000MB/s transfers
da0: 228936MB (468862128 512 byte sectors)
da0: quirks=0x2<NO_6_BYTE>
Setting hostuuid: a4f7fbeb-f668-11de-b280-ebb65474e619.
Setting hostid: 0xcd8e9e25.
Starting file system checks:
/dev/gpt/RPi4Broot: FILE SYSTEM CLEAN; SKIPPING CHECKS
. . .


head -r363590 was in use for the experiments.

I only noticed this issue when I stopped using the same
media to boot a Rock64 and the RPi4B: The Rock64 is a
cortex-a53 context and I'd been using only that kind of
kernel build until I made separate RPi4B media. I
reverted to building with cortex-a53 for the RPi4B on
the separate media in order to have it operational.
Comment 127 bawdyanarchist@disroot.org 2020-09-10 20:28:31 UTC
Hi, I want to report having duplicated this exact same error on BHYVE virtual machine with PCI passthrough for the USB.

HARDWARE:  TRX40 Tachi with Ryzen 3960x (amd64)
HOST SYSTEM:  12.1-RELEASE-p8
BHYVE Versions tried:  11.4-RELEASE, 12.1-RELEASE, 12.1-STABLE, 13.0-CURRENT

BEHAVIOR:  
Without PCI passthrough, the host xhci controller operates normally with no errors.  After rebooting host, the bhyve VM can be started, only once, with no USB xhci errors.

However, if the VM is power cycled without cycling the host, I get the same sequence of errors reported by Mark, in a continuous loop, and inserted USB devices are undetected. 

Although the screen is impractical to use due to error messages, it is possible to login.  I can SSH into the VM and otherwise use it normally.  After 3-5 minutes, the loop stops, the VM detects devices, and I can access them.

SPECULATION
Problems with PCI reset?  Perhaps my hardware doesn't support reset or re-initialization of the device without reseting all devices?
Comment 128 Mark Millard 2020-09-14 03:41:59 UTC
(In reply to bawdyanarchist@disroot.org from comment #127)

I'm not sure how your speculation would fit with Cortex-A72 vs.
Cortex-A53 making a difference in my context. One area where
A72 vs. A53 might have a relevant difference is:

Cortex-A72 has speculative execution
Cortex-A53 does not have speculative execution

The Cortex-A72 needs to have some types cache content explicitly
managed in places where speculative execution can have cache
consequences visible if not handled correctly. I've seen code
patched for what was reported to be such an issue for code that
was starting to support A72's. (Not necessarily a FreeBSD
context, just general looking around.)

It leads me to wonder if something is not being properly forced
to be up to date when used. Differing code sequences from tuning
for different targets might have differing properties for such
at some places in the code.

Outside the A72 vs. 53 context, some machines might have similar
variations in behavior, leading some to show problems and others
not.

But, I've still no solid evidence of a specific type of error in
any specific place. The above is just an example of speculation
on my part.
Comment 129 Gerard Seibert 2020-09-14 11:18:16 UTC
(In reply to Mark Millard from comment #128)

As far as I can ascertain, this problem is localized to FreeBSD. I have friends that use other operating systems like Linux on equipment similar to mine that are not experiencing this phenomenon. I am no genius, but perhaps it might be worth investigating what other operating systems are doing or not doing that might be causing this problem. Furthermore, since this problem does not exist in FreeBSD versions prior to 12.x, I would have thought that it would be somewhat easier to find the culprit in this matter. I am not trying to infer that the solution is axiomatic; however, I do find it rather odd that it has persisted for such an extended time and now is evident in at least two versions of FreeBSD -- 12.x and 13.x.

And in the immortal words of Elvira Hancock's advice to Tony Montana, "Don't get high on your own supply."
Comment 130 bawdyanarchist@disroot.org 2020-09-14 23:50:57 UTC
(In reply to Mark Millard from comment #128)

I see your point about the ARM devices.  If I read correctly, this is a problem you didn't have until upgrading to 12.1.  Could be that we have different root causes.  On my Qubes machine, I have to set an option for pci "no-strict-reset" when starting the USB VM, and is quite common with x86 architecture.  

At any rate, I just wanted to add my 0.02 just in case it helps devs troubleshoot.
Comment 131 Mark Millard 2020-09-15 00:36:14 UTC
(In reply to bawdyanarchist@disroot.org from comment #130)

Just to avoid misinterpretations of what I've reported:

I run head/CURRENT, not 12.x . It was not an FreeBSD version
upgrade context at all.

What I did that got the problem in an cortex-a72 context
was to build and install the kernel based on -mcpu=cortex-a72
instead of -mcpu=cortex-a53 (same arch, different tunings).
This was without changing versions (still head -r363590 based).
What I did that got rid of the problem was to revert to
my -mcpu=cortex-a53 kernel build (also used on an cortex-a53
system).

So the -mcpu=??? use was sufficient to vary/control the
behavior, other things unchanged.
Comment 132 Mark Millard 2020-09-15 05:12:04 UTC
(In reply to Mark Millard from comment #126, #128, #131)

Interesting finding? . . .

I've tested putting the -mcpu=cortex-a72 based kernel
on the cortext-a53 system and . . .

The result boots and operates fine. (It is also a USB3
SSD based root file system.)

But the xhci context (from the cortext-a53 system) is:

# dmesg -a | grep -i xhci
xhci0: <Synopsys Designware DWC3> mem 0xff600000-0xff6fffff irq 57 on rk_dwc30
xhci0: 64 bytes context size, 32-bit DMA
usbus0 on xhci0
ugen0.1: <Synopsys XHCI root HUB> at usbus0
uhub0: <Synopsys XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0

Instead of (from the cortext-a53 kernel on the cortext-a72 system):

# dmesg -a | grep -i xhci | more
xhci0: <Generic USB 3.0 controller> iomem 0x600000000-0x600000fff irq 0 on acpi0
xhci0: 32 bytes context size, 32-bit DMA
usbus0 on xhci0
ugen0.1: <Generic XHCI root HUB> at usbus0
uhub0: <Generic XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
xhci0: Resetting controller
uhub0: <Generic XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0

So more than the cpu type and its properties provide distinctions.
Comment 133 Mark Millard 2020-09-16 06:12:20 UTC
The message:

xhci0: Resetting controller

comes via code in xhci_do_command that has the following comment:

                /*
                 * After some weeks of continuous operation, it has
                 * been observed that the ASMedia Technology, ASM1042
                 * SuperSpeed USB Host Controller can suddenly stop
                 * accepting commands via the command queue. Try to
                 * first reset the command queue. If that fails do a
                 * host controller reset.
                 */

The code involves the only call to usb_bus_reset_async_locked
(in an else) --and that routine produces the message. The special
code for the ASM1042 seems to be the source of the host controller
reset operation sequences in what I and Bjoern reported. (This code
does not check if the device involved is a ASM1042.)

In what I report in comment #126 the cortex-A72 system always gets
just one of those messages for the -mcpu=cortex-a53 based kernel
--but for the -mcpu=cortex-a72 based kernel the A72 overall reruns
the host controller reset indefinitely.

May be the above observation can help narrow down what needs to be
looked at for at least some of what is being reported.
Comment 134 Mark Millard 2020-09-18 05:04:35 UTC
(In reply to Mark Millard from comment #133)

The code in xhci_do_command does a doorbell/mailbox style:

        XWRITE4(sc, door, XHCI_DOORBELL(0), 0);

which seems to get to:

ffff0000007c4b60 <generic_bs_w_4>     str   w3, [x1, x2]
ffff0000007c4b64 <generic_bs_w_4+0x4> ret

But I have not been able to track down there being the
apparently-architecturally-required DSB ST (or DSB)
after prior memory updates but before the write to the
doorbell/mailbox location (execution order for after
then before).

I'm using Barrier_Litmus_Tests_and_Cookbook_A08.pdf for
reference, section 8.1. But I'm not an aarch64 expert
so my interpretation of the material could be at issue.
Comment 135 Mark Millard 2020-09-19 02:51:11 UTC
(In reply to Mark Millard from comment #134)

A cortex-a72 success! (In the form of investigatory code, not
code to check-in as is.)

Just by adding some "dsb st" commands and a "dsb ld" command in
places in the xhci code (just for __arach64__), I've gotten the
cortext-A72 to boot from the USB3 SSD via a -mcpu=cortex-a72
based kernel build. No more indefinitely repeating "Resetting
controller" problem.

I do not claim the additions are the minimal ones. I know one place
is required for sure: the last one added enabled the boot (given
the others were already present). Prior to that I still had the
indefinitely repeating "Resetting controller" problem.

There could be more places that should have dsb st or dsb ld for
overall correctness.

This evidence may suggest somewhat analogous problems for other
platforms than aarch64 that are seeing problems.

For the aarch64 context, the evidence is (the changes are)
as follows. The first "dsb st" textually is the last one I
added.

# svnlite diff /usr/src/sys/dev/usb/controller/xhci.c
Index: /usr/src/sys/dev/usb/controller/xhci.c
===================================================================
--- /usr/src/sys/dev/usb/controller/xhci.c	(revision 363590)
+++ /usr/src/sys/dev/usb/controller/xhci.c	(working copy)
@@ -431,6 +431,9 @@
 
 	phwr->hwr_ring_seg[0].qwEvrsTablePtr = htole64(addr);
 	phwr->hwr_ring_seg[0].dwEvrsTableSize = htole32(XHCI_MAX_EVENTS);
+#ifdef __aarch64__
+__asm __volatile("dsb st" : : : "memory");
+#endif
 
 	DPRINTF("ERDP(0)=0x%016llx\n", (unsigned long long)addr);
 
@@ -1098,6 +1101,9 @@
 
 	while (1) {
 
+#ifdef __aarch64__
+__asm __volatile("dsb ld" : : : "memory");
+#endif
 		temp = le32toh(phwr->hwr_events[i].dwTrb3);
 
 		k = (temp & XHCI_TRB_3_CYCLE_BIT) ? 1 : 0;
@@ -1107,6 +1113,9 @@
 
 		event = XHCI_TRB_3_TYPE_GET(temp);
 
+#ifdef __aarch64__
+__asm __volatile("dsb ld" : : : "memory");
+#endif
 		DPRINTFN(10, "event[%u] = %u (0x%016llx 0x%08lx 0x%08lx)\n",
 		    i, event, (long long)le64toh(phwr->hwr_events[i].qwTrb0),
 		    (long)le32toh(phwr->hwr_events[i].dwTrb2),
@@ -1239,6 +1248,9 @@
 	sc->sc_command_idx = i;
 	sc->sc_command_ccs = j;
 
+#ifdef __aarch64__
+__asm __volatile("dsb st" : : : "memory");
+#endif
 	XWRITE4(sc, door, XHCI_DOORBELL(0), 0);
 
 	err = cv_timedwait(&sc->sc_cmd_cv, &sc->sc_bus.bus_mtx,
@@ -2855,6 +2867,9 @@
 	index = xfer->xroot->udev->controller_slot_id;
 
 	if (xfer->xroot->udev->flags.self_suspended == 0) {
+#ifdef __aarch64__
+__asm __volatile("dsb st" : : : "memory");
+#endif
 		XWRITE4(sc, door, XHCI_DOORBELL(index),
 		    epno | XHCI_DB_SID_SET(xfer->stream_id));
 	}
@@ -4180,6 +4195,9 @@
 
 	for (n = 1; n != XHCI_MAX_ENDPOINTS; n++) {
 		for (p = 0; p != XHCI_MAX_STREAMS; p++) {
+#ifdef __aarch64__
+__asm __volatile("dsb st" : : : "memory");
+#endif
 			XWRITE4(sc, door, XHCI_DOORBELL(index),
 			    n | XHCI_DB_SID_SET(p));
 		}


I'm clearly just "evidence hacking" here. I've no clue what a
good design for allowing aarch64 build to supply having any of
those "dsb st"s or the "dsb ld".

Hopefully someone that knows what they are doing can figure out
if any of the other examples on other platforms are analogous
--and ,if they are, provide some hook for platform to contribute
such code to the xhci code.
Comment 136 Hans Petter Selasky freebsd_committer 2020-09-19 08:04:53 UTC
Mark,

Your finding indicate a problem in usb_pc_cpu_flush() and 

bus_dmamap_sync(pc->tag, pc->map, BUS_DMASYNC_PREWRITE);

Try to put the dsb only after dmamap_sync.

void
usb_pc_cpu_flush(struct usb_page_cache *pc)
{
        if (pc->page_offset_end == pc->page_offset_buf) {
                /* nothing has been loaded into this page cache! */
                return;
        }
        bus_dmamap_sync(pc->tag, pc->map, BUS_DMASYNC_PREWRITE);
}

The PCI I/O memory should be coherent and should not need any DSB's.

--HPS
Comment 137 commit-hook freebsd_committer 2020-09-19 22:38:07 UTC
A commit references this bug:

Author: hselasky
Date: Sat Sep 19 22:37:46 UTC 2020
New revision: 365918
URL: https://svnweb.freebsd.org/changeset/base/365918

Log:
  Fix for use of the XHCI driver on Cortex-A72 by adding a missing cache
  flush operation before writing to the XHCI_ERSTBA_LO/HI register(s).

  PR:		237666
  Discussed with:	Mark Millard <marklmi@yahoo.com>
  MFC after:	1 week
  Sponsored by:	Mellanox Technologies // Nvidia

Changes:
  head/sys/dev/usb/controller/xhci.c
Comment 138 Mark Millard 2020-09-19 22:47:25 UTC
(In reply to commit-hook from comment #137)

Folks with other platforms that show problems under this
bugzilla id (237666) may want to check if the update removes
their problems in this area. I had only the one example type
of context that showed behavior that seemed to fit with
237666. So my testing is limited compared to the variety of
reports that are listed in the comments.
Comment 139 Mike Karels freebsd_committer 2020-09-20 14:56:50 UTC
(In reply to Mark Millard from comment #138)
The change in xhci.c does not fix the repeating "device vanished" messages on my amd64; tested with releng/12.1 plus this fix.  I suspect that Mark tripped over a different bug on arm with similar symptoms (but not quite the same).
Comment 140 Mark Millard 2020-09-20 20:47:28 UTC
(In reply to Mike Karels from comment #139)

Drat. Well, at least Bjoern A. Zeeb's comment #125 might also
be covered by the patch involved for changeset/base/365918 ,
given how closely the console log contents match to what I
reported in response to his report.

Hopefully at some point Bjoern can reported if he sees a
behavior change and if it is an improvement for whatever
type of system #125 was reporting about.
Comment 141 Bjoern A. Zeeb freebsd_committer 2020-09-20 21:00:25 UTC
(In reply to Mark Millard from comment #140)

"ugen0.1: <0x8086 XHCI root HUB> at usbus0 (disconnected)"

is a huge clue that it wasn't arm64 ;-)  It was on an Intel laptop with external USB-C multi-port adapter.
Comment 142 Mark Millard 2020-09-20 21:57:01 UTC
(In reply to Bjoern A. Zeeb from comment #141)

Good to know.

If you repeat the #125 experiment based on the patch,
does it still end up stuck in the repeating sequence
that involves "Resetting controller"? (It seems that
is the only aspect likely to have been changed.) Or
does that part now do something different?
Comment 143 commit-hook freebsd_committer 2020-09-26 08:45:15 UTC
A commit references this bug:

Author: hselasky
Date: Sat Sep 26 08:44:16 UTC 2020
New revision: 366182
URL: https://svnweb.freebsd.org/changeset/base/366182

Log:
  MFC r365918:
  Fix for use of the XHCI driver on Cortex-A72 by adding a missing cache
  flush operation before writing to the XHCI_ERSTBA_LO/HI register(s).

  PR:		237666
  Discussed with:	Mark Millard <marklmi@yahoo.com>
  Sponsored by:	Mellanox Technologies // Nvidia

Changes:
_U  stable/12/
  stable/12/sys/dev/usb/controller/xhci.c
Comment 144 commit-hook freebsd_committer 2020-09-26 08:45:21 UTC
A commit references this bug:

Author: hselasky
Date: Sat Sep 26 08:45:09 UTC 2020
New revision: 366183
URL: https://svnweb.freebsd.org/changeset/base/366183

Log:
  MFC r365918:
  Fix for use of the XHCI driver on Cortex-A72 by adding a missing cache
  flush operation before writing to the XHCI_ERSTBA_LO/HI register(s).

  PR:		237666
  Discussed with:	Mark Millard <marklmi@yahoo.com>
  Sponsored by:	Mellanox Technologies // Nvidia

Changes:
_U  stable/11/
  stable/11/sys/dev/usb/controller/xhci.c
Comment 145 Gerard Seibert 2020-10-27 14:30:36 UTC
I admit to not being the sharpest knife in the drawer, so bear with me here.

1) Will the fix mentioned above correct the problem on a Dell XPS 8930 PC? I have never been able to run anything newer than 11.x.

2) Is/will this fix be incorporated into the FreeBSD 12.2-RELEASE?

Thanks!
Comment 146 Mark Millard 2020-10-27 16:56:16 UTC
(In reply to Gerard Seibert from comment #145)

Comment 139 makes it seem unlikely that the fix above to my context
would be the fix to yours or most of the reports here. (No one, besides
me, commented that there context was fixed and 139 comments that the
context involved was not fixed.)

https://svnweb.freebsd.org/base/releng/12.2/sys/dev/usb/controller/xhci.c?view=log stops at -r365618 from 2020-Sep-10, which is before -r366182 from 2020-Sep-26 .
So currently it does not look like 12.2 is including what fixed my context.