Bug 213713 - xhci and ehci interrupt storms
Summary: xhci and ehci interrupt storms
Status: Closed Not A Bug
Alias: None
Product: Base System
Classification: Unclassified
Component: usb
Version: CURRENT
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-usb (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-10-23 03:47 UTC by Dustin Marquess
Modified: 2016-10-31 04:40 UTC
CC List: 2 users

See Also:


Attachments
output from dmesg, vmstat -i, usbconfig, and debug sysctls (92.12 KB, text/plain)
2016-10-23 03:47 UTC, Dustin Marquess
output from debug=32 (202.62 KB, text/plain)
2016-10-24 02:56 UTC, Dustin Marquess

Description Dustin Marquess 2016-10-23 03:47:44 UTC
Created attachment 176069
output from dmesg, vmstat -i, usbconfig, and debug sysctls

I have 2 Lenovo ThinkServer RD450 machines that are pretty much identical:

Intel Haswell Xeon E5-2630 CPUs
2 SD cards in internal SD/USB adapters in a ZFS mirror for booting
6 4TB Hitachi SAS drives (ZFS "RAID-10")
2 200GB Seagate SAS SSDs (1 SLOG, 1 L2ARC)

These worked perfectly about 10 months ago on what was then -CURRENT (11.x, before the first BETAs/RCs).  Since then they've mostly been powered off, and I'm now trying to revive them.  So far I've tried both 11-STABLE and 12-CURRENT.  Under both, booting sometimes dies because the USB bus the SD adapters are on times out, which keeps the kernel from finding the boot devices.  Other times everything works, but xhci0 and ehci0 each receive upwards of 250k interrupts/second.  If I disable XHCI in the UEFI firmware, the xhci0 interrupts obviously stop, but the ehci0 ones are unchanged.  As I said, I don't recall this happening before, so it's either a change in the kernel or something introduced by a UEFI update (I'm on the latest, released about a month ago).

UEFI settings that might be relevant:

PCI/PCIE Settings

SR-IOV Tech Support - Enabled
ARI Support - Enabled
Above 4GB Decoding - Enabled
ASPM Support - Auto

USB Settings

Legacy USB Support - Disabled
Port 60/64 Emulation - Disabled
XHCI - Enabled

Miscellaneous Settings

X2APIC - Enabled (toggling this doesn't seem to change the issue)

dmesg, 'vmstat -i', usbconfig, and output from setting hw.usb.debug, hw.usb.ehci.debug, and hw.usb.xhci.debug are attached.

I collected all of this from the 11-STABLE host booted from a LiveUSB, so it will have an extra device listed.  The 12-CURRENT host, *NOT* booted from a LiveUSB (so without the extra device), has the same issues.
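
For anyone checking a similar machine, the per-source rates in 'vmstat -i' can be screened with a short script. The following is a minimal Python sketch, not a tool from this report: the function names, the 100k/second threshold, and the sample counters are all made up for illustration, and it assumes the usual three-column "interrupt / total / rate" layout. Note that the rate column is an average since boot, so a storm that started recently will be understated.

```python
# Illustrative sample only; these numbers are NOT taken from the attachment.
SAMPLE = """\
interrupt                          total       rate
irq4: uart0                         1520          2
irq16: ehci0                   151200000     252000
irq264: xhci0                  150600000     251000
cpu0:timer                        660000       1100
Total                          302461520     504102
"""


def parse_vmstat_i(text):
    """Parse `vmstat -i` style output into {name: (total, rate)}."""
    counters = {}
    for line in text.splitlines():
        # The name may contain spaces, so split the two numeric
        # columns off from the right-hand side.
        parts = line.rsplit(None, 2)
        if len(parts) != 3:
            continue
        name, total, rate = parts
        if not (total.isdigit() and rate.isdigit()):
            continue  # header row or unrelated text
        counters[name.strip()] = (int(total), int(rate))
    return counters


def storming(counters, threshold=100_000):
    """Return the names of sources whose rate meets the (assumed) threshold."""
    return sorted(
        name
        for name, (_total, rate) in counters.items()
        if name != "Total" and rate >= threshold
    )


if __name__ == "__main__":
    for name in storming(parse_vmstat_i(SAMPLE)):
        print(name)
```

On a live system the text would come from running 'vmstat -i' instead of the SAMPLE string.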
Comment 1 Hans Petter Selasky freebsd_committer freebsd_triage 2016-10-23 07:11:42 UTC
Can you set hw.usb.xhci.debug=32 and hw.usb.ehci.debug=32?

That will reveal the interrupts that are happening.

There haven't been any IRQ-related changes recently in the USB code, so I suspect this is something more generic.

--HPS
Comment 2 Dustin Marquess 2016-10-24 02:56:46 UTC
Created attachment 176090
output from debug=32
Comment 3 Dustin Marquess 2016-10-24 05:28:55 UTC
I'm really starting to think that this is a Lenovo bug now.  On a hunch I booted a live USB of "that other OS" and sure enough, it has the same issue!
Comment 4 Hans Petter Selasky freebsd_committer freebsd_triage 2016-10-24 06:37:36 UTC
Hi,

From what I can see, something is generating fake USB interrupts. There is no real reason for USB to interrupt that frequently.

--HPS
Comment 5 Hans Petter Selasky freebsd_committer freebsd_triage 2016-10-24 06:38:46 UTC
Sometimes this can happen because the BIOS was using a PCI device and didn't clear the IRQs properly before exiting. Typically the VGA adapter.
Comment 6 Dustin Marquess 2016-10-24 08:22:06 UTC
There's no monitor or keyboard attached. Access is normally done using Serial over IPMI; however, the system does have a built-in IPKVM in the BMC that I assume attaches over USB.

I downgraded the BIOS as far as it'll let me, and it still happens.  I went to downgrade the BMC, but they don't seem to post older firmware releases for that. Sounds like I'll be opening a ticket with them!

Thanks for the help.
Comment 7 Dustin Marquess 2016-10-31 04:40:45 UTC
Okay, for some reason this seems to be caused by the Chelsio boot ROM.  Not sure how, but disabling that fixes it.  Sorry for the spam!