Bug 156596 - [ehci] Extremely high interrupt rate on ehci/uhci IRQ16 80% cpu utilization on CPU0
Summary: [ehci] Extremely high interrupt rate on ehci/uhci IRQ16 80% cpu utilization o...
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: usb (show other bugs)
Version: 8.2-STABLE
Hardware: Any Any
: Normal Affects Only Me
Assignee: Hans Petter Selasky
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-04-23 13:00 UTC by Dayne Jordan
Modified: 2015-06-17 12:44 UTC (History)
8 users (show)

See Also:


Attachments
message.log (67.44 KB, text/plain)
2011-07-08 21:37 UTC, Colin Percival
no flags Details
ehci.c.diff (294 bytes, patch)
2011-07-09 01:26 UTC, Colin Percival
no flags Details | Diff
ehci_fix_terminate.patch (1.20 KB, patch)
2011-07-09 10:48 UTC, Hans Petter Selasky
no flags Details | Diff
dmesg.txt (10.85 KB, text/plain; charset=US-ASCII)
2013-06-05 16:58 UTC, nabeken
no flags Details
UHCI patch (373 bytes, patch)
2015-02-05 09:00 UTC, Hans Petter Selasky
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Dayne Jordan 2011-04-23 13:00:20 UTC
Intel DH55C board
Intel i5 650 single processor
4GB Ram
3ware RAID-1 Controller
No mobo attached drives
No USB attached devices
Headless machine

FreeBSD 8.2-Stable amd64 GENERIC, unmodified kernel build.

Noticed load average of .80 .80 .80 even when all user daemons and internet services are shut down, and nothing else running but default basic things.

A quick check of vmstat -i shows extremely high interrupt rate on irq16:
server2# vmstat -i
interrupt                          total       rate
irq1: atkbd0                         680          0
irq16: ehci0 uhci4+          25976006936      87649  <-----****
cpu0: timer                    592700901       1999
irq256: em0                     17240082         58
cpu1: timer                    592700711       1999
cpu3: timer                    592700711       1999
cpu2: timer                    592700711       1999
Total                        28364050732      95707
server2#

Output from top -PS:
last pid: 23203;  load averages:  0.80,  0.80,  0.79                                                        up 3+10:38:16  07:34:40
151 processes: 6 running, 127 sleeping, 18 waiting
CPU 0:  0.0% user,  0.0% nice,  0.0% system, 80.5% interrupt, 19.5% idle
CPU 1:  0.0% user,  0.0% nice,  0.0% system,  2.6% interrupt, 97.4% idle
CPU 2:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 3:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Mem: 283M Active, 2444M Inact, 651M Wired, 70M Cache, 398M Buf, 292M Free
Swap: 4096M Total, 0K Used, 4096M Free

CPU0 is always busy doing....nothing.

Processor output from dmesg.boot:
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Core(TM) i5 CPU         650  @ 3.20GHz (3192.02-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x20655  Family = 6  Model = 25  Stepping = 5

Motherboard info:
ACPI APIC Table: <INTEL DH55HC>

A quick check thru dmesg.boot for irq16:
server2# grep -i "irq 16" /var/run/dmesg.boot
vgapci0: <VGA-compatible display> port 0xf220-0xf227 mem 0xfe000000-0xfe3fffff,0xd0000000-0xdfffffff irq 16 at device 2.0 on pci0
ehci0: <Intel PCH USB 2.0 controller USB-B> mem 0xfe626000-0xfe6263ff irq 16 at device 26.7 on pci0
atapci1: <SiI 3132 SATA300 controller> port 0xe000-0xe07f mem 0xfe584000-0xfe58407f,0xfe580000-0xfe583fff irq 16 at device 0.0 on pci1
uhci4: <UHCI (generic) USB controller> port 0xf040-0xf05f irq 16 at device 29.1 on pci0
vgapci0: <VGA-compatible display> port 0xf220-0xf227 mem 0xfe000000-0xfe3fffff,0xd0000000-0xdfffffff irq 16 at device 2.0 on pci0
ehci0: <Intel PCH USB 2.0 controller USB-B> mem 0xfe626000-0xfe6263ff irq 16 at device 26.7 on pci0
atapci1: <SiI 3132 SATA300 controller> port 0xe000-0xe07f mem 0xfe584000-0xfe58407f,0xfe580000-0xfe583fff irq 16 at device 0.0 on pci1
uhci4: <UHCI (generic) USB controller> port 0xf040-0xf05f irq 16 at device 29.1 on pci0
server2#

Output of pciconf -l:
server2# pciconf -l
hostb0@pci0:0:0:0:      class=0x060000 card=0x00378086 chip=0x00408086 rev=0x02 hdr=0x00
vgapci0@pci0:0:2:0:     class=0x030000 card=0x00378086 chip=0x00428086 rev=0x02 hdr=0x00
none0@pci0:0:22:0:      class=0x078000 card=0x00378086 chip=0x3b648086 rev=0x06 hdr=0x00
atapci0@pci0:0:22:2:    class=0x010185 card=0x00378086 chip=0x3b668086 rev=0x06 hdr=0x00
none1@pci0:0:22:3:      class=0x070002 card=0x00378086 chip=0x3b678086 rev=0x06 hdr=0x00
em0@pci0:0:25:0:        class=0x020000 card=0x00378086 chip=0x10f08086 rev=0x06 hdr=0x00
uhci0@pci0:0:26:0:      class=0x0c0300 card=0x00378086 chip=0x3b3b8086 rev=0x06 hdr=0x00
uhci1@pci0:0:26:1:      class=0x0c0300 card=0x00378086 chip=0x3b3e8086 rev=0x06 hdr=0x00
uhci2@pci0:0:26:2:      class=0x0c0300 card=0x00378086 chip=0x3b3f8086 rev=0x06 hdr=0x00
ehci0@pci0:0:26:7:      class=0x0c0320 card=0x00378086 chip=0x3b3c8086 rev=0x06 hdr=0x00
none2@pci0:0:27:0:      class=0x040300 card=0x00378086 chip=0x3b568086 rev=0x06 hdr=0x00
pcib1@pci0:0:28:0:      class=0x060400 card=0x00378086 chip=0x3b428086 rev=0x06 hdr=0x01
pcib2@pci0:0:28:4:      class=0x060400 card=0x00378086 chip=0x3b4a8086 rev=0x06 hdr=0x01
uhci3@pci0:0:29:0:      class=0x0c0300 card=0x00378086 chip=0x3b368086 rev=0x06 hdr=0x00
uhci4@pci0:0:29:1:      class=0x0c0300 card=0x00378086 chip=0x3b378086 rev=0x06 hdr=0x00
uhci5@pci0:0:29:2:      class=0x0c0300 card=0x00378086 chip=0x3b388086 rev=0x06 hdr=0x00
ehci1@pci0:0:29:7:      class=0x0c0320 card=0x00378086 chip=0x3b348086 rev=0x06 hdr=0x00
pcib3@pci0:0:30:0:      class=0x060401 card=0x00378086 chip=0x244e8086 rev=0xa6 hdr=0x01
isab0@pci0:0:31:0:      class=0x060100 card=0x00378086 chip=0x3b068086 rev=0x06 hdr=0x00
atapci2@pci0:0:31:2:    class=0x01018f card=0x00378086 chip=0x3b208086 rev=0x06 hdr=0x00
none3@pci0:0:31:3:      class=0x0c0500 card=0x00378086 chip=0x3b308086 rev=0x06 hdr=0x00
atapci3@pci0:0:31:5:    class=0x010185 card=0x00378086 chip=0x3b268086 rev=0x06 hdr=0x00
atapci1@pci0:1:0:0:     class=0x010401 card=0x02429005 chip=0x02421095 rev=0x01 hdr=0x00
em1@pci0:3:0:0: class=0x020000 card=0x13768086 chip=0x107c8086 rev=0x05 hdr=0x00
server2#

I hope that I have provided enough initial information.
Thank you in advance.
Comment 1 Mark Linimon freebsd_committer freebsd_triage 2011-04-24 04:15:01 UTC
Responsible Changed
From-To: freebsd-amd64->freebsd-usb

reclassify.
Comment 2 Hans Petter Selasky 2011-04-24 09:26:52 UTC
Hi,

Compile a kernel with:

options USB_DEBUG

Then enable after boot:

sysctl hw.usb.ehci.debug=15 ; sleep 1; sysctl hw.usb.ehci.debug=0

sysctl hw.usb.uhci.debug=15 ; sleep 1; sysctl hw.usb.uhci.debug=0


The high interrupt rate might be caused by a non-cleared IRQ.

Else it is an ACPI issue.

--HPS
Comment 3 Colin Percival freebsd_committer freebsd_triage 2011-07-08 21:37:45 UTC
Hi,

I'm seeing this ehci interrupt storm with 8.2-RELEASE on my Dell E5420 laptop;
vmstat -i reports 187300 interrupts per second from ehci.

I've attached my /var/log/messages showing the boot messages and 1 seconds worth
of hw.usb.ehci.debug=15 output as suggested.  Hopefully it means more to you
than it does to me...

-- 
Colin Percival
Security Officer, FreeBSD | freebsd.org | The power to serve
Founder / author, Tarsnap | tarsnap.com | Online backups for the truly paranoid
Comment 4 Colin Percival freebsd_committer freebsd_triage 2011-07-09 01:26:37 UTC
Hi again,

The attached patch seems to fix the problem while not breaking anything on my
laptop.  It makes handles the case of EHCI_REMOVE_QH(sqh, last) with sqh == last
and sqh->prev == NULL by setting last = NULL -- the idea being that if sqh is
the only entry in the queue, we still ought to delete it even though it doesn't
have a predecessor.

I have no idea if this is correct, since I don't know the rest of the code in
this file.  Maybe we should also delete the first entry in a queue even when it
isn't also the last entry; maybe the problem lies somewhere else entirely.  I'm
hoping that someone who knows this code will be able to figure out the right
fix easily now that I have an apparently-working hack.

-- 
Colin Percival
Security Officer, FreeBSD | freebsd.org | The power to serve
Founder / author, Tarsnap | tarsnap.com | Online backups for the truly paranoid
Comment 5 Hans Petter Selasky 2011-07-09 10:07:16 UTC
On Friday 08 July 2011 22:37:45 Colin Percival wrote:
> Hi,
> 
> I'm seeing this ehci interrupt storm with 8.2-RELEASE on my Dell E5420
> laptop; vmstat -i reports 187300 interrupts per second from ehci.
> 
> I've attached my /var/log/messages showing the boot messages and 1 seconds
> worth of hw.usb.ehci.debug=15 output as suggested.  Hopefully it means
> more to you than it does to me...

Have you tried to set any of these quirks:

hw.usb.ehci.lostintrbug: 0
hw.usb.ehci.iaadbug: 0

--HPS
Comment 6 Hans Petter Selasky 2011-07-09 10:33:40 UTC
On Saturday 09 July 2011 02:26:37 Colin Percival wrote:
> Hi again,
> 
> The attached patch seems to fix the problem while not breaking anything on
> my laptop.  It makes handles the case of EHCI_REMOVE_QH(sqh, last) with
> sqh == last and sqh->prev == NULL by setting last = NULL -- the idea being
> that if sqh is the only entry in the queue, we still ought to delete it
> even though it doesn't have a predecessor.
> 
> I have no idea if this is correct, since I don't know the rest of the code
> in this file.  Maybe we should also delete the first entry in a queue even
> when it isn't also the last entry; maybe the problem lies somewhere else
> entirely.  I'm hoping that someone who knows this code will be able to
> figure out the right fix easily now that I have an apparently-working
> hack.

Hi,

The last element should always be there. If it is NULL I would like to see a 
backtrace from that. KASSERT(last != NULL) in both add and remove. If you 
clear that the USB queue will stop working I believe. Anyway, I think your 
clue might give some hints what is wrong. It appears that the EHCI is spinning 
on the IDLE queue somehow producing a lot of IRQ's.

--HPS
Comment 7 Hans Petter Selasky 2011-07-09 10:48:29 UTC
On Friday 08 July 2011 22:37:45 Colin Percival wrote:
> Hi,
> 
> I'm seeing this ehci interrupt storm with 8.2-RELEASE on my Dell E5420
> laptop; vmstat -i reports 187300 interrupts per second from ehci.
> 
> I've attached my /var/log/messages showing the boot messages and 1 seconds
> worth of hw.usb.ehci.debug=15 output as suggested.  Hopefully it means
> more to you than it does to me...

Try the attached patch and report back. The EHCI should never generate more 
than 8000 IRQ/s second, so this clearly indicates a Hardware Problem.

--HPS
Comment 8 Colin Percival freebsd_committer freebsd_triage 2011-07-11 02:30:22 UTC
On 07/09/11 02:07, Hans Petter Selasky wrote:
> Have you tried to set any of these quirks:
> 
> hw.usb.ehci.lostintrbug: 0
> hw.usb.ehci.iaadbug: 0

Yes.  Neither has any effect.  (At least, not when I set them via sysctl; I
could try setting them as loader tunables if you think that would make a
difference.)

I wrote:
> The attached patch seems to fix the problem while not breaking anything on
> my laptop.

Key words, "seems to".  On further use I've found that my change didn't make
any difference; it was just a coincidence that the problem temporarily went
away at that time.

However, I've managed (I think!) to figure out what's triggering this: The
IRQ flood starts when the laptop battery is recharging.  Letting the battery
run down for a couple hours and then plugging in AC power has 100% consistently
triggered this; but if the battery is already fully charged when FreeBSD boots
I don't get the IRQ flood.

Hans Petter Selasky wrote:
> Try the attached patch and report back. The EHCI should never generate more 
> than 8000 IRQ/s second, so this clearly indicates a Hardware Problem.

No change, I'm afraid.

-- 
Colin Percival
Security Officer, FreeBSD | freebsd.org | The power to serve
Founder / author, Tarsnap | tarsnap.com | Online backups for the truly paranoid
Comment 9 andrew 2013-04-24 11:33:02 UTC
Based on the fact this only seems to happen on Intel Core architecture
machines, I think this issue may be miscategorized.

In my case, the flooding starts when I yank the VGA cable from the back of
the machine. vgapci0 shares irq16 with ehci0 in every example of this issue
i've been able to find.

The forum thread on this issue may make for relevant reading:
http://forums.freebsd.org/showthread.php?t=24952
Comment 10 nabeken 2013-06-05 16:58:09 UTC
Hi,

On Wed, 24 Apr 2013 03:33:02 -0700, Andrew Fremantle <andrew@skyhawk.ca> wrote:
> Based on the fact this only seems to happen on Intel Core architecture
> machines, I think this issue may be miscategorized.

> In my case, the flooding starts when I yank the VGA cable from the back of
> the machine. vgapci0 shares irq16 with ehci0 in every example of this issue
> i've been able to find.

same here.
I confirmed that the fooding starts when I remove the VGA cable from
the machine (onboard).

I've attached my full dmesg.

--
TANABE Ken-ichi / @nabeken
Mailto: nabeken@tknetworks.org / gmail.com / gentoo.gr.jp
Comment 11 Andreas Gustafsson 2013-09-18 08:33:22 UTC
FWIW, this problem also affects NetBSD, see http://gnats.netbsd.org/46596 .
-- 
Andreas Gustafsson, gson@gson.org
Comment 12 bt 2014-09-16 19:24:44 UTC
Confirmed on this motherboard: Intel BOXDH61BEB3

As a temporary workaround, I found removing ehci from the kernel works splendidly. Idle power consumption dropped from 41 watts to 28 watts.
Comment 13 proler 2014-10-17 10:41:17 UTC
Confirmed on Intel DH77KC, H77
FreeBSD 10.0-RELEASE

Problem starts when detaching-attaching hdmi cable
Comment 14 Tobias Berner 2015-01-28 18:29:12 UTC
Hi

I think I got bitten by the same bug -- monitor detached from dvi:

interrupt                          total       rate
irq16: ehci0                   465310134     142166
irq23: ehci1                        8786          2
cpu0:timer                       1485088        453
irq264: mps0                      387611        118
irq265: hdac0                          7          0
irq266: xhci0                      28208          8
irq268: hdac1                        105          0
irq269: igb0:que 0                  8763          2
irq270: igb0:que 1                128669         39
irq271: igb0:que 2               1012967        309
irq272: igb0:que 3                491974        150
irq273: igb0:link                      2          0
irq274: ahci0                          6          0
irq275: ahci1                     551817        168
cpu1:timer                       1482123        452
cpu7:timer                       1475647        450
cpu6:timer                       1477623        451
cpu2:timer                       1555822        475
cpu4:timer                       2821875        862
cpu3:timer                       1502125        458
cpu5:timer                       1426102        435
Total                          481155454     147007



FreeBSD odo.firefly 11.0-CURRENT FreeBSD 11.0-CURRENT #6 r277147: Tue Jan 13 22:15:03 CET 2015     Tobias@odo.firefly:/usr/obj/usr/src/sys/ODO  amd64

dev.ehci.0.%desc: Intel Lynx Point USB 2.0 controller USB-B
Comment 15 steven_nikkel 2015-02-03 23:50:07 UTC
Seeing the same thing on 10.1-RELEASE-p5 with a Gigabyte GA-C1037UN (Intel NM70 Chipset). Problem started after I unplugged a VGA monitor.
Comment 16 Hans Petter Selasky freebsd_committer freebsd_triage 2015-02-05 09:00:40 UTC
Created attachment 152572 [details]
UHCI patch

Does the attached patch make any difference?

--HPS
Comment 17 Tobias Berner 2015-02-09 19:27:22 UTC
I still have to test it on the system from above (should have time on the weekend) -- but the patch made no difference on my Lenovo x200:
interrupt                          total       rate
irq9: acpi0                          306          4
irq16: uhci3                    14875930     193324
irq20: hpet0 uhci0                 85546       1112
[...]
Comment 18 commit-hook freebsd_committer freebsd_triage 2015-02-17 07:53:29 UTC
A commit references this bug:

Author: hselasky
Date: Tue Feb 17 07:52:51 UTC 2015
New revision: 278883
URL: https://svnweb.freebsd.org/changeset/base/278883

Log:
  Try to resolve infinite interrupts by clearing an undocumented
  interrupt status bit. According to the UHCI controller specification
  the host controller halted interrupt is non-maskable.

  PR:		156596
  Tested by:	adrian @
  MFC after:	1 week

Changes:
  head/sys/dev/usb/controller/uhci.c
Comment 19 sos 2015-05-14 16:35:11 UTC
Hi All

Just stumbled over this one, and the quick solution is to just ignore the interrupt from the (emulated) VGA device, it has nothing todo with USB :)

The real problem is that our VGA driver has no idea how to handle the interrupts from the intel built in video HW, so when you yank the VGA cable the chip wants to signal that event so the system can DTRT(tm). In our case there is no ack on that interrupt => instant interrupt storm.

My hack just disables the VGA interrupt completely, that might be a bad idea if you have anything using it :) However it lets me use my servers with a KVM without problems.

Index: vga_pci.c
===================================================================
--- vga_pci.c
+++ vga_pci.c
@@ -125,6 +125,9 @@
 	if ((config & (PCIM_CMD_PORTEN | PCIM_CMD_MEMEN)) == 0)
 		return (0);
 
+	/* Disable interrupts */
+	pci_write_config(dev, PCIR_COMMAND, config | (1<<10), 2);
+
 	/* This video card is the boot display: record its unit number. */
 	vga_pci_default_unit = unit;
 	device_set_flags(dev, 1);

--
Søren Schmidt
sos@deepcore.dk / sos@freebsd.org
"So much code to hack, so little time"
Comment 20 commit-hook freebsd_committer freebsd_triage 2015-06-05 06:23:18 UTC
A commit references this bug:

Author: hselasky
Date: Fri Jun  5 06:23:04 UTC 2015
New revision: 284012
URL: https://svnweb.freebsd.org/changeset/base/284012

Log:
  Disable VGA PCI interrupts until a chipset driver is loaded for VGA
  PCI devices. Else unhandled display adapter interrupts might freeze
  the CPU or consume a lot of CPU.

  PR:		156596
  MFC after:	1 week

Changes:
  head/sys/dev/pci/vga_pci.c
Comment 21 commit-hook freebsd_committer freebsd_triage 2015-06-17 07:42:41 UTC
A commit references this bug:

Author: hselasky
Date: Wed Jun 17 07:41:54 UTC 2015
New revision: 284503
URL: https://svnweb.freebsd.org/changeset/base/284503

Log:
  MFC r284012:
  Disable VGA PCI interrupts until a chipset driver is loaded for VGA
  PCI devices. Else unhandled display adapter interrupts might freeze
  the CPU or consume a lot of CPU.

  PR:	156596

Changes:
_U  stable/10/
  stable/10/sys/dev/pci/vga_pci.c
Comment 22 Hans Petter Selasky freebsd_committer freebsd_triage 2015-06-17 07:55:09 UTC
If you want a fix in 8-stable, please provide an adapted patch for 8-stable.
Comment 23 commit-hook freebsd_committer freebsd_triage 2015-06-17 12:44:09 UTC
A commit references this bug:

Author: hselasky
Date: Wed Jun 17 07:43:21 UTC 2015
New revision: 284504
URL: https://svnweb.freebsd.org/changeset/base/284504

Log:
  MFC r284012:
  Disable VGA PCI interrupts until a chipset driver is loaded for VGA
  PCI devices. Else unhandled display adapter interrupts might freeze
  the CPU or consume a lot of CPU.

  PR:	156596

Changes:
_U  stable/9/sys/
_U  stable/9/sys/dev/
  stable/9/sys/dev/pci/vga_pci.c