Intel DH55C board Intel i5 650 single processor 4GB Ram 3ware RAID-1 Controller No mobo attached drives No USB attached devices Headless machine FreeBSD 8.2-Stable amd64 GENERIC, unmodified kernel build. Noticed load average of .80 .80 .80 even when all user daemons and internet services are shut down, and nothing else running but default basic things. A quick check of vmstat -i shows extremely high interrupt rate on irq16: server2# vmstat -i interrupt total rate irq1: atkbd0 680 0 irq16: ehci0 uhci4+ 25976006936 87649 <-----**** cpu0: timer 592700901 1999 irq256: em0 17240082 58 cpu1: timer 592700711 1999 cpu3: timer 592700711 1999 cpu2: timer 592700711 1999 Total 28364050732 95707 server2# Output from top -PS: last pid: 23203; load averages: 0.80, 0.80, 0.79 up 3+10:38:16 07:34:40 151 processes: 6 running, 127 sleeping, 18 waiting CPU 0: 0.0% user, 0.0% nice, 0.0% system, 80.5% interrupt, 19.5% idle CPU 1: 0.0% user, 0.0% nice, 0.0% system, 2.6% interrupt, 97.4% idle CPU 2: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle CPU 3: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle Mem: 283M Active, 2444M Inact, 651M Wired, 70M Cache, 398M Buf, 292M Free Swap: 4096M Total, 0K Used, 4096M Free CPU0 is always busy doing....nothing. Processor output from dmesg.boot: Timecounter "i8254" frequency 1193182 Hz quality 0 CPU: Intel(R) Core(TM) i5 CPU 650 @ 3.20GHz (3192.02-MHz K8-class CPU) Origin = "GenuineIntel" Id = 0x20655 Family = 6 Model = 25 Stepping = 5 Motherboard info: ACPI APIC Table: <INTEL DH55HC> A quick check thru dmesg.boot for irq16: server2# grep -i "irq 16" /var/run/dmesg.boot vgapci0: <VGA-compatible display> port 0xf220-0xf227 mem 0xfe000000-0xfe3fffff,0xd0000000-0xdfffffff irq 16 at device 2.0 on pci0 ehci0: <Intel PCH USB 2.0 controller USB-B> mem 0xfe626000-0xfe6263ff irq 16 at device 26.7 on pci0 atapci1: <SiI 3132 SATA300 controller> port 0xe000-0xe07f mem 0xfe584000-0xfe58407f,0xfe580000-0xfe583fff irq 16 at device 0.0 on pci1 uhci4: <UHCI (generic) USB controller> port 0xf040-0xf05f irq 16 at device 29.1 on pci0 vgapci0: <VGA-compatible display> port 0xf220-0xf227 mem 0xfe000000-0xfe3fffff,0xd0000000-0xdfffffff irq 16 at device 2.0 on pci0 ehci0: <Intel PCH USB 2.0 controller USB-B> mem 0xfe626000-0xfe6263ff irq 16 at device 26.7 on pci0 atapci1: <SiI 3132 SATA300 controller> port 0xe000-0xe07f mem 0xfe584000-0xfe58407f,0xfe580000-0xfe583fff irq 16 at device 0.0 on pci1 uhci4: <UHCI (generic) USB controller> port 0xf040-0xf05f irq 16 at device 29.1 on pci0 server2# Output of pciconf -l: server2# pciconf -l hostb0@pci0:0:0:0: class=0x060000 card=0x00378086 chip=0x00408086 rev=0x02 hdr=0x00 vgapci0@pci0:0:2:0: class=0x030000 card=0x00378086 chip=0x00428086 rev=0x02 hdr=0x00 none0@pci0:0:22:0: class=0x078000 card=0x00378086 chip=0x3b648086 rev=0x06 hdr=0x00 atapci0@pci0:0:22:2: class=0x010185 card=0x00378086 chip=0x3b668086 rev=0x06 hdr=0x00 none1@pci0:0:22:3: class=0x070002 card=0x00378086 chip=0x3b678086 rev=0x06 hdr=0x00 em0@pci0:0:25:0: class=0x020000 card=0x00378086 chip=0x10f08086 rev=0x06 hdr=0x00 uhci0@pci0:0:26:0: class=0x0c0300 card=0x00378086 chip=0x3b3b8086 rev=0x06 hdr=0x00 uhci1@pci0:0:26:1: class=0x0c0300 card=0x00378086 chip=0x3b3e8086 rev=0x06 hdr=0x00 uhci2@pci0:0:26:2: class=0x0c0300 card=0x00378086 chip=0x3b3f8086 rev=0x06 hdr=0x00 ehci0@pci0:0:26:7: class=0x0c0320 card=0x00378086 chip=0x3b3c8086 rev=0x06 hdr=0x00 none2@pci0:0:27:0: class=0x040300 card=0x00378086 chip=0x3b568086 rev=0x06 hdr=0x00 pcib1@pci0:0:28:0: class=0x060400 card=0x00378086 chip=0x3b428086 rev=0x06 hdr=0x01 pcib2@pci0:0:28:4: class=0x060400 card=0x00378086 chip=0x3b4a8086 rev=0x06 hdr=0x01 uhci3@pci0:0:29:0: class=0x0c0300 card=0x00378086 chip=0x3b368086 rev=0x06 hdr=0x00 uhci4@pci0:0:29:1: class=0x0c0300 card=0x00378086 chip=0x3b378086 rev=0x06 hdr=0x00 uhci5@pci0:0:29:2: class=0x0c0300 card=0x00378086 chip=0x3b388086 rev=0x06 hdr=0x00 ehci1@pci0:0:29:7: class=0x0c0320 card=0x00378086 chip=0x3b348086 rev=0x06 hdr=0x00 pcib3@pci0:0:30:0: class=0x060401 card=0x00378086 chip=0x244e8086 rev=0xa6 hdr=0x01 isab0@pci0:0:31:0: class=0x060100 card=0x00378086 chip=0x3b068086 rev=0x06 hdr=0x00 atapci2@pci0:0:31:2: class=0x01018f card=0x00378086 chip=0x3b208086 rev=0x06 hdr=0x00 none3@pci0:0:31:3: class=0x0c0500 card=0x00378086 chip=0x3b308086 rev=0x06 hdr=0x00 atapci3@pci0:0:31:5: class=0x010185 card=0x00378086 chip=0x3b268086 rev=0x06 hdr=0x00 atapci1@pci0:1:0:0: class=0x010401 card=0x02429005 chip=0x02421095 rev=0x01 hdr=0x00 em1@pci0:3:0:0: class=0x020000 card=0x13768086 chip=0x107c8086 rev=0x05 hdr=0x00 server2# I hope that I have provided enough initial information. Thank you in advance.
Responsible Changed From-To: freebsd-amd64->freebsd-usb reclassify.
Hi, Compile a kernel with: options USB_DEBUG Then enable after boot: sysctl hw.usb.ehci.debug=15 ; sleep 1; sysctl hw.usb.ehci.debug=0 sysctl hw.usb.uhci.debug=15 ; sleep 1; sysctl hw.usb.uhci.debug=0 The high interrupt rate might be caused by a non-cleared IRQ. Else it is an ACPI issue. --HPS
Hi, I'm seeing this ehci interrupt storm with 8.2-RELEASE on my Dell E5420 laptop; vmstat -i reports 187300 interrupts per second from ehci. I've attached my /var/log/messages showing the boot messages and 1 seconds worth of hw.usb.ehci.debug=15 output as suggested. Hopefully it means more to you than it does to me... -- Colin Percival Security Officer, FreeBSD | freebsd.org | The power to serve Founder / author, Tarsnap | tarsnap.com | Online backups for the truly paranoid
Hi again, The attached patch seems to fix the problem while not breaking anything on my laptop. It makes handles the case of EHCI_REMOVE_QH(sqh, last) with sqh == last and sqh->prev == NULL by setting last = NULL -- the idea being that if sqh is the only entry in the queue, we still ought to delete it even though it doesn't have a predecessor. I have no idea if this is correct, since I don't know the rest of the code in this file. Maybe we should also delete the first entry in a queue even when it isn't also the last entry; maybe the problem lies somewhere else entirely. I'm hoping that someone who knows this code will be able to figure out the right fix easily now that I have an apparently-working hack. -- Colin Percival Security Officer, FreeBSD | freebsd.org | The power to serve Founder / author, Tarsnap | tarsnap.com | Online backups for the truly paranoid
On Friday 08 July 2011 22:37:45 Colin Percival wrote: > Hi, > > I'm seeing this ehci interrupt storm with 8.2-RELEASE on my Dell E5420 > laptop; vmstat -i reports 187300 interrupts per second from ehci. > > I've attached my /var/log/messages showing the boot messages and 1 seconds > worth of hw.usb.ehci.debug=15 output as suggested. Hopefully it means > more to you than it does to me... Have you tried to set any of these quirks: hw.usb.ehci.lostintrbug: 0 hw.usb.ehci.iaadbug: 0 --HPS
On Saturday 09 July 2011 02:26:37 Colin Percival wrote: > Hi again, > > The attached patch seems to fix the problem while not breaking anything on > my laptop. It makes handles the case of EHCI_REMOVE_QH(sqh, last) with > sqh == last and sqh->prev == NULL by setting last = NULL -- the idea being > that if sqh is the only entry in the queue, we still ought to delete it > even though it doesn't have a predecessor. > > I have no idea if this is correct, since I don't know the rest of the code > in this file. Maybe we should also delete the first entry in a queue even > when it isn't also the last entry; maybe the problem lies somewhere else > entirely. I'm hoping that someone who knows this code will be able to > figure out the right fix easily now that I have an apparently-working > hack. Hi, The last element should always be there. If it is NULL I would like to see a backtrace from that. KASSERT(last != NULL) in both add and remove. If you clear that the USB queue will stop working I believe. Anyway, I think your clue might give some hints what is wrong. It appears that the EHCI is spinning on the IDLE queue somehow producing a lot of IRQ's. --HPS
On Friday 08 July 2011 22:37:45 Colin Percival wrote: > Hi, > > I'm seeing this ehci interrupt storm with 8.2-RELEASE on my Dell E5420 > laptop; vmstat -i reports 187300 interrupts per second from ehci. > > I've attached my /var/log/messages showing the boot messages and 1 seconds > worth of hw.usb.ehci.debug=15 output as suggested. Hopefully it means > more to you than it does to me... Try the attached patch and report back. The EHCI should never generate more than 8000 IRQ/s second, so this clearly indicates a Hardware Problem. --HPS
On 07/09/11 02:07, Hans Petter Selasky wrote: > Have you tried to set any of these quirks: > > hw.usb.ehci.lostintrbug: 0 > hw.usb.ehci.iaadbug: 0 Yes. Neither has any effect. (At least, not when I set them via sysctl; I could try setting them as loader tunables if you think that would make a difference.) I wrote: > The attached patch seems to fix the problem while not breaking anything on > my laptop. Key words, "seems to". On further use I've found that my change didn't make any difference; it was just a coincidence that the problem temporarily went away at that time. However, I've managed (I think!) to figure out what's triggering this: The IRQ flood starts when the laptop battery is recharging. Letting the battery run down for a couple hours and then plugging in AC power has 100% consistently triggered this; but if the battery is already fully charged when FreeBSD boots I don't get the IRQ flood. Hans Petter Selasky wrote: > Try the attached patch and report back. The EHCI should never generate more > than 8000 IRQ/s second, so this clearly indicates a Hardware Problem. No change, I'm afraid. -- Colin Percival Security Officer, FreeBSD | freebsd.org | The power to serve Founder / author, Tarsnap | tarsnap.com | Online backups for the truly paranoid
Based on the fact this only seems to happen on Intel Core architecture machines, I think this issue may be miscategorized. In my case, the flooding starts when I yank the VGA cable from the back of the machine. vgapci0 shares irq16 with ehci0 in every example of this issue i've been able to find. The forum thread on this issue may make for relevant reading: http://forums.freebsd.org/showthread.php?t=24952
Hi, On Wed, 24 Apr 2013 03:33:02 -0700, Andrew Fremantle <andrew@skyhawk.ca> wrote: > Based on the fact this only seems to happen on Intel Core architecture > machines, I think this issue may be miscategorized. > In my case, the flooding starts when I yank the VGA cable from the back of > the machine. vgapci0 shares irq16 with ehci0 in every example of this issue > i've been able to find. same here. I confirmed that the fooding starts when I remove the VGA cable from the machine (onboard). I've attached my full dmesg. -- TANABE Ken-ichi / @nabeken Mailto: nabeken@tknetworks.org / gmail.com / gentoo.gr.jp
FWIW, this problem also affects NetBSD, see http://gnats.netbsd.org/46596 . -- Andreas Gustafsson, gson@gson.org
Confirmed on this motherboard: Intel BOXDH61BEB3 As a temporary workaround, I found removing ehci from the kernel works splendidly. Idle power consumption dropped from 41 watts to 28 watts.
Confirmed on Intel DH77KC, H77 FreeBSD 10.0-RELEASE Problem starts when detaching-attaching hdmi cable
Hi I think I got bitten by the same bug -- monitor detached from dvi: interrupt total rate irq16: ehci0 465310134 142166 irq23: ehci1 8786 2 cpu0:timer 1485088 453 irq264: mps0 387611 118 irq265: hdac0 7 0 irq266: xhci0 28208 8 irq268: hdac1 105 0 irq269: igb0:que 0 8763 2 irq270: igb0:que 1 128669 39 irq271: igb0:que 2 1012967 309 irq272: igb0:que 3 491974 150 irq273: igb0:link 2 0 irq274: ahci0 6 0 irq275: ahci1 551817 168 cpu1:timer 1482123 452 cpu7:timer 1475647 450 cpu6:timer 1477623 451 cpu2:timer 1555822 475 cpu4:timer 2821875 862 cpu3:timer 1502125 458 cpu5:timer 1426102 435 Total 481155454 147007 FreeBSD odo.firefly 11.0-CURRENT FreeBSD 11.0-CURRENT #6 r277147: Tue Jan 13 22:15:03 CET 2015 Tobias@odo.firefly:/usr/obj/usr/src/sys/ODO amd64 dev.ehci.0.%desc: Intel Lynx Point USB 2.0 controller USB-B
Seeing the same thing on 10.1-RELEASE-p5 with a Gigabyte GA-C1037UN (Intel NM70 Chipset). Problem started after I unplugged a VGA monitor.
Created attachment 152572 [details] UHCI patch Does the attached patch make any difference? --HPS
I still have to test it on the system from above (should have time on the weekend) -- but the patch made no difference on my Lenovo x200: interrupt total rate irq9: acpi0 306 4 irq16: uhci3 14875930 193324 irq20: hpet0 uhci0 85546 1112 [...]
A commit references this bug: Author: hselasky Date: Tue Feb 17 07:52:51 UTC 2015 New revision: 278883 URL: https://svnweb.freebsd.org/changeset/base/278883 Log: Try to resolve infinite interrupts by clearing an undocumented interrupt status bit. According to the UHCI controller specification the host controller halted interrupt is non-maskable. PR: 156596 Tested by: adrian @ MFC after: 1 week Changes: head/sys/dev/usb/controller/uhci.c
Hi All Just stumbled over this one, and the quick solution is to just ignore the interrupt from the (emulated) VGA device, it has nothing todo with USB :) The real problem is that our VGA driver has no idea how to handle the interrupts from the intel built in video HW, so when you yank the VGA cable the chip wants to signal that event so the system can DTRT(tm). In our case there is no ack on that interrupt => instant interrupt storm. My hack just disables the VGA interrupt completely, that might be a bad idea if you have anything using it :) However it lets me use my servers with a KVM without problems. Index: vga_pci.c =================================================================== --- vga_pci.c +++ vga_pci.c @@ -125,6 +125,9 @@ if ((config & (PCIM_CMD_PORTEN | PCIM_CMD_MEMEN)) == 0) return (0); + /* Disable interrupts */ + pci_write_config(dev, PCIR_COMMAND, config | (1<<10), 2); + /* This video card is the boot display: record its unit number. */ vga_pci_default_unit = unit; device_set_flags(dev, 1); -- Søren Schmidt sos@deepcore.dk / sos@freebsd.org "So much code to hack, so little time"
A commit references this bug: Author: hselasky Date: Fri Jun 5 06:23:04 UTC 2015 New revision: 284012 URL: https://svnweb.freebsd.org/changeset/base/284012 Log: Disable VGA PCI interrupts until a chipset driver is loaded for VGA PCI devices. Else unhandled display adapter interrupts might freeze the CPU or consume a lot of CPU. PR: 156596 MFC after: 1 week Changes: head/sys/dev/pci/vga_pci.c
A commit references this bug: Author: hselasky Date: Wed Jun 17 07:41:54 UTC 2015 New revision: 284503 URL: https://svnweb.freebsd.org/changeset/base/284503 Log: MFC r284012: Disable VGA PCI interrupts until a chipset driver is loaded for VGA PCI devices. Else unhandled display adapter interrupts might freeze the CPU or consume a lot of CPU. PR: 156596 Changes: _U stable/10/ stable/10/sys/dev/pci/vga_pci.c
If you want a fix in 8-stable, please provide an adapted patch for 8-stable.
A commit references this bug: Author: hselasky Date: Wed Jun 17 07:43:21 UTC 2015 New revision: 284504 URL: https://svnweb.freebsd.org/changeset/base/284504 Log: MFC r284012: Disable VGA PCI interrupts until a chipset driver is loaded for VGA PCI devices. Else unhandled display adapter interrupts might freeze the CPU or consume a lot of CPU. PR: 156596 Changes: _U stable/9/sys/ _U stable/9/sys/dev/ stable/9/sys/dev/pci/vga_pci.c