Bug 230590

Summary: [ehci] ehci_interrupt: unrecoverable error, controller halted
Product: Base System
Reporter: Samy Mahmoudi <samy.mahmoudi>
Component: usb
Assignee: freebsd-usb (Nobody) <usb>
Status: Closed FIXED    
Severity: Affects Only Me
CC: hselasky
Priority: ---    
Version: 11.2-RELEASE   
Hardware: amd64   
OS: Any   
Attachments:
Description Flags
dmesg output none

Description Samy Mahmoudi 2018-08-13 12:13:47 UTC
Created attachment 196155
dmesg output

Hello,

Upgrading from 11.1-RELEASE to 11.2-RELEASE broke two of my USB ports. dmesg gives me:

ehci0: <Intel Cougar Point USB 2.0 controller> mem 0xf252a000-0xf252a3ff at device 26.0 on pci0
usbus0: EHCI version 1.0
ehci_interrupt: unrecoverable error, controller halted
cmd=0x00010030
 EHCI_CMD_ITC_1
 EHCI_CMD_ASE
 EHCI_CMD_PSE
sts=0x0000d004
 EHCI_STS_ASS
 EHCI_STS_PSS
 EHCI_STS_HCH
 EHCI_STS_PCD
ien=0x00000037
frindex=0x00000000 ctrdsegm=0x00000000 periodic=0x03c2f000 async=0xd3427000
port 1 status=0x00001803
port 2 status=0x00001000
port 3 status=0x00001000
ehci_dump_isoc: isochronous dump from frame 0x000:
ITD(0xfffffe01139f5000) at 0x03c58000
 next=0x20a86004
 status[0]=0x00000000; <>
 status[1]=0x00000000; <>
 status[2]=0x00000000; <>
 status[3]=0x00000000; <>
 status[4]=0x00000000; <>
 status[5]=0x00000000; <>
 status[6]=0x00000000; <>
 status[7]=0x00000000; <>
 bp[0]=0x00000000
  addr=0x00; endpt=0x0
 bp[1]=0x00000000
 dir=out; mpl=0x00
 bp[2..6]=0x00000000,0x00000000,0x00000000,0x00000000,0x00000000
 bp_hi=0x00000000,0x00000000,0x00000000,0x00000000,
       0x00000000,0x00000000,0x00000000
SITD(0xfffffe00ef486000) at 0x20a86000
 next=0xd3458002
 portaddr=0x00000000 dir=out addr=0 endpt=0x0 port=0x0 huba=0x0
 mask=0x00000000
 status=0x00000000 <> len=0x0
 back=0x00000001, bp=0x00000000,0x00000000,0x00000000,0x00000000
ehci_interrupt: blocking interrupts 0x10
usbus0: run timeout
ehci0: USB init failed err=18
device_attach: ehci0 attach returned 6

I have attached the complete output of dmesg.
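
For readers following the register dump above, here is a minimal sketch that decodes the sts value into the flag names printed under it. The bit masks are an assumption based on the EHCI 1.0 USBSTS layout and should be checked against sys/dev/usb/controller/ehcireg.h; decode_sts is a hypothetical helper, not part of any FreeBSD tool.

```python
# Bit masks for the EHCI USBSTS register, highest bit first.
# Assumption: values follow the EHCI 1.0 register layout; verify them
# against sys/dev/usb/controller/ehcireg.h before relying on them.
EHCI_STS_BITS = [
    (0x00008000, "EHCI_STS_ASS"),     # asynchronous schedule status
    (0x00004000, "EHCI_STS_PSS"),     # periodic schedule status
    (0x00002000, "EHCI_STS_REC"),     # reclamation
    (0x00001000, "EHCI_STS_HCH"),     # host controller halted
    (0x00000020, "EHCI_STS_IAA"),     # interrupt on async advance
    (0x00000010, "EHCI_STS_HSE"),     # host system error
    (0x00000008, "EHCI_STS_FLR"),     # frame list rollover
    (0x00000004, "EHCI_STS_PCD"),     # port change detect
    (0x00000002, "EHCI_STS_ERRINT"),  # USB error interrupt
    (0x00000001, "EHCI_STS_INT"),     # USB interrupt
]


def decode_sts(value):
    """Return the names of all status bits set in value (hypothetical helper)."""
    return [name for mask, name in EHCI_STS_BITS if value & mask]


if __name__ == "__main__":
    # sts value from the dmesg output above
    print(decode_sts(0x0000D004))
    # mask from the "blocking interrupts 0x10" line
    print(decode_sts(0x10))
```

Under this layout, sts=0x0000d004 decodes to the four flags listed in the dump, and the 0x10 mask in "blocking interrupts 0x10" corresponds to the host system error bit.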
Comment 1 Hans Petter Selasky freebsd_committer freebsd_triage 2018-08-13 12:30:41 UTC
Hi,

Did you try setting any of the EHCI quirks in the loader?

hw.usb.ehci.lostintrbug: 0
hw.usb.ehci.iaadbug: 0

What does "pciconf -lv" say about your device?

--HPS
Comment 2 Samy Mahmoudi 2018-08-13 13:32:06 UTC
Hi,

Thank you for your prompt reply.

No, I did not try to set any of these. dmesg showed the ehci controller is identified as "Intel Cougar Point USB 2.0 controller" which is the result of this revision: https://svnweb.freebsd.org/base?view=revision&revision=316412.

Output of "pciconf -lv":

hostb0@pci0:0:0:0:	class=0x060000 card=0x21cf17aa chip=0x01048086 rev=0x09 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '2nd Generation Core Processor Family DRAM Controller'
    class      = bridge
    subclass   = HOST-PCI
vgapci0@pci0:0:2:0:	class=0x030000 card=0x21cf17aa chip=0x01268086 rev=0x09 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '2nd Generation Core Processor Family Integrated Graphics Controller'
    class      = display
    subclass   = VGA
none0@pci0:0:22:0:	class=0x078000 card=0x21cf17aa chip=0x1c3a8086 rev=0x04 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '6 Series/C200 Series Chipset Family MEI Controller'
    class      = simple comms
em0@pci0:0:25:0:	class=0x020000 card=0x21ce17aa chip=0x15028086 rev=0x04 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82579LM Gigabit Network Connection (Lewisville)'
    class      = network
    subclass   = ethernet
none1@pci0:0:26:0:	class=0x0c0320 card=0x21cf17aa chip=0x1c2d8086 rev=0x04 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '6 Series/C200 Series Chipset Family USB Enhanced Host Controller'
    class      = serial bus
    subclass   = USB
hdac0@pci0:0:27:0:	class=0x040300 card=0x21cf17aa chip=0x1c208086 rev=0x04 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '6 Series/C200 Series Chipset Family High Definition Audio Controller'
    class      = multimedia
    subclass   = HDA
pcib1@pci0:0:28:0:	class=0x060400 card=0x21cf17aa chip=0x1c108086 rev=0xb4 hdr=0x01
    vendor     = 'Intel Corporation'
    device     = '6 Series/C200 Series Chipset Family PCI Express Root Port 1'
    class      = bridge
    subclass   = PCI-PCI
pcib2@pci0:0:28:1:	class=0x060400 card=0x21cf17aa chip=0x1c128086 rev=0xb4 hdr=0x01
    vendor     = 'Intel Corporation'
    device     = '6 Series/C200 Series Chipset Family PCI Express Root Port 2'
    class      = bridge
    subclass   = PCI-PCI
pcib3@pci0:0:28:3:	class=0x060400 card=0x21cf17aa chip=0x1c168086 rev=0xb4 hdr=0x01
    vendor     = 'Intel Corporation'
    device     = '6 Series/C200 Series Chipset Family PCI Express Root Port 4'
    class      = bridge
    subclass   = PCI-PCI
pcib4@pci0:0:28:4:	class=0x060400 card=0x21cf17aa chip=0x1c188086 rev=0xb4 hdr=0x01
    vendor     = 'Intel Corporation'
    device     = '6 Series/C200 Series Chipset Family PCI Express Root Port 5'
    class      = bridge
    subclass   = PCI-PCI
ehci0@pci0:0:29:0:	class=0x0c0320 card=0x21cf17aa chip=0x1c268086 rev=0x04 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '6 Series/C200 Series Chipset Family USB Enhanced Host Controller'
    class      = serial bus
    subclass   = USB
isab0@pci0:0:31:0:	class=0x060100 card=0x21cf17aa chip=0x1c4f8086 rev=0x04 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'QM67 Express Chipset Family LPC Controller'
    class      = bridge
    subclass   = PCI-ISA
ahci0@pci0:0:31:2:	class=0x010601 card=0x21cf17aa chip=0x1c038086 rev=0x04 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '6 Series/C200 Series Chipset Family 6 port Mobile SATA AHCI Controller'
    class      = mass storage
    subclass   = SATA
none2@pci0:0:31:3:	class=0x0c0500 card=0x21cf17aa chip=0x1c228086 rev=0x04 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '6 Series/C200 Series Chipset Family SMBus Controller'
    class      = serial bus
    subclass   = SMBus
iwn0@pci0:3:0:0:	class=0x028000 card=0x13118086 chip=0x00858086 rev=0x34 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Centrino Advanced-N 6205 [Taylor Peak]'
    class      = network
sdhci_pci0@pci0:13:0:0:	class=0x088001 card=0x21cf17aa chip=0xe8221180 rev=0x08 hdr=0x00
    vendor     = 'Ricoh Co Ltd'
    device     = 'MMC/SD Host Controller'
    class      = base peripheral
none3@pci0:13:0:3:	class=0x0c0010 card=0x21cf17aa chip=0xe8321180 rev=0x04 hdr=0x00
    vendor     = 'Ricoh Co Ltd'
    device     = 'R5C832 PCIe IEEE 1394 Controller'
    class      = serial bus
    subclass   = FireWire
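
To pick the EHCI controllers out of a listing like the one above, the following sketch scans the header lines for class=0x0c0320 (serial bus, USB, EHCI programming interface) and reports whether a driver is attached. It assumes the line format shown in this output; ehci_controllers and SAMPLE are hypothetical names used only for illustration.

```python
import re

# Matches a pciconf header line such as
# "ehci0@pci0:0:29:0:<TAB>class=0x0c0320 card=... chip=0x1c268086 ..."
HEADER = re.compile(
    r"^(\S+)@(pci\S+?):\s+class=(0x[0-9a-f]+)\s.*?chip=(0x[0-9a-f]+)"
)

EHCI_CLASS = "0x0c0320"  # class 0x0c, subclass 0x03, prog-if 0x20

# Two header lines copied from the output above.
SAMPLE = (
    "none1@pci0:0:26:0:\tclass=0x0c0320 card=0x21cf17aa "
    "chip=0x1c2d8086 rev=0x04 hdr=0x00\n"
    "ehci0@pci0:0:29:0:\tclass=0x0c0320 card=0x21cf17aa "
    "chip=0x1c268086 rev=0x04 hdr=0x00\n"
)


def ehci_controllers(pciconf_output):
    """Return (name, selector, chip, attached) for each EHCI-class device."""
    found = []
    for line in pciconf_output.splitlines():
        m = HEADER.match(line)
        if m and m.group(3) == EHCI_CLASS:
            name, selector, _, chip = m.groups()
            # Devices without an attached driver are listed as "noneN".
            found.append((name, selector, chip, not name.startswith("none")))
    return found


if __name__ == "__main__":
    for name, selector, chip, attached in ehci_controllers(SAMPLE):
        state = "attached" if attached else "no driver attached"
        print(f"{selector} chip={chip} {name}: {state}")
```

On the sample lines, the controller at pci0:0:26:0 shows up with no driver attached, which matches the failed attach at device 26.0 in the dmesg output.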


N.B. I just built the generic kernel from the latest source tree (releng/11.2) and the problem disappeared. Not satisfied with that result, I investigated further: I ran "freebsd-update fetch" and "freebsd-update install", which (unexpectedly) reinstalled the kernel. After rebooting, the problem came back, so I can confirm that the problem is present with the distributed generic kernel and absent with a locally built kernel.

I doubt my /etc/make.conf has anything to do with this:
CPUTYPE?=sandybridge
MAKE_JOBS_NUMBER=4

OPTIONS_UNSET+=DOCS EXAMPLES IPV6 LPR

OPTIONS_SET+=CUPS
CUPS_OVERWRITE_BASE=YES

DEVELOPER=YES
Comment 3 Samy Mahmoudi 2018-08-13 14:24:27 UTC
I have set hw.usb.ehci.lostintrbug and hw.usb.ehci.iaadbug to 0 and it seems to solve the problem. I will now try to isolate which one is relevant to the issue, if not both.
Comment 4 Hans Petter Selasky freebsd_committer freebsd_triage 2018-08-13 14:32:42 UTC
It might be that a quirk has already been added for your device in 11-stable, or that the issue was found and fixed. Is it a problem to run an 11-stable kernel?

--HPS
Comment 5 Samy Mahmoudi 2018-08-13 16:15:22 UTC
It is absolutely not a problem to run an 11-STABLE kernel (especially because I use ZFS with a beadm-compatible layout), nor is it a problem to build the generic kernel myself. The problem is this kernel trap occurring after an upgrade to 11.2-RELEASE.

Unfortunately, I cannot reproduce the problem right now because I did not make a backup of the distributed generic kernel. Moreover, I cannot confirm what I wrote in comment 3. I will try to reproduce the upgrade with a rollback as soon as possible.

Could you please elaborate on your hypothesis?
Comment 6 Samy Mahmoudi 2018-08-14 13:05:05 UTC
I have been able to reproduce the issue, even with my home-built generic kernel and/or the tunables hw.usb.ehci.(lostintrbug|iaadbug) set to 0. At least, it now makes more sense.

I even encountered crashes. I will try to obtain a crash dump as soon as possible.
Comment 7 Samy Mahmoudi 2019-01-24 13:13:52 UTC
I did not obtain a crash dump because the relevant machine does not have a regular swap partition on its drive. When I partitioned this drive, I did not expect to get involved in any form of kernel debugging...

Using a zvol as a swap device would have been useless for debugging, so I wrote something like 'dumpdev="/dev/gpt/usbswap"' to /etc/rc.conf. Then I realized it would be too hazardous to dump the crash to a USB swap device, as the crash was precisely related to USB, so I dropped that idea.

Anyway, thank you for your help Hans Petter. As you said, this bug has probably been fixed since then.