Bug 222624 - [usb] [panic] thread's sleep queue 0 is not empty at usb_process.c:193
Summary: [usb] [panic] thread's sleep queue 0 is not empty at usb_process.c:193
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 11.1-RELEASE
Hardware: amd64 Any
: --- Affects Some People
Assignee: freebsd-usb mailing list
URL:
Keywords: crash
Depends on:
Blocks:
 
Reported: 2017-09-26 14:17 UTC by Bernhard Froehlich
Modified: 2017-11-14 10:44 UTC (History)
7 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Bernhard Froehlich freebsd_committer 2017-09-26 14:17:31 UTC
panic: thread's sleep queue 0 is not empty

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
panic: thread's sleep queue 0 is not empty
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe04607079d0
vpanic() at vpanic+0x186/frame 0xfffffe0460707a50
kassert_panic() at kassert_panic+0x126/frame 0xfffffe0460707ac0
sleepq_add() at sleepq_add+0x382/frame 0xfffffe0460707b10
_cv_wait() at _cv_wait+0x1a2/frame 0xfffffe0460707b70
usb_process() at usb_process+0x17b/frame 0xfffffe0460707bb0
fork_exit() at fork_exit+0x84/frame 0xfffffe0460707bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0460707bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
Uptime: 4d20h19m6s
Dumping 1697 out of 16239 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%



Loaded symbols for /boot/kernel/cpuctl.ko
#0  doadump (textdump=1) at pcpu.h:222
222     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) #0  doadump (textdump=1) at pcpu.h:222
#1  0xffffffff80a4f205 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:366
#2  0xffffffff80a4f7e0 in vpanic (fmt=<value optimized out>,
    ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:759
#3  0xffffffff80a4f616 in kassert_panic (fmt=<value optimized out>)
    at /usr/src/sys/kern/kern_shutdown.c:649
#4  0xffffffff80a9dbd2 in sleepq_add (wchan=0xfffffe0000fcee70,
    lock=0xfffffe0000fcf140, wmesg=0xffffffff81411eaf "-",
    flags=<value optimized out>, queue=0)
    at /usr/src/sys/kern/subr_sleepqueue.c:332
#5  0xffffffff809edce2 in _cv_wait (cvp=0xfffffe0000fcee70,
    lock=0xfffffe0000fcf140) at /usr/src/sys/kern/kern_condvar.c:135
#6  0xffffffff808b129b in usb_process (arg=0xfffffe0000fcee60)
    at /usr/src/sys/dev/usb/usb_process.c:193
#7  0xffffffff80a11614 in fork_exit (
    callout=0xffffffff808b1120 <usb_process>, arg=0xfffffe0000fcee60,
    frame=0xfffffe0460707c00) at /usr/src/sys/kern/kern_fork.c:1042
#8  0xffffffff80eaca2e in fork_trampoline ()
    at /usr/src/sys/amd64/amd64/exception.S:611
#9  0x0000000000000000 in ?? ()
Current language:  auto; currently minimal
(kgdb)


Kernel config:

include GENERIC
ident DEBUG

# https://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-options.html

# Debugging options
options		DDB
options		INVARIANTS
options		INVARIANT_SUPPORT
options		WITNESS
options		WITNESS_SKIPSPIN
Comment 1 Hans Petter Selasky freebsd_committer 2017-09-29 14:58:40 UTC
Did you do anything special to trigger this? Might be a generic kernel issue.

--HPS
Comment 2 Bernhard Froehlich freebsd_committer 2017-09-29 19:40:34 UTC
No it happens around every 1-5 days on two similar Supermicro boards.
Comment 3 Hans Petter Selasky freebsd_committer 2017-09-30 08:40:38 UTC
Hi,

I see you have kernel built with INVARIANTS. There is a chance that the asserts triggering this issue are false. I've added some more people to analyze this issue.

--HPS
Comment 4 Bernhard Froehlich freebsd_committer 2017-09-30 17:26:51 UTC
I also do have this panics with the release kernel but never got any useful info from it when it happened because it just stopped and did not even show a panic. What makes me confident that it's not a faulty hardware is that 10.3 was running very solid on it for around a year and that I have two of those machines with the exact same symptoms.
Comment 5 Hans Petter Selasky freebsd_committer 2017-10-06 14:20:55 UTC
Is it possible to do an attended KDB and get the output from:

show sleepqueue

--HPS
Comment 6 Hans Petter Selasky freebsd_committer 2017-10-06 14:26:01 UTC
Some closer inspections reveals the show sleepqueue command is broken ... Stay tuned for better debug ideas.
Comment 7 commit-hook freebsd_committer 2017-10-09 18:33:48 UTC
A commit references this bug:

Author: hselasky
Date: Mon Oct  9 18:33:29 UTC 2017
New revision: 324445
URL: https://svnweb.freebsd.org/changeset/base/324445

Log:
  When showing the sleepqueues from the in-kernel debugger,
  properly dump all the sendqueues and not just the first one

  History:
  It appears that in the commit which introduced the code,
  r165272, the array indexes of "sq_blocked[0]" and "td_name[i]"
  were interchanged. In r180927 "td_name[i]" was corrected to
  "td_name[0]", but "sq_blocked[0]" was left unchanged.

  PR:		222624
  Discussed with:	kmacy @
  MFC after:	1 week
  Sponsored by:	Mellanox Technologies

Changes:
  head/sys/kern/subr_sleepqueue.c
Comment 8 commit-hook freebsd_committer 2017-10-20 10:05:28 UTC
A commit references this bug:

Author: hselasky
Date: Fri Oct 20 10:04:43 UTC 2017
New revision: 324799
URL: https://svnweb.freebsd.org/changeset/base/324799

Log:
  MFC r324445:
  When showing the sleepqueues from the in-kernel debugger,
  properly dump all the sleepqueues and not just the first one

  History:
  It appears that in the commit which introduced the code,
  r165272, the array indexes of "sq_blocked[0]" and "td_name[i]"
  were interchanged. In r180927 "td_name[i]" was corrected to
  "td_name[0]", but "sq_blocked[0]" was left unchanged.

  PR:		222624
  Discussed with:	kmacy @
  Sponsored by:	Mellanox Technologies

Changes:
_U  stable/11/
  stable/11/sys/kern/subr_sleepqueue.c
Comment 9 commit-hook freebsd_committer 2017-10-20 10:06:31 UTC
A commit references this bug:

Author: hselasky
Date: Fri Oct 20 10:06:03 UTC 2017
New revision: 324800
URL: https://svnweb.freebsd.org/changeset/base/324800

Log:
  MFC r324445:
  When showing the sleepqueues from the in-kernel debugger,
  properly dump all the sleepqueues and not just the first one

  History:
  It appears that in the commit which introduced the code,
  r165272, the array indexes of "sq_blocked[0]" and "td_name[i]"
  were interchanged. In r180927 "td_name[i]" was corrected to
  "td_name[0]", but "sq_blocked[0]" was left unchanged.

  PR:		222624
  Discussed with:	kmacy @
  Sponsored by:	Mellanox Technologies

Changes:
_U  stable/10/
  stable/10/sys/kern/subr_sleepqueue.c
Comment 10 commit-hook freebsd_committer 2017-10-20 10:07:34 UTC
A commit references this bug:

Author: hselasky
Date: Fri Oct 20 10:07:17 UTC 2017
New revision: 324801
URL: https://svnweb.freebsd.org/changeset/base/324801

Log:
  MFC r324445:
  When showing the sleepqueues from the in-kernel debugger,
  properly dump all the sleepqueues and not just the first one

  History:
  It appears that in the commit which introduced the code,
  r165272, the array indexes of "sq_blocked[0]" and "td_name[i]"
  were interchanged. In r180927 "td_name[i]" was corrected to
  "td_name[0]", but "sq_blocked[0]" was left unchanged.

  PR:		222624
  Discussed with:	kmacy @
  Sponsored by:	Mellanox Technologies

Changes:
_U  stable/9/sys/
  stable/9/sys/kern/subr_sleepqueue.c
Comment 11 Bernhard Froehlich freebsd_committer 2017-11-06 09:02:47 UTC
I have updated the kernel to a very recent 11-STABLE.


FreeBSD 11.1-STABLE #1 r325275: Wed Nov  1 13:08:01 CET 2017

panic: thread's sleep queue 0 is not empty

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
panic: thread's sleep queue 0 is not empty
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0460ccb9d0
vpanic() at vpanic+0x186/frame 0xfffffe0460ccba50
kassert_panic() at kassert_panic+0x126/frame 0xfffffe0460ccbac0
sleepq_add() at sleepq_add+0x382/frame 0xfffffe0460ccbb10
_cv_wait() at _cv_wait+0x1a2/frame 0xfffffe0460ccbb70
usb_process() at usb_process+0x17b/frame 0xfffffe0460ccbbb0
fork_exit() at fork_exit+0x84/frame 0xfffffe0460ccbbf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0460ccbbf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
Uptime: 1d18h0m42s
Dumping 953 out of 16239 MB:..2%..11%..21%..31%..41%..51%..61%..71%..81%..91%


#0  doadump (textdump=1) at pcpu.h:222
222     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) #0  doadump (textdump=1) at pcpu.h:222
#1  0xffffffff80a5d5d5 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:366
#2  0xffffffff80a5dbb0 in vpanic (fmt=<value optimized out>,
    ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:759
#3  0xffffffff80a5d9e6 in kassert_panic (fmt=<value optimized out>)
    at /usr/src/sys/kern/kern_shutdown.c:649
#4  0xffffffff80aaca32 in sleepq_add (wchan=0xfffffe0000fc2e70,
    lock=0xfffffe0000fc3140, wmesg=0xffffffff8142963f "-",
    flags=<value optimized out>, queue=0)
    at /usr/src/sys/kern/subr_sleepqueue.c:332
#5  0xffffffff809fbb42 in _cv_wait (cvp=0xfffffe0000fc2e70,
    lock=0xfffffe0000fc3140) at /usr/src/sys/kern/kern_condvar.c:135
#6  0xffffffff808bc61b in usb_process (arg=0xfffffe0000fc2e60)
    at /usr/src/sys/dev/usb/usb_process.c:193
#7  0xffffffff80a1f944 in fork_exit (
    callout=0xffffffff808bc4a0 <usb_process>, arg=0xfffffe0000fc2e60,
    frame=0xfffffe0460ccbc00) at /usr/src/sys/kern/kern_fork.c:1044
#8  0xffffffff80ebcc5e in fork_trampoline ()
    at /usr/src/sys/amd64/amd64/exception.S:651
#9  0x0000000000000000 in ?? ()
Current language:  auto; currently minimal
Comment 12 Bernhard Froehlich freebsd_committer 2017-11-06 09:06:19 UTC
Another panic but this time a different one still USB related as it seems.

FreeBSD 11.1-STABLE #1 r325275: Wed Nov  1 13:08:01 CET 2017

Unread portion of the kernel message buffer:
Kernel page fault with the following non-sleepable locks held:
exclusive sleep mutex ehci0 (usb_def_mtx) r = 0 (0xfffffe0000fc3140) locked @ /usr/src/sys/dev/usb/controller/ehci.c:1465
stack backtrace:
#0 0xffffffff80abe640 at witness_debugger+0x70
#1 0xffffffff80abfa6e at witness_warn+0x45e
#2 0xffffffff80edb013 at trap_pfault+0x53
#3 0xffffffff80eda810 at trap+0x2b0
#4 0xffffffff80ebc671 at calltrap+0x8
#5 0xffffffff80886dd1 at ehci_interrupt+0x221
#6 0xffffffff80a22259 at intr_event_execute_handlers+0x99
#7 0xffffffff80a228f6 at ithread_loop+0xb6
#8 0xffffffff80a1f944 at fork_exit+0x84
#9 0xffffffff80ebcc5e at fork_trampoline+0xe


Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 06
fault virtual address   = 0x48
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80887814
stack pointer           = 0x28:0xfffffe0460cb2ab0
frame pointer           = 0x28:0xfffffe0460cb2ad0
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (irq23: ehci0)
trap number             = 12
panic: page fault
cpuid = 3
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0460cb2690
vpanic() at vpanic+0x186/frame 0xfffffe0460cb2710
panic() at panic+0x43/frame 0xfffffe0460cb2770
trap_fatal() at trap_fatal+0x322/frame 0xfffffe0460cb27c0
trap_pfault() at trap_pfault+0x62/frame 0xfffffe0460cb2820
trap() at trap+0x2b0/frame 0xfffffe0460cb29e0
calltrap() at calltrap+0x8/frame 0xfffffe0460cb29e0
--- trap 0xc, rip = 0xffffffff80887814, rsp = 0xfffffe0460cb2ab0, rbp = 0xfffffe0460cb2ad0 ---
ehci_check_transfer() at ehci_check_transfer+0x94/frame 0xfffffe0460cb2ad0
ehci_interrupt() at ehci_interrupt+0x221/frame 0xfffffe0460cb2b20
intr_event_execute_handlers() at intr_event_execute_handlers+0x99/frame 0xfffffe0460cb2b60
ithread_loop() at ithread_loop+0xb6/frame 0xfffffe0460cb2bb0
fork_exit() at fork_exit+0x84/frame 0xfffffe0460cb2bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0460cb2bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
Uptime: 11h1m19s
Comment 13 Hans Petter Selasky freebsd_committer 2017-11-06 09:26:37 UTC
Do you have a core dump of the last crash?

Can you run gdb:

gdb
file /boot/kernel/ehci.ko
info line *(void *)(ehci_check_transfer+0x94)

In my system this leads to the xfer->td_transfer_cache is NULL.
This variable should not be NULL during regular usage. This might be an indication of another memory problem or maybe a mutex issue.

It would be interesting to dump some structures to see if their contents is valid.

--HPS
Comment 14 Bernhard Froehlich freebsd_committer 2017-11-13 10:23:10 UTC
I'm not sure if that is what you expected but anyway here it is.

root@r5:/var/crash # ll /boot/kernel/kernel 
-r-xr-xr-x  1 root  wheel  27242832 Nov  1 13:08 /boot/kernel/kernel*
root@r5:/var/crash # ll /boot/kernel/ehci.ko 
-r-xr-xr-x  1 root  wheel  90272 Nov  1 14:04 /boot/kernel/ehci.ko*
root@r5:/var/crash # gdb
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd".
(gdb) file /boot/kernel/ehci.ko
Reading symbols from /boot/kernel/ehci.ko...Reading symbols from /usr/lib/debug//boot/kernel/ehci.ko.debug...done.
done.
(gdb) info line *(void *)(ehci_check_transfer+0x94)
Line 218 of "cpufunc.h" starts at address 0x2f6c <ehci_get_port_speed_hostc+44>
   and ends at 0x3fae <ehci_device_isoc_fs_enter+110>.


In the meantime I have 3 more panics from the original kind "panic: thread's sleep queue 0 is not empty".
Comment 15 Bernhard Froehlich freebsd_committer 2017-11-13 10:29:28 UTC
I'm not sure if it's related but I want to mention that bhyve is used on that box.
Comment 16 Hans Petter Selasky freebsd_committer 2017-11-13 11:13:33 UTC
Hi,

ehci_get_port_speed_portsc() should not be in use on amd64. Either the stack is corrupt or this indicates a problem with the CPU or some registers have been clobbered.

--HPS

sys/dev/usb/controller/ehci_ixp4xx.c:	sc->sc_vendor_get_port_speed = ehci_get_port_speed_portsc;
sys/dev/usb/controller/ehci_imx.c:	esc->sc_vendor_get_port_speed = ehci_get_port_speed_portsc;
sys/dev/usb/controller/ehci.c:ehci_get_port_speed_portsc(struct ehci_softc *sc, uint16_t index)
sys/dev/usb/controller/ehci_mv.c:	sc->sc_vendor_get_port_speed = ehci_get_port_speed_portsc;
sys/dev/usb/controller/ehci.h:uint16_t ehci_get_port_speed_portsc(struct ehci_softc *sc, uint16_t index);

cpufunc.h (amd64):218

static __inline u_int
inl(u_int port)
{
        u_int   data;
 
        __asm __volatile("inl %w1, %0" : "=a" (data) : "Nd" (port));
        return (data);
}
Comment 17 Bernhard Froehlich freebsd_committer 2017-11-14 10:44:54 UTC
I have copied the kernel and crashdumps to freefall:/home/decke/222624-thread-sleep-queue if someone wants to have a look.