Bug 171814 - [panic] bioq_init or bioq_remove (unsure which)
Summary: [panic] bioq_init or bioq_remove (unsure which)
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: amd64
Version: 9.0-RELEASE
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: freebsd-amd64 (Nobody)
URL:
Keywords: crash
Depends on:
Blocks:
 
Reported: 2012-09-20 18:40 UTC by pprocacci
Modified: 2022-10-17 12:17 UTC (History)
0 users

See Also:


Description pprocacci 2012-09-20 18:40:03 UTC
cpuid = 5; apic id = 13
fault virtual address      = 0x20
fault code                 = supervisor read data, page not present
instruction pointer        = 0x20 :0xffffffff80865023
stack pointer              = 0x28 :0xffffff80002b3b30
frame pointer              = 0x28 :0xffffff80002b3b50
code segment               = base 0x0, limit 0xfffff, type 0x1b
                           = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags           = interrupt enabled, resume, IOPL = 0
current process            = 13 (g_event)
trap number                = 12
panic: page fault
cpuid = 5
KDB: stack backtrace:
#0 0xffffffff808680fe at kdb_backtrace+0x5e
#1 0xffffffff80832cb7 at panic+0x187
#2 0xffffffff80b185a0 at trap_fatal+0x290
#3 0xffffffff80b188e9 at trap_pfault+0x1f9
#4 0xffffffff80b18daf at trap+0x3df
#5 0xffffffff80b0324f at calltrap+0x8
#6 0xffffffff807d165c at g_destroy_consumer+0x4c
#7 0xffffffff807ce6cc at g_run_events+0x1ec
#8 0xffffffff8080682f at fork_exit+0x11f
#9 0xffffffff80b0377e at fork_trampoline+0xe

#############################################################

- I'm using a GENERIC kernel.
- Following the instructions here: http://www.freebsd.org/doc/faq/advanced.html
  I'm able to ascertain that the problem exists in one of the following two functions:

db1# nm -n /boot/kernel/kernel | fgrep ffffffff808650
ffffffff80865080 T bioq_init
ffffffff808650b0 T bioq_remove

#############################################################

I'm using zfs over gmultipath over an isp device.

Here are the last messages from /var/log/messages leading up to the panic:
#############################################################
Sep 18 22:48:57 db1 kernel: (da3:isp1:0:0:1): lost device - 4 outstanding
Sep 18 22:48:57 db1 kernel: (ses2:isp1:0:0:254): lost device
Sep 18 22:48:57 db1 kernel: (ses2:isp1:0:0:254): removing device entry
Sep 18 22:48:57 db1 kernel: (da3:isp1:0:0:1): oustanding 3
Sep 18 22:48:57 db1 kernel: GEOM_MULTIPATH: da3 failed in PG
Sep 18 22:48:57 db1 kernel: (da3:GEOM_MULTIPATH: da1 now active path in PG
Sep 18 22:48:57 db1 kernel: isp1:0:0:1): oustanding 2
Sep 18 22:48:57 db1 kernel: (da3:isp1:0:0:1): oustanding 1
Sep 18 22:48:57 db1 kernel: (da3:isp1:0:0:1): oustanding 0
Sep 18 22:48:57 db1 kernel: (da3:isp1:0:0:1): removing device entry
Sep 18 22:48:57 db1 kernel: GEOM_MULTIPATH: da3 removed from PG
Sep 18 22:48:57 db1 kernel:
Sep 18 22:48:57 db1 kernel:
Sep 18 22:48:57 db1 kernel: Fatal trap 12: page fault while in kernel mode
Sep 18 22:48:57 db1 kernel: cpuid = 5; apic id = 13
Sep 18 22:48:57 db1 kernel: fault virtual address       = 0x20
Sep 18 22:48:57 db1 kernel: fault code          = supervisor read data, page not present
#############################################################

Fix: 

Unknown.
How-To-Repeat: I cannot repeat this problem on demand.  It's happened twice in the past couple of months, but I do not have a test case in which it can be reproduced.
Comment 1 John Baldwin freebsd_committer freebsd_triage 2012-09-25 13:45:59 UTC
On Thursday, September 20, 2012 1:37:14 pm Paul Procacci wrote:
> 
> >Number:         171814
> >Category:       ia64
> >Synopsis:       [panic] bioq_init or bioq_remove (unsure which)
> >Confidential:   no
> >Severity:       non-critical
> >Priority:       low
> >Responsible:    freebsd-ia64
> >State:          open
> >Quarter:        
> >Keywords:       
> >Date-Required:
> >Class:          sw-bug
> >Submitter-Id:   current-users
> >Arrival-Date:   Thu Sep 20 17:40:03 UTC 2012
> >Closed-Date:
> >Last-Modified:
> >Originator:     Paul Procacci
> >Release:        9.0-RELEASE-p3
> >Organization:
> Datapipe
> >Environment:
> FreeBSD db1.xxxxxxxxxxxxx.com 9.0-RELEASE-p3 FreeBSD 9.0-RELEASE-p3 #0: Tue Jun 12 02:52:29 UTC 2012     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64
> >Description:
> cpuid = 5; apic id = 13
> fault virtual address      = 0x20
> fault code                 = supervisor read data, page not present
> instruction pointer        = 0x20 :0xffffffff80865023
> stack pointer              = 0x28 :0xffffff80002b3b30
> frame pointer              = 0x28 :0xffffff80002b3b50
> code segment               = base 0x0, limit 0xfffff, type 0x1b
>                            = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags           = interrupt enabled, resume, IOPL = 0
> current process            = 13 (g_event)
> trap number                = 12
> panic: page fault
> cpuid = 5
> KDB: stack backtrace:
> #0 0xffffffff808680fe at kdb_backtrace+0x5e
> #1 0xffffffff80832cb7 at panic+0x187
> #2 0xffffffff80b185a0 at trap_fatal+0x290
> #3 0xffffffff80b188e9 at trap_pfault+0x1f9
> #4 0xffffffff80b18daf at trap+0x3df
> #5 0xffffffff80b0324f at calltrap+0x8
> #6 0xffffffff807d165c at g_destroy_consumer+0x4c
> #7 0xffffffff807ce6cc at g_run_events+0x1ec
> #8 0xffffffff8080682f at fork_exit+0x11f
> #9 0xffffffff80b0377e at fork_trampoline+0xe
> 
> #############################################################
> 
> - I'm using a GENERIC kernel.
> - Following the instructions here: http://www.freebsd.org/doc/faq/advanced.html
>   I'm able to ascertain that the problem exists in one of the following two functions:
> 
> db1# nm -n /boot/kernel/kernel | fgrep ffffffff808650
> ffffffff80865080 T bioq_init
> ffffffff808650b0 T bioq_remove

No, I think it occurred in some other routine.  Note that 5023 < 5080, so the 
PC is before the start of 'bioq_init()'.  It's probably in some static 
function called by g_destroy_consumer() such as g_do_wither().  Do you have
a kernel.symbols file?  If so, doing 'gdb /boot/kernel/kernel' followed by
'l *0xffffffff80865023' would be very helpful.

-- 
John Baldwin
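
The address comparison above can be sketched in C. This is only an illustration of the lookup rule (a PC belongs to the last symbol whose start address is at or below it). The struct ksym type and the ksym_lookup() helper are hypothetical names, not part of nm, gdb, or the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One line of `nm -n` output: start address and symbol name. */
struct ksym {
	uint64_t addr;
	const char *name;
};

/*
 * Return the symbol that contains pc: the last entry whose start
 * address is <= pc.  The table must be sorted by ascending address,
 * as `nm -n` prints it.  Returns NULL if pc lies below every entry.
 */
const struct ksym *
ksym_lookup(const struct ksym *tab, size_t n, uint64_t pc)
{
	const struct ksym *best = NULL;
	size_t i;

	assert(tab != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (tab[i].addr > pc)
			break;		/* every later symbol starts past pc */
		best = &tab[i];
	}
	return (best);
}
```

With a table holding only the two fgrep matches (bioq_init at 0xffffffff80865080 and bioq_remove at 0xffffffff808650b0), looking up 0xffffffff80865023 returns NULL: the PC is below both, so the containing function has to be an earlier symbol that the fgrep prefix filtered out.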
Comment 2 pprocacci 2012-09-25 18:11:17 UTC
Thanks John for your response.

Here is the output from the steps you described:


0xffffffff80865023 is in devstat_remove_entry
(/usr/src/sys/kern/subr_devstat.c:193).
188
189             /* Remove this entry from the devstat queue */
190             atomic_add_acq_int(&ds->sequence1, 1);
191             if (ds->id == NULL) {
192                     devstat_num_devs--;
193                     STAILQ_REMOVE(devstat_head, ds, devstat, dev_links);
194             }
195             devstat_free(ds);
196             devstat_generation++;
197             mtx_unlock(&devstat_mutex);
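
One possible reading of the fault address, offered as a sketch rather than a diagnosis: STAILQ_REMOVE walks the list until it finds the element, so if ds is not on the list the walk reaches NULL and reads the link field through it. The structs below are hypothetical stand-ins that assume the leading fields of struct devstat are four 32-bit counters followed by a struct bintime and then dev_links. That layout is an assumption and should be checked against sys/devicestat.h for 9.0. Under that assumption the link field sits at offset 0x20, which matches the reported fault virtual address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct bintime (assumed 16 bytes). */
struct bintime_sketch {
	int64_t sec;
	uint64_t frac;
};

/* Hypothetical stand-in for the leading fields of struct devstat. */
struct devstat_sketch {
	unsigned int sequence0;
	int allocated;
	unsigned int start_count;
	unsigned int end_count;
	struct bintime_sketch busy_from;
	struct devstat_sketch *next;	/* stands in for dev_links.stqe_next */
};

/*
 * Address that a load of p->next would touch.  With p == 0 this is
 * just the field offset, i.e. the address a walk that runs off the
 * end of the list would fault on.
 */
uintptr_t
next_field_address(uintptr_t p)
{
	return (p + offsetof(struct devstat_sketch, next));
}

/*
 * Removal walk that reports a missing element instead of faulting.
 * Returns 1 if elm was unlinked, 0 if it was not on the list.
 */
int
list_remove(struct devstat_sketch **head, struct devstat_sketch *elm)
{
	struct devstat_sketch **pp;

	assert(head != NULL && elm != NULL);
	for (pp = head; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == elm) {
			*pp = elm->next;
			elm->next = NULL;
			return (1);
		}
	}
	return (0);
}
```

If this reading holds, the question becomes why the entry was already off the list (or the list already empty) when devstat_remove_entry() ran, for example a second removal of the same device after the "lost device" events in the log.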



-- 
__________________

:(){ :|:& };:
Comment 3 Marcel Moolenaar freebsd_committer freebsd_triage 2012-09-25 23:03:08 UTC
Responsible Changed
From-To: freebsd-ia64->freebsd-amd64

Change category to match environment.
Comment 4 Eitan Adler freebsd_committer freebsd_triage 2017-12-31 07:59:46 UTC
For bugs matching the following criteria:

Status: In Progress Changed: (is less than) 2014-06-01

Reset to default assignee and clear in-progress tags.

Mail being skipped
Comment 5 Graham Perrin freebsd_committer freebsd_triage 2022-10-17 12:17:27 UTC
Keyword: 

    crash

– in lieu of summary line prefix: 

    [panic]

* bulk change for the keyword
* summary lines may be edited manually (not in bulk). 

Keyword descriptions and search interface: 

    <https://bugs.freebsd.org/bugzilla/describekeywords.cgi>