Bug 131209

Summary: [panic] [bce] 7.1-STABLE amd64 crash - m0 NULL
Product: Base System Reporter: Roar Pettersen <roar.pettersen>
Component: amd64 Assignee: freebsd-amd64 (Nobody) <amd64>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: 7.1-STABLE   
Hardware: Any   
OS: Any   

Description Roar Pettersen 2009-01-31 15:30:02 UTC
After 6 hours the system crashes / freezes:

# kgdb kernel.debug /var/crash/vmcore.0
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x800000108
fault code              = supervisor read data, page not present
instruction pointer     = 0x8:0xffffffff8037690c
stack pointer           = 0x10:0xfffffffec329e5e0
frame pointer           = 0x10:0x0
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 25972 (cron)
trap number             = 12
panic: page fault
cpuid = 1
Uptime: 6h33m10s
Physical memory: 2039 MB
Dumping 411 MB: 396 380 364 348 332 316 300 284 268 252 236 220 204 188 172 156 140 124 108 92 76 60 44 28 12

Reading symbols from /boot/kernel/blank_saver.ko...Reading symbols from /boot/kernel/blank_saver.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/blank_saver.ko
#0  0xffffffff802fa8fa in doadump () at /usr/src/sys/kern/kern_shutdown.c:238
238             if (dumper.dumper == NULL) {


Power off / on was the solution to restart the system.
Comment 1 John Baldwin freebsd_committer freebsd_triage 2009-02-02 13:49:05 UTC
On Saturday 31 January 2009 10:29:45 am Roar Pettersen wrote:
> Fatal trap 12: page fault while in kernel mode
> cpuid = 1; apic id = 01
> fault virtual address   = 0x800000108
> fault code              = supervisor read data, page not present
> instruction pointer     = 0x8:0xffffffff8037690c
> stack pointer           = 0x10:0xfffffffec329e5e0
> frame pointer           = 0x10:0x0
> code segment            = base rx0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 25972 (cron)
> trap number             = 12
> panic: page fault
> cpuid = 1
> Uptime: 6h33m10s
> Physical memory: 2039 MB
> Dumping 411 MB: 396 380 364 348 332 316 300 284 268 252 236 220 204 188 172 156 140 124 108 92 76 60 44 28 12
> 
> Reading symbols from /boot/kernel/blank_saver.ko...Reading symbols from /boot/kernel/blank_saver.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/blank_saver.ko
> #0  0xffffffff802fa8fa in doadump () at /usr/src/sys/kern/kern_shutdown.c:238
> 238             if (dumper.dumper == NULL) {

Can you provide the full backtrace?  This stack frame is well after the panic 
in the code that writes out the crash dump, so it is not very useful.

-- 
John Baldwin
Comment 2 roar.pettersen 2009-02-02 15:34:39 UTC
Hello !

> Can you provide the full backtrace?  This stack frame is well after the panic
> in the code that writes out the crash dump, so it is not very useful.



# kgdb kernel.debug /var/crash/vmcore.0
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x800000108
fault code              = supervisor read data, page not present
instruction pointer     = 0x8:0xffffffff8037690c
stack pointer           = 0x10:0xfffffffec329e5e0
frame pointer           = 0x10:0x0
code segment            = base rx0, limit 0xfffff, type 0x1b
                         = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 25972 (cron)
trap number             = 12
panic: page fault
cpuid = 1
Uptime: 6h33m10s
Physical memory: 2039 MB
Dumping 411 MB: 396 380 364 348 332 316 300 284 268 252 236 220 204 188 172 156 140 124 108 92 76 60 44 28 12

Reading symbols from /boot/kernel/blank_saver.ko...Reading symbols from /boot/kernel/blank_saver.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/blank_saver.ko
#0  0xffffffff802fa8fa in doadump () at /usr/src/sys/kern/kern_shutdown.c:238
238             if (dumper.dumper == NULL) {
(kgdb) backtrace
#0  0xffffffff802fa8fa in doadump () at /usr/src/sys/kern/kern_shutdown.c:238
#1  0x0000000000000004 in ?? ()
#2  0xffffffff802fae29 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:417
#3  0xffffffff802fb232 in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:572
#4  0xffffffff804fbc53 in trap_fatal (frame=0xffffff000a5dfa50, eva=Variable "eva" is not available.
)
     at /usr/src/sys/amd64/amd64/trap.c:735
#5  0xffffffff804fc025 in trap_pfault (frame=0xfffffffec329e530, usermode=0)
     at /usr/src/sys/amd64/amd64/trap.c:674
#6  0xffffffff804fc968 in trap (frame=0xfffffffec329e530)
     at /usr/src/sys/amd64/amd64/trap.c:571
#7  0xffffffff804e23ae in Xtss () at /usr/src/sys/amd64/amd64/exception.S:138
#8  0xffffffff8037690c in vattr_null (vap=0x0) at /usr/src/sys/kern/vfs_subr.c:536
#9  0xfffffffec329e640 in ?? ()
#10 0xffffff000a5dfa50 in ?? ()
#11 0xfffffffec329ea60 in ?? ()
#12 0xfffffffec329ea10 in ?? ()
#13 0x0000000000000000 in ?? ()
#14 0xffffffff80540f11 in VOP_VPTOFH_APV (vop=0x800000000, a=0x0)
     at vnode_if.c:2767
#15 0xffffff000439e3f0 in ?? ()
#16 0xffffff000439e3f0 in ?? ()
#17 0x0000000000000004 in ?? ()
#18 0xffffffff8037ac07 in vput (vp=0xffffff000439cb28) at vnode_if.h:31
#19 0xffffffff80377ede in vaccess (type=Variable "type" is not available.
) at /usr/src/sys/kern/vfs_subr.c:3400
#20 0xffffff0000000000 in ?? ()
#21 0xffffff007ff8a998 in ?? ()
#22 0xffffffff804c1805 in uma_zfree_arg (zone=0xffffffff804d9261,
     item=0xffffff000a5dfa50, udata=0x0) at /usr/src/sys/vm/uma_core.c:2367
#23 0xfffffffec329e8d0 in ?? ()
#24 0xffffff000439c9d8 in ?? ()
#25 0xffffff00043829a0 in ?? ()
#26 0x0000000000000000 in ?? ()
#27 0x0000000000000100 in ?? ()
#28 0xffffff000439c9d8 in ?? ()
#29 0xffffffff804b6d5a in ufs_getattr (ap=Variable "ap" is not available.
) at /usr/src/sys/ufs/ufs/ufs_vnops.c:444
#30 0x0000000000000004 in ?? ()
#31 0xffffff000156b150 in ?? ()
#32 0xfffffffec329ea10 in ?? ()
#33 0x0000000000000000 in ?? ()
#34 0x0000000000000001 in ?? ()
#35 0x0000000000000000 in ?? ()
#36 0x0000000000000100 in ?? ()
#37 0xffffff000439c9d8 in ?? ()
#38 0xffffffff804b685a in ufs_whiteout (ap=0x0)
     at /usr/src/sys/ufs/ufs/ufs_vnops.c:941
#39 0xffffffff803686d6 in breada (vp=0x0, rablkno=0x800, rabsize=0x4000, cnt=0,
     cred=0x1000) at /usr/src/sys/kern/vfs_bio.c:2288
#40 0x0000000000001001 in ?? ()
#41 0xffffff000439c9d8 in ?? ()
#42 0x0000000000000000 in ?? ()
#43 0xffffff00043829a0 in ?? ()
#44 0xffffff000439c9d8 in ?? ()
#45 0xfffffffec329eb10 in ?? ()
#46 0x000010000439c9d8 in ?? ()
#47 0x0002000000000002 in ?? ()
#48 0xffffff000a5dfa50 in ?? ()
#49 0x0000000000000001 in ?? ()
#50 0x0000000000000000 in ?? ()
#51 0x0000000080744500 in ?? ()
#52 0xffffff0037df6a00 in ?? ()
#53 0xfffffffe8001a340 in ?? ()
#54 0x0000000000000000 in ?? ()
#55 0xffffff000439c9d8 in ?? ()
#56 0xfffffffec329eb10 in ?? ()
#57 0xffffff000a5dfa50 in ?? ()
#58 0xffffffff80388dff in vn_read (fp=0xffffffff803687fa, uio=0xffffffff804ab959,
     active_cred=Variable "active_cred" is not available.
) at atomic.h:143
#59 0xffffffff8033480d in dofileread (td=0xffffff000a5dfa50, fd=3,
     fp=0xffffff0037df6a00, auio=0xfffffffec329eb10, offset=-5315631632, flags=0)
     at file.h:244
#60 0xffffffff80334b7e in kern_readv (td=0xffffff000a5dfa50, fd=3,
     auio=0xfffffffec329eb10) at /usr/src/sys/kern/sys_generic.c:192
#61 0xffffffff80334c6c in read (td=0x0, uap=0x0)
     at /usr/src/sys/kern/sys_generic.c:103
#62 0xffffffff804fc2a7 in syscall (frame=0xfffffffec329ec80)
     at /usr/src/sys/amd64/amd64/trap.c:878
#63 0xffffffff804e25bb in Xpage () at /usr/src/sys/amd64/amd64/exception.S:268
#64 0x0000000800932b8c in ?? ()
Previous frame inner to this frame (corrupt stack?)


--
Med vennlig hilsen / Regards;

   Roar Pettersen
   Universitetet i Bergen -  The University of Bergen
   Nygardsgt. 5  -  N-5020 BERGEN  - Norway
Comment 3 John Baldwin freebsd_committer freebsd_triage 2009-02-02 15:59:27 UTC
On Monday 02 February 2009 10:34:39 am Roar Pettersen wrote:
> Hello !
> 
> > Can you provide the full backtrace?  This stack frame is well after the panic
> > in the code that writes out the crash dump, so it is not very useful.
> 
> 
> 
> # kgdb kernel.debug /var/crash/vmcore.0
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you 
> are
> welcome to change it and/or distribute copies of it under certain 
> conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for 
> details.
> This GDB was configured as "amd64-marcel-freebsd"...
> 
> Unread portion of the kernel message buffer:
> 
> 
> Fatal trap 12: page fault while in kernel mode
> cpuid = 1; apic id = 01
> fault virtual address   = 0x800000108
> fault code              = supervisor read data, page not present
> instruction pointer     = 0x8:0xffffffff8037690c
> stack pointer           = 0x10:0xfffffffec329e5e0
> frame pointer           = 0x10:0x0
> code segment            = base rx0, limit 0xfffff, type 0x1b
>                          = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 25972 (cron)
> trap number             = 12
> panic: page fault
> cpuid = 1
> Uptime: 6h33m10s
> Physical memory: 2039 MB
> Dumping 411 MB: 396 380 364 348 332 316 300 284 268 252 236 220 204 188 
> 172 156 140 124 108 92 76 60 44 28 12
> 
> Reading symbols from /boot/kernel/blank_saver.ko...Reading symbols from 
> /boot/kernel/blank_saver.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/blank_saver.ko
> #0  0xffffffff802fa8fa in doadump () at 
> /usr/src/sys/kern/kern_shutdown.c:238
> 238             if (dumper.dumper == NULL) {
> (kgdb) backtrace
> #0  0xffffffff802fa8fa in doadump () at 
> /usr/src/sys/kern/kern_shutdown.c:238
> #1  0x0000000000000004 in ?? ()
> #2  0xffffffff802fae29 in boot (howto=260) at 
> /usr/src/sys/kern/kern_shutdown.c:417
> #3  0xffffffff802fb232 in panic (fmt=Variable "fmt" is not available.
> ) at /usr/src/sys/kern/kern_shutdown.c:572
> #4  0xffffffff804fbc53 in trap_fatal (frame=0xffffff000a5dfa50, 
> eva=Variable "eva" is not available.
> )
>      at /usr/src/sys/amd64/amd64/trap.c:735
> #5  0xffffffff804fc025 in trap_pfault (frame=0xfffffffec329e530, 
> usermode=0)
>      at /usr/src/sys/amd64/amd64/trap.c:674
> #6  0xffffffff804fc968 in trap (frame=0xfffffffec329e530)
>      at /usr/src/sys/amd64/amd64/trap.c:571
> #7  0xffffffff804e23ae in Xtss () at 
> /usr/src/sys/amd64/amd64/exception.S:138
> #8  0xffffffff8037690c in vattr_null (vap=0x0) at 
> /usr/src/sys/kern/vfs_subr.c:536

Is your source tree out of date wrt your kernel?  The kernel messages clearly 
show a page fault, not a TSS fault as Xtss() would indicate.  Also, if 
vattr_null() was passed a NULL pointer, it should have faulted at the start 
of its routine rather than halfway through it.

-- 
John Baldwin
Comment 4 roar.pettersen 2009-02-03 07:34:04 UTC
Hello John !

> Is your source tree out of date wrt your kernel?  The kernel messages clearly
> show a page fault, not a TSS fault as Xtss() would indicate.  Also, if
> vattr_null() was passed a NULL pointer, it should have faulted at the start
> of its routine rather than halfway through it.

Yes, I forgot that I had done a buildworld and buildkernel to get all the new
patches installed.


No crash yet, but the system gets unstable/unusable after some hours (4-6),
and each time we then do a "shutdown -r now" we now get a dump:

# kgdb kernel.debug /var/crash/vmcore.6
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
<118>Feb  3 07:27:50 proxy-gw syslogd: exiting on signal 15
Waiting (max 60 seconds) for system process `vnlru' to stop...done
WaitSiynncing dgi s(kmsa,x  v6n0o dseesc ornedmsa)i nfionrg .s.y.st6e m 
process `syncer' to stop...5 1 2 2 1 1 0 0 0 done
Waiting (max 60 seconds) for system process `bufdaemon' to stop...done
All buffers synced.
Uptime: 7h8m11s


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x10
fault code              = supervisor read data, page not present
instruction pointer     = 0x8:0xffffffff8021d746
stack pointer           = 0x10:0xfffffffef7cf3b20
frame pointer           = 0x10:0x12000
code segment            = base rx0, limit 0xfffff, type 0x1b
                         = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 29 (irq257: bce1)
trap number             = 12
panic: page fault
cpuid = 1
Uptime: 7h8m11s
Physical memory: 4087 MB
Dumping 362 MB: 347 331 315 299 283 267 251 235 219 203 187 171 155 139 123 107 91 75 59 43 27 11

#0  doadump () at pcpu.h:195
195             __asm __volatile("movq %%gs:0,%0" : "=r" (td));
(kgdb)
(kgdb) backtrace
#0  doadump () at pcpu.h:195
#1  0x0000000000000004 in ?? ()
#2  0xffffffff802fae39 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
#3  0xffffffff802fb242 in panic (fmt=0x104 <Address 0x104 out of bounds>)
     at /usr/src/sys/kern/kern_shutdown.c:574
#4  0xffffffff804fbd63 in trap_fatal (frame=0xffffff0001559000, eva=Variable "eva" is not available.
) at /usr/src/sys/amd64/amd64/trap.c:764
#5  0xffffffff804fc135 in trap_pfault (frame=0xfffffffef7cf3a70, usermode=0)
     at /usr/src/sys/amd64/amd64/trap.c:680
#6  0xffffffff804fca78 in trap (frame=0xfffffffef7cf3a70) at /usr/src/sys/amd64/amd64/trap.c:449
#7  0xffffffff804e24be in calltrap () at /usr/src/sys/amd64/amd64/exception.S:209
#8  0xffffffff8021d746 in bce_intr (xsc=Variable "xsc" is not available.
) at /usr/src/sys/dev/bce/if_bce.c:5748
#9  0xffffffff802db730 in ithread_loop (arg=0xffffff0001584080) at /usr/src/sys/kern/kern_intr.c:1088
#10 0xffffffff802d85d3 in fork_exit (callout=0xffffffff802db5c0 <ithread_loop>, arg=0xffffff0001584080,
     frame=0xfffffffef7cf3c80) at /usr/src/sys/kern/kern_fork.c:804
#11 0xffffffff804e288e in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:455
#12 0x0000000000000000 in ?? ()
#13 0x0000000000000000 in ?? ()
#14 0x0000000000000001 in ?? ()
#15 0x0000000000000000 in ?? ()
#16 0x0000000000000000 in ?? ()
#17 0x0000000000000000 in ?? ()
#18 0x0000000000000000 in ?? ()
#19 0x0000000000000000 in ?? ()
#20 0x0000000000000000 in ?? ()
#21 0x0000000000000000 in ?? ()
#22 0x0000000000000000 in ?? ()
#23 0x0000000000000000 in ?? ()
#24 0x0000000000000000 in ?? ()
#25 0x0000000000000000 in ?? ()
#26 0x0000000000000000 in ?? ()
#27 0x0000000000000000 in ?? ()
#28 0x0000000000000000 in ?? ()
#29 0x0000000000000000 in ?? ()
#30 0x0000000000000000 in ?? ()
#31 0x0000000000000000 in ?? ()
#32 0x0000000000000000 in ?? ()
#33 0x0000000000000000 in ?? ()
#34 0x0000000000000000 in ?? ()
#35 0x0000000000000000 in ?? ()
#36 0x00000000008b2000 in ?? ()
#37 0xffffffff80768fc0 in tdq_cpu ()
#38 0xffffffff80774bc0 in tdq_groups ()
#39 0xffffffff80774b40 in tdq_cpu ()
#40 0xffffff0001559000 in ?? ()
#41 0xffffffff80768340 in tdg_maxid ()
#42 0xfffffffef7cf36e8 in ?? ()
#43 0xffffff0001559000 in ?? ()
#44 0xffffffff8031be88 in sched_switch (td=0xffffffff802db5c0, newtd=0x800602040, flags=Variable "flags" is not available.
)
     at /usr/src/sys/kern/sched_ule.c:1938
#45 0x0000000000000000 in ?? ()
#46 0x0000000000000000 in ?? ()
#47 0x0000000000000000 in ?? ()
#48 0x0000000000000000 in ?? ()
#49 0x0000000000000000 in ?? ()
#50 0x0000000000000000 in ?? ()
#51 0x0000000000000000 in ?? ()
#52 0x0000000000000000 in ?? ()
#53 0x0000000000000000 in ?? ()
#54 0x0000000000000000 in ?? ()
#55 0x0000000000000000 in ?? ()
#56 0x0000000000000000 in ?? ()
#57 0x0000000000000000 in ?? ()
#58 0x0000000000000000 in ?? ()
#59 0x0000000000000000 in ?? ()
#60 0x0000000000000000 in ?? ()
#61 0x0000000000000000 in ?? ()
#62 0x0000000000000000 in ?? ()
#63 0x0000000000000000 in ?? ()
#64 0x0000000000000000 in ?? ()
#65 0x0000000000000000 in ?? ()
#66 0x0000000000000000 in ?? ()
#67 0x0000000000000000 in ?? ()
#68 0x0000000000000000 in ?? ()
#69 0x0000000000000000 in ?? ()
#70 0x0000000000000000 in ?? ()
#71 0x0000000000000000 in ?? ()
#72 0x0000000000000000 in ?? ()
#73 0x0000000000000000 in ?? ()
#74 0x0000000000000000 in ?? ()
#75 0x0000000000000000 in ?? ()
#76 0x0000000000000000 in ?? ()
#77 0x0000000000000000 in ?? ()
#78 0x0000000000000000 in ?? ()
#79 0x0000000000000000 in ?? ()
#80 0x0000000000000000 in ?? ()
#81 0x0000000000000000 in ?? ()
#82 0x0000000000000000 in ?? ()
#83 0x0000000000000000 in ?? ()
#84 0x0000000000000000 in ?? ()
#85 0x0000000000000000 in ?? ()
#86 0x0000000000000000 in ?? ()
#87 0x0000000000000000 in ?? ()
#88 0x0000000000000000 in ?? ()
#89 0x0000000000000000 in ?? ()
#90 0x0000000000000000 in ?? ()
#91 0x0000000000000000 in ?? ()
#92 0x0000000000000000 in ?? ()
#93 0x0000000000000000 in ?? ()
#94 0x0000000000000000 in ?? ()
#95 0x0000000000000000 in ?? ()
#96 0x0000000000000000 in ?? ()
#97 0x0000000000000000 in ?? ()
#98 0x0000000000000000 in ?? ()
#99 0x0000000000000000 in ?? ()
#100 0x0000000000000000 in ?? ()
#101 0x0000000000000000 in ?? ()
#102 0x0000000000000000 in ?? ()
#103 0x0000000000000000 in ?? ()
#104 0x0000000000000000 in ?? ()
#105 0x0000000000000000 in ?? ()
#106 0x0000000000000000 in ?? ()
#107 0x0000000000000000 in ?? ()
#108 0x0000000000000000 in ?? ()
#109 0x0000000000000000 in ?? ()
#110 0x0000000000000000 in ?? ()
#111 0x0000000000000000 in ?? ()
#112 0x0000000000000000 in ?? ()
Cannot access memory at address 0xfffffffef7cf4000



--
Med vennlig hilsen / Regards;

   Roar Pettersen
   Universitetet i Bergen -  The University of Bergen
   Nygardsgt. 5  -  N-5020 BERGEN  - Norway
   Tlf: +47 55 58 40 55  fax: +47 55 58 40 70
   roar.pettersen@it.uib.no - IT-Avd, UiB - http://www.uib.no
Comment 5 John Baldwin freebsd_committer freebsd_triage 2009-02-03 15:20:32 UTC
On Tuesday 03 February 2009 2:34:04 am Roar Pettersen wrote:
> Hello John !
> 
> > Is your source tree out of date wrt your kernel?  The kernel messages clearly
> > show a page fault, not a TSS fault as Xtss() would indicate.  Also, if
> > vattr_null() was passed a NULL pointer, it should have faulted at the start
> > of its routine rather than halfway through it.
> 
> Yes, forgot that I had done a buildworld and build kernel to get all new
> patches installed.
> 
> 
> No crash yet, but each time we do a "shutdown -r now" because the system 
> get unstable/unusable after some hours (4-6), we now get a dump each time 
> :
> 
> # kgdb kernel.debug /var/crash/vmcore.6
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you 
> are
> welcome to change it and/or distribute copies of it under certain 
> conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for 
> details.
> This GDB was configured as "amd64-marcel-freebsd"...
> 
> Unread portion of the kernel message buffer:
> <118>Feb  3 07:27:50 proxy-gw syslogd: exiting on signal 15
> Waiting (max 60 seconds) for system process `vnlru' to stop...done
> WaitSiynncing dgi s(kmsa,x  v6n0o dseesc ornedmsa)i nfionrg .s.y.st6e m 
> process `syncer' to stop...5 1 2 2 1 1 0 0 0 done
> Waiting (max 60 seconds) for system process `bufdaemon' to stop...done
> All buffers synced.
> Uptime: 7h8m11s
> 
> 
> Fatal trap 12: page fault while in kernel mode
> cpuid = 1; apic id = 01
> fault virtual address   = 0x10
> fault code              = supervisor read data, page not present
> instruction pointer     = 0x8:0xffffffff8021d746
> stack pointer           = 0x10:0xfffffffef7cf3b20
> frame pointer           = 0x10:0x12000
> code segment            = base rx0, limit 0xfffff, type 0x1b
>                          = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 29 (irq257: bce1)
> trap number             = 12
> panic: page fault
> cpuid = 1
> Uptime: 7h8m11s
> Physical memory: 4087 MB
> Dumping 362 MB: 347 331 315 299 283 267 251 235 219 203 187 171 155 139 
> 123 107 91 75 59 43 27 11
> 
> #0  doadump () at pcpu.h:195
> 195             __asm __volatile("movq %%gs:0,%0" : "=r" (td));
> (kgdb)
> (kgdb) backtrace
> #0  doadump () at pcpu.h:195
> #1  0x0000000000000004 in ?? ()
> #2  0xffffffff802fae39 in boot (howto=260) at 
> /usr/src/sys/kern/kern_shutdown.c:418
> #3  0xffffffff802fb242 in panic (fmt=0x104 <Address 0x104 out of bounds>)
>      at /usr/src/sys/kern/kern_shutdown.c:574
> #4  0xffffffff804fbd63 in trap_fatal (frame=0xffffff0001559000, 
> eva=Variable "eva" is not available.
> ) at /usr/src/sys/amd64/amd64/trap.c:764
> #5  0xffffffff804fc135 in trap_pfault (frame=0xfffffffef7cf3a70, 
> usermode=0)
>      at /usr/src/sys/amd64/amd64/trap.c:680
> #6  0xffffffff804fca78 in trap (frame=0xfffffffef7cf3a70) at 
> /usr/src/sys/amd64/amd64/trap.c:449
> #7  0xffffffff804e24be in calltrap () at 
> /usr/src/sys/amd64/amd64/exception.S:209
> #8  0xffffffff8021d746 in bce_intr (xsc=Variable "xsc" is not available.
> ) at /usr/src/sys/dev/bce/if_bce.c:5748

Looks to be a bug here.  Can you do 'frame 8' followed by 'l' in kgdb?

-- 
John Baldwin
Comment 6 roar.pettersen 2009-02-03 17:32:47 UTC
Hello John !

> Looks to be a bug here.  Can you do 'frame 8' followed by 'l' in kgdb?

(kgdb) frame 8
#8  0xffffffff8021d746 in bce_intr (xsc=Variable "xsc" is not available.
) at /usr/src/sys/dev/bce/if_bce.c:5748
5748                    sc->rx_mbuf_ptr[sw_rx_cons_idx] = NULL;

(kgdb) l
5743                    bus_dmamap_unload(sc->rx_mbuf_tag,
5744                        sc->rx_mbuf_map[sw_rx_cons_idx]);
5745
5746                    /* Remove the mbuf from the RX chain. */
5747                    m0 = sc->rx_mbuf_ptr[sw_rx_cons_idx];
5748                    sc->rx_mbuf_ptr[sw_rx_cons_idx] = NULL;
5749                    DBRUN(sc->debug_rx_mbuf_alloc--);
5750                    sc->free_rx_bd++;
5751
5752                    /*


--
Med vennlig hilsen / Regards;

   Roar Pettersen
   Universitetet i Bergen -  The University of Bergen
   Nygardsgt. 5  -  N-5020 BERGEN  - Norway
   Tlf: +47 55 58 40 55  fax: +47 55 58 40 70
   roar.pettersen@it.uib.no - IT-Avd, UiB - http://www.uib.no
Comment 7 John Baldwin freebsd_committer freebsd_triage 2009-02-03 17:55:29 UTC
On Tuesday 03 February 2009 12:32:47 pm Roar Pettersen wrote:
> Hello John !
> 
> > Looks to be a bug here.  Can you do 'frame 8' followed by 'l' in kgdb?
> 
> (kgdb) frame 8
> #8  0xffffffff8021d746 in bce_intr (xsc=Variable "xsc" is not available.
> ) at /usr/src/sys/dev/bce/if_bce.c:5748
> 5748                    sc->rx_mbuf_ptr[sw_rx_cons_idx] = NULL;
> 
> (kgdb) l
> 5743                    bus_dmamap_unload(sc->rx_mbuf_tag,
> 5744                        sc->rx_mbuf_map[sw_rx_cons_idx]);
> 5745
> 5746                    /* Remove the mbuf from the RX chain. */
> 5747                    m0 = sc->rx_mbuf_ptr[sw_rx_cons_idx];
> 5748                    sc->rx_mbuf_ptr[sw_rx_cons_idx] = NULL;
> 5749                    DBRUN(sc->debug_rx_mbuf_alloc--);
> 5750                    sc->free_rx_bd++;
> 5751
> 5752                    /*

Hmm, it shouldn't be faulting here. :(  Can you do 'p m0' 'p sw_rx_cons_idx' 
and 'p sc->rx_mbuf_ptr[sw_rx_cons_idx]'?  Also, 'x/i 0xffffffff8021d746'.
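
For illustration only: the listing quoted above takes the buffer pointer out of
an RX ring slot, clears the slot and bumps a free counter. The sketch below
shows that pattern with a defensive check for an already-empty slot. It is not
the bce(4) code; struct rx_ring, RX_RING_SLOTS and rx_ring_take() are
hypothetical names invented for this example.

```c
#include <assert.h>
#include <stddef.h>

#define RX_RING_SLOTS 8	/* hypothetical ring size, not the bce(4) value */

/* Hypothetical stand-in for the driver's RX bookkeeping. */
struct rx_ring {
	void	*slot[RX_RING_SLOTS];	/* one buffer pointer per descriptor */
	int	 free_slots;		/* descriptors available for refill */
};

/*
 * Remove the buffer stored at index idx (wrapped to the ring size).
 * Returns the buffer, or NULL if the slot was already empty; in that
 * case the ring accounting is left untouched so the caller can skip
 * the descriptor instead of dereferencing a NULL buffer.
 */
static void *
rx_ring_take(struct rx_ring *ring, size_t idx)
{
	void *buf;

	assert(ring != NULL);
	idx %= RX_RING_SLOTS;
	buf = ring->slot[idx];
	if (buf == NULL)
		return (NULL);
	ring->slot[idx] = NULL;
	ring->free_slots++;
	return (buf);
}
```

A caller would test the return value before touching the buffer. Whether an
empty slot can legitimately occur in the real driver is exactly what the
requested 'p' output is meant to show.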

-- 
John Baldwin
Comment 8 roar.pettersen 2009-02-03 18:02:04 UTC
Hello !

> Hmm, it shouldn't be faulting here. :(  Can you do 'p m0' 'p sw_rx_cons_idx'
> and 'p sc->rx_mbuf_ptr[sw_rx_cons_idx]'?  Also, 'x/i 0xffffffff8021d746'.


(kgdb) p m0
No symbol "m0" in current context.
(kgdb) p sw_rx_cons_idx
No symbol "sw_rx_cons_idx" in current context.
(kgdb) p sc->rx_mbuf_ptr[sw_rx_cons_idx]
No symbol "sc" in current context.
(kgdb) x/i 0xffffffff8021d746
0xffffffff8021d746 <bce_intr+710>:      mov    0x10(%r14),%r10


Just reloaded the system; uptime is now only 3-4 hours between
manual reboots.


--
Med vennlig hilsen / Regards;

   Roar Pettersen
   Universitetet i Bergen -  The University of Bergen
   Nygardsgt. 5  -  N-5020 BERGEN  - Norway
   Tlf: +47 55 58 40 55  fax: +47 55 58 40 70
   roar.pettersen@it.uib.no - IT-Avd, UiB - http://www.uib.no
Comment 9 John Baldwin freebsd_committer freebsd_triage 2009-02-03 18:40:08 UTC
On Tuesday 03 February 2009 1:02:04 pm Roar Pettersen wrote:
> Hello !
> 
> > Hmm, it shouldn't be faulting here. :(  Can you do 'p m0' 'p sw_rx_cons_idx'
> > and 'p sc->rx_mbuf_ptr[sw_rx_cons_idx]'?  Also, 'x/i 0xffffffff8021d746'.
> 
> 
> (kgdb) p m0
> No symbol "m0" in current context.
> (kgdb) p sw_rx_cons_idx
> No symbol "sw_rx_cons_idx" in current context.
> (kgdb) p sc->rx_mbuf_ptr[sw_rx_cons_idx]
> No symbol "sc" in current context.

You have to be at 'frame 8' for these to work.

> (kgdb) x/i 0xffffffff8021d746
> 0xffffffff8021d746 <bce_intr+710>:      mov    0x10(%r14),%r10

Ok, so %r14 is presumably NULL.  Looking at the disassembly, I think m0 is 
NULL.  This is a bug in bce(4) of some sort.  You can try e-mailing davidch@.
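
The arithmetic behind that reading, as a small sketch: the faulting
instruction reads from 0x10(%r14) and the trap message reports fault virtual
address 0x10, so the base register must have held 0. The struct below is a
hypothetical layout chosen so that one field sits at offset 0x10; it is not
the real struct mbuf, and both helper functions are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical header with three 8-byte fields, standing in for
 * pointer-sized members on amd64.  NOT the real struct mbuf.
 */
struct pkt_hdr {
	uint64_t next;		/* offset 0x00 */
	uint64_t nextpkt;	/* offset 0x08 */
	uint64_t data;		/* offset 0x10 */
};

/* Address the CPU reads for a field at `offset` from `base`. */
static uintptr_t
field_address(uintptr_t base, size_t offset)
{
	return (base + offset);
}

/*
 * Recover the base register value from the reported fault address
 * and the displacement encoded in the faulting instruction.
 */
static uintptr_t
base_from_fault(uintptr_t fault_addr, size_t displacement)
{
	assert(fault_addr >= displacement);
	return (fault_addr - displacement);
}
```

With the values from this dump, base_from_fault(0x10, 0x10) gives 0, i.e. a
NULL base pointer, and reading the third field of the hypothetical struct
through a NULL pointer would touch address 0x10.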

-- 
John Baldwin
Comment 10 Andriy Gapon freebsd_committer freebsd_triage 2010-12-05 12:17:15 UTC
Is this still an issue?
Have you tried contacting davidch@ and/or discussing the bug on the appropriate
mailing lists (net@, stable@)?

-- 
Andriy Gapon
Comment 11 Andriy Gapon freebsd_committer freebsd_triage 2010-12-05 12:32:08 UTC
State Changed
From-To: open->suspended

Originator's email address seems to bounce, so I am not 
sure about the current status of this issue.
Comment 12 Konstantin 2011-03-01 12:56:46 UTC
We faced the same problem, see http://www.freebsd.org/cgi/query-pr.cgi?pr=155004

--
Konstantin Malov
External Services Group system administrator
Kaspersky Lab. - Russian Federation
Tel: +7(495)797-8700 (ext. 2867)
http://www.kaspersky.com
Comment 13 Jaakko Heinonen freebsd_committer freebsd_triage 2011-12-11 09:54:20 UTC
State Changed
From-To: suspended->closed

Apparently a duplicate of kern/155004.