I wrote a program to test the multi-process, multi-threaded performance of FreeBSD and libthr. The program uses two file locks to synchronize the processes and an mmap shared memory region for IPC. However, the program randomly causes a kernel panic. I used kgdb to examine the resulting core dump. It showed that the kernel trapped in kern_lockf.c at line 294, where it apparently dereferences a NULL pointer. Here is the code around line 294:

292:         waitblock = (struct lockf *)td->td_wchan;
293:         /* Get the owner of the blocking lock */
294:         waitblock = waitblock->lf_next;
295:         if ((waitblock->lf_flags & F_POSIX) == 0)
296:                 break;
297:         nproc = (struct proc *)waitblock->lf_id;

How-To-Repeat:
Run my program repeatedly. Sometimes it will cause a kernel panic.
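The submitter's program was never attached to this report, so the following is only a minimal sketch of the kind of workload described above: several processes, each running several threads, contending on a POSIX fcntl() file lock while updating a counter in an mmap shared region. All names (run_workload, worker, the /tmp template) are hypothetical, and the sketch uses one lock file rather than two for brevity. Because fcntl() locks are owned by the process and not by the thread, a per-process mutex is assumed here to serialize threads within the same process.

```c
#define _DEFAULT_SOURCE /* glibc only: expose MAP_ANON and mkstemp under strict -std */
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

struct worker_args {
    int fd;                 /* descriptor of the lock file */
    int iters;              /* lock/unlock cycles per thread */
    volatile long *counter; /* lives in the shared mapping */
};

/* fcntl() locks do not exclude threads of the same process, so each
 * process also serializes its own threads with this mutex. */
static pthread_mutex_t proc_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Apply a whole-file lock of the given type, blocking until granted. */
static int set_lock(int fd, short type)
{
    struct flock fl;
    int rc;

    memset(&fl, 0, sizeof(fl));
    fl.l_type = type;
    fl.l_whence = SEEK_SET; /* l_start = 0, l_len = 0: entire file */
    do {
        rc = fcntl(fd, F_SETLKW, &fl);
    } while (rc == -1 && errno == EINTR);
    return rc;
}

static void *worker(void *p)
{
    struct worker_args *a = p;

    for (int i = 0; i < a->iters; i++) {
        pthread_mutex_lock(&proc_mutex);
        if (set_lock(a->fd, F_WRLCK) == 0) {
            (*a->counter)++;          /* critical section */
            set_lock(a->fd, F_UNLCK);
        }
        pthread_mutex_unlock(&proc_mutex);
    }
    return NULL;
}

/* Body of each child process: start the threads, join them, exit. */
static void run_child(int fd, int nthreads, int iters, volatile long *counter)
{
    struct worker_args args = { fd, iters, counter };
    pthread_t *tids = malloc(sizeof(*tids) * (size_t)nthreads);
    int started = 0;

    if (tids == NULL)
        _exit(1);
    for (int t = 0; t < nthreads; t++)
        if (pthread_create(&tids[started], NULL, worker, &args) == 0)
            started++;
    for (int t = 0; t < started; t++)
        pthread_join(tids[t], NULL);
    _exit(started == nthreads ? 0 : 1);
}

/* Returns the final counter value, or -1 on setup failure. */
long run_workload(int nprocs, int nthreads, int iters)
{
    char path[] = "/tmp/lockf-sketch.XXXXXX"; /* hypothetical lock file */
    int fd = mkstemp(path);
    volatile long *counter;
    long total;
    int failed = 0;

    if (fd < 0)
        return -1;
    counter = mmap(NULL, sizeof(*counter), PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANON, -1, 0);
    if (counter == MAP_FAILED) {
        close(fd);
        unlink(path);
        return -1;
    }
    *counter = 0;

    for (int p = 0; p < nprocs && !failed; p++) {
        pid_t pid = fork();
        if (pid == 0)
            run_child(fd, nthreads, iters, counter); /* never returns */
        if (pid < 0)
            failed = 1;
    }
    while (wait(NULL) > 0)
        ; /* reap every child that was started */

    total = failed ? -1 : *counter;
    munmap((void *)counter, sizeof(*counter));
    close(fd);
    unlink(path);
    return total;
}
```

Calling run_workload(4, 4, 1000) from a small main() in a shell loop would approximate "run my program repeatedly". Whether this particular sketch would trigger the panic on an affected 7.0-RELEASE kernel is not known.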
State Changed
From-To: open->feedback

To submitter: could you please also provide the backtrace from kgdb? Also, is there any chance you can share the program that causes this panic?
Responsible Changed
From-To: freebsd-i386->gavin

Track
State Changed
From-To: feedback->feedback

To submitter: Can you please update to 7-STABLE and see if this problem still exists? The code in question was replaced shortly after 7.0-RELEASE, so the problem may well already be fixed.
I have run into the same problem as the submitter. I have yet to upgrade to -STABLE, but will try that now and report on my findings.

web01# kgdb kernel.debug /var/crash/vmcore.3
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd".

Unread portion of the kernel message buffer:
kernel trap 12 with interrupts disabled

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x0
fault code = supervisor read data, page not present
instruction pointer = 0x8:0xffffffff80279c3f
stack pointer = 0x10:0xffffffffb44919c0
frame pointer = 0x10:0xffffff000be7b8d0
code segment = base rx0, limit 0xfffff, type 0x1b
             = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = resume, IOPL = 0
current process = 5093 (php-cgi)
trap number = 12
panic: page fault
cpuid = 0
Uptime: 49m50s
Physical memory: 8183 MB
Dumping 508 MB: 493 477 461 445 429 413 397 381 365 349 333 317 301 285 269 253 237 221 205 189 173 157 141 125 109 93 77 61 45 29 13

#0 doadump () at pcpu.h:194
194 __asm __volatile("movq %%gs:0,%0" : "=r" (td));
(kgdb) list *0xffffffff80279c3f
0xffffffff80279c3f is in lf_advlock (/usr/src/sys/kern/kern_lockf.c:295).
290                 (td->td_wmesg == lockstr) &&
291                 (i++ < maxlockdepth)) {
292                 waitblock = (struct lockf *)td->td_wchan;
293                 /* Get the owner of the blocking lock */
294                 waitblock = waitblock->lf_next;
295                 if ((waitblock->lf_flags & F_POSIX) == 0)
296                         break;
297                 nproc = (struct proc *)waitblock->lf_id;
298                 if (nproc == (struct proc *)lock->lf_id) {
299                         PROC_SUNLOCK(wproc);
(kgdb) bt
#0 doadump () at pcpu.h:194
#1 0x0000000000000004 in ?? ()
#2 0xffffffff80288319 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#3 0xffffffff8028871d in panic (fmt=0x104 <Address 0x104 out of bounds>) at /usr/src/sys/kern/kern_shutdown.c:563
#4 0xffffffff803b95b4 in trap_fatal (frame=0xffffff0003cd3000, eva=18446742974256829544) at /usr/src/sys/amd64/amd64/trap.c:724
#5 0xffffffff803ba1ff in trap (frame=0xffffffffb4491910) at /usr/src/sys/amd64/amd64/trap.c:251
#6 0xffffffff8039ff4e in calltrap () at /usr/src/sys/amd64/amd64/exception.S:169
#7 0xffffffff80279c3f in lf_advlock (ap=Variable "ap" is not available.
) at /usr/src/sys/kern/kern_lockf.c:294
#8 0xffffffff8025fbdb in kern_fcntl (td=0xffffff0003cd3000, fd=Variable "fd" is not available.
) at vnode_if.h:1036
#9 0xffffffff8025ff9f in fcntl (td=0xffffff0003cd3000, uap=0xffffffffb4491be0) at /usr/src/sys/kern/kern_descrip.c:336
#10 0xffffffff803b9b9e in syscall (frame=0xffffffffb4491c70) at /usr/src/sys/amd64/amd64/trap.c:852
#11 0xffffffff803a015b in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:290
#12 0x000000080108607c in ?? ()
Previous frame inner to this frame (corrupt stack?)
For what it's worth, since upgrading to -STABLE last night, I have not run into the same problem. This is, of course, after several hours of heavy testing.
State Changed
From-To: feedback->closed

Feedback timeout (~9 months). I suspect this is resolved with the new lock manager code. Other similar panics have been confirmed as fixed with the new code.