I have 3 apache22-itk web servers with DOCUMENT_ROOT shared over NFS. Sometimes I get a kernel panic:

Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03
fault virtual address = 0x10
fault code = supervisor write, page not present
instruction pointer = 0x20:0xc0aa3236
stack pointer = 0x28:0xea1ae528
frame pointer = 0x28:0xea1ae5f4
code segment = base rx0, limit 0xfffff, type 0x1b
 = DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 78988 (httpd)
trap number = 12
panic: page fault
cpuid = 3
KDB: stack backtrace:
#0 0xc08e0d07 at kdb_backtrace+0x47
#1 0xc08b1dc7 at panic+0x117
#2 0xc0be4b43 at trap_fatal+0x323
#3 0xc0be4dc0 at trap_pfault+0x270
#4 0xc0be5305 at trap+0x465
#5 0xc0bcbebc at calltrap+0x6
#6 0xc0aa89c7 at clnt_call_private+0xf7
#7 0xc0a97dcb at nlm_get_rpc+0x19b
#8 0xc0a98379 at nlm_host_get_rpc+0x169
#9 0xc0a949eb at nlm_clearlock+0xeb
#10 0xc0a95d2a at nlm_advlock_internal+0x9ca
#11 0xc0a9651a at nlm_advlock+0x3a
#12 0xc0a80239 at nfs_advlock+0xa9
#13 0xc0c038c7 at VOP_ADVLOCK_APV+0x47
#14 0xc0875dee at closef+0xfe
#15 0xc087653f at kern_close+0x17f
#16 0xc087661a at close+0x1a
#17 0xc08eca39 at syscallenter+0x329
Uptime: 6d22h54m7s
Physical memory: 3059 MB
Dumping 335 MB: 320 304 288 272 256 240 224 208 192 176 160 144 128 112 96 80 64 48 32 16
Responsible Changed From-To: freebsd-bugs->freebsd-fs Over to maintainer(s).
All,

I am seeing a similar crash on 7.3-RELEASE-p2 amd64 when using apache-1.3.34 with accf_http and an NFS docroot.

The servers that have crashed all run FreeBSD 7.3-RELEASE amd64. The hardware is HP DL145 G2, each with 2G of RAM, 2G of swap, and one single-core Opteron CPU.

We are using the following sysctls:

kern.ipc.maxsockbuf=2097152
kern.ipc.nmbclusters=32768
kern.ipc.somaxconn=1024
kern.maxfiles=131072
kern.maxfilesperproc=32768
net.inet.tcp.inflight.enable=0
net.inet.tcp.path_mtu_discovery=0
net.inet.tcp.recvbuf_inc=524288
net.inet.tcp.recvbuf_max=8388608
net.inet.tcp.recvspace=32768
net.inet.tcp.sendbuf_inc=16384
net.inet.tcp.sendbuf_max=8388608
net.inet.tcp.sendspace=32768
net.inet.udp.recvspace=42080
net.isr.direct=1
vm.pmap.shpgperproc=600

Uptime prior to the crash: the other system had been up for 11 days, this one for 6 days.

Here are the contents of my crash dump:

[root@web29 /var/crash]# kgdb /boot/kernel/kernel /var/crash/vmcore.0
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x258
fault code = supervisor read data, page not present
instruction pointer = 0x8:0xffffffff8051a66d
stack pointer = 0x10:0xffffff803e69b1c0
frame pointer = 0x10:0xffffff0001b50ae0
code segment = base rx0, limit 0xfffff, type 0x1b
 = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 9336 (libhttpd.ep)
trap number = 12
panic: page fault
cpuid = 0
Uptime: 6d5h18m39s
Physical memory: 2034 MB
Dumping 1451 MB: 1436 1420 1404 1388 1372 1356 1340 1324 1308 1292 1276 1260 1244 1228 1212 1196 1180 1164 1148 1132 1116 1100 1084 1068 1052 1036 1020 1004 988 972 956 940 924 908 892 876 860 844 828 812 796 780 764 748 732 716 700 684 668 652 636 620 604 588 572 556 540 524 508 492 476 460 444 428 412 396 380 364 348 332 316 300 284 268 252 236 220 204 188 172 156 140 124 108 92 76 60 44 28 12

Reading symbols from /boot/kernel/accf_http.ko...Reading symbols from /boot/kernel/accf_http.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/accf_http.ko
#0 doadump () at pcpu.h:195
195 pcpu.h: No such file or directory.
 in pcpu.h
(kgdb) bt
#0 doadump () at pcpu.h:195
#1 0x0000000000000004 in ?? ()
#2 0xffffffff805285f9 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
#3 0xffffffff80528a02 in panic (fmt=0x104 <Address 0x104 out of bounds>) at /usr/src/sys/kern/kern_shutdown.c:574
#4 0xffffffff807ec813 in trap_fatal (frame=0xffffff0001b50ae0, eva=Variable "eva" is not available.
) at /usr/src/sys/amd64/amd64/trap.c:777
#5 0xffffffff807ecbe5 in trap_pfault (frame=0xffffff803e69b110, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:693
#6 0xffffffff807ed50c in trap (frame=0xffffff803e69b110) at /usr/src/sys/amd64/amd64/trap.c:464
#7 0xffffffff807d614e in calltrap () at /usr/src/sys/amd64/amd64/exception.S:218
#8 0xffffffff8051a66d in _mtx_lock_sleep (m=0xffffff002f3d7a80, tid=18446742974226565856, opts=Variable "opts" is not available.
) at /usr/src/sys/kern/kern_mutex.c:339
#9 0xffffffff80701f60 in clnt_dg_create (so=0xffffff00017755a0, svcaddr=0xffffff803e69b310, program=100000, version=4, sendsz=Variable "sendsz" is not available.
) at /usr/src/sys/rpc/clnt_dg.c:259
#10 0xffffffff806e97c9 in nlm_get_rpc (sa=Variable "sa" is not available.
) at /usr/src/sys/nlm/nlm_prot_impl.c:327
#11 0xffffffff806e9d39 in nlm_host_get_rpc (host=0xffffff0001705000) at /usr/src/sys/nlm/nlm_prot_impl.c:1199
#12 0xffffffff806e680f in nlm_clearlock (host=0xffffff0001705000, ext=0xffffff803e69b9a0, vers=4, timo=0xffffff803e69b9d0, retries=2147483647, vp=0xffffff004881edc8, op=2, fl=0xffffff803e69bac0, flags=64, svid=9336, fhlen=32, fh=0xffffff803e69b750, size=689) at /usr/src/sys/nlm/nlm_advlock.c:943
#13 0xffffffff806e7801 in nlm_advlock_internal (vp=0xffffff004881edc8, id=Variable "id" is not available.
) at /usr/src/sys/nlm/nlm_advlock.c:355
#14 0xffffffff806e8166 in nlm_advlock (ap=Variable "ap" is not available.
) at /usr/src/sys/nlm/nlm_advlock.c:392
#15 0xffffffff806ced28 in nfs_advlock (ap=0xffffff803e69ba90) at /usr/src/sys/nfsclient/nfs_vnops.c:3153
#16 0xffffffff804f40e2 in closef (fp=0xffffff0073716d80, td=0xffffff0001b50ae0) at vnode_if.h:1036
#17 0xffffffff804f462b in kern_close (td=0xffffff0001b50ae0, fd=Variable "fd" is not available.
) at /usr/src/sys/kern/kern_descrip.c:1125
#18 0xffffffff807ece67 in syscall (frame=0xffffff803e69bc80) at /usr/src/sys/amd64/amd64/trap.c:920
#19 0xffffffff807d635b in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:339
#20 0x00000008009c5b1c in ?? ()
Previous frame inner to this frame (corrupt stack?)

-- 
mark saad | nonesuch@longcount.org
State Changed From-To: open->feedback I have sent the person that reported this a patch to test and am waiting for feedback. I've taken responsibility for this.
Responsible Changed From-To: freebsd-fs->rmacklem I have sent the person that reported this a patch for testing and will update the status when I hear back from them.
Author: rmacklem
Date: Thu Nov 3 14:38:03 2011
New Revision: 227059
URL: http://svn.freebsd.org/changeset/base/227059

Log:
  Both a crash reported on freebsd-current on Oct. 18 under the
  subject heading "mtx_lock() of destroyed mutex on NFS" and
  PR# 156168 appear to be caused by clnt_dg_destroy() closing
  down the socket prematurely. When to close down the socket
  is controlled by a reference count (cs_refs), but clnt_dg_create()
  checks for sb_upcall being non-NULL to decide if a new socket
  is needed. I believe the crashes were caused by the following race:
    clnt_dg_destroy() finds cs_refs == 0 and decides to delete socket
    clnt_dg_destroy() then loses race with clnt_dg_create() for
      acquisition of the SOCKBUF_LOCK()
    clnt_dg_create() finds sb_upcall != NULL and increments cs_refs to 1
    clnt_dg_destroy() then acquires SOCKBUF_LOCK(), sets sb_upcall to
      NULL and destroys socket

  This patch fixes the above race by changing clnt_dg_destroy() so
  that it acquires SOCKBUF_LOCK() before testing cs_refs.

  Tested by: bz
  PR: 156168
  Reviewed by: dfr
  MFC after: 2 weeks

Modified:
  head/sys/rpc/clnt_dg.c

Modified: head/sys/rpc/clnt_dg.c
==============================================================================
--- head/sys/rpc/clnt_dg.c Thu Nov 3 14:36:56 2011 (r227058)
+++ head/sys/rpc/clnt_dg.c Thu Nov 3 14:38:03 2011 (r227059)
@@ -1001,12 +1001,12 @@ clnt_dg_destroy(CLIENT *cl)
 	cs = cu->cu_socket->so_rcv.sb_upcallarg;
 	clnt_dg_close(cl);
+	SOCKBUF_LOCK(&cu->cu_socket->so_rcv);
 	mtx_lock(&cs->cs_lock);
 	cs->cs_refs--;
 	if (cs->cs_refs == 0) {
 		mtx_unlock(&cs->cs_lock);
-		SOCKBUF_LOCK(&cu->cu_socket->so_rcv);
 		soupcall_clear(cu->cu_socket, SO_RCV);
 		clnt_dg_upcallsdone(cu->cu_socket, cs);
 		SOCKBUF_UNLOCK(&cu->cu_socket->so_rcv);
@@ -1015,6 +1015,7 @@ clnt_dg_destroy(CLIENT *cl)
 		lastsocketref = TRUE;
 	} else {
 		mtx_unlock(&cs->cs_lock);
+		SOCKBUF_UNLOCK(&cu->cu_socket->so_rcv);
 		lastsocketref = FALSE;
 	}
_______________________________________________
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
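
The race in the r227059 log comes down to the reference count being tested under one lock while the create path makes its reuse decision under another. Below is a minimal user-space sketch of the fixed ordering, assuming pthread mutexes as stand-ins for SOCKBUF_LOCK() and cs_lock. The names struct shared_sock, shared_sock_init(), sock_acquire() and sock_release() are hypothetical and exist only for illustration; this is a model of the locking pattern, not the kernel code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of the state shared between create and destroy. */
struct shared_sock {
	pthread_mutex_t sockbuf_lock;	/* stands in for SOCKBUF_LOCK(&so->so_rcv) */
	pthread_mutex_t cs_lock;	/* stands in for cs->cs_lock */
	bool upcall_set;		/* stands in for sb_upcall != NULL */
	int refs;			/* stands in for cs->cs_refs */
};

/* Start with one client holding a reference and the upcall installed. */
void
shared_sock_init(struct shared_sock *s)
{
	pthread_mutex_init(&s->sockbuf_lock, NULL);
	pthread_mutex_init(&s->cs_lock, NULL);
	s->upcall_set = true;
	s->refs = 1;
}

/*
 * Create path: reuse the shared state only if the upcall is still set.
 * Returns true if a reference was taken, false if the caller would have
 * to set up fresh state.
 */
bool
sock_acquire(struct shared_sock *s)
{
	bool reused = false;

	pthread_mutex_lock(&s->sockbuf_lock);
	if (s->upcall_set) {
		pthread_mutex_lock(&s->cs_lock);
		s->refs++;
		pthread_mutex_unlock(&s->cs_lock);
		reused = true;
	}
	pthread_mutex_unlock(&s->sockbuf_lock);
	return (reused);
}

/*
 * Destroy path with the fixed ordering: sockbuf_lock is taken BEFORE
 * refs is tested, so the create path cannot slip in between the test
 * and the clearing of the upcall. Returns true for the last reference.
 */
bool
sock_release(struct shared_sock *s)
{
	bool last;

	pthread_mutex_lock(&s->sockbuf_lock);
	pthread_mutex_lock(&s->cs_lock);
	s->refs--;
	last = (s->refs == 0);
	pthread_mutex_unlock(&s->cs_lock);
	if (last)
		s->upcall_set = false;	/* still under sockbuf_lock */
	pthread_mutex_unlock(&s->sockbuf_lock);
	return (last);
}
```

In this model both paths take the two locks in the same order (sockbuf_lock first, then cs_lock), and the decision "this was the last reference" is made and acted on without ever dropping sockbuf_lock. A caller that sees sock_acquire() return false knows the old state is gone and must not touch it.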
State Changed From-To: feedback->closed I believe this bug is fixed by r227059 which has been MFC'd to stable/8 r227601.
Author: rmacklem
Date: Tue Nov 22 01:32:57 2011
New Revision: 227810
URL: http://svn.freebsd.org/changeset/base/227810

Log:
  MFC: r227059
    Both a crash reported on freebsd-current on Oct. 18 under the
    subject heading "mtx_lock() of destroyed mutex on NFS" and
    PR# 156168 appear to be caused by clnt_dg_destroy() closing
    down the socket prematurely. When to close down the socket
    is controlled by a reference count (cs_refs), but clnt_dg_create()
    checks for sb_upcall being non-NULL to decide if a new socket
    is needed. I believe the crashes were caused by the following race:
      clnt_dg_destroy() finds cs_refs == 0 and decides to delete socket
      clnt_dg_destroy() then loses race with clnt_dg_create() for
        acquisition of the SOCKBUF_LOCK()
      clnt_dg_create() finds sb_upcall != NULL and increments cs_refs to 1
      clnt_dg_destroy() then acquires SOCKBUF_LOCK(), sets sb_upcall to
        NULL and destroys socket

    This patch fixes the above race by changing clnt_dg_destroy() so
    that it acquires SOCKBUF_LOCK() before testing cs_refs.

  This is a slightly modified patch for stable/7. It fixes the above
  race, although others still exist, since some patches such as
  r193272 cannot be MFC'd.

  Tested by: nonesuch at longcount.org (Mark Saad)
  PR: kern/156168

Modified:
  stable/7/sys/rpc/clnt_dg.c
Directory Properties:
  stable/7/sys/ (props changed)
  stable/7/sys/cddl/contrib/opensolaris/ (props changed)
  stable/7/sys/contrib/dev/acpica/ (props changed)
  stable/7/sys/contrib/pf/ (props changed)

Modified: stable/7/sys/rpc/clnt_dg.c
==============================================================================
--- stable/7/sys/rpc/clnt_dg.c Tue Nov 22 00:35:30 2011 (r227809)
+++ stable/7/sys/rpc/clnt_dg.c Tue Nov 22 01:32:57 2011 (r227810)
@@ -811,18 +811,22 @@ clnt_dg_destroy(CLIENT *cl)
 	while (cu->cu_threads)
 		msleep(cu, &cs->cs_lock, 0, "rpcclose", 0);
+	mtx_unlock(&cs->cs_lock);
+	SOCKBUF_LOCK(&cu->cu_socket->so_rcv);
+	mtx_lock(&cs->cs_lock);
 	cs->cs_refs--;
 	if (cs->cs_refs == 0) {
-		mtx_destroy(&cs->cs_lock);
-		SOCKBUF_LOCK(&cu->cu_socket->so_rcv);
+		mtx_unlock(&cs->cs_lock);
 		cu->cu_socket->so_upcallarg = NULL;
 		cu->cu_socket->so_upcall = NULL;
 		cu->cu_socket->so_rcv.sb_flags &= ~SB_UPCALL;
 		SOCKBUF_UNLOCK(&cu->cu_socket->so_rcv);
+		mtx_destroy(&cs->cs_lock);
 		mem_free(cs, sizeof(*cs));
 		lastsocketref = TRUE;
 	} else {
 		mtx_unlock(&cs->cs_lock);
+		SOCKBUF_UNLOCK(&cu->cu_socket->so_rcv);
 		lastsocketref = FALSE;
 	}
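
The stable/7 variant differs from head in two visible ways: the function already holds cs_lock when it reaches the reference count, so the patch drops it, takes SOCKBUF_LOCK() and then retakes it; and mtx_destroy() is moved to after both locks have been released. My reading, which the commit log does not state, is that the drop-and-retake keeps the lock order the same as on the create path. A minimal user-space sketch of that sequence, assuming pthread mutexes as stand-ins for the kernel locks; struct model_cs, struct model_sock, model_sock_new() and model_release() are hypothetical names, not kernel code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-socket state freed on last release. */
struct model_cs {
	pthread_mutex_t cs_lock;	/* stands in for cs->cs_lock */
	int refs;			/* stands in for cs->cs_refs */
};

/* Hypothetical stand-in for the socket's receive-buffer state. */
struct model_sock {
	pthread_mutex_t sockbuf_lock;	/* stands in for SOCKBUF_LOCK() */
	struct model_cs *upcallarg;	/* NULL once the upcall is cleared */
};

/* Allocate a socket model with the given number of references. */
struct model_sock *
model_sock_new(int refs)
{
	struct model_sock *so = malloc(sizeof(*so));
	struct model_cs *cs = malloc(sizeof(*cs));

	if (so == NULL || cs == NULL) {
		free(so);
		free(cs);
		return (NULL);
	}
	pthread_mutex_init(&so->sockbuf_lock, NULL);
	pthread_mutex_init(&cs->cs_lock, NULL);
	cs->refs = refs;
	so->upcallarg = cs;
	return (so);
}

/* Drop one reference; returns true if it was the last one. */
bool
model_release(struct model_sock *so)
{
	struct model_cs *cs = so->upcallarg;
	bool last;

	/* Models arriving here with cs_lock held, as after the msleep() loop. */
	pthread_mutex_lock(&cs->cs_lock);

	/* Drop and retake so sockbuf_lock is held before refs is tested. */
	pthread_mutex_unlock(&cs->cs_lock);
	pthread_mutex_lock(&so->sockbuf_lock);
	pthread_mutex_lock(&cs->cs_lock);

	cs->refs--;
	last = (cs->refs == 0);
	pthread_mutex_unlock(&cs->cs_lock);
	if (last)
		so->upcallarg = NULL;	/* cleared under sockbuf_lock */
	pthread_mutex_unlock(&so->sockbuf_lock);

	if (last) {
		/* Destroy and free only after both locks are released. */
		pthread_mutex_destroy(&cs->cs_lock);
		free(cs);
	}
	return (last);
}
```

The sketch only models the ordering shown in the diff. As the commit log notes, other races remain on stable/7, and nothing here addresses them.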