Bug 153847 - [nfs] [panic] Kernel panic from incorrect m_free in nfs_getattr
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 7.3-RELEASE
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: freebsd-fs (Nobody)
Reported: 2011-01-10 14:00 UTC by martin
Modified: 2012-01-04 01:00 UTC

Description martin 2011-01-10 14:00:19 UTC
kgdb gives the output below from the dump:


Unread portion of the kernel message buffer:
<6>nfs server pid904@greig:/sp: is alive again


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address	= 0x7273752f
fault code		= supervisor read, page not present
instruction pointer	= 0x20:0xc084c330
stack pointer	        = 0x28:0xe7d0984c
frame pointer	        = 0x28:0xe7d0985c
code segment		= base rx0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, def32 1, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 27611 (sh)
trap number		= 12
panic: page fault
cpuid = 0
Uptime: 23h58m43s
Physical memory: 2035 MB
Dumping 181 MB: 166 150 134 118 102
<6>nfs server pid904@greig:/nfs: not responding
 86 70 54 38

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address	= 0x207366ce
fault code		= supervisor read, page not present
instruction pointer	= 0x20:0xc06616ed
stack pointer	        = 0x28:0xc51b9be0
frame pointer	        = 0x28:0xc51b9bf0
code segment		= base rx0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, def32 1, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 14 (swi4: clock sio)
trap number		= 12
 22 6



#0  doadump () at pcpu.h:196
#1  0xc07f8c57 in boot (howto=0x104) at /usr/src/sys/kern/kern_shutdown.c:418
#2  0xc07f8f29 in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:574
#3  0xc0b010dc in trap_fatal (frame=0xe7d0980c, eva=0x7273752f) at /usr/src/sys/i386/i386/trap.c:950
#4  0xc0b01360 in trap_pfault (frame=0xe7d0980c, usermode=0x0, eva=0x7273752f) at /usr/src/sys/i386/i386/trap.c:863
#5  0xc0b01d55 in trap (frame=0xe7d0980c) at /usr/src/sys/i386/i386/trap.c:541
#6  0xc0ae503b in calltrap () at /usr/src/sys/i386/i386/exception.s:166
#7  0xc084c330 in m_freem (mb=0x7273752f) at /usr/src/sys/kern/uipc_mbuf.c:162
#8  0xc09a4245 in nfs_getattr (ap=0xe7d09980) at /usr/src/sys/nfsclient/nfs_vnops.c:666
#9  0xc0b16172 in VOP_GETATTR_APV (vop=0xc0c8ab40, a=0xe7d09980) at vnode_if.c:530
#10 0xc09a83ac in nfs_lookup (ap=0xe7d09a98) at vnode_if.h:286
#11 0xc0b17b76 in VOP_LOOKUP_APV (vop=0xc0c8ab40, a=0xe7d09a98) at vnode_if.c:99
#12 0xc0872afb in lookup (ndp=0xe7d09ba8) at vnode_if.h:57
#13 0xc0873972 in namei (ndp=0xe7d09ba8) at /usr/src/sys/kern/vfs_lookup.c:234
#14 0xc0881c04 in kern_stat (td=0xc68a7000, path=0x28203288 <Address 0x28203288 out of bounds>, pathseg=UIO_USERSPACE, sbp=0xe7d09c18) at /usr/src/sys/kern/vfs_syscalls.c:2131
#15 0xc0881def in stat (td=0xc68a7000, uap=0xe7d09cfc) at /usr/src/sys/kern/vfs_syscalls.c:2115
#16 0xc0b016b5 in syscall (frame=0xe7d09d38) at /usr/src/sys/i386/i386/trap.c:1101
#17 0xc0ae50a0 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:262
#18 0x00000033 in ?? ()

Fix: 

Last time I saw this (in 7.2) I got some way towards tracking it down.  I think the problem is that nfs_request doesn't validate the error value that it reads from the NFS packet:

	if (*tl == 0) {
		tl = nfsm_dissect(u_int32_t *, NFSX_UNSIGNED);
		if (*tl != 0) {
			error = fxdr_unsigned(int, *tl);

Things go wrong if this value has the NFSERR_RETERR bit set already (e.g. due to a bug in the nfs server).  In particular, when this value is returned to nfsm_request, the bit will be cleared and nfs_getattr will subsequently call m_freem(mrep) even though mrep hasn't been initialized by nfs_request.
How-To-Repeat: Not sure, but I think it has something to do with filesystems being run by am-utils.
Comment 1 Mark Linimon freebsd_committer freebsd_triage 2011-01-11 01:45:24 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-fs

Over to maintainer(s).
Comment 2 martin 2011-08-24 15:37:31 UTC
FTR, I just got this again with the latest 7.4 kernel:

FreeBSD 7.4-RELEASE #0: Thu Feb 17 03:51:56 UTC 2011

Unread portion of the kernel message buffer:
<6>nfs server pid947@greig:/sp: is alive again


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address	= 0x819
fault code		= supervisor read, page not present
instruction pointer	= 0x20:0xc086fea0
stack pointer	        = 0x28:0xc53c9824
frame pointer	        = 0x28:0xc53c9834
code segment		= base rx0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, def32 1, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 19529 (ls)
trap number		= 12
panic: page fault
cpuid = 1
Uptime: 6d22h30m27s
Physical memory: 2035 MB
Dumping 257 MB: 242 226 210 194 178 162 146 130 114 98 82 66 50 34 18 2

Reading symbols from /boot/kernel/acpi.ko...Reading symbols from /boot/kernel/acpi.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/acpi.ko
#0  doadump () at pcpu.h:197
197	pcpu.h: No such file or directory.
	in pcpu.h
(kgdb) where
#0  doadump () at pcpu.h:197
#1  0xc081c693 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:421
#2  0xc081c967 in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:576
#3  0xc0b29a0c in trap_fatal (frame=0xc53c97e4, eva=2073)
    at /usr/src/sys/i386/i386/trap.c:950
#4  0xc0b29c90 in trap_pfault (frame=0xc53c97e4, usermode=0, eva=2073)
    at /usr/src/sys/i386/i386/trap.c:863
#5  0xc0b2a66c in trap (frame=0xc53c97e4) at /usr/src/sys/i386/i386/trap.c:541
#6  0xc0b0cf8b in calltrap () at /usr/src/sys/i386/i386/exception.s:166
#7  0xc086fea0 in m_freem (mb=0x819) at /usr/src/sys/kern/uipc_mbuf.c:162
#8  0xc09c8c95 in nfs_getattr (ap=0xc53c9968)
    at /usr/src/sys/nfsclient/nfs_vnops.c:664
#9  0xc0b3f102 in VOP_GETATTR_APV (vop=0xc0cb6f80, a=0xc53c9968)
    at vnode_if.c:530
#10 0xc09cb445 in nfs_lookup (ap=0xc53c9a90) at vnode_if.h:286
#11 0xc0b40b06 in VOP_LOOKUP_APV (vop=0xc0cb6f80, a=0xc53c9a90)
    at vnode_if.c:99
#12 0xc089685b in lookup (ndp=0xc53c9ba8) at vnode_if.h:57
#13 0xc08976de in namei (ndp=0xc53c9ba8) at /usr/src/sys/kern/vfs_lookup.c:234
#14 0xc08a58e4 in kern_stat (td=0xc5c396c0, 
    path=0x28212088 <Address 0x28212088 out of bounds>, 
    pathseg=UIO_USERSPACE, sbp=0xc53c9c18)
    at /usr/src/sys/kern/vfs_syscalls.c:2141
#15 0xc08a5acf in stat (td=0xc5c396c0, uap=0xc53c9cfc)
    at /usr/src/sys/kern/vfs_syscalls.c:2125
#16 0xc0b29fe5 in syscall (frame=0xc53c9d38)
    at /usr/src/sys/i386/i386/trap.c:1101
#17 0xc0b0cff0 in Xint0x80_syscall ()
    at /usr/src/sys/i386/i386/exception.s:262
#18 0x00000033 in ?? ()
Comment 3 dfilter service freebsd_committer freebsd_triage 2011-11-19 03:20:28 UTC
Author: rmacklem
Date: Sat Nov 19 03:20:15 2011
New Revision: 227690
URL: http://svn.freebsd.org/changeset/base/227690

Log:
  The old NFS client will crash due to the reply being m_freem()'d
  twice if the server bogusly returns an error with the NFSERR_RETERR
  bit (bit 31) set. No actual NFS error has this bit set, but it seems
  that amd will sometimes do this. This patch makes sure the NFSERR_RETERR
  bit is cleared to avoid a crash.
  
  PR:		kern/153847
  MFC after:	2 weeks

Modified:
  head/sys/nfsclient/nfs_krpc.c

Modified: head/sys/nfsclient/nfs_krpc.c
==============================================================================
--- head/sys/nfsclient/nfs_krpc.c	Sat Nov 19 00:20:28 2011	(r227689)
+++ head/sys/nfsclient/nfs_krpc.c	Sat Nov 19 03:20:15 2011	(r227690)
@@ -540,6 +540,11 @@ tryagain:
 				    hz);
 			goto tryagain;
 		}
+		/*
+		 * Make sure NFSERR_RETERR isn't bogusly set by a server
+		 * such as amd. (No actual NFS error has bit 31 set.)
+		 */
+		error &= ~NFSERR_RETERR;
 
 		/*
 		 * If the File Handle was stale, invalidate the lookup
_______________________________________________
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
Comment 4 dfilter service freebsd_committer freebsd_triage 2012-01-04 00:51:18 UTC
Author: rmacklem
Date: Wed Jan  4 00:51:05 2012
New Revision: 229451
URL: http://svn.freebsd.org/changeset/base/229451

Log:
  MFC: r227690
  The old NFS client will crash due to the reply being m_freem()'d
  twice if the server bogusly returns an error with the NFSERR_RETERR
  bit (bit 31) set. No actual NFS error has this bit set, but it seems
  that amd will sometimes do this. This patch makes sure the NFSERR_RETERR
  bit is cleared to avoid a crash.
  This is not exactly a merge, since the code is in sys/nfsclient/nfs_socket.c,
  which does not exist in head.
  
  Tested by:	martin at lispworks.com
  PR:		kern/153847

Modified:
  stable/7/sys/nfsclient/nfs_socket.c
Directory Properties:
  stable/7/sys/   (props changed)
  stable/7/sys/cddl/contrib/opensolaris/   (props changed)
  stable/7/sys/contrib/dev/acpica/   (props changed)
  stable/7/sys/contrib/pf/   (props changed)

Modified: stable/7/sys/nfsclient/nfs_socket.c
==============================================================================
--- stable/7/sys/nfsclient/nfs_socket.c	Wed Jan  4 00:24:09 2012	(r229450)
+++ stable/7/sys/nfsclient/nfs_socket.c	Wed Jan  4 00:51:05 2012	(r229451)
@@ -1351,6 +1351,12 @@ wait_for_pinned_req:
 				rep->r_xid = *xidp = txdr_unsigned(nfs_xid_gen());
 				goto tryagain;
 			}
+			/*
+			 * Make sure NFSERR_RETERR isn't bogusly set by a
+			 * server such as amd. (No actual NFS error has bit 31
+			 * set.)
+			 */
+			error &= ~NFSERR_RETERR;
 
 			/*
 			 * If the File Handle was stale, invalidate the
Comment 5 Rick Macklem freebsd_committer freebsd_triage 2012-01-04 00:53:59 UTC
State Changed
From-To: open->closed


r227690, which was MFC'd to stable/7 as r229451, should fix this crash.