Bug 15222

Summary: mbuf leak in nfs_srvcache.c
Product: Base System Reporter: iedowse <iedowse>
Component: kernAssignee: Matt Dillon <dillon>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: 3.3-STABLE   
Hardware: Any   
OS: Any   
Attachments:
Description Flags
file.diff none

Description iedowse 1999-12-02 18:00:01 UTC
	FreeBSD maintains a cache of some recent NFS server requests and
	their replies. This allows duplicate requests to be discarded while
	the requested operation is still in progress, and avoids the need
	to redo an operation if the reply gets lost. The default maximum
	cache size is 64 entries.

	When a request is received, nfsrv_getcache() either finds an existing
	cache entry, or creates a new one. After the operation completes,
	nfsrv_updatecache() updates the cache entry, sometimes including
	a copy of the reply mbuf chain.

	The problem is that nfsrv_updatecache() assumes it will called only
	once for a given cache entry, so it does not check if the cache entry
	already has an associated mbuf chain. Normally this assumption is
	correct, but under certain high-load situations, the following can
	occur:
		1)	A request R is received by the server, say a write
			with the 'filesync' option.
		2)	64 other requests are received before R completes.
			The cache entry for R gets flushed out.
		3)	A retransmit of R is received (R1); since the cache
			entry is gone, a new cache entry is created, and
			execution of R1 begins.
		4)	R completes, and a copy of the reply mbuf chain gets
			saved by nfsrv_updatecache().
		5)	R1 completes. nfsrv_updatecache() replaces the cache
			entry with a copy of R1's reply, but doesn't free
			the original mbuf chain.

	This problem was leaking 200-500 mbufs each day on our busiest
	NFS server.

	There is another leak in nfsrv_cleancache(), but I think this will
	only be triggered if all the nfsd's get killed. The patch below
	fixes both of these problems.

	As commented in the patch, this problem only occurs when the cache
	is too small, so it might be desirable to increase the cache size
	(double it maybe) when nfsrv_updatecache() finds a RC_DONE entry.
	Recording the event in the nfs stats structure and having a sysctl
	to increase the cache size would be another option. I can submit
	patches for either.

How-To-Repeat: 	
	1)	Keep the disk on the NFS server busy - for example run
			find /whatever -type f |xargs cat > /dev/null
	2)	Send an NFS filesync write request for a file 'file1'
	3)	Send 64 access requests for another file 'file2' each
		with a different xid, to flush the cache.
	4)	Retransmit the NFS filesync write (with the same xid).
	5)	Wait for the two replies to the writes.

	This should leak just a few mbufs, so it needs to be repeated a
	number of times before the leak becomes apparent.
Comment 1 Sheldon Hearn freebsd_committer freebsd_triage 1999-12-03 10:45:09 UTC
Responsible Changed
From-To: freebsd-bugs->dillon

Over to our famed NFS maintainer. :-) 
Comment 2 Matt Dillon freebsd_committer freebsd_triage 1999-12-13 17:10:59 UTC
State Changed
From-To: open->closed

The patch has been committed to 4.x and 3.x.  Nice find Ian!