Bug 15055

Summary: Soft NFS mounts can deadlock
Product: Base System
Reporter: iedowse <iedowse>
Component: kern
Assignee: freebsd-bugs (Nobody) <bugs>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: 3.3-STABLE   
Hardware: Any   
OS: Any   
Attachments: file.diff

Description iedowse 1999-11-23 03:00:01 UTC
	Under certain circumstances, multiple processes accessing a
	soft-mounted NFS filesystem can deadlock. The problem is triggered
	when the NFS server becomes unavailable for a time, but the
	processes remain deadlocked even after the server comes back. If
	the mount is also interruptible (NFSMNT_INT, or -i), recovery is
	possible by killing some of the affected processes; otherwise a
	reboot is necessary.

	This problem results from an interaction between the NFS congestion
	window mechanism and the way that calls to soreceive() on the NFS
	socket are serialised by a receive lock.

	When the NFS server becomes unavailable and there are outstanding
	requests (new or old), the NFS congestion window quickly shrinks
	back to 1 RPC. Requests then fall into two categories: (a) those
	that managed to send their request before the window closed up
	(R_SENT flag set); and (b) those that missed the window and are
	waiting for nfs_timer() to transmit their requests later.

	The deadlock occurs when a process with a category (b) request gets
	the receive lock, and subsequently all type (a) requests time out.
	No type (a) requests are transmitted since they have all timed
	out, and because the timed-out requests still count as outstanding,
	the congestion window does not allow type (b) requests to be
	transmitted either. The process holding the receive lock will not
	release it until it receives an NFS reply (for any request), but
	since no requests are being transmitted, this never happens. The
	timed-out requests do not complete either, since their processes
	are all waiting for the receive lock!

	If the mount is interruptible, killing the type (b) process that
	currently holds the receive lock releases it. All the type (a)
	processes then notice that their requests have timed out, and
	return.

Fix: Apply the attached patch (file.diff) to sys/nfs/nfs_socket.c. It
	causes the count of outstanding requests to be decremented as soon
	as a request is marked as timed out. Once all type (a) requests
	have timed out, the congestion window allows another request to
	be transmitted, so the deadlock is avoided.

	Note that while this patch solves the deadlock problem, the code
	still does not guarantee that a process will be made aware quickly
	that its request has timed out. That would require nfs_timer() to
	set some flag in the nfsmount struct, instructing the current holder
	of the receive lock to release it as soon as possible. I'm not sure
	that such a mechanism would be worth the effort. With this patch, the
	process will eventually find out about a timeout (it does not need
	to wait for the server to come back), and all waiting processes will
	respond quickly when the server does return.
How-To-Repeat: 
	mount -o -s,-i someserver:/fs /mnt
	
	# Lots of accesses to push down the NFS RTT estimates
	find /mnt -print > /dev/null

	#  *** Disconnect the server from the client ***

	# Make some type (a) processes
	ls -l /mnt & ls -l /mnt & ls -l /mnt & ls -l /mnt &
	sleep 5
	# Now that the congestion window has closed these will be type (b)
	df /mnt & df /mnt & df /mnt & df /mnt &

	Then wait for a few 'nfs server not responding' errors, and wait
	for the NFS traffic to stop completely with one of the df processes
	waiting on 'sbwait'. When this happens, reconnecting the server
	will not unwedge the processes, but killing the df in 'sbwait' will.
Comment 1 Matt Dillon 1999-12-13 04:25:41 UTC
State Changed
From-To: open->closed

Patch committed to -current; I am awaiting permission to MFC to -stable.