Bug 160198

Summary: [rpc] amd + NFS reconnect = ICMP storm + unkillable process + hung amd mount.
Product: Base System
Reporter: Artem Belevich <art>
Component: kern
Assignee: Artem Belevich <art>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: 8.2-STABLE   
Hardware: Any   
OS: Any   

Description Artem Belevich freebsd_committer freebsd_triage 2011-08-26 07:10:08 UTC
When a process is interrupted during an NFS reconnect over
UDP, it gets stuck in an unkillable state.

In my particular case the NFS connection is to the amd process
on localhost. Continuous reconnects result in a self-inflicted
DoS attack on amd, which renders it unresponsive and hangs all
other processes that access amd-mounted filesystems. As a side
effect we also generate a rather high rate of ICMP port
unreachable replies. All in all the system ends up being
virtually unavailable, and in many cases it requires a reboot
to get out of this state.

The stuck process always has clnt_reconnect_call() in its backtrace:

	18779 100511 collect2         -                
	mi_switch+0x176
	turnstile_wait+0x1cb 
	_mtx_lock_sleep+0xe1 
	sleepq_catch_signals+0x386
	sleepq_timedwait_sig+0x19 
	_sleep+0x1b1 
	clnt_dg_call+0x7e6
	clnt_reconnect_call+0x12e 
	nfs_request+0x212 
	nfs_getattr+0x2e4
	VOP_GETATTR_APV+0x44 
	nfs_bioread+0x42a 
	VOP_READLINK_APV+0x4a
	namei+0x4f9 
	kern_statat_vnhook+0x92 
	kern_statat+0x15
	freebsd32_stat+0x2e 
	syscallenter+0x23d

Fix: 

clnt_dg_call() uses msleep(), which may return ERESTART when
the current process is interrupted. If that happens we return
to clnt_reconnect_call() with RPC_CANTRECV. clnt_reconnect_call()
handles RPC_CANTRECV by trying to reconnect again and the
story repeats. Because the current code never returns to
userland, the process never quits and gets stuck, in most
cases, forever.

The fix is to convert ERESTART to RPC_INTR which is what's
done in other places where it's handled in RPC code.
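
To illustrate the loop described above, here is a minimal userland
sketch. It is not the kernel code: every name in it (the sim_ prefixed
enums, map_sleep_error(), sim_reconnect_call()) is hypothetical, the
error values are stand-ins, and it assumes the interrupted sleep keeps
returning the same error on every retry. The attempt cap exists only so
the unfixed case terminates.

```c
/* Hypothetical stand-ins for kernel errno and RPC status values. */
enum sim_errno { SIM_ERESTART = -1, SIM_EINTR = 4, SIM_EIO = 5 };
enum sim_stat { SIM_RPC_SUCCESS, SIM_RPC_CANTRECV, SIM_RPC_INTR };

/*
 * Map the error from an interrupted sleep to an RPC status.
 * fixed == 0 models the old mapping (only EINTR becomes RPC_INTR);
 * fixed != 0 models the new mapping (ERESTART becomes RPC_INTR too).
 */
enum sim_stat
map_sleep_error(int error, int fixed)
{
	if (error == SIM_EINTR || (fixed && error == SIM_ERESTART))
		return (SIM_RPC_INTR);
	return (SIM_RPC_CANTRECV);
}

/*
 * Simplified model of the retry loop: reconnect and try again while
 * the call reports CANTRECV, give up on any other status.
 * Returns the number of attempts made and stores the final status.
 */
int
sim_reconnect_call(int sleep_error, int fixed, int max_attempts,
    enum sim_stat *statp)
{
	enum sim_stat stat;
	int attempts = 0;

	do {
		attempts++;
		stat = map_sleep_error(sleep_error, fixed);
	} while (stat == SIM_RPC_CANTRECV && attempts < max_attempts);

	*statp = stat;
	return (attempts);
}
```

With the old mapping, sim_reconnect_call(SIM_ERESTART, 0, 1000, &st)
uses up all 1000 attempts and still ends with CANTRECV. With the new
mapping the same call stops after one attempt with the INTR status, so
the caller can return to userland.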

How-To-Repeat:

In my case the problem most frequently occurs when a parallel
build that touches an amd-mounted filesystem is interrupted.
Comment 1 Artem Belevich freebsd_committer freebsd_triage 2011-08-26 07:13:18 UTC
Responsible Changed
From-To: freebsd-bugs->art

Mine.
Comment 2 dfilter service freebsd_committer freebsd_triage 2011-08-28 19:09:31 UTC
Author: art
Date: Sun Aug 28 18:09:17 2011
New Revision: 225234
URL: http://svn.freebsd.org/changeset/base/225234

Log:
  Make sure RPC calls over UDP return RPC_INTR status if the process has
  been interrupted in a restartable syscall. Otherwise we could end up
  in an (almost) endless loop in clnt_reconnect_call().
  
  PR: kern/160198
  Reviewed by: rmacklem
  Approved by: re (kib), avg (mentor)
  MFC after: 1 week

Modified:
  head/sys/rpc/clnt_dg.c

Modified: head/sys/rpc/clnt_dg.c
==============================================================================
--- head/sys/rpc/clnt_dg.c	Sun Aug 28 16:11:24 2011	(r225233)
+++ head/sys/rpc/clnt_dg.c	Sun Aug 28 18:09:17 2011	(r225234)
@@ -467,7 +467,10 @@ send_again:
 		    cu->cu_waitflag, "rpccwnd", 0);
 		if (error) {
 			errp->re_errno = error;
-			errp->re_status = stat = RPC_CANTSEND;
+			if (error == EINTR || error == ERESTART)
+				errp->re_status = stat = RPC_INTR;
+			else
+				errp->re_status = stat = RPC_CANTSEND;
 			goto out;
 		}
 	}
@@ -636,7 +639,7 @@ get_reply:
 		 */
 		if (error != EWOULDBLOCK) {
 			errp->re_errno = error;
-			if (error == EINTR)
+			if (error == EINTR || error == ERESTART)
 				errp->re_status = stat = RPC_INTR;
 			else
 				errp->re_status = stat = RPC_CANTRECV;
_______________________________________________
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
Comment 3 dfilter service freebsd_committer freebsd_triage 2011-09-05 07:54:22 UTC
Author: art
Date: Mon Sep  5 06:54:13 2011
New Revision: 225384
URL: http://svn.freebsd.org/changeset/base/225384

Log:
  MFC r225234:
  
  Make sure RPC calls over UDP return RPC_INTR status if the process has
  been interrupted in a restartable syscall. Otherwise we could end up
  in an (almost) endless loop in clnt_reconnect_call().
  
  PR: kern/160198
  Reviewed by: rmacklem
  Approved by: avg (mentor)

Modified:
  stable/8/sys/rpc/clnt_dg.c
Directory Properties:
  stable/8/sys/   (props changed)
  stable/8/sys/amd64/include/xen/   (props changed)
  stable/8/sys/cddl/contrib/opensolaris/   (props changed)
  stable/8/sys/contrib/dev/acpica/   (props changed)
  stable/8/sys/contrib/pf/   (props changed)

Modified: stable/8/sys/rpc/clnt_dg.c
==============================================================================
--- stable/8/sys/rpc/clnt_dg.c	Mon Sep  5 06:11:17 2011	(r225383)
+++ stable/8/sys/rpc/clnt_dg.c	Mon Sep  5 06:54:13 2011	(r225384)
@@ -467,7 +467,10 @@ send_again:
 		    cu->cu_waitflag, "rpccwnd", 0);
 		if (error) {
 			errp->re_errno = error;
-			errp->re_status = stat = RPC_CANTSEND;
+			if (error == EINTR || error == ERESTART)
+				errp->re_status = stat = RPC_INTR;
+			else
+				errp->re_status = stat = RPC_CANTSEND;
 			goto out;
 		}
 	}
@@ -636,7 +639,7 @@ get_reply:
 		 */
 		if (error != EWOULDBLOCK) {
 			errp->re_errno = error;
-			if (error == EINTR)
+			if (error == EINTR || error == ERESTART)
 				errp->re_status = stat = RPC_INTR;
 			else
 				errp->re_status = stat = RPC_CANTRECV;
Comment 4 Artem Belevich freebsd_committer freebsd_triage 2011-09-05 08:10:40 UTC
State Changed
From-To: open->closed

Fix committed to head and MFC'ed to -8.