Bug 33897

Summary: rpc.lockd problems on server
Product: Base System Reporter: Thomas Quinot <thomas>
Component: bin Assignee: Alfred Perlstein <alfred>
Status: Closed FIXED    
Severity: Affects Only Me CC: thomas
Priority: Normal    
Version: 5.0-CURRENT   
Hardware: Any   
OS: Any   

Description Thomas Quinot 2002-01-14 21:20:00 UTC
	Since my last -CURRENT update, rpc.lockd dumps core every now
	and then, and procmail running on a Solaris NFS client
	hangs when trying to deliver mail to a mailbox on my FreeBSD server.

Fix: 

None known so far.
How-To-Repeat: 	procmail delivery from Solaris client to FreeBSD 5-CURRENT server.
Comment 1 Thomas Quinot 2002-01-15 09:09:04 UTC
I was able to get a core dump from a binary with debugging symbols.
Here we go:

Script started on Tue Jan 15 10:05:27 2002
# gdb rpc.lockd -c /rpc.lockd.core
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-unknown-freebsd"...
Core was generated by `rpc.lockd'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/lib/librpcsvc.so.2...done.
Reading symbols from /usr/lib/libutil.so.3...done.
Reading symbols from /usr/lib/libc.so.5...done.
Reading symbols from /usr/libexec/ld-elf.so.1...done.
#0  0x804db24 in retry_blockingfilelocklist ()
    at /usr/src/usr.sbin/rpc.lockd/lockd_lock.c:1263
1263				LIST_INSERT_BEFORE(nfl, ifl, nfslocklist);
(gdb) print nfl
$1 = (struct file_lock *) 0x0
(gdb) print ifl
$2 = (struct file_lock *) 0x8065800
(gdb) print nfslocklist
No symbol "nfslocklist" in current context.
(gdb) bt
#0  0x804db24 in retry_blockingfilelocklist ()
    at /usr/src/usr.sbin/rpc.lockd/lockd_lock.c:1263
#1  0x804de5d in unlock_partialfilelock (fl=0xbfbff1ec)
    at /usr/src/usr.sbin/rpc.lockd/lockd_lock.c:1511
#2  0x804e2ad in do_unlock (fl=0xbfbff1ec)
    at /usr/src/usr.sbin/rpc.lockd/lockd_lock.c:1767
#3  0x804e5e1 in unlock (lock=0xbfbff6cc, flags=2)
    at /usr/src/usr.sbin/rpc.lockd/lockd_lock.c:1946
#4  0x804c310 in nlm4_unlock_4_svc (arg=0xbfbff6c4, rqstp=0xbfbffc24)
    at /usr/src/usr.sbin/rpc.lockd/lock_proc.c:1114
#5  0x804ac36 in nlm_prog_4 (rqstp=0xbfbffc24, transp=0x805c000)
    at nlm_prot_svc.c:434
#6  0x280d5c25 in svc_getreq_common () from /usr/lib/libc.so.5
#7  0x280d5a28 in svc_getreqset () from /usr/lib/libc.so.5
#8  0x2809fff4 in svc_run () from /usr/lib/libc.so.5
#9  0x804afdc in main (argc=1, argv=0xbfbffdd8)
    at /usr/src/usr.sbin/rpc.lockd/lockd.c:207
#10 0x80498eb in _start ()
(gdb) quit

Script done on Tue Jan 15 10:05:58 2002

Hope this helps,
Thomas.
Comment 2 Sheldon Hearn freebsd_committer freebsd_triage 2002-01-15 09:54:24 UTC
Responsible Changed
From-To: freebsd-bugs->alfred

Over to our rpc.lockd maintainer.
Comment 3 mikem 2002-01-16 00:44:00 UTC
Alfred, I took a look at retry_blockingfilelocklist() and the solution seemed simple enough. Please correct me if I am wrong. It seems said routine doesn't take into account boundary conditions when putting back file_lock entries into the blocked lock-list. Specifically, it fails when the file_lock being put back is the last element in the list, and when it is the only element in the list. I've included a patch below. 

Basically, it introduces another variable: pfl, which keeps track of the list item before ifl. That way if nfl is NULL, ifl gets inserted after pfl. If pfl is also NULL, then it gets inserted at the head of the list (since it was the only element in the list).
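
For illustration only, here is a minimal standalone sketch of the same idea. It is not the rpc.lockd code: struct demo_lock, its granted flag, reinsert_after_prev() and retry_all() are hypothetical names made up for this example, and it assumes a <sys/queue.h> that provides the LIST_* macros. The loop remembers the last element it put back, so reinsertion never has to dereference a NULL next pointer the way LIST_INSERT_BEFORE(nfl, ...) does when nfl is NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for struct file_lock; only the linkage matters. */
struct demo_lock {
	int granted;			/* nonzero: lock granted, leaves the list */
	LIST_ENTRY(demo_lock) link;
};

LIST_HEAD(demo_head, demo_lock);

/*
 * Put elem back right after prev. prev is the last element that was
 * reinserted, or NULL when nothing precedes elem in the list.
 */
static void
reinsert_after_prev(struct demo_head *head, struct demo_lock *prev,
    struct demo_lock *elem)
{
	assert(elem != NULL);
	if (prev != NULL)
		LIST_INSERT_AFTER(prev, elem, link);
	else
		LIST_INSERT_HEAD(head, elem, link);
}

/*
 * Walk the list the way the retry loop does: unlink each element, then
 * either drop it (granted) or put it back in its old position.
 * Returns the number of elements that stayed in the list.
 */
static int
retry_all(struct demo_head *head)
{
	struct demo_lock *cur, *next, *prev = NULL;
	int kept = 0;

	for (cur = LIST_FIRST(head); cur != NULL; cur = next) {
		/* Save the successor before unlinking cur. */
		next = LIST_NEXT(cur, link);
		LIST_REMOVE(cur, link);
		if (cur->granted)
			continue;	/* dropped; prev stays where it was */
		reinsert_after_prev(head, prev, cur);
		prev = cur;
		kept++;
	}
	return (kept);
}
```

Calling retry_all() on a list whose only element is still blocked goes through the LIST_INSERT_HEAD branch, and a blocked last element goes through LIST_INSERT_AFTER; neither case touches a NULL next pointer. Note that the sketch advances prev only when an element is actually reinserted, which is a simpler bookkeeping rule than the one in the patch below.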

Thomas, could you give it a try and see if it solves your problems?


cheers,
mike makonnen

Index: rpc.lockd/lockd_lock.c
===================================================================
RCS file: /FreeBSD/ncvs/src/usr.sbin/rpc.lockd/lockd_lock.c,v
retrieving revision 1.6
diff -u -r1.6 lockd_lock.c
--- rpc.lockd/lockd_lock.c	2 Dec 2001 11:10:46 -0000	1.6
+++ rpc.lockd/lockd_lock.c	15 Jan 2002 21:37:16 -0000
@@ -1226,11 +1226,12 @@
 retry_blockingfilelocklist(void)
 {
 	/* Retry all locks in the blocked list */
-	struct file_lock *ifl, *nfl; /* Iterator */
+	struct file_lock *ifl, *nfl, *pfl; /* Iterator */
 	enum partialfilelock_status pflstatus;
 
 	debuglog("Entering retry_blockingfilelocklist\n");
 
+	pfl = NULL;
 	ifl = LIST_FIRST(&blockedlocklist_head);
 	debuglog("Iterator choice %p\n",ifl);
 
@@ -1241,6 +1242,7 @@
 		 */
 		nfl = LIST_NEXT(ifl, nfslocklist);
 		debuglog("Iterator choice %p\n",ifl);
+		debuglog("Prev iterator choice %p\n",pfl);
 		debuglog("Next iterator choice %p\n",nfl);
 
 		/*
@@ -1260,11 +1262,24 @@
 		} else {
 			/* Reinsert lock back into same place in blocked list */
 			debuglog("Replacing blocked lock\n");
-			LIST_INSERT_BEFORE(nfl, ifl, nfslocklist);
+			if (nfl == NULL)
+				/* ifl is the last elem. in the list */
+				if (pfl == NULL)
+					/* ifl is the only elem. in the list */
+					LIST_INSERT_HEAD(&blockedlocklist_head, ifl, nfslocklist);
+				else
+					LIST_INSERT_AFTER(pfl, ifl, nfslocklist);
+			else
+				LIST_INSERT_BEFORE(nfl, ifl, nfslocklist);
 		}
 
 		/* Valid increment behavior regardless of state of ifl */
 		ifl = nfl;
+		/* if a lock was granted incrementing pfl would make it nfl */
+		if (pfl != NULL && (LIST_NEXT(pfl, nfslocklist) != nfl))
+			pfl = LIST_NEXT(pfl, nfslocklist);
+		else
+			pfl = LIST_FIRST(&blockedlocklist_head);
 	}
 
 	debuglog("Exiting retry_blockingfilelocklist\n");

Comment 4 mikem 2002-01-16 04:44:05 UTC
Thomas, please use this patch instead.
The previous patch was correct, but this one is cleaner :)

cheers,
mike makonnen

Index: rpc.lockd/lockd_lock.c
===================================================================
RCS file: /FreeBSD/ncvs/src/usr.sbin/rpc.lockd/lockd_lock.c,v
retrieving revision 1.6
diff -u -r1.6 lockd_lock.c
--- rpc.lockd/lockd_lock.c	2 Dec 2001 11:10:46 -0000	1.6
+++ rpc.lockd/lockd_lock.c	16 Jan 2002 04:16:51 -0000
@@ -1226,11 +1226,12 @@
 retry_blockingfilelocklist(void)
 {
 	/* Retry all locks in the blocked list */
-	struct file_lock *ifl, *nfl; /* Iterator */
+	struct file_lock *ifl, *nfl, *pfl; /* Iterator */
 	enum partialfilelock_status pflstatus;
 
 	debuglog("Entering retry_blockingfilelocklist\n");
 
+	pfl = NULL;
 	ifl = LIST_FIRST(&blockedlocklist_head);
 	debuglog("Iterator choice %p\n",ifl);
 
@@ -1241,6 +1242,7 @@
 		 */
 		nfl = LIST_NEXT(ifl, nfslocklist);
 		debuglog("Iterator choice %p\n",ifl);
+		debuglog("Prev iterator choice %p\n",pfl);
 		debuglog("Next iterator choice %p\n",nfl);
 
 		/*
@@ -1260,11 +1262,20 @@
 		} else {
 			/* Reinsert lock back into same place in blocked list */
 			debuglog("Replacing blocked lock\n");
-			LIST_INSERT_BEFORE(nfl, ifl, nfslocklist);
+			if (pfl != NULL)
+				LIST_INSERT_AFTER(pfl, ifl, nfslocklist);
+			else
+				/* ifl is the only elem. in the list */
+				LIST_INSERT_HEAD(&blockedlocklist_head, ifl, nfslocklist);
 		}
 
 		/* Valid increment behavior regardless of state of ifl */
 		ifl = nfl;
+		/* if a lock was granted incrementing pfl would make it nfl */
+		if (pfl != NULL && (LIST_NEXT(pfl, nfslocklist) != nfl))
+			pfl = LIST_NEXT(pfl, nfslocklist);
+		else
+			pfl = LIST_FIRST(&blockedlocklist_head);
 	}
 
 	debuglog("Exiting retry_blockingfilelocklist\n");
Comment 5 Thomas Quinot 2002-01-16 13:28:05 UTC
On 2002-01-15, Mike Makonnen wrote:

> Thomas, could you give it a try and see if it solves your problems?

Looks way better. I have just patched and rebooted with the new
rpc.lockd, and after 15 minutes of uptime I have not had an rpc.lockd
core dump yet. I was able to open mailboxes and deliver email from a
Solaris NFS client. I'll follow up after a few more hours of testing.

Thanks for your prompt help!
Thomas.

-- 
    Thomas.Quinot@Cuivre.FR.EU.ORG
Comment 6 Alfred Perlstein freebsd_committer freebsd_triage 2002-01-17 00:13:10 UTC
State Changed
From-To: open->closed

Mike Makonnen's patch seems good. 
Fixed in revision 1.7 of src/usr.sbin/rpc.lockd/lockd_lock.c