Bug 203134 - [nfs] [nlm] [lor] newnfs/allproc lock order reversal on NFS mount recovery
Status: Closed DUPLICATE of bug 203133
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 10.2-BETA2
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: freebsd-fs (Nobody)
Reported: 2015-09-15 18:09 UTC by Gavin Atkinson
Modified: 2015-09-20 21:41 UTC

Description Gavin Atkinson freebsd_committer freebsd_triage 2015-09-15 18:09:26 UTC
The host is running 10.2-BETA2 at r285810 and has four NFS mounts from two separate NetApp filers.  One of those filers was rebooted, which caused the mount to stop responding (and produced a LOR which I believe is unrelated, see bug 203133).  When the filer became available again, the client logged this:

Sep 15 07:14:48 client kernel: lock order reversal:
Sep 15 07:14:48 client kernel: 1st 0xfffff8000d704d50 newnfs (newnfs) @ /space/freebsd/stable/10/sys/nlm/nlm_advlock.c:500
Sep 15 07:14:48 client kernel: 2nd 0xffffffff81c694d8 allproc (allproc) @ /space/freebsd/stable/10/sys/kern/kern_proc.c:309
Sep 15 07:14:48 client kernel: KDB: stack backtrace:
Sep 15 07:14:48 client kernel: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0469a137a0
Sep 15 07:14:48 client kernel: kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe0469a13850
Sep 15 07:14:48 client kernel: witness_checkorder() at witness_checkorder+0xe24/frame 0xfffffe0469a138e0
Sep 15 07:14:48 client kernel: _sx_slock() at _sx_slock+0x76/frame 0xfffffe0469a13920
Sep 15 07:14:48 client kernel: pfind() at pfind+0x22/frame 0xfffffe0469a13940
Sep 15 07:14:48 client kernel: nlm_set_creds_for_lock() at nlm_set_creds_for_lock+0xb4/frame 0xfffffe0469a13970
Sep 15 07:14:48 client kernel: nlm_client_recover_lock() at nlm_client_recover_lock+0x61/frame 0xfffffe0469a139b0
Sep 15 07:14:48 client kernel: lf_iteratelocks_sysid() at lf_iteratelocks_sysid+0x194/frame 0xfffffe0469a13a10
Sep 15 07:14:48 client kernel: nlm_client_recovery() at nlm_client_recovery+0x51/frame 0xfffffe0469a13a50
Sep 15 07:14:48 client kernel: nlm_client_recovery_start() at nlm_client_recovery_start+0x35/frame 0xfffffe0469a13a70
Sep 15 07:14:48 client kernel: fork_exit() at fork_exit+0x84/frame 0xfffffe0469a13ab0
Sep 15 07:14:48 client kernel: fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0469a13ab0
Sep 15 07:14:48 client kernel: --- trap 0xc, rip = 0x800cb747a, rsp = 0x7fffffffeb78, rbp = 0x7fffffffed00 ---
Sep 15 07:14:49 client rpc.statd: Unsolicited notification from host netapp-14
Sep 15 07:14:51 client rpc.statd: Unsolicited notification from host netapp-126
Sep 15 07:14:57 client kernel: newnfs server netapp:/vol/filestore: is alive again
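
For anyone comparing this report against bug 203133, the two lock lines can be pulled out of the log mechanically. The following is only a rough sketch: the regular expression is written against the lines quoted above, and `parse_lor` and the dictionary keys are names made up for this example, not part of any FreeBSD tool.

```python
import re

# Pattern written against the "1st"/"2nd" lines quoted in this report.
# The field names below are labels chosen for this sketch only.
LOR_LINE = re.compile(
    r"kernel: (1st|2nd) (0x[0-9a-f]+) (.+?) \((.+?)\) @ (\S+):(\d+)$"
)


def parse_lor(lines):
    """Return a dict keyed by '1st'/'2nd' describing each lock line found."""
    locks = {}
    for line in lines:
        m = LOR_LINE.search(line.strip())
        if m:
            which, addr, name, label, path, lineno = m.groups()
            locks[which] = {
                "addr": addr,
                "name": name,      # lock name as printed
                "label": label,    # parenthesised label printed after the name
                "file": path,
                "line": int(lineno),
            }
    return locks


log = [
    "Sep 15 07:14:48 client kernel: lock order reversal:",
    "Sep 15 07:14:48 client kernel: 1st 0xfffff8000d704d50 newnfs (newnfs) @ /space/freebsd/stable/10/sys/nlm/nlm_advlock.c:500",
    "Sep 15 07:14:48 client kernel: 2nd 0xffffffff81c694d8 allproc (allproc) @ /space/freebsd/stable/10/sys/kern/kern_proc.c:309",
]

locks = parse_lor(log)
print(locks["1st"]["name"], "->", locks["2nd"]["name"])
```

This only identifies which pair of locks was reported and where the second acquisition happened; it says nothing about where the opposite order was established.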

Unfortunately, I'm unlikely to be able to recreate this one at will.

Gavin
Comment 1 Rick Macklem freebsd_committer freebsd_triage 2015-09-20 21:41:08 UTC
The info below doesn't seem to show where the locks are acquired
in the other order. The NFS client (for NFSv4) and the NLM first lock
the NFS vnode and then the "proc" related lock(s).

Since I do not believe the NFS subsystem and NLM (not really a part of
NFS, but a separate protocol) ever first lock the proc structure and
then a vnode, I don't think any deadlock can occur.
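
To illustrate the reasoning, here is a toy lock-order checker. It is a minimal sketch of the general idea only, not how WITNESS is implemented: it tracks direct pairs and nothing else, and the `LockOrderChecker` class and the "earlier code path" are hypothetical. It shows that a reversal is reported only once both orders have been seen, and that the NLM recovery path on its own establishes just one order.

```python
class LockOrderChecker:
    """Toy checker: remembers which lock was held when another was taken."""

    def __init__(self):
        self.orders = set()    # (first, second) pairs observed so far
        self.reversals = []    # pairs that contradicted an earlier order

    def acquire(self, held, new):
        """Record that `new` is acquired while every lock in `held` is held."""
        for h in held:
            if (new, h) in self.orders:
                # The opposite order was seen before: report a reversal.
                self.reversals.append((h, new))
            self.orders.add((h, new))


# Only the path from the backtrace: the newnfs vnode lock is held and
# pfind() then takes allproc. One order, so nothing to report.
nlm_only = LockOrderChecker()
nlm_only.acquire(held=["newnfs"], new="allproc")

# Hypothetical: some other code path took allproc first and then an NFS
# vnode lock. Only with that earlier order on record is a reversal flagged.
checker = LockOrderChecker()
checker.acquire(held=["allproc"], new="newnfs")   # assumed earlier order
checker.acquire(held=["newnfs"], new="allproc")   # order from the backtrace

print(nlm_only.reversals)
print(checker.reversals)
```

A reported reversal is a necessary condition for this kind of deadlock, not a sufficient one: two threads would actually have to take the same two locks in opposite orders at the same time.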

If someone knows of a way that the generic kernel code could lock an
NFS client vnode after acquiring a proc lock, please let me know.

I do know that it isn't practical to "fix" these LORs, but I do not
believe that they can cause deadlocks.

If I find out where harmless LORs are listed, I'll add these.

rick
ps: Although 203133 isn't the same LOR, the same story applies to both.

*** This bug has been marked as a duplicate of bug 203133 ***