The following LOR appeared between r272024 and r272450: Oct 3 10:11:38 magellan kernel: lock order reversal: Oct 3 10:11:38 magellan kernel: 1st 0xfffff80006c92008 if_lagg rmlock (if_lagg rmlock) @ /mnt/home/dumbbell/Projects/freebsd/src/SVN/head/sys/modules/if_lagg/../../net/if_lagg.c:1395 Oct 3 10:11:38 magellan kernel: 2nd 0xfffff80029f77990 if_addr_lock (if_addr_lock) @ /mnt/home/dumbbell/Projects/freebsd/src/SVN/head/sys/modules/if_lagg/../../net/if_lagg.c:1499 Oct 3 10:11:38 magellan kernel: KDB: stack backtrace: Oct 3 10:11:38 magellan kernel: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe012185e650 Oct 3 10:11:38 magellan kernel: kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe012185e700 Oct 3 10:11:38 magellan kernel: witness_checkorder() at witness_checkorder+0xdc2/frame 0xfffffe012185e790 Oct 3 10:11:38 magellan kernel: _rw_wlock_cookie() at _rw_wlock_cookie+0x71/frame 0xfffffe012185e7d0 Oct 3 10:11:38 magellan kernel: lagg_ether_cmdmulti() at lagg_ether_cmdmulti+0x5c/frame 0xfffffe012185e810 Oct 3 10:11:38 magellan kernel: lagg_ioctl() at lagg_ioctl+0x114c/frame 0xfffffe012185e8f0 Oct 3 10:11:38 magellan kernel: in_control() at in_control+0x30b/frame 0xfffffe012185e970 Oct 3 10:11:38 magellan kernel: ifioctl() at ifioctl+0xba8/frame 0xfffffe012185ea30 Oct 3 10:11:38 magellan kernel: kern_ioctl() at kern_ioctl+0x22b/frame 0xfffffe012185ea90 Oct 3 10:11:38 magellan kernel: sys_ioctl() at sys_ioctl+0x13c/frame 0xfffffe012185eae0 Oct 3 10:11:38 magellan kernel: amd64_syscall() at amd64_syscall+0x25a/frame 0xfffffe012185ebf0 Oct 3 10:11:38 magellan kernel: Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe012185ebf0 Oct 3 10:11:38 magellan kernel: --- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x8011df81a, rsp = 0x7fffffffe3e8, rbp = 0x7fffffffe460 ---
The lagg0 interface is configured like this in /etc/rc.conf: ifconfig_lagg0="laggproto failover laggport re0 laggport wlan0 SYNCDHCP"
Deadlocks due to this LOR are easy to reproduce. https://reviews.freebsd.org/D6845 has a proposed fix (not reviewed yet). r272211 is mentioned as the possible culprit in that review. This still occurs on 11.0-BETA1 so I've updated the version and added re@ to the CC list.
I do not see a corresponding commit for this. Is it still an issue?
(In reply to Glen Barber from comment #3) No commit has been implemented for this issue. My review/phab thing is a proof of concept that demonstrates that the LOR dissapears if you remove the LOCK, which is kind of dumb. At this time there is no fix for this issue.
I've got a better patch here, but it needs review https://reviews.freebsd.org/D8306
The bug is not in the lagg driver but in the upper layer that held the if_addr_lock while calling into the driver. This review should fix the lagg driver and any other ifnet implementation that needs to get a lock in its if_counter handler: https://reviews.freebsd.org/D8498
A commit references this bug: Author: rstone Date: Sat Nov 12 19:03:24 UTC 2016 New revision: 308580 URL: https://svnweb.freebsd.org/changeset/base/308580 Log: Don't read if_counters with if_addr_lock held Calling into an ifnet implementation with the if_addr_lock already held can cause a LOR and potentially a deadlock, as ifnet implementations typically can take the if_addr_lock after their own locks during configuration. Refactor a sysctl handler that was violating this to read if_counter data in a temporary buffer before the if_addr_lock is taken, and then copying the data in its final location later, when the if_addr_lock is held. PR: 194109 Reported by: Jean-Sebastien Pedron MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D8498 Reviewed by: sbruno Changes: head/sys/net/rtsock.c
A commit references this bug: Author: rstone Date: Sat Nov 26 01:16:33 UTC 2016 New revision: 309177 URL: https://svnweb.freebsd.org/changeset/base/309177 Log: MFC r308580: Don't read if_counters with if_addr_lock held Calling into an ifnet implementation with the if_addr_lock already held can cause a LOR and potentially a deadlock, as ifnet implementations typically can take the if_addr_lock after their own locks during configuration. Refactor a sysctl handler that was violating this to read if_counter data in a temporary buffer before the if_addr_lock is taken, and then copying the data in its final location later, when the if_addr_lock is held. PR: 194109 Reported by: Jean-Sebastien Pedron MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D8498 Reviewed by: sbruno Changes: _U stable/11/ stable/11/sys/net/rtsock.c
*** Bug 216731 has been marked as a duplicate of this bug. ***