Bug 194109

Summary: [lor] if_lagg rmlock <-> if_addr_lock
Product: Base System Reporter: Jean-Sébastien Pédron <dumbbell>
Component: kernAssignee: Ryan Stone <rstone>
Status: Closed FIXED    
Severity: Affects Many People CC: asomers, fbsd, melifaro, np, re, rpokala, rstone, sbruno
Priority: --- Keywords: needs-qa, patch, regression
Version: 11.0-STABLEFlags: koobs: mfc-stable11?
Hardware: Any   
OS: Any   
URL: https://reviews.freebsd.org/D6845

Description Jean-Sébastien Pédron freebsd_committer freebsd_triage 2014-10-03 08:21:31 UTC
The following LOR appeared between r272024 and r272450:

Oct  3 10:11:38 magellan kernel: lock order reversal:
Oct  3 10:11:38 magellan kernel: 1st 0xfffff80006c92008 if_lagg rmlock (if_lagg rmlock) @ /mnt/home/dumbbell/Projects/freebsd/src/SVN/head/sys/modules/if_lagg/../../net/if_lagg.c:1395
Oct  3 10:11:38 magellan kernel: 2nd 0xfffff80029f77990 if_addr_lock (if_addr_lock) @ /mnt/home/dumbbell/Projects/freebsd/src/SVN/head/sys/modules/if_lagg/../../net/if_lagg.c:1499
Oct  3 10:11:38 magellan kernel: KDB: stack backtrace:
Oct  3 10:11:38 magellan kernel: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe012185e650
Oct  3 10:11:38 magellan kernel: kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe012185e700
Oct  3 10:11:38 magellan kernel: witness_checkorder() at witness_checkorder+0xdc2/frame 0xfffffe012185e790
Oct  3 10:11:38 magellan kernel: _rw_wlock_cookie() at _rw_wlock_cookie+0x71/frame 0xfffffe012185e7d0
Oct  3 10:11:38 magellan kernel: lagg_ether_cmdmulti() at lagg_ether_cmdmulti+0x5c/frame 0xfffffe012185e810
Oct  3 10:11:38 magellan kernel: lagg_ioctl() at lagg_ioctl+0x114c/frame 0xfffffe012185e8f0
Oct  3 10:11:38 magellan kernel: in_control() at in_control+0x30b/frame 0xfffffe012185e970
Oct  3 10:11:38 magellan kernel: ifioctl() at ifioctl+0xba8/frame 0xfffffe012185ea30
Oct  3 10:11:38 magellan kernel: kern_ioctl() at kern_ioctl+0x22b/frame 0xfffffe012185ea90
Oct  3 10:11:38 magellan kernel: sys_ioctl() at sys_ioctl+0x13c/frame 0xfffffe012185eae0
Oct  3 10:11:38 magellan kernel: amd64_syscall() at amd64_syscall+0x25a/frame 0xfffffe012185ebf0
Oct  3 10:11:38 magellan kernel: Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe012185ebf0
Oct  3 10:11:38 magellan kernel: --- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x8011df81a, rsp = 0x7fffffffe3e8, rbp = 0x7fffffffe460 ---
Comment 1 Jean-Sébastien Pédron freebsd_committer freebsd_triage 2014-10-03 08:24:23 UTC
The lagg0 interface is configured like this in /etc/rc.conf:
ifconfig_lagg0="laggproto failover laggport re0 laggport wlan0 SYNCDHCP"
Comment 2 Navdeep Parhar freebsd_committer freebsd_triage 2016-07-14 21:19:22 UTC
Deadlocks due to this LOR are easy to reproduce.  https://reviews.freebsd.org/D6845 has a proposed fix (not reviewed yet).  r272211 is mentioned as the possible culprit in that review.

This still occurs on 11.0-BETA1 so I've updated the version and added re@ to the CC list.
Comment 3 Glen Barber freebsd_committer freebsd_triage 2016-08-08 16:21:35 UTC
I do not see a corresponding commit for this.  Is it still an issue?
Comment 4 Sean Bruno freebsd_committer freebsd_triage 2016-08-08 16:27:32 UTC
(In reply to Glen Barber from comment #3)
No commit has been implemented for this issue.  My review/phab thing is a proof of concept that demonstrates that the LOR dissapears if you remove the LOCK, which is kind of dumb.

At this time there is no fix for this issue.
Comment 5 Alan Somers freebsd_committer freebsd_triage 2016-11-11 15:01:15 UTC
I've got a better patch here, but it needs review
https://reviews.freebsd.org/D8306
Comment 6 Ryan Stone freebsd_committer freebsd_triage 2016-11-11 17:46:22 UTC
The bug is not in the lagg driver but in the upper layer that held the if_addr_lock while calling into the driver.  This review should fix the lagg driver and any other ifnet implementation that needs to get a lock in its if_counter handler:

https://reviews.freebsd.org/D8498
Comment 7 commit-hook freebsd_committer freebsd_triage 2016-11-12 19:03:42 UTC
A commit references this bug:

Author: rstone
Date: Sat Nov 12 19:03:24 UTC 2016
New revision: 308580
URL: https://svnweb.freebsd.org/changeset/base/308580

Log:
  Don't read if_counters with if_addr_lock held

  Calling into an ifnet implementation with the if_addr_lock already
  held can cause a LOR and potentially a deadlock, as ifnet
  implementations typically can take the if_addr_lock after their
  own locks during configuration.  Refactor a sysctl handler that
  was violating this to read if_counter data in a temporary buffer
  before the if_addr_lock is taken, and then copying the data
  in its final location later, when the if_addr_lock is held.

  PR: 194109
  Reported by: Jean-Sebastien Pedron
  MFC after: 2 weeks
  Differential Revision:	https://reviews.freebsd.org/D8498
  Reviewed by: sbruno

Changes:
  head/sys/net/rtsock.c
Comment 8 commit-hook freebsd_committer freebsd_triage 2016-11-26 01:16:47 UTC
A commit references this bug:

Author: rstone
Date: Sat Nov 26 01:16:33 UTC 2016
New revision: 309177
URL: https://svnweb.freebsd.org/changeset/base/309177

Log:
  MFC r308580:

    Don't read if_counters with if_addr_lock held

    Calling into an ifnet implementation with the if_addr_lock already
    held can cause a LOR and potentially a deadlock, as ifnet
    implementations typically can take the if_addr_lock after their
    own locks during configuration.  Refactor a sysctl handler that
    was violating this to read if_counter data in a temporary buffer
    before the if_addr_lock is taken, and then copying the data
    in its final location later, when the if_addr_lock is held.

    PR: 194109
    Reported by: Jean-Sebastien Pedron
    MFC after: 2 weeks
    Differential Revision:        https://reviews.freebsd.org/D8498
    Reviewed by: sbruno

Changes:
_U  stable/11/
  stable/11/sys/net/rtsock.c
Comment 9 Alan Somers freebsd_committer freebsd_triage 2017-02-15 17:38:24 UTC
*** Bug 216731 has been marked as a duplicate of this bug. ***