Bug 238707 - Lock order reversal: rtentry vs "nd6 list"
Summary: Lock order reversal: rtentry vs "nd6 list"
Status: Closed Overcome By Events
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 12.2-STABLE
Hardware: Any Any
: --- Affects Some People
Assignee: freebsd-net (Nobody)
URL:
Keywords: needs-qa
Depends on:
Blocks:
 
Reported: 2019-06-19 15:05 UTC by Arnaud Ysmal
Modified: 2023-06-05 13:02 UTC (History)
3 users (show)

See Also:
koobs: mfc-stable13-
koobs: mfc-stable12?
koobs: mfc-stable11?


Attachments
Patch to move the unlock of the rtentry just before the call of rt_notifydelete (663 bytes, patch)
2019-06-19 15:05 UTC, Arnaud Ysmal
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Arnaud Ysmal 2019-06-19 15:05:49 UTC
Created attachment 205221 [details]
Patch to move the unlock of the rtentry just before the call of rt_notifydelete

Hi,

I encounter the following lor in the function rtrequest1_fib (in the RTM_DELETE case of the switch).

When deleting a rtentry, it is returned locked by rt_unlinkrte then during the call to rt_notifydelete the "nd6 list" lock is locked.

In the RTM_ADD switch case, the old_rt in unlink by rt_unlinkrte then unlocked and passed to the rt_notifydelete, with this I assume there is no issue to call rt_notifydelete with a rtentry unlocked.

The patch enclosed move the unlock of the rtentry just before the call of rt_notifydelete.

The witness trace is the following:
[2019-05-01 23:21:12] lock order reversal:
[2019-05-01 23:21:12] 1st 0xfffff80036683998 rtentry (rtentry) @ net/route.c:1230
[2019-05-01 23:21:12] 2nd 0xffffffff8147fe68 nd6 list (nd6 list) @ netinet6/nd6_rtr.c:545
[2019-05-01 23:21:12] stack backtrace:
[2019-05-01 23:21:12] #0 0xffffffff80544e10 at witness_debugger+0x70
[2019-05-01 23:21:12] #1 0xffffffff80544d10 at witness_checkorder+0xc90
[2019-05-01 23:21:12] #2 0xffffffff804ecfda at __rw_rlock+0x3a
[2019-05-01 23:21:12] #3 0xffffffff8068f798 at defrouter_lookup+0x28
[2019-05-01 23:21:12] #4 0xffffffff80689b40 at nd6_rtrequest+0x50
[2019-05-01 23:21:12] #5 0xffffffff805f4a66 at rtrequest1_fib+0x286
[2019-05-01 23:21:12] #6 0xffffffff805f89ff at route_output+0x77f
[2019-05-01 23:21:12] #7 0xffffffff8056ef7f at sosend_generic+0x43f
[2019-05-01 23:21:12] #8 0xffffffff8054f90d at soo_write+0x2d
[2019-05-01 23:21:12] #9 0xffffffff80548b0a at dofilewrite+0x8a
[2019-05-01 23:21:12] #10 0xffffffff805487e8 at kern_writev+0x68
[2019-05-01 23:21:12] #11 0xffffffff80548776 at sys_write+0x86
[2019-05-01 23:21:12] #12 0xffffffff80753bf9 at amd64_syscall+0x299
[2019-05-01 23:21:12] #13 0xffffffff80734bed at fast_syscall_common+0x101
Comment 1 Alexander V. Chernikov freebsd_committer freebsd_triage 2019-06-19 23:31:13 UTC
Thank you for reporting the LOR and providing the patch!

The fix might be slightly more complicated, as nothing stops another thread changing the gateway of the same rte to change rt->rt_ifa so we would call the handler on the new if a instead of the old one. I'm testing a slightly different patch and will hopefully commit it tomorrow.
Comment 2 Arnaud Ysmal 2019-07-31 08:44:24 UTC
Is it me or ifa_request is only used by nd6 and only for the delete case?

By "slightly different patch" do you mean something like, completely removing ifa_rtrequest from the struct ifaddr?
Comment 3 Mark Johnston freebsd_committer freebsd_triage 2020-06-02 14:24:18 UTC
(In reply to Alexander V. Chernikov from comment #1)
Alexander, has this been fixed in your recent work on the routing subsystem?
Comment 4 Alexander V. Chernikov freebsd_committer freebsd_triage 2020-06-02 19:58:36 UTC
Maybe. Thanks for reminding me about this change, it slipped through the cracks.
I plan to remove ifa_rtequest callback by the end of this week and will ensure that this LOR won't happen anymore.
Comment 5 Alexander V. Chernikov freebsd_committer freebsd_triage 2021-03-31 21:18:37 UTC
For the record, there is no rtentry lock anymore in 13+.

However, it may still be the case with 11/12.
I don't currently have cycles to look into 12/ code, so returning to the pool
Comment 6 Kubilay Kocak freebsd_committer freebsd_triage 2021-04-02 02:39:15 UTC
^Triage: 

- Update Version to reflect latest stable branch this remains an issue in
- Mark stable/13 not applicable (OBE)

@Alexander You're a (the?) primary contributor to this area of the code in recent times, and a good candidate to determine whether we should resolve this issue OBE, as it's unlikely to be resolved for 12/11 unless you're able to help identify/coordinate others who might be able contribute to its actual resolution (by way of code commit)
Comment 7 Alexander V. Chernikov freebsd_committer freebsd_triage 2023-06-05 13:02:21 UTC
Sorry for the belated reply.
I certainly won't have any cycles to look into that. There are 6 months left before 12.4 EOL. Given that no-one else expressed any willingness to address it in ~3 yeards, I'm going to close it as OBE.
Please feel free to reopen if someone gets cycles to land & test the fix.