Created attachment 205221 [details]
Patch to move the unlock of the rtentry just before the call of rt_notifydelete
I encounter the following lor in the function rtrequest1_fib (in the RTM_DELETE case of the switch).
When deleting a rtentry, it is returned locked by rt_unlinkrte then during the call to rt_notifydelete the "nd6 list" lock is locked.
In the RTM_ADD switch case, the old_rt in unlink by rt_unlinkrte then unlocked and passed to the rt_notifydelete, with this I assume there is no issue to call rt_notifydelete with a rtentry unlocked.
The patch enclosed move the unlock of the rtentry just before the call of rt_notifydelete.
The witness trace is the following:
[2019-05-01 23:21:12] lock order reversal:
[2019-05-01 23:21:12] 1st 0xfffff80036683998 rtentry (rtentry) @ net/route.c:1230
[2019-05-01 23:21:12] 2nd 0xffffffff8147fe68 nd6 list (nd6 list) @ netinet6/nd6_rtr.c:545
[2019-05-01 23:21:12] stack backtrace:
[2019-05-01 23:21:12] #0 0xffffffff80544e10 at witness_debugger+0x70
[2019-05-01 23:21:12] #1 0xffffffff80544d10 at witness_checkorder+0xc90
[2019-05-01 23:21:12] #2 0xffffffff804ecfda at __rw_rlock+0x3a
[2019-05-01 23:21:12] #3 0xffffffff8068f798 at defrouter_lookup+0x28
[2019-05-01 23:21:12] #4 0xffffffff80689b40 at nd6_rtrequest+0x50
[2019-05-01 23:21:12] #5 0xffffffff805f4a66 at rtrequest1_fib+0x286
[2019-05-01 23:21:12] #6 0xffffffff805f89ff at route_output+0x77f
[2019-05-01 23:21:12] #7 0xffffffff8056ef7f at sosend_generic+0x43f
[2019-05-01 23:21:12] #8 0xffffffff8054f90d at soo_write+0x2d
[2019-05-01 23:21:12] #9 0xffffffff80548b0a at dofilewrite+0x8a
[2019-05-01 23:21:12] #10 0xffffffff805487e8 at kern_writev+0x68
[2019-05-01 23:21:12] #11 0xffffffff80548776 at sys_write+0x86
[2019-05-01 23:21:12] #12 0xffffffff80753bf9 at amd64_syscall+0x299
[2019-05-01 23:21:12] #13 0xffffffff80734bed at fast_syscall_common+0x101
Thank you for reporting the LOR and providing the patch!
The fix might be slightly more complicated, as nothing stops another thread changing the gateway of the same rte to change rt->rt_ifa so we would call the handler on the new if a instead of the old one. I'm testing a slightly different patch and will hopefully commit it tomorrow.
Is it me or ifa_request is only used by nd6 and only for the delete case?
By "slightly different patch" do you mean something like, completely removing ifa_rtrequest from the struct ifaddr?
(In reply to Alexander V. Chernikov from comment #1)
Alexander, has this been fixed in your recent work on the routing subsystem?
Maybe. Thanks for reminding me about this change, it slipped through the cracks.
I plan to remove ifa_rtequest callback by the end of this week and will ensure that this LOR won't happen anymore.
For the record, there is no rtentry lock anymore in 13+.
However, it may still be the case with 11/12.
I don't currently have cycles to look into 12/ code, so returning to the pool
- Update Version to reflect latest stable branch this remains an issue in
- Mark stable/13 not applicable (OBE)
@Alexander You're a (the?) primary contributor to this area of the code in recent times, and a good candidate to determine whether we should resolve this issue OBE, as it's unlikely to be resolved for 12/11 unless you're able to help identify/coordinate others who might be able contribute to its actual resolution (by way of code commit)