On a very recent -CURRENT build the following LOR is generated on a kyua test run.
kernel: lock order reversal:
 1st 0xffff000000ef4cd8 in6_multi_sx (in6_multi_sx, sx) @ /tank/nfs_public/tiny/src/sys/netinet6/in6_mcast.c:1184
lagg0: link state changed to DOWN
tap1: link state changed to DOWN
tap0: link state changed to DOWN
 2nd 0xffffa000c35e7a28 if_lagg sx (if_lagg sx, sx) @ /tank/nfs_public/tiny/src/sys/net/if_lagg.c:1678
lock order in6_multi_sx -> if_lagg sx attempted at:
#0 0xffff000000570d90 at witness_checkorder+0xc54
#1 0xffff00000050dc30 at _sx_xlock+0x7c
#2 0xffff0000a1d0ec80 at lagg_ioctl+0xbc
#3 0xffff000000612d80 at if_addmulti+0x3b4
#4 0xffff0000006af094 at in6_joingroup_locked+0x188
#5 0xffff0000006aeedc at in6_joingroup+0x5c
#6 0xffff0000006a6718 at in6_update_ifa+0xe88
#7 0xffff0000006ac608 at in6_ifattach+0x4f4
#8 0xffff0000006a8218 at in6_if_up+0x9c
#9 0xffff0000006c6b84 at nd6_ioctl+0x684
#10 0xffff000000613668 at ifioctl+0x528
#11 0xffff0000005763b8 at kern_ioctl+0x2ec
#12 0xffff000000576080 at sys_ioctl+0x144
#13 0xffff0000008241f4 at do_el0_sync+0x7dc
#14 0xffff000000803a24 at handle_el0_sync+0x90
The same LOR also happens with in_multi_sx:
kernel: lock order reversal:
 1st 0xffff000000ee0d30 in_multi_sx (in_multi_sx, sx) @ /tank/nfs_public/tiny/src/sys/netinet/in_mcast.c:1212
 2nd 0xffffa00069351e28 if_lagg sx (if_lagg sx, sx) @ /tank/nfs_public/tiny/src/sys/net/if_lagg.c:1678
lock order in_multi_sx -> if_lagg sx attempted at:
#0 0xffff000000570d90 at witness_checkorder+0xc54
#1 0xffff00000050dc30 at _sx_xlock+0x7c
#2 0xffff0000a1d0ec80 at lagg_ioctl+0xbc
#3 0xffff000000612d80 at if_addmulti+0x3b4
#4 0xffff000000656d2c at in_joingroup_locked+0x23c
#5 0xffff000000656ac0 at in_joingroup+0x58
#6 0xffff000000651b00 at in_control+0xcdc
#7 0xffff000000613668 at ifioctl+0x528
#8 0xffff0000005763b8 at kern_ioctl+0x2ec
#9 0xffff000000576080 at sys_ioctl+0x144
#10 0xffff0000008241f4 at do_el0_sync+0x7dc
#11 0xffff000000803a24 at handle_el0_sync+0x90
This is still happening in the latest HEAD. According to Jenkins this first happened after https://reviews.freebsd.org/D26254, but that is unlikely to be the cause of the LOR. As this has been failing the tests since October, I wonder if we should add an atf_skip in CI?
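For illustration, a CI-only skip in an atf-sh test body could look roughly like the sketch below. This is not the actual contents of the test: the body name `witness_body` is inferred from the test case name, the `ci` configuration variable is an assumption about how the CI harness marks its runs, and the fallback definitions at the top are hypothetical stand-ins so the snippet also runs under plain sh.

```shell
#!/bin/sh
# Sketch only. Under Kyua/atf-sh the atf_* functions come from the framework;
# the fallbacks below are hypothetical stand-ins for running with plain sh.
if ! command -v atf_skip >/dev/null 2>&1; then
	# Stand-in: use the CI environment variable, else the given default.
	atf_config_get() { echo "${CI:-$2}"; }
	# Stand-in: report the reason and stop the test body.
	atf_skip() { echo "skipped: $*"; exit 0; }
fi

witness_body()
{
	# Assumption: the CI harness sets the "ci" config variable to "true".
	if [ "$(atf_config_get ci false)" = "true" ]; then
		atf_skip "https://bugs.freebsd.org/251726"
	fi

	# The existing test steps would follow here (placeholder).
	echo "running witness test"
}
```

In a real test file the case would also be declared with `atf_test_case witness` and registered in `atf_init_test_cases`. Gating on a CI-only variable keeps the test runnable for anyone trying to reproduce the LOR locally.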
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=ee231b27fff9d6950bf36a9800c02f6474b53139

commit ee231b27fff9d6950bf36a9800c02f6474b53139
Author:     Alex Richardson <arichardson@FreeBSD.org>
AuthorDate: 2021-03-19 18:35:04 +0000
Commit:     Alex Richardson <arichardson@FreeBSD.org>
CommitDate: 2021-03-19 18:35:06 +0000

    Also skip sys/net/if_lagg_test:witness on non-i386

    The LOR also happens on amd64 and other architectures. Ideally we would
    fix this. However, in order to get Jenkins green again to catch real
    regressions, we should skip this test for now.

    PR: 251726
    Reviewed By: lwhsu
    Differential Revision: https://reviews.freebsd.org/D29341

 tests/sys/net/if_lagg_test.sh | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
Just checked it and it still occurs in a recent -CURRENT from today.