Bug 251726 - LOR between in6_multi_sx and if_lagg sx
Summary: LOR between in6_multi_sx and if_lagg sx
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Some People
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
Reported: 2020-12-10 13:01 UTC by Gordon Bergling
Modified: 2021-03-19 18:40 UTC
CC List: 2 users

See Also:


Description Gordon Bergling freebsd_committer 2020-12-10 13:01:30 UTC
On a very recent -CURRENT build, the following LOR is generated during a kyua test run:

kernel: lock order reversal:
1st 0xffff000000ef4cd8 in6_multi_sx (in6_multi_sx, sx) @ /tank/nfs_public/tiny/src/sys/netinet6/in6_mcast.c:1184
lagg0: link state changed to DOWN
tap1: link state changed to DOWN
tap0: link state changed to DOWN
2nd 0xffffa000c35e7a28 if_lagg sx (if_lagg sx, sx) @ /tank/nfs_public/tiny/src/sys/net/if_lagg.c:1678
lock order in6_multi_sx -> if_lagg sx attempted at:
#0 0xffff000000570d90 at witness_checkorder+0xc54
#1 0xffff00000050dc30 at _sx_xlock+0x7c
#2 0xffff0000a1d0ec80 at lagg_ioctl+0xbc
#3 0xffff000000612d80 at if_addmulti+0x3b4
#4 0xffff0000006af094 at in6_joingroup_locked+0x188
#5 0xffff0000006aeedc at in6_joingroup+0x5c
#6 0xffff0000006a6718 at in6_update_ifa+0xe88
#7 0xffff0000006ac608 at in6_ifattach+0x4f4
#8 0xffff0000006a8218 at in6_if_up+0x9c
#9 0xffff0000006c6b84 at nd6_ioctl+0x684
#10 0xffff000000613668 at ifioctl+0x528
#11 0xffff0000005763b8 at kern_ioctl+0x2ec
#12 0xffff000000576080 at sys_ioctl+0x144
#13 0xffff0000008241f4 at do_el0_sync+0x7dc
#14 0xffff000000803a24 at handle_el0_sync+0x90
Comment 1 Gordon Bergling freebsd_committer 2020-12-10 13:05:06 UTC
The same LOR also happens with in_multi_sx:

kernel: lock order reversal:
1st 0xffff000000ee0d30 in_multi_sx (in_multi_sx, sx) @ /tank/nfs_public/tiny/src/sys/netinet/in_mcast.c:1212
2nd 0xffffa00069351e28 if_lagg sx (if_lagg sx, sx) @ /tank/nfs_public/tiny/src/sys/net/if_lagg.c:1678
lock order in_multi_sx -> if_lagg sx attempted at:
#0 0xffff000000570d90 at witness_checkorder+0xc54
#1 0xffff00000050dc30 at _sx_xlock+0x7c
#2 0xffff0000a1d0ec80 at lagg_ioctl+0xbc
#3 0xffff000000612d80 at if_addmulti+0x3b4
#4 0xffff000000656d2c at in_joingroup_locked+0x23c
#5 0xffff000000656ac0 at in_joingroup+0x58
#6 0xffff000000651b00 at in_control+0xcdc
#7 0xffff000000613668 at ifioctl+0x528
#8 0xffff0000005763b8 at kern_ioctl+0x2ec
#9 0xffff000000576080 at sys_ioctl+0x144
#10 0xffff0000008241f4 at do_el0_sync+0x7dc
#11 0xffff000000803a24 at handle_el0_sync+0x90
Comment 2 Alex Richardson freebsd_committer 2021-03-12 11:25:56 UTC
This is still happening in latest HEAD. According to Jenkins this first happened after https://reviews.freebsd.org/D26254, but that is unlikely to be the cause of the LOR.

As this has been failing the tests since October, I wonder if we should add an atf_skip in CI?
Comment 3 commit-hook freebsd_committer 2021-03-19 18:40:07 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=ee231b27fff9d6950bf36a9800c02f6474b53139

commit ee231b27fff9d6950bf36a9800c02f6474b53139
Author:     Alex Richardson <arichardson@FreeBSD.org>
AuthorDate: 2021-03-19 18:35:04 +0000
Commit:     Alex Richardson <arichardson@FreeBSD.org>
CommitDate: 2021-03-19 18:35:06 +0000

    Also skip sys/net/if_lagg_test:witness on non-i386

    The LOR also happens on amd64 and other architectures. Ideally we would
    fix this. However, in order to get Jenkins green again to catch real
    regressions, we should skip this test for now.

    PR:             251726
    Reviewed By:    lwhsu
    Differential Revision: https://reviews.freebsd.org/D29341

 tests/sys/net/if_lagg_test.sh | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)