Bug 254596 - if_bridge wants LRO turned off, if_vlan insists it remain on
Summary: if_bridge wants LRO turned off, if_vlan insists it remain on
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 12.2-RELEASE
Hardware: Any Any
: --- Affects Some People
Assignee: Ed Maste
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-03-27 03:57 UTC by paul vixie
Modified: 2024-06-14 04:17 UTC (History)
15 users (show)

See Also:


Attachments
untested patch (1.26 KB, patch)
2023-05-08 21:05 UTC, paul vixie
no flags Details | Diff
second patch, same content, but in "git format-patch" format (1.81 KB, text/plain)
2023-05-12 16:50 UTC, paul vixie
no flags Details
partial FreeBSD-EN-ERRATA_TEMPLATE as requested (4.40 KB, text/plain)
2023-05-12 20:46 UTC, paul vixie
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description paul vixie 2021-03-27 03:57:31 UTC
today i added a vlan subinterface to a bridge, and dmesg reported:

bridge1: can't disable some capabilities on igb0.203: 0x400

this is IFCAP_LRO. bridge(4) says:

"on all interfaces added to the bridge ... The LRO capability is always disabled."

wikipedia(Large receive offload) says:

"LRO should not operate on machines acting as routers, as it breaks the end-to-end principle and can significantly impact performance."

sys/net/if_vlan.c says:

        /*
         * If the parent interface can do LRO and checksum offloading on
         * VLANs, then guess it may do LRO on VLANs.  False positive here
         * cost nothing, while false negative may lead to some confusions.
         */
        if (p->if_capabilities & IFCAP_VLAN_HWCSUM)
                cap |= p->if_capabilities & IFCAP_LRO;
        if (p->if_capenable & IFCAP_VLAN_HWCSUM)
                ena |= p->if_capenable & IFCAP_LRO;

to get LRO to not be sticky-on for a vlan subinterface, i would have to remove LRO from the parent interface. i don't think this is functioning as intended, because i have other non-bridged subinterfaces for which LRO is appropriate.

i am not posting a candidate fix because too many layers are being crossed here and there's not an obvious way to relax this deep constraint. someone with recent kernel architecture chops should take a look.
Comment 1 Kubilay Kocak freebsd_committer freebsd_triage 2021-03-27 04:08:39 UTC
@Alexander Anyone else you suggest might be suitable/able to advise on this issue?
Comment 2 Kyle Evans freebsd_committer freebsd_triage 2023-05-07 19:58:57 UTC
Maybe CC trouble as well, I think someone mentioned that he's been doing some bridge work lately.
Comment 3 dfr 2023-05-08 07:59:06 UTC
I'm pretty sure I came across this on a truenas system where the physical interface is on the trunk and various vlans are derived from that for service jails. I had absolutely terrible throughput for one service and ended up just disabling LRO altogether.
Comment 4 paul vixie 2023-05-08 21:05:26 UTC
Created attachment 242067 [details]
untested patch
Comment 5 Ed Maste freebsd_committer freebsd_triage 2023-05-11 14:52:28 UTC
(In reply to paul vixie from comment #4)
Would you kindly upload a git format-patch style patch or otherwise provide the email and name you would like used for Git's --author
Comment 6 paul vixie 2023-05-11 15:48:28 UTC
(In reply to Ed Maste from comment #5)

since it doesn't seem unreasonable to you, i'll do what you ask, but first i will test it.

i think the patch is unreasonable and that it would be better to strike these two "if" statements than to containerize them to the non-bridge case. however, what i'm actually proposing is less controversial in that it's unable to break a working configuration. removing the LRO logic (those two if statements) could indeed break working configurations.
Comment 7 Naman Sood 2023-05-11 23:32:52 UTC
For what it's worth, I was able to test the patch and observe that LRO is disabled when adding a VLAN subinterface to a bridge, and re-enabled when removing it. Unfortunately, as the PRs Ed linked above show, the e1000 driver (what I have in my machine) doesn't actually support LRO even though it claims to, so I can't test that it meaningfully changes performance.
Comment 8 paul vixie 2023-05-12 16:50:23 UTC
Created attachment 242133 [details]
second patch, same content, but in "git format-patch" format
Comment 9 paul vixie 2023-05-12 16:51:46 UTC
(In reply to Naman Sood from comment #7)
i've now tested this patch and it does what i need. the LRO option remains on for the physical interface and for the vlan subinterfaces which are not added to a bridge. i did not performance-test the physical interface or other subinterfaces to see what impact LRO was having, but input coalescence is a theoretical boon and i'm glad to not have to turn off LRO on the physical interface in order to protect the bridge members from having LRO turned on inappropriately.

the main motivation is to avoid input coalescence on the bridge member, since the other bridge members are tap(4) interfaces each belonging to a bhyve guest (where the interface shows up as a vtnet(4) interface.) input coalescence in this case is not just a performance problem it damages the input flow and makes the bhyve guest unusable. i was working around this by turning off LRO on the physical interface but this is a price too high: bhyve guests whose bridge is connected not by a physical interface but by a vlan subinterface must "just work".

i am attaching a second patch, this time in "git format-patch" format, since this was requested. it looks the same to me. can this patch be reviewed for inclusion in 13.3 and also pulled into the 13.2 release in the next maintenance patch?
Comment 10 Ed Maste freebsd_committer freebsd_triage 2023-05-12 17:31:08 UTC
> can this patch be reviewed for inclusion in 13.3 and also pulled into the 13.2
> release in the next maintenance patch?

In order to make it into a 13.2 errata update the first two steps will be committing to main and MFCing to stable/13. I'll handle that (likely after a couple of weeks).

In order to help facilitate an errata update, would you be able to provide content for the errata notice? See https://www.freebsd.org/security/errata-template.txt, and in particular sections I through IV.
Comment 11 paul vixie 2023-05-12 20:46:11 UTC
Created attachment 242136 [details]
partial FreeBSD-EN-ERRATA_TEMPLATE as requested

as requested
Comment 12 paul vixie 2023-05-12 20:46:46 UTC
(In reply to Ed Maste from comment #10)
done, see ticket attachments.
Comment 13 dfr 2023-08-11 15:39:44 UTC
Is there anything blocking merging this patch to main? It would be nice to get this fix into the tree before 14.0 branches.
Comment 14 commit-hook freebsd_committer freebsd_triage 2023-08-11 22:52:00 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=5f11a33ceeb385477cb22d9ad5941061c5a26be9

commit 5f11a33ceeb385477cb22d9ad5941061c5a26be9
Author:     Paul Vixie <paul@redbarn.org>
AuthorDate: 2023-08-11 18:17:16 +0000
Commit:     Kristof Provost <kp@FreeBSD.org>
CommitDate: 2023-08-11 22:50:37 +0000

    if_vlan: do not enable LRO for bridge interaces

    If the parent interface is not a bridge and can do LRO and
    checksum offloading on VLANs, then guess it may do LRO on VLANs.
    False positive here cost nothing, while false negative may lead
    to some confusions. According to Wikipedia:

    "LRO should not operate on machines acting as routers, as it breaks
    the end-to-end principle and can significantly impact performance."

    The same reasoning applies to machines acting as bridges.

    PR:             254596
    MFC after:      3 weeks

 sys/net/if_vlan.c | 22 +++++++++++++++-------
 1 file changed, 15 insertions(+), 7 deletions(-)
Comment 15 Kevin Bowling freebsd_committer freebsd_triage 2023-08-12 16:33:10 UTC
The above patch was reverted, a different fix will land momentarily.
Comment 16 commit-hook freebsd_committer freebsd_triage 2023-08-12 16:40:40 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=b1a39c31a3569bd045a0f40057c3773fc8166f6d

commit b1a39c31a3569bd045a0f40057c3773fc8166f6d
Author:     Kevin Bowling <kbowling@FreeBSD.org>
AuthorDate: 2023-08-12 16:31:22 +0000
Commit:     Kevin Bowling <kbowling@FreeBSD.org>
CommitDate: 2023-08-12 16:39:23 +0000

    vlan: Respect IFCAP_LRO mask

    vlan_capabilities(), used by the IFCAP ioctl, was not respecting the
    IFCAP_LRO bit if it was masked by the requestor.

    This prevented if_bridge(4) from automasking LRO with a message like:
    bridge0: can't disable some capabilities on em3.11: 0x400

    This also prevented manually disabling LRO from any vlan interface.

    PR:             254596
    Reported by:    Paul Vixie <paul@redbarn.org>
    MFC after:      1 week

 sys/net/if_vlan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Comment 17 commit-hook freebsd_committer freebsd_triage 2023-08-18 01:00:11 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=f73eb35106cc65d19af0f2e82a111308c5090c3a

commit f73eb35106cc65d19af0f2e82a111308c5090c3a
Author:     Kevin Bowling <kbowling@FreeBSD.org>
AuthorDate: 2023-08-12 16:31:22 +0000
Commit:     Kevin Bowling <kbowling@FreeBSD.org>
CommitDate: 2023-08-18 00:59:30 +0000

    vlan: Respect IFCAP_LRO mask

    vlan_capabilities(), used by the IFCAP ioctl, was not respecting the
    IFCAP_LRO bit if it was masked by the requestor.

    This prevented if_bridge(4) from automasking LRO with a message like:
    bridge0: can't disable some capabilities on em3.11: 0x400

    This also prevented manually disabling LRO from any vlan interface.

    PR:             254596
    Reported by:    Paul Vixie <paul@redbarn.org>

    (cherry picked from commit b1a39c31a3569bd045a0f40057c3773fc8166f6d)

 sys/net/if_vlan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Comment 18 commit-hook freebsd_committer freebsd_triage 2023-08-18 01:01:14 UTC
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=50b52fe14b8949539e7fe583d4578b1f2c96dc59

commit 50b52fe14b8949539e7fe583d4578b1f2c96dc59
Author:     Kevin Bowling <kbowling@FreeBSD.org>
AuthorDate: 2023-08-12 16:31:22 +0000
Commit:     Kevin Bowling <kbowling@FreeBSD.org>
CommitDate: 2023-08-18 01:00:07 +0000

    vlan: Respect IFCAP_LRO mask

    vlan_capabilities(), used by the IFCAP ioctl, was not respecting the
    IFCAP_LRO bit if it was masked by the requestor.

    This prevented if_bridge(4) from automasking LRO with a message like:
    bridge0: can't disable some capabilities on em3.11: 0x400

    This also prevented manually disabling LRO from any vlan interface.

    PR:             254596
    Reported by:    Paul Vixie <paul@redbarn.org>

    (cherry picked from commit b1a39c31a3569bd045a0f40057c3773fc8166f6d)

 sys/net/if_vlan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Comment 19 Kevin Bowling freebsd_committer freebsd_triage 2023-08-18 01:02:14 UTC
Passing to Ed per above comment about EN.
Comment 20 Mark Linimon freebsd_committer freebsd_triage 2024-01-10 03:30:30 UTC
^Triage: to committer: was the EN for 12.3 ever completed?
Comment 21 Mark Linimon freebsd_committer freebsd_triage 2024-06-14 04:17:56 UTC
^Triage: 12.3 has since gone EOL.  The idea of an EN is now OBE.