Bug 282095 - enic breaks when changing MTU on interfaces with fib other than 0
Summary: enic breaks when changing MTU on interfaces with fib other than 0
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 14.1-RELEASE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: Doug Ambrisko
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-10-15 00:47 UTC by Scott Aitken
Modified: 2025-02-14 16:31 UTC
CC List: 2 users

See Also:
linimon: mfc-stable14?
linimon: mfc-stable13?


Description Scott Aitken 2024-10-15 00:47:21 UTC
I'm running a Cisco UCS C220M4 with an MLOM card (2 x 10Gb).  I wanted to test the enic driver for throughput, so I plugged a DAC cable between the ports, installed a fresh version of 14.1-RELEASE and ran the following:

sysctl net.fibs=3
ifconfig enic0 inet 172.12.1.1/24 fib 1
ifconfig enic1 inet 172.12.2.1/24 fib 2
route add 172.12.2.0/24 -iface enic0 -fib 1
route add 172.12.1.0/24 -iface enic1 -fib 2

I could then run ping and iperf3 between the interfaces without any problem (using setfib 1 ...).

As soon as I change the MTU of either interface to 9000 bytes, I can no longer ping; all IP traffic breaks.
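A minimal sketch of how one might probe whether a given MTU actually passes end to end, assuming IPv4 without options (20-byte IP header plus 8-byte ICMP header). The function names are hypothetical, and the script only prints the commands rather than running them. For a full-size probe to pass, both enic0 and enic1 would need the same MTU.

```shell
#!/bin/sh
# Largest ICMP echo payload that fits in one unfragmented IPv4 packet:
# MTU minus the 20-byte IPv4 header (no options) minus the 8-byte ICMP header.
icmp_payload_for_mtu() {
    echo $(( $1 - 20 - 8 ))
}

# Print (do not execute) the commands for changing the MTU and then sending
# a few full-size pings with the Don't Fragment bit set (FreeBSD ping -D).
# Arguments: interface name, FIB number, peer address, MTU.
print_mtu_probe() {
    ifname=$1; fib=$2; peer=$3; mtu=$4
    echo "ifconfig $ifname mtu $mtu"
    echo "setfib $fib ping -D -c 3 -s $(icmp_payload_for_mtu "$mtu") $peer"
}

print_mtu_probe enic0 1 172.12.2.1 9000
```

Comparing a default-size ping with a full-size one helps separate "the interface stopped passing traffic" from "only large frames are dropped".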

PS. The reason I wanted to bump the MTU was I could only get ~3.7Gb/s of UDP across the interfaces:
[  5]   0.00-10.00  sec  4.27 GBytes  3.67 Gbits/sec  0.001 ms  1142296/4280693 (27%)  receiver
About 27% of the datagrams sent were lost before reaching the receiver.  Packets were 1460 bytes.
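The figures in that iperf3 line can be cross-checked with a little arithmetic. This is a rough sketch, assuming 1460-byte UDP payloads as stated and iperf3's decimal units for bit rates. The helper names are made up for illustration.

```shell
#!/bin/sh
# Percentage of datagrams lost, given the lost and total counts.
loss_percent() {
    awk -v lost="$1" -v total="$2" 'BEGIN { printf "%.1f\n", 100 * lost / total }'
}

# Payload bit rate in Gbit/s (10^9 bits) for a datagram count,
# a payload size in bytes, and a duration in seconds.
gbits_from_datagrams() {
    awk -v n="$1" -v size="$2" -v secs="$3" \
        'BEGIN { printf "%.2f\n", n * size * 8 / secs / 1e9 }'
}

lost=1142296
total=4280693
received=$(( total - lost ))

echo "loss:      $(loss_percent "$lost" "$total") %"
echo "received:  $(gbits_from_datagrams "$received" 1460 10) Gbit/s"
echo "offered:   $(gbits_from_datagrams "$total" 1460 10) Gbit/s"
```

Under these assumptions the received rate matches the reported 3.67 Gbits/sec, and the offered load works out to roughly 5 Gbit/s, so the sender was not itself running at line rate.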
Comment 1 Zhenlei Huang freebsd_committer freebsd_triage 2024-10-15 06:59:24 UTC
I have a two-port Chelsio T520-CR card. With the default MTU of 1500, iperf3 easily reaches ~9.6 Gbps in both TCP and UDP mode. For UDP the loss rate is also negligible.

I don't think ~3.7 Gb/s is reasonable for a 10G NIC with the default MTU of 1500. That said, the enic(4) driver was only introduced on Feb 6, 2023, so it is quite new.

I moved cxl1 to a separate vnet for the benchmark. With a multiple-FIB setup the result is similar.

```
# jail -ic vnet persist
1
# ifconfig cxl1 vnet 1
# jexec 1 ifconfig cxl1 inet 192.168.99.1/31
# jexec 1 iperf3 -sD
# ifconfig cxl0 inet 192.168.99.0/31

# iperf3 -ub0 -c 192.168.99.1
Connecting to host 192.168.99.1, port 5201
[  5] local 192.168.99.0 port 19631 connected to 192.168.99.1 port 5201
[ ID] Interval           Transfer     Bitrate         Total Datagrams
[  5]   0.00-1.00   sec  1.11 GBytes  9.56 Gbits/sec  819173  
[  5]   1.00-2.00   sec  1.11 GBytes  9.57 Gbits/sec  819938  
[  5]   2.00-3.00   sec  1.11 GBytes  9.57 Gbits/sec  819165  
[  5]   3.00-4.00   sec  1.11 GBytes  9.57 Gbits/sec  819298  
[  5]   4.00-5.00   sec  1.11 GBytes  9.57 Gbits/sec  819118  
[  5]   5.00-6.00   sec  1.11 GBytes  9.57 Gbits/sec  818958  
[  5]   6.00-7.00   sec  1.11 GBytes  9.56 Gbits/sec  818553  
[  5]   7.00-8.00   sec  1.11 GBytes  9.57 Gbits/sec  819750  
[  5]   8.00-9.00   sec  1.11 GBytes  9.56 Gbits/sec  818532  
[  5]   9.00-10.00  sec  1.11 GBytes  9.58 Gbits/sec  819819  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-10.00  sec  11.1 GBytes  9.57 Gbits/sec  0.000 ms  0/8192304 (0%)  sender
[  5]   0.00-10.00  sec  11.1 GBytes  9.56 Gbits/sec  0.000 ms  743/8189808 (0.0091%)  receiver
```
Comment 2 Zhenlei Huang freebsd_committer freebsd_triage 2024-10-15 07:00:49 UTC
CC the driver author Doug Ambrisko.
Comment 3 Scott Aitken 2024-10-15 07:05:57 UTC
This driver may have multiple issues, and if required I'll raise another for throughput.

To further troubleshoot the MTU change issue, I discovered that traffic doesn't completely stop after changing the MTU; however, a continuous ping with default values results in:

# setfib 1 ping 172.12.2.1
PING 172.12.2.1 (172.12.2.1): 56 data bytes
64 bytes from 172.12.2.1: icmp_seq=0 ttl=64 time=24391.545 ms
64 bytes from 172.12.2.1: icmp_seq=1 ttl=64 time=23332.953 ms
^C
--- 172.12.2.1 ping statistics ---
31 packets transmitted, 2 packets received, 93.5% packet loss
round-trip min/avg/max/stddev = 23332.953/23862.249/24391.545/529.296 ms

This was with a PCI NIC rather than the MLOM:
enic2: <Cisco VIC Ethernet NIC> port 0x4000-0x407f mem 0xc7010000-0xc7017fff,0xc7018000-0xc7019fff irq 44 at device 0.0 numa-domain 0 on pci20
Comment 4 Scott Aitken 2024-10-15 13:55:56 UTC
(In reply to Zhenlei Huang from comment #1)

Same result using jails as you show (instead of the multiple FIBs in my first scenario).  Pings work fine until the MTU is changed, then nothing until a reboot.
Comment 5 Doug Ambrisko freebsd_committer freebsd_triage 2024-10-15 15:12:46 UTC
I haven't tried MTU changes.  I might have to block them, since I thought people on the Linux side said that changing the MTU is not supported without a VIC configuration change (via the BMC UI) and an OS reboot.  I haven't connected the various offloads that the card supports; I'll need to poke around at that.  I should also look at adding devcmd2 support.  Currently the driver supports basic connectivity.  The VIC card is not a typical NIC.  Checksum and VLAN offload should be easy to add, since the API into the card is there but I never attached it to the FreeBSD API.  devcmd2 is a bit more involved since it requires another write queue.  It would be great if you could try it with Linux to compare features.  Then I can dig into it.  It seems that iflib requires pairs of TX/RX queues.  The driver looks at the RX and TX queue configuration and uses the smaller value.  You might check the configuration and see if bumping up the number of queues helps.  I think by default the number of TX queues is only 1.  I haven't looked at SR-IOV.
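The queue-pairing rule described above amounts to taking the minimum of the two configured counts. This is an illustrative sketch only, not the driver's code; the function name and the example counts are hypothetical.

```shell
#!/bin/sh
# Number of usable TX/RX queue pairs when queues must be used in pairs:
# the smaller of the configured RX and TX queue counts.
effective_queue_pairs() {
    rq=$1; wq=$2
    if [ "$rq" -lt "$wq" ]; then
        echo "$rq"
    else
        echo "$wq"
    fi
}

# Example: 4 RX queues but only 1 TX queue configured on the VIC
# leaves a single usable pair.
echo "usable pairs: $(effective_queue_pairs 4 1)"
```

With a rule like this, raising only the RX count would have no effect; the TX count has to be raised as well.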

Thanks for playing with it, hopefully you find it useful.  My main goal was to be able to PXE boot and run machines diskless especially blades that only have this NIC.

If someone wants to look at the offload and send me patches that would be great.
Comment 6 Scott Aitken 2024-10-16 11:06:19 UTC
So the MTU was already set in CIMC to 9K.  I installed Fedora and the Linux driver must read the value since it picked up 9K:
enp19s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
        ether 70:70:8b:77:7e:76  txqueuelen 1000  (Ethernet)
        RX packets 20  bytes 6520 (6.3 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 71  bytes 11070 (10.8 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Going back to CIMC and setting it to 4K, and after a reboot:
enp19s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 4000
        inet6 fe80::7270:8bff:fe77:7e76  prefixlen 64  scopeid 0x20<link>
        ether 70:70:8b:77:7e:76  txqueuelen 1000  (Ethernet)
        RX packets 14  bytes 4564 (4.4 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 48  bytes 7564 (7.3 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

I was only using FIBs/jails because I didn't have a 10Gb switch and/or second server at hand, so I needed two route tables in order to test back-to-back on the same server.

So with the interface breaking when the MTU is changed, and the low throughput (<4Gb/s), might it be fair to say the driver is in alpha?

I'm happy to test if needed.  Testing on Linux for me isn't easy since I know squat about Linux.
Comment 7 commit-hook freebsd_committer freebsd_triage 2025-01-09 17:06:46 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=0acab8b3d1336d4db73a9946ef76b4bcd0b0aabe

commit 0acab8b3d1336d4db73a9946ef76b4bcd0b0aabe
Author:     Doug Ambrisko <ambrisko@FreeBSD.org>
AuthorDate: 2025-01-09 16:28:37 +0000
Commit:     Doug Ambrisko <ambrisko@FreeBSD.org>
CommitDate: 2025-01-09 16:52:54 +0000

    enic(4): fix down/up, MTU changes and more

    ifconfig down/up cycles was not working.  Fix that which is required
    to support MTU changes.  Now doing ifconfig enic0 mtu 3000 for example
    works.  If the MTU is changes in the VIC HW configuration, that is not
    reflected in and the OS reports the default 1500.  I need to look at
    that but changing it via ifconfig works.  So this is different then
    what Linux does.

    Change TX interrupt allocation to be in this driver.  Change the admin
    interrupt count to 2.  This make multiple queues work but need to be
    done as pairs so if the VIC has more TX or RX queues setup in the
    VIC configuration it will use the lesser value.

    While updating the TX interrupt also add support for devcmd2.

    Enable checksum offloading.

    PR:     282095

 sys/dev/enic/cq_desc.h       |  15 ---
 sys/dev/enic/enic.h          |  76 ++++++--------
 sys/dev/enic/enic_res.c      |   4 +-
 sys/dev/enic/enic_res.h      |   2 -
 sys/dev/enic/enic_txrx.c     |  39 +++++--
 sys/dev/enic/if_enic.c       | 173 +++++++++++++++++++++++++++----
 sys/dev/enic/vnic_cq.h       |   5 +-
 sys/dev/enic/vnic_dev.c      | 235 +++++++++++++++++++++++++++++++++++++------
 sys/dev/enic/vnic_dev.h      |   8 +-
 sys/dev/enic/vnic_intr.c     |   2 +-
 sys/dev/enic/vnic_intr.h     |   2 +-
 sys/dev/enic/vnic_resource.h |   1 +
 sys/dev/enic/vnic_rq.c       |   5 +-
 sys/dev/enic/vnic_rq.h       |   1 -
 sys/dev/enic/vnic_rss.h      |   5 -
 sys/dev/enic/vnic_wq.c       | 104 ++++++++++++++++++-
 sys/dev/enic/vnic_wq.h       |  18 +++-
 17 files changed, 559 insertions(+), 136 deletions(-)
Comment 8 Mark Linimon freebsd_committer freebsd_triage 2025-02-14 16:31:50 UTC
^Triage: assign to committer and set flags for MFCs if interested.