Bug 258923 - [mlx4en] Wrong mbuf cluster size in mlx4_en_debugnet_init leads to panic
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Some People
Assignee: freebsd-net (Nobody)
Reported: 2021-10-04 14:12 UTC by Andrey V. Elsukov
Modified: 2022-05-24 00:23 UTC

Attachments
proposed patch (515 bytes, patch)
2021-10-04 14:12 UTC, Andrey V. Elsukov

Description Andrey V. Elsukov freebsd_committer freebsd_triage 2021-10-04 14:12:38 UTC
Created attachment 228433
proposed patch

Setting the MTU with `ifconfig mlxen0 mtu 9000` leads to a panic:

Unread portion of the kernel message buffer:
panic: m_getzone: invalid cluster size 1522
cpuid = 14
time = 1630422868
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01500e5710
vpanic() at vpanic+0x182/frame 0xfffffe01500e5760
panic() at panic+0x43/frame 0xfffffe01500e57c0
debugnet_mbuf_reinit() at debugnet_mbuf_reinit+0x22b/frame 0xfffffe01500e5800
debugnet_any_ifnet_update() at debugnet_any_ifnet_update+0x14b/frame 0xfffffe01500e5850
ifhwioctl() at ifhwioctl+0x48e/frame 0xfffffe01500e58d0
ifioctl() at ifioctl+0x3ac/frame 0xfffffe01500e59a0

mlx4_en_debugnet_init() passes priv->rx_mb_size to DEBUGNET as the mbuf cluster size. That value is calculated as:

  priv->rx_mb_size = dev->if_mtu + ETH_HLEN + VLAN_HLEN + ETH_FCS_LEN;

DEBUGNET then hands this value to debugnet_mbuf_reinit(), which calls m_getzone() for the specified cluster size. Because rx_mb_size is a frame length (1522 in the panic above) rather than a cluster size that m_getzone() accepts, this triggers the panic.
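
As a rough illustration of the mismatch described above (a userspace sketch, not the attached patch): the constants below are restated by hand and assume 4 KiB pages, and the helpers is_valid_cluster_size() and cluster_size_for() are hypothetical names that exist only for this example. The sketch shows how an MTU of 1500 yields the 1522 seen in the panic string, and how a frame length could be mapped to one of the standard cluster sizes.

```c
#include <stddef.h>

/*
 * Constants restated for a userspace build. The header lengths are the
 * usual Ethernet values; the cluster sizes are the FreeBSD values on a
 * platform with 4 KiB pages (an assumption of this sketch).
 */
#define ETH_HLEN      14
#define VLAN_HLEN      4
#define ETH_FCS_LEN    4
#define MCLBYTES      2048
#define MJUMPAGESIZE  4096
#define MJUM9BYTES    (9 * 1024)
#define MJUM16BYTES   (16 * 1024)

/* Standard cluster sizes, in ascending order. */
static const int cluster_sizes[] = {
	MCLBYTES, MJUMPAGESIZE, MJUM9BYTES, MJUM16BYTES
};
#define N_CLUSTER_SIZES (sizeof(cluster_sizes) / sizeof(cluster_sizes[0]))

/* Frame length, computed the same way as priv->rx_mb_size in the report. */
static int
rx_frame_size(int mtu)
{
	return (mtu + ETH_HLEN + VLAN_HLEN + ETH_FCS_LEN);
}

/* Hypothetical helper: 1 if size is a standard cluster size, else 0. */
static int
is_valid_cluster_size(int size)
{
	size_t i;

	for (i = 0; i < N_CLUSTER_SIZES; i++)
		if (cluster_sizes[i] == size)
			return (1);
	return (0);
}

/* Hypothetical helper: smallest standard cluster that fits, or -1. */
static int
cluster_size_for(int frame_len)
{
	size_t i;

	for (i = 0; i < N_CLUSTER_SIZES; i++)
		if (frame_len <= cluster_sizes[i])
			return (cluster_sizes[i]);
	return (-1);
}
```

With these definitions, rx_frame_size(1500) is 1522, which is_valid_cluster_size() rejects, while cluster_size_for(1522) returns MCLBYTES.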
Comment 1 Hans Petter Selasky freebsd_committer freebsd_triage 2021-10-04 14:18:10 UTC
I'm good with your patch. Just wait for kib@ to approve as well.

Sorry for not getting back to you on this one. It has been very hectic.

--HPS
Comment 2 Hans Petter Selasky freebsd_committer freebsd_triage 2021-10-04 14:18:40 UTC
You can just submit and MFC it once Konstantin approves it.
Comment 3 Hans Petter Selasky freebsd_committer freebsd_triage 2021-10-05 07:52:50 UTC
OK, got a green light from Konstantin.

Please go ahead and commit and MFC.

--HPS
Comment 4 commit-hook freebsd_committer freebsd_triage 2021-10-05 08:51:29 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=5a7de2b42caf6241e87b417a0521e9ab303989d7

commit 5a7de2b42caf6241e87b417a0521e9ab303989d7
Author:     Hans Petter Selasky <hselasky@FreeBSD.org>
AuthorDate: 2021-10-05 08:46:56 +0000
Commit:     Hans Petter Selasky <hselasky@FreeBSD.org>
CommitDate: 2021-10-05 08:48:30 +0000

    mlx4en(4): Fix wrong mbuf cluster size in mlx4_en_debugnet_init()

    This fixes an "invalid cluster size" panic when debugnet is activated.

    panic()
    m_getzone()
    debugnet_mbuf_reinit()
    debugnet_any_ifnet_update()
    ifhwioctl()
    ifioctl()

    Submitted by:   ae@
    PR:             258923
    MFC after:      1 week
    Sponsored by:   NVIDIA Networking

 sys/dev/mlx4/mlx4_en/mlx4_en_netdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Comment 5 commit-hook freebsd_committer freebsd_triage 2021-10-12 12:14:40 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=fe5ee07a11ced27eab9c6a6bd0f3e27d99423ba8

commit fe5ee07a11ced27eab9c6a6bd0f3e27d99423ba8
Author:     Hans Petter Selasky <hselasky@FreeBSD.org>
AuthorDate: 2021-10-05 08:46:56 +0000
Commit:     Hans Petter Selasky <hselasky@FreeBSD.org>
CommitDate: 2021-10-12 12:12:00 +0000

    mlx4en(4): Fix wrong mbuf cluster size in mlx4_en_debugnet_init()

    This fixes an "invalid cluster size" panic when debugnet is activated.

    panic()
    m_getzone()
    debugnet_mbuf_reinit()
    debugnet_any_ifnet_update()
    ifhwioctl()
    ifioctl()

    Submitted by:   ae@
    PR:             258923
    Sponsored by:   NVIDIA Networking

    (cherry picked from commit 5a7de2b42caf6241e87b417a0521e9ab303989d7)

 sys/dev/mlx4/mlx4_en/mlx4_en_netdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Comment 6 commit-hook freebsd_committer freebsd_triage 2021-10-12 12:18:43 UTC
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=8d804809630fd145683de92ba502a9ed34b94115

commit 8d804809630fd145683de92ba502a9ed34b94115
Author:     Hans Petter Selasky <hselasky@FreeBSD.org>
AuthorDate: 2021-10-05 08:46:56 +0000
Commit:     Hans Petter Selasky <hselasky@FreeBSD.org>
CommitDate: 2021-10-12 12:17:02 +0000

    mlx4en(4): Fix wrong mbuf cluster size in mlx4_en_debugnet_init()

    This fixes an "invalid cluster size" panic when debugnet is activated.

    panic()
    m_getzone()
    debugnet_mbuf_reinit()
    debugnet_any_ifnet_update()
    ifhwioctl()
    ifioctl()

    Submitted by:   ae@
    PR:             258923
    Sponsored by:   NVIDIA Networking

    (cherry picked from commit 5a7de2b42caf6241e87b417a0521e9ab303989d7)

 sys/dev/mlx4/mlx4_en/mlx4_en_netdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Comment 7 Michael Meiszl 2022-02-20 07:11:34 UTC
Is this considered "fixed" by now?

This morning I swapped (OK, I TRIED TO SWAP) an Intel 10GbE card for a Mellanox ConnectX-3.

At the next boot this error showed up here too, with no way to get into multiuser.
  
Architecture: amd64
  Architecture Version: 2
  Dump Length: 1162113024
  Blocksize: 512
  Compression: none
  Dumptime: 2022-02-20 07:47:05 +0100
  Hostname: l3router.meiszl.de
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 13.0-RELEASE-p7 #0: Mon Jan 31 18:24:03 UTC 2022
    root@amd64-builder.daemonology.net:/usr/obj/usr/src/amd64.amd64/sys/GENERIC
  Panic String: m_getzone: invalid cluster size 1522
  Dump Parity: 3248958317
  Bounds: 4
  Dump Status: good

I tried a few times and disabled ipfw (I had some trouble with it last year, but that has been fixed now), but in the end I had to swap the cards back to get the box up and running.

So, if the fix here has already been incorporated, it did not work, at least for me :-(
Comment 8 Hans Petter Selasky freebsd_committer freebsd_triage 2022-02-20 21:02:41 UTC
It has not been merged to 13.0, you need to use 13-stable!
Comment 9 Michael Meiszl 2022-02-20 21:07:27 UTC
(In reply to Hans Petter Selasky from comment #8)
>It has not been merged to 13.0, you need to use 13-stable!
Sorry, I don't get it.
Why should I use 13-stable if it is not in 13.0?

I added it manually today and built a custom kernel, and now it works. But I don't understand why it is not incorporated into the normal updates that come in regularly.

Usually I avoid using "special" kernels.
Comment 10 Marek Zarychta 2022-02-20 21:20:24 UTC
The fix will be included in the upcoming 13.1-RELEASE.

Please compare: https://docs.freebsd.org/en/books/dev-model/#development-model