Bug 277849 - [bge] panic when attaching if_bge on Supermicro H13SSL-N
Summary: [bge] panic when attaching if_bge on Supermicro H13SSL-N
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 14.0-STABLE
Hardware: amd64 Any
: --- Affects Only Me
Assignee: freebsd-net (Nobody)
URL:
Keywords: crash
Depends on:
Blocks:
 
Reported: 2024-03-20 22:31 UTC by Martin Matuska
Modified: 2024-11-29 21:51 UTC (History)
2 users (show)

See Also:


Attachments
Coredump text (33.75 KB, application/gzip)
2024-03-20 22:31 UTC, Martin Matuska
no flags Details
Backtrace (5.05 KB, text/plain)
2024-03-20 22:31 UTC, Martin Matuska
no flags Details
Workarounbd patch (4.79 KB, patch)
2024-04-03 23:19 UTC, Martin Matuska
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Martin Matuska freebsd_committer freebsd_triage 2024-03-20 22:31:07 UTC
Created attachment 249353 [details]
Coredump text

I have a panic when attaching if_bge on a Supermicro H13SSL-N
The board has BCM7520 onboard LAN.
Comment 1 Martin Matuska freebsd_committer freebsd_triage 2024-03-20 22:31:43 UTC
Created attachment 249354 [details]
Backtrace
Comment 2 Martin Matuska freebsd_committer freebsd_triage 2024-03-20 22:42:31 UTC
The panic actually happens when calling SIOCGIFMEDIA in ifmedia_ioctl() in sys/net/if_media.c from get_operstate_ether() in /usr/src/sys/netlink/route/iface.c

(*ifm->ifm_status)(ifp, ifmr);

with unpopulated ifm:
ifm->ifm_mask = 0
ifm->ifm_media = 0
ifm->ifm_cur = 0x0
ifm->ifm_change = 0x0
ifm->ifm_status = 0x0
Comment 3 Martin Matuska freebsd_committer freebsd_triage 2024-04-03 23:17:49 UTC
I can confirm that this workaround avoids the panic makes the network card operate:

--- a/sys/net/if_media.c
+++ b/sys/net/if_media.c
@@ -272,19 +272,19 @@ ifmedia_ioctl(struct ifnet *ifp, struct ifreq *ifr, struct ifmedia *ifm,
        /*
         * Get list of available media and current media on interface.
         */
        case  SIOCGIFMEDIA: 
        case  SIOCGIFXMEDIA: 
        {
                struct ifmedia_entry *ep;
                int i;
 
-               if (ifmr->ifm_count < 0)
+               if (ifmr->ifm_count < 0 || ifm->ifm_status == NULL)
                        return (EINVAL);
 
                if (cmd == SIOCGIFMEDIA) {
                        ifmr->ifm_active = ifmr->ifm_current = ifm->ifm_cur ?
                            compat_media(ifm->ifm_cur->ifm_media) : IFM_NONE;
                } else {
                        ifmr->ifm_active = ifmr->ifm_current = ifm->ifm_cur ?
                            ifm->ifm_cur->ifm_media : IFM_NONE;
                }
Comment 4 Martin Matuska freebsd_committer freebsd_triage 2024-04-03 23:19:54 UTC
Created attachment 249689 [details]
Workarounbd patch
Comment 5 Zhenlei Huang freebsd_committer freebsd_triage 2024-10-31 10:04:39 UTC
(In reply to Martin Matuska from comment #2)
> with unpopulated ifm:
> ifm->ifm_mask = 0
> ifm->ifm_media = 0
> ifm->ifm_cur = 0x0
> ifm->ifm_change = 0x0
> ifm->ifm_status = 0x0

That looks weird.

From line 3821 to 3842, either TBI mode or not the ifmedia data is initialized correctly.

For if_bge, the ifp->if_ioctl should be bge_ioctl(). Can you get that frame ( if possible ), and examine `sc->bge_flags`, `sc->bge_ifmedia` and `sc->bge_miibus->softc->mii_media` ? sc == ifp->if_softc .
Comment 6 Franco Fichtner 2024-10-31 10:41:26 UTC
Sounds like https://cgit.freebsd.org/src/commit/?id=f2daf89954a we recently came across.
Comment 7 Franco Fichtner 2024-11-29 21:42:03 UTC
Since we found another one in sfxge(4) just now I've looked at this again. TBI mode seems to have the right ordering but I don't have the hardware. Is this bug for non-TBI perhaps?  This never calls ifmedia_init() which could cause the issue.

Judging from bug appearing in at least 3 drivers shouldn't we rather safeguard the internal function pointers in the ifmedia internals and/or provide a no-op function there by default?  It seems brittle as it is now.


Cheers,
Franco
Comment 8 Franco Fichtner 2024-11-29 21:51:42 UTC
Well, if the assumption is correct a non-TBI ifmedia_init() call with existing no-op functions passed should work around the issue from the driver side, but it would be nicer to straighten out the actual desired behaviour in the subsystem instead of letting it run into NULL function pointers.