Bug 279352 - dpaa2 (dpni0 device): interface goes catatonic periodically
Summary: dpaa2 (dpni0 device): interface goes catatonic periodically
Status: In Progress
Alias: None
Product: Base System
Classification: Unclassified
Component: arm
Version: 14.0-RELEASE
Hardware: arm64 Any
Importance: --- Affects Only Me
Assignee: Dmitry Salychev
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-05-27 16:40 UTC by Justin Hibbits
Modified: 2024-06-06 13:11 UTC
CC List: 2 users

See Also:


Attachments
Output of sysctl and top from working dpaa2 period (8.20 KB, text/plain)
2024-06-03 14:00 UTC, Justin Hibbits
Output of sysctl and top from failing dpaa2 period (5.67 KB, text/plain)
2024-06-05 19:04 UTC, Justin Hibbits
Output of sysctl and top from failing dpaa2 period (take 2) (6.61 KB, text/plain)
2024-06-06 13:11 UTC, Justin Hibbits

Description Justin Hibbits freebsd_committer freebsd_triage 2024-05-27 16:40:04 UTC
On my Honeycomb LX2 system the dpni0 device suddenly stops working after a while, and the only way to get it working again, short of rebooting, is to flap the link, either by pulling the Ethernet cable or by disabling and re-enabling the port on the switch it's connected to. I don't see any logs leading up to the problem state.

Some details that may be interesting: multiple VNET jails, plus one VLAN.

I'm willing to run most any dtrace script to narrow down the problem, or test a patch against 14.0-RELEASE to fix the problem.
Comment 1 Bjoern A. Zeeb freebsd_committer freebsd_triage 2024-05-27 22:12:47 UTC
Is this related to commit 964b3408fa872178aacf58f2d84dc43564ec0aa7?

Hmm, are you willing to patch the kernel to add some more sysctls/stats? I have a local patch (somewhere in the multitude of updates in a local branch) that exposes the internal view of the links via sysctls, which would likely be about as helpful as ifconfig -mv dpni0 in that case.


Side comment: does the VLAN work for you on 14.0 without lowering the MTU?
Comment 2 Justin Hibbits freebsd_committer freebsd_triage 2024-05-27 22:30:46 UTC
(In reply to Bjoern A. Zeeb from comment #1)

Not sure; it could be related to that change. In my case, though, it works from boot and just stops after a while, and "a while" varies (anywhere from a day to a month between having to do the unplug dance).

I haven't changed the MTU at all on the VLAN, so just doing whatever is default.

Here's the ifconfig -mv output:

dpni0: flags=1008943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
        options=80028<VLAN_MTU,JUMBO_MTU,LINKSTATE>
        capabilities=8002b<RXCSUM,TXCSUM,VLAN_MTU,JUMBO_MTU,LINKSTATE>
        ether YY:YY:YY:YY:YY:YY
        inet XXX.XXX.XXX.XXX netmask 0xffffff00 broadcast XXX.XXX.XXX.255
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        supported media:
                media autoselect
                media 1000baseT mediaopt full-duplex,master
                media 1000baseT mediaopt full-duplex
                media 1000baseSX mediaopt full-duplex
                media 100baseTX mediaopt full-duplex
                media 100baseTX
                media 10baseT/UTP mediaopt full-duplex
                media 10baseT/UTP
                media none
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>

I'll gladly test a patch you throw at me, as long as it's against 14.0 (not adventurous enough to run -CURRENT on my NAS :) )
Comment 3 Dmitry Salychev freebsd_committer freebsd_triage 2024-06-03 11:36:24 UTC
(In reply to Justin Hibbits from comment #0)

Could you show the output of:

1. sysctl dev.dpaa2_ni
2. top -SjwHPz -mcpu
3. vmstat -i | grep dpaa2

on a normally working system, and again after dpni0 stops working?
Comment 4 Dmitry Salychev freebsd_committer freebsd_triage 2024-06-03 13:16:41 UTC
Btw, were you stressing your NAS in any way, i.e. was dpni0 pushed to its maximum throughput at 1500 MTU (~970 Mbps, 121 MB/s)?
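One plausible way to arrive at those figures (an assumption; the comment does not show its arithmetic): on gigabit Ethernet each 1500-byte IP packet occupies 1538 byte times on the wire once the 14-byte Ethernet header, 4-byte FCS, 8-byte preamble/SFD and 12-byte inter-frame gap are added, so the IP-level rate works out as:

```latex
\frac{1500}{1500 + 14 + 4 + 8 + 12} \times 1000\,\text{Mbit/s}
  = \frac{1500}{1538} \times 1000\,\text{Mbit/s}
  \approx 975\,\text{Mbit/s}
  \approx 121.9\,\text{MB/s}
```

TCP payload throughput is somewhat lower: with a 40-byte IPv4+TCP header it is 1460/1538 × 1000 ≈ 949 Mbit/s.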
Comment 5 Justin Hibbits freebsd_committer freebsd_triage 2024-06-03 13:55:59 UTC
(In reply to Dmitry Salychev from comment #3)

My NAS is rarely stressed. The issue does sometimes (not every time) hit during pkg upgrades, but often the system is pretty idle when it hits. I just set up a script to check the link every minute and dump the sysctl and top output to files when it fails to ping my router. I'll attach those when it happens, and I'll attach the "good" ones now.
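A minimal sketch of such a watchdog, assuming FreeBSD's /bin/sh and ping(8). The function names (`dump_state`, `check_once`, `watch_link`), the 5-second ping timeout and the output directory are hypothetical choices, not the script actually used here. It also captures `vmstat -i` and `ifconfig -mv`, which were added to the real script later in this thread.

```shell
#!/bin/sh
# Hypothetical link watchdog sketch. Assumptions: FreeBSD ping(8), where
# "-t timeout" is a timeout in seconds (on Linux, -t sets the TTL instead);
# router address and output directory are supplied by the caller.

IFACE=${IFACE:-dpni0}

# Write the diagnostics requested in this thread to the file named by $1.
dump_state() {
    {
        date
        sysctl dev.dpaa2_ni
        # -b: batch mode so top can write to a file; -d 1: one display, then exit.
        top -b -d 1 -SjwHPz -m cpu
        vmstat -i | grep dpaa2
        ifconfig -mv "$IFACE"
    } > "$1" 2>&1
}

# Ping the router once. On success return 0; on failure write a
# timestamped dump into the directory $2 and return 1.
check_once() {
    router="$1"
    outdir="$2"
    if ping -c 1 -t 5 "$router" > /dev/null 2>&1; then
        return 0
    fi
    dump_state "$outdir/dpaa2-fail-$(date +%Y%m%dT%H%M%S).txt"
    return 1
}

# Run check_once every 60 seconds, forever.
watch_link() {
    while :; do
        check_once "$1" "$2"
        sleep 60
    done
}
```

For example, `watch_link 192.0.2.1 /var/tmp &`, where 192.0.2.1 stands in for the router's address. A single lost ping triggers a dump, and a dump is written every minute until the link recovers, so a real script might retry before declaring failure.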
Comment 6 Justin Hibbits freebsd_committer freebsd_triage 2024-06-03 14:00:38 UTC
Created attachment 251195 [details]
Output of sysctl and top from working dpaa2 period
Comment 7 Dmitry Salychev freebsd_committer freebsd_triage 2024-06-03 17:25:42 UTC
(In reply to Justin Hibbits from comment #6)

Could you include vmstat output as well?

Regarding the "bad" state of dpni0: short of rebooting, it isn't possible to get it working again other than by re-plugging the cable, is it?
Comment 8 Dmitry Salychev freebsd_committer freebsd_triage 2024-06-03 17:34:01 UTC
And "uname -apKU", please :)
Comment 9 Justin Hibbits freebsd_committer freebsd_triage 2024-06-03 17:37:26 UTC
(In reply to Dmitry Salychev from comment #8)

FreeBSD alexandria.knownspace 14.0-RELEASE-p3 FreeBSD 14.0-RELEASE-p3 #0: Mon Dec 11 05:07:37 UTC 2023     root@arm64-builder.daemonology.net:/usr/obj/usr/src/arm64.aarch64/sys/GENERIC arm64 aarch64 1400097 1400097
Comment 10 Justin Hibbits freebsd_committer freebsd_triage 2024-06-05 19:04:39 UTC
Created attachment 251240 [details]
Output of sysctl and top from failing dpaa2 period

Took a few days, but here's the output shortly after failure.
Comment 11 Bjoern A. Zeeb freebsd_committer freebsd_triage 2024-06-05 20:35:21 UTC
Comment on attachment 251240 [details]
Output of sysctl and top from failing dpaa2 period

How did you fix it?
Sorry, there's no ifconfig -vm output; can you add that to the script?
I am still working on the patches, sorry.

Does `ifconfig dpni0 media auto` or something similar fix it without having to walk to the machine?
Comment 12 Justin Hibbits freebsd_committer freebsd_triage 2024-06-05 20:46:38 UTC
(In reply to Bjoern A. Zeeb from comment #11)

I can try `ifconfig dpni0 media auto` when it stops again. When I'm at my computer, I restore it by flapping the port on my switch.

I added ifconfig -vm to my script and restarted it.

@dsl, what vmstat output would you like to see? I had missed that earlier.
Comment 13 Bjoern A. Zeeb freebsd_committer freebsd_triage 2024-06-05 21:18:25 UTC
(In reply to Justin Hibbits from comment #12)

Thanks; or `media none` followed by `media auto`. Simply toggling it should do the trick.
Comment 14 Justin Hibbits freebsd_committer freebsd_triage 2024-06-06 13:11:51 UTC
Created attachment 251252 [details]
Output of sysctl and top from failing dpaa2 period (take 2)

This one includes vmstat and ifconfig outputs, too.

@bz `ifconfig dpni0 media none; ifconfig dpni0 media auto` worked to restore, so this can be a workaround for now.
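The confirmed workaround could be scripted roughly as follows. This is a sketch: `recover_link` and `SETTLE_SECS` are hypothetical names, and the 5-second default delay before re-checking the router is a guess, not a measured autonegotiation time.

```shell
#!/bin/sh
# Hypothetical helper applying the media toggle workaround from this thread.
# Assumptions: FreeBSD ifconfig(8) and ping(8); the router address is
# supplied by the caller; SETTLE_SECS (default 5) is an arbitrary delay.

# Toggle media on interface $1, wait, then return ping's status for router $2.
recover_link() {
    iface="$1"
    router="$2"
    ifconfig "$iface" media none
    ifconfig "$iface" media auto
    # Give the link some time to come back before testing it.
    sleep "${SETTLE_SECS:-5}"
    ping -c 1 -t 5 "$router" > /dev/null 2>&1
}
```

For example, `recover_link dpni0 192.0.2.1 || echo "dpni0 still down"` (192.0.2.1 is a placeholder). If this is hooked into a watchdog, running it only after the diagnostics are dumped keeps the failing state captured before it is cleared.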