On my Honeycomb LX2 system the dpni0 device suddenly stops working after a while, and the only way to get it working, short of rebooting, is to physically flap the connection, by either pulling the ethernet cable or disabling and re-enabling the port through the switch it's connected to. I don't see any logs leading up to the problem state. Some details that may be interesting: multiple VNET jails, plus one VLAN. I'm willing to run most any dtrace script to narrow down the problem, or test a patch against 14.0-RELEASE to fix the problem.
Is this related to 964b3408fa872178aacf58f2d84dc43564ec0aa7? Are you willing to patch the kernel to add some more sysctl/stats? I have a local patch (somewhere among the many updates in a local branch) that exposes the internal view of the links as sysctls, which would likely be helpful here, as would the output of ifconfig -mv dpni0. Side comment: does the VLAN work for you on 14.0 without lowering the MTU?
(In reply to Bjoern A. Zeeb from comment #1) Not sure, it could be related to that change. Though in my case it starts working at boot, and just stops after a while, but "a while" seems to change (could be a day, could be a month between having to do the unplug dance). I haven't changed the MTU at all on the VLAN, so just doing whatever is default. Here's the ifconfig -mv output:

dpni0: flags=1008943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
        options=80028<VLAN_MTU,JUMBO_MTU,LINKSTATE>
        capabilities=8002b<RXCSUM,TXCSUM,VLAN_MTU,JUMBO_MTU,LINKSTATE>
        ether YY:YY:YY:YY:YY:YY
        inet XXX.XXX.XXX.XXX netmask 0xffffff00 broadcast XXX.XXX.XXX.255
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        supported media:
                media autoselect
                media 1000baseT mediaopt full-duplex,master
                media 1000baseT mediaopt full-duplex
                media 1000baseSX mediaopt full-duplex
                media 100baseTX mediaopt full-duplex
                media 100baseTX
                media 10baseT/UTP mediaopt full-duplex
                media 10baseT/UTP
                media none
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>

I'll gladly test a patch you throw at me, as long as it's against 14.0 (not adventurous enough to run -CURRENT on my NAS :) )
(In reply to Justin Hibbits from comment #0) Could you show the output of the following, both on a normally working system and right after dpni0 stops working? 1. sysctl dev.dpaa2_ni 2. top -SjwHPz -mcpu 3. vmstat -i | grep dpaa2
Btw, were you stressing your NAS in any way, i.e. was dpni0 pushed to its maximum throughput at 1500 MTU (~970 Mbps, 121 MB/s)?
(In reply to Dmitry Salychev from comment #3) My NAS is rarely stressed, though sometimes (not every time) when I do pkg upgrades the issue does hit, but often it's pretty idle when the issue hits. I just set up a script to check the link every minute and dump the sysctl and top output to files when it fails to ping my router. I'll attach those when it happens, and I'll attach the "good" ones now.
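For anyone wanting to reproduce this kind of capture, here is a minimal sketch of such a check, not the script actually used in this report. It pings the router and, on failure, dumps the outputs requested above into a timestamped file. The router address (192.0.2.1), the log directory, and the function names are placeholders made up for illustration; the `-b -d 1` flags are added on the assumption that top should print a single display and exit when run non-interactively.

```shell
#!/bin/sh
# Sketch only: ROUTER and LOGDIR are placeholders, not values from this report.
ROUTER="${ROUTER:-192.0.2.1}"
LOGDIR="${LOGDIR:-/var/tmp/dpni0-debug}"

# Return 0 if the router answers at least one of three pings.
link_ok() {
    ping -c 3 "$ROUTER" >/dev/null 2>&1
}

# Dump the requested diagnostics into one timestamped file and print its path.
capture_state() {
    mkdir -p "$LOGDIR" || return 1
    out="$LOGDIR/dpni0-$(date +%Y%m%d-%H%M%S).txt"
    {
        echo "== sysctl dev.dpaa2_ni"
        sysctl dev.dpaa2_ni
        echo "== top -SjwHPz -mcpu"
        top -b -d 1 -SjwHPz -mcpu   # -b -d 1: batch mode, one display (assumed)
        echo "== vmstat -i | grep dpaa2"
        vmstat -i | grep dpaa2
    } >"$out" 2>&1
    echo "$out"
}

# One check: succeed quietly if the link is fine, otherwise capture and fail.
check_once() {
    if link_ok; then
        return 0
    fi
    capture_state
    return 1
}

# Call check_once from cron(8) once a minute, or from a sleep loop.
```

Keeping the check as a single function makes it easy to call from cron or a `while sleep 60` loop, and the captured file path is printed so the caller can attach it.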
Created attachment 251195: Output of sysctl and top from working dpaa2 period
(In reply to Justin Hibbits from comment #6) Could you include the vmstat output as well? Regarding the "bad" state of dpni0: the only way to get it working again without rebooting is to re-plug the cable, is that right?
And "uname -apKU", please :)
(In reply to Dmitry Salychev from comment #8) FreeBSD alexandria.knownspace 14.0-RELEASE-p3 FreeBSD 14.0-RELEASE-p3 #0: Mon Dec 11 05:07:37 UTC 2023 root@arm64-builder.daemonology.net:/usr/obj/usr/src/arm64.aarch64/sys/GENERIC arm64 aarch64 1400097 1400097
Created attachment 251240: Output of sysctl and top from failing dpaa2 period. Took a few days, but here's the output shortly after failure.
Comment on attachment 251240: Output of sysctl and top from failing dpaa2 period. How did you fix it? Sorry, there is no ifconfig -vm output; can you add that to the script? I am still working on the patches, sorry. Does `ifconfig dpni0 media auto` or something like it fix the problem without having to walk to the machine?
(In reply to Bjoern A. Zeeb from comment #11) I can try the `ifconfig dpni0 media auto` when it stops again. When I'm at my computer, I restore it by flapping the port on my switch. I added ifconfig -vm to my script and restarted it. @dsl what vmstat output would you like to see? I had missed that earlier.
(In reply to Justin Hibbits from comment #12) Thanks; or media none followed by media auto; simply toggling it should do the trick.
Created attachment 251252: Output of sysctl and top from failing dpaa2 period (take 2). This one includes vmstat and ifconfig outputs, too. @bz `ifconfig dpni0 media none; ifconfig dpni0 media auto` worked to restore the link, so this can be a workaround for now.
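Until a proper fix lands, the media toggle could be automated. The following is a rough sketch under stated assumptions, not a tested recommendation: the router address (192.0.2.1), the syslog tag, and the function names are hypothetical, and the only part confirmed in this report is that running `media none` followed by `media auto` restored the link.

```shell
#!/bin/sh
# Workaround sketch only; it hides the bug rather than fixing it.
IFACE="${IFACE:-dpni0}"
ROUTER="${ROUTER:-192.0.2.1}"   # placeholder address, not from this report

# Toggle the media setting, the sequence reported above to restore the link.
reset_media() {
    ifconfig "$IFACE" media none
    ifconfig "$IFACE" media auto
}

# Toggle only when the router stops answering, and leave a note in syslog.
recover_if_down() {
    if ping -c 3 "$ROUTER" >/dev/null 2>&1; then
        return 0
    fi
    logger -t dpni0-watch "$IFACE unreachable, toggling media"
    reset_media
}

# Call recover_if_down from cron(8) or a sleep loop, as root.
```

Logging each toggle through logger(1) keeps a record of how often the interface wedges, which may still be useful when narrowing down the underlying problem.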