| Field | Value |
|---|---|
| Summary | "netstat -i" for ixl/lagg shows idrop as 18446744073709551612 (-4) - incorrectly initialized counters? |
| Product | Base System |
| Component | kern |
| Status | Closed FIXED |
| Severity | Affects Some People |
| Priority | --- |
| Version | 12.3-RELEASE |
| Hardware | amd64 |
| OS | Any |
| Reporter | Peter Eriksson <pen> |
| Assignee | freebsd-bugs (Nobody) <bugs> |
| CC | brian90013 |

Description
Peter Eriksson
2019-05-15 20:15:39 UTC
The incorrect counters still seem to be there on FreeBSD 12.2 with ixl and lagg interfaces.

This is the output from:

    (netstat -i; netstat -ih) | egrep 'Name|ixl|lagg'

Caught during the boot phase (modified /etc/rc.d/netif):

    Name   Mtu   Network        Address            Ipkts            Ierrs  Idrop       Opkts            Oerrs  Coll
    ixl0   1500  <Link#5>       3c:fd:fe:2f:36:a0  281474976710650  0      4294966954  281474976710650  0      0
    ixl1*  1500  <Link#6>       3c:fd:fe:2f:36:a2  0                0      0           0                0      0
    ixl2   1500  <Link#7>       3c:fd:fe:2f:36:a0  281474976710617  0      4294966953  281474976710643  0      0
    ixl3*  1500  <Link#8>       3c:fd:fe:2f:36:a6  0                0      0           0                0      0
    lagg0  1500  <Link#10>      3c:fd:fe:2f:36:a0  562949953421267  0      8589933907  562949953421293  0      0
    lagg0  -     130.236.8.32/  runur00.it.liu.se  0                -      -           0                -      -
    lagg0  -     fe80::%lagg0/  fe80::3efd:feff:f  0                -      -           1                -      -
    lagg0  -     2001:6b0:17:2  runur00.it.liu.se  0                -      -           1                -      -

    Name   Mtu   Network        Address            Ipkts  Ierrs  Idrop  Opkts  Oerrs  Coll
    ixl0   1.5K  <Link#5>       3c:fd:fe:2f:36:a0  281T   0      4.3G   281T   0      0
    ixl1*  1.5K  <Link#6>       3c:fd:fe:2f:36:a2  0      0      0      0      0      0
    ixl2   1.5K  <Link#7>       3c:fd:fe:2f:36:a0  281T   0      4.3G   281T   0      0
    ixl3*  1.5K  <Link#8>       3c:fd:fe:2f:36:a6  0      0      0      0      0      0
    lagg0  1.5K  <Link#10>      3c:fd:fe:2f:36:a0  563T   0      8.6G   563T   0      0
    lagg0  -     130.236.8.32/  runur00.it.liu.se  0      -      -      0      -      -
    lagg0  -     fe80::%lagg0/  fe80::3efd:feff:f  0      -      -      1      -      -
    lagg0  -     2001:6b0:17:2  runur00.it.liu.se  0      -      -      1      -      -

Some time after it has booted, this is reported:

    Name   Mtu   Network        Address            Ipkts   Ierrs  Idrop       Opkts   Oerrs  Coll
    ixl0   1500  <Link#5>       3c:fd:fe:2f:36:a0  537485  0      4294966954  149747  0      0
    ixl1*  1500  <Link#6>       3c:fd:fe:2f:36:a2  0       0      0           0       0      0
    ixl2   1500  <Link#7>       3c:fd:fe:2f:36:a0  1278    0      4294966953  210     0      0
    ixl3*  1500  <Link#8>       3c:fd:fe:2f:36:a6  0       0      0           0       0      0
    lagg0  1500  <Link#10>      3c:fd:fe:2f:36:a0  538801  0      8589933907  149957  0      0
    lagg0  -     130.236.8.32/  runur00.it.liu.se  120394  -      -           117460  -      -
    lagg0  -     fe80::%lagg0/  fe80::3efd:feff:f  194     -      -           308     -      -
    lagg0  -     2001:6b0:17:2  runur00.it.liu.se  8183    -      -           9551    -      -

    Name   Mtu   Network        Address            Ipkts  Ierrs  Idrop  Opkts  Oerrs  Coll
    ixl0   1.5K  <Link#5>       3c:fd:fe:2f:36:a0  537k   0      4.3G   150k   0      0
    ixl1*  1.5K  <Link#6>       3c:fd:fe:2f:36:a2  0      0      0      0      0      0
    ixl2   1.5K  <Link#7>       3c:fd:fe:2f:36:a0  1.3k   0      4.3G   210    0      0
    ixl3*  1.5K  <Link#8>       3c:fd:fe:2f:36:a6  0      0      0      0      0      0
    lagg0  1.5K  <Link#10>      3c:fd:fe:2f:36:a0  539k   0      8.6G   150k   0      0
    lagg0  -     130.236.8.32/  runur00.it.liu.se  120k   -      -      117k   -      -
    lagg0  -     fe80::%lagg0/  fe80::3efd:feff:f  194    -      -      308    -      -
    lagg0  -     2001:6b0:17:2  runur00.it.liu.se  8.2k   -      -      9.6k   -      -

This is on a basically idle test machine (with some, but not much, traffic). 8.6G idrops on 539k packets? :-)

Created attachment 233858 [details]
Patch to call ixl_vsi_reset_stats()
Hello,
I am seeing the same issue with the ixl driver on FreeBSD-12.3 and believe I have a fix.
The hardware counters can start with any value so the ixl_stat_update48()/ixl_stat_update32() functions check if this is the first read. If so, they save the current value as an offset which is subtracted from all later reads.
On two different servers (one using ixl for 10G, the other for 40G) I used debugging output to determine that some counters were small but non-zero at boot time. These values are saved as the offsets. The counters then appear to be reset to 0, so on the next call newval < oldval (the saved offset) and return_value = newval + (1<<bitsize) - offset = 0 + (1<<bitsize) - offset, which produces the huge values we see. Note that the multicast counters are 48 bits wide (281T) and the discard counter is 32 bits wide (4.3G). If you run netstat with the -a option you can see the contribution from the multicast registers. On my machines multicast accounts for the entire count.
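The arithmetic described above can be reproduced with a small stand-alone C model. This is only a sketch: `stat_update_model()` and `two_reads()` are hypothetical names, not the driver's functions. The offsets 6 and 342 used below are not measured values; they are simply the ones that reproduce the ixl0 figures in the original report (2^48 - 6 = 281474976710650 and 2^32 - 342 = 4294966954).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified model of an offset-based counter read (NOT the driver source).
 *   raw           value currently held by the hardware register
 *   bits          register width (32 or 48 in the cases discussed here)
 *   offset_loaded zero on the first read, non-zero on later reads
 *   offset        baseline captured on the first read
 */
static uint64_t
stat_update_model(uint64_t raw, unsigned bits, int offset_loaded,
    uint64_t *offset)
{
    assert(bits > 0 && bits < 64);

    uint64_t modulus = (uint64_t)1 << bits;
    uint64_t stat;

    if (!offset_loaded)
        *offset = raw;                  /* first read: remember the baseline */

    if (raw >= *offset)
        stat = raw - *offset;           /* normal case */
    else
        stat = raw + modulus - *offset; /* interpreted as a wraparound */

    return stat & (modulus - 1);
}

/* Hypothetical helper: the first read sees first_raw, the second second_raw. */
static uint64_t
two_reads(uint64_t first_raw, uint64_t second_raw, unsigned bits)
{
    uint64_t offset = 0;

    (void)stat_update_model(first_raw, bits, 0, &offset);
    return stat_update_model(second_raw, bits, 1, &offset);
}
```

With these definitions, `two_reads(6, 0, 48)` yields 281474976710650 and `two_reads(342, 0, 32)` yields 4294966954, while a counter that simply keeps counting, e.g. `two_reads(6, 10, 48)`, yields 4.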
There is an ixl_vsi_reset_stats() function that resets the stats, the offsets, and the flag indicating that the offsets have been set. This function is not called anywhere in the driver. The ice driver has a similarly named function, ice_reset_vsi_stats(), which is called at the end of ice_initialize_vsi(). Back in ixl, there is an ixl_initialize_vsi() function, which seems like a likely place to add the call.
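Why clearing the "offsets loaded" flag helps can be seen in a toy model of a single 48-bit counter. Everything here (`struct counter_model`, `counter_model_update()`, `counter_model_reset()`, and the two boot scenarios) is hypothetical and only mimics the behaviour described above; it is not the ixl or ice source, and the raw value 6 is an assumed example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MASK48 ((((uint64_t)1) << 48) - 1)

/* Hypothetical bookkeeping for one 48-bit hardware counter. */
struct counter_model {
    bool     offset_loaded; /* has a baseline been captured yet? */
    uint64_t offset;        /* baseline captured on the first read */
    uint64_t stat;          /* value reported to the rest of the system */
};

/* Periodic read: capture a baseline once, then report raw minus baseline. */
static void
counter_model_update(struct counter_model *c, uint64_t raw)
{
    assert(c != NULL);
    if (!c->offset_loaded) {
        c->offset = raw;
        c->offset_loaded = true;
    }
    /* The unsigned subtraction wraps, and masking reduces it modulo 2^48. */
    c->stat = (raw - c->offset) & MODEL_MASK48;
}

/* Reset: clear the stat, the offset, and the flag, as the text describes. */
static void
counter_model_reset(struct counter_model *c)
{
    assert(c != NULL);
    c->offset_loaded = false;
    c->offset = 0;
    c->stat = 0;
}

/* Scenario A: stale baseline of 6, then the hardware counter restarts at 0. */
static uint64_t
boot_without_reset(void)
{
    struct counter_model c = { false, 0, 0 };

    counter_model_update(&c, 6);
    counter_model_update(&c, 0);
    return c.stat;
}

/* Scenario B: same sequence, but the bookkeeping is reset in between. */
static uint64_t
boot_with_reset(void)
{
    struct counter_model c = { false, 0, 0 };

    counter_model_update(&c, 6);
    counter_model_reset(&c);
    counter_model_update(&c, 0);
    return c.stat;
}
```

In this model `boot_without_reset()` returns 281474976710650, while `boot_with_reset()` returns 0 because the first read after the reset captures a fresh baseline.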
I made the addition and tested on both machines. Across multiple reboots the elapsed stats have always started at zero. I have observed no change in stats when carrier is lost or the transceiver is pulled. The stats do reset to 0 when the interface is administratively marked 'DOWN' and then 'UP'.
Created attachment 233877 [details]
Revised v2 patch to fix counters
My apologies but the first patch should not be used. It placed the vsi_reset_stats() call in initialize_vsi() which gets called every time the interface comes up. Someone watching the interface with 'netstat -I xxx -w 1' will see a jump near 2^64 as the counters are reset.
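The jump near 2^64 is what unsigned arithmetic produces when a cumulative counter goes backwards between two samples. The sketch below assumes that a per-interval display computes its rate as the unsigned difference of two 64-bit samples; `interval_delta()` is a hypothetical name and this is not netstat's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Per-interval value as the unsigned difference of two cumulative samples. */
static uint64_t
interval_delta(uint64_t previous, uint64_t current)
{
    /* Wraps modulo 2^64 whenever current < previous. */
    return current - previous;
}

/* Small self-check of the two cases discussed in the text. */
static void
interval_delta_examples(void)
{
    /* Counter kept counting: the delta is the number of new events. */
    assert(interval_delta(100, 105) == 5);

    /* Counter was reset from 4 to 0: the delta wraps to 2^64 - 4. */
    assert(interval_delta(4, 0) == UINT64_MAX - 3);
}
```

Under that assumption, a counter dropping from 4 to 0 shows up as 18446744073709551612, which is the same value quoted in this report's summary as "(-4)".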
Therefore I moved the vsi_reset_stats() call to if_attach_post() where it is right next to the pf_reset_stats() call. This function is not called when an interface flaps down and then back up.
I also removed the call to update_stats_counters() in if_attach_post(). Now the only call to update_stats_counters() is in if_timer(). This matches the behavior of the ice driver and produces the expected results in my testing. Elapsed metrics are all 0 on boot and periodic metrics don't jump when an interface comes up.
I'm not seeing this behaviour in FreeBSD 13.2 so I think we can close this bug now...