Bug 237915 - "netstat -i" for ixl/lagg shows idrop as 18446744073709551612 (-4) - incorrectly initialized counters?
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 12.3-RELEASE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: freebsd-bugs (Nobody)
Reported: 2019-05-15 20:15 UTC by Peter Eriksson
Modified: 2023-08-03 21:31 UTC

Attachments
Patch to call ixl_vsi_reset_stats() (361 bytes, patch)
2022-05-11 15:55 UTC, Brian Poole
Revised v2 patch to fix counters (295 bytes, patch)
2022-05-12 17:41 UTC, Brian Poole

Description Peter Eriksson 2019-05-15 20:15:39 UTC
The lagg0 interface counter "Idrop" starts at -4? (lagg0 = ixl0 + ixl2, which both report zero idrops.)


# uname -a
FreeBSD runur00 12.0-RELEASE-p4 FreeBSD 12.0-RELEASE-p4 GENERIC  amd64

# netstat -i
Name    Mtu Network       Address              Ipkts Ierrs Idrop    Opkts Oerrs  Coll
bge0*  1500 <Link#1>      80:18:44:ec:5a:56        0     0     0        0     0     0
bge1*  1500 <Link#2>      80:18:44:ec:5a:57        0     0     0        0     0     0
bge2   1500 <Link#3>      80:18:44:ec:5a:54        0     0     0        0     0     0
bge3*  1500 <Link#4>      80:18:44:ec:5a:55        0     0     0        0     0     0
ixl0   1500 <Link#5>      3c:fd:fe:2f:36:a0  1079899     0     0  1081493     0     0
ixl1*  1500 <Link#6>      3c:fd:fe:2f:36:a2       12     0    18        6     0     0
ixl2   1500 <Link#7>      3c:fd:fe:2f:36:a0   831962     0     0   984116     0     0
ixl3*  1500 <Link#8>      3c:fd:fe:2f:36:a6       12     0    21        6     0     0
lo0   16384 <Link#9>      lo0                   3120     0     0     2970     0     0
lo0       - localhost     localhost             1485     -     -     1485     -     -
lo0       - fe80::%lo0/64 fe80::1%lo0              0     -     -        0     -     -
lo0       - your-net      localhost             1485     -     -     1485     -     -
lagg0  1500 <Link#10>     3c:fd:fe:2f:36:a0  1912011     0 18446744073709551612  2065609     7     0
lagg0     - 130.236.8.32/ runur00.it.liu.se   682130     -     -   602828     -     -
lagg0     - fe80::%lagg0/ fe80::3efd:feff:f        0     -     -        4     -     -
lagg0     - 2001:6b0:17:2 runur00.it.liu.se    99324     -     -    97186     -     -
pflog 33160 <Link#11>     pflog0                   0     0     0        0     0     0

# netstat -i -h
Name    Mtu Network       Address              Ipkts Ierrs Idrop    Opkts Oerrs  Coll
bge0*  1.5K <Link#1>      80:18:44:ec:5a:56        0     0     0        0     0     0
bge1*  1.5K <Link#2>      80:18:44:ec:5a:57        0     0     0        0     0     0
bge2   1.5K <Link#3>      80:18:44:ec:5a:54        0     0     0        0     0     0
bge3*  1.5K <Link#4>      80:18:44:ec:5a:55        0     0     0        0     0     0
ixl0   1.5K <Link#5>      3c:fd:fe:2f:36:a0     1.1M     0     0     1.1M     0     0
ixl1*  1.5K <Link#6>      3c:fd:fe:2f:36:a2       12     0    18        6     0     0
ixl2   1.5K <Link#7>      3c:fd:fe:2f:36:a0     832k     0     0     984k     0     0
ixl3*  1.5K <Link#8>      3c:fd:fe:2f:36:a6       12     0    21        6     0     0
lo0     16K <Link#9>      lo0                   3.1k     0     0     3.0k     0     0
lo0       - localhost     localhost             1.5k     -     -     1.5k     -     -
lo0       - fe80::%lo0/64 fe80::1%lo0              0     -     -        0     -     -
lo0       - your-net      localhost             1.5k     -     -     1.5k     -     -
lagg0  1.5K <Link#10>     3c:fd:fe:2f:36:a0     1.9M     0    -4     2.1M     7     0
lagg0     - 130.236.8.32/ runur00.it.liu.se     682k     -     -     603k     -     -
lagg0     - fe80::%lagg0/ fe80::3efd:feff:f        0     -     -        4     -     -
lagg0     - 2001:6b0:17:2 runur00.it.liu.se      99k     -     -      97k     -     -
pflog   32K <Link#11>     pflog0                   0     0     0        0     0     0

# ifconfig lagg0
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=e507bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
	ether 3c:fd:fe:2f:36:a0
	inet 130.236.8.38 netmask 0xffffffe0 broadcast 130.236.8.63
	inet6 fe80::3efd:feff:fe2f:36a0%lagg0 prefixlen 64 scopeid 0xa
	inet6 2001:6b0:17:2400::8:38 prefixlen 64
	laggproto lacp lagghash l2,l3,l4
	laggport: ixl0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
	laggport: ixl2 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
	groups: lagg
	media: Ethernet autoselect
	status: active
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
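
The two Idrop values shown for lagg0 above (18446744073709551612 without -h, -4 with -h) are consistent with one 64-bit quantity being printed once as unsigned and once as signed. A minimal sketch of that reinterpretation follows; the helper name as_signed64 is illustrative and is not taken from netstat.

```python
def as_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as two's-complement signed."""
    value &= (1 << 64) - 1          # keep only the low 64 bits
    if value >= (1 << 63):          # top bit set means negative
        return value - (1 << 64)
    return value


idrop_unsigned = 18446744073709551612   # lagg0 Idrop from "netstat -i"
idrop_signed = as_signed64(idrop_unsigned)
print(idrop_signed)                      # -4
print((1 << 64) - idrop_unsigned)        # 4, the distance below 2**64
```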
Comment 1 Peter Eriksson 2021-02-20 15:01:29 UTC
The incorrect counters still seem to be present on FreeBSD 12.2 with ixl and lagg interfaces:

This is the output from:
  (netstat -i; netstat -ih) | egrep 'Name|ixl|lagg'

Caught during the boot phase (modified /etc/rc.d/netif)

Name    Mtu Network       Address              Ipkts Ierrs Idrop    Opkts Oerrs  Coll
ixl0   1500 <Link#5>      3c:fd:fe:2f:36:a0 281474976710650     0 4294966954 281474976710650     0     0
ixl1*  1500 <Link#6>      3c:fd:fe:2f:36:a2        0     0     0        0     0     0
ixl2   1500 <Link#7>      3c:fd:fe:2f:36:a0 281474976710617     0 4294966953 281474976710643     0     0
ixl3*  1500 <Link#8>      3c:fd:fe:2f:36:a6        0     0     0        0     0     0
lagg0  1500 <Link#10>     3c:fd:fe:2f:36:a0 562949953421267     0 8589933907 562949953421293     0     0
lagg0     - 130.236.8.32/ runur00.it.liu.se        0     -     -        0     -     -
lagg0     - fe80::%lagg0/ fe80::3efd:feff:f        0     -     -        1     -     -
lagg0     - 2001:6b0:17:2 runur00.it.liu.se        0     -     -        1     -     -

Name    Mtu Network       Address              Ipkts Ierrs Idrop    Opkts Oerrs  Coll
ixl0   1.5K <Link#5>      3c:fd:fe:2f:36:a0     281T     0  4.3G     281T     0     0
ixl1*  1.5K <Link#6>      3c:fd:fe:2f:36:a2        0     0     0        0     0     0
ixl2   1.5K <Link#7>      3c:fd:fe:2f:36:a0     281T     0  4.3G     281T     0     0
ixl3*  1.5K <Link#8>      3c:fd:fe:2f:36:a6        0     0     0        0     0     0
lagg0  1.5K <Link#10>     3c:fd:fe:2f:36:a0     563T     0  8.6G     563T     0     0
lagg0     - 130.236.8.32/ runur00.it.liu.se        0     -     -        0     -     -
lagg0     - fe80::%lagg0/ fe80::3efd:feff:f        0     -     -        1     -     -
lagg0     - 2001:6b0:17:2 runur00.it.liu.se        0     -     -        1     -     -


Some time after it has booted this is reported:

Name    Mtu Network       Address              Ipkts Ierrs Idrop    Opkts Oerrs  Coll
ixl0   1500 <Link#5>      3c:fd:fe:2f:36:a0   537485     0 4294966954   149747     0     0
ixl1*  1500 <Link#6>      3c:fd:fe:2f:36:a2        0     0     0        0     0     0
ixl2   1500 <Link#7>      3c:fd:fe:2f:36:a0     1278     0 4294966953      210     0     0
ixl3*  1500 <Link#8>      3c:fd:fe:2f:36:a6        0     0     0        0     0     0
lagg0  1500 <Link#10>     3c:fd:fe:2f:36:a0   538801     0 8589933907   149957     0     0
lagg0     - 130.236.8.32/ runur00.it.liu.se   120394     -     -   117460     -     -
lagg0     - fe80::%lagg0/ fe80::3efd:feff:f      194     -     -      308     -     -
lagg0     - 2001:6b0:17:2 runur00.it.liu.se     8183     -     -     9551     -     -

Name    Mtu Network       Address              Ipkts Ierrs Idrop    Opkts Oerrs  Coll
ixl0   1.5K <Link#5>      3c:fd:fe:2f:36:a0     537k     0  4.3G     150k     0     0
ixl1*  1.5K <Link#6>      3c:fd:fe:2f:36:a2        0     0     0        0     0     0
ixl2   1.5K <Link#7>      3c:fd:fe:2f:36:a0     1.3k     0  4.3G      210     0     0
ixl3*  1.5K <Link#8>      3c:fd:fe:2f:36:a6        0     0     0        0     0     0
lagg0  1.5K <Link#10>     3c:fd:fe:2f:36:a0     539k     0  8.6G     150k     0     0
lagg0     - 130.236.8.32/ runur00.it.liu.se     120k     -     -     117k     -     -
lagg0     - fe80::%lagg0/ fe80::3efd:feff:f      194     -     -      308     -     -
lagg0     - 2001:6b0:17:2 runur00.it.liu.se     8.2k     -     -     9.6k     -     -


This is on a basically idle test machine (with some, but not much, traffic). 8.6G idrops on 539k packets? :-)
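
Some arithmetic on the boot-time numbers above, as a rough check rather than a diagnosis: the lagg0 counters are exactly the sums of the ixl0 and ixl2 counters, and each port value sits a small distance below 2^48 (packet counts) or 2^32 (Idrop). The dictionary layout and the helper name shortfall are only illustrative.

```python
# Boot-time values copied from the "netstat -i" output above.
ixl0 = {"ipkts": 281474976710650, "idrop": 4294966954, "opkts": 281474976710650}
ixl2 = {"ipkts": 281474976710617, "idrop": 4294966953, "opkts": 281474976710643}
lagg0 = {"ipkts": 562949953421267, "idrop": 8589933907, "opkts": 562949953421293}


def shortfall(value: int, bits: int) -> int:
    """Distance of value below 2**bits."""
    return (1 << bits) - value


# lagg0 reports the sum of its member ports for every column.
sums_match = all(ixl0[k] + ixl2[k] == lagg0[k] for k in lagg0)
print(sums_match)

# Each port counter is slightly less than a power of two.
for name, port in (("ixl0", ixl0), ("ixl2", ixl2)):
    print(name,
          shortfall(port["ipkts"], 48),
          shortfall(port["idrop"], 32),
          shortfall(port["opkts"], 48))
```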
Comment 2 Brian Poole 2022-05-11 15:55:54 UTC
Created attachment 233858
Patch to call ixl_vsi_reset_stats()

Hello,

I am seeing the same issue with the ixl driver on FreeBSD-12.3 and believe I have a fix.

The hardware counters can start with any value so the ixl_stat_update48()/ixl_stat_update32() functions check if this is the first read. If so, they save the current value as an offset which is subtracted from all later reads.

On two different servers (one using ixl for 10G, the other for 40G) I used debugging output to determine that some counters were small but non-zero at boot time. These values are saved as the offsets. The counters then appear to be reset to 0, so on the next call newval < oldval and return_value = newval + (1<<bitsize) - offset = 0 + (1<<bitsize) - offset, which produces the huge values we see. Note that the multicast counters are 48 bits wide (hence 281T) and the discard counter is 32 bits wide (hence 4.3G). If you run netstat with the -a option you can see the contribution from the multicast registers. On my machines multicast accounts for the entire count.
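
The mechanism described above can be modelled in a few lines. This is a simplified sketch, not the driver code: the function stat_update, the per-counter state dictionary, and the stale start values 342 and 6 are assumptions, chosen so that the output reproduces the ixl0 numbers reported in comment 1.

```python
def make_state():
    """Per-counter bookkeeping: saved offset plus a 'first read done' flag."""
    return {"offset": 0, "offset_loaded": False}


def stat_update(new_data: int, state: dict, bits: int) -> int:
    """Return the counter value relative to the offset saved on first read."""
    if not state["offset_loaded"]:
        state["offset"] = new_data          # first read: remember start value
        state["offset_loaded"] = True
    if new_data >= state["offset"]:
        return new_data - state["offset"]
    # Register appears to have wrapped: add one full counter period.
    return new_data + (1 << bits) - state["offset"]


# 32-bit discard counter: stale value 342 at first read, then hardware reset.
drop = make_state()
first_read = stat_update(342, drop, 32)
after_reset = stat_update(0, drop, 32)

# 48-bit multicast counter: stale value 6 at first read, then hardware reset.
mcast = make_state()
stat_update(6, mcast, 48)
mcast_after_reset = stat_update(0, mcast, 48)

print(first_read, after_reset, mcast_after_reset)
```

In this model the reset to 0 is indistinguishable from a wrap-around, so a stale offset of only a few hundred is enough to produce a reading just below 2^32 or 2^48.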


There is an ixl_vsi_reset_stats() function that resets the stats, the offsets, and the flag indicating offsets have been set. This function is not called anywhere in the driver. Looking at the ice driver, it has a similarly named function ice_reset_vsi_stats() that is called at the end of ice_initialize_vsi(). Back to ixl, there is an ixl_initialize_vsi() function, which seems like a likely place to add the call.

I made the addition and tested on both machines. Across multiple reboots the elapsed stats have always started at zero. I have observed no change in stats when carrier is lost or the transceiver is pulled. The stats do reset to 0 when the interface is administratively marked 'DOWN' and then 'UP'.
Comment 3 Brian Poole 2022-05-12 17:41:51 UTC
Created attachment 233877
Revised v2 patch to fix counters

My apologies but the first patch should not be used. It placed the vsi_reset_stats() call in initialize_vsi() which gets called every time the interface comes up. Someone watching the interface with 'netstat -I xxx -w 1' will see a jump near 2^64 as the counters are reset.

Therefore I moved the vsi_reset_stats() call to if_attach_post() where it is right next to the pf_reset_stats() call. This function is not called when an interface flaps down and then back up.

I also removed the call to update_stats_counters() in if_attach_post(). Now the only call to update_stats_counters() is in if_timer(). This matches the behavior of the ice driver and produces the expected results in my testing. Elapsed metrics are all 0 on boot and periodic metrics don't jump when an interface comes up.
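
The jump near 2^64 mentioned for the first patch can be illustrated as follows. The sketch assumes that a periodic monitor computes per-interval deltas in unsigned 64-bit arithmetic; the function name delta_u64 and the sample values are hypothetical.

```python
MASK64 = (1 << 64) - 1


def delta_u64(previous: int, current: int) -> int:
    """Difference between two samples, wrapped to unsigned 64 bits."""
    return (current - previous) & MASK64


# Hypothetical cumulative counter samples, one per second. The third sample
# is taken just after the statistics were reset to zero.
samples = [1000, 1500, 0, 20]
deltas = [delta_u64(a, b) for a, b in zip(samples, samples[1:])]

for d in deltas:
    print(d)
```

Only the interval that spans the reset is affected: its delta is 2^64 minus the last pre-reset value, and the following intervals look normal again.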
Comment 4 Peter Eriksson 2023-08-03 21:31:49 UTC
I'm not seeing this behaviour in FreeBSD 13.2 so I think we can close this bug now...