Bug 235031

Summary: [em] em0: poor NFS performance, strange behavior
Product: Base System Reporter: Martin Birgmeier <d8zNeCFG>
Component: kernAssignee: freebsd-net (Nobody) <net>
Status: Open ---    
Severity: Affects Only Me CC: cy, decke, eugen, mach, marko.cupac, naito.yuichiro, pi, rgrimes, rmacklem, sbruno, shurd
Priority: --- Keywords: IntelNetworking, regression
Version: 12.0-RELEASE   
Hardware: Any   
OS: Any   
Attachments:
Description Flags
loader.conf
none
sysctl.conf none

Description Martin Birgmeier 2019-01-17 20:59:41 UTC
Scenario
========
I have a laptop with the following for em0 in /var/run/dmesg.boot:

em0: <Intel(R) PRO/1000 Network Connection> port 0x6040-0x605f mem 0xd5300000-0xd531ffff,0xd532b000-0xd532bfff irq 20 at device 25.0 on pci0
em0: attach_pre capping queues at 1
em0: using 1024 tx descriptors and 1024 rx descriptors
em0: msix_init qsets capped at 1
em0: Unable to map MSIX table 
em0: Using an MSI interrupt
em0: allocated for 1 tx_queues
em0: allocated for 1 rx_queues

[0]# ifconfig em0
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=81249b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LRO,WOL_MAGIC,VLAN_HWFILTER>
        ether f0:de:f1:98:86:a9
        inet 192.168.1.19 netmask 0xffffff00 broadcast 192.168.1.255 
        inet6 fe80::f2de:f1ff:fe98:86a9%em0 prefixlen 64 scopeid 0x1 
        inet6 fec0:0:0:4d42::13 prefixlen 64 
        inet6 fec0::4d42:f2de:f1ff:fe98:86a9 prefixlen 64 autoconf 
        inet6 2002:5875:4c34:4d42:f2de:f1ff:fe98:86a9 prefixlen 64 autoconf 
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
[0]# 

This laptop exports a filesystem via NFS to a second machine running a 100 Mbit interface.

The second machine performs read requests via NFS.

Result
======
The result is that the NFS performance is very poor, on the order of 500 kBps to 1 MBps (5..10% of the 100 Mbps of the second machine).

Scenario (continued)
====================
There is a third machine with a gigabit interface.

Result
======
If I run iperf between the laptop and the third machine, the read performance for the second machine increases (!) to several MB/s (20..40% of the 100 Mbps of the second machine). The iperf performance itself is close to wire speed (914 Mbps).

Note
====
Maybe this problem is the one which underlies the issue I reported in bug #234520; machine B in that scenario is the laptop, machine A is the third machine.
Comment 1 Martin Birgmeier 2019-01-17 21:03:47 UTC
Another note: performance was fine under 11.2.
Comment 2 Martin Birgmeier 2019-01-17 21:13:08 UTC
Another scenario
================

Just for experimenting, I issued the following commands:

ifconfig em0 -rxcsum .............. result: connectivity to other machines lost for existing connections (TCP and UDP), new connections can be established
ifconfig em0 rxcsum ............... result: same
ifconfig em0 down ................. result: same
ifocnfig em0 up ................... result: same

In addition, these actions resulted in the following message being repeated on the console every few seconds:

TX(0) desc avail = 1024, pidx = 0

I had to reboot the machine to get it working normally again.

[0]# grep em0 /var/log/messages
...
Jan 17 21:29:18 mizar kernel: em0: link state changed to DOWN
Jan 17 21:29:20 mizar kernel: em0: TX(0) desc avail = 1024, pidx = 0
Jan 17 21:30:02 mizar kernel: em0: TX(0) desc avail = 1024, pidx = 0
Jan 17 21:36:14 mizar kernel: em0: TX(0) desc avail = 1024, pidx = 0
Jan 17 21:36:22 mizar kernel: em0: TX(0) desc avail = 1024, pidx = 0
Jan 17 21:36:25 mizar kernel: em0: TX(0) desc avail = 1024, pidx = 0
Jan 17 21:36:28 mizar kernel: em0: TX(0) desc avail = 1024, pidx = 0
Jan 17 21:36:31 mizar kernel: em0: TX(0) desc avail = 1024, pidx = 0
Jan 17 21:37:59 mizar kernel: Syncing disks, vnodes remaining... 11 12 em0: TX(0) desc avail = 1024, pidx = 0
Jan 17 21:37:59 mizar kernel: 8 8 em0: TX(0) desc avail = 1024, pidx = 0
Jan 17 21:37:59 mizar kernel: 8 8 8 em0: TX(0) desc avail = 1024, pidx = 0
Jan 17 21:37:59 mizar kernel: 8 2 2 em0: TX(0) desc avail = 1024, pidx = 0
Jan 17 21:37:59 mizar kernel: 1 0 0 em0: TX(0) desc avail = 1024, pidx = 0
Jan 17 21:37:59 mizar kernel: 0 0 0 em0: TX(0) desc avail = 1024, pidx = 0
Jan 17 21:37:59 mizar kernel: Waiting (max 60 seconds) for system thread `bufspacedaemon-0' to stop... em0: TX(0) desc avail = 1024, pidx = 0
Jan 17 21:37:59 mizar kernel: Waiting (max 60 seconds) for system thread `bufspacedaemon-6' to stop... em0: TX(0) desc avail = 1024, pidx = 0
Jan 17 21:37:59 mizar kernel: em0: <Intel(R) PRO/1000 Network Connection> port 0x6040-0x605f mem 0xd5300000-0xd531ffff,0xd532b000-0xd532bfff irq 20 at device 25.0 on pci0
Jan 17 21:37:59 mizar kernel: em0: attach_pre capping queues at 1
Jan 17 21:37:59 mizar kernel: em0: using 1024 tx descriptors and 1024 rx descriptors
Jan 17 21:37:59 mizar kernel: em0: msix_init qsets capped at 1
Jan 17 21:37:59 mizar kernel: em0: Unable to map MSIX table 
Jan 17 21:37:59 mizar kernel: em0: Using an MSI interrupt
Jan 17 21:37:59 mizar kernel: em0: allocated for 1 tx_queues
Jan 17 21:37:59 mizar kernel: em0: allocated for 1 rx_queues
Jan 17 21:37:59 mizar kernel: em0: Ethernet address: f0:de:f1:98:86:a9
Jan 17 21:37:59 mizar kernel: em0: link state changed to UP
[0]#
Comment 3 Martin Birgmeier 2019-01-17 21:21:30 UTC
Next scenario:
==============

If I run

iperf -c hal -b 500m,0

i.e., using only 50% of the available bandwidth between the laptop and the third machine, NFS read performance between the laptop and the second machine increases to 5..7 MBps, i.e., 50..70% of the available bandwidth of 100 Mbps.

It seems that in this case there is sufficient but not too much activity on em0 in order to optimally talk to the second machine.

Maybe this is a case of lost or stuck interrupts? Or some problems with handling UDP?
Comment 4 Martin Birgmeier 2019-01-17 21:40:37 UTC
A tcpdump running on the laptop filtering packets to the second machine shows the following (mizar = laptop, gandalf = second machine):

22:37:34.527491 IP gandalf.xyzzy.uuidgen > mizar.xyzzy.nfsd: Flags [.], ack 83752633, win 3583, options [nop,nop,TS val 830264279 ecr 2409434096], length 0
22:37:34.527739 IP gandalf.xyzzy.uuidgen > mizar.xyzzy.nfsd: Flags [.], ack 83755529, win 3537, options [nop,nop,TS val 830264279 ecr 2409434096], length 0
22:37:34.527984 IP gandalf.xyzzy.uuidgen > mizar.xyzzy.nfsd: Flags [.], ack 83758425, win 3492, options [nop,nop,TS val 830264280 ecr 2409434096], length 0
22:37:34.528230 IP gandalf.xyzzy.uuidgen > mizar.xyzzy.nfsd: Flags [.], ack 83761321, win 3447, options [nop,nop,TS val 830264280 ecr 2409434096], length 0
22:37:34.528476 IP gandalf.xyzzy.uuidgen > mizar.xyzzy.nfsd: Flags [.], ack 83764217, win 3402, options [nop,nop,TS val 830264280 ecr 2409434096], length 0
22:37:34.528722 IP gandalf.xyzzy.uuidgen > mizar.xyzzy.nfsd: Flags [.], ack 83767113, win 3356, options [nop,nop,TS val 830264280 ecr 2409434096], length 0
22:37:34.620273 IP mizar.xyzzy.x11 > gandalf.xyzzy.60338: Flags [.], ack 8228, win 1026, options [nop,nop,TS val 4065857024 ecr 3569015095], length 0
22:37:34.620470 IP gandalf.xyzzy.uuidgen > mizar.xyzzy.nfsd: Flags [.], ack 83769069, win 4352, options [nop,nop,TS val 830264372 ecr 2409434097], length 0
22:37:34.625029 IP gandalf.xyzzy.uuidgen > mizar.xyzzy.nfsd: Flags [P.], seq 171240:171368, ack 83769069, win 4352, options [nop,nop,TS val 830264377 ecr 2409434097], length 128: NFS request xid 1082723581 124 read fh Unknown/F6D49DE9DE48A22F0A00F50500000000194002000000000000000000 65536 bytes @ 2883584
22:37:34.625262 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83769069:83770517, ack 171368, win 29127, options [nop,nop,TS val 2409434199 ecr 830264377], length 1448: NFS reply xid 1082723581 reply ok 1444 read
22:37:34.625268 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83770517:83771965, ack 171368, win 29127, options [nop,nop,TS val 2409434199 ecr 830264377], length 1448
22:37:34.625270 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83771965:83773413, ack 171368, win 29127, options [nop,nop,TS val 2409434199 ecr 830264377], length 1448
22:37:34.625273 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83773413:83774861, ack 171368, win 29127, options [nop,nop,TS val 2409434199 ecr 830264377], length 1448
22:37:34.625276 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83774861:83776309, ack 171368, win 29127, options [nop,nop,TS val 2409434199 ecr 830264377], length 1448
22:37:34.625278 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83776309:83777757, ack 171368, win 29127, options [nop,nop,TS val 2409434199 ecr 830264377], length 1448
22:37:34.625281 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83777757:83779205, ack 171368, win 29127, options [nop,nop,TS val 2409434199 ecr 830264377], length 1448
22:37:34.625284 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83779205:83780653, ack 171368, win 29127, options [nop,nop,TS val 2409434199 ecr 830264377], length 1448
22:37:34.625286 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83780653:83782101, ack 171368, win 29127, options [nop,nop,TS val 2409434199 ecr 830264377], length 1448
22:37:34.625288 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83782101:83783549, ack 171368, win 29127, options [nop,nop,TS val 2409434199 ecr 830264377], length 1448
22:37:34.625291 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83783549:83784997, ack 171368, win 29127, options [nop,nop,TS val 2409434199 ecr 830264377], length 1448
22:37:34.625294 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83784997:83786445, ack 171368, win 29127, options [nop,nop,TS val 2409434199 ecr 830264377], length 1448
22:37:34.625296 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83786445:83787893, ack 171368, win 29127, options [nop,nop,TS val 2409434199 ecr 830264377], length 1448
22:37:34.625299 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83787893:83789341, ack 171368, win 29127, options [nop,nop,TS val 2409434199 ecr 830264377], length 1448
22:37:34.625302 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83789341:83790789, ack 171368, win 29127, options [nop,nop,TS val 2409434199 ecr 830264377], length 1448
22:37:34.625304 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83790789:83792237, ack 171368, win 29127, options [nop,nop,TS val 2409434199 ecr 830264377], length 1448
22:37:34.625306 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83792237:83793685, ack 171368, win 29127, options [nop,nop,TS val 2409434199 ecr 830264377], length 1448
22:37:34.625309 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83793685:83795133, ack 171368, win 29127, options [nop,nop,TS val 2409434199 ecr 830264377], length 1448
22:37:34.625311 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83795133:83796581, ack 171368, win 29127, options [nop,nop,TS val 2409434199 ecr 830264377], length 1448
22:37:34.625314 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83796581:83798029, ack 171368, win 29127, options [nop,nop,TS val 2409434199 ecr 830264377], length 1448
22:37:34.625316 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83798029:83799477, ack 171368, win 29127, options [nop,nop,TS val 2409434199 ecr 830264377], length 1448
22:37:34.625318 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83799477:83800925, ack 171368, win 29127, options [nop,nop,TS val 2409434199 ecr 830264377], length 1448
22:37:34.625320 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83800925:83802373, ack 171368, win 29127, options [nop,nop,TS val 2409434199 ecr 830264377], length 1448
22:37:34.625323 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83802373:83803821, ack 171368, win 29127, options [nop,nop,TS val 2409434199 ecr 830264377], length 1448
22:37:34.625325 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83803821:83805269, ack 171368, win 29127, options [nop,nop,TS val 2409434199 ecr 830264377], length 1448
22:37:34.625328 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83805269:83806717, ack 171368, win 29127, options [nop,nop,TS val 2409434199 ecr 830264377], length 1448
22:37:34.625330 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83806717:83808165, ack 171368, win 29127, options [nop,nop,TS val 2409434199 ecr 830264377], length 1448
22:37:34.625332 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83808165:83809613, ack 171368, win 29127, options [nop,nop,TS val 2409434199 ecr 830264377], length 1448
22:37:34.625335 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83809613:83811061, ack 171368, win 29127, options [nop,nop,TS val 2409434199 ecr 830264377], length 1448
22:37:34.625337 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83811061:83812509, ack 171368, win 29127, options [nop,nop,TS val 2409434199 ecr 830264377], length 1448
22:37:34.625340 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83812509:83813957, ack 171368, win 29127, options [nop,nop,TS val 2409434199 ecr 830264377], length 1448
22:37:34.625342 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83813957:83815405, ack 171368, win 29127, options [nop,nop,TS val 2409434199 ecr 830264377], length 1448
22:37:34.625344 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83815405:83816853, ack 171368, win 29127, options [nop,nop,TS val 2409434199 ecr 830264377], length 1448
22:37:34.625347 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83816853:83818301, ack 171368, win 29127, options [nop,nop,TS val 2409434199 ecr 830264377], length 1448
22:37:34.625349 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83818301:83819749, ack 171368, win 29127, options [nop,nop,TS val 2409434199 ecr 830264377], length 1448
22:37:34.625352 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83819749:83821197, ack 171368, win 29127, options [nop,nop,TS val 2409434199 ecr 830264377], length 1448
22:37:34.625354 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83821197:83822645, ack 171368, win 29127, options [nop,nop,TS val 2409434199 ecr 830264377], length 1448
22:37:34.625356 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83822645:83824093, ack 171368, win 29127, options [nop,nop,TS val 2409434199 ecr 830264377], length 1448
22:37:34.625359 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83824093:83825541, ack 171368, win 29127, options [nop,nop,TS val 2409434199 ecr 830264377], length 1448
22:37:34.625362 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83825541:83826989, ack 171368, win 29127, options [nop,nop,TS val 2409434199 ecr 830264377], length 1448
22:37:34.625364 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83826989:83828437, ack 171368, win 29127, options [nop,nop,TS val 2409434199 ecr 830264377], length 1448
22:37:34.625366 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83828437:83829885, ack 171368, win 29127, options [nop,nop,TS val 2409434199 ecr 830264377], length 1448
22:37:34.625369 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83829885:83831333, ack 171368, win 29127, options [nop,nop,TS val 2409434199 ecr 830264377], length 1448
22:37:34.625372 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83831333:83832781, ack 171368, win 29127, options [nop,nop,TS val 2409434199 ecr 830264377], length 1448
22:37:34.625375 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83832781:83834229, ack 171368, win 29127, options [nop,nop,TS val 2409434199 ecr 830264377], length 1448
22:37:34.625377 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [P.], seq 83834229:83834737, ack 171368, win 29127, options [nop,nop,TS val 2409434199 ecr 830264377], length 508
22:37:34.625627 IP gandalf.xyzzy.uuidgen > mizar.xyzzy.nfsd: Flags [.], ack 83771965, win 4307, options [nop,nop,TS val 830264377 ecr 2409434199], length 0
22:37:34.625876 IP gandalf.xyzzy.uuidgen > mizar.xyzzy.nfsd: Flags [.], ack 83774861, win 4261, options [nop,nop,TS val 830264378 ecr 2409434199], length 0
22:37:34.626204 IP gandalf.xyzzy.uuidgen > mizar.xyzzy.nfsd: Flags [.], ack 83777757, win 4216, options [nop,nop,TS val 830264378 ecr 2409434199], length 0
22:37:34.626448 IP gandalf.xyzzy.uuidgen > mizar.xyzzy.nfsd: Flags [.], ack 83780653, win 4171, options [nop,nop,TS val 830264378 ecr 2409434199], length 0
22:37:34.626691 IP gandalf.xyzzy.uuidgen > mizar.xyzzy.nfsd: Flags [.], ack 83783549, win 4126, options [nop,nop,TS val 830264378 ecr 2409434199], length 0
22:37:34.626938 IP gandalf.xyzzy.uuidgen > mizar.xyzzy.nfsd: Flags [.], ack 83786445, win 4080, options [nop,nop,TS val 830264379 ecr 2409434199], length 0
22:37:34.627185 IP gandalf.xyzzy.uuidgen > mizar.xyzzy.nfsd: Flags [.], ack 83789341, win 4035, options [nop,nop,TS val 830264379 ecr 2409434199], length 0
22:37:34.627438 IP gandalf.xyzzy.uuidgen > mizar.xyzzy.nfsd: Flags [.], ack 83792237, win 3990, options [nop,nop,TS val 830264379 ecr 2409434199], length 0
22:37:34.627677 IP gandalf.xyzzy.uuidgen > mizar.xyzzy.nfsd: Flags [.], ack 83795133, win 3945, options [nop,nop,TS val 830264379 ecr 2409434199], length 0
22:37:34.627921 IP gandalf.xyzzy.uuidgen > mizar.xyzzy.nfsd: Flags [.], ack 83798029, win 3899, options [nop,nop,TS val 830264380 ecr 2409434199], length 0
22:37:34.628166 IP gandalf.xyzzy.uuidgen > mizar.xyzzy.nfsd: Flags [.], ack 83800925, win 3854, options [nop,nop,TS val 830264380 ecr 2409434199], length 0
22:37:34.628411 IP gandalf.xyzzy.uuidgen > mizar.xyzzy.nfsd: Flags [.], ack 83803821, win 3809, options [nop,nop,TS val 830264380 ecr 2409434199], length 0
22:37:34.628658 IP gandalf.xyzzy.uuidgen > mizar.xyzzy.nfsd: Flags [.], ack 83806717, win 3764, options [nop,nop,TS val 830264380 ecr 2409434199], length 0
22:37:34.628920 IP gandalf.xyzzy.uuidgen > mizar.xyzzy.nfsd: Flags [.], ack 83809613, win 3718, options [nop,nop,TS val 830264381 ecr 2409434199], length 0
22:37:34.629152 IP gandalf.xyzzy.uuidgen > mizar.xyzzy.nfsd: Flags [.], ack 83812509, win 3673, options [nop,nop,TS val 830264381 ecr 2409434199], length 0
22:37:34.629396 IP gandalf.xyzzy.uuidgen > mizar.xyzzy.nfsd: Flags [.], ack 83815405, win 3628, options [nop,nop,TS val 830264381 ecr 2409434199], length 0
22:37:34.629643 IP gandalf.xyzzy.uuidgen > mizar.xyzzy.nfsd: Flags [.], ack 83818301, win 3583, options [nop,nop,TS val 830264381 ecr 2409434199], length 0
22:37:34.629896 IP gandalf.xyzzy.uuidgen > mizar.xyzzy.nfsd: Flags [.], ack 83821197, win 3537, options [nop,nop,TS val 830264382 ecr 2409434199], length 0
22:37:34.630145 IP gandalf.xyzzy.uuidgen > mizar.xyzzy.nfsd: Flags [.], ack 83824093, win 3492, options [nop,nop,TS val 830264382 ecr 2409434199], length 0
22:37:34.630388 IP gandalf.xyzzy.uuidgen > mizar.xyzzy.nfsd: Flags [.], ack 83826989, win 3447, options [nop,nop,TS val 830264382 ecr 2409434199], length 0
22:37:34.630634 IP gandalf.xyzzy.uuidgen > mizar.xyzzy.nfsd: Flags [.], ack 83829885, win 3402, options [nop,nop,TS val 830264382 ecr 2409434199], length 0
22:37:34.630881 IP gandalf.xyzzy.uuidgen > mizar.xyzzy.nfsd: Flags [.], ack 83832781, win 3356, options [nop,nop,TS val 830264382 ecr 2409434199], length 0
22:37:34.667289 IP mizar.xyzzy.46612 > gandalf.xyzzy.hosts2-ns: Flags [.], ack 1, win 1026, options [nop,nop,TS val 3514464195 ecr 3121876329], length 0
22:37:34.667481 IP gandalf.xyzzy.hosts2-ns > mizar.xyzzy.46612: Flags [.], ack 1, win 1026, options [nop,nop,TS val 3121886328 ecr 3511103843], length 0
22:37:34.667488 IP gandalf.xyzzy.uuidgen > mizar.xyzzy.nfsd: Flags [.], ack 83834737, win 4352, options [nop,nop,TS val 830264419 ecr 2409434199], length 0
22:37:34.668080 IP gandalf.xyzzy.uuidgen > mizar.xyzzy.nfsd: Flags [P.], seq 171368:171496, ack 83834737, win 4352, options [nop,nop,TS val 830264420 ecr 2409434199], length 128: NFS request xid 1082723582 124 read fh Unknown/F6D49DE9DE48A22F0A00F50500000000194002000000000000000000 65536 bytes @ 2949120
22:37:34.668287 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83834737:83836185, ack 171496, win 29127, options [nop,nop,TS val 2409434242 ecr 830264420], length 1448: NFS reply xid 1082723582 reply ok 1444 read
22:37:34.668292 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83836185:83837633, ack 171496, win 29127, options [nop,nop,TS val 2409434242 ecr 830264420], length 1448
22:37:34.668294 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83837633:83839081, ack 171496, win 29127, options [nop,nop,TS val 2409434242 ecr 830264420], length 1448
22:37:34.668297 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83839081:83840529, ack 171496, win 29127, options [nop,nop,TS val 2409434242 ecr 830264420], length 1448
22:37:34.668300 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83840529:83841977, ack 171496, win 29127, options [nop,nop,TS val 2409434242 ecr 830264420], length 1448
22:37:34.668302 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83841977:83843425, ack 171496, win 29127, options [nop,nop,TS val 2409434242 ecr 830264420], length 1448
22:37:34.668305 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83843425:83844873, ack 171496, win 29127, options [nop,nop,TS val 2409434242 ecr 830264420], length 1448
22:37:34.668307 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83844873:83846321, ack 171496, win 29127, options [nop,nop,TS val 2409434242 ecr 830264420], length 1448
22:37:34.668309 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83846321:83847769, ack 171496, win 29127, options [nop,nop,TS val 2409434242 ecr 830264420], length 1448
22:37:34.668312 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83847769:83849217, ack 171496, win 29127, options [nop,nop,TS val 2409434242 ecr 830264420], length 1448
22:37:34.668315 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83849217:83850665, ack 171496, win 29127, options [nop,nop,TS val 2409434242 ecr 830264420], length 1448
22:37:34.668317 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83850665:83852113, ack 171496, win 29127, options [nop,nop,TS val 2409434242 ecr 830264420], length 1448
22:37:34.668320 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83852113:83853561, ack 171496, win 29127, options [nop,nop,TS val 2409434242 ecr 830264420], length 1448
22:37:34.668322 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83853561:83855009, ack 171496, win 29127, options [nop,nop,TS val 2409434242 ecr 830264420], length 1448
22:37:34.668325 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83855009:83856457, ack 171496, win 29127, options [nop,nop,TS val 2409434242 ecr 830264420], length 1448
22:37:34.668327 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83856457:83857905, ack 171496, win 29127, options [nop,nop,TS val 2409434242 ecr 830264420], length 1448
22:37:34.668329 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83857905:83859353, ack 171496, win 29127, options [nop,nop,TS val 2409434242 ecr 830264420], length 1448
22:37:34.668331 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83859353:83860801, ack 171496, win 29127, options [nop,nop,TS val 2409434242 ecr 830264420], length 1448
22:37:34.668334 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83860801:83862249, ack 171496, win 29127, options [nop,nop,TS val 2409434242 ecr 830264420], length 1448
22:37:34.668337 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83862249:83863697, ack 171496, win 29127, options [nop,nop,TS val 2409434242 ecr 830264420], length 1448
22:37:34.668339 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83863697:83865145, ack 171496, win 29127, options [nop,nop,TS val 2409434242 ecr 830264420], length 1448
22:37:34.668342 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83865145:83866593, ack 171496, win 29127, options [nop,nop,TS val 2409434242 ecr 830264420], length 1448
22:37:34.668344 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83866593:83868041, ack 171496, win 29127, options [nop,nop,TS val 2409434242 ecr 830264420], length 1448
22:37:34.668346 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83868041:83869489, ack 171496, win 29127, options [nop,nop,TS val 2409434242 ecr 830264420], length 1448
22:37:34.668349 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83869489:83870937, ack 171496, win 29127, options [nop,nop,TS val 2409434242 ecr 830264420], length 1448
22:37:34.668351 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83870937:83872385, ack 171496, win 29127, options [nop,nop,TS val 2409434242 ecr 830264420], length 1448
22:37:34.668354 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83872385:83873833, ack 171496, win 29127, options [nop,nop,TS val 2409434242 ecr 830264420], length 1448
22:37:34.668357 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83873833:83875281, ack 171496, win 29127, options [nop,nop,TS val 2409434242 ecr 830264420], length 1448
22:37:34.668360 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83875281:83876729, ack 171496, win 29127, options [nop,nop,TS val 2409434242 ecr 830264420], length 1448
22:37:34.668363 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83876729:83878177, ack 171496, win 29127, options [nop,nop,TS val 2409434242 ecr 830264420], length 1448
22:37:34.668365 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83878177:83879625, ack 171496, win 29127, options [nop,nop,TS val 2409434242 ecr 830264420], length 1448
22:37:34.668367 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83879625:83881073, ack 171496, win 29127, options [nop,nop,TS val 2409434242 ecr 830264420], length 1448
22:37:34.668370 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83881073:83882521, ack 171496, win 29127, options [nop,nop,TS val 2409434242 ecr 830264420], length 1448
22:37:34.668372 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83882521:83883969, ack 171496, win 29127, options [nop,nop,TS val 2409434242 ecr 830264420], length 1448
22:37:34.668375 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83883969:83885417, ack 171496, win 29127, options [nop,nop,TS val 2409434242 ecr 830264420], length 1448
22:37:34.668378 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83885417:83886865, ack 171496, win 29127, options [nop,nop,TS val 2409434242 ecr 830264420], length 1448
22:37:34.668380 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83886865:83888313, ack 171496, win 29127, options [nop,nop,TS val 2409434242 ecr 830264420], length 1448
22:37:34.668383 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83888313:83889761, ack 171496, win 29127, options [nop,nop,TS val 2409434242 ecr 830264420], length 1448
22:37:34.668385 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83889761:83891209, ack 171496, win 29127, options [nop,nop,TS val 2409434242 ecr 830264420], length 1448
22:37:34.668388 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83891209:83892657, ack 171496, win 29127, options [nop,nop,TS val 2409434242 ecr 830264420], length 1448
22:37:34.668390 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83892657:83894105, ack 171496, win 29127, options [nop,nop,TS val 2409434242 ecr 830264420], length 1448
22:37:34.668392 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83894105:83895553, ack 171496, win 29127, options [nop,nop,TS val 2409434242 ecr 830264420], length 1448
22:37:34.668395 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83895553:83897001, ack 171496, win 29127, options [nop,nop,TS val 2409434242 ecr 830264420], length 1448
22:37:34.668397 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83897001:83898449, ack 171496, win 29127, options [nop,nop,TS val 2409434242 ecr 830264420], length 1448
22:37:34.668400 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [.], seq 83898449:83899897, ack 171496, win 29127, options [nop,nop,TS val 2409434242 ecr 830264420], length 1448
22:37:34.668403 IP mizar.xyzzy.nfsd > gandalf.xyzzy.uuidgen: Flags [P.], seq 83899897:83900405, ack 171496, win 29127, options [nop,nop,TS val 2409434242 ecr 830264420], length 508
22:37:34.668649 IP gandalf.xyzzy.uuidgen > mizar.xyzzy.nfsd: Flags [.], ack 83837633, win 4307, options [nop,nop,TS val 830264420 ecr 2409434242], length 0
22:37:34.668899 IP gandalf.xyzzy.uuidgen > mizar.xyzzy.nfsd: Flags [.], ack 83840529, win 4261, options [nop,nop,TS val 830264421 ecr 2409434242], length 0
22:37:34.669227 IP gandalf.xyzzy.uuidgen > mizar.xyzzy.nfsd: Flags [.], ack 83843425, win 4216, options [nop,nop,TS val 830264421 ecr 2409434242], length 0
22:37:34.669466 IP gandalf.xyzzy.uuidgen > mizar.xyzzy.nfsd: Flags [.], ack 83846321, win 4171, options [nop,nop,TS val 830264421 ecr 2409434242], length 0
22:37:34.669715 IP gandalf.xyzzy.uuidgen > mizar.xyzzy.nfsd: Flags [.], ack 83849217, win 4126, options [nop,nop,TS val 830264421 ecr 2409434242], length 0
22:37:34.669961 IP gandalf.xyzzy.uuidgen > mizar.xyzzy.nfsd: Flags [.], ack 83852113, win 4080, options [nop,nop,TS val 830264422 ecr 2409434242], length 0
22:37:34.670208 IP gandalf.xyzzy.uuidgen > mizar.xyzzy.nfsd: Flags [.], ack 83855009, win 4035, options [nop,nop,TS val 830264422 ecr 2409434242], length 0
22:37:34.670453 IP gandalf.xyzzy.uuidgen > mizar.xyzzy.nfsd: Flags [.], ack 83857905, win 3990, options [nop,nop,TS val 830264422 ecr 2409434242], length 0
22:37:34.670703 IP gandalf.xyzzy.uuidgen > mizar.xyzzy.nfsd: Flags [.], ack 83860801, win 3945, options [nop,nop,TS val 830264422 ecr 2409434242], length 0
22:37:34.670945 IP gandalf.xyzzy.uuidgen > mizar.xyzzy.nfsd: Flags [.], ack 83863697, win 3899, options [nop,nop,TS val 830264423 ecr 2409434242], length 0
22:37:34.671191 IP gandalf.xyzzy.uuidgen > mizar.xyzzy.nfsd: Flags [.], ack 83866593, win 3854, options [nop,nop,TS val 830264423 ecr 2409434242], length 0
22:37:34.671434 IP gandalf.xyzzy.uuidgen > mizar.xyzzy.nfsd: Flags [.], ack 83869489, win 3809, options [nop,nop,TS val 830264423 ecr 2409434242], length 0
22:37:34.671683 IP gandalf.xyzzy.uuidgen > mizar.xyzzy.nfsd: Flags [.], ack 83872385, win 3764, options [nop,nop,TS val 830264423 ecr 2409434242], length 0
22:37:34.671933 IP gandalf.xyzzy.uuidgen > mizar.xyzzy.nfsd: Flags [.], ack 83875281, win 3718, options [nop,nop,TS val 830264424 ecr 2409434242], length 0
22:37:34.672174 IP gandalf.xyzzy.uuidgen > mizar.xyzzy.nfsd: Flags [.], ack 83878177, win 3673, options [nop,nop,TS val 830264424 ecr 2409434242], length 0
22:37:34.672419 IP gandalf.xyzzy.uuidgen > mizar.xyzzy.nfsd: Flags [.], ack 83881073, win 3628, options [nop,nop,TS val 830264424 ecr 2409434242], length 0
22:37:34.672667 IP gandalf.xyzzy.uuidgen > mizar.xyzzy.nfsd: Flags [.], ack 83883969, win 3583, options [nop,nop,TS val 830264424 ecr 2409434242], length 0
22:37:34.672914 IP gandalf.xyzzy.uuidgen > mizar.xyzzy.nfsd: Flags [.], ack 83886865, win 3537, options [nop,nop,TS val 830264425 ecr 2409434242], length 0
22:37:34.673158 IP gandalf.xyzzy.uuidgen > mizar.xyzzy.nfsd: Flags [.], ack 83889761, win 3492, options [nop,nop,TS val 830264425 ecr 2409434242], length 0
22:37:34.673404 IP gandalf.xyzzy.uuidgen > mizar.xyzzy.nfsd: Flags [.], ack 83892657, win 3447, options [nop,nop,TS val 830264425 ecr 2409434242], length 0
22:37:34.673650 IP gandalf.xyzzy.uuidgen > mizar.xyzzy.nfsd: Flags [.], ack 83895553, win 3402, options [nop,nop,TS val 830264425 ecr 2409434242], length 0
22:37:34.673899 IP gandalf.xyzzy.uuidgen > mizar.xyzzy.nfsd: Flags [.], ack 83898449, win 3356, options [nop,nop,TS val 830264426 ecr 2409434242], length 0
Comment 5 Eugene Grosbein freebsd_committer 2019-01-18 08:50:32 UTC
The problem may be related to interrupt handling and/or timekeeping.

1) Do you run ntpd there? If yes, show contents of /var/db/ntpd.drift. If not, start it, make sure it synced time successfully (ntpq -np), wait an hour and show /var/db/ntpd.drift.

2) Show output of sysctl kern.timecounter and sysctl kern.eventtimer commands. If kern.eventtimer.periodic is 0, set it to 1 and repeat your tests and post results.

3) Show output of "grep irq /var/run/dmesg.boot" for 12.0 and use 11.2 live boot media (flash stick or CD/DVD) and repeat "grep irq /var/run/dmesg.boot", save output and post it too.
Comment 6 Martin Birgmeier 2019-01-18 10:18:05 UTC
Hi Eugene,

Thank you for looking into this.

I have always been using ntp on all (physical, not VMs) machines. But I am not using a drift file because in my experience, the machines do not run long enough to arrive at stable values in the drift file, leading to bad ntp startup behavior (drifting) on the next boot (I have my own config files; maybe I should restart using drift files because my experience is from very long ago ~ 15 years :-)). Here is the output of ntpq -p:

[0]# ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*gandalf.xyzzy   192.53.103.103   2 u   11   64  377    0.187  -18.657   0.447
+hal.xyzzy       193.204.114.233  2 u   23   64  377    0.180   -2.254   0.096
[0]# 

I am syncing two machines to stratum-1 servers; they in turn are references for all the other machines (including the laptop with the em0).

[0]# sysctl kern.timecounter 
kern.timecounter.tsc_shift: 1
kern.timecounter.smp_tsc_adjust: 0
kern.timecounter.smp_tsc: 1
kern.timecounter.invariant_tsc: 1
kern.timecounter.fast_gettime: 1
kern.timecounter.tick: 1
kern.timecounter.choice: TSC(1000) ACPI-fast(900) HPET(950) i8254(0) dummy(-1000000)
kern.timecounter.hardware: ACPI-fast
kern.timecounter.alloweddeviation: 5
kern.timecounter.stepwarnings: 0
kern.timecounter.tc.TSC.quality: 1000
kern.timecounter.tc.TSC.frequency: 1993570808
kern.timecounter.tc.TSC.counter: 3362030552
kern.timecounter.tc.TSC.mask: 4294967295
kern.timecounter.tc.ACPI-fast.quality: 900
kern.timecounter.tc.ACPI-fast.frequency: 3579545
kern.timecounter.tc.ACPI-fast.counter: 11926669
kern.timecounter.tc.ACPI-fast.mask: 16777215
kern.timecounter.tc.HPET.quality: 950
kern.timecounter.tc.HPET.frequency: 14318180
kern.timecounter.tc.HPET.counter: 114481493
kern.timecounter.tc.HPET.mask: 4294967295
kern.timecounter.tc.i8254.quality: 0
kern.timecounter.tc.i8254.frequency: 1193182
kern.timecounter.tc.i8254.counter: 14744
kern.timecounter.tc.i8254.mask: 65535
[0]# sysctl kern.eventtimer 
kern.eventtimer.periodic: 0
kern.eventtimer.timer: LAPIC
kern.eventtimer.idletick: 0
kern.eventtimer.singlemul: 2
kern.eventtimer.choice: LAPIC(600) HPET(550) i8254(100) RTC(0)
kern.eventtimer.et.RTC.quality: 0
kern.eventtimer.et.RTC.frequency: 32768
kern.eventtimer.et.RTC.flags: 17
kern.eventtimer.et.HPET.quality: 550
kern.eventtimer.et.HPET.frequency: 14318180
kern.eventtimer.et.HPET.flags: 7
kern.eventtimer.et.i8254.quality: 100
kern.eventtimer.et.i8254.frequency: 1193182
kern.eventtimer.et.i8254.flags: 1
kern.eventtimer.et.LAPIC.quality: 600
kern.eventtimer.et.LAPIC.frequency: 1993570808
kern.eventtimer.et.LAPIC.flags: 7
[0]# 

[0]# sysctl kern.eventtimer.periodic=1 
[0]# 
This does not improve the situation. Again, additionally running iperf between the laptop and the 3rd machine still improves it.

[0]# grep irq /var/run/dmesg.boot
ioapic0 <Version 2.0> irqs 0-23 on motherboard
attimer0: <AT timer> port 0x40-0x43 irq 0 on acpi0
atrtc0: <AT realtime clock> port 0x70-0x71 irq 8 on acpi0
pcib1: <ACPI PCI-PCI bridge> irq 16 at device 1.0 on pci0
vgapci0: <VGA-compatible display> port 0x5000-0x507f mem 0xd2000000-0xd2ffffff,0xc0000000-0xcfffffff,0xd0000000-0xd1ffffff irq 16 at device 0.0 on pci1
hdac0: <NVIDIA GT108 HDA Controller> mem 0xd3000000-0xd3003fff irq 17 at device 0.1 on pci1
em0: <Intel(R) PRO/1000 Network Connection> port 0x6040-0x605f mem 0xd5300000-0xd531ffff,0xd532b000-0xd532bfff irq 20 at device 25.0 on pci0
ehci0: <Intel Cougar Point USB 2.0 controller> mem 0xd532a000-0xd532a3ff irq 16 at device 26.0 on pci0
hdac1: <Intel Cougar Point HDA Controller> mem 0xd5320000-0xd5323fff irq 22 at device 27.0 on pci0
pcib2: <ACPI PCI-PCI bridge> irq 16 at device 28.0 on pci0
pcib3: <ACPI PCI-PCI bridge> irq 17 at device 28.1 on pci0
iwn0: <Intel Centrino Ultimate-N 6300> mem 0xd5200000-0xd5201fff irq 17 at device 0.0 on pci3
pcib4: <ACPI PCI-PCI bridge> irq 19 at device 28.3 on pci0
pcib5: <ACPI PCI-PCI bridge> irq 16 at device 28.4 on pci0
sdhci_pci0: <RICOH R5CE823 SD> mem 0xd4201000-0xd42010ff irq 16 at device 0.0 on pci4
fwohci0: <1394 Open Host Controller Interface> mem 0xd4200000-0xd42007ff irq 19 at device 0.3 on pci4
pcib6: <ACPI PCI-PCI bridge> irq 18 at device 28.6 on pci0
xhci0: <NEC uPD720200 USB 3.0 controller> mem 0xd4100000-0xd4101fff irq 18 at device 0.0 on pci5
ehci1: <Intel Cougar Point USB 2.0 controller> mem 0xd5329000-0xd53293ff irq 23 at device 29.0 on pci0
ahci0: <Intel Cougar Point AHCI SATA controller> port 0x6068-0x606f,0x607c-0x607f,0x6060-0x6067,0x6078-0x607b,0x6020-0x603f mem 0xd5328000-0xd53287ff irq 19 at device 31.2 on pci0
ichsmb0: <Intel Cougar Point SMBus controller> port 0xefa0-0xefbf mem 0xd5324000-0xd53240ff irq 18 at device 31.3 on pci0
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
psm0: <PS/2 Mouse> irq 12 on atkbdc0
[0]# 

[0]# cd /z/backup/rsync/mizar/ROOT/.zfs/snapshot/backup.2018-12-24.21:03:49
[0]# egrep -i 'FreeBSD|irq' var/run/dmesg.boot                             
Copyright (c) 1992-2018 The FreeBSD Project.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 11.2-RELEASE-p5 #4 r341248M: Thu Nov 29 19:30:14 CET 2018
    root@v904.xyzzy:/.../hal/z/OBJ/FreeBSD/amd64/releng/11.2/sys/XYZZY_SMP amd64
FreeBSD clang version 6.0.0 (tags/RELEASE_600/final 326565) (based on LLVM 6.0.0)
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
FreeBSD/SMP: 1 package(s) x 4 core(s) x 2 hardware threads
ioapic0 <Version 2.0> irqs 0-23 on motherboard
attimer0: <AT timer> port 0x40-0x43 irq 0 on acpi0
atrtc0: <AT realtime clock> port 0x70-0x71 irq 8 on acpi0
pcib1: <ACPI PCI-PCI bridge> irq 16 at device 1.0 on pci0
vgapci0: <VGA-compatible display> port 0x5000-0x507f mem 0xd2000000-0xd2ffffff,0xc0000000-0xcfffffff,0xd0000000-0xd1ffffff irq 16 at device 0.0 on pci1
hdac0: <NVIDIA GT108 HDA Controller> mem 0xd3000000-0xd3003fff irq 17 at device 0.1 on pci1
em0: <Intel(R) PRO/1000 Network Connection 7.6.1-k> port 0x6040-0x605f mem 0xd5300000-0xd531ffff,0xd532b000-0xd532bfff irq 20 at device 25.0 on pci0
ehci0: <Intel Cougar Point USB 2.0 controller> mem 0xd532a000-0xd532a3ff irq 16 at device 26.0 on pci0
hdac1: <Intel Cougar Point HDA Controller> mem 0xd5320000-0xd5323fff irq 22 at device 27.0 on pci0
pcib2: <ACPI PCI-PCI bridge> irq 16 at device 28.0 on pci0
pcib3: <ACPI PCI-PCI bridge> irq 17 at device 28.1 on pci0
iwn0: <Intel Centrino Ultimate-N 6300> mem 0xd5200000-0xd5201fff irq 17 at device 0.0 on pci3
pcib4: <ACPI PCI-PCI bridge> irq 19 at device 28.3 on pci0
pcib5: <ACPI PCI-PCI bridge> irq 16 at device 28.4 on pci0
sdhci_pci0: <RICOH R5CE823 SD> mem 0xd4201000-0xd42010ff irq 16 at device 0.0 on pci4
fwohci0: <1394 Open Host Controller Interface> mem 0xd4200000-0xd42007ff irq 19 at device 0.3 on pci4
pcib6: <ACPI PCI-PCI bridge> irq 18 at device 28.6 on pci0
xhci0: <NEC uPD720200 USB 3.0 controller> mem 0xd4100000-0xd4101fff irq 18 at device 0.0 on pci5
ehci1: <Intel Cougar Point USB 2.0 controller> mem 0xd5329000-0xd53293ff irq 23 at device 29.0 on pci0
ahci0: <Intel Cougar Point AHCI SATA controller> port 0x6068-0x606f,0x607c-0x607f,0x6060-0x6067,0x6078-0x607b,0x6020-0x603f mem 0xd5328000-0xd53287ff irq 19 at device 31.2 on pci0
ichsmb0: <Intel Cougar Point SMBus controller> port 0xefa0-0xefbf mem 0xd5324000-0xd53240ff irq 18 at device 31.3 on pci0
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
psm0: <PS/2 Mouse> irq 12 on atkbdc0
fuse-freebsd: version 0.4.4, FUSE ABI 7.8
[0]#
Comment 7 Eugene Grosbein freebsd_committer 2019-01-18 10:39:56 UTC
Routing of interrupts not changed between 11.2 and 12.0.

Post output of grep em0 /z/backup/rsync/mizar/ROOT/.zfs/snapshot/backup.2018-12-24.21:03:49/var/run/dmesg.boot

Please restart ntpd with drift file, wait an hour then post its contents.

Can you reboot the system and drop to loader prompt at boot time? If yes, do at loader prompt:

set hw.pci.enable_msi=0
boot

Then repeat your tests to see if it gets better with MSI disabled.
Comment 8 Martin Birgmeier 2019-01-18 11:16:05 UTC
[0]# egrep 'FreeBSD|em0' var/run/dmesg.boot
Copyright (c) 1992-2018 The FreeBSD Project.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 11.2-RELEASE-p5 #4 r341248M: Thu Nov 29 19:30:14 CET 2018
    root@v904.xyzzy:/.../hal/z/OBJ/FreeBSD/amd64/releng/11.2/sys/XYZZY_SMP amd64
FreeBSD clang version 6.0.0 (tags/RELEASE_600/final 326565) (based on LLVM 6.0.0)
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
FreeBSD/SMP: 1 package(s) x 4 core(s) x 2 hardware threads
em0: <Intel(R) PRO/1000 Network Connection 7.6.1-k> port 0x6040-0x605f mem 0xd5300000-0xd531ffff,0xd532b000-0xd532bfff irq 20 at device 25.0 on pci0
em0: Using an MSI interrupt
em0: Ethernet address: f0:de:f1:98:86:a9
ahciem0: <AHCI enclosure management bridge> on ahci0
pass2 at ahciem0 bus 0 scbus4 target 0 lun 0
em0: link state changed to UP
[0]# 

Rebooting the system takes a little longer... ntp in progress
Comment 9 Martin Birgmeier 2019-01-18 16:14:24 UTC
I added

    hw.pci.enable_msi=0

to /boot/loader.conf, which was a bad idea, because with this, after loading the kernel, the system does not find ada0 anymore. (I recovered using a pxe netboot.)

Regarding ntpd.drift, after adding "driftfile /var/db/ntpd.drift" to /etc/ntp.conf ntpd only complains about "format error frequency file /var/db/ntpd.drift" in the syslog (I put a zero-sized file with owner/group ntpd there). I probably have to wait longer for it to write anything to it.

-- Martin
Comment 10 Eugene Grosbein freebsd_committer 2019-01-18 17:34:54 UTC
(In reply to Martin Birgmeier from comment #9)

>I added hw.pci.enable_msi=0 to /boot/loader.conf, which was a bad idea, because with this, after loading the kernel, the system does not find ada0 anymore.

That's why I suggested to use loader prompt. Next time, if you need to test something like this and cannot use loader prompt directly, use nextboot(8) feature that guarantees safe fallback to known working configuration even if boot is broken during experiment: you add "hw.pci.enable_msi=0" (or another setting) to /boot/nextboot.conf that is created with nextboot(8) command and loader uses nextboot.conf for one time only ignoring it for successive loading attempts and nextboot.conf is removed completely during first successfull boot.

You can even change root file system name, kernel name or set of loaded kernel modules using nextboot safely.
Comment 11 Eugene Grosbein freebsd_committer 2019-01-18 17:38:07 UTC
Do you have any non-default settings in your /boot/loader.conf or /etc/sysctl.conf ?
Comment 12 Martin Birgmeier 2019-01-18 17:42:24 UTC
Created attachment 201243 [details]
loader.conf
Comment 13 Martin Birgmeier 2019-01-18 17:42:40 UTC
Yes; I just thought it was going to help and wanted to make it permanent right away. Bad idea.

In the meantime:

[0]# cat /var/db/ntpd.drift
-6.596
[0]# 

What can you get from the ntp drift?

Is there a way to selectively disable MSI for em0 only?

-- Martin
Comment 14 Martin Birgmeier 2019-01-18 17:47:23 UTC
Created attachment 201244 [details]
sysctl.conf

This might be more interesting (kern.timecounter.hardware setting).
Comment 15 Eugene Grosbein freebsd_committer 2019-01-19 01:13:31 UTC
(In reply to Martin Birgmeier from comment #13)

> What can you get from the ntp drift?

Now we see this machine has good enough hardware timer to rule out problems linked with bad hardware timer.

> Is there a way to selectively disable MSI for em0 only?

No, driver does not support that.

However, you have cpufreq enabled so the box may run with lower CPU frequency unless there is something like iperf running making extra load. Can you temporary configure system to run at max all the time or just unload cpufreq and repeat your tests?
Comment 16 Martin Birgmeier 2019-01-20 08:22:31 UTC
In the tcpdump log shown in comment #4 it seems that there is always a lot of traffic in one direction followed by a lot of ACKs in the other (if I interpret it correctly).

Could it be that something is buffering too many packets, thereby destroying NFS "streaming"?

What about this strange "em0: TX(0) desc avail = 1024, pidx = 0" error in comment #2 when I tried to set -rxcsum? Does it mean that there are 1024 buffers available?

-- Martin
Comment 17 Martin Birgmeier 2019-01-20 09:04:18 UTC
Because bug #234550 might be related to this one, I tried to issue "ifconfig -lro em0". This resulted in the interface not working anymore, with the already known syslog messages:

Jan 20 09:29:18 mizar kernel: em0: link state changed to DOWN
Jan 20 09:29:20 mizar kernel: em0: TX(0) desc avail = 1024, pidx = 0
Jan 20 09:34:36 mizar kernel: em0: TX(0) desc avail = 1024, pidx = 0
Jan 20 09:34:42 mizar kernel: em0: TX(0) desc avail = 1024, pidx = 0
Jan 20 09:34:51 mizar kernel: em0: TX(0) desc avail = 1024, pidx = 0
Jan 20 09:34:54 mizar kernel: em0: TX(0) desc avail = 1024, pidx = 0

None of the following commands (from the shell history) helped:

 4981  2019-01-20 09:29  0:01  ifconfig em0 -lro
 4982  2019-01-20 09:29  0:00  ifconfig em0 up
 4985  2019-01-20 09:30  0:00  ifconfig em0
 4987  2019-01-20 09:30  0:03  ifconfig em0 lro up
 4989  2019-01-20 09:31  0:00  ifconfig em0
 4990  2019-01-20 09:32  0:00  ifconfig em0 down
 4991  2019-01-20 09:32  0:02  ifconfig em0 up
 4999  2019-01-20 09:40  0:00  ifconfig em0

Then I tried adding "-lro" in rc.conf:

ifconfig_em0="-lro inet $hostname"

This resulted in a rather long wait for the interface to be available to some rc processes. When the machine was finally up, NFS to it did not work at all, it just hung.

I removed the -lro flag again.

What is the issue that using ifconfig on this interface breaks it?

-- Martin
Comment 18 Martin Birgmeier 2019-01-20 09:08:27 UTC
Another candidate for being related is bug #234570.
Comment 19 Martin Birgmeier 2019-01-21 18:14:57 UTC
I have installed net/intel-em-kmod and switched the kernel module from if_em to if_em_updated which is installed by that package. This results in the interface running at full speed again.

However, looking at the sources shipped with this port, they are dated 27 Sep 2017.

So it would be nice if the if_em shipped with the FreeBSD sources could be corrected.

-- Martin
Comment 20 Martin Birgmeier 2019-01-21 19:43:54 UTC
With the switch to the "if_em_updated" kernel module from net/intel-em-kmod, the issue described in bug #234520 is gone.
Comment 21 Martin Birgmeier 2019-01-27 17:00:56 UTC
Hi,

Is there anybody out there who might want to take on this issue? It is a regression from FreeBSD 11, so most likely should be fixable.

-- Martin
Comment 22 Rodney W. Grimes freebsd_committer 2019-01-27 19:12:54 UTC
(In reply to Martin Birgmeier from comment #21)
I wont take the bug as an assignee, but shall speak to what is in the works with respect to em(4) and other iflib'ed devices in the 12.0 release.

Release engineering has become aware of several regressions in drivers that are first released in 12.0 using the new iflib design.  There are several commits to ^head, some of which have been merged to stable/12 that address some of these issues.  There are several more issues being worked on as well.

There has been at least 2 formal requests for 2 different commits related to em(4) regressions to have errata issued.

Release engineering is waiting on the first group of fixes to be MFC'ed after  testing and plans to work with the security team (so@) to issue an EN for these fixes.

The additional fixes that are not in the first EN well be done in a future EN.

Regards,
Rod >RE@

PS  I do not know if this set of changes fixes your specific bug, but it does have a high probability of that.
Comment 23 Martin Birgmeier 2019-01-28 17:36:21 UTC
Hi Rodney,

Thank you for your feedback, I very much appreciate it.

Best regards,

Martin
Comment 24 Martin Birgmeier 2019-02-01 18:06:34 UTC
I have observed that using le0 in a VirtualBox virtual machine also has very poor NFS performance. Is le0 affected by the same issues?
Comment 25 Martin Birgmeier 2019-02-01 18:11:08 UTC
Just for the record, the le0 (le(4)) issue is with the client running head as of Dec. 8:

Copyright (c) 1992-2018 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 13.0-CURRENT #0 r341726M: Sat Dec  8 19:24:08 CET 2018
    root@v903.xyzzy:/.../hal/z/OBJ/FreeBSD/amd64/head/amd64.amd64/sys/XYZZY_SMP_NOSPIN amd64
FreeBSD clang version 6.0.1 (tags/RELEASE_601/final 335540) (based on LLVM 6.0.1)


The exactly same client runs fine under Hyper-V using de0 (de(4)).
Comment 26 Eugene Grosbein freebsd_committer 2019-02-02 14:56:48 UTC
I've found that NFS client of FreeBSD 11.2-STABLE has bad NFS performance with FreeBSD NFS servers and distinct real hardware, so the problem may be more complex.

As workaround, use set of mount options at client side: rsize=10240,wsize=10240,readahead=4

This improves performance in many orders.
Comment 27 Rick Macklem freebsd_committer 2019-02-02 23:42:25 UTC
A little off topic, but rsize,wsize will be set to a power of 2 at or
below the argument. As such, rsize=10240,wsize=10240 will result in
a setting of 8K (or less if the server forces it) for both of them.
You can use "nfsstat -m" on the client to see what is actually being used.

I can't help much, but I would suggest trying increasing powers of 2 for
rsize,wsize until performance tanks.

A couple of reasons smaller rsize,wsize might perform better.
- TSO is broken. This can be tested by using the sysctl to disable TSO.
  (I suggest the generic sysctl and not a driver specific ifconfig, in
   case the driver config flag is broken.)
- Small send or receive rings that can't handle the burst of packets/mbufs
  that the larger I/O size causes.
  A write request/receive reply will typically be a chain of size/2K + 2 mbufs
  for the sender. (Sometimes +3 depending on how the driver handles the MAC
  header.)
  The receiver will typically see this as a burst of roughly size/1500 + 1 packets
  making up one TCP segment.
If jumbo packets are in use, all bets are off for the receive side, since
at least the >4K ones can fragment the mbuf cluster pool.
- Interrupt throttling. Since most NFS RPCs are small messages requiring
  a small reply message in the opposite direction, anything that increases
  latency (such as delaying interrupts assuming more packets of a stream
  will arrive soon) can hit NFS performance.
Comment 28 Eugene Grosbein freebsd_committer 2019-02-03 04:54:00 UTC
(In reply to Rick Macklem from comment #27)

Hardware-assisted TSO itself may be not useful in case of 1Gbps or less linerate for multiple reasons: overwhelming horsepower of modern (and even pretty old modern) amd64 CPUs, NFS mounts over UDP, extensive use of ipfw nat/libalias for routed jails/guest virtual machines as libalias is not compartible with TSO etc. So when not dealing with 40Gbps or faster hardware, I choose to disable TSO per-interface and not fight against corresponding bugs.

As for NIC rings, even pretty old Intel Gigabit have tunnables to setup big enough rings to not bother of this issue.

Again, even old Intel Gigabit NICs have multiple interrupt tunnables to setup right timings for local traffic profile.

As for testing different client mount option sets, I'll try it later and respond.
Comment 29 Martin Birgmeier 2019-04-28 15:42:52 UTC
I have just been reading https://lists.freebsd.org/pipermail/freebsd-stable/2019-April/090829.html which seems to be the same issue I am describing here.

To summarize:
- em(4) has very poor performance in 12.0
- I had to switch to net/intel-em-kmod to get back a decent performance
- net/intel-em-kmod is old, therefore I would like to see the real issue being fixed

Kris: I added you to this bug's CC list - hope you don't mind.

To summarize even further :-)
- Could you please fix this regression

-- Martin
Comment 30 Martin Birgmeier 2019-04-28 15:45:06 UTC
Kris, one more thing: To get the driver from ports loading, you must remove if_em_load="YES" from /boot/loader.conf and add if_em_updated_load="YES"

I am quite sure this is the reason you do not get good performance after installing the driver from ports.

-- Martin
Comment 31 Martin Birgmeier 2019-04-28 16:14:04 UTC
I guess bug #219428 is related?

Also https://lists.freebsd.org/pipermail/freebsd-stable/2019-April/090927.html ?

-- Martin
Comment 32 commit-hook freebsd_committer 2019-05-07 08:29:29 UTC
A commit references this bug:

Author: marius
Date: Tue May  7 08:28:35 UTC 2019
New revision: 347221
URL: https://svnweb.freebsd.org/changeset/base/347221

Log:
  o Use iflib_fast_intr_rxtx() also for "legacy" interrupts, i. e. INTx and
    MSI. Unlike as with iflib_fast_intr_ctx(), the former will also enqueue
    _task_fn_tx() in addition to _task_fn_rx() if appropriate, bringing TCP
    TX throughput of EM-class devices on par with the MSI-X case and, thus,
    close to wirespeed/pre-iflib(4) times again. [1]
    Note that independently of the interrupt type, the UDP performance with
    these MACs still is abysmal and nowhere near to where it was before the
    conversion of em(4) to iflib(4).
  o In iflib_init_locked(), announce which free list failed to set up.
  o In _task_fn_tx() when running netmap(4), issue ifdi_intr_enable instead
    of the ifdi_tx_queue_intr_enable method in case of a "legacy" interrupt
    as the latter is valid with MSI-X only.
  o Instead of adding the missing - and apparently convoluted enough that a
    DBG_COUNTER_INC was put into a wrong spot in _task_fn_rx() - checks for
    ifdi_{r,t}x_queue_intr_enable being available in the MSI-X case also to
    iflib_fast_intr_rxtx(), factor these out to iflib_device_register() and
    make the checks fail gracefully rather than panic. This avoids invoking
    the checks at runtime over and over again in iflib_fast_intr_rxtx() and
    _task_fn_{r,t}x() - even if it's just in case of INVARIANTS - and makes
    these functions more readable.
  o In iflib_rx_structures_setup(), only initialize LRO resources if device
    and driver have LRO capability in order to not waste memory. Also, free
    the LRO resources again if setting them up fails for one of the queues.
    However, don't bother invoking iflib_rx_sds_free() in that case because
    iflib_rx_structures_setup() doesn't call iflib_rxsd_alloc() either (and
    iflib_{device,pseudo}_register() will issue iflib_rx_sds_free() in case
    of failure via iflib_rx_structures_free(), but there definitely is some
    asymmetry left to be fixed, though).
  o Similarly, free LRO resources again in iflib_rx_structures_free().
  o In iflib_irq_set_affinity(), handle get_core_offset() errors gracefully
    instead of panicing (but only in case of INVARIANTS). This is a follow-
    up to r344132, as such driver bugs shouldn't be fatal.
  o Likewise, handle unknown iflib_intr_type_t in iflib_irq_alloc_generic()
    gracefully, too.
  o Bring yet more sanity to iflib_msix_init():
    - If the device doesn't provide enough MSI-X vectors or not all vectors
      can be allocate so the expected number of queues in addition to admin
      interrupts can't be supported, try MSI next (and then INTx) as proper
      MSI-X vector distribution can't be assured in such cases. In essence,
      this change brings r254008 forward to iflib(4). Also, this is the fix
      alluded to in the commit message of r343934.
    - If the MSI-X allocation has failed, don't prematurely announce MSI is
      going to be used as the latter in fact may not be available either.
    - When falling back to MSI, only release the MSI-X table resource again
      if it was allocated in iflib_msix_init(), i. e. isn't supplied by the
      driver, in the first place.
  o In mp_ndesc_handler(), handle unknown type arguments gracefully, too.

  PR:		235031 (likely) [1]
  Reviewed by:	shurd
  Differential Revision:	https://reviews.freebsd.org/D20175

Changes:
  head/sys/net/iflib.c
Comment 33 Martin Birgmeier 2019-05-07 17:09:45 UTC
Thank you for this interesting commit note.

Seeing that "the UDP performance with these MACs still is abysmal", should I already backport these changes to 12.0 or will they not help in my case of using a lot of NFS?

Should I expect further changes which will also address the UDP performance?

-- Martin
Comment 34 commit-hook freebsd_committer 2019-06-16 15:26:36 UTC
A commit references this bug:

Author: marius
Date: Sun Jun 16 15:25:46 UTC 2019
New revision: 349112
URL: https://svnweb.freebsd.org/changeset/base/349112

Log:
  MFC: r347221, r347245

  o Use iflib_fast_intr_rxtx() also for "legacy" interrupts, i. e. INTx and
    MSI. Unlike as with iflib_fast_intr_ctx(), the former will also enqueue
    _task_fn_tx() in addition to _task_fn_rx() if appropriate, bringing TCP
    TX throughput of EM-class devices on par with the MSI-X case and, thus,
    close to wirespeed/pre-iflib(4) times again. [1]
    Note that independently of the interrupt type, the UDP performance with
    these MACs still is abysmal and nowhere near to where it was before the
    conversion of em(4) to iflib(4).
  o In iflib_init_locked(), announce which free list failed to set up.
  o In _task_fn_tx() when running netmap(4), issue ifdi_intr_enable instead
    of the ifdi_tx_queue_intr_enable method in case of a "legacy" interrupt
    as the latter is valid with MSI-X only.
  o Instead of adding the missing - and apparently convoluted enough that a
    DBG_COUNTER_INC was put into a wrong spot in _task_fn_rx() - checks for
    ifdi_{r,t}x_queue_intr_enable being available in the MSI-X case also to
    iflib_fast_intr_rxtx(), factor these out to iflib_device_register() and
    make the checks fail gracefully rather than panic. This avoids invoking
    the checks at runtime over and over again in iflib_fast_intr_rxtx() and
    _task_fn_{r,t}x() - even if it's just in case of INVARIANTS - and makes
    these functions more readable.
  o In iflib_rx_structures_setup(), only initialize LRO resources if device
    and driver have LRO capability in order to not waste memory. Also, free
    the LRO resources again if setting them up fails for one of the queues.
    However, don't bother invoking iflib_rx_sds_free() in that case because
    iflib_rx_structures_setup() doesn't call iflib_rxsd_alloc() either (and
    iflib_{device,pseudo}_register() will issue iflib_rx_sds_free() in case
    of failure via iflib_rx_structures_free(), but there definitely is some
    asymmetry left to be fixed, though).
  o Similarly, free LRO resources again in iflib_rx_structures_free().
  o In iflib_irq_set_affinity(), handle get_core_offset() errors gracefully
    instead of panicing (but only in case of INVARIANTS). This is a follow-
    up to r344132 (MFCed to stable/12 in r344163) as such bugs shouldn't be
    fatal.
  o Likewise, handle unknown iflib_intr_type_t in iflib_irq_alloc_generic()
    gracefully, too.
  o Bring yet more sanity to iflib_msix_init():
    - If the device doesn't provide enough MSI-X vectors or not all vectors
      can be allocate so the expected number of queues in addition to admin
      interrupts can't be supported, try MSI next (and then INTx) as proper
      MSI-X vector distribution can't be assured in such cases. In essence,
      this change brings r254008 forward to iflib(4). Also, this is the fix
      alluded to in the commit message of r343934.
    - If the MSI-X allocation has failed, don't prematurely announce MSI is
      going to be used as the latter in fact may not be available either.
    - When falling back to MSI, only release the MSI-X table resource again
      if it was allocated in iflib_msix_init(), i. e. isn't supplied by the
      driver, in the first place.
  o In mp_ndesc_handler(), handle unknown type arguments gracefully, too.

  PR:		235031 (likely) [1]
  Reviewed by:	shurd
  Differential Revision:	https://reviews.freebsd.org/D20175

Changes:
_U  stable/12/
  stable/12/sys/net/iflib.c
Comment 35 Martin Birgmeier 2019-08-15 12:32:02 UTC
Update: Today I tried to revert to if_em from releng/12.0 updated to latest. Result: The performance is still abysmal.

I'll switch to net/intel-em-kmod again.

This is with FreeBSD mizar.xyzzy 12.0-RELEASE-p9 FreeBSD 12.0-RELEASE-p9 #9 r350677M: Wed Aug  7 18:35:55 CEST 2019     root@v908.xyzzy:/.../hal/z/OBJ/FreeBSD/amd64/releng/12.0/amd64.amd64/sys/XYZZY_SMP  amd64.
Comment 36 Martin Birgmeier 2019-08-15 12:34:21 UTC
I just notice that the console and syslog have about 20 messages of

em: frame error: ignored
em: frame error: ignored
em: frame error: ignored
em: frame error: ignored
em: frame error: ignored

Uptime is 2 1/2 hours.
Comment 37 Martin Birgmeier 2019-08-31 13:30:34 UTC
(Adding Bruce's mail here:)

On 31.08.19 13:44, Bruce Evans wrote:
> On Thu, 15 Aug 2019 a bug that doesn't want replies at freebsd.org wrote:
>
>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235031
>>
>> --- Comment #36 from Martin Birgmeier <d8zNeCFG@aon.at> ---
>> I just notice that the console and syslog have about 20 messages of
>>
>> em: frame error: ignored
>> em: frame error: ignored
>> em: frame error: ignored
>> em: frame error: ignored
>> em: frame error: ignored
>>
>> Uptime is 2 1/2 hours. 
>
> You seem to be using my old patch which is not in -current:
>
> Index: em_txrx.c
> XX ===================================================================
> XX --- em_txrx.c    (revision 348771)
> XX +++ em_txrx.c    (working copy)
> XX @@ -629,9 +629,20 @@
> XX XX          /* Make sure bad packets are discarded */
> XX          if (errors & E1000_RXD_ERR_FRAME_ERR_MASK) {
> XX +#if 0
> XX              adapter->dropped_pkts++;
> XX -            /* XXX fixup if common */
> XX              return (EBADMSG);
> XX +#else
> XX +            /*
> XX +             * XXX the above error handling is worse than none.
> XX +             * First it it drops 'i' packets before the current
> XX +             * one and doesn't count them.  Then it returns an
> XX +             * error.  iflib can't really handle this error.
> XX +             * It just resets, and this usually drops many more
> XX +             * packets (without counting them) and much time.
> XX +             */
> XX +            printf("lem: frame error: ignored\n");
> XX +#endif
> XX          }
> XX XX          ri->iri_frags[i].irf_flid = 0;
> XX @@ -692,8 +703,12 @@
> XX XX          /* Make sure bad packets are discarded */
> XX          if (staterr & E1000_RXDEXT_ERR_FRAME_ERR_MASK) {
> XX +#if 0
> XX              adapter->dropped_pkts++;
> XX              return EBADMSG;
> XX +#else
> XX +            printf("em: frame error: ignored\n");
> XX +#endif
> XX          }
> XX XX          ri->iri_frags[i].irf_flid = 0;
>
> Without this patch, no message is printed and the device takes a long
> time to recover (when I wrote the patch, recovery was from something
> like a watchdog timeout after many seconds).  With the patch, the recovery
> is good enough for nfs over udp to not lose any data or time out, but I
> don't trust this so I print the message.
>
> Pre-iflib versions of [l]em handled this correctly by dropping a single
> packet, which was easy to do.  Unpatched iflib makes a mess by returning
> with subsequent packets unprocessed.  It apparently just stops receiving
> until kicked by a watchdog.
>
> I don't know what causes this error.  Maybe just a bad cable or switch.
> I don't see it for I218V with the same cable and switch.
>
> Bruce
Comment 38 Martin Birgmeier 2019-08-31 13:35:21 UTC
Your are right, I forgot to remove this patch when updating to the latest 12.0 patchlevel. But I guess that should not adversely affect the driver.

Anyway, with the patch performance was just good enough to let some NFS through.

However, this certainly is no hardware problem because when using the driver from ports (net/intel-em-kmod) performance is perfect.

Something is still very broken with the conversion of this driver to iflib.

I just noticed that comment #34 just shows a merge to stable/12, but I am running releng/12.0. So I should probably merge this stuff and see how that works.

-- Martin
Comment 39 Marko Cupać 2021-02-17 11:04:24 UTC
Any news?

I am having slightly different problem with em driver since 12.0. VirtualBox VMs with bridged networking have terrible network performance. I am now at 12.2-RELEASE-p3 amd64.

Installing net/intel-em-kmod and loading it instead of solves the problem.


Any chance to have this fixed at least in upcoming 13.0-RELEASE?

My dmesg is as follows:

---<<BOOT>>---
Copyright (c) 1992-2020 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 12.2-RELEASE-p3 GENERIC amd64
FreeBSD clang version 10.0.1 (git@github.com:llvm/llvm-project.git llvmorg-10.0.1-0-gef32c611aa2)
VT(vga): resolution 640x480
module_register: cannot register pci/em from kernel; already loaded from if_em_updated.ko
Module pci/em failed to register: 17
CPU: Intel(R) Xeon(R) CPU E3-1245 V2 @ 3.40GHz (3392.37-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x306a9  Family=0x6  Model=0x3a  Stepping=9
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x7fbae3ff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
  AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
  AMD Features2=0x1<LAHF>
  Structured Extended Features=0x281<FSGSBASE,SMEP,ERMS>
  Structured Extended Features3=0x9c000400<MD_CLEAR,IBPB,STIBP,L1DFL,SSBD>
  XSAVE Features=0x1<XSAVEOPT>
  VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
  TSC: P-state invariant, performance statistics
real memory  = 25769803776 (24576 MB)
avail memory = 24877342720 (23724 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <HPQOEM SLIC-WKS>
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
FreeBSD/SMP: 1 package(s) x 4 core(s) x 2 hardware threads
random: unblocking device.
ioapic0 <Version 2.0> irqs 0-23 on motherboard
Launching APs: 1 7 4 6 2 3 5
Timecounter "TSC-low" frequency 1696185100 Hz quality 1000
random: entropy device external interface
kbd1 at kbdmux0
000.000023 [4336] netmap_init               netmap: loaded module
[ath_hal] loaded
module_register_init: MOD_LOAD (vesa, 0xffffffff81114e40, 0) error 19
random: registering fast source Intel Secure Key RNG
random: fast provider: "Intel Secure Key RNG"
nexus0
vtvga0: <VT VGA driver> on motherboard
cryptosoft0: <software crypto> on motherboard
aesni0: <AES-CBC,AES-CCM,AES-GCM,AES-ICM,AES-XTS> on motherboard
acpi0: <HPQOEM SLIC-WKS> on motherboard
acpi0: Power Button (fixed)
cpu0: <ACPI CPU> on acpi0
hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 950
Event timer "HPET" frequency 14318180 Hz quality 550
atrtc0: <AT realtime clock> port 0x70-0x77 irq 8 on acpi0
atrtc0: Warning: Couldn't map I/O.
atrtc0: registered as a time-of-day clock, resolution 1.000000s
Event timer "RTC" frequency 32768 Hz quality 0
attimer0: <AT timer> port 0x40-0x43,0x50-0x53 irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
vgapci0: <VGA-compatible display> port 0xf000-0xf03f mem 0xf7800000-0xf7bfffff,0xe0000000-0xefffffff irq 16 at device 2.0 on pci0
vgapci0: Boot video device
xhci0: <Intel Panther Point USB 3.0 controller> mem 0xf7c20000-0xf7c2ffff irq 16 at device 20.0 on pci0
xhci0: 32 bytes context size, 64-bit DMA
xhci0: Port routing mask set to 0xffffffff
usbus0 on xhci0
usbus0: 5.0Gbps Super Speed USB v3.0
pci0: <simple comms> at device 22.0 (no driver attached)
uart2: <Intel Panther Point KT Controller> port 0xf0e0-0xf0e7 mem 0xf7c3a000-0xf7c3afff irq 19 at device 22.3 on pci0
uart2: Using 1 MSI message
em0: <Intel(R) PRO/1000 Network Connection 7.7.8> port 0xf080-0xf09f mem 0xf7c00000-0xf7c1ffff,0xf7c39000-0xf7c39fff irq 20 at device 25.0 on pci0
em0: Using an MSI interrupt
em0: Ethernet address: 10:60:4b:74:64:d7
ehci0: <Intel Panther Point USB 2.0 controller> mem 0xf7c38000-0xf7c383ff irq 16 at device 26.0 on pci0
usbus1: EHCI version 1.0
usbus1 on ehci0
usbus1: 480Mbps High Speed USB v2.0
hdac0: <Intel Panther Point HDA Controller> mem 0xf7c30000-0xf7c33fff irq 22 at device 27.0 on pci0
ehci1: <Intel Panther Point USB 2.0 controller> mem 0xf7c37000-0xf7c373ff irq 23 at device 29.0 on pci0
usbus2: EHCI version 1.0
usbus2 on ehci1
usbus2: 480Mbps High Speed USB v2.0
pcib1: <ACPI PCI-PCI bridge> at device 30.0 on pci0
pci1: <ACPI PCI bus> on pcib1
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
ahci0: <Intel Panther Point AHCI SATA controller> port 0xf0d0-0xf0d7,0xf0c0-0xf0c3,0xf0b0-0xf0b7,0xf0a0-0xf0a3,0xf060-0xf07f mem 0xf7c36000-0xf7c367ff irq 19 at device 31.2 on pci0
ahci0: AHCI v1.30 with 6 6Gbps ports, Port Multiplier not supported
ahcich0: <AHCI channel> at channel 0 on ahci0
ahcich2: <AHCI channel> at channel 2 on ahci0
ahcich3: <AHCI channel> at channel 3 on ahci0
ahcich5: <AHCI channel> at channel 5 on ahci0
ahciem0: <AHCI enclosure management bridge> on ahci0
acpi_button0: <Power Button> on acpi0
acpi_tz0: <Thermal Zone> on acpi0
acpi_tz1: <Thermal Zone> on acpi0
orm0: <ISA Option ROMs> at iomem 0xc0000-0xce7ff,0xce800-0xcf7ff pnpid ORM0000 on isa0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbdc0: non-PNP ISA device will be removed from GENERIC in FreeBSD 12.
est0: <Enhanced SpeedStep Frequency Control> on cpu0
Timecounters tick every 1.000 msec
hdacc0: <Realtek ALC221 HDA CODEC> at cad 0 on hdac0
hdaa0: <Realtek ALC221 Audio Function Group> at nid 1 on hdacc0
pcm0: <Realtek ALC221 (Analog)> at nid 23 and 24,26,27 on hdaa0
pcm1: <Realtek ALC221 (Analog 2.0+HP)> at nid 20,33 on hdaa0
hdacc1: <Intel Panther Point HDA CODEC> at cad 3 on hdac0
hdaa1: <Intel Panther Point Audio Function Group> at nid 1 on hdacc1
pcm2: <Intel Panther Point (HDMI/DP 8ch)> at nid 6 on hdaa1
ugen0.1: <0x8086 XHCI root HUB> at usbus0
Trying to mount root from ufs:/dev/gpt/root [rw]...
Root mount waiting for: usbus0 usbus1 usbus2 CAM
uhub0: <0x8086 XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
ugen2.1: <Intel EHCI root HUB> at usbus2
ugen1.1: <Intel EHCI root HUB> at usbus1
uhub1: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus2
uhub2: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus1
ses0 at ahciem0 bus 0 scbus4 target 0 lun 0
ses0: <AHCI SGPIO Enclosure 2.00 0001> SEMB S-E-S 2.00 device
ses0: SEMB SES Device
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <INTEL SSDSA2UP024G3H 9CV10379> ATA8-ACS SATA 2.x device
ada0: Serial Number CVHA3010005D024BGN
ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 22902MB (46905264 512 byte sectors)
ada1 at ahcich2 bus 0 scbus1 target 0 lun 0
ada1: <WDC WD4004FZWX-00GBGB0 81.H0A81> ACS-2 ATA SATA 3.x device
ada1: Serial Number N8GDYBKY
ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 3815447MB (7814037168 512 byte sectors)
ses0: pass0,ada0 in 'Slot 00', SATA Slot: scbus0 target 0
ses0: pass1,ada1 in 'Slot 02', SATA Slot: scbus1 target 0
ada2 at ahcich3 bus 0 scbus2 target 0 lun 0
ada2: <WDC WD4004FZWX-00GBGB0 81.H0A81> ACS-2 ATA SATA 3.x device
ada2: Serial Number N8GDZT5Y
ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada2: Command Queueing enabled
ada2: 3815447MB (7814037168 512 byte sectors)
ses0: pass2,ada2 in 'Slot 03', SATA Slot: scbus2 target 0
ses0: pass3,cd0 in 'Slot 05', SATA Slot: scbus3 target 0
cd0 at ahcich5 bus 0 scbus3 target 0 lun 0
cd0: <hp CDDVDW SH-216BB HE40> Removable CD-ROM SCSI device
cd0: Serial Number R8TM6GCD103P4J
cd0: 150.000MB/s transfers (SATA 1.x, UDMA5, ATAPI 12bytes, PIO 8192bytes)
cd0: Attempt to query device size failed: NOT READY, Medium not present - tray closed
uhub0: 8 ports with 8 removable, self powered
uhub1: 3 ports with 3 removable, self powered
uhub2: 3 ports with 3 removable, self powered
Root mount waiting for: usbus1 usbus2
ugen1.2: <vendor 0x8087 product 0x0024> at usbus1
uhub3 on uhub2
uhub3: <vendor 0x8087 product 0x0024, class 9/0, rev 2.00/0.00, addr 2> on usbus1
ugen2.2: <vendor 0x8087 product 0x0024> at usbus2
uhub4 on uhub1
uhub4: <vendor 0x8087 product 0x0024, class 9/0, rev 2.00/0.00, addr 2> on usbus2
Root mount waiting for: usbus1 usbus2
uhub3: 6 ports with 6 removable, self powered
uhub4: 8 ports with 8 removable, self powered
ugen1.3: <vendor 0x0409 product 0x005a> at usbus1
uhub5 on uhub3
uhub5: <vendor 0x0409 product 0x005a, class 9/0, rev 2.00/1.00, addr 3> on usbus1
uhub5: 2 ports with 2 removable, self powered
Root mount waiting for: usbus1
ugen1.4: <Chicony HP Elite USB Keyboard> at usbus1
ukbd0 on uhub5
ukbd0: <Chicony HP Elite USB Keyboard, class 0/0, rev 1.10/1.21, addr 4> on usbus1
kbd2 at ukbd0
ugen1.5: <PixArt HP USB Optical Mouse> at usbus1
Root mount waiting for: usbus1
ugen1.6: <Generic USB2.0-CRW> at usbus1
umass0 on uhub3
umass0: <Bulk-In, Bulk-Out, Interface> on usbus1
umass0:  SCSI over Bulk-Only; quirks = 0x4000
umass0:5:0: Attached to scbus5
da0 at umass-sim0 bus 0 scbus5 target 0 lun 0
da0: <Generic- Compact Flash 1.00> Removable Direct Access SCSI device
da0: Serial Number 20100818841300000
da0: 40.000MB/s transfers
da0: Attempt to query device size failed: NOT READY, Medium not present
da0: quirks=0x2<NO_6_BYTE>
da1 at umass-sim0 bus 0 scbus5 target 0 lun 1
da1: <Generic- SM/xD-Picture 1.00> Removable Direct Access SCSI device
da1: Serial Number 20100818841300000
da1: 40.000MB/s transfers
da1: Attempt to query device size failed: NOT READY, Medium not present
da1: quirks=0x2<NO_6_BYTE>
da2 at umass-sim0 bus 0 scbus5 target 0 lun 2
da2: <Generic- SD/MMC 1.00> Removable Direct Access SCSI device
da2: Serial Number 20100818841300000
da2: 40.000MB/s transfers
da2: Attempt to query device size failed: NOT READY, Medium not present
da2: quirks=0x2<NO_6_BYTE>
da3 at umass-sim0 bus 0 scbus5 target 0 lun 3
da3: <Generic- M.S./M.S.Pro/HG 1.00> Removable Direct Access SCSI device
da3: Serial Number 20100818841300000
da3: 40.000MB/s transfers
da3: Attempt to query device size failed: NOT READY, Medium not present
da3: quirks=0x2<NO_6_BYTE>
da4 at umass-sim0 bus 0 scbus5 target 0 lun 4
da4: <Generic- SD/MMC/M.S.PRO 1.00> Removable Direct Access SCSI device
da4: Serial Number 20100818841300000
da4: 40.000MB/s transfers
da4: Attempt to query device size failed: NOT READY, Medium not present
da4: quirks=0x2<NO_6_BYTE>
mountroot: waiting for device /dev/gpt/root...
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
drmn0: <drmn> on vgapci0
vgapci0: child drmn0 requested pci_enable_io
vgapci0: child drmn0 requested pci_enable_io
[drm] Unable to create a private tmpfs mount, hugepage support will be disabled(-19).
Failed to add WC MTRR for [0xe0000000-0xefffffff]: -22; performance may suffer
[drm] Got stolen memory base rxdb200000, size 0x4000000
[drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[drm] Driver supports precise vblank timestamp query.
[drm] Connector VGA-1: get mode from tunables:
[drm]   - kern.vt.fb.modes.VGA-1
[drm]   - kern.vt.fb.default_mode
[drm] Connector HDMI-A-1: get mode from tunables:
[drm]   - kern.vt.fb.modes.HDMI-A-1
[drm]   - kern.vt.fb.default_mode
[drm] Connector DP-1: get mode from tunables:
[drm]   - kern.vt.fb.modes.DP-1
[drm]   - kern.vt.fb.default_mode
[drm] Connector HDMI-A-2: get mode from tunables:
[drm]   - kern.vt.fb.modes.HDMI-A-2
[drm]   - kern.vt.fb.default_mode
[drm] Connector DP-2: get mode from tunables:
[drm]   - kern.vt.fb.modes.DP-2
[drm]   - kern.vt.fb.default_mode
[drm] Initialized i915 1.6.0 20171222 for drmn0 on minor 0
VT: Replacing driver "vga" with new "fb".
start FB_INFO:
type=11 height=1080 width=1920 depth=32
cmsize=16 size=8294400
pbase=0xe0061000 vbase=0xfffff800e0061000
name=drmn0 flags=0x0 stride=7680 bpp=32
cmap[0]=0 cmap[1]=7f0000 cmap[2]=7f00 cmap[3]=c4a000
end FB_INFO
drmn0: fb0: inteldrmfb frame buffer device
lo0: link state changed to UP
uhid0 on uhub5
uhid0: <Chicony HP Elite USB Keyboard, class 0/0, rev 1.10/1.21, addr 4> on usbus1
ums0 on uhub5
ums0: <PixArt HP USB Optical Mouse, class 0/0, rev 2.00/1.00, addr 5> on usbus1
ums0: 3 buttons and [XYZ] coordinates ID=0
em0: link state changed to UP
Comment 40 Marko Cupać 2021-02-17 12:08:39 UTC
Hi,

I just got advice to try and disable LRO on em0 interface with base driver, and it resulted in acceptable performance in my VirtualBox VMs with bridged networking.

I noticed different options between the two drivers:

if_em: options=81249b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LRO,WOL_MAGIC,VLAN_HWFILTER>

if_em_updated: 	options=4219b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_MAGIC,VLAN_HWTSO>

The difference is that if_em_updated has TSO4 and not LRO, also VLAN_HWTSO and not VLAN_HWFILTER

I am back to if_em_updated for now, but hopefully this information will be helpful in fixing if_em.

Regards,