Bug 255671 - ixl(4) netmap pkt-gen stops transmitting
Summary: ixl(4) netmap pkt-gen stops transmitting
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 12.2-STABLE
Hardware: amd64 Any
: --- Affects Some People
Assignee: freebsd-net (Nobody)
URL:
Keywords: IntelNetworking
Depends on:
Blocks:
 
Reported: 2021-05-06 21:29 UTC by Brian Poole
Modified: 2021-06-16 14:31 UTC (History)
4 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Brian Poole 2021-05-06 21:29:19 UTC
Hello,

I am using an Intel XL710-Q2 (2x40GbE) NIC with the ixl driver on FreeBSD-12.2. I have the two ports connected to each other via fiber. During testing with netmap's pkt-gen, I observe the transmitter sporadically stops sending packets when running at top speed. Running at a rate-limited speed or running without a receiving process allows the transmitter to run for extended periods of time.

Expanding on the above, I have ixl0 running netmap receive:
pkt-gen -i ixl0 -f rx

Interface ixl1 is transmitting:
pkt-gen -i ixl1 -f tx -z -Z

Both are running on the same 4 core / 8 thread system. Below is the output of the transmitting process showing how it will transmit at 25Mpps for a period (typically 3-30 seconds) then drops and remains at 0. When I CTRL-C the transmitting pkt-gen and start it again, the cycle repeats - transmits at full rate for a time then falls to 0.

I tried running with '-R 10000000' to limit the transmitter to 10Mpps and that appears stable. Likewise, '-R 15000000' runs without issue. Running with '-R 20000000' was the first time I saw this behavior happen. Also, if I don't have a receive pkt-gen running, the transmitter pkt-gen appears to run without failure.


928.543328 main [2889] interface is ixl1
928.543341 main [3011] using default burst size: 512
928.543345 main [3019] running on 1 cpus (have 8)
928.543429 extract_ip_range [471] range is 10.0.0.1:1234 to 10.0.0.1:1234
928.543433 extract_ip_range [471] range is 10.1.0.1:1234 to 10.1.0.1:1234
928.543448 nm_open [858] overriding ARG1 0
928.543451 nm_open [862] overriding ARG2 0
928.543452 nm_open [866] overriding ARG3 0
928.543453 nm_open [870] overriding RING_CFG
928.543453 nm_open [879] overriding ifname ixl1 ringid 0x0 flags 0x8001
929.781631 main [3117] mapped 334980KB at 0x800e00000
Sending on netmap:ixl1: 4 queues, 1 threads and 1 cpus.
10.0.0.1 -> 10.1.0.1 (00:00:00:00:00:00 -> ff:ff:ff:ff:ff:ff)
929.781666 main [3202] --- SPECIAL OPTIONS:

929.781667 main [3224] Sending 512 packets every  0.000000000 s
929.781699 start_threads [2549] Wait 2 secs for phy reset
931.816433 start_threads [2551] Ready...
931.816467 sender_body [1580] start, fd 3 main_fd 3
931.816489 sender_body [1638] frags 1 frag_size 60
931.826028 sender_body [1676] drop copy
932.818181 main_thread [2639] 25.799 Mpps (25.843 Mpkts 12.405 Gbps in 1001715 usec) 29.06 avg_batch 0 min_space
933.819180 main_thread [2639] 25.775 Mpps (25.801 Mpkts 12.384 Gbps in 1000999 usec) 28.81 avg_batch 99999 min_space
934.820181 main_thread [2639] 25.464 Mpps (25.489 Mpkts 12.235 Gbps in 1001001 usec) 28.27 avg_batch 99999 min_space
935.822180 main_thread [2639] 22.089 Mpps (22.133 Mpkts 10.624 Gbps in 1001999 usec) 29.97 avg_batch 99999 min_space
936.823273 main_thread [2639] 202.339 Kpps (202.560 Kpkts 97.229 Mbps in 1001093 usec) 30.56 avg_batch 99999 min_space
937.824192 main_thread [2639] 0.000 pps (0.000 pkts 0.000 bps in 1000919 usec) 0.00 avg_batch 99999 min_space
937.832261 sender_body [1663] poll error on queue 0: timeout
938.865434 main_thread [2639] 0.000 pps (0.000 pkts 0.000 bps in 1041242 usec) 0.00 avg_batch 99999 min_space
939.833191 sender_body [1663] poll error on queue 0: timeout
939.908431 main_thread [2639] 0.000 pps (0.000 pkts 0.000 bps in 1042997 usec) 0.00 avg_batch 99999 min_space
940.910179 main_thread [2639] 0.000 pps (0.000 pkts 0.000 bps in 1001748 usec) 0.00 avg_batch 99999 min_space
941.834194 sender_body [1663] poll error on queue 0: timeout
941.942570 main_thread [2639] 0.000 pps (0.000 pkts 0.000 bps in 1032391 usec) 0.00 avg_batch 99999 min_space
^C942.272210 sigint_h [562] received control-C on thread 0x800722000
942.272220 main_thread [2639] 0.000 pps (0.000 pkts 0.000 bps in 329649 usec) 0.00 avg_batch 99999 min_space
943.273216 main_thread [2639] 0.000 pps (0.000 pkts 0.000 bps in 1000996 usec) 0.00 avg_batch 99999 min_space
943.835259 sender_body [1718] flush tail 128 head 128 on thread 0x800722500
944.300285 main_thread [2639] 0.000 pps (0.000 pkts 0.000 bps in 1027069 usec) 0.00 avg_batch 99999 min_space
Sent 99468256 packets 5968095360 bytes 3431947 events 60 bytes each in 12.02 seconds.
Speed: 8.276 Mpps Bandwidth: 3.973 Gbps (raw 3.973 Gbps). Average batch: 28.98 pkts


I enabled verbose netmap logging with 'sysctl dev.netmap.verbose=1' and captured the events during the test above. I don't see any error messages.


May  5 16:35:19 m1 kernel: 919.656330 [1927] netmap_interp_ringid      ixl0: tx [0,4) rx [0,4) id 0
May  5 16:35:19 m1 kernel: 919.816690 [ 621] netmap_set_all_rings      ixl0: disable all rings
May  5 16:35:20 m1 kernel: 919.887388 [4127] netmap_reset              ixl0 TX0, hc 0->0, ht 1023->1023, ho 0->0
May  5 16:35:20 m1 kernel: 919.977866 [4127] netmap_reset              ixl0 TX1, hc 0->0, ht 1023->1023, ho 0->0
May  5 16:35:20 m1 kernel: 920.068349 [4127] netmap_reset              ixl0 TX2, hc 0->0, ht 1023->1023, ho 0->0
May  5 16:35:20 m1 kernel: 920.158830 [4127] netmap_reset              ixl0 TX3, hc 0->0, ht 1023->1023, ho 0->0
May  5 16:35:20 m1 kernel: 920.250844 [4127] netmap_reset              ixl0 RX0, hc 0->0, ht 0->0, ho 0->0
May  5 16:35:20 m1 kernel: 920.334074 [4127] netmap_reset              ixl0 RX1, hc 0->0, ht 0->0, ho 0->0
May  5 16:35:20 m1 kernel: 920.418313 [4127] netmap_reset              ixl0 RX2, hc 0->0, ht 0->0, ho 0->0
May  5 16:35:20 m1 kernel: 920.502557 [4127] netmap_reset              ixl0 RX3, hc 0->0, ht 0->0, ho 0->0
May  5 16:35:20 m1 kernel: 920.586800 [ 621] netmap_set_all_rings      ixl0: enable all rings
May  5 16:35:20 m1 kernel: 920.657517 [1083] netmap_mmap_single        cdev 0xfffff80104c31a00 foff 0 size 343019520 objp 0xfffffe00facf59a8 prot 3
May  5 16:35:20 m1 kernel: 920.784392 [ 991] netmap_dev_pager_ctor     handle 0xfffff80100088f10 size 343019520 prot 3 foff 0
May  5 16:35:28 m1 kernel: 928.543459 [1927] netmap_interp_ringid      ixl1: tx [0,4) rx [0,4) id 0
May  5 16:35:28 m1 kernel: 928.703938 [ 621] netmap_set_all_rings      ixl1: disable all rings
May  5 16:35:28 m1 kernel: 928.774688 [4127] netmap_reset              ixl1 TX0, hc 0->0, ht 1023->1023, ho 0->0
May  5 16:35:28 m1 kernel: 928.865166 [4127] netmap_reset              ixl1 TX1, hc 0->0, ht 1023->1023, ho 0->0
May  5 16:35:28 m1 kernel: 928.955647 [4127] netmap_reset              ixl1 TX2, hc 0->0, ht 1023->1023, ho 0->0
May  5 16:35:29 m1 kernel: 929.046130 [4127] netmap_reset              ixl1 TX3, hc 0->0, ht 1023->1023, ho 0->0
May  5 16:35:29 m1 kernel: 929.144101 [4127] netmap_reset              ixl1 RX0, hc 0->0, ht 0->0, ho 0->0
May  5 16:35:29 m1 kernel: 929.227306 [4127] netmap_reset              ixl1 RX1, hc 0->0, ht 0->0, ho 0->0
May  5 16:35:29 m1 kernel: 929.311546 [4127] netmap_reset              ixl1 RX2, hc 0->0, ht 0->0, ho 0->0
May  5 16:35:29 m1 kernel: 929.395785 [4127] netmap_reset              ixl1 RX3, hc 0->0, ht 0->0, ho 0->0
May  5 16:35:29 m1 kernel: 929.480030 [ 621] netmap_set_all_rings      ixl1: enable all rings
May  5 16:35:29 m1 kernel: 929.550747 [1083] netmap_mmap_single        cdev 0xfffff80104c31a00 foff 0 size 343019520 objp 0xfffffe00facf09a8 prot 3
May  5 16:35:29 m1 kernel: 929.677625 [ 991] netmap_dev_pager_ctor     handle 0xfffff8010624e8d0 size 343019520 prot 3 foff 0
May  5 16:35:44 m1 kernel: 944.072696 [1007] netmap_dev_pager_dtor     handle 0xfffff80100088f10
May  5 16:35:44 m1 kernel: 944.229535 [ 621] netmap_set_all_rings      ixl0: disable all rings
May  5 16:35:44 m1 kernel: 944.300402 [1007] netmap_dev_pager_dtor     handle 0xfffff8010624e8d0
May  5 16:35:44 m1 kernel: 944.374240 [1142] netmap_close              dev 0xfffff80104c31a00 fflag 0x20003 devtype 8192 td 0xfffff801123f0000
May  5 16:35:44 m1 kernel: 944.579807 [ 621] netmap_set_all_rings      ixl1: disable all rings
May  5 16:35:44 m1 kernel: 944.652540 [ 429] netmap_init_obj_allocator_bitmap netmap_if free 100
May  5 16:35:44 m1 kernel: 944.725321 [ 429] netmap_init_obj_allocator_bitmap netmap_ring free 200
May  5 16:35:44 m1 kernel: 944.801577 [ 429] netmap_init_obj_allocator_bitmap netmap_buf free 163840


Lastly, here is the driver initialization:
ixl0: <Intel(R) Ethernet Controller XL710 for 40GbE QSFP+ - 2.3.0-k> mem 0xfa800000-0xfaffffff,0xfb008000-0xfb00ffff irq 40 at device 0.0 on pci8
ixl0: fw 6.0.48442 api 1.7 nvm 6.01 etid 80003564 oem 0.0.0
ixl0: PF-ID[0]: VFs 64, MSI-X 129, VF MSI-X 5, QPs 768, I2C
ixl0: Using 1024 TX descriptors and 1024 RX descriptors
ixl0: queue equality override not set, capping rx_queues at 4 and tx_queues at 4
ixl0: Using 4 RX queues 4 TX queues
ixl0: Using MSI-X interrupts with 5 vectors
ixl0: Ethernet address:
ixl0: Allocating 4 queues for PF LAN VSI; 4 queues active
ixl0: PCI Express Bus: Speed 8.0GT/s Width x8
ixl0: netmap queues/slots: TX 4/1024, RX 4/1024
ixl1: <Intel(R) Ethernet Controller XL710 for 40GbE QSFP+ - 2.3.0-k> mem 0xfa000000-0xfa7fffff,0xfb000000-0xfb007fff irq 40 at device 0.1 on pci8
ixl1: fw 6.0.48442 api 1.7 nvm 6.01 etid 80003564 oem 0.0.0
ixl1: PF-ID[1]: VFs 64, MSI-X 129, VF MSI-X 5, QPs 768, I2C
ixl1: Using 1024 TX descriptors and 1024 RX descriptors
ixl1: queue equality override not set, capping rx_queues at 4 and tx_queues at 4
ixl1: Using 4 RX queues 4 TX queues
ixl1: Using MSI-X interrupts with 5 vectors
ixl1: Ethernet address:
ixl1: Allocating 4 queues for PF LAN VSI; 4 queues active
ixl1: PCI Express Bus: Speed 8.0GT/s Width x8
ixl1: netmap queues/slots: TX 4/1024, RX 4/1024


Thank you for any help debugging and fixing this issue.
Comment 1 Ozkan KIRIK 2021-05-08 21:42:52 UTC
Hello,

I hit same problem before. We have discussed this issue with Vincenzo Maffione. 
He told me that ixl driver has issues but setting hw.ixl.enable_head_writeback=0
on loader.conf helps.
By setting this loader tunable I solved my problem.

I hope it helps to you also.

It will be good if this issue could be solved within ixl driver.

Regads
Ozkan
Comment 2 Brian Poole 2021-05-10 14:36:50 UTC
Hello Ozkan,

Thank you very much for your comment! I have not seen any failures in tx/rx or tx/tx configurations since setting hw.ixl.enable_head_writeback=0. Now that I know what to search for, I found multiple posts suggesting setting that sysctl to zero for better stability. I agree it would be preferable to not require manual adjustments to ixl.