I've filed a bug report with netmap, but it seems the FreeBSD project is using a different tree, so I'm reporting it here as well.
I've reproduced the problem with:
* 10.2 with the netmap + re code from 11-CURRENT
* 10.2 with netmap from the official repository (master)
The problem is always the same: running pkt-gen, after 20 or so "batches" the card gets overloaded and stops responding. I've tried various driver settings (polling, fast queue, no MSI, IRQ filtering, etc.), but nothing helped.
There is a driver from Realtek, but it doesn't support netmap, so I tried to patch it; I got exactly the same results described in other netmap issues: only one batch makes it through. If I limit the rate, it fails once the running total matches the size of a default batch.
One thing I've noticed in my tests is that the generic software implementation (which works flawlessly, but eats a lot of CPU) has 1024 queues, and netstat shows 1024 mbufs in use.
In dmesg, I can see that the Realtek driver supports 256 queues, but netstat shows 512 in use, and sometimes even more (erratic changes up to 600+, at which point things fail).
Could this be the reason? Is this fixable in netmap, or is this a driver issue that should be reported to the FreeBSD project?
Details about the card:
re0: <RealTek 8168/8111 B/C/CP/D/DP/E/F/G PCIe Gigabit Ethernet> port 0xe000-0xe0ff mem 0x81300000-0x81300fff,0xa0100000-0xa0103fff irq 17 at device 0.0 on pci2
re0: Using 1 MSI-X message
re0: turning off MSI enable bit.
re0: Chip rev. 0x4c000000
re0: MAC rev. 0x00000000
miibus0: <MII bus> on re0
rgephy0: <RTL8251 1000BASE-T media interface> PHY 1 on miibus0
rgephy0: none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX, 100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT-FDX, 1000baseT-FDX-master, 1000baseT-FDX-flow, 1000baseT-FDX-flow-master, auto, auto-flow
re0: Using defaults for TSO: 65518/35/2048
re0: netmap queues/slots: TX 1/256, RX 1/256
re0@pci0:2:0:0: class=0x020000 card=0x012310ec chip=0x816810ec rev=0x0c hdr=0x00
vendor = 'Realtek Semiconductor Co., Ltd.'
device = 'RTL8111/8168B PCI Express Gigabit Ethernet controller'
class = network
subclass = ethernet
bar  = type I/O Port, range 32, base 0xe000, size 256, enabled
bar  = type Memory, range 64, base 0x81300000, size 4096, enabled
bar  = type Prefetchable Memory, range 64, base 0xa0100000, size 16384, enabled
I've just tested on 11-CURRENT and got the same results.
With re0 set to an MTU of 9000, the connection stays alive: instead of timing out, the packet rate drops drastically once and then things go back to normal.
The main difference in netstat is that the mbuf clusters are now split between standard and jumbo frames:
768/2787/3555 mbufs in use (current/cache/total)
256/1524/1780/500200 mbuf clusters in use (current/cache/total/max)
256/1515 mbuf+clusters out of packet secondary zone in use (current/cache)
0/46/46/250099 4k (page size) jumbo clusters in use (current/cache/total/max)
256/65/321/74103 9k jumbo clusters in use (current/cache/total/max)
0/0/0/41683 16k jumbo clusters in use (current/cache/total/max)
3008K/4513K/7521K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
The interrupt rate in vmstat keeps rising, but that doesn't seem to be a problem:
interrupt            total    rate
irq16: sdhci_pci0        1       0
cpu0:timer         3008083    1113
irq256: ahci0        10125       3
irq257: xhci0        11363       4
irq258: hdac0            3       0
irq259: re0       13105929    4850
irq260: re1         101440      37
cpu2:timer         1095578     405
cpu1:timer         1083354     400
cpu3:timer         1123144     415
Total             19539020    7231
I think this is what gets logged when things start failing:
231.020147  netmap_transmit re0 full hwcur 0 hwtail 0 qlen 255 len 42 m 0xfffff80052005400
235.997171  netmap_transmit re0 full hwcur 0 hwtail 0 qlen 255 len 42 m 0xfffff80023ec9c00
240.989245  netmap_transmit re0 full hwcur 0 hwtail 0 qlen 255 len 42 m 0xfffff800521ad500
247.887586  netmap_transmit re0 full hwcur 0 hwtail 0 qlen 255 len 42 m 0xfffff80023da9b00
253.069781  netmap_transmit re0 full hwcur 0 hwtail 0 qlen 255 len 42 m 0xfffff80023ec7700
258.110746  netmap_transmit re0 full hwcur 0 hwtail 0 qlen 255 len 42 m 0xfffff800521ade00
263.188076  netmap_transmit re0 full hwcur 0 hwtail 0 qlen 255 len 42 m 0xfffff800237d6900
Using a fresh netmap from FreeBSD 11 and a newer pkt-gen, this is what I see:
986.519903  netmap_ioctl nr_cmd must be 0 not 12
047.486179  nm_txsync_prologue fail head < kring->rhead || head > kring->rtail
047.510386  nm_txsync_prologue re0 TX0 kring error: head 107 cur 107 tail 106 rhead 52 rcur 52 rtail 106 hwcur 52 hwtail 106
047.534818  netmap_ring_reinit called for re0 TX0
051.945718  nm_txsync_prologue fail head < kring->rhead || head > kring->rtail
051.990215  nm_txsync_prologue re0 TX0 kring error: head 225 cur 225 tail 224 rhead 223 rcur 223 rtail 224 hwcur 223 hwtail 224
052.009143  netmap_ring_reinit called for re0 TX0
At this point pkt-gen exits with an error.
I've also tried using the netmap software emulation and it crashes even earlier.
The current netmap code in HEAD, stable/11 and stable/12 is aligned with the GitHub repository (and the code has changed quite a lot since 2016).
I just tried to run pkt-gen (tx or rx) in a VM with an emulated r8169 NIC, and everything seems to work fine for me.
Can you check if the issue is still there?