Bug 226289 - [igb] [netmap] Kernel NIC Driver conflict
Status: Closed Not A Bug
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 11.2-RELEASE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: freebsd-net (Nobody)
Reported: 2018-03-02 02:45 UTC by Nolli
Modified: 2021-01-08 21:09 UTC (History)

Description Nolli 2018-03-02 02:45:41 UTC
The FreeBSD kernel (specifically the netmap module) seems to have issues with my NIC, in my case an Intel dual gigabit PCIe 82575, where I get a series of messages like:
196.115874 [1071] netmap_grab_packets   bad pkt at 445 len 2164
or "igb1 watchdog timeout". It seems that the kernel and the NIC driver conflict.
Comment 1 wings1446 2018-10-25 00:08:57 UTC
This is still occurring in FreeBSD 11.2-RELEASE-p3. Please see the following thread on the FreeBSD support forum: https://forums.freebsd.org/threads/intel-ethernet-adapter-with-intel-82580-controller.67988/. Can someone please take a look? Thank you.
Comment 2 Eric Joyner freebsd_committer freebsd_triage 2018-10-25 00:53:58 UTC
(In reply to wings1446 from comment #1)

Are you using jumbo frames?
Comment 3 wings1446 2018-10-25 01:22:43 UTC
(In reply to Eric Joyner from comment #2)
I'm not using jumbo frames at the moment.
Comment 4 Rodney W. Grimes freebsd_committer freebsd_triage 2018-10-25 03:16:27 UTC
Adding vmaffione@FreeBSD.org to the cc: list as the netmap expert.
Comment 5 wings1446 2018-10-29 20:41:53 UTC
Just to give you an idea of what I'm seeing just in case it helps:

Oct 29 16:23:00  kernel  580.594684 [1071] netmap_grab_packets bad pkt at 259 len 2378
Oct 29 16:23:00  kernel  580.079591 [1071] netmap_grab_packets bad pkt at 194 len 2378
Oct 29 16:22:59  kernel  579.560714 [1071] netmap_grab_packets bad pkt at 178 len 2378
Oct 29 16:22:59  kernel  579.192987 [1071] netmap_grab_packets bad pkt at 166 len 2378
Oct 29 16:22:59  kernel  579.009581 [1071] netmap_grab_packets bad pkt at 158 len 2358
Oct 29 16:22:58  kernel  578.492871 [1071] netmap_grab_packets bad pkt at 145 len 2358
Oct 29 16:22:58  kernel  578.078858 [1071] netmap_grab_packets bad pkt at 136 len 2358
Oct 29 16:22:58  kernel  577.876158 [1071] netmap_grab_packets bad pkt at 126 len 2358
Oct 29 16:22:57  kernel  577.776789 [1071] netmap_grab_packets bad pkt at 123 len 2271
Oct 29 16:22:57  kernel  577.326387 [1071] netmap_grab_packets bad pkt at 95 len 2271
Comment 6 Sean Bruno freebsd_committer freebsd_triage 2018-10-29 22:40:25 UTC
(In reply to wings1446 from comment #5)
Can you detail precisely your setup and test case?  I assume you are attaching with pkt-gen but I'd be curious about any other information.
Comment 7 wings1446 2018-10-30 23:08:01 UTC
A partial copy from my FreeBSD support forum post:  
https://forums.freebsd.org/threads/intel-ethernet-adapter-with-intel-82580-controller.67988/

I'm currently using pfSense 2.4.4 with FreeBSD 11.2-RELEASE-p3. I'm using an Intel E1G44HT I340-T4 4-port PCIe Ethernet Server Adapter that's supposed to have an Intel 82580 controller (from Intel's documentation: https://ark.intel.com/products/49186/Intel-Ethernet-Server-Adapter-I340-T4). The following URL shows that the igb(4) driver is supposed to be compatible with that adapter and its 82580 controller: https://www.freebsd.org/cgi/man.cgi?igb(4) . I'm using Suricata with pfSense as an IDPS in Inline IPS mode. I'm receiving netmap_grab_packets messages, which I'm told mean that only certain Ethernet drivers work with netmap and have nothing to do with Suricata.

Here is a pretty detailed post from the pfSense forum outlining the same issue:
https://forum.netgate.com/topic/110562/suricata-causing-kernel-error-netmap_grab_packets-bad-pkt-at/2

I hope this information helps.
Comment 8 wings1446 2018-11-01 20:05:28 UTC
Today I had to disable Suricata due to the netmap issue I submitted.  My internet was going offline way too much.
Comment 9 Vincenzo Maffione freebsd_committer freebsd_triage 2019-01-10 22:14:42 UTC
Hi, I'm sorry, I saw this only now.

Long story short: this looks like your netmap application is behaving incorrectly or is misconfigured, and netmap is duly complaining about that.

Netmap uses packet buffers of fixed size, which is 2048 bytes by default (sysctl dev.netmap.buf_size, see netmap(4)).

That piece of code (netmap_grab_packets) takes packets from the netmap "host TX ring" (associated with the interface igb1) and injects them into the FreeBSD kernel stack, so that the kernel stack "thinks" that those packets are coming from the igb1 link.
Who puts packets in the igb1 "host TX ring"? Well, that's a userspace netmap application (maybe Suricata in your deployment? It's not clear from your description). You should check in your code who is opening netmap ports with nm_open(...) or ioctl(.., NIOCREGIF, ..).

Unfortunately, some of the packets written to the host TX ring look larger than the netmap buffer size, which means that Suricata (or whoever) is setting the len field (in struct netmap_slot) to something larger than 2048. This is clearly wrong, and those messages mean that netmap is complaining about that.
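To see how far the offending lengths exceed the buffer size, the "bad pkt" messages can be filtered by length. This is only a rough sketch: oversized is a hypothetical helper name, it assumes the message format shown in comment 5, and the 2048 default is the buffer size mentioned above.

```shell
# Hypothetical helper: read log lines on stdin and print the slot index and
# length of every "bad pkt" message whose length exceeds the buffer size
# given as the first argument (2048 if omitted).
oversized() {
    awk -v max="${1:-2048}" '
        /netmap_grab_packets/ && /bad pkt/ {
            # The message ends in "... at <slot> len <length>".
            for (i = 2; i < NF; i++)
                if ($i == "len" && $(i + 1) + 0 > max)
                    print "slot=" $(i - 1) " len=" $(i + 1)
        }'
}
```

For example, `dmesg | oversized 2048` would list the slots whose length is above the default buffer size, while `dmesg | oversized 4096` would show whether doubling the buffer size would cover them.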

So the real question is: why is your application doing that?
If you don't find an answer for that, you can of course increase dev.netmap.buf_size, e.g. double it to 4096 ...
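As a rough sketch of that workaround (run as root; the value 4096 is just the doubling suggested above, and I would expect the new size to apply only after the netmap application has closed and reopened its ports):

```shell
# Show the current netmap buffer size in bytes (2048 by default).
sysctl dev.netmap.buf_size

# Double it, as suggested above. Assumption: 4096 covers the lengths
# reported in this bug (the largest logged value is 2378).
sysctl dev.netmap.buf_size=4096
```

This only hides the symptom; it does not explain why the application produces slots larger than the MTU.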

The real solution to your problem may be to simply disable TSO (TCP Segmentation Offload) and LRO (Large Receive Offload) on the network interfaces of your system (through ifconfig), because TSO and LRO are likely the reason why you see packets larger than the MTU (which I assume is 1500, since you said you are not using jumbo frames).
Note that TSO and LRO should always be disabled when playing with netmap (read the very end of netmap(4)).
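A minimal sketch of what that looks like on stock FreeBSD, assuming igb1 (the interface named in the original report) is the one opened in netmap mode; repeat for every interface your netmap application uses:

```shell
# Disable TCP segmentation offload and large receive offload on igb1
# (run as root). igb1 is an assumption taken from the report above.
ifconfig igb1 -tso -lro

# Inspect the options= line of the output; TSO4, TSO6 and LRO should
# no longer be listed there.
ifconfig igb1
```

These settings do not survive a reboot. On stock FreeBSD the same flags can be appended to the interface's ifconfig_ line in /etc/rc.conf; pfSense manages interface configuration itself, so there the change probably has to be made through its own settings.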