Bug 208409 - [PATCH] igb and ALTQ
Summary: [PATCH] igb and ALTQ
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 10.3-BETA2
Hardware: Any Any
: --- Affects Many People
Assignee: freebsd-net (Nobody)
URL:
Keywords: IntelNetworking, patch
Depends on:
Blocks:
 
Reported: 2016-03-30 23:14 UTC by freebsd
Modified: 2017-04-05 22:29 UTC (History)
9 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description freebsd 2016-03-30 23:14:21 UTC
In 10.3-RELEASE the man page for altq(4) indicates igb(4) is on the list of supported devices, however by default it doesn't work.  Attempting to apply altq to an igb interface generates the following error:

 pfctl: igb0: driver does not support altq

This unexpected behavior can be fixed with the following three-line patch:

--- sys/dev/e1000/if_igb.h_DIST 2016-03-24 19:09:30.000000000 -0600
+++ sys/dev/e1000/if_igb.h      2016-03-29 22:27:53.271352000 -0600
@@ -35,6 +35,10 @@
 #ifndef _IF_IGB_H_
 #define _IF_IGB_H_
 
+#ifdef ALTQ
+#define IGB_LEGACY_TX
+#endif
+
 #include <sys/param.h>
 #include <sys/systm.h>
 #ifndef IGB_LEGACY_TX

Please consider adopting this simple workaround, as this problem has persisted since FreeBSD 9.1
Comment 1 Sean Bruno freebsd_committer freebsd_triage 2016-04-06 01:10:13 UTC
Eric:

Does this look fine to you guys?
Comment 2 Olivier Cochard freebsd_committer freebsd_triage 2016-04-06 07:25:52 UTC
But enabling IGB_LEGACY_TX disable multiqueue, right ?
And disabling NIC multiqueue feature have a big performance impact.
Comment 3 freebsd 2016-04-06 18:20:54 UTC
(In reply to Olivier Cochard from comment #2)
Multiqueue on igb is incompatible with ALTQ.  When building a kernel with ALTQ support, it must be disabled or queues may not be assigned to an igb interface.  The patch simply automates this dependency.
Comment 4 Sean Bruno freebsd_committer freebsd_triage 2016-04-29 14:47:25 UTC
Eric:

I'm going to commit this today.
Comment 5 Eric Joyner freebsd_committer freebsd_triage 2016-04-29 17:06:15 UTC
(In reply to Sean Bruno from comment #4)

Okay; this looks fine.
Comment 6 commit-hook freebsd_committer freebsd_triage 2016-05-06 15:41:53 UTC
A commit references this bug:

Author: sbruno
Date: Fri May  6 15:41:38 UTC 2016
New revision: 299182
URL: https://svnweb.freebsd.org/changeset/base/299182

Log:
  If ALTQ is defined in the kern conf, switch to Legacy Mode.

  PR:		208409
  Submitted by:	freebsd@mcwest.org
  MFC after:	2 weeks

Changes:
  head/sys/dev/e1000/if_igb.h
Comment 7 commit-hook freebsd_committer freebsd_triage 2016-07-22 03:17:02 UTC
A commit references this bug:

Author: sbruno
Date: Fri Jul 22 03:16:38 UTC 2016
New revision: 303174
URL: https://svnweb.freebsd.org/changeset/base/303174

Log:
  MFC r299182
  If ALTQ is defined in the kern conf, switch to Legacy Mode.

  PR:		208409

Changes:
  stable/10/sys/dev/e1000/if_igb.h
Comment 8 Nicolas 2016-11-06 22:23:54 UTC
The patch create a lot of trouble.

See https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=212413 and https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=213257.

I suggest to rollback.

Why this patch have been commited as it have a performance impact (disabling multiqueue) ?

Creating a kernel options IGB_LEGACY_TX (disabled by default) to enable IGB_LEGACY_TX is a better approch if there is no solution to support ALTQ with multiqueue enabled.
Comment 9 Sean Bruno freebsd_committer freebsd_triage 2016-11-06 23:34:41 UTC
(In reply to nicolas from comment #8)
So ... if you remove this diff, ALTQ works with the multiqueue driver?
Comment 10 ncrogers 2016-11-07 15:46:20 UTC
ALTQ has never worked with the multiqueue drivers. em(4) interfaces work "out of the box" when ALTQ is included in the kernel, because the default em(4) behavior has multiqueue disabled. With igb(4) and ixgbe(4), you MUST enable the legacy path in the driver. Prior to this patch, the only way to sensibly do this was to manually edit the driver and add an IGB_LEGACY_TX define.

It sounds like there is an issue with the legacy single-queue path, perhaps only when ALTQ is not actually used in the PF configuration? FWIW I support many PF routers with igb(4) interfaces having ALTQ enabled (not just compiled in the kernel), including the IGB_LEGACY_TX path, and have not had any problems, albeit I am still running 10.1-RELEASE.

This was a much-needed patch from the perspective of someone that supports many PF+ALTQ installations and has had to add IGB_LEGACY_TX manually for years now (since the 8.x days I believe). However it appears that there are many people that have haphazardly added ALTQ support to their kernels without actually using it, and that is definitely not a case where you would want to downgrade to the single-queue behavior.

Somehow this patch in question made it through even though the issue has been brought up many times before over the past few years. At one point there was discussion of adding some kind of kernel tunable to enable the IGB/IXGBE legacy transmit paths, but it did not gain any committer support. Perhaps this is the way to go as it would have prevented the problem where people are now unexpectedly running the single-queue transmit path because ALTQ is in their kernel.

The following PR had discussed the kernel tunable option.
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=194197
Comment 11 freebsd 2016-11-07 16:58:31 UTC
(In reply to ncrogers from comment #10)
Well said.  I can confirm that there is a bug in the igb(4) driver when using the legacy single-queue mode, eventually causing the kernel to crash.  Adding a queue to limit throughput is an effective workaround.  For example:

     ## Limit bandwidth on internal interface to avoid igb driver bug
     altq on $int_if cbq bandwidth 404Mb queue { internal }
     queue internal bandwidth 99% priority 1 cbq(default red borrow)

Note that this appears to be a bug in the driver, and not in ALTQ itself (and hence should be a separate bug report).  However, with this workaround in place I have not seen any crashes in the last six months.
Comment 12 ncrogers 2016-11-07 18:10:57 UTC
(In reply to freebsd from comment #11)
That is good to know. How did you arrive at 404Mb as the limit, or is it perhaps higher? I wonder when the bug was introduced because I know I have routers pushing upwards of 1G on an igb interface with an altq+hfsc limit of 1Gb. But this is on 10.1 and I believe the Intel drivers have been updated since then.
Comment 13 freebsd 2016-11-07 19:34:41 UTC
(In reply to ncrogers from comment #12)
I determined the queue BW by trial-and-error, i.e. decrease the limit until the system is stable.  I suspect the max may depend on the system config (e.g. CPU speed).  Unfortunately I haven't found the bug in the driver, so I can't offer a patch.
Comment 14 Luiz Otavio O Souza,+55 (14) 99772-1255 freebsd_committer freebsd_triage 2017-03-16 03:31:45 UTC
The bug with ALTQ and IGB_LEGACY_TX is now fixed.  I believe this PR can be closed.