Summary: | FreeBSD 11-RC2 crashing after some time | ||
---|---|---|---|
Product: | Base System | Reporter: | Cassiano Peixoto <peixoto.cassiano> |
Component: | kern | Assignee: | Luiz Otavio O Souza <loos> |
Status: | Closed FIXED | ||
Severity: | Affects Many People | CC: | chris, freebsd, ncrogers, nicolas, ports |
Priority: | --- | Keywords: | regression |
Version: | 11.0-RC1 | ||
Hardware: | amd64 | ||
OS: | Any |
Description
Cassiano Peixoto
2016-09-06 14:52:11 UTC
Guys, just an update about this issue. I had to remove ALTQ from my kernel. After that it stopped crashing. So it looks like some conflict with ALTQ.

I confirm this bug on FreeBSD 11.0-RELEASE. Only igb interfaces are affected, as the issue doesn't occur on hosts using em instead of igb interfaces. The crash seems to be related to a certain type and amount of packets. Removing ALTQ support from the kernel fixes the issue.

(In reply to nicolas from comment #2)
FYI, including ALTQ in the kernel config switches igb(4) interfaces to legacy single-queue mode. I have seen crashes like these with high-bandwidth traffic when the driver is in this mode. If you need ALTQ, add a queue to limit throughput through igb interfaces. For example, the following workaround has prevented crashes on my systems for the last six months:

    ## Limit bandwidth on internal interface to avoid igb driver bug
    altq on $int_if cbq bandwidth 404Mb queue { internal }
    queue internal bandwidth 99% priority 1 cbq(default red borrow)

This issue is also discussed here:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=208409
and here:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=213257

A commit references this bug:

Author: loos
Date: Sat Feb 25 20:21:39 UTC 2017
New revision: 314281
URL: https://svnweb.freebsd.org/changeset/base/314281

Log:
  Disable the driver managed queue for igb(4) when the legacy transmit interface is used.

  The legacy API (IGB_LEGACY_TX) is enabled when ALTQ is built into kernel.

  As noted in altq(9), it is responsibility of the caller to protect this queue against concurrent access and, in the igb case, the interface send queue is protected by tx queue mutex. This obviously cannot protect the driver managed queue against concurrent access from different tx queues and leads to numerous and quite strange panic traces (usually shown as packets disappearing into thin air).

  Improving the locking to cope with this means serialize all access to this (single) queue and produces no gain, it actually affects the performance quite noticeabily.

  The driver managed queue is already disabled when an ALTQ queue discipline is set on interface (in altq_enable()), because the driver managed queue can interfere with ALTQ timing (whence the reports that setting an ALTQ queue discipline on interface also fixes the issue).

  Disabling this additional queue keeps the ability to use if_start() to send packets to individual NIC queues while it simply eliminate the race.

  This is a direct commit to stable/11 as -head driver does not support ALTQ anymore.

  PR: 213257
  PR: 212413
  Discussed with: sbruno
  Tested by: Konstantin Kormashev <konstantin@netgate.com>
  Obtained from: pfSense
  Sponsored by: Rubicon Communications, LLC (Netgate)

Changes:
  stable/11/sys/dev/e1000/if_igb.c

A commit references this bug:

Author: loos
Date: Sat Mar 11 07:54:05 UTC 2017
New revision: 315060
URL: https://svnweb.freebsd.org/changeset/base/315060

Log:
  MFC of r314281:

  Disable the driver managed queue for igb(4) when the legacy transmit interface is used.

  The legacy API (IGB_LEGACY_TX) is enabled when ALTQ is built into kernel.

  As noted in altq(9), it is responsibility of the caller to protect this queue against concurrent access and, in the igb case, the interface send queue is protected by tx queue mutex. This obviously cannot protect the driver managed queue against concurrent access from different tx queues and leads to numerous and quite strange panic traces (usually shown as packets disappearing into thin air).

  Improving the locking to cope with this means serialize all access to this (single) queue and produces no gain, it actually affects the performance quite noticeabily.

  The driver managed queue is already disabled when an ALTQ queue discipline is set on interface (in altq_enable()), because the driver managed queue can interfere with ALTQ timing (whence the reports that setting an ALTQ queue discipline on interface also fixes the issue).

  Disabling this additional queue keeps the ability to use if_start() to send packets to individual NIC queues while it simply eliminate the race.

  This is a direct commit to stable/11 as -head driver does not support ALTQ anymore.

  PR: 213257
  PR: 212413
  Discussed with: sbruno
  Tested by: Konstantin Kormashev <konstantin@netgate.com>
  Obtained from: pfSense
  Sponsored by: Rubicon Communications, LLC (Netgate)

Changes:
  _U stable/10/
  stable/10/sys/dev/e1000/if_igb.c

Committed back in 2017.

^Triage: assign to committer that resolved.
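
For readers who want to see why a per-tx-queue mutex cannot protect a single shared queue, as the commit log argues, here is a minimal userspace sketch. It is not the igb(4) driver code: it uses POSIX threads mutexes instead of kernel mtx(9), and every name in it (`struct txq`, `NUM_TXQ`, both functions) is hypothetical and chosen only for illustration. It models two senders from a single thread by using a try-lock to ask "could a second sender get in right now?".

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define NUM_TXQ 2	/* hypothetical number of tx queues */

/* Hypothetical stand-in for a tx queue that owns its own mutex. */
struct txq {
	pthread_mutex_t mtx;
};

/* Create a non-recursive mutex so trylock on a held mutex always fails. */
static void
normal_mutex_init(pthread_mutex_t *m)
{
	pthread_mutexattr_t attr;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_NORMAL);
	pthread_mutex_init(m, &attr);
	pthread_mutexattr_destroy(&attr);
}

/*
 * Scheme described in the commit log: each sender locks only the mutex of
 * its own tx queue before touching the one shared queue. Returns true if a
 * second sender can get in while the first is still inside.
 */
bool
second_sender_enters_with_per_queue_locks(void)
{
	struct txq q[NUM_TXQ];
	bool entered;
	int i;

	assert(NUM_TXQ >= 2);
	for (i = 0; i < NUM_TXQ; i++)
		normal_mutex_init(&q[i].mtx);

	pthread_mutex_lock(&q[0].mtx);	/* sender A, on tx queue 0 */
	/* Sender B, on tx queue 1: a different mutex, so nothing stops it. */
	entered = (pthread_mutex_trylock(&q[1].mtx) == 0);
	if (entered)
		pthread_mutex_unlock(&q[1].mtx);
	pthread_mutex_unlock(&q[0].mtx);

	for (i = 0; i < NUM_TXQ; i++)
		pthread_mutex_destroy(&q[i].mtx);
	return (entered);
}

/*
 * Alternative the commit log rejects for performance reasons: one mutex
 * serializes every access to the shared queue. Returns true if a second
 * sender can get in while the first is still inside.
 */
bool
second_sender_enters_with_shared_lock(void)
{
	pthread_mutex_t shared;
	bool entered;

	normal_mutex_init(&shared);

	pthread_mutex_lock(&shared);	/* sender A */
	/* Sender B needs the same mutex, so the try-lock fails. */
	entered = (pthread_mutex_trylock(&shared) == 0);
	if (entered)
		pthread_mutex_unlock(&shared);
	pthread_mutex_unlock(&shared);

	pthread_mutex_destroy(&shared);
	return (entered);
}
```

With per-queue locks the second sender is admitted (the first function returns true), which is the unprotected concurrent access the log describes. With one shared lock it is kept out (the second function returns false), at the cost of serializing all senders. The committed change took neither route: it disabled the driver managed queue, so there is no shared structure left to race on.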