Created attachment 227954 [details] Successful boot sequence

https://cgit.freebsd.org/src/commit/sys/dev/e1000?h=stable/13&id=1fb96c59b4ce265ea94eddef5a97c7c075ceaec5

On my Shuttle DS77U bhyve host, commit 1fb96c59b4ce265ea94eddef5a97c7c075ceaec5 (linked above) causes the system to freeze during boot. Backing out that commit alone resolves the problem. Details follow.

* Yesterday's 13-STABLE (8895170347fcfd9c9acf413ed408f11b15760b4b)
* uname -a:
    FreeBSD fwina.tesla.local 13.0-STABLE FreeBSD 13.0-STABLE #0: Thu Sep 16 18:42:17 JST 2021 root@fwina.tesla.local:/usr/obj/usr/src/amd64.amd64/sys/FWINA amd64
* Part of pciconf -lv:
    em0@pci0:0:31:6: class=0x020000 rev=0x21 hdr=0x00 vendor=0x8086 device=0x156f subvendor=0x8086 subdevice=0x0000
        vendor     = 'Intel Corporation'
        device     = 'Ethernet Connection I219-LM'
        class      = network
        subclass   = ethernet
    igb0@pci0:1:0:0: class=0x020000 rev=0x03 hdr=0x00 vendor=0x8086 device=0x1539 subvendor=0x1297 subdevice=0x4052
        vendor     = 'Intel Corporation'
        device     = 'I211 Gigabit Network Connection'
        class      = network
        subclass   = ethernet
* The live LAN connection is hooked up only to igb0. em0 has no connection during the test.
* A bridge is configured because this host also serves as a traffic filter:
    autobridge_interfaces="bridge0"
    autobridge_bridge0="igb0 em0"
    cloned_interfaces="bridge0"
    ifconfig_bridge0="up"
    ifconfig_em0="up"
    ifconfig_igb0="inet 10.141.30.22 netmask 255.255.255.0 up"
* Various modules are listed in $devmatch_blacklist, so no kld is loaded during the test.

With the commit backed out, the system boots without any glitch. With the commit applied, the system freezes at adding net default: gateway, possibly while trying to bring up bridge0 and its member interfaces. The attached ok.jpg shows the successful activation of bridge0, while ng.jpg captures the hard lock-up. The only way to recover is to power-cycle, load the previous kernel, and run fsck.
Created attachment 227955 [details] Hard lock-up
(In reply to t_uemura from comment #1) Thanks, can you try backing out fc7682b17f3738573099b8b03f5628dcc8148adb instead? I think this is your I219 not initializing.
(In reply to Kevin Bowling from comment #2) The same hard lock-up happened with only fc7682b17f3738573099b8b03f5628dcc8148adb backed out. The culprit must be 1fb96c59b4ce265ea94eddef5a97c7c075ceaec5. Thanks.
(In reply to Kevin Bowling from comment #2) JFYI, after moving the LAN connection from igb0 to em0, the "em0: Hardware Initialization Failed" warnings disappeared. Also note that a "Could not read PHY page 769" warning was shown very late in the shutdown sequence when em0 had no connection.
Created attachment 227963 [details] Partial revert msi-x, unconditional re-arms

Can you try this patch? It re-enables the link interrupts unconditionally in the fast handler instead of waiting to do it in the interrupt filter. I am not sure how this would cause your lockup, but there is little downside to reverting if this stabilizes it for you.

One thing I am worried about in fc7682b17f3738573099b8b03f5628dcc8148adb is that it can brick I219s and their interaction with the Management Engine. Can you try reverting that and doing a hard power cycle too?

I have the changes split out into individual patches here: https://github.com/freebsd/freebsd-src/pull/538/commits
It would be helpful if you could test them.
(In reply to Kevin Bowling from comment #5) Sorry for the delay.

With only the attached partial revert msi-x patch applied, the system sometimes booted successfully and sometimes did not. I rebuilt the kernel several times and rebooted or power-cycled nearly 20 times, but I could not pin down the cause of this inconsistency.

With 1fb96c59b4ce265ea94eddef5a97c7c075ceaec5 backed out, the system always worked as expected regardless of a) whether the live LAN connection was on em0 or igb0, and b) reboot or full power cycle.

Furthermore, with both 1fb96c59b4ce265ea94eddef5a97c7c075ceaec5 and fc7682b17f3738573099b8b03f5628dcc8148adb backed out, the system still worked as expected, same as above.

The cause of my hard lock-up might be in 1fb96c59b4ce265ea94eddef5a97c7c075ceaec5, but not in the part addressed by the partial revert msi-x patch. At least, fc7682b17f3738573099b8b03f5628dcc8148adb seems to be harmless.
(In reply to t_uemura from comment #6) Thanks, can you try this instead? https://reviews.freebsd.org/D32087
(In reply to Kevin Bowling from comment #7) I tried two installkernel runs, with two reboots and four full power cycles. Two boots from a full power cycle succeeded; the other four attempts ended in lock-ups.
(In reply to t_uemura from comment #8) I've updated https://reviews.freebsd.org/D32087; it should now be equivalent to a revert for these cards.
Are you able to test the diff https://reviews.freebsd.org/D32087? It seems straightforward, but I would like confirmation that it fixes the issue before committing, since this bug is intermittent.
(In reply to Kevin Bowling from comment #10) The kernel with the patch applied worked flawlessly regardless of LAN connection and boot/power cycle. This completely fixes the issue. Much appreciated. Thanks.
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=450c3f8b3d259c7eb82488319aff45f1f6554aaf

commit 450c3f8b3d259c7eb82488319aff45f1f6554aaf
Author:     Kevin Bowling <kbowling@FreeBSD.org>
AuthorDate: 2021-09-27 16:17:48 +0000
Commit:     Kevin Bowling <kbowling@FreeBSD.org>
CommitDate: 2021-09-27 16:25:58 +0000

    e1000: Re-arm link changes

    A change to MSI-X link handler was somehow causing issues on MSI-based
    em(4) NICs.

    Revert the change based on user reports and testing.

    PR:             258551
    Reported by:    Franco Fichtner <franco@opnsense.org>, t_uemura@macome.co.jp
    Reviewed by:    markj, Franco Fichtner <franco@opnsense.org>
    Tested by:      t_uemura@macome.co.jp
    MFC after:      1 day

 sys/dev/e1000/if_em.c | 22 ++++++----------------
 1 file changed, 6 insertions(+), 16 deletions(-)
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=594a25fa43049c336d6016002538cad7a5383284

commit 594a25fa43049c336d6016002538cad7a5383284
Author:     Kevin Bowling <kbowling@FreeBSD.org>
AuthorDate: 2021-09-27 16:17:48 +0000
Commit:     Kevin Bowling <kbowling@FreeBSD.org>
CommitDate: 2021-09-28 16:55:59 +0000

    e1000: Re-arm link changes

    A change to MSI-X link handler was somehow causing issues on MSI-based
    em(4) NICs.

    Revert the change based on user reports and testing.

    PR:             258551
    Reported by:    Franco Fichtner <franco@opnsense.org>, t_uemura@macome.co.jp
    Reviewed by:    markj, Franco Fichtner <franco@opnsense.org>
    Tested by:      t_uemura@macome.co.jp
    MFC after:      1 day

    (cherry picked from commit 450c3f8b3d259c7eb82488319aff45f1f6554aaf)

 sys/dev/e1000/if_em.c | 22 ++++++----------------
 1 file changed, 6 insertions(+), 16 deletions(-)
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=a6640bca4827036ad9374696381513da7d9df0f9

commit a6640bca4827036ad9374696381513da7d9df0f9
Author:     Kevin Bowling <kbowling@FreeBSD.org>
AuthorDate: 2021-09-27 16:17:48 +0000
Commit:     Kevin Bowling <kbowling@FreeBSD.org>
CommitDate: 2021-09-28 17:28:54 +0000

    e1000: Re-arm link changes

    A change to MSI-X link handler was somehow causing issues on MSI-based
    em(4) NICs.

    Revert the change based on user reports and testing.

    PR:             258551
    Reported by:    Franco Fichtner <franco@opnsense.org>, t_uemura@macome.co.jp
    Reviewed by:    markj, Franco Fichtner <franco@opnsense.org>
    Tested by:      t_uemura@macome.co.jp
    MFC after:      1 day

    (cherry picked from commit 450c3f8b3d259c7eb82488319aff45f1f6554aaf)

 sys/dev/e1000/if_em.c | 22 ++++++----------------
 1 file changed, 6 insertions(+), 16 deletions(-)
(In reply to t_uemura from comment #11) Thank you very much for your report and testing!