Bug 264179 - em(4): 13.1-RELEASE hangs on boot at 82574L (em0, 0x10d3) with I219-V (em1, 0x1a1d) enabled (Intel Alder Lake GbE NIC)
Status: Closed Not A Bug
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 13.1-RELEASE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: Yasuhiro Kimura
Keywords: IntelNetworking, needs-qa, regression
Blocks: 264030
Reported: 2022-05-23 16:59 UTC by Yasuhiro Kimura
Modified: 2023-10-22 04:24 UTC
Flags: koobs: maintainer-feedback? (freebsd), koobs: maintainer-feedback? (kbowling)

Attachments
Output of dmesg(8) command (93.26 KB, text/plain)
2023-02-10 00:42 UTC, Yasuhiro Kimura
Output of dmesg(8) command with installer of 13.2-RC1 (12.21 KB, text/plain)
2023-03-04 07:31 UTC, Yasuhiro Kimura

Description Yasuhiro Kimura freebsd_committer freebsd_triage 2022-05-23 16:59:52 UTC
Two months ago I upgraded my home server to an Intel Alder Lake Core i3-12100 and a GIGABYTE H610I DDR4 (rev. 1.0) motherboard.

The latter has an onboard Intel GbE NIC, but unfortunately 13.0-RELEASE doesn't detect it. So I inserted an Intel PCI-E GbE adapter into the PCI-E slot of the motherboard and used it as the network interface of the server.

Now that 13.1-RELEASE has been released, I tried updating with `freebsd-update upgrade -r 13.1-RELEASE`, `freebsd-update install` and `shutdown -r now`. But after that the system hangs in the middle of boot.

At first, boot stops after the onboard Intel GbE NIC is detected.

https://people.freebsd.org/~yasu/Alderlake-GbE-boot-hangup.01.jpg

It stays in this state for about a minute, then the boot process resumes. But soon it stops again.

https://people.freebsd.org/~yasu/Alderlake-GbE-boot-hangup.02.jpg

I waited about 20 minutes in this state, but boot never proceeds.

Removing the PCI-E GbE adapter doesn't change the situation.

I also tried the boot image of the 14.0-CURRENT 20220519 snapshot, and boot hangs in the same way as with 13.1-RELEASE.
Comment 1 Kubilay Kocak freebsd_committer freebsd_triage 2022-05-24 00:53:37 UTC
- Can you confirm boot behaviour is OK on 12.1-RELEASE ?
- Is the system able to boot single user? (to obtain and attach pciconf -lv and usbconfig list output)
- Does boot verbose show anything additional?
- How does behaviour change without any USB devices connected?
- How does behaviour change after disabling USB controllers in BIOS/UEFI?
- How does behaviour change after disabling:
   - just onboard network controller
   - just pci network controller
   - both network controllers
Comment 2 Yasuhiro Kimura freebsd_committer freebsd_triage 2022-05-25 09:22:55 UTC
(In reply to Kubilay Kocak from comment #1)

> - Can you confirm boot behaviour is OK on 12.1-RELEASE ?

Do you mean 12.3-RELEASE? If so, the same hang happens with it.

> - Is the system able to boot single user? (to obtain and attach pciconf -lv and usbconfig list output)

No. The hang happens before reaching the single-user shell.
But on 13.0-RELEASE `pciconf -lv` gives the following result.

yasu@maybe[1002]% pciconf -lv
hostb0@pci0:0:0:0:      class=0x060000 rev=0x05 hdr=0x00 vendor=0x8086 device=0x4630 subvendor=0x1458 subdevice=0x5000
    vendor     = 'Intel Corporation'
    class      = bridge
    subclass   = HOST-PCI
pcib1@pci0:0:1:0:       class=0x060400 rev=0x05 hdr=0x01 vendor=0x8086 device=0x460d subvendor=0x1458 subdevice=0x5000
    vendor     = 'Intel Corporation'
    class      = bridge
    subclass   = PCI-PCI
vgapci0@pci0:0:2:0:     class=0x030000 rev=0x0c hdr=0x00 vendor=0x8086 device=0x4692 subvendor=0x1458 subdevice=0xd000
    vendor     = 'Intel Corporation'
    class      = display
    subclass   = VGA
none0@pci0:0:10:0:      class=0x118000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x467d subvendor=0x0000 subdevice=0x0000
    vendor     = 'Intel Corporation'
    class      = dasp
xhci0@pci0:0:20:0:      class=0x0c0330 rev=0x11 hdr=0x00 vendor=0x8086 device=0x7ae0 subvendor=0x1458 subdevice=0x5007
    vendor     = 'Intel Corporation'
    class      = serial bus
    subclass   = USB
none1@pci0:0:20:2:      class=0x050000 rev=0x11 hdr=0x00 vendor=0x8086 device=0x7aa7 subvendor=0x0000 subdevice=0x0000
    vendor     = 'Intel Corporation'
    class      = memory
    subclass   = RAM
none2@pci0:0:21:0:      class=0x0c8000 rev=0x11 hdr=0x00 vendor=0x8086 device=0x7acc subvendor=0x0000 subdevice=0x0000
    vendor     = 'Intel Corporation'
    class      = serial bus
none3@pci0:0:21:1:      class=0x0c8000 rev=0x11 hdr=0x00 vendor=0x8086 device=0x7acd subvendor=0x0000 subdevice=0x0000
    vendor     = 'Intel Corporation'
    class      = serial bus
none4@pci0:0:21:2:      class=0x0c8000 rev=0x11 hdr=0x00 vendor=0x8086 device=0x7ace subvendor=0x0000 subdevice=0x0000
    vendor     = 'Intel Corporation'
    class      = serial bus
none5@pci0:0:21:3:      class=0x0c8000 rev=0x11 hdr=0x00 vendor=0x8086 device=0x7acf subvendor=0x0000 subdevice=0x0000
    vendor     = 'Intel Corporation'
    class      = serial bus
none6@pci0:0:22:0:      class=0x078000 rev=0x11 hdr=0x00 vendor=0x8086 device=0x7ae8 subvendor=0x1458 subdevice=0x1c3a
    vendor     = 'Intel Corporation'
    class      = simple comms
ahci0@pci0:0:23:0:      class=0x010601 rev=0x11 hdr=0x00 vendor=0x8086 device=0x7ae2 subvendor=0x1458 subdevice=0xb005
    vendor     = 'Intel Corporation'
    class      = mass storage
    subclass   = SATA
none7@pci0:0:25:0:      class=0x0c8000 rev=0x11 hdr=0x00 vendor=0x8086 device=0x7afc subvendor=0x0000 subdevice=0x0000
    vendor     = 'Intel Corporation'
    class      = serial bus
none8@pci0:0:25:1:      class=0x0c8000 rev=0x11 hdr=0x00 vendor=0x8086 device=0x7afd subvendor=0x0000 subdevice=0x0000
    vendor     = 'Intel Corporation'
    class      = serial bus
pcib2@pci0:0:28:0:      class=0x060400 rev=0x11 hdr=0x01 vendor=0x8086 device=0x7ab8 subvendor=0x0000 subdevice=0x0000
    vendor     = 'Intel Corporation'
    class      = bridge
    subclass   = PCI-PCI
pcib3@pci0:0:28:4:      class=0x060400 rev=0x11 hdr=0x01 vendor=0x8086 device=0x7abc subvendor=0x1458 subdevice=0x5001
    vendor     = 'Intel Corporation'
    class      = bridge
    subclass   = PCI-PCI
isab0@pci0:0:31:0:      class=0x060100 rev=0x11 hdr=0x00 vendor=0x8086 device=0x7a87 subvendor=0x1458 subdevice=0x5001
    vendor     = 'Intel Corporation'
    class      = bridge
    subclass   = PCI-ISA
hdac0@pci0:0:31:3:      class=0x040300 rev=0x11 hdr=0x00 vendor=0x8086 device=0x7ad0 subvendor=0x1458 subdevice=0xa194
    vendor     = 'Intel Corporation'
    class      = multimedia
    subclass   = HDA
none9@pci0:0:31:4:      class=0x0c0500 rev=0x11 hdr=0x00 vendor=0x8086 device=0x7aa3 subvendor=0x1458 subdevice=0x5001
    vendor     = 'Intel Corporation'
    class      = serial bus
    subclass   = SMBus
none10@pci0:0:31:5:     class=0x0c8000 rev=0x11 hdr=0x00 vendor=0x8086 device=0x7aa4 subvendor=0x0000 subdevice=0x0000
    vendor     = 'Intel Corporation'
    class      = serial bus
none11@pci0:0:31:6:     class=0x020000 rev=0x11 hdr=0x00 vendor=0x8086 device=0x1a1d subvendor=0x1458 subdevice=0xe000
    vendor     = 'Intel Corporation'
    device     = 'Ethernet Connection (17) I219-V'
    class      = network
    subclass   = ethernet
em0@pci0:1:0:0: class=0x020000 rev=0x00 hdr=0x00 vendor=0x8086 device=0x10d3 subvendor=0x8086 subdevice=0xa01f
    vendor     = 'Intel Corporation'
    device     = '82574L Gigabit Network Connection'
    class      = network
    subclass   = ethernet
nvme0@pci0:3:0:0:       class=0x010802 rev=0x00 hdr=0x00 vendor=0x15b7 device=0x5006 subvendor=0x15b7 subdevice=0x5006
    vendor     = 'Sandisk Corp'
    device     = 'WD Black SN750 / PC SN730 NVMe SSD'
    class      = mass storage
    subclass   = NVM
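
In the listing above, the onboard I219-V (device=0x1a1d) appears as none11, which is how pciconf names a device that no driver has attached to. A minimal sketch for picking such entries out of `pciconf -lv` output follows; the function name `unattached_nics` is hypothetical and the filter only looks at the fields shown in this report.

```shell
# Print the selector of every PCI device that has no driver attached
# (pciconf names these "noneN") and whose class line says "network".
# Reads `pciconf -lv` output on stdin. Hypothetical helper, not part of FreeBSD.
unattached_nics() {
    awk '
        # A line starting in column 1 begins a new device record.
        /^[^ \t]/ { selector = $1; unattached = ($1 ~ /^none[0-9]+@/) }
        # Indented "class = network" line inside an unattached record.
        unattached && /class[ \t]*=[ \t]*network/ { print selector }
    '
}
```

On a live system this would be used as `pciconf -lv | unattached_nics`.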

> - Does boot verbose show anything additional?

I tried it, but I didn't see anything that seemed meaningful.

> - How does behaviour change without any USB devices connected?

There is no change.

> - How does behaviour change after disabling USB controllers in BIOS/UEFI?

Unfortunately the UEFI BIOS menu of my motherboard doesn't provide a way to disable USB controllers.

> - How does behaviour change after disabling:
>   - just onboard network controller
>   - just pci network controller
>   - both network controllers

Unfortunately the UEFI BIOS menu of my motherboard doesn't provide a way to disable the onboard NIC.
Removing the PCI-E network adapter from the PCI-E slot of the motherboard doesn't change the result.
Comment 3 Yasuhiro Kimura freebsd_committer freebsd_triage 2022-05-25 09:25:03 UTC
Following a suggestion on the freebsd-stable mailing list, I added 'hint.em.1.disabled=1' to /boot/loader.conf, and 13.1-RELEASE then boots successfully.
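
The workaround is a single device hint. As a rough sketch, assuming the onboard NIC probes as em(4) unit 1 as it does on this machine, the line can be added idempotently like this; the helper name `disable_em_unit` is hypothetical.

```shell
# Append "hint.em.UNIT.disabled=1" to a loader.conf-style file unless the
# exact line is already there.
# Usage: disable_em_unit UNIT [FILE]   (FILE defaults to /boot/loader.conf)
# The unit number is machine-specific; 1 is what this report used.
disable_em_unit() {
    unit="$1"
    conf="${2:-/boot/loader.conf}"
    line="hint.em.${unit}.disabled=1"
    # -x: match the whole line, -F: treat the pattern as a fixed string.
    grep -qxF "$line" "$conf" 2>/dev/null || printf '%s\n' "$line" >> "$conf"
}
```

For example, `disable_em_unit 1` run as root, followed by a reboot. Removing the line re-enables the device.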
Comment 4 Yasuhiro Kimura freebsd_committer freebsd_triage 2022-05-27 06:04:39 UTC
Now boot succeeds with 13.1-RELEASE and `pciconf -lv` gives the following result.

yasu@maybe[1028]% pciconf -lv
hostb0@pci0:0:0:0:      class=0x060000 rev=0x05 hdr=0x00 vendor=0x8086 device=0x4630 subvendor=0x1458 subdevice=0x5000
    vendor     = 'Intel Corporation'
    class      = bridge
    subclass   = HOST-PCI
pcib1@pci0:0:1:0:       class=0x060400 rev=0x05 hdr=0x01 vendor=0x8086 device=0x460d subvendor=0x1458 subdevice=0x5000
    vendor     = 'Intel Corporation'
    class      = bridge
    subclass   = PCI-PCI
vgapci0@pci0:0:2:0:     class=0x030000 rev=0x0c hdr=0x00 vendor=0x8086 device=0x4692 subvendor=0x1458 subdevice=0xd000
    vendor     = 'Intel Corporation'
    class      = display
    subclass   = VGA
none0@pci0:0:10:0:      class=0x118000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x467d subvendor=0x0000 subdevice=0x0000
    vendor     = 'Intel Corporation'
    class      = dasp
xhci0@pci0:0:20:0:      class=0x0c0330 rev=0x11 hdr=0x00 vendor=0x8086 device=0x7ae0 subvendor=0x1458 subdevice=0x5007
    vendor     = 'Intel Corporation'
    class      = serial bus
    subclass   = USB
none1@pci0:0:20:2:      class=0x050000 rev=0x11 hdr=0x00 vendor=0x8086 device=0x7aa7 subvendor=0x0000 subdevice=0x0000
    vendor     = 'Intel Corporation'
    class      = memory
    subclass   = RAM
ig4iic0@pci0:0:21:0:    class=0x0c8000 rev=0x11 hdr=0x00 vendor=0x8086 device=0x7acc subvendor=0x0000 subdevice=0x0000
    vendor     = 'Intel Corporation'
    class      = serial bus
ig4iic1@pci0:0:21:1:    class=0x0c8000 rev=0x11 hdr=0x00 vendor=0x8086 device=0x7acd subvendor=0x0000 subdevice=0x0000
    vendor     = 'Intel Corporation'
    class      = serial bus
ig4iic2@pci0:0:21:2:    class=0x0c8000 rev=0x11 hdr=0x00 vendor=0x8086 device=0x7ace subvendor=0x0000 subdevice=0x0000
    vendor     = 'Intel Corporation'
    class      = serial bus
ig4iic3@pci0:0:21:3:    class=0x0c8000 rev=0x11 hdr=0x00 vendor=0x8086 device=0x7acf subvendor=0x0000 subdevice=0x0000
    vendor     = 'Intel Corporation'
    class      = serial bus
none2@pci0:0:22:0:      class=0x078000 rev=0x11 hdr=0x00 vendor=0x8086 device=0x7ae8 subvendor=0x1458 subdevice=0x1c3a
    vendor     = 'Intel Corporation'
    class      = simple comms
ahci0@pci0:0:23:0:      class=0x010601 rev=0x11 hdr=0x00 vendor=0x8086 device=0x7ae2 subvendor=0x1458 subdevice=0xb005
    vendor     = 'Intel Corporation'
    class      = mass storage
    subclass   = SATA
ig4iic4@pci0:0:25:0:    class=0x0c8000 rev=0x11 hdr=0x00 vendor=0x8086 device=0x7afc subvendor=0x0000 subdevice=0x0000
    vendor     = 'Intel Corporation'
    class      = serial bus
ig4iic5@pci0:0:25:1:    class=0x0c8000 rev=0x11 hdr=0x00 vendor=0x8086 device=0x7afd subvendor=0x0000 subdevice=0x0000
    vendor     = 'Intel Corporation'
    class      = serial bus
pcib2@pci0:0:28:0:      class=0x060400 rev=0x11 hdr=0x01 vendor=0x8086 device=0x7ab8 subvendor=0x0000 subdevice=0x0000
    vendor     = 'Intel Corporation'
    class      = bridge
    subclass   = PCI-PCI
pcib3@pci0:0:28:4:      class=0x060400 rev=0x11 hdr=0x01 vendor=0x8086 device=0x7abc subvendor=0x1458 subdevice=0x5001
    vendor     = 'Intel Corporation'
    class      = bridge
    subclass   = PCI-PCI
isab0@pci0:0:31:0:      class=0x060100 rev=0x11 hdr=0x00 vendor=0x8086 device=0x7a87 subvendor=0x1458 subdevice=0x5001
    vendor     = 'Intel Corporation'
    class      = bridge
    subclass   = PCI-ISA
hdac0@pci0:0:31:3:      class=0x040300 rev=0x11 hdr=0x00 vendor=0x8086 device=0x7ad0 subvendor=0x1458 subdevice=0xa194
    vendor     = 'Intel Corporation'
    class      = multimedia
    subclass   = HDA
ichsmb0@pci0:0:31:4:    class=0x0c0500 rev=0x11 hdr=0x00 vendor=0x8086 device=0x7aa3 subvendor=0x1458 subdevice=0x5001
    vendor     = 'Intel Corporation'
    class      = serial bus
    subclass   = SMBus
none3@pci0:0:31:5:      class=0x0c8000 rev=0x11 hdr=0x00 vendor=0x8086 device=0x7aa4 subvendor=0x0000 subdevice=0x0000
    vendor     = 'Intel Corporation'
    class      = serial bus
em1@pci0:0:31:6:        class=0x020000 rev=0x11 hdr=0x00 vendor=0x8086 device=0x1a1d subvendor=0x1458 subdevice=0xe000
    vendor     = 'Intel Corporation'
    device     = 'Ethernet Connection (17) I219-V'
    class      = network
    subclass   = ethernet
em0@pci0:1:0:0: class=0x020000 rev=0x00 hdr=0x00 vendor=0x8086 device=0x10d3 subvendor=0x8086 subdevice=0xa01f
    vendor     = 'Intel Corporation'
    device     = '82574L Gigabit Network Connection'
    class      = network
    subclass   = ethernet
nvme0@pci0:3:0:0:       class=0x010802 rev=0x00 hdr=0x00 vendor=0x15b7 device=0x5006 subvendor=0x15b7 subdevice=0x5006
    vendor     = 'Sandisk Corp'
    device     = 'WD Black SN750 / PC SN730 NVMe SSD'
    class      = mass storage
    subclass   = NVM
Comment 5 Tomoaki AOKI 2022-06-01 09:23:33 UTC
Could Bug 253172 [1] be related to this?

 Bug 253172 - Intel e1000 - Interface Stalls After Media Type is Changed

With this bug, the second if_em interface (em1) causes the hang.
With Bug 253172, a second `ifconfig up` attempt (with a different media type) causes the interface to stall.

Just a thought: if that bug also applies when the port differs (that is, if a difference in either port or media type within the if_em driver triggers the same issue), and if em1 is configured to use DHCP, then em1 may be waiting for a DHCP reply that never arrives, and thus locks up.

I haven't read the if_em (e1000) or iflib code, so I could be misunderstanding, of course.



[1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=253172
Comment 6 Yasuhiro Kimura freebsd_committer freebsd_triage 2023-02-09 17:56:49 UTC
Recently some commits that change files under sys/dev/e1000 were made to the main branch. So I built an install image from base c8f47b28827c and tried booting my home server with it. The system then boots without hanging, but the onboard Intel GbE NIC fails to be detected, with the following message.

----------------------------------------------------------------------
em1: <Intel(R) I219-V ADL(17)> mem 0x42300000-0x4231ffff at device 31.6 on pci0
em1: Setup of Shared code failed. error -1
em1: IFDI_ATTACH_PRE failed 6
device_attach: em1 attach returned 6
----------------------------------------------------------------------
Comment 7 Eric Joyner freebsd_committer freebsd_triage 2023-02-09 19:20:17 UTC
By itself, the error code seems to indicate an issue with the NVM. Can you recompile the driver with "#define DBG" in e1000_osdep.h changed from 0 to 1 and try loading it again so that we can see additional error messages?
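
The requested change is a one-character edit to the header. A minimal sketch of doing it non-interactively follows, assuming the source tree lives under /usr/src (an assumption, since the report does not say) and that the define sits on its own line; the helper name `enable_e1000_dbg` is hypothetical.

```shell
# Flip "#define DBG 0" to "#define DBG 1" in e1000_osdep.h.
# Usage: enable_e1000_dbg [HEADER]
# The default path assumes the source tree is at /usr/src.
enable_e1000_dbg() {
    hdr="${1:-/usr/src/sys/dev/e1000/e1000_osdep.h}"
    # \1 restores "#define DBG" plus its whitespace; the trailing 1 is the new value.
    sed 's/^\(#define[[:space:]][[:space:]]*DBG[[:space:]][[:space:]]*\)0[[:space:]]*$/\11/' \
        "$hdr" > "$hdr.new" && mv "$hdr.new" "$hdr"
}
```

The driver then has to be rebuilt and loaded before the extra debug messages show up in dmesg(8).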
Comment 8 Yasuhiro Kimura freebsd_committer freebsd_triage 2023-02-10 00:42:15 UTC
Created attachment 240039 [details]
Output of dmesg(8) command

(In reply to Eric Joyner from comment #7)

I modified e1000_osdep.h, built an install image from the modified source, booted with it and got the attached file as the output of the dmesg(8) command.
Comment 9 Eric Joyner freebsd_committer freebsd_triage 2023-02-10 01:03:09 UTC
Comment on attachment 240039 [details]
Output of dmesg(8) command

I see this now:

PHY Initialization Error
em1: Setup of Shared code failed, error -2

Which is different from the -1 code you posted before. Did the error code change between your last comment and the dmesg output you posted here?
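
The difference matters because the two values point at different parts of the hardware: -1 was read as an NVM problem in comment 7, while -2 comes with "PHY Initialization Error" in the debug output. A small sketch of that mapping follows. The macro names E1000_ERR_NVM and E1000_ERR_PHY are quoted from memory of e1000_defines.h and should be checked against the source tree; the function name is hypothetical and only these two values are covered.

```shell
# Translate the value printed by "Setup of Shared code failed, error N"
# into a short description. Only the two codes discussed in this report
# are handled; macro names are an assumption to verify in e1000_defines.h.
e1000_err_name() {
    case "$1" in
        -1) echo "NVM error (E1000_ERR_NVM)" ;;
        -2) echo "PHY error (E1000_ERR_PHY)" ;;
        *)  echo "unknown ($1)" ;;
    esac
}
```

For instance, `e1000_err_name -2` reports the PHY error that the attached dmesg output shows.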
Comment 10 Eric Joyner freebsd_committer freebsd_triage 2023-02-10 01:04:22 UTC
Was your home server still using 13.1-RELEASE before you tried installing an image based on CURRENT?
Comment 11 Yasuhiro Kimura freebsd_committer freebsd_triage 2023-02-10 01:23:45 UTC
(In reply to Eric Joyner from comment #9)

Sorry, it was a typo. The following is a screenshot from the first attempt.

https://people.freebsd.org/~yasu/Alderlake-GbE-failure-to-detect.jpg

As you can see, the error code is not -1 but -2.

(In reply to Eric Joyner from comment #10)

I haven't installed -CURRENT on my home server. I just boot with the installer image of -CURRENT, enter shell mode and execute dmesg in it.
Comment 12 Yasuhiro Kimura freebsd_committer freebsd_triage 2023-03-04 07:31:11 UTC
Created attachment 240579 [details]
Output of dmesg(8) command with installer of 13.2-RC1

I tried booting with the installer of 13.2-RC1 and got the same error. Just FYI.
Comment 13 Piotr Kubaj freebsd_committer freebsd_triage 2023-05-05 15:30:52 UTC
Can you verify whether that port (em1) works with some other OS, e.g. Linux or OpenBSD?
Comment 14 Yasuhiro Kimura freebsd_committer freebsd_triage 2023-05-11 13:52:23 UTC
(In reply to Piotr Kubaj from comment #13)

I tried the latest Debian amd64 testing live install image, which uses Linux 6.1.25 as its kernel, and unfortunately it doesn't detect the onboard NIC (em1).
Comment 15 Piotr Kubaj freebsd_committer freebsd_triage 2023-05-11 13:57:27 UTC
(In reply to Yasuhiro Kimura from comment #14)
OK, Linux 6.1 should definitely support Alder Lake. Is it possible there's hardware damage? Currently it looks like there's no operating system that works with your NIC.
Comment 16 Xin LI freebsd_committer freebsd_triage 2023-09-12 16:33:13 UTC
Requesting feedback -- was the issue solved, or is this something we still want to chase?
Comment 17 Yasuhiro Kimura freebsd_committer freebsd_triage 2023-09-18 00:41:53 UTC
(In reply to Xin LI from comment #16)

Yesterday I found that a newer BIOS version has been released for the motherboard in question. I updated the BIOS of my home server to the latest one, and after that the onboard NIC is successfully detected, as follows.

yasu@eastasia[1008]% ifconfig -a
em0: flags=8822<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=481249b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LRO,WOL_MAGIC,VLAN_HWFILTER,NOMAP>
        ether xx:xx:xx:xx:xx:xx
        media: Ethernet autoselect
        status: no carrier
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
em1: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=481049b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LRO,VLAN_HWFILTER,NOMAP>
        ether xx:xx:xx:xx:xx:xx
        inet 192.168.0.1 netmask 0xffffff00 broadcast 192.168.0.255
        inet6 fe80::xxxx:xxxx:xxxx:xxxx%em1 prefixlen 64 scopeid 0x2
        inet6 xxxx:xx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx prefixlen 64
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
        inet 127.0.0.1 netmask 0xff000000
        groups: lo
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
yasu@eastasia[1008]% 

The onboard NIC is detected as em1 and the one inserted in the PCIe slot is em0. So the source of the problem seems to be hardware (or firmware).

I changed /etc/rc.conf so that em1 is used, and I am watching whether em1 works without any problem. I'll keep watching for a while and close this bug report if no problem occurs.
Comment 18 Kevin Bowling freebsd_committer freebsd_triage 2023-09-18 07:05:44 UTC
Seems like a corrupt or broken EEPROM image was in the older BIOS. If you don't have an example of it working with another OS, there is nothing actionable to do in FreeBSD.
Comment 19 Yasuhiro Kimura freebsd_committer freebsd_triage 2023-10-22 04:24:43 UTC
(In reply to Yasuhiro Kimura from comment #17)

After the BIOS update, the onboard NIC has worked fine for about a month, so I am closing this bug report.