Bug 230807 - if_alc(4): Driver not working for Killer Networking E2200
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: amd64 Any
Importance: --- Affects Many People
Assignee: freebsd-net (Nobody)
URL: https://forums.freebsd.org/threads/se...
Keywords: needs-qa
Depends on:
Blocks:
 
Reported: 2018-08-21 18:42 UTC by Daniel McKee
Modified: 2023-12-28 11:13 UTC
CC List: 14 users

See Also:
koobs: mfc-stable13?
koobs: mfc-stable12?


Attachments
WIP patch (do not use) (2.89 KB, patch)
2023-12-28 00:03 UTC, Lexi Winter
working patch (1.99 KB, patch)
2023-12-28 02:35 UTC, Lexi Winter

Description Daniel McKee 2018-08-21 18:42:22 UTC
The driver loads, but the interface neither obtains a DHCP lease nor passes any IP traffic when given a static address. I cannot even ping the gateway or any other device on the local network.

This driver worked previously.
Comment 1 Eugene Grosbein freebsd_committer freebsd_triage 2018-08-22 15:32:40 UTC
(In reply to Daniel McKee from comment #0)

The driver worked previously with the same hardware? In what OS version/revision did it work? Please show the lines concerning the alc hardware from dmesg output and from pciconf -lvv.

Do you have any errors in dmesg output? Does "tcpdump -npi alc0" show any incoming packets when you do your tests?
Comment 2 Mark Linimon freebsd_committer freebsd_triage 2018-08-29 19:19:45 UTC
Please show us the exact revision you are running.  Thanks.
Comment 3 Daniel McKee 2018-09-14 19:26:53 UTC
I was mistaken.  It was not working previously.  However, I did find this post describing a possible (?) fix:

https://forums.freebsd.org/threads/setting-up-ethernet-device-atheros-ar8161.60635/

Specifically:

        /*
         * Force maximum payload size to 128 bytes for
         * E2200/E2400/E2500.
         * Otherwise it triggers DMA write error.
         */
        if ((sc->alc_flags & ALC_FLAG_E2X00) != 0)
                sc->alc_dma_wr_burst = 0;
Comment 4 Daniel McKee 2018-09-14 19:28:01 UTC
EDIT:

By adding the same last two lines (with a modified if condition) and rebuilding the whole system, it works, but I don't know whether it works reliably.
Comment 5 Mark Millard 2019-11-25 02:26:04 UTC
A ThreadRipper 1950X X399 AORUS gaming 7 I use has an E2500:

alc0: <Killer E2500 Gigabit Ethernet> port 0x1000-0x107f mem 0xba000000-0xba03ffff irq 27 at device 0.0 numa-domain 0 on pci5
alc0: 11776 Tx FIFO, 12032 Rx FIFO
alc0: Using 1 MSIX message(s).
alc0: 4GB boundary crossed, switching to 32bit DMA addressing mode.
miibus0: <MII bus> numa-domain 0 on alc0
atphy0: <Atheros F1 10/100/1000 PHY> PHY 0 on miibus0
atphy0:  none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
alc0: Using defaults for TSO: 65518/35/2048
alc0: Ethernet address: . . .

(Note the numa domain use, in case numa matters for some reason.)

While Fedora (currently 31, and earlier releases for a long time before
that) and Windows 10 Pro x64 (currently 1903, likewise) have had no
problems using this Ethernet interface when native-booted from their
drives, I've historically either booted the FreeBSD drive under Hyper-V
so Windows would deal with the Ethernet or used the poorly working WiFi
for native FreeBSD.

This was not limited to any specific versions of FreeBSD but has been
true over some time. I recently jumped from head -r352341 to -r355027 and
the behavior did not change.

My experiments with things like "tcpdump -npi alc0" did not show packets
coming in.

I have not tried making any code changes.
Comment 6 Mark Millard 2020-01-08 03:56:13 UTC
(In reply to Mark Millard from comment #5)

Although I was not looking for it at the time,
I noticed, some time after switching to non-NUMA
on the ThreadRipper, that the E2500 had started
working.

I'm not aware of any other configuration change
that would be likely to have contributed.

It has been working ever since I switched to
non-NUMA. I waited to see if it would stay
operational for a time (weeks) before making
this comment.
Comment 7 Kubilay Kocak freebsd_committer freebsd_triage 2020-01-25 03:27:30 UTC
(In reply to Mark Millard from comment #6)

Can you try the patch mentioned in comment 3?

--
^Triage: Not a regression (comment 3)
Comment 8 Mark Millard 2020-01-25 08:36:11 UTC
(In reply to Kubilay Kocak from comment #7)

It is working without the patch when the threadripper 1950X ACPI is
configured as non-NUMA (via what the old Ryzen Master version calls
"distributed" mode).

I am no longer actively experimenting with NUMA mode for other
purposes, at least for now. So if I just applied the patch and
rebuilt and reinstalled, I'd just be testing if it stopped the
working context from working. That presumes that I understood
what the patch is to be, but I do not.

The way I read the forum message, the shown lines of code were
already in /usr/src/sys/dev/alc/if_alc.c and a somewhat different
pair of lines was to be put somewhere ("with modified if condition").
I've no clue what the modification was nor where the two new lines
were to be put. (Knowing the latter might help identify the former
by giving a context to examine.)

Is the patch expected to be NUMA-specific as far as fixing things?
Are you asking that I try the patch when the ACPI information is
configured for NUMA (called "local" mode in that Ryzen Master)?

If yes-and-yes, then I'd first need to somehow identify the
specific patch to be made, unless someone that knows reports
the specifics that I should use. I have yet to figure this
out on my own.
Comment 9 Josh K 2022-01-31 19:18:45 UTC
Tested alc driver with the Killer E2200 ethernet with:
- FreeBSD 14-CURRENT: Does not work.
- DragonflyBSD 6.2.1: Works
- NetBSD 9.2: Works

Process:
# ifconfig alc0 up

# dhclient alc0
OR
# dhcpcd alc0

I have not tried OpenBSD.

If it works in other BSDs, then something in the FreeBSD driver or kernel must differ in a way that keeps the device from working.
Comment 10 Bjoern A. Zeeb freebsd_committer freebsd_triage 2022-02-18 14:43:17 UTC
(In reply to Daniel McKee from comment #3)

--- sys/dev/alc/if_alc.c
+++ sys/dev/alc/if_alc.c
@@ -1422,6 +1422,8 @@ alc_attach(device_t dev)
                        sc->alc_flags |= ALC_FLAG_LINK_WAR;
                /* FALLTHROUGH */
        case DEVICEID_ATHEROS_AR8171:
+               if (sc->alc_ident->deviceid == DEVICEID_ATHEROS_AR8171)
+                       sc->alc_flags |= ALC_FLAG_E2X00;
                sc->alc_flags |= ALC_FLAG_AR816X_FAMILY;
                break;
        case DEVICEID_ATHEROS_AR8162:


The above will probably do the trick the forum post indicates.  I only have an AR8161.

If anyone is still interested in this for the 8171, please post the relevant boot -v (boot_verbose) output of your NIC.
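
For readers unfamiliar with the switch in alc_attach(), the diff in this comment relies on the AR8161 case falling through into the AR8171 case, which is why the added lines re-check the device ID before setting ALC_FLAG_E2X00. The following is a minimal standalone sketch of that control flow, not the driver code: the function name is hypothetical, the numeric device IDs and flag bits are placeholders chosen for illustration, and the real driver appears to set ALC_FLAG_LINK_WAR under a condition that is not visible in the diff.

```c
#include <stdint.h>

/* Placeholder values for illustration only; see the driver headers for the real ones. */
#define DEVICEID_ATHEROS_AR8161  0x1091
#define DEVICEID_ATHEROS_AR8171  0x10A1
#define ALC_FLAG_LINK_WAR        0x0001u
#define ALC_FLAG_E2X00           0x0002u
#define ALC_FLAG_AR816X_FAMILY   0x0004u

/* Hypothetical helper: returns the flags the patched switch would set. */
static uint32_t
flags_for_device(uint16_t deviceid)
{
	uint32_t flags = 0;

	switch (deviceid) {
	case DEVICEID_ATHEROS_AR8161:
		/* Simplified: set unconditionally in this sketch. */
		flags |= ALC_FLAG_LINK_WAR;
		/* FALLTHROUGH */
	case DEVICEID_ATHEROS_AR8171:
		/* AR8161 also reaches this point, hence the explicit ID check. */
		if (deviceid == DEVICEID_ATHEROS_AR8171)
			flags |= ALC_FLAG_E2X00;
		flags |= ALC_FLAG_AR816X_FAMILY;
		break;
	default:
		break;
	}
	return (flags);
}
```

With these placeholder values, an AR8171 ends up with the E2X00 and AR816X_FAMILY bits, while an AR8161 gets LINK_WAR and AR816X_FAMILY but not E2X00.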
Comment 11 Bjoern A. Zeeb freebsd_committer freebsd_triage 2022-02-18 14:48:16 UTC
(In reply to Bjoern A. Zeeb from comment #10)

Sorry, I picked the wrong PR to follow up on, but it's kind of in the right place anyway ;-)
Comment 12 Ross McKelvie 2022-04-15 09:36:00 UTC
I am experiencing an intermittent bug with the alc(4) driver and the Killer E2200 Gigabit Ethernet NIC on FreeBSD 12.3-RELEASE-p5 for amd64 that may be related.  Network connectivity drops, the CPU load increases rapidly, the same set of messages is logged repeatedly, and the system cannot be restarted or powered down using shutdown(8). The behaviour may occur shortly after a cold start of the system, but I need to observe more instances to confirm this.

uname -mv:
FreeBSD 12.3-RELEASE-p5 GENERIC  amd64

pciconf -lv alc0:
alc0@pci0:8:0:0:        class=0x020000 card=0x05ad1028 chip=0xe0911969 rev=0x10 hdr=0x00
    vendor     = 'Qualcomm Atheros'
    device     = 'Killer E220x Gigabit Ethernet Controller'
    class      = network
    subclass   = ethernet
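
For readers decoding the pciconf output above: the chip= field packs the PCI device ID into the upper 16 bits and the vendor ID into the lower 16 bits, so chip=0xe0911969 corresponds to vendor 0x1969 and device 0xe091. A minimal sketch of that arithmetic follows; the helper names are made up for illustration and are not part of pciconf(8) or alc(4).

```c
#include <stdint.h>

/* Hypothetical helpers: split a pciconf "chip=" value into its two IDs. */
static uint16_t
pci_vendor_from_chip(uint32_t chip)
{
	/* Vendor ID is the low 16 bits. */
	return ((uint16_t)(chip & 0xffffu));
}

static uint16_t
pci_device_from_chip(uint32_t chip)
{
	/* Device ID is the high 16 bits. */
	return ((uint16_t)(chip >> 16));
}
```

The same split applies to the card= field, which holds the subsystem device and vendor IDs.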

dmesg | grep alc (when working correctly):
alc0: <Killer E2200 Gigabit Ethernet> port 0x3000-0x307f mem 0xd2400000-0xd243ffff at device 0.0 on pci2
alc0: 11776 Tx FIFO, 12032 Rx FIFO
alc0: Using 1 MSIX message(s).
miibus0: <MII bus> on alc0
alc0: Using defaults for TSO: 65518/35/2048
alc0: Ethernet address: f0:1f:af:30:ae:eb
alc0: link state changed to DOWN
alc0: link state changed to UP

Messages in /var/log/messages from today's occurrence:
Apr 15 09:51:11 redacted-hostname kernel: alc0: <Killer E2200 Gigabit Ethernet> port 0x3000-0x307f mem 0xd2400000-0xd243ffff at device 0.0 on pci2
Apr 15 09:51:11 redacted-hostname kernel: alc0: 11776 Tx FIFO, 12032 Rx FIFO
Apr 15 09:51:11 redacted-hostname kernel: alc0: Using 1 MSIX message(s).
Apr 15 09:51:11 redacted-hostname kernel: miibus0: <MII bus> on alc0
Apr 15 09:51:11 redacted-hostname kernel: alc0: Using defaults for TSO: 65518/35/2048
Apr 15 09:51:11 redacted-hostname kernel: alc0: Ethernet address: f0:1f:af:30:ae:eb
Apr 15 09:51:11 redacted-hostname kernel: alc0: link state changed to DOWN
Apr 15 09:51:11 redacted-hostname kernel: alc0: link state changed to UP
Apr 15 09:51:30 redacted-hostname kernel: alc0: phy read timeout : 1
Apr 15 09:51:30 redacted-hostname kernel: alc0: phy read timeout : 0
Apr 15 09:51:30 redacted-hostname kernel: alc0: phy read timeout : 17
Apr 15 09:51:30 redacted-hostname kernel: alc0: could not disable RxQ/TxQ (0xffffffff)!
Apr 15 09:51:30 redacted-hostname kernel: alc0: could not disable Rx/Tx MAC(0xffffffff)!
Apr 15 09:51:30 redacted-hostname kernel: alc0: link state changed to DOWN
Apr 15 09:51:30 redacted-hostname kernel: alc0: phy read timeout : 1
Apr 15 09:51:30 redacted-hostname kernel: alc0: phy read timeout : 0
Apr 15 09:51:30 redacted-hostname kernel: alc0: phy read timeout : 17
Apr 15 09:51:30 redacted-hostname kernel: alc0: DMA read error! -- resetting
Apr 15 09:51:30 redacted-hostname kernel: alc0: DMA write error! -- resetting
Apr 15 09:51:30 redacted-hostname kernel: alc0: TxQ reset! -- resetting
Apr 15 09:51:31 redacted-hostname kernel: alc0: could not disable RxQ/TxQ (0xffffffff)!
Apr 15 09:51:31 redacted-hostname kernel: alc0: could not disable Rx/Tx MAC(0xffffffff)!
Apr 15 09:51:31 redacted-hostname kernel: alc0: MAC reset timeout!
Apr 15 09:51:31 redacted-hostname kernel: alc0: master reset timeout!
Apr 15 09:51:31 redacted-hostname kernel: alc0: reset timeout(0xffffffff)!
Apr 15 09:51:31 redacted-hostname kernel: alc0: phy write timeout : 29
Apr 15 09:51:31 redacted-hostname kernel: alc0: phy write timeout : 30
Apr 15 09:51:31 redacted-hostname kernel: alc0: phy read timeout : 16
Apr 15 09:51:31 redacted-hostname kernel: alc0: phy write timeout : 16
Apr 15 09:51:31 redacted-hostname kernel: alc0: phy write timeout : 4
Apr 15 09:51:31 redacted-hostname kernel: alc0: phy write timeout : 9
Apr 15 09:51:31 redacted-hostname kernel: alc0: phy write timeout : 0
Apr 15 09:51:31 redacted-hostname kernel: alc0: phy read timeout : 0
Apr 15 09:51:31 redacted-hostname kernel: alc0: phy write timeout : 4
Apr 15 09:51:31 redacted-hostname kernel: alc0: phy write timeout : 9
Apr 15 09:51:31 redacted-hostname kernel: alc0: phy write timeout : 0
Apr 15 09:51:31 redacted-hostname kernel: alc0: phy read timeout : 1
Apr 15 09:51:31 redacted-hostname kernel: alc0: phy read timeout : 0
Apr 15 09:51:32 redacted-hostname kernel: alc0: phy read timeout : 17
Apr 15 09:51:32 redacted-hostname kernel: alc0: could not disable RxQ/TxQ (0xffffffff)!
Apr 15 09:51:32 redacted-hostname kernel: alc0: could not disable Rx/Tx MAC(0xffffffff)!
Apr 15 09:51:32 redacted-hostname kernel: alc0: DMA read error! -- resetting
Apr 15 09:51:32 redacted-hostname kernel: alc0: DMA write error! -- resetting
Apr 15 09:51:32 redacted-hostname kernel: alc0: TxQ reset! -- resetting
Apr 15 09:51:32 redacted-hostname kernel: alc0: could not disable RxQ/TxQ (0xffffffff)!
Apr 15 09:51:32 redacted-hostname kernel: alc0: could not disable Rx/Tx MAC(0xffffffff)!
Apr 15 09:51:32 redacted-hostname kernel: alc0: MAC reset timeout!
Apr 15 09:51:32 redacted-hostname kernel: alc0: master reset timeout!
Apr 15 09:51:32 redacted-hostname kernel: alc0: reset timeout(0xffffffff)!
Apr 15 09:51:32 redacted-hostname kernel: alc0: phy write timeout : 29
[...continues until removing power...]
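
A note on reading the log above: several messages print a register value of 0xffffffff. On PCI and PCIe, a read from a device that has stopped responding commonly comes back as all ones, so that value may mean the NIC is no longer answering register reads at all rather than reporting a real status. This is an interpretation, not something confirmed in this report. The sketch below shows how such a value could be told apart from an ordinary busy bit; the enum, the function, and the busy mask are hypothetical and do not come from if_alc.c.

```c
#include <stdint.h>

/* Hypothetical classification of a 32-bit status register read. */
enum reg_state {
	REG_IDLE,		/* no busy bits set */
	REG_BUSY,		/* at least one busy bit still set */
	REG_DEVICE_GONE		/* all ones: device probably not responding */
};

static enum reg_state
classify_reg(uint32_t value, uint32_t busy_mask)
{
	/* Check for all ones first, since it would also match any busy mask. */
	if (value == UINT32_MAX)
		return (REG_DEVICE_GONE);
	if ((value & busy_mask) != 0)
		return (REG_BUSY);
	return (REG_IDLE);
}
```

Checking for all ones before testing individual bits matters because an all-ones read otherwise looks like every busy bit being stuck, which would send a driver into repeated reset attempts like the ones logged here.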
Comment 13 Kubilay Kocak freebsd_committer freebsd_triage 2022-08-10 00:09:39 UTC
@Bjoern Did your patch in comment 10 resolve the issue on your AR8161? If so, are you able to land the patch for that specific model until we can get confirmation it resolves the issue for other models?
Comment 14 Ed Maste freebsd_committer freebsd_triage 2022-08-10 00:26:11 UTC
OpenBSD's version of the driver has a slightly different approach for the same thing, but it all involves the code around the comment

                /*
                 * Force maximum payload size to 128 bytes for
                 * E2200/E2400/E2500/AR8162/AR8171/AR8172.
                 * Otherwise it triggers DMA write error.
                 */

OpenBSD changed the if condition to

                if ((sc->alc_flags &
                    (ALC_FLAG_E2X00 | ALC_FLAG_AR816X_FAMILY)) != 0)
                        sc->alc_dma_wr_burst = 0;

In any case, if any version of the patch is confirmed to "fix" the issue we can get something committed.
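
To make the difference between the two conditions concrete, here is a small standalone sketch comparing the check quoted in comment 3 with the OpenBSD check quoted above. The flag bit values are placeholders for illustration (the real definitions live in the driver headers), and the two function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values, for illustration only. */
#define ALC_FLAG_E2X00          0x0001u
#define ALC_FLAG_AR816X_FAMILY  0x0002u

/* Condition from comment 3: only E2X00 parts get alc_dma_wr_burst = 0. */
static bool
freebsd_forces_min_burst(uint32_t alc_flags)
{
	return ((alc_flags & ALC_FLAG_E2X00) != 0);
}

/* Condition quoted from OpenBSD: either flag is sufficient. */
static bool
openbsd_forces_min_burst(uint32_t alc_flags)
{
	return ((alc_flags &
	    (ALC_FLAG_E2X00 | ALC_FLAG_AR816X_FAMILY)) != 0);
}
```

A device that carries only ALC_FLAG_AR816X_FAMILY is left alone by the first check and covered by the second, which is the practical difference between the two approaches.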
Comment 15 Eugene Grosbein freebsd_committer freebsd_triage 2022-10-22 14:22:14 UTC
Any updates on the topic? There are reports that alc(4) is not stable with the Killer E2200 Gigabit Ethernet controller in 13.1.
Comment 16 Cheburashka 2022-10-22 17:44:53 UTC
My experience, tested on FreeBSD 13.0 and 13.1:
The Killer Networking E2200 works perfectly if you dual-boot FreeBSD and Windows or run only FreeBSD. There are no problems with the Internet connection.
But if you reboot from Linux into FreeBSD, strange things happen: the Internet connection does not work. To fix it, you need to power the PC off completely; after that it works.
I don't know what Linux does to the Ethernet adapter, but this is my case.
Comment 17 Mark Millard 2022-10-22 19:11:08 UTC
(In reply to Cheburashka from comment #16)

I've seen similar OS context-switching behavior with the
ThreadRipper 1950X X399 AORUS gaming 7 I use that has an
E2500. Complete power off-then-on is the only solution
that I've found.

This is not new. It has been a long time since I've booted
Linux on this box. So this is more of a note about
historical examples. (I still boot both FreeBSD and Windows
10 Pro.)

The major point is that the E2500 and E2200 both show some
form of incomplete reset, which lets a prior Linux
configuration of the device break it for FreeBSD.
Comment 18 Laurent Cimon 2022-12-10 18:52:10 UTC
I have an E2200 that is not working, both on 13.1-RELEASE and on 14-CURRENT. It works fine on OpenBSD, Linux, Windows. On FreeBSD the router doesn't respond to DHCPDISCOVER and, when trying to set my IP manually, it can't ping the router, saying "sendto: Host is down".
Comment 19 Laurent Cimon 2022-12-10 19:10:18 UTC
Disabling MSIX fixed the issue for me!
Comment 20 Cheburashka 2022-12-27 17:01:35 UTC
(In reply to Laurent Cimon from comment #19)
Thank you! I just added the line hw.alc.msix_disable=1 to /boot/loader.conf and have had no problems since.
Comment 21 Josh K 2022-12-27 21:58:25 UTC
Can confirm on 13.1-RELEASE with a Killer E2200 that disabling MSIX by adding hw.alc.msix_disable=1 to /boot/loader.conf makes it work!
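
As a rough illustration of why setting hw.alc.msix_disable=1 changes behaviour, the sketch below models how a driver might pick an interrupt mode once MSI-X is ruled out. It is a simplified assumption based on common driver practice (try MSI-X, then MSI, then legacy INTx), not a copy of the logic in if_alc.c; the enum, the function, and its parameters are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical interrupt modes a NIC driver could end up using. */
enum intr_mode {
	INTR_MSIX,
	INTR_MSI,
	INTR_LEGACY
};

/*
 * Pick an interrupt mode from the "disable" settings and the number of
 * messages the device offers for each mechanism.
 */
static enum intr_mode
choose_intr_mode(bool msix_disable, bool msi_disable, int msix_count,
    int msi_count)
{
	if (!msix_disable && msix_count >= 1)
		return (INTR_MSIX);
	if (!msi_disable && msi_count >= 1)
		return (INTR_MSI);
	/* Nothing else allowed or available: fall back to legacy INTx. */
	return (INTR_LEGACY);
}
```

Under this model, disabling MSI-X on a device that also offers MSI moves it to MSI, which would be consistent with the reports here that the card works once the MSI-X path is avoided.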
Comment 22 Lexi Winter 2023-12-28 00:03:44 UTC
Created attachment 247305 [details]
WIP patch (do not use)

i've been playing around with this a bit and it seems like for MSI-X to work, the driver needs a mapping table set up to tell it which msix tx/rx queues are enabled.

this patch makes the card work, except there's an odd 1-second delay on all packets:

64 bytes from 10.1.4.1: icmp_seq=0 ttl=64 time=1000.038 ms
64 bytes from 10.1.4.1: icmp_seq=1 ttl=64 time=1000.037 ms
64 bytes from 10.1.4.1: icmp_seq=2 ttl=64 time=1000.035 ms
64 bytes from 10.1.4.1: icmp_seq=3 ttl=64 time=1000.037 ms

this patch only enables RXQ0; the Linux driver also enables TXQ0, but that prevents msix from working at all for me; either i'm doing something wrong or there's another step i missed.

alc0@pci0:33:0:0:	class=0x020000 rev=0x10 hdr=0x00 vendor=0x1969 device=0xe0b1 subvendor=0x1462 subdevice=0x7b77
    vendor     = 'Qualcomm Atheros'
    device     = 'Killer E2500 Gigabit Ethernet Controller'
    class      = network
    subclass   = ethernet
Comment 23 Lexi Winter 2023-12-28 02:35:09 UTC
Created attachment 247306 [details]
working patch

this patch fixes MSI-X for me, but i've only done a small amount of testing with it, and i don't have any other AR81xx cards to test with.
Comment 24 Kevin Lo freebsd_committer freebsd_triage 2023-12-28 03:16:01 UTC
(In reply to Lexi Winter from comment #23)

The problem has been fixed:
https://cgit.freebsd.org/src/commit/sys/dev/alc/if_alc.c?id=8cdb6b2dd78793628d7c36198598c85741e44119
Comment 25 Lexi Winter 2023-12-28 11:13:02 UTC
(In reply to Kevin Lo from comment #24)

this patch is for the MSI-X issue that some people reported here (which is still present in stable/14), but i'll create a separate bug for this since it's unrelated to the original problem.