Bug 275882 - net/realtek-re-kmod: hangs after update to 199.00
Summary: net/realtek-re-kmod: hangs after update to 199.00
Status: New
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: Any Any
Importance: --- Affects Some People
Assignee: Alex Dupre
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-12-22 10:30 UTC by Tino Engel
Modified: 2024-04-05 14:41 UTC
CC List: 12 users

See Also:
bugzilla: maintainer-feedback? (ale)


Attachments
kdump of 'ktrace curl google.de' (167.33 KB, text/plain)
2023-12-26 08:52 UTC, Tino Engel
no flags
kdump of 'ktrace -i curl www.freebsd.org' (148.53 KB, text/plain)
2024-01-04 10:07 UTC, Tino Engel
no flags
truss -f curl www.freebsd.org (94.79 KB, text/plain)
2024-01-05 18:56 UTC, Tino Engel
no flags
0001-net-realtek-re-kmod-downgrade-to-198.00.patch (3.32 KB, patch)
2024-03-13 01:29 UTC, Koichiro Iwao
no flags
0001-net-realrek-re-kmod198-add-port-for-198-version.patch (4.44 KB, patch)
2024-03-13 09:50 UTC, Koichiro Iwao
no flags

Description Tino Engel 2023-12-22 10:30:15 UTC
On FreeBSD 14.0-RELEASE-p4 amd64 with Killer E3000 I have the following problem:
The driver hangs after the update from 198.00 to 199.00. DHCP looks good, but any process that tries to access the network simply hangs.
After rollback to 198.00 it works again.
Comment 1 Alexander Vereeken 2023-12-23 00:08:04 UTC
Hello,

I have a similar problem as well.

When I get to the Steam login dialog in Wine, the whole connection gets stuck. As in Tino's case, DHCP etc. looks fine.

Reverting to 198.00 helps for me as well.
Comment 2 Martin Birgmeier 2023-12-25 12:48:05 UTC
Same issue here... ASUS Prime X670-P WiFi motherboard, after upgrade to 1.99 a few ping packets can be exchanged initially after setting the IP address, then nothing.
Comment 3 Alex Dupre freebsd_committer freebsd_triage 2023-12-25 16:34:16 UTC
Tino, are you able to debug the issue?

This new version fixed some severe issues for some cards, and introduced new severe issues for others :-(
Comment 4 Alex Dupre freebsd_committer freebsd_triage 2023-12-26 08:37:56 UTC
Is LRO enabled after the update to 1.99 on your card? Can you try disabling it and check if the problem persists?
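
A minimal sketch of the test being asked for here, assuming the interface is re0 and the commands are run as root on the affected machine. The function name disable_lro_tso and the IFCONFIG override are hypothetical and exist only so the command can be previewed; -lro and -tso are the ifconfig(8) flags for switching those offloads off. TSO is included only because it is commonly disabled together with LRO.

```shell
# Hypothetical helper: turn off LRO and TSO on one interface.
# Set IFCONFIG=echo to print the arguments instead of changing the NIC.
disable_lro_tso() {
    "${IFCONFIG:-ifconfig}" "$1" -lro -tso
}

# On the affected machine (as root):
#   disable_lro_tso re0
#   ifconfig re0    # the options= line should no longer list LRO or TSO4/TSO6
```

The change does not survive a reboot, which is convenient for a quick test.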
Comment 5 Tino Engel 2023-12-26 08:52:51 UTC
Created attachment 247259 [details]
kdump of 'ktrace curl google.de'
Comment 6 Tino Engel 2023-12-26 08:54:54 UTC
Hello Alex,

I have already tried to debug the issue, but have no findings yet.
I tried to ktrace/kdump a hanging process ('ktrace curl google.de'). I have attached the output to this ticket. Maybe this gives you a better idea of what is going wrong (it is not evident to me)?
Comment 7 Tino Engel 2023-12-26 09:10:00 UTC
P.S.: I also tried https://wiki.freebsd.org/Networking/10GbE/Router#Disabling_LRO_and_TSO without success.
Comment 8 Tino Engel 2024-01-04 10:07:33 UTC
Created attachment 247441 [details]
kdump of 'ktrace -i curl www.freebsd.org'

I have tried again to debug the issue, but unfortunately it seems this is over my head.
I have attached a new trace, this time also tracing the sub-processes.
I am not good at reading kdumps, but I have the impression curl calls www.freebsd.org and forever waits for an answer.

If anyone has an idea, I am willing to invest more time in this issue.
Comment 9 Tino Engel 2024-01-05 18:56:59 UTC
Created attachment 247468 [details]
truss -f curl www.freebsd.org

I am not willing to give up on this.
I have also attached a truss trace.
I am digging through it; nevertheless, any hints are appreciated.
Comment 10 Martin Birgmeier 2024-01-05 19:21:07 UTC
Hi Tino,

From the behavior I have seen this is an issue in the new driver. After booting it can exchange a few packets and then stops working. This means that userland traces like you are supplying probably won't give many clues as to what is happening.

I have compared the 1.98 and 1.99 sources from https://github.com/alexdupre/rtl_bsd_drv, and there are extensive changes, so it is not easy to find what causes the regression. The best way forward might be to ask the Realtek people who supply the original code, which can be found at https://www.realtek.com/en/component/zoo/category/network-interface-controllers-10-100-1000m-gigabit-ethernet-pci-express-software, to also support FreeBSD 14.

-- Martin
Comment 11 Tino Engel 2024-01-06 11:02:32 UTC
Hi Martin,

I also have compared the 1.98 and 1.99 sources from https://github.com/alexdupre/rtl_bsd_drv. I even tried some minor changes, but did not manage to get it working.
The Realtek site (https://www.realtek.com/en/component/zoo/category/network-interface-controllers-10-100-1000m-gigabit-ethernet-pci-express-software) is confusing. They offer a FreeBSD driver (latest version from 09/2023) for FreeBSD 7 and 8. That absolutely makes no sense to me.
I'll try to contact Realtek, maybe they are gonna help if we are lucky.

Tino
Comment 12 imbutler 2024-01-06 20:26:58 UTC
Something appears to have changed in the kernel ..

I can boot ..

FreeBSD 15.0-CURRENT #4 main-c3268c23de4: Mon Jan  1 20:17:26 EST 2024

 .. but my next snapshot of a build at ..

FreeBSD 15.0-CURRENT #8 main-10f2e94acc1: Tue Jan  2 16:46:09 EST 2024

 .. (or anything after that) panics with the message 're0 taskq'

I used the same module from ports (realtek-re-kmod-199.00_1) in each case.
Comment 13 rdunkle 2024-01-09 14:08:04 UTC
FreeBSD 14.0-STABLE #0 stable/14-53a984a36  arm64.aarch64
I compiled realtek-re-kmod today.  This module appears to load OK.  The nic is recognized OK.  But quickly the kernel panics.
dmesg | grep re0
re0: <Realtek PCIe 2.5GbE Family Controller> mem 0xf3000000-0xf300ffff,0xf3010000-0xf3013fff at device 0.0 on pci1
re0: Using Memory Mapping!
re0: Using line-based interrupt
re0: version:1.98.00
---------------------
I switched back to the realtek-re-kmod from the FreeBSD repo. That one appears to work OK.
Comment 14 Alex Dupre freebsd_committer freebsd_triage 2024-01-09 15:14:09 UTC
(In reply to rdunkle from comment #13)

From your log it seems you compiled an old 1.98 version, so it's not actually related to this issue that started with 1.99, according to other users.
Comment 15 rdunkle 2024-01-09 15:45:13 UTC
That dmesg is from the old version, correct. That version runs OK. I did a git pull on ports today and compiled. The new version causes a kernel panic, so I cannot get a dmesg with the new version.
Comment 16 Alexander Vereeken 2024-01-09 17:08:44 UTC
(In reply to rdunkle from comment #15)

Not even in /var/log/messages ?
Comment 17 rdunkle 2024-01-09 17:36:32 UTC
when I boot with the 1.99.04 ... there is a panic and the /var/log/messages is empty
when I boot with 1.98 the nics work
root@orange:/boot/modules # strings if_re.ko | grep 1.99
1.99.04
root@orange:/boot/modules # strings if_re.ko.save | grep 1.98
1.98.00
Is there something else I can do to get useful information for you?
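
One caveat with the check above: in "grep 1.99" the dot matches any character, so unrelated strings can slip through. A slightly stricter filter is sketched below. The pattern is an assumption inferred from the two values shown here (1.99.04 and 1.98.00), and the function name is hypothetical.

```shell
# Hypothetical filter: keep only lines that are exactly N.NN.NN,
# the shape of the two version strings seen in this thread.
re_version_lines() {
    grep -E '^[0-9]+\.[0-9]{2}\.[0-9]{2}$'
}

# On the affected machine:
#   strings /boot/modules/if_re.ko | re_version_lines
```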
Comment 18 Alexander Vereeken 2024-01-10 07:20:39 UTC
(In reply to rdunkle from comment #17)

I guess that you can obtain something when you load the module while the system is running.

Remove the module from your loader.conf then load the module manually later with:

kldload /boot/modules/if_re.ko

then the panic should be documented in /var/log/messages.
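
The steps above could be scripted roughly as follows. This is only a sketch, assuming the module is enabled through an if_re_load="YES" line in /boot/loader.conf. The helper name is hypothetical, and it does a plain text match rather than parsing the file the way the loader does.

```shell
# Hypothetical helper: succeed if the given loader.conf still contains an
# uncommented if_re_load="YES" line, i.e. the module would load at boot.
re_autoload_enabled() {
    grep -Eq '^[[:space:]]*if_re_load="?YES"?' "${1:-/boot/loader.conf}"
}

# On the affected machine (as root), after removing or commenting out that
# line and rebooting:
#   re_autoload_enabled || kldload /boot/modules/if_re.ko
#   tail /var/log/messages
```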
Comment 19 rdunkle 2024-01-10 08:46:16 UTC
the kldload completes.  In about 2 seconds the system reboots.  The version info is not written to the log and the previous log entries vanish.

Jan 10 10:30:50 orange kernel: , 1061.
Jan 10 10:30:50 orange ntpd[1008]: ntpd exiting on signal 15 (Terminated)
Jan 10 10:30:50 orange kernel: .
Jan 10 10:30:51 orange kernel: , 736.
Jan 10 10:30:51 orange syslogd: exiting on signal 15
Jan 10 10:32:15 orange syslogd: kernel boot file is /boot/kernel/kernel
Jan 10 10:32:15 orange kernel: ---<<BOOT>>---
Jan 10 10:32:15 orange kernel: Copyright (c) 1992-2023 The FreeBSD Project.
Jan 10 10:32:15 orange kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
Jan 10 10:32:15 orange kernel:  The Regents of the University of California. All rights reserved.
Jan 10 10:32:15 orange kernel: FreeBSD is a registered trademark of The FreeBSD Foundation.
Jan 10 10:32:15 orange kernel: FreeBSD 14.0-STABLE #0 stable/14-53a984a36: Mon Jan  8 12:46:16 EET 2024
Jan 10 10:32:15 orange kernel:     root@sky22.smallcatbrain.com:/usr/obj/usr/src-stable-14/arm64.aarch64/sys/GENERIC arm64
Comment 20 Ott Köstner 2024-01-19 17:58:09 UTC
I can confirm this bug. Everything seems to work, but no traffic goes through the interface.

I have custom built 14.0 kernel and realtek-re-kmod-199.00_1 built from port.
Tried with different ifconfig options and got it working at some point of time, but it was not stable. Also, repeating the same sequence of disabling offload options did not give the same results.

No error messages. Driver loads OK, and ifconfig shows the status active. 

Devices are:
device     = 'RTL8125 2.5GbE Controller'
and
device     = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'

None of these is working with this driver.

What is more interesting is that on another machine with no Realtek devices, loading this driver disables all traffic on another interface (bge) that is unrelated to Realtek.
Comment 21 Koichiro Iwao freebsd_committer freebsd_triage 2024-01-30 01:11:51 UTC
I also encountered this issue. 198 works fine, and 199 stops working after exchanging a few packets such as DHCP and IPv6 RA.

My devices are:

re0@pci0:2:0:0: class=0x020000 rev=0x15 hdr=0x00 vendor=0x10ec device=0x8168 subvendor=0x103c subdevice=0x806a
    vendor     = 'Realtek Semiconductor Co., Ltd.'
    device     = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
    class      = network
    subclass   = ethernet

re0@pci0:4:0:0: class=0x020000 rev=0x15 hdr=0x00 vendor=0x10ec device=0x8161 subvendor=0x10ec subdevice=0x8168
    vendor     = 'Realtek Semiconductor Co., Ltd.'
    device     = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
    class      = network
    subclass   = ethernet
re1@pci0:5:0:0: class=0x020000 rev=0x0e hdr=0x00 vendor=0x10ec device=0x8168 subvendor=0x17aa subdevice=0x32e1
    vendor     = 'Realtek Semiconductor Co., Ltd.'
    device     = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
    class      = network
    subclass   = ethernet
Comment 22 László Károlyi 2024-02-14 13:59:14 UTC
I've had to recompile version 198 for myself after rebooting with the GENERIC re0 driver, after the update caused ~3hrs of downtime (it took 2hrs to get a console on my server).

Definitely can confirm this is an issue, because it made my server unreachable as well.

What I noticed is, upon first rebooting with the faulty driver, the server responded to 3 pings (IPv6) and then went completely silent.

Looking forward to a fix here, because I now can't install the latest driver from realtek-re-kmod.
Comment 23 Victor Volpe 2024-03-12 06:38:45 UTC
Same problem with the version 199.00_1. Previous versions worked as intended.

FreeBSD home.local 13.2-RELEASE-p10 FreeBSD 13.2-RELEASE-p10 GENERIC amd64

re0@pci0:1:0:0: class=0x020000 rev=0x15 hdr=0x00 vendor=0x10ec device=0x8168 subvendor=0x10ec subdevice=0x0123
    vendor     = 'Realtek Semiconductor Co., Ltd.'
    device     = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
    class      = network
    subclass   = ethernet

/boot/loader.conf
if_re_load="YES"
if_re_name="/boot/modules/if_re.ko"
hw.re.max_rx_mbuf_sz="2048"

No feedback yet?
Comment 24 Koichiro Iwao freebsd_committer freebsd_triage 2024-03-13 01:29:35 UTC
Created attachment 249124 [details]
0001-net-realtek-re-kmod-downgrade-to-198.00.patch

I suggest downgrading this port to 198 until the issue is resolved.
Comment 25 Alex Dupre freebsd_committer freebsd_triage 2024-03-13 07:31:14 UTC
(In reply to Koichiro Iwao from comment #24)

Unfortunately 1.98 was broken for another set of people/cards (see for example https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=274995 that was reported by many people), reverting is not a good solution.
Comment 26 Koichiro Iwao freebsd_committer freebsd_triage 2024-03-13 08:31:11 UTC
(In reply to Alex Dupre from comment #25)
I see. Then we probably need to create another port for 198. At the very least, 198 needs to be installable via pkg install for the people for whom 199 doesn't work.
Comment 27 Koichiro Iwao freebsd_committer freebsd_triage 2024-03-13 08:51:48 UTC
In addition, using the default driver instead of this port is not a solution either. It has a watchdog timeout issue, so using 198 is the only solution so far. They really need 198.
Comment 28 Alex Dupre freebsd_committer freebsd_triage 2024-03-13 09:13:14 UTC
(In reply to Koichiro Iwao from comment #27)
I know, that was the main reason to create this port. I have no objections if you want to restore the previous version as a separate port.
Comment 29 Koichiro Iwao freebsd_committer freebsd_triage 2024-03-13 09:50:00 UTC
Created attachment 249128 [details]
0001-net-realrek-re-kmod198-add-port-for-198-version.patch

Here it is. Feel free to modify it if you think necessary. It also should be added to quarterly because the quarterly branch has already been updated to 199.
Comment 30 Alex Dupre freebsd_committer freebsd_triage 2024-03-13 16:39:27 UTC
(In reply to Koichiro Iwao from comment #29)
I think you can drop the `PORTREVISION=3` from the new port. I'm time limited, you are welcome to commit (and take the maintainership of) this new port.
Comment 31 Victor Volpe 2024-03-14 00:10:48 UTC
(In reply to Koichiro Iwao from comment #27)
4 days of uptime with no watchdog timeout so far. What FreeBSD version are you running?

# uname -a
FreeBSD home.local 13.2-RELEASE-p10 FreeBSD 13.2-RELEASE-p10 GENERIC amd64
# ifconfig re0
re0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=82099<RXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC,LINKSTATE>
        ether 7c:83:34:b1:8f:8f
        inet 192.168.15.250 netmask 0xffffff00 broadcast 192.168.15.255
        inet6 fe80::7e83:34ff:feb1:8f8f%re0 prefixlen 64 scopeid 0x1
        inet6 2804:7f0:ba41:1e60:7e83:**** prefixlen 64 autoconf
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
# netstat -db -I re0
Name    Mtu Network       Address              Ipkts Ierrs Idrop     Ibytes    Opkts Oerrs     Obytes  Coll  Drop
re0    1500 <Link#1>      7c:83:34:b1:8f:8f 58536613     0     0 38572546833 101711299     0 99083129652     0   144
Comment 32 László Károlyi 2024-03-14 00:19:16 UTC
(In reply to Victor Volpe from comment #31)
Victor,

there is an entire bug dedicated to the watchdog timeout (bug #166724), I know because I was a victim of it.

It recently disappeared for me, although I only found that out by accident: a mis-compiled v198 of mine didn't work, so the built-in re0 driver loaded instead, and I noticed this only weeks later when I tested a reboot for some pf rule changes. Even so, I don't want to risk going back to the built-in driver on a bare-metal production server like mine.

Cheers,
László
Comment 33 Victor Volpe 2024-03-14 00:24:47 UTC
(In reply to László Károlyi from comment #32)
Yes, I know that, mate. I was affected too on 12-RELEASE, and I've been using the kmod driver since version 196.04. Now, with my system upgraded to 13.2, I have had no more watchdog timeouts since falling back to the default driver after the version 199 bug.
Comment 34 László Károlyi 2024-03-14 00:27:43 UTC
(In reply to Victor Volpe from comment #33)
Welp, that makes two of us then.

Maybe more testing is in order for the default driver.
Comment 35 commit-hook freebsd_committer freebsd_triage 2024-03-14 02:04:17 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=b770c919121526ebbf61b81fd6b832619319df60

commit b770c919121526ebbf61b81fd6b832619319df60
Author:     Koichiro Iwao <meta@FreeBSD.org>
AuthorDate: 2024-03-13 08:52:50 +0000
Commit:     Koichiro Iwao <meta@FreeBSD.org>
CommitDate: 2024-03-14 02:03:06 +0000

    net/realrek-re-kmod198: add port for 198 version

    as a workaround for bug 275882. This port can be retired when the bug is
    resolved completely.

    Many people need the 198 version because of the hang-up issue. Another
    set of people need 199 because of another issue. This port is needed to
    satisfy both sets of people until complete until a complete solution for
    275882 is found.

    PR:             275882
    Sponsored by:   Cybertrust Japan

 net/Makefile                             |  1 +
 net/realtek-re-kmod198/Makefile (new)    | 23 +++++++++++++++++++++++
 net/realtek-re-kmod198/distinfo (new)    |  3 +++
 net/realtek-re-kmod198/pkg-descr (new)   | 25 +++++++++++++++++++++++++
 net/realtek-re-kmod198/pkg-message (new) | 22 ++++++++++++++++++++++
 5 files changed, 74 insertions(+)
Comment 36 commit-hook freebsd_committer freebsd_triage 2024-03-14 02:06:20 UTC
A commit in branch 2024Q1 references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=f967592923a21e7b44c11a45f7a241439a97f163

commit f967592923a21e7b44c11a45f7a241439a97f163
Author:     Koichiro Iwao <meta@FreeBSD.org>
AuthorDate: 2024-03-13 08:52:50 +0000
Commit:     Koichiro Iwao <meta@FreeBSD.org>
CommitDate: 2024-03-14 02:04:19 +0000

    net/realrek-re-kmod198: add port for 198 version

    as a workaround for bug 275882. This port can be retired when the bug is
    resolved completely.

    Many people need the 198 version because of the hang-up issue. Another
    set of people need 199 because of another issue. This port is needed to
    satisfy both sets of people until complete until a complete solution for
    275882 is found.

    PR:             275882
    Sponsored by:   Cybertrust Japan

    (cherry picked from commit b770c919121526ebbf61b81fd6b832619319df60)

 net/Makefile                             |  1 +
 net/realtek-re-kmod198/Makefile (new)    | 23 +++++++++++++++++++++++
 net/realtek-re-kmod198/distinfo (new)    |  3 +++
 net/realtek-re-kmod198/pkg-descr (new)   | 25 +++++++++++++++++++++++++
 net/realtek-re-kmod198/pkg-message (new) | 22 ++++++++++++++++++++++
 5 files changed, 74 insertions(+)
Comment 37 Koichiro Iwao freebsd_committer freebsd_triage 2024-03-14 02:10:30 UTC
(In reply to Alex Dupre from comment #30)
Thanks, I have added the port. 

Guys, the temporary workaround until the complete resolution is to use net/realtek-re-kmod198 instead.
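
For anyone applying this workaround from packages, the switch might look roughly like the sketch below. It assumes the currently installed package is named realtek-re-kmod and installs the new port by its origin, net/realtek-re-kmod198, so that no package name has to be guessed. The function name and the PKG override are hypothetical.

```shell
# Hypothetical helper: replace the 199 package with the 198 port.
# Set PKG=echo to print the pkg arguments instead of running them.
switch_to_re198() {
    "${PKG:-pkg}" delete -y realtek-re-kmod &&
    "${PKG:-pkg}" install -y net/realtek-re-kmod198
}

# On the affected machine (as root), then reboot or reload if_re:
#   switch_to_re198
```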
Comment 38 Ott Köstner 2024-03-20 17:21:59 UTC
The temporary workaround with net/realtek-re-kmod198 works in my case. That confirms the hardware is OK and this is a driver bug.

Still waiting for the new driver net/realtek-re-kmod to be fixed.