Bug 253469 - realtek-re-kmod MC filter problem
Summary: realtek-re-kmod MC filter problem
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: 13.0-STABLE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-net (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-02-12 20:53 UTC by courtney.hicks1
Modified: 2021-02-22 20:03 UTC
CC List: 4 users

See Also:


Description courtney.hicks1 2021-02-12 20:53:45 UTC
I'm not sure where to begin with this issue, but I'm seeing strange IPv6 behavior on FreeBSD 13.0-BETA1. I'm not sure whether it's my driver or FreeBSD 13.0 itself. After a short period of routing properly, IPv6 routing suddenly stops. I compared the state when it works with the state when it doesn't and found what goes missing.

For one, I use rtsol to get my IPv6 address, so I have the rtsold daemon running. Once IPv6 routing stops working, I can restart the daemon and routing works again, albeit only briefly. Here is the output of netstat -r when IPv6 routing doesn't work:

Internet6:
Destination        Gateway            Flags     Netif Expire
dead:beef:x:x::    link#1             U           re0
fe80::%re0/64      link#1             U           re0

And when it works

Internet6:
Destination        Gateway            Flags     Netif Expire
default            fe80::1:1%re0      UG          re0 
dead:beef:x:x::    link#1             U           re0 
fe80::%re0/64      link#1             U           re0 

I left out my loopback info; I can add it if necessary. For whatever reason (and nothing shows in the logs), the IPv6 default route vanishes. During this time I still have my address and can still ping6 hosts internally.

uname -a
FreeBSD towerDefense 13.0-BETA1 FreeBSD 13.0-BETA1 #8 c48cbd025: Wed Feb 10 10:30:48 PST 2021     root@towerDefense:/usr/obj/usr/src/amd64.amd64/sys/GENERIC  amd64


I also compiled the realtek-re-kmod port to get networking.

Intel Z490 motherboard, Intel 10700k. Here is my ethernet info from pciconf

re0@pci0:2:0:0: class=0x020000 rev=0x04 hdr=0x00 vendor=0x10ec device=0x8125 subvendor=0x1462 subdevice=0x7c75
    vendor     = 'Realtek Semiconductor Co., Ltd.'
    device     = 'RTL8125 2.5GbE Controller'
    class      = network
    subclass   = ethernet
Comment 1 Alexander V. Chernikov freebsd_committer 2021-02-13 11:13:38 UTC
Is there any chance you could run `route -n monitor > log.txt` in the background, stopping it when the default route disappears?
Comment 2 courtney.hicks1 2021-02-13 19:12:48 UTC
Here is what I got. I ran service rtsold restart and then waited until I lost my route. Here is the output of ping6 as the route got deleted:

16 bytes from 2607:f8b0:400a:805::200e, icmp_seq=49 hlim=116 time=11.189 ms
16 bytes from 2607:f8b0:400a:805::200e, icmp_seq=50 hlim=116 time=12.354 ms
ping6: sendmsg: No route to host
ping6: wrote ipv6.l.google.com 16 chars, ret=-1


And here is the output of route -n monitor, from before restarting rtsold until the route got deleted:

got message of size 248 on Sat Feb 13 11:09:40 2021
RTM_ADD: Add Route: len 248, pid: 0, seq 0, errno 0, flags:<UP,GATEWAY,DONE>
locks:  inits: 
sockaddrs: <DST,GATEWAY,NETMASK>
 :: fe80::1:1%re0 ::

got message of size 248 on Sat Feb 13 11:10:41 2021
RTM_DELETE: Delete Route: len 248, pid: 0, seq 0, errno 0, flags:<GATEWAY,DONE>
locks:  inits: 
sockaddrs: <DST,GATEWAY,NETMASK>
 :: fe80::1:1%re0 ::
Comment 3 courtney.hicks1 2021-02-13 19:15:35 UTC
I forgot to note: I found that if I add the route manually, it does not go away:

sudo route -6 add default fe80::1:1%re0

So is rtsol removing the routes? Or is something upstream telling the routes to be removed? Everything in my environment is constant; the only variables are a hardware change from Intel to Realtek (new motherboard/processor) and that I'm now using FreeBSD 13.0 instead of my prior 12.2.
Comment 4 Alexander V. Chernikov freebsd_committer 2021-02-13 19:50:13 UTC
Okay, so it's actually not userland that adds/removes the route, as the "PID" value of the rtsock messages is 0.

What is the output of "ndp -r" closer to the route expiration? How does "ndp -p" look?
Also: does the router (or routers) on your network send RA messages periodically, or do you have to rely explicitly on rtsol?
Comment 5 courtney.hicks1 2021-02-13 20:05:37 UTC
I believe my router sends out RA messages periodically. I have just been using rtsol to see what changes. I have these in my rc.conf. Let me verify some of these things on my other 12.2 box so I can give better info about RAs. I'm using pfSense, and to my understanding it sends RAs.

ifconfig_re0_ipv6="inet6 accept_rtadv"
ipv6_activate_all_interfaces="YES"

Note that the behavior is no different whether or not I have rtsold enabled and running.

I can restart netif and get an IPv6 address without rtsol, so I assume my interface is listening for RAs?

> what is the output of "ndp -r" closer to the route expiration? How does "ndp -p" look?

"ndp -p" before the expiration

dead:beef:x:x::/64 if=re0
flags=LAO vltime=86400, pltime=14400, expire=23h59m30s, ref=1
  advertised by
    fe80::1:1%re0 (reachable)
fe80::%re0/64 if=re0
flags=LAO vltime=infinity, pltime=infinity, expire=Never, ref=1
  No advertising router
fe80::%lo0/64 if=lo0
flags=LAO vltime=infinity, pltime=infinity, expire=Never, ref=1
  No advertising router

"ndp -p" when I lose my route

dead:beef:x:x::/64 if=re0
flags=LAO vltime=86400, pltime=14400, expire=23h58m53s, ref=1
  No advertising router
fe80::%re0/64 if=re0
flags=LAO vltime=infinity, pltime=infinity, expire=Never, ref=1
  No advertising router
fe80::%lo0/64 if=lo0
flags=LAO vltime=infinity, pltime=infinity, expire=Never, ref=1
  No advertising router


"ndp -r" before I lose my route

fe80::1:1%re0 if=re0, flags=, pref=high, expire=32s

and afterwards I get no output
Comment 6 courtney.hicks1 2021-02-13 20:18:39 UTC
I watched my systems more closely. The one that is having issues lets its router information (I can't think of the right word) expire. Watching my 12.2 machine with ndp -r, I see the expiration time refresh. It seems an advertisement is sent out every 20 seconds on my network, and I can confirm that my desktop is receiving those advertisements every 20 seconds too. So for some reason it just isn't applying them?

Also, I have to have rtsold running on my systems.
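Comment 6's observation (RAs arrive every 20 seconds, yet the entry still expires) can be pictured with a small timeline model: every RA the host actually processes resets the default-router timer to the router lifetime, and the route is deleted when the timer runs out. The 60-second lifetime used with this sketch is an assumption; it is merely consistent with the 61 seconds between the RTM_ADD (11:09:40) and RTM_DELETE (11:10:41) messages in Comment 2. `default_route_expiry` is a hypothetical helper, not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model. Given the sorted times (seconds) at which RAs from
 * the router are actually processed, return when the default-router
 * entry first lapses, or -1 if it is still valid at `horizon`.
 * `lifetime` stands in for the router lifetime carried in each RA.
 */
static long
default_route_expiry(const long *ra_times, size_t n, long lifetime,
    long horizon)
{
	long expires;
	size_t i;

	assert(n > 0 && lifetime > 0);
	expires = ra_times[0] + lifetime;
	for (i = 1; i < n; i++) {
		if (ra_times[i] > expires)
			return (expires);	/* lapsed before the next RA */
		expires = ra_times[i] + lifetime;	/* RA refreshes the timer */
	}
	return (expires <= horizon ? expires : -1);
}
```

If every periodic RA is processed, the entry never lapses; if only the solicited RA triggered by restarting rtsold is processed, it lapses one lifetime later, which is the pattern described in the report.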
Comment 7 Alexander V. Chernikov freebsd_committer 2021-02-13 21:03:13 UTC
Okay, so we're receiving RAs but for some reason they are ignored.

Could you consider:
1) sharing netstat -sp icmp6 output?
2) turning on sysctl net.inet6.icmp6.nd6_debug=1 and checking dmesg for any relevant messages?
Comment 8 courtney.hicks1 2021-02-13 22:31:55 UTC
netstat -sp icmp6

icmp6:
        2 calls to icmp6_error
        0 errors not generated in response to an icmp6 message
        0 errors not generated because of rate limitation
        Output histogram:
                unreach: 2
                echo: 99
                router solicitation: 11
                neighbor solicitation: 148
                neighbor advertisement: 205
                MLDv2 listener report: 19
        0 messages with bad code fields
        0 messages < minimum length
        0 bad checksums
        0 messages with bad length
        Input histogram:
                echo reply: 82
                router advertisement: 20
                neighbor solicitation: 205
                neighbor advertisement: 136
        Histogram of error messages to be generated:
                0 no route
                0 administratively prohibited
                0 beyond scope
                0 address unreachable
                2 port unreachable
                0 packet too big
                0 time exceed transit
                0 time exceed reassembly
                0 erroneous header field
                0 unrecognized next header
                0 unrecognized option
                0 redirect
                0 unknown
        0 message responses generated
        0 messages with too many ND options
        0 messages with bad ND options
        0 bad neighbor solicitation messages
        0 bad neighbor advertisement messages
        0 bad router solicitation messages
        0 bad router advertisement messages
        0 bad redirect messages
        0 default routers overflows
        0 prefix overflows
        0 neighbour entries overflows
        0 redirect overflows
        0 messages with invalid hop limit
        0 path MTU changes




And for dmesg I saw this:

nd6_options: unsupported option 24 - option ignored
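Option 24 is the Route Information option (RFC 4191). ND options are type-length-value records whose length is counted in units of 8 octets, so a receiver can step over a type it does not understand and keep parsing the rest of the RA; that is why the kernel logs "option ignored" rather than dropping the packet. The walker below is a minimal sketch of such a loop, not a copy of `nd6_options()`; which types it counts as handled is an assumption for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ND option type codes from RFC 4861 and RFC 4191 (Route Information). */
enum nd_opt_kind {
	OPT_SOURCE_LLADDR = 1,
	OPT_TARGET_LLADDR = 2,
	OPT_PREFIX_INFO = 3,
	OPT_REDIRECTED_HDR = 4,
	OPT_MTU = 5,
	OPT_ROUTE_INFO = 24,
};

/*
 * Walk the options that follow the fixed RA header. Each option is
 * type (1 byte), length in units of 8 octets (1 byte), then data.
 * Returns 0 on success and -1 if the packet must be discarded.
 */
static int
walk_nd_options(const uint8_t *buf, size_t len, int *handled, int *ignored)
{
	size_t off = 0;

	assert(handled != NULL && ignored != NULL);
	*handled = 0;
	*ignored = 0;
	while (off < len) {
		size_t olen;

		if (len - off < 2)
			return (-1);		/* truncated option header */
		olen = (size_t)buf[off + 1] * 8;
		/* Zero length is invalid (RFC 4861 4.6); overrun means truncation. */
		if (olen == 0 || olen > len - off)
			return (-1);
		switch (buf[off]) {
		case OPT_SOURCE_LLADDR:
		case OPT_TARGET_LLADDR:
		case OPT_PREFIX_INFO:
		case OPT_REDIRECTED_HDR:
		case OPT_MTU:
			(*handled)++;
			break;
		default:
			/* e.g. OPT_ROUTE_INFO (24): skip it and keep parsing */
			(*ignored)++;
			break;
		}
		off += olen;
	}
	return (0);
}
```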
Comment 9 Alexander V. Chernikov freebsd_committer 2021-02-14 13:32:16 UTC
Good.
So from ICMP6 input histogram, it can be seen we received these RA messages.
Do you see multiple "nd6_options: unsupported option 24 - option ignored" messages?

As far as I understand, RA messages are received by nd6_ra_input() and processed partway (nd6_options() is called and likely does not fail).
The next point where processing can fail is if forwarding is enabled (net.inet6.ip6.forwarding).
Do you have it turned off (or have net.inet6.ip6.rfc6204w3 turned on)?

If forwarding is turned off, do you mind sharing the output of
"ndp -i re0" and showing an incoming RA message?

e.g. "tcpdump -i re0 -lnpvvs0 icmp6"
Comment 10 Bjoern A. Zeeb freebsd_committer 2021-02-14 17:06:36 UTC
(In reply to courtney.hicks1 from comment #0)

To me this sounds like unsolicited RAs (as sent out periodically by your router) don't make it through, while solicited ones (rtsol "asking") work.

That may imply that filters somewhere are not working properly or that multicast is not being received properly.

If you do see the unsolicited RAs from your router in
   tcpdump -ln -s0 -i re0 -vvv icmp6
then can you do the following:

(0) confirm no firewall active on your local system?
(a) wait for the default route to go away
(b) do not restart anything!
(c) start tcpdump as per above
(d) when you see an unsolicited RA coming back in, check (in a 2nd terminal) if your default route is back or not.

(d1) if it is not, please report back
(d2) if it is back, stop tcpdump and run ifconfig re0 promisc; keep watching whether your default route goes away again during the next 25 minutes and report back.

Otherwise, or in addition, sysctl net.inet6.icmp6.nd6_debug=0xff may also help Alexander debug this further.
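Bjoern's hypothesis can be modelled as a NIC receive filter. Promiscuous mode accepts every frame; otherwise a multicast frame is delivered only if multicast reception is enabled and the bit selected by the address's hash is set. The hash here (top 6 bits of the big-endian Ethernet CRC-32, selecting one of 64 bits) is modelled on the `re_hash_maddr()` callback in FreeBSD's in-tree `if_re.c`; `struct rx_filter` and its `accept_multi` flag are simplified, hypothetical stand-ins for the chip's registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Big-endian Ethernet CRC-32, bit-for-bit equivalent to ether_crc32_be(). */
static uint32_t
crc32_be(const uint8_t *buf, size_t len)
{
	uint32_t crc = 0xffffffffU;

	for (size_t i = 0; i < len; i++) {
		uint8_t data = buf[i];

		for (int bit = 0; bit < 8; bit++, data >>= 1) {
			uint32_t carry = (crc >> 31) ^ (data & 1U);

			crc <<= 1;
			if (carry)
				crc ^= 0x04c11db7U;
		}
	}
	return (crc);
}

/* Top 6 bits of the CRC select one of 64 hash-table bits. */
static unsigned int
mcast_hash_index(const uint8_t mac[6])
{
	return (crc32_be(mac, 6) >> 26);
}

/* Hypothetical, simplified receive-filter state. */
struct rx_filter {
	bool promisc;		/* like "ifconfig re0 promisc" */
	bool accept_multi;	/* stand-in for the multicast-enable bit */
	uint32_t hashes[2];	/* 64-bit multicast hash table */
};

/* Program one multicast MAC address into the hash table. */
static void
rx_filter_add(struct rx_filter *f, const uint8_t mac[6])
{
	unsigned int h = mcast_hash_index(mac);

	f->hashes[h / 32] |= 1U << (h % 32);
}

/* Decide whether a frame sent to multicast address `dst` is delivered. */
static bool
rx_accepts(const struct rx_filter *f, const uint8_t dst[6])
{
	unsigned int h;

	assert(dst[0] & 1);	/* only multicast destinations are modelled */
	if (f->promisc)
		return (true);
	if (!f->accept_multi)
		return (false);
	h = mcast_hash_index(dst);
	return ((f->hashes[h / 32] >> (h % 32)) & 1U);
}
```

In this model a misprogrammed filter drops frames for the all-nodes group 33:33:00:00:00:01 (unsolicited RAs are sent to ff02::1), while promiscuous mode masks the problem, which matches what Comment 11 later reports.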
Comment 11 courtney.hicks1 2021-02-16 03:34:10 UTC
I only see the nd6_options message twice. It seems to appear around the completion of DAD for my re0 link-local and autoconfigured IPv6 addresses. Sorry if some of my terminology is poor; my IPv6 knowledge has gotten rusty.

net.inet6.ip6.forwarding = 0
net.inet6.ip6.rfc6204w3 = 0

The output of ndp -i re0:

linkmtu=1500, maxmtu=1500, curhlim=64, basereachable=30s0ms, reachable=18s, retrans=1s0ms
Flags: nud accept_rtadv auto_linklocal

For the tcpdump, I see nothing showing up. I have tried with my firewall both on and off. With promiscuous mode enabled I am getting router solicitation packets. What's also interesting is that when I enable promiscuous mode, the RAs work and I don't lose my routes. If I then turn promiscuous mode off, I lose my route again.

I see there are nd6 updates available in the releng/13.0 branch, so I'm going to create a new boot environment and see whether I still have these issues after updating. I'll keep my existing boot environment with the issues. I'm getting src updates via gitup. For future reference:

# Have: c48cbd0254dedd363ab569692ddf3395b6214412
# Want: 1e76911d62ed4b66bc21cfc22101ef6b20cd6630

I'll update this bug report after things compile and such.
Comment 12 courtney.hicks1 2021-02-16 05:46:01 UTC
I just updated to the latest releng/13.0. I did see nd6 updates, but no dice on fixing the issue:

FreeBSD towerDefense 13.0-BETA2 FreeBSD 13.0-BETA2 #9 1e76911d6: Mon Feb 15 20:52:00 PST 2021     root@towerDefense:/usr/obj/usr/src/amd64.amd64/sys/GENERIC  amd64
Comment 13 Alexander V. Chernikov freebsd_committer 2021-02-16 19:45:14 UTC
It looks like Bjoern's diagnosis was right.

So far it looks like a potential problem with programming multicast groups in the driver.

Could you consider sharing `ifmcstat` output?
Is there any chance you could try it with a different NIC?
Comment 14 courtney.hicks1 2021-02-16 21:10:32 UTC
Poo, that sucks. I could probably take the extra card out of my server tonight. It's an Intel 82580.

Here is the ifmcstat for now

re0:
        inet 192.168.10.201
        igmpv3 rv 2 qi 125 qri 10 uri 3
                group 224.0.0.1 mode exclude
                        mcast-macaddr 01:00:5e:00:00:01
        inet6 fe80::2ef0:5dff:fecc:4ed7%re0 scopeid 0x1
        mldv2 flags=2<USEALLOW> rv 2 qi 125 qri 10 uri 3
                group ff01::1%re0 scopeid 0x1 mode exclude
                        mcast-macaddr 33:33:00:00:00:01
                group ff02::2:2c1c:8e10%re0 scopeid 0x1 mode exclude
                        mcast-macaddr 33:33:2c:1c:8e:10
                group ff02::2:ff2c:1c8e%re0 scopeid 0x1 mode exclude
                        mcast-macaddr 33:33:ff:2c:1c:8e
                group ff02::1%re0 scopeid 0x1 mode exclude
                        mcast-macaddr 33:33:00:00:00:01
                group ff02::1:ffcc:4ed7%re0 scopeid 0x1 mode exclude
                        mcast-macaddr 33:33:ff:cc:4e:d7
lo0:
        inet 127.0.0.1
        igmpv3 rv 2 qi 125 qri 10 uri 3
                group 224.0.0.1 mode exclude
        inet6 fe80::1%lo0 scopeid 0x2
        mldv2 flags=2<USEALLOW> rv 2 qi 125 qri 10 uri 3
                group ff01::1%lo0 scopeid 0x2 mode exclude
                group ff02::2:2c1c:8e10%lo0 scopeid 0x2 mode exclude
                group ff02::2:ff2c:1c8e%lo0 scopeid 0x2 mode exclude
                group ff02::1%lo0 scopeid 0x2 mode exclude
                group ff02::1:ff00:1%lo0 scopeid 0x2 mode exclude
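The `mcast-macaddr` values in the re0 listing follow mechanically from the IPv6 groups, so the kernel's membership list itself looks normal. The sketch below reproduces two standard rules: the solicited-node group ff02::1:ffXX:XXXX built from the low 24 bits of a unicast address (RFC 4291, section 2.7.1), and the Ethernet mapping of 33:33 followed by the low 32 bits of the group (RFC 2464, section 7). The function names are illustrative.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* RFC 2464: 33:33 followed by the low 32 bits of the IPv6 group. */
static void
ipv6_mcast_to_mac(const struct in6_addr *grp, uint8_t mac[6])
{
	mac[0] = 0x33;
	mac[1] = 0x33;
	memcpy(&mac[2], &grp->s6_addr[12], 4);
}

/* RFC 4291: ff02::1:ff00:0/104 plus the low 24 bits of the unicast address. */
static void
solicited_node(const struct in6_addr *ucast, struct in6_addr *out)
{
	memset(out, 0, sizeof(*out));
	out->s6_addr[0] = 0xff;
	out->s6_addr[1] = 0x02;
	out->s6_addr[11] = 0x01;
	out->s6_addr[12] = 0xff;
	memcpy(&out->s6_addr[13], &ucast->s6_addr[13], 3);
}

/* Format a MAC address the way ifmcstat prints it. */
static void
format_mac(const uint8_t mac[6], char out[18])
{
	snprintf(out, 18, "%02x:%02x:%02x:%02x:%02x:%02x",
	    mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
}
```

Applied to fe80::2ef0:5dff:fecc:4ed7 this yields ff02::1:ffcc:4ed7 and 33:33:ff:cc:4e:d7, matching the re0 output above.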
Comment 15 courtney.hicks1 2021-02-19 00:47:35 UTC
I can confirm this is an issue with the driver. I popped in my PCIe NIC and my problem is gone:

igb3@pci0:3:0:3:        class=0x020000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x150e subvendor=0x8086 subdevice=0x12a2
    vendor     = 'Intel Corporation'
    device     = '82580 Gigabit Network Connection'
    class      = network
    subclass   = ethernet


ifmcstat:

igb3:
        inet 192.168.10.204
        igmpv3 rv 2 qi 125 qri 10 uri 3
                group 224.0.0.1 mode exclude
                        mcast-macaddr 01:00:5e:00:00:01
        inet6 fe80::21b:21ff:fed7:dbab%igb3 scopeid 0x4
        mldv2 flags=2<USEALLOW> rv 2 qi 125 qri 10 uri 3
                group ff01::1%igb3 scopeid 0x4 mode exclude
                        mcast-macaddr 33:33:00:00:00:01
                group ff02::2:2c1c:8e10%igb3 scopeid 0x4 mode exclude
                        mcast-macaddr 33:33:2c:1c:8e:10
                group ff02::2:ff2c:1c8e%igb3 scopeid 0x4 mode exclude
                        mcast-macaddr 33:33:ff:2c:1c:8e
                group ff02::1%igb3 scopeid 0x4 mode exclude
                        mcast-macaddr 33:33:00:00:00:01
                group ff02::1:ffd7:dbab%igb3 scopeid 0x4 mode exclude
                        mcast-macaddr 33:33:ff:d7:db:ab



Not sure if there is any more info you need, or what to do about a driver issue in ports. I suppose I could open an issue with the port, but it seems more like something simply pulled down from upstream? I know FreeBSD already has a Realtek driver in base. Are there any plans to update it?
Comment 16 Andrey V. Elsukov freebsd_committer 2021-02-19 10:02:35 UTC
jmg@ has also reported a very similar problem on net@. John-Mark, can you share what hardware you use?
Comment 17 Konstantin Belousov freebsd_committer 2021-02-21 10:27:22 UTC
Try this. The patch should be applied on top of all patches in the realtek driver port.

It might be easier to apply it by hand.

diff --git a/if_re.c b/if_re.c
index 47466f9..d8f0176 100644
--- a/if_re.c
+++ b/if_re.c
@@ -8663,7 +8663,7 @@ struct re_softc		*sc;
 
         /* now program new ones */
 #if OS_VER >= VERSION(13,0)
-	if_foreach_llmaddr(ifp, re_hash_maddr, hashes);
+	mcnt = if_foreach_llmaddr(ifp, re_hash_maddr, hashes);
 #else
 #if OS_VER >= VERSION(12,0)
 	if_maddr_rlock(ifp);
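Why a one-line change can matter: FreeBSD 13's `if_foreach_llmaddr()` calls the callback for each link-level multicast address and returns the sum of the callback's return values, and the patch stores that count in `mcnt`. Without the assignment, `mcnt` keeps whatever value it was initialised to. The report does not show the code after the hunk, so this sketch assumes, hypothetically, that the driver enables the chip's multicast-accept bit only when `mcnt > 0`. `foreach_maddr`, `hash_maddr_stub`, and `RX_MULTI` are stand-ins, and the placeholder hash is not the chip's real CRC hash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RX_MULTI 0x1U	/* hypothetical "accept multicast" bit */

/* Callback shape modelled on FreeBSD 13's iflladdr_cb_t. */
typedef unsigned int (*maddr_cb_t)(void *arg, const uint8_t *mac,
    unsigned int cnt);

/* Stand-in for if_foreach_llmaddr(): returns the sum of callback results. */
static unsigned int
foreach_maddr(const uint8_t (*addrs)[6], size_t n, maddr_cb_t cb, void *arg)
{
	unsigned int cnt = 0;

	for (size_t i = 0; i < n; i++)
		cnt += cb(arg, addrs[i], cnt);
	return (cnt);
}

/* Placeholder hash (NOT the chip's); returns 1 to count the address. */
static unsigned int
hash_maddr_stub(void *arg, const uint8_t *mac, unsigned int cnt)
{
	uint64_t *hash = arg;

	(void)cnt;
	*hash |= UINT64_C(1) << (mac[5] & 63);
	return (1);
}

/* Before the patch: the count is discarded, so mcnt stays 0. */
static uint32_t
set_rxmode_original(const uint8_t (*addrs)[6], size_t n, uint64_t *hash)
{
	unsigned int mcnt = 0;

	*hash = 0;
	foreach_maddr(addrs, n, hash_maddr_stub, hash);
	return (mcnt > 0 ? RX_MULTI : 0);
}

/* After the patch: mcnt reflects the number of programmed groups. */
static uint32_t
set_rxmode_patched(const uint8_t (*addrs)[6], size_t n, uint64_t *hash)
{
	unsigned int mcnt;

	*hash = 0;
	mcnt = foreach_maddr(addrs, n, hash_maddr_stub, hash);
	return (mcnt > 0 ? RX_MULTI : 0);
}
```

Under this assumption the original code fills the hash table but never turns on multicast reception, so only promiscuous mode lets the RAs through.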
Comment 18 John-Mark Gurney freebsd_committer 2021-02-22 20:03:37 UTC
Looks like it was bge. It may affect ure as well, but the testing I was doing for the thread was with bge:
https://docs.freebsd.org/cgi/mid.cgi?20210112213707.GP31099@funkthat.com

and later I was able to reproduce it with epair as well:
https://docs.freebsd.org/cgi/mid.cgi?20210114193429.GT31099@funkthat.com