Bug 211926

Summary: svn rev 303171 breaks Layer 2 with IPv6 on the freebsd.org cluster
Product: Base System
Reporter: Peter Wemm <peter>
Component: kern
Assignee: Mike Karels <karels>
Status: Closed DUPLICATE
Severity: Affects Some People
CC: delphij, des, emaste, gnn, karels, peter, re
Priority: ---
Keywords: needs-qa, patch, regression
Version: CURRENT
Flags: koobs: mfc-stable11?
Hardware: Any
OS: Any
See Also: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211872
Attachments:
Description: Proposed patch
Flags: none

Description Peter Wemm freebsd_committer freebsd_triage 2016-08-17 07:58:45 UTC
After rev 303171 we are seeing multiple network stack problems.

The most urgent is that Layer 2 addressing is broken: packets are being sent to the wrong MAC address.

In the paste below,
0c:c4:7a:49:48:70 = halo
00:25:90:30:d7:48 = ns1
00:00:5e:00:01:64 = default gateway

04:18:06.156769 0c:c4:7a:49:48:70 > 00:25:90:30:d7:48, ethertype IPv6 (0x86dd), length 102: halo.40045 > ns1.domain: 26984+% [1au] DS? freebsd.org. (40)
04:18:06.156942 00:25:90:30:d7:48 > 00:00:5e:00:01:64, ethertype IPv6 (0x86dd), length 313: ns1.domain > halo.40045: 26984 2/0/1 DS, RRSIG (251)

You can see the reply is being incorrectly sent to the default gateway.  We have confirmed with tcpdump that the gateway actually is receiving the packets and it isn't a display error.
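
The check described above can be sketched in a few lines of Python. This is only an illustration, not a tool used in the report: the function name `check_frame` and the regular expression are assumptions, and the pattern only understands `tcpdump -e` lines shaped like the two in the paste. It compares the destination MAC of each frame with the MAC listed for the destination host.

```python
import re

# MAC addresses as listed in the report above.
HOST_TO_MAC = {
    "halo": "0c:c4:7a:49:48:70",
    "ns1": "00:25:90:30:d7:48",
}
GATEWAY_MAC = "00:00:5e:00:01:64"

# Start of a `tcpdump -e` line: time, source MAC > destination MAC,
# link-level details, then "src.port > dst.port:".
LINE_RE = re.compile(
    r"^(?P<time>\S+) (?P<smac>[0-9a-f:]{17}) > (?P<dmac>[0-9a-f:]{17}), "
    r".*?: (?P<src>[\w-]+)\.(?P<sport>\w+) > (?P<dst>[\w-]+)\.(?P<dport>\w+): "
)


def check_frame(line):
    """Return None if the frame looks fine, else a short description."""
    m = LINE_RE.match(line)
    if m is None:
        return None  # not a line this sketch understands
    expected = HOST_TO_MAC.get(m["dst"])
    if expected is None or m["dmac"] == expected:
        return None
    where = "the default gateway" if m["dmac"] == GATEWAY_MAC else m["dmac"]
    return f"{m['time']}: frame for {m['dst']} sent to {where}, expected {expected}"


# The two lines from the paste above.
QUERY = ("04:18:06.156769 0c:c4:7a:49:48:70 > 00:25:90:30:d7:48, "
         "ethertype IPv6 (0x86dd), length 102: halo.40045 > ns1.domain: "
         "26984+% [1au] DS? freebsd.org. (40)")
REPLY = ("04:18:06.156942 00:25:90:30:d7:48 > 00:00:5e:00:01:64, "
         "ethertype IPv6 (0x86dd), length 313: ns1.domain > halo.40045: "
         "26984 2/0/1 DS, RRSIG (251)")

for line in (QUERY, REPLY):
    problem = check_frame(line)
    if problem:
        print(problem)
```

Only the reply is flagged: the query carries ns1's MAC as its destination, while the reply addressed to halo carries the gateway's MAC.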

On the broken machine, halo is known to ndp:
# ndp -n halo
Neighbor                             Linklayer Address  Netif Expire    S Flags
2610:1c1:1:6002::16:12               0c:c4:7a:49:48:70    em0 23h59m45s S 

And the default gateway is also present in ndp:
# ndp -an
...
fe80::1%em0                          00:00:5e:00:01:64    em0 23h59m57s S R
...

A 'route get' shows the correct answer on the affected machine.
# route -n get -inet6 halo
   route to: 2610:1c1:1:6002::16:12
destination: 2610:1c1:1:6002::
       mask: ffff:ffff:ffff:ffff::
  interface: em0
      flags: <UP,DONE>
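
As a rough cross-check of the output above, the destination and mask can be turned into a prefix and tested against halo's address. This is a minimal sketch using Python's standard `ipaddress` module; the variable names are illustrative, and it says nothing about how the kernel performs the lookup.

```python
import ipaddress

# Values copied from the `route -n get` output above.
destination = ipaddress.IPv6Address("2610:1c1:1:6002::")
mask = ipaddress.IPv6Address("ffff:ffff:ffff:ffff::")
halo = ipaddress.IPv6Address("2610:1c1:1:6002::16:12")

# A contiguous netmask has as many one bits as the prefix length.
prefix_len = bin(int(mask)).count("1")
on_link = ipaddress.IPv6Network(f"{destination}/{prefix_len}")

# True means halo falls inside the interface route, so no gateway is needed.
halo_is_on_link = halo in on_link
print(on_link, halo_is_on_link)
```

The route covers halo's address and lists no gateway, which is why the reply should have gone directly to halo's MAC.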

When the packets arrive at the router, it (understandably) refuses to route them back out the same interface they arrived on. Reverting 303171 locally restores correct behavior: both machines can again communicate directly on the same Ethernet segment.

I am aware that 303171 has been MFC'ed to stable/11.  It appeared to work when we temporarily tried stable/11, but I cannot explain why.  (These are redundant paired machines; one runs 11, the other runs 12.)

The machine does have jails.  ns1 is a dual-stack jail.  The host runs an em0 interface, but we have also seen the problem on igb, bce, bge and vlan interfaces.

Jail addresses:
        127.0.1.8
        96.47.72.14
        2610:1c1:1:6002::100

Interface addresses:
	inet 96.47.72.4 netmask 0xffffffe0 broadcast 96.47.72.31 
	inet 96.47.72.21 netmask 0xffffffff broadcast 96.47.72.21 
	inet 96.47.72.14 netmask 0xffffffff broadcast 96.47.72.14 
	inet6 2610:1c1:1:6002::1004 prefixlen 64 
	inet6 2610:1c1:1:6002::7b:1 prefixlen 128 
	inet6 2610:1c1:1:6002::100 prefixlen 128
Comment 1 Mark Linimon freebsd_committer freebsd_triage 2016-08-17 16:32:44 UTC
Mike, this seems to have been via one of your commits?
Comment 2 Peter Wemm freebsd_committer freebsd_triage 2016-08-17 18:15:53 UTC
Argh, I appear to have mixed up a test last night.

I can confirm that it *is* broken on stable/11 now as well.

From stable/11 r304269:
18:11:56.768302 0c:c4:7a:49:48:70 > 00:25:90:30:da:0e, ethertype IPv6 (0x86dd), length 102: halo.33215 > ns2.domain: 46162+% [1au] TXT? freebsd.org. (40)
18:11:56.768432 00:25:90:30:da:0e > 00:00:5e:00:01:64, ethertype IPv6 (0x86dd), length 833: ns2.domain > halo.33215: 46162$ 2/4/1 TXT "v=spf1 redirect=_spf.freebsd.org", RRSIG (771)

Replies are going to the default gateway rather than the machine on the local network.

The behavior is now the same in stable/11 as in head after r303171.  Of note, it has been merged to releng/11.0 as well. I'm going to try a local backout of r303698 to get the freebsd.org cluster working again.
Comment 3 Peter Wemm freebsd_committer freebsd_triage 2016-08-17 22:01:20 UTC
A backout of r304086 on stable/11 and releng/11.0 fixes the problem of packets going to the wrong MAC address.
Comment 4 Peter Wemm freebsd_committer freebsd_triage 2016-08-17 22:34:18 UTC
On a hunch, I changed one of the machines from old-style IPv6 jail / alias configuration to something more modern.

With the following changes:
 ifconfig_em0="inet 96.47.72.5/27 -tso -vlanhwtso"
-ifconfig_em0_ipv6="inet6 2610:1c1:1:6002::1005/64"
+ifconfig_em0_ipv6="inet6 2610:1c1:1:6002::1005/64 prefer_source"
 
-ifconfig_em0_alias0="inet6 2610:01c1:0001:6002::7b:2/128"
+ifconfig_em0_alias0="inet6 2610:01c1:0001:6002::7b:2/64"
 ifconfig_em0_alias1="inet 96.47.72.22/32"
...
 jail_ns2_hostname="ns2.nyi.freebsd.org"
-jail_ns2_ip="lo1|127.0.1.9,96.47.72.15,2610:01c1:0001:6002::200"
+jail_ns2_ip="lo1|127.0.1.9,96.47.72.15,2610:01c1:0001:6002::200/64"

The problem no longer manifests.  The test scenario in which I was seeing packets go to the default gateway was traffic between:

2610:1c1:1:6002::16:12 <-> 2610:01c1:0001:6002::200, both in the same /64.

Note that the jails are *still* using /32 aliases for IPv4, and that still works as expected.  The problem was with the IPv6 equivalent, /128.
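
To make the difference between the two alias forms concrete, here is a small sketch with Python's standard `ipaddress` module. It only shows which addresses each prefix length covers; it is not a model of the kernel's route or L2 cache handling, and the variable names are illustrative.

```python
import ipaddress

halo = ipaddress.IPv6Address("2610:1c1:1:6002::16:12")

# Jail address as written in rc.conf before and after the change.
old_alias = ipaddress.IPv6Interface("2610:01c1:0001:6002::200/128")
new_alias = ipaddress.IPv6Interface("2610:01c1:0001:6002::200/64")

# The zero-padded spelling is the same address that ifconfig prints.
assert old_alias.ip == new_alias.ip == ipaddress.IPv6Address("2610:1c1:1:6002::200")

# A /128 covers only the alias itself; a /64 covers the whole segment.
covered_by_old = halo in old_alias.network
covered_by_new = halo in new_alias.network
print(old_alias.network, covered_by_old)
print(new_alias.network, covered_by_new)
```

With the /128 alias, halo is on-link only through the prefix of the primary /64 address; with the /64 alias, the jail address carries the on-link prefix itself.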
Comment 5 Peter Wemm freebsd_committer freebsd_triage 2016-08-17 22:38:47 UTC
I should have pasted the actual configuration.

$ ifconfig | grep inet
	inet 96.47.72.5 netmask 0xffffffe0 broadcast 96.47.72.31 
	inet 96.47.72.22 netmask 0xffffffff broadcast 96.47.72.22 
	inet 96.47.72.15 netmask 0xffffffff broadcast 96.47.72.15 
	inet6 2610:1c1:1:6002::1005 prefixlen 64 prefer_source 
	inet6 2610:1c1:1:6002::7b:2 prefixlen 64 
	inet6 2610:1c1:1:6002::200 prefixlen 64 

The addresses tested:
96.47.72.18 <-> 96.47.72.15 (jail alias)
2610:1c1:1:6002::16:12 <-> 2610:1c1:1:6002::200 (jail alias)
Comment 6 Mike Karels freebsd_committer freebsd_triage 2016-08-18 00:50:00 UTC
(In reply to Mark Linimon from comment #1)

Yes, and I'm looking at this.
Comment 7 Mike Karels freebsd_committer freebsd_triage 2016-08-18 08:01:03 UTC
Created attachment 173815 [details]
Proposed patch
Comment 8 Mike Karels freebsd_committer freebsd_triage 2016-08-18 08:02:02 UTC
This appears to be a dup of 211872.

I'm attaching a proposed patch that Peter Wemm is testing, with good initial results.
Comment 9 Glen Barber freebsd_committer freebsd_triage 2016-08-18 18:48:28 UTC
I'll follow up with peter@ internally, but as this update is only 10 hours old, I wanted to check with you on the status.
Comment 10 Dag-Erling Smørgrav freebsd_committer freebsd_triage 2016-08-19 07:19:53 UTC
The casts in the patch are unnecessary (and a style(9) violation).
Comment 11 Kubilay Kocak freebsd_committer freebsd_triage 2016-08-19 13:36:43 UTC
Annotate / bring up to date.

@Mike if/when you're confident this issue and bug 211872 are duplicates, please close one as a duplicate (using 'Mark as Duplicate').

Ideally close the newer as the dupe, but failing that the one with the most context/activity/content.
Comment 12 Mike Karels freebsd_committer freebsd_triage 2016-08-19 23:52:28 UTC
(In reply to Dag-Erling Smørgrav from comment #10)
Thanks, I'll remember that.  This is actually fairly old code.  I don't think this patch is going in now; a simpler one (TBD) should go in instead.
Comment 13 commit-hook freebsd_committer freebsd_triage 2016-08-20 20:47:17 UTC
A commit references this bug:

Author: karels
Date: Sat Aug 20 20:46:54 UTC 2016
New revision: 304545
URL: https://svnweb.freebsd.org/changeset/base/304545

Log:
  Disable L2 caching for UDP over IPv6

  The ip6_output routine is missing L2 cache invalidation as done
  in ip_output.  Even with that code, some problems with UDP over
  IPv6 have been reported.  Disabling L2 cache for that problem works
  around the problem for now.

  PR:		211872 211926
  Reviewed by:	gnn
  Approved by:	gnn (mentor)
  MFC after:	immediate

Changes:
  head/sys/netinet6/udp6_usrreq.c
Comment 14 commit-hook freebsd_committer freebsd_triage 2016-08-20 20:57:22 UTC
A commit references this bug:

Author: karels
Date: Sat Aug 20 20:56:37 UTC 2016
New revision: 304546
URL: https://svnweb.freebsd.org/changeset/base/304546

Log:
  MFC r304545: Disable L2 caching for UDP over IPv6

  The ip6_output routine is missing L2 cache invalidation as done
  in ip_output.  Even with that code, some problems with UDP over
  IPv6 have been reported.  Disabling L2 cache for that problem works
  around the problem for now.

  PR:		211872 211926
  Reviewed by:	gnn
  Approved by:	gnn (mentor)
  Tested by:	peter@, Mike Andrews
  MFC after:	immediate

Changes:
_U  stable/11/
  stable/11/sys/netinet6/udp6_usrreq.c
Comment 15 Mike Karels freebsd_committer freebsd_triage 2016-08-21 00:45:11 UTC
(In reply to Kubilay Kocak from comment #11)
I am reasonably confident that both bugs are essentially the same.  There is a fair amount of history on both.  My inclination is to mark this bug as a dup of 211872 if no one objects.
Comment 16 commit-hook freebsd_committer freebsd_triage 2016-08-22 22:30:42 UTC
A commit references this bug:

Author: karels
Date: Mon Aug 22 22:29:57 UTC 2016
New revision: 304642
URL: https://svnweb.freebsd.org/changeset/base/304642

Log:
  MFC r304546: Disable L2 caching for UDP over IPv6

  The ip6_output routine is missing L2 cache invalidation as done
  in ip_output.  Even with that code, some problems with UDP over
  IPv6 have been reported.  Disabling L2 cache for that problem works
  around the problem for now.

  PR:             211872 211926
  Reviewed by:    gnn
  Approved by:    gnn (mentor)
  Approved by:    re (gjb)
  Tested by:      peter@, Mike Andrews

Changes:
_U  releng/11.0/
  releng/11.0/sys/netinet6/udp6_usrreq.c
Comment 17 Mike Karels freebsd_committer freebsd_triage 2016-08-22 22:33:04 UTC
Marking as duplicate of 211872, which was filed earlier.

*** This bug has been marked as a duplicate of bug 211872 ***