Bug 264115 - net/wireguard-kmod: Panics ARM64 (RockPro64) on FreeBSD 14-CURRENT (716fd348e01): panic: vm_fault failed: 0 error 1
Summary: net/wireguard-kmod: Panics ARM64 (RockPro64) on FreeBSD 14-CURRENT (716fd348e...
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: arm64 Any
Importance: --- Affects Only Me
Assignee: Kyle Evans
URL: https://reviews.freebsd.org/D44283
Keywords: crash, needs-qa
Depends on:
Blocks:
 
Reported: 2022-05-21 06:27 UTC by Poul-Henning Kamp
Modified: 2024-03-28 12:32 UTC
CC List: 12 users

See Also:


Attachments
git-diff against src (1.73 KB, patch)
2024-03-08 20:05 UTC, Kyle Evans
Proposed EN template (4.15 KB, text/plain)
2024-03-22 19:05 UTC, Kyle Evans

Description Poul-Henning Kamp freebsd_committer freebsd_triage 2022-05-21 06:27:31 UTC
On main-n255696-716fd348e01: Thu May 19 10:14:47 UTC 2022

        Fatal data abort:
          x0:                0
          x1:                6
          x2:                0
          x3:                0
          x4:                0
          x5: ffffa000df794000
          x6:               ff
          x7: ffffa00000c2e100
          x8: ffffa000b79305a8
          x9:                0
         x10:            20000
         x11:                0
         x12:                0
         x13:                0
         x14: ffffa00000ce3700
         x15:                0
         x16: ffff0000e5abf438 (_DYNAMIC + 490)
         x17: ffff000000589a44 (sosend + 0)
         x18: ffff0000e6459490 (ratelimit_v6 + 989e38)
         x19:                0
         x20: ffff0000e6459518 (ratelimit_v6 + 989ec0)
         x21:                0
         x22:                0
         x23:               14
         x24:               40
         x25: ffff0000e6459538 (ratelimit_v6 + 989ee0)
         x26:                0
         x27:                0
         x28: ffffa0009a9c713c
         x29: ffff0000e6459490 (ratelimit_v6 + 989e38)
          sp: ffff0000e6459490
          lr: ffff000000657974 (fib4_lookup + 40)
         elr:                0
        spsr:         60000045
         far:                0
         esr:         86000004
        panic: vm_fault failed: 0 error 1
        cpuid = 4
        time = 1653111862
        KDB: stack backtrace:
        db_trace_self() at db_trace_self
        db_trace_self_wrapper() at db_trace_self_wrapper+0x30
        vpanic() at vpanic+0x174
        panic() at panic+0x44
        data_abort() at data_abort+0x2c4
        handle_el1h_sync() at handle_el1h_sync+0x10
        --- exception, esr 0x86000004
        (null)() at 0
        ip_output() at ip_output+0x9a4
        udp_send() at udp_send+0xb5c
        sosend_dgram() at sosend_dgram+0x4a4
        sosend() at sosend+0x2c
        wg_send() at wg_send+0x10c
        wg_deliver_out() at wg_deliver_out+0x17c
        gtaskqueue_run_locked() at gtaskqueue_run_locked+0x17c
        gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0x130
        fork_exit() at fork_exit+0x88
        fork_trampoline() at fork_trampoline+0x14
        KDB: enter: panic
        [ thread pid 0 tid 112253 ]
        Stopped at      kdb_enter+0x40: undefined       f907827f
        db>
Comment 1 Daniel Engberg freebsd_committer freebsd_triage 2022-05-21 07:16:34 UTC
Package or ports?
Comment 2 Bernhard Froehlich freebsd_committer freebsd_triage 2022-05-21 09:20:21 UTC
The FreeBSD version would be useful. Please make sure that the kmod was built for exactly that kernel.
Comment 3 Poul-Henning Kamp freebsd_committer freebsd_triage 2022-05-21 09:33:42 UTC
The FreeBSD version is literally in the first line of this report.

Kernel, userland and net/wireguard all freshly built.
Comment 4 Bernhard Froehlich freebsd_committer freebsd_triage 2022-05-21 10:55:46 UTC
716fd348e01 is from 19 May, but according to bug #264105 wireguard-kmod cannot be built since b46667c, which is from 17 May.

https://freshbsd.org/freebsd/src/commit/b46667c63eb7f126a56e23af1401d98d77b912e8
Comment 5 Poul-Henning Kamp freebsd_committer freebsd_triage 2022-05-21 13:28:43 UTC
I fixed that locally by adding M_WAITOK.
Comment 6 Poul-Henning Kamp freebsd_committer freebsd_triage 2022-05-21 20:32:44 UTC
I forgot to add:  This is very reproducible.
Comment 7 Poul-Henning Kamp freebsd_committer freebsd_triage 2022-05-22 05:38:27 UTC
Got an identical panic last night (this machine collects backups every night):

    Fatal data abort:
      x0:                0
      x1:                6
      x2:                0
      x3:                0
      x4:                0
      x5: ffffa000bd0dc000
      x6:               ff
      x7: ffffa00000c2e100
      x8: ffffa000b79305a8
      x9:                0
     x10:            20000
     x11:                0
     x12:                0
     x13:                0
     x14: ffffa00000ce3700
     x15:                0
     x16: ffff0000e5abf438 (_DYNAMIC + 490)
     x17: ffff000000589a44 (sosend + 0)
     x18: ffff0000e5ef0490 (ratelimit_v6 + 420e38)
     x19:                0
     x20: ffff0000e5ef0518 (ratelimit_v6 + 420ec0)
     x21:                0
     x22:                0
     x23:               14
     x24:               40
     x25: ffff0000e5ef0538 (ratelimit_v6 + 420ee0)
     x26:                0
     x27:                0
     x28: ffffa000bd11533c
     x29: ffff0000e5ef0490 (ratelimit_v6 + 420e38)
      sp: ffff0000e5ef0490
      lr: ffff000000657974 (fib4_lookup + 40)
     elr:                0
    spsr:         60000045
     far:                0
     esr:         86000004
    panic: vm_fault failed: 0 error 1
    cpuid = 4
    time = 1653197725
    KDB: stack backtrace:
    db_trace_self() at db_trace_self
    db_trace_self_wrapper() at db_trace_self_wrapper+0x30
    vpanic() at vpanic+0x174
    panic() at panic+0x44
    data_abort() at data_abort+0x2c4
    handle_el1h_sync() at handle_el1h_sync+0x10
    --- exception, esr 0x86000004
    (null)() at 0
    ip_output() at ip_output+0x9a4
    udp_send() at udp_send+0xb5c
    sosend_dgram() at sosend_dgram+0x4a4
    sosend() at sosend+0x2c
    wg_send() at wg_send+0x10c
    wg_deliver_out() at wg_deliver_out+0x17c
    gtaskqueue_run_locked() at gtaskqueue_run_locked+0x17c
    gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0x130
    fork_exit() at fork_exit+0x88
    fork_trampoline() at fork_trampoline+0x14
    KDB: enter: panic
    [ thread pid 0 tid 100185 ]
    Stopped at      kdb_enter+0x40: undefined       f907827f
    db>
Comment 8 Poul-Henning Kamp freebsd_committer freebsd_triage 2022-05-22 18:51:15 UTC
Slightly different, but same backtrace:

      x0: 29d954f85865cc4b
      x1:                6
      x2:                0
      x3:                0
      x4:                0
      x5: ffffa000122186a0
      x6:               ff
      x7: ffffa00000c2e100
      x8: ffffa0000faeb568
      x9: d549cccb4d4ecc4d
     x10:            20000
     x11:             ff6c
     x12:                0
     x13:                0
     x14:                0
     x15:                0
     x16: ffff0000e5cbf438 (_DYNAMIC + 490)
     x17: ffff00000056b0c0 (sosend + 0)
     x18: ffff0000a5fea470 (_end + a509a470)
     x19:                0
     x20: ffff0000a5fea4f8 (_end + a509a4f8)
     x21:                0
     x22:                0
     x23:               14
     x24:               40
     x25: ffff0000a5fea518 (_end + a509a518)
     x26:                0
     x27: ffffa000a96a843c
     x28: ffff000000beb000 (vfs_smr + 0)
     x29: ffff0000a5fea470 (_end + a509a470)
      sp: ffff0000a5fea470
      lr: ffff000000638f24 (fib4_lookup + 40)
     elr: d549cccb4d4ecc4d
    spsr:         60000045
     far: d549cccb4d4ecc4d
    panic: Unknown kernel exception 22 esr_el1 8a000000
    cpuid = 4
    time = 1653244554
    KDB: stack backtrace:
    db_trace_self() at db_trace_self
    db_trace_self_wrapper() at db_trace_self_wrapper+0x30
    vpanic() at vpanic+0x178
    panic() at panic+0x44
    do_el1h_sync() at do_el1h_sync+0x194
    handle_el1h_sync() at handle_el1h_sync+0x10
    --- exception, esr 0x8a000000
    (null)() at 0xcccb4d4ecc4d
    ip_output() at ip_output+0x994
    udp_send() at udp_send+0xb2c
    sosend_dgram() at sosend_dgram+0x4ac
    sosend() at sosend+0x2c
    wg_send() at wg_send+0x10c
    wg_deliver_out() at wg_deliver_out+0x17c
    gtaskqueue_run_locked() at gtaskqueue_run_locked+0x17c
    gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0x130
    fork_exit() at fork_exit+0x88
    fork_trampoline() at fork_trampoline+0x14
    KDB: enter: panic
    [ thread pid 0 tid 100167 ]
    Stopped at      kdb_enter+0x44: undefined       f907827f
    db>
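The `esr` values in the dumps above can be decoded by hand. The following is a minimal sketch, assuming the architectural ESR_ELx layout (exception class in bits [31:26], instruction-specific syndrome in bits [24:0]); the macro and helper names are made up for illustration and are not the kernel's own. Under that layout, 0x86000004 decodes as exception class 0x21 (instruction abort taken without a change in exception level) with fault status 0x04 (translation fault, level 0), which fits `elr` and `far` both being 0: the CPU tried to fetch an instruction from address 0. 0x8a000000 decodes as class 0x22 (PC alignment fault), which fits the misaligned `elr` of d549cccb4d4ecc4d in the third dump.

```c
#include <stdint.h>

/* ESR_ELx layout: EC = bits [31:26], IL = bit 25, ISS = bits [24:0]. */
#define ESR_EC_SHIFT            26
#define ESR_EC_MASK             0x3fu
#define ESR_ISS_MASK            0x01ffffffu
#define ESR_FSC_MASK            0x3fu

/* Hypothetical names for the two exception classes seen in this report. */
#define EC_INSN_ABORT_SAME_EL   0x21u   /* instruction abort, no EL change */
#define EC_PC_ALIGNMENT         0x22u   /* PC alignment fault */

/* Fault status code reported alongside the instruction abort. */
#define FSC_TRANSLATION_L0      0x04u   /* translation fault, level 0 */

/* Extract the exception class from a raw ESR value. */
static inline uint32_t
esr_ec(uint32_t esr)
{
	return ((esr >> ESR_EC_SHIFT) & ESR_EC_MASK);
}

/* Extract the instruction-specific syndrome from a raw ESR value. */
static inline uint32_t
esr_iss(uint32_t esr)
{
	return (esr & ESR_ISS_MASK);
}

/* For aborts, the low six ISS bits carry the fault status code. */
static inline uint32_t
esr_fsc(uint32_t esr)
{
	return (esr_iss(esr) & ESR_FSC_MASK);
}
```

For example, `esr_ec(0x86000004u)` yields 0x21 and `esr_fsc(0x86000004u)` yields 0x04, while `esr_ec(0x8a000000u)` yields 0x22.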
Comment 9 Marek Zarychta 2022-05-22 21:03:05 UTC
Maybe it's an issue introduced by Clang 14; please see bug 264021, bug 264094, https://lists.freebsd.org/archives/freebsd-arm/2022-May/001356.html, and probably more to follow.
Comment 10 Poul-Henning Kamp freebsd_committer freebsd_triage 2022-07-12 11:08:33 UTC
This issue is now also seen on FreeBSD-13.1-R:

root@mick:~ # uname -a
FreeBSD mick.freebsd.dk 13.1-RELEASE FreeBSD 13.1-RELEASE releng/13.1-n250148-fc952ac2212 GENERIC arm64
root@mick:~ # pkg info | grep wire
wireguard-2,1                  Meta-port for Wireguard
wireguard-kmod-0.0.20220615    WireGuard implementation for the FreeBSD kernel
wireguard-tools-1.0.20210914_1 Fast, modern and secure VPN Tunnel
Comment 11 Daniel Engberg freebsd_committer freebsd_triage 2022-07-15 22:45:37 UTC
Can't really test any other version right now but I've been running the previous version for weeks without issue on 13.1-RELEASE

wireguard-2,1                  Meta-port for Wireguard
wireguard-kmod-0.0.20220610_1  WireGuard implementation for the FreeBSD kernel
wireguard-tools-1.0.20210914_1 Fast, modern and secure VPN Tunnel

It seems a bit odd to me that it's always CPUID=4, since many applications/threads jump between cores quite a bit. Do you have any boot/loader settings in place, and/or are you running anything besides WireGuard? Is it perhaps power and/or temperature related?

I'm not using the built-in NIC but it doesn't seem related to that looking at the panic.
Comment 12 Marek Zarychta 2022-10-30 17:15:22 UTC
FreeBSD 14.0-CURRENT main-n258920-f0a15aafcb8: Sun Oct 30 09:44:53 CET 2022      arm64 with if_wg.ko loaded (module from the base system) seems to work fine.
Comment 13 Marek Zarychta 2022-11-06 17:32:41 UTC
FreeBSD 14.0-CURRENT #3 main-n259059-d56c7ac87f9  Sun Nov  6 10:37:24 CET 2022  running on ARMv7 can also load and make proper use of if_wg(4), so this PR could probably be closed (overtaken by events).
Comment 14 OlivierW 2023-06-24 14:59:16 UTC
Hello,

I think I am hitting this bug, too.
I have one bhyve VM on an x86-64 computer where WireGuard works fine, and the other side of the tunnel is a VPS instance with an arm64 CPU.

The tunnel seemed to work reliably for a few days until I hit more traffic by using iperf3 or curl between both machines.

The VPS with the arm64 CPU rebooted each time.

The two machines are up to date with FreeBSD 13.2-RELEASE-p1.
Comment 15 Daniel Ponte 2023-09-06 22:31:15 UTC
Another vote for this bug. I have an Oracle Cloud arm64 13.2-RELEASE instance running WireGuard. The backtrace:

#0 0xffff0000004fd02c at kdb_backtrace+0x60
#1 0xffff0000004a8328 at vpanic+0x13c
#2 0xffff0000004a81e8 at panic+0x44
#3 0xffff0000007f30ec at do_el1h_sync+0x194
#4 0xffff0000007d3010 at handle_el1h_sync+0x10
#5 0xffff0000006256c8 at fib4_lookup+0x3c
#6 0xffff00000063a904 at ip_output+0x9cc
#7 0xffff0000006746a4 at udp_send+0xa08
#8 0xffff000000557434 at sosend_dgram+0x494
#9 0xffff000000558364 at sosend+0x3c
#10 0xffff00015c8d0164 at wg_send+0xfc
#11 0xffff00015c8d2cf0 at wg_deliver_out+0x190
#12 0xffff0000004fb9d0 at gtaskqueue_run_locked+0x17c
#13 0xffff0000004fb528 at gtaskqueue_thread_loop+0x130
#14 0xffff00000045730c at fork_exit+0x88
#15 0xffff0000007f2dec at fork_trampoline+0x14

I am a fairly advanced user and this VM is not doing anything serious (it's a lemmy server). I am happy to assist in nailing this down.
Comment 16 Lexi Winter 2023-10-28 11:41:58 UTC
i'm seeing this fairly often (daily) under high load on an RPi4 running 13.2-RELEASE-p4 with mixed Wireguard and NFS traffic:


panic: vm_fault failed: 0
cpuid = 2
time = 1698460195
KDB: stack backtrace:
#0 0xffff0000004fd1b4 at kdb_backtrace+0x60
#1 0xffff0000004a84b0 at vpanic+0x13c
#2 0xffff0000004a8370 at panic+0x44
#3 0xffff0000007f42e0 at data_abort+0x200
#4 0xffff0000007d3010 at handle_el1h_sync+0x10
#5 0xffff000000625870 at fib4_lookup+0x3c
#6 0xffff00000063aaac at ip_output+0x9cc
#7 0xffff00000067484c at udp_send+0xa08
#8 0xffff0000005575bc at sosend_dgram+0x494
#9 0xffff0000005584ec at sosend+0x3c
#10 0xffff00000121f164 at wg_send+0xfc
#11 0xffff000001221cf0 at wg_deliver_out+0x190
#12 0xffff0000004fbb58 at gtaskqueue_run_locked+0x17c
#13 0xffff0000004fb6b0 at gtaskqueue_thread_loop+0x130
#14 0xffff000000457494 at fork_exit+0x88
#15 0xffff0000007f2dec at fork_trampoline+0x14
Uptime: 5h45m47s


#0  get_curthread () at /usr/src/sys/arm64/include/pcpu.h:77
#1  doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:396
#2  0xffff0000004a7fc8 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:484
#3  0xffff0000004a853c in vpanic (fmt=<optimized out>, ap=...) at /usr/src/sys/kern/kern_shutdown.c:923
#4  0xffff0000004a8374 in panic (fmt=0x0) at /usr/src/sys/kern/kern_shutdown.c:847
#5  0xffff0000007f42e4 in data_abort (td=0xffffa00000cab000, frame=0xffff0000403ba2d0, esr=2248146948,
    far=<optimized out>, lower=0) at /usr/src/sys/arm64/arm64/trap.c:363
#6  <signal handler called>
#7  0x0000000000000000 in ?? ()
#8  0xffff000000625874 in fib4_lookup (fibnum=<optimized out>, dst=..., scopeid=0, flags=0, flowid=0)
    at /usr/src/sys/netinet/in_fib.c:120
#9  0xffff00000063aab0 in ip_output (m=<optimized out>, m@entry=0xffffa0004e699000, opt=<optimized out>,
    opt@entry=0x0, ro=0xffff0000403ba530, flags=<optimized out>, flags@entry=64, imo=0x0,
    inp=inp@entry=0xffffa000094cb9b0) at /usr/src/sys/netinet/ip_output.c:518
#10 0xffff000000674850 in udp_output (inp=0xffffa000094cb9b0, m=<optimized out>, addr=<optimized out>,
    control=<optimized out>, td=<optimized out>, flags=<optimized out>)
    at /usr/src/sys/netinet/udp_usrreq.c:1520
#11 udp_send (so=<optimized out>, flags=<optimized out>, m=<optimized out>, addr=<optimized out>,
    control=<optimized out>, td=<optimized out>) at /usr/src/sys/netinet/udp_usrreq.c:1784
#12 0xffff0000005575c0 in sosend_dgram (so=0xffffa000094ea760, addr=0xffff0000403ba7d0, uio=<optimized out>,
    top=0xffffa0004e699000, control=0xffffa000eaa91700, flags=0, td=0xffffa00000cab000)
    at /usr/src/sys/kern/uipc_socket.c:1496
#13 0xffff0000005584f0 in sosend (so=0x0, addr=0x6, addr@entry=0xffff0000403ba7d0, uio=0x0, top=0x0,
    top@entry=0xffffa0004e699000, control=control@entry=0xffffa000eaa91700, flags=156023216, td=0x0)
    at /usr/src/sys/kern/uipc_socket.c:1809
#14 0xffff00000121f168 in wg_send (sc=sc@entry=0xffffa00006648800, e=e@entry=0xffff0000403ba7d0,
    m=m@entry=0xffffa0004e699000) at /usr/src/sys/arm64/include/pcpu.h:77
#15 0xffff000001221cf4 in wg_deliver_out (peer=0xffffa0000638e800) at /usr/src/sys/dev/wg/if_wg.c:1658
#16 0xffff0000004fbb5c in gtaskqueue_run_locked (queue=queue@entry=0xffffa00000c1d900)
    at /usr/src/sys/kern/subr_gtaskqueue.c:371
#17 0xffff0000004fb6b4 in gtaskqueue_thread_loop (arg=<optimized out>, arg@entry=0xffff0000441c8038)
    at /usr/src/sys/kern/subr_gtaskqueue.c:547
#18 0xffff000000457498 in fork_exit (callout=0xffff0000004fb580 <gtaskqueue_thread_loop>,
    arg=0xffff0000441c8038, frame=0xffff0000403ba990) at /usr/src/sys/kern/kern_fork.c:1093
#19 <signal handler called>


this is not a production system, so i'm happy to help with any testing / diagnostics / proposed patches.
Comment 17 Kyle Evans freebsd_committer freebsd_triage 2024-03-07 18:06:57 UTC
Are you all using multiple fibs here, or just the one/default fib? I haven't been able to reproduce this on some Ampere gear yet.
Comment 18 Daniel Ponte 2024-03-07 18:07:59 UTC
I am using a single FIB.
Comment 19 Daniel Ponte 2024-03-07 20:04:27 UTC
I actually hadn't tested this since I upgraded to 14-RELEASE; I've been using wireguard-go since hitting this bug. I've just re-enabled if_wg and will report back if the crashes continue.
Comment 20 Lexi Winter 2024-03-08 06:00:36 UTC
i'm no longer running Wireguard on arm64, so i can't check, but i believe that system was using multiple FIBs -- i always configure Wireguard that way to avoid the issue with traffic to the tunnel endpoint being encapsulated over the tunnel.
Comment 21 OlivierW 2024-03-08 07:54:09 UTC
I was using a single FIB, but I no longer have an arm64 machine.
Comment 22 Kyle Evans freebsd_committer freebsd_triage 2024-03-08 20:05:10 UTC
Created attachment 249037
git-diff against src

I'd appreciate it if someone could try to reproduce it, then try again with this patch and confirm if it mitigates it. Thanks.
Comment 23 Kyle Evans freebsd_committer freebsd_triage 2024-03-09 01:15:29 UTC
With the following reproducer:

```
#!/bin/sh

report() {
        1>&2 echo ">> ITERATION $iter DONE"
}

trap 'report' EXIT

iter=0
while true; do
        iperf3 -Zc 10.9.0.1 -p 5201 &
        sleep 1
        iperf3 -Zc 10.9.0.1 -p 9070 &
        wait
        iter=$((iter + 1))
        report
done
```

I can reliably hit this assertion within ~25-30 iterations, unpatched:

```
panic: ip_output: no mbuf packet header!
cpuid = 78
time = 1709946250
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x38
vpanic() at vpanic+0x1a4
panic() at panic+0x48
ip_output() at ip_output+0x13a8
udp_send() at udp_send+0x9dc
sosend_dgram() at sosend_dgram+0x30c
sosend() at sosend+0x48
wg_send() at wg_send+0x104
wg_deliver_out() at wg_deliver_out+0x20c
gtaskqueue_run_locked() at gtaskqueue_run_locked+0x16c
gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xcc
fork_exit() at fork_exit+0x78
fork_trampoline() at fork_trampoline+0x18
KDB: enter: panic
```

This would support the theory that something is going wrong back in wg_deliver_out() and the dequeuing, likely insufficient barriers. With the attached patch, I can push at least 125 iterations with no problem; I'll go ahead and push this patch to Phabricator for some review.
Comment 24 Daniel Ponte 2024-03-12 16:39:49 UTC
Looks like my machine crashed yesterday. I will try to get this applied today and see what happens.
Comment 25 Daniel Ponte 2024-03-13 14:01:20 UTC
Without this patch on 14-RELEASE, the reproducer crashed after 76 iterations. With it, no crash in 2073 iterations. I'd call it fixed.
Comment 26 Kyle Evans freebsd_committer freebsd_triage 2024-03-14 15:23:39 UTC
(In reply to Daniel Ponte from comment #25)

Excellent, thanks for confirming! I'll land this when I get a couple minutes at my laptop (hopefully later today) -- CC'ing secteam as an FYI, I intend to put in an EN request for this one since we do those for aarch64 now. I don't see any security impact, but it is a bit of a stability landmine.
Comment 27 commit-hook freebsd_committer freebsd_triage 2024-03-15 01:26:02 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=3705d679a6344c957cae7a1b6372a8bfb8c44f0e

commit 3705d679a6344c957cae7a1b6372a8bfb8c44f0e
Author:     Kyle Evans <kevans@FreeBSD.org>
AuthorDate: 2024-03-15 01:19:18 +0000
Commit:     Kyle Evans <kevans@FreeBSD.org>
CommitDate: 2024-03-15 01:19:21 +0000

    if_wg: use proper barriers around pkt->p_state

    Without appropriate load-synchronization to pair with store barriers in
    wg_encrypt() and wg_decrypt(), the compiler and hardware are often
    allowed to reorder these loads in wg_deliver_out() and wg_deliver_in()
    such that we end up with a garbage or intermediate mbuf that we try to
    pass on.  The issue is particularly prevalent with the weaker
    memory models of !x86 platforms.

    Switch from the big-hammer wmb() to more explicit acq/rel atomics to
    both make it obvious what we're syncing up with, and to avoid somewhat
    hefty fences on platforms that don't necessarily need this.

    With this patch, my dual-iperf3 reproducer is dramatically more stable
    than it is without on aarch64.

    PR:             264115
    MFC after:      1 week
    Reviewed by:    andrew, zlei
    Differential Revision:  https://reviews.freebsd.org/D44283

 sys/dev/wg/if_wg.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)
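
The acquire/release pairing described in the commit message can be sketched in portable C11. This is an illustration only, not the code in `if_wg.c`: `struct pkt`, `pkt_init()`, `pkt_publish()` and `pkt_try_consume()` are hypothetical names, and `<stdatomic.h>` stands in for the kernel's atomic(9) primitives. The idea shown is that the producer writes the payload and then publishes the state with a release store, while the consumer checks the state with an acquire load before touching the payload, so the payload read cannot be reordered ahead of the state check.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the real if_wg structures or state names. */
enum pkt_state { PKT_PENDING = 0, PKT_READY = 1 };

struct pkt {
	void		*buf;	/* stands in for the packet's mbuf pointer */
	atomic_int	 state;	/* stands in for p_state */
};

/* Put a packet into its initial, unpublished state. */
static void
pkt_init(struct pkt *p)
{
	p->buf = NULL;
	atomic_init(&p->state, PKT_PENDING);
}

/*
 * Producer side (the crypto worker in this analogy): finish the payload
 * first, then publish it.  The release store orders the write to buf
 * before the write to state.
 */
static void
pkt_publish(struct pkt *p, void *buf)
{
	p->buf = buf;
	atomic_store_explicit(&p->state, PKT_READY, memory_order_release);
}

/*
 * Consumer side (the deliver task in this analogy): the acquire load
 * pairs with the release store above, so once PKT_READY is observed the
 * read of buf sees the published value.  Returns 1 and fills *out if the
 * packet is ready, 0 otherwise.
 */
static int
pkt_try_consume(struct pkt *p, void **out)
{
	if (atomic_load_explicit(&p->state, memory_order_acquire) != PKT_READY)
		return (0);
	*out = p->buf;
	return (1);
}
```

In intended use, `pkt_publish()` and `pkt_try_consume()` run on different threads. With a plain load of `state` in the consumer, a weakly ordered CPU such as arm64 may satisfy the read of `buf` before the read of `state`, which is the kind of reordering the commit message describes.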
Comment 28 commit-hook freebsd_committer freebsd_triage 2024-03-22 18:41:02 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=806e51f81dbae21feb6e7ddd95d2ed2a28b04f8f

commit 806e51f81dbae21feb6e7ddd95d2ed2a28b04f8f
Author:     Kyle Evans <kevans@FreeBSD.org>
AuthorDate: 2024-03-15 01:19:18 +0000
Commit:     Kyle Evans <kevans@FreeBSD.org>
CommitDate: 2024-03-22 15:21:42 +0000

    if_wg: use proper barriers around pkt->p_state

    Without appropriate load-synchronization to pair with store barriers in
    wg_encrypt() and wg_decrypt(), the compiler and hardware are often
    allowed to reorder these loads in wg_deliver_out() and wg_deliver_in()
    such that we end up with a garbage or intermediate mbuf that we try to
    pass on.  The issue is particularly prevalent with the weaker
    memory models of !x86 platforms.

    Switch from the big-hammer wmb() to more explicit acq/rel atomics to
    both make it obvious what we're syncing up with, and to avoid somewhat
    hefty fences on platforms that don't necessarily need this.

    With this patch, my dual-iperf3 reproducer is dramatically more stable
    than it is without on aarch64.

    PR:             264115
    Reviewed by:    andrew, zlei

    (cherry picked from commit 3705d679a6344c957cae7a1b6372a8bfb8c44f0e)

 sys/dev/wg/if_wg.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)
Comment 29 commit-hook freebsd_committer freebsd_triage 2024-03-22 18:41:04 UTC
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=590e02d3c088b220c19d53ce40a5aecc6fa099e4

commit 590e02d3c088b220c19d53ce40a5aecc6fa099e4
Author:     Kyle Evans <kevans@FreeBSD.org>
AuthorDate: 2024-03-15 01:19:18 +0000
Commit:     Kyle Evans <kevans@FreeBSD.org>
CommitDate: 2024-03-22 15:21:39 +0000

    if_wg: use proper barriers around pkt->p_state

    Without appropriate load-synchronization to pair with store barriers in
    wg_encrypt() and wg_decrypt(), the compiler and hardware are often
    allowed to reorder these loads in wg_deliver_out() and wg_deliver_in()
    such that we end up with a garbage or intermediate mbuf that we try to
    pass on.  The issue is particularly prevalent with the weaker
    memory models of !x86 platforms.

    Switch from the big-hammer wmb() to more explicit acq/rel atomics to
    both make it obvious what we're syncing up with, and to avoid somewhat
    hefty fences on platforms that don't necessarily need this.

    With this patch, my dual-iperf3 reproducer is dramatically more stable
    than it is without on aarch64.

    PR:             264115
    Reviewed by:    andrew, zlei

    (cherry picked from commit 3705d679a6344c957cae7a1b6372a8bfb8c44f0e)

 sys/dev/wg/if_wg.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)
Comment 30 Kyle Evans freebsd_committer freebsd_triage 2024-03-22 19:05:57 UTC
Created attachment 249416
Proposed EN template

Proposed EN verbiage -- ideally for 14.0 and 13.2 since the latter isn't EoL for another ~2 months.
Comment 31 Ed Maste freebsd_committer freebsd_triage 2024-03-25 18:19:38 UTC
(In reply to Kyle Evans from comment #30)
Thanks Kyle, this change should be included with the next group of SAs or ENs.
Comment 32 commit-hook freebsd_committer freebsd_triage 2024-03-28 05:07:27 UTC
A commit in branch releng/14.0 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=56be7cd84447d9bc18b5b9e467eb976044194bbe

commit 56be7cd84447d9bc18b5b9e467eb976044194bbe
Author:     Kyle Evans <kevans@FreeBSD.org>
AuthorDate: 2024-03-15 01:19:18 +0000
Commit:     Gordon Tetlow <gordon@FreeBSD.org>
CommitDate: 2024-03-28 03:12:41 +0000

    if_wg: use proper barriers around pkt->p_state

    Without appropriate load-synchronization to pair with store barriers in
    wg_encrypt() and wg_decrypt(), the compiler and hardware are often
    allowed to reorder these loads in wg_deliver_out() and wg_deliver_in()
    such that we end up with a garbage or intermediate mbuf that we try to
    pass on.  The issue is particularly prevalent with the weaker
    memory models of !x86 platforms.

    Switch from the big-hammer wmb() to more explicit acq/rel atomics to
    both make it obvious what we're syncing up with, and to avoid somewhat
    hefty fences on platforms that don't necessarily need this.

    With this patch, my dual-iperf3 reproducer is dramatically more stable
    than it is without on aarch64.

    PR:             264115
    Reviewed by:    andrew, zlei
    Approved by:    so
    Security:       FreeBSD-EN-24:06.wireguard

    (cherry picked from commit 3705d679a6344c957cae7a1b6372a8bfb8c44f0e)
    (cherry picked from commit 590e02d3c088b220c19d53ce40a5aecc6fa099e4)

 sys/dev/wg/if_wg.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)
Comment 33 commit-hook freebsd_committer freebsd_triage 2024-03-28 05:08:35 UTC
A commit in branch releng/13.2 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=8f1f4e60ceb9b8e5eddd54cf1fde62944f56eaa4

commit 8f1f4e60ceb9b8e5eddd54cf1fde62944f56eaa4
Author:     Kyle Evans <kevans@FreeBSD.org>
AuthorDate: 2024-03-15 01:19:18 +0000
Commit:     Gordon Tetlow <gordon@FreeBSD.org>
CommitDate: 2024-03-28 03:05:58 +0000

    if_wg: use proper barriers around pkt->p_state

    Without appropriate load-synchronization to pair with store barriers in
    wg_encrypt() and wg_decrypt(), the compiler and hardware are often
    allowed to reorder these loads in wg_deliver_out() and wg_deliver_in()
    such that we end up with a garbage or intermediate mbuf that we try to
    pass on.  The issue is particularly prevalent with the weaker
    memory models of !x86 platforms.

    Switch from the big-hammer wmb() to more explicit acq/rel atomics to
    both make it obvious what we're syncing up with, and to avoid somewhat
    hefty fences on platforms that don't necessarily need this.

    With this patch, my dual-iperf3 reproducer is dramatically more stable
    than it is without on aarch64.

    PR:             264115
    Reviewed by:    andrew, zlei
    Approved by:    so
    Security:       FreeBSD-EN-24:06.wireguard

    (cherry picked from commit 3705d679a6344c957cae7a1b6372a8bfb8c44f0e)
    (cherry picked from commit 806e51f81dbae21feb6e7ddd95d2ed2a28b04f8f)

 sys/dev/wg/if_wg.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)
Comment 34 commit-hook freebsd_committer freebsd_triage 2024-03-28 07:14:50 UTC
A commit in branch releng/13.3 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=f07351f90aa37d8fc1b86e96d76447eec884d237

commit f07351f90aa37d8fc1b86e96d76447eec884d237
Author:     Kyle Evans <kevans@FreeBSD.org>
AuthorDate: 2024-03-15 01:19:18 +0000
Commit:     Gordon Tetlow <gordon@FreeBSD.org>
CommitDate: 2024-03-28 07:13:08 +0000

    if_wg: use proper barriers around pkt->p_state

    Without appropriate load-synchronization to pair with store barriers in
    wg_encrypt() and wg_decrypt(), the compiler and hardware are often
    allowed to reorder these loads in wg_deliver_out() and wg_deliver_in()
    such that we end up with a garbage or intermediate mbuf that we try to
    pass on.  The issue is particularly prevalent with the weaker
    memory models of !x86 platforms.

    Switch from the big-hammer wmb() to more explicit acq/rel atomics to
    both make it obvious what we're syncing up with, and to avoid somewhat
    hefty fences on platforms that don't necessarily need this.

    With this patch, my dual-iperf3 reproducer is dramatically more stable
    than it is without on aarch64.

    PR:             264115
    Reviewed by:    andrew, zlei
    Approved by:    so
    Approved by:    re (so, implicit, appease the commit-hook)
    Security:       FreeBSD-EN-24:06.wireguard

    (cherry picked from commit 3705d679a6344c957cae7a1b6372a8bfb8c44f0e)
    (cherry picked from commit 806e51f81dbae21feb6e7ddd95d2ed2a28b04f8f)

 sys/dev/wg/if_wg.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)