Summary:   net/wireguard-kmod: Panics ARM64 (RockPro64) on FreeBSD 14-CURRENT (716fd348e01): panic: vm_fault failed: 0 error 1
Product:   Base System
Reporter:  Poul-Henning Kamp <phk>
Component: kern
Assignee:  Kyle Evans <kevans>
Status:    Closed FIXED
Severity:  Affects Only Me
CC:        amigan, dch, decke, diizzy, emaste, franco, freebsd.bugs, kevans, lexi, olivierw1+bugzilla-freebsd, secteam, zarychtam
Priority:  ---
Keywords:  crash, needs-qa
Version:   CURRENT
Hardware:  arm64
OS:        Any
URL:       https://reviews.freebsd.org/D44283
See Also:  https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264094
           https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264021
Description
Poul-Henning Kamp
2022-05-21 06:27:31 UTC
Package or ports? FreeBSD version would be useful. Please make sure that the kmod was built for exactly that kernel.

FreeBSD version is literally in the first line of this report. Kernel, userland and net/wireguard all freshly built. 716fd348e01 is from 19.05, but according to bug #264105 wireguard-kmod cannot be built since b46667c, which is from 17.05 (https://freshbsd.org/freebsd/src/commit/b46667c63eb7f126a56e23af1401d98d77b912e8). I fixed that locally by adding M_WAITOK.

I forgot to add: this is very reproducible. Got an identical panic this night (this machine collects backups every night):

```
Fatal data abort:
  x0:  0
  x1:  6
  x2:  0
  x3:  0
  x4:  0
  x5:  ffffa000bd0dc000
  x6:  ff
  x7:  ffffa00000c2e100
  x8:  ffffa000b79305a8
  x9:  0
  x10: 20000
  x11: 0
  x12: 0
  x13: 0
  x14: ffffa00000ce3700
  x15: 0
  x16: ffff0000e5abf438 (_DYNAMIC + 490)
  x17: ffff000000589a44 (sosend + 0)
  x18: ffff0000e5ef0490 (ratelimit_v6 + 420e38)
  x19: 0
  x20: ffff0000e5ef0518 (ratelimit_v6 + 420ec0)
  x21: 0
  x22: 0
  x23: 14
  x24: 40
  x25: ffff0000e5ef0538 (ratelimit_v6 + 420ee0)
  x26: 0
  x27: 0
  x28: ffffa000bd11533c
  x29: ffff0000e5ef0490 (ratelimit_v6 + 420e38)
  sp:  ffff0000e5ef0490
  lr:  ffff000000657974 (fib4_lookup + 40)
  elr: 0
  spsr: 60000045
  far: 0
  esr: 86000004
panic: vm_fault failed: 0 error 1
cpuid = 4
time = 1653197725
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
vpanic() at vpanic+0x174
panic() at panic+0x44
data_abort() at data_abort+0x2c4
handle_el1h_sync() at handle_el1h_sync+0x10
--- exception, esr 0x86000004
(null)() at 0
ip_output() at ip_output+0x9a4
udp_send() at udp_send+0xb5c
sosend_dgram() at sosend_dgram+0x4a4
sosend() at sosend+0x2c
wg_send() at wg_send+0x10c
wg_deliver_out() at wg_deliver_out+0x17c
gtaskqueue_run_locked() at gtaskqueue_run_locked+0x17c
gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0x130
fork_exit() at fork_exit+0x88
fork_trampoline() at fork_trampoline+0x14
KDB: enter: panic
[ thread pid 0 tid 100185 ]
Stopped at kdb_enter+0x40: undefined f907827f
db>
```

Slightly different, but same backtrace:

```
  x0:  29d954f85865cc4b
  x1:  6
  x2:  0
  x3:  0
  x4:  0
  x5:  ffffa000122186a0
  x6:  ff
  x7:  ffffa00000c2e100
  x8:  ffffa0000faeb568
  x9:  d549cccb4d4ecc4d
  x10: 20000
  x11: ff6c
  x12: 0
  x13: 0
  x14: 0
  x15: 0
  x16: ffff0000e5cbf438 (_DYNAMIC + 490)
  x17: ffff00000056b0c0 (sosend + 0)
  x18: ffff0000a5fea470 (_end + a509a470)
  x19: 0
  x20: ffff0000a5fea4f8 (_end + a509a4f8)
  x21: 0
  x22: 0
  x23: 14
  x24: 40
  x25: ffff0000a5fea518 (_end + a509a518)
  x26: 0
  x27: ffffa000a96a843c
  x28: ffff000000beb000 (vfs_smr + 0)
  x29: ffff0000a5fea470 (_end + a509a470)
  sp:  ffff0000a5fea470
  lr:  ffff000000638f24 (fib4_lookup + 40)
  elr: d549cccb4d4ecc4d
  spsr: 60000045
  far: d549cccb4d4ecc4d
panic: Unknown kernel exception 22 esr_el1 8a000000
cpuid = 4
time = 1653244554
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
vpanic() at vpanic+0x178
panic() at panic+0x44
do_el1h_sync() at do_el1h_sync+0x194
handle_el1h_sync() at handle_el1h_sync+0x10
--- exception, esr 0x8a000000
(null)() at 0xcccb4d4ecc4d
ip_output() at ip_output+0x994
udp_send() at udp_send+0xb2c
sosend_dgram() at sosend_dgram+0x4ac
sosend() at sosend+0x2c
wg_send() at wg_send+0x10c
wg_deliver_out() at wg_deliver_out+0x17c
gtaskqueue_run_locked() at gtaskqueue_run_locked+0x17c
gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0x130
fork_exit() at fork_exit+0x88
fork_trampoline() at fork_trampoline+0x14
KDB: enter: panic
[ thread pid 0 tid 100167 ]
Stopped at kdb_enter+0x44: undefined f907827f
db>
```

Maybe it's an issue introduced by Clang 14; please see bug 264021, bug 264094, https://lists.freebsd.org/archives/freebsd-arm/2022-May/001356.html, and probably more to follow.
This issue is now also seen on FreeBSD 13.1-RELEASE:

```
root@mick:~ # uname -a
FreeBSD mick.freebsd.dk 13.1-RELEASE FreeBSD 13.1-RELEASE releng/13.1-n250148-fc952ac2212 GENERIC arm64
root@mick:~ # pkg info | grep wire
wireguard-2,1                  Meta-port for Wireguard
wireguard-kmod-0.0.20220615    WireGuard implementation for the FreeBSD kernel
wireguard-tools-1.0.20210914_1 Fast, modern and secure VPN Tunnel
```

Can't really test any other version right now, but I've been running the previous version for weeks without issue on 13.1-RELEASE:

```
wireguard-2,1                  Meta-port for Wireguard
wireguard-kmod-0.0.20220610_1  WireGuard implementation for the FreeBSD kernel
wireguard-tools-1.0.20210914_1 Fast, modern and secure VPN Tunnel
```

It seems a bit odd to me that it's always cpuid = 4, since threads tend to migrate between cores quite a bit. Do you have any boot/loader settings in place, and/or are you running anything more than just WireGuard? Is it perhaps power- and/or temperature-related? I'm not using the built-in NIC, but looking at the panic it doesn't seem related to that.

FreeBSD 14.0-CURRENT main-n258920-f0a15aafcb8 (Sun Oct 30 09:44:53 CET 2022, arm64) with if_wg.ko loaded (the module from the base system) seems to work fine.

FreeBSD 14.0-CURRENT #3 main-n259059-d56c7ac87f9 (Sun Nov 6 10:37:24 CET 2022) running on ARMv7 can also load and make proper use of if_wg(4), so this PR could probably be closed (overtaken by events).

Hello, I think I am hitting this bug too. I have one bhyve VM on an x86-64 computer where WireGuard works fine; the other side of the tunnel is a VPS instance with an arm64 CPU. The tunnel seemed to work reliably for a few days until I generated more traffic by running iperf3 or curl between the two machines. The VPS with the arm64 CPU rebooted each time. Both machines are up to date with FreeBSD 13.2-RELEASE-p1.

Another vote for this bug. I have an Oracle Cloud arm64 13.2-RELEASE instance running WireGuard.
The backtrace:

```
#0  0xffff0000004fd02c at kdb_backtrace+0x60
#1  0xffff0000004a8328 at vpanic+0x13c
#2  0xffff0000004a81e8 at panic+0x44
#3  0xffff0000007f30ec at do_el1h_sync+0x194
#4  0xffff0000007d3010 at handle_el1h_sync+0x10
#5  0xffff0000006256c8 at fib4_lookup+0x3c
#6  0xffff00000063a904 at ip_output+0x9cc
#7  0xffff0000006746a4 at udp_send+0xa08
#8  0xffff000000557434 at sosend_dgram+0x494
#9  0xffff000000558364 at sosend+0x3c
#10 0xffff00015c8d0164 at wg_send+0xfc
#11 0xffff00015c8d2cf0 at wg_deliver_out+0x190
#12 0xffff0000004fb9d0 at gtaskqueue_run_locked+0x17c
#13 0xffff0000004fb528 at gtaskqueue_thread_loop+0x130
#14 0xffff00000045730c at fork_exit+0x88
#15 0xffff0000007f2dec at fork_trampoline+0x14
```

I am a fairly advanced user and this VM is not doing anything serious (it's a Lemmy server). I am happy to assist in nailing this down.

I'm seeing this fairly often (daily) under high load on an RPi4 running 13.2-RELEASE-p4 with mixed WireGuard and NFS traffic:

```
panic: vm_fault failed: 0
cpuid = 2
time = 1698460195
KDB: stack backtrace:
#0  0xffff0000004fd1b4 at kdb_backtrace+0x60
#1  0xffff0000004a84b0 at vpanic+0x13c
#2  0xffff0000004a8370 at panic+0x44
#3  0xffff0000007f42e0 at data_abort+0x200
#4  0xffff0000007d3010 at handle_el1h_sync+0x10
#5  0xffff000000625870 at fib4_lookup+0x3c
#6  0xffff00000063aaac at ip_output+0x9cc
#7  0xffff00000067484c at udp_send+0xa08
#8  0xffff0000005575bc at sosend_dgram+0x494
#9  0xffff0000005584ec at sosend+0x3c
#10 0xffff00000121f164 at wg_send+0xfc
#11 0xffff000001221cf0 at wg_deliver_out+0x190
#12 0xffff0000004fbb58 at gtaskqueue_run_locked+0x17c
#13 0xffff0000004fb6b0 at gtaskqueue_thread_loop+0x130
#14 0xffff000000457494 at fork_exit+0x88
#15 0xffff0000007f2dec at fork_trampoline+0x14
Uptime: 5h45m47s
```

The kgdb backtrace from the dump:

```
#0  get_curthread () at /usr/src/sys/arm64/include/pcpu.h:77
#1  doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:396
#2  0xffff0000004a7fc8 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:484
#3  0xffff0000004a853c in vpanic (fmt=<optimized out>, ap=...) at /usr/src/sys/kern/kern_shutdown.c:923
#4  0xffff0000004a8374 in panic (fmt=0x0) at /usr/src/sys/kern/kern_shutdown.c:847
#5  0xffff0000007f42e4 in data_abort (td=0xffffa00000cab000, frame=0xffff0000403ba2d0, esr=2248146948, far=<optimized out>, lower=0) at /usr/src/sys/arm64/arm64/trap.c:363
#6  <signal handler called>
#7  0x0000000000000000 in ?? ()
#8  0xffff000000625874 in fib4_lookup (fibnum=<optimized out>, dst=..., scopeid=0, flags=0, flowid=0) at /usr/src/sys/netinet/in_fib.c:120
#9  0xffff00000063aab0 in ip_output (m=<optimized out>, m@entry=0xffffa0004e699000, opt=<optimized out>, opt@entry=0x0, ro=0xffff0000403ba530, flags=<optimized out>, flags@entry=64, imo=0x0, inp=inp@entry=0xffffa000094cb9b0) at /usr/src/sys/netinet/ip_output.c:518
#10 0xffff000000674850 in udp_output (inp=0xffffa000094cb9b0, m=<optimized out>, addr=<optimized out>, control=<optimized out>, td=<optimized out>, flags=<optimized out>) at /usr/src/sys/netinet/udp_usrreq.c:1520
#11 udp_send (so=<optimized out>, flags=<optimized out>, m=<optimized out>, addr=<optimized out>, control=<optimized out>, td=<optimized out>) at /usr/src/sys/netinet/udp_usrreq.c:1784
#12 0xffff0000005575c0 in sosend_dgram (so=0xffffa000094ea760, addr=0xffff0000403ba7d0, uio=<optimized out>, top=0xffffa0004e699000, control=0xffffa000eaa91700, flags=0, td=0xffffa00000cab000) at /usr/src/sys/kern/uipc_socket.c:1496
#13 0xffff0000005584f0 in sosend (so=0x0, addr=0x6, addr@entry=0xffff0000403ba7d0, uio=0x0, top=0x0, top@entry=0xffffa0004e699000, control=control@entry=0xffffa000eaa91700, flags=156023216, td=0x0) at /usr/src/sys/kern/uipc_socket.c:1809
#14 0xffff00000121f168 in wg_send (sc=sc@entry=0xffffa00006648800, e=e@entry=0xffff0000403ba7d0, m=m@entry=0xffffa0004e699000) at /usr/src/sys/arm64/include/pcpu.h:77
#15 0xffff000001221cf4 in wg_deliver_out (peer=0xffffa0000638e800) at /usr/src/sys/dev/wg/if_wg.c:1658
#16 0xffff0000004fbb5c in gtaskqueue_run_locked (queue=queue@entry=0xffffa00000c1d900) at /usr/src/sys/kern/subr_gtaskqueue.c:371
#17 0xffff0000004fb6b4 in gtaskqueue_thread_loop (arg=<optimized out>, arg@entry=0xffff0000441c8038) at /usr/src/sys/kern/subr_gtaskqueue.c:547
#18 0xffff000000457498 in fork_exit (callout=0xffff0000004fb580 <gtaskqueue_thread_loop>, arg=0xffff0000441c8038, frame=0xffff0000403ba990) at /usr/src/sys/kern/kern_fork.c:1093
#19 <signal handler called>
```

This is not a production system, so I'm happy to help with any testing / diagnostics / proposed patches.

Are you all using multiple fibs here, or just the one/default fib? I haven't been able to reproduce this on some Ampere gear yet.

I am using a single FIB. I actually hadn't tested this since I upgraded to 14-RELEASE; I've been using wireguard-go since hitting this bug. I've just re-enabled if_wg and will report back if the crashes continue.

I'm no longer running WireGuard on arm64, so I can't check, but I believe that system was using multiple FIBs. I always configure WireGuard that way to avoid the issue of traffic to the tunnel endpoint being encapsulated over the tunnel itself.

I was using a single FIB, but I don't have an arm64 machine anymore.

Created attachment 249037 [details]
git-diff against src
I'd appreciate it if someone could try to reproduce the panic, then try again with this patch applied and confirm whether it mitigates the problem. Thanks.
With the following reproducer:

```sh
#!/bin/sh

report() {
	1>&2 echo ">> ITERATION $iter DONE"
}

trap 'report' EXIT

iter=0
while true; do
	iperf3 -Zc 10.9.0.1 -p 5201 &
	sleep 1
	iperf3 -Zc 10.9.0.1 -p 9070 &
	wait
	iter=$((iter + 1))
	report
done
```

I can reliably hit this assertion within ~25-30 iterations, unpatched:

```
panic: ip_output: no mbuf packet header!
cpuid = 78
time = 1709946250
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x38
vpanic() at vpanic+0x1a4
panic() at panic+0x48
ip_output() at ip_output+0x13a8
udp_send() at udp_send+0x9dc
sosend_dgram() at sosend_dgram+0x30c
sosend() at sosend+0x48
wg_send() at wg_send+0x104
wg_deliver_out() at wg_deliver_out+0x20c
gtaskqueue_run_locked() at gtaskqueue_run_locked+0x16c
gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xcc
fork_exit() at fork_exit+0x78
fork_trampoline() at fork_trampoline+0x18
KDB: enter: panic
```

This would support the theory of something going wrong back in wg_deliver_out() and the dequeueing path, likely insufficient barriers. With the attached patch, I can push at least 125 iterations with no problem; I'll go ahead and push this patch to Phabricator for some review.

Looks like my machine crashed yesterday. I will try to get this applied today and see what happens.

Without this patch on 14-RELEASE, the reproducer crashed after 76 iterations. With it, no crash in 2073 iterations. I'd call it fixed.

(In reply to Daniel Ponte from comment #25) Excellent, thanks for confirming! I'll land this when I get a couple of minutes at my laptop (hopefully later today). CC'ing secteam as an FYI: I intend to put in an EN request for this one since we do those for aarch64 now. I don't see any security impact, but it is a bit of a stability landmine.
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=3705d679a6344c957cae7a1b6372a8bfb8c44f0e

commit 3705d679a6344c957cae7a1b6372a8bfb8c44f0e
Author:     Kyle Evans <kevans@FreeBSD.org>
AuthorDate: 2024-03-15 01:19:18 +0000
Commit:     Kyle Evans <kevans@FreeBSD.org>
CommitDate: 2024-03-15 01:19:21 +0000

    if_wg: use proper barriers around pkt->p_state

    Without appropriate load-synchronization to pair with store barriers in
    wg_encrypt() and wg_decrypt(), the compiler and hardware are often
    allowed to reorder these loads in wg_deliver_out() and wg_deliver_in()
    such that we end up with a garbage or intermediate mbuf that we try to
    pass on.  The issue is particularly prevalent with the weaker memory
    models of !x86 platforms.

    Switch from the big-hammer wmb() to more explicit acq/rel atomics to
    both make it obvious what we're syncing up with, and to avoid somewhat
    hefty fences on platforms that don't necessarily need this.

    With this patch, my dual-iperf3 reproducer is dramatically more stable
    than it is without on aarch64.

    PR:             264115
    MFC after:      1 week
    Reviewed by:    andrew, zlei
    Differential Revision: https://reviews.freebsd.org/D44283

 sys/dev/wg/if_wg.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=806e51f81dbae21feb6e7ddd95d2ed2a28b04f8f

commit 806e51f81dbae21feb6e7ddd95d2ed2a28b04f8f
Author:     Kyle Evans <kevans@FreeBSD.org>
AuthorDate: 2024-03-15 01:19:18 +0000
Commit:     Kyle Evans <kevans@FreeBSD.org>
CommitDate: 2024-03-22 15:21:42 +0000

    if_wg: use proper barriers around pkt->p_state

    Without appropriate load-synchronization to pair with store barriers in
    wg_encrypt() and wg_decrypt(), the compiler and hardware are often
    allowed to reorder these loads in wg_deliver_out() and wg_deliver_in()
    such that we end up with a garbage or intermediate mbuf that we try to
    pass on.  The issue is particularly prevalent with the weaker memory
    models of !x86 platforms.

    Switch from the big-hammer wmb() to more explicit acq/rel atomics to
    both make it obvious what we're syncing up with, and to avoid somewhat
    hefty fences on platforms that don't necessarily need this.

    With this patch, my dual-iperf3 reproducer is dramatically more stable
    than it is without on aarch64.

    PR:             264115
    Reviewed by:    andrew, zlei

    (cherry picked from commit 3705d679a6344c957cae7a1b6372a8bfb8c44f0e)

 sys/dev/wg/if_wg.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=590e02d3c088b220c19d53ce40a5aecc6fa099e4

commit 590e02d3c088b220c19d53ce40a5aecc6fa099e4
Author:     Kyle Evans <kevans@FreeBSD.org>
AuthorDate: 2024-03-15 01:19:18 +0000
Commit:     Kyle Evans <kevans@FreeBSD.org>
CommitDate: 2024-03-22 15:21:39 +0000

    if_wg: use proper barriers around pkt->p_state

    Without appropriate load-synchronization to pair with store barriers in
    wg_encrypt() and wg_decrypt(), the compiler and hardware are often
    allowed to reorder these loads in wg_deliver_out() and wg_deliver_in()
    such that we end up with a garbage or intermediate mbuf that we try to
    pass on.  The issue is particularly prevalent with the weaker memory
    models of !x86 platforms.

    Switch from the big-hammer wmb() to more explicit acq/rel atomics to
    both make it obvious what we're syncing up with, and to avoid somewhat
    hefty fences on platforms that don't necessarily need this.

    With this patch, my dual-iperf3 reproducer is dramatically more stable
    than it is without on aarch64.

    PR:             264115
    Reviewed by:    andrew, zlei

    (cherry picked from commit 3705d679a6344c957cae7a1b6372a8bfb8c44f0e)

 sys/dev/wg/if_wg.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

Created attachment 249416 [details]
Proposed EN template
Proposed EN verbiage -- ideally for 14.0 and 13.2 since the latter isn't EoL for another ~2 months.
(In reply to Kyle Evans from comment #30) Thanks Kyle, this change should be included with the next group of SAs or ENs.

A commit in branch releng/14.0 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=56be7cd84447d9bc18b5b9e467eb976044194bbe

commit 56be7cd84447d9bc18b5b9e467eb976044194bbe
Author:     Kyle Evans <kevans@FreeBSD.org>
AuthorDate: 2024-03-15 01:19:18 +0000
Commit:     Gordon Tetlow <gordon@FreeBSD.org>
CommitDate: 2024-03-28 03:12:41 +0000

    if_wg: use proper barriers around pkt->p_state

    Without appropriate load-synchronization to pair with store barriers in
    wg_encrypt() and wg_decrypt(), the compiler and hardware are often
    allowed to reorder these loads in wg_deliver_out() and wg_deliver_in()
    such that we end up with a garbage or intermediate mbuf that we try to
    pass on.  The issue is particularly prevalent with the weaker memory
    models of !x86 platforms.

    Switch from the big-hammer wmb() to more explicit acq/rel atomics to
    both make it obvious what we're syncing up with, and to avoid somewhat
    hefty fences on platforms that don't necessarily need this.

    With this patch, my dual-iperf3 reproducer is dramatically more stable
    than it is without on aarch64.

    PR:             264115
    Reviewed by:    andrew, zlei
    Approved by:    so
    Security:       FreeBSD-EN-24:06.wireguard

    (cherry picked from commit 3705d679a6344c957cae7a1b6372a8bfb8c44f0e)
    (cherry picked from commit 590e02d3c088b220c19d53ce40a5aecc6fa099e4)

 sys/dev/wg/if_wg.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

A commit in branch releng/13.2 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=8f1f4e60ceb9b8e5eddd54cf1fde62944f56eaa4

commit 8f1f4e60ceb9b8e5eddd54cf1fde62944f56eaa4
Author:     Kyle Evans <kevans@FreeBSD.org>
AuthorDate: 2024-03-15 01:19:18 +0000
Commit:     Gordon Tetlow <gordon@FreeBSD.org>
CommitDate: 2024-03-28 03:05:58 +0000

    if_wg: use proper barriers around pkt->p_state

    Without appropriate load-synchronization to pair with store barriers in
    wg_encrypt() and wg_decrypt(), the compiler and hardware are often
    allowed to reorder these loads in wg_deliver_out() and wg_deliver_in()
    such that we end up with a garbage or intermediate mbuf that we try to
    pass on.  The issue is particularly prevalent with the weaker memory
    models of !x86 platforms.

    Switch from the big-hammer wmb() to more explicit acq/rel atomics to
    both make it obvious what we're syncing up with, and to avoid somewhat
    hefty fences on platforms that don't necessarily need this.

    With this patch, my dual-iperf3 reproducer is dramatically more stable
    than it is without on aarch64.

    PR:             264115
    Reviewed by:    andrew, zlei
    Approved by:    so
    Security:       FreeBSD-EN-24:06.wireguard

    (cherry picked from commit 3705d679a6344c957cae7a1b6372a8bfb8c44f0e)
    (cherry picked from commit 806e51f81dbae21feb6e7ddd95d2ed2a28b04f8f)

 sys/dev/wg/if_wg.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

A commit in branch releng/13.3 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=f07351f90aa37d8fc1b86e96d76447eec884d237

commit f07351f90aa37d8fc1b86e96d76447eec884d237
Author:     Kyle Evans <kevans@FreeBSD.org>
AuthorDate: 2024-03-15 01:19:18 +0000
Commit:     Gordon Tetlow <gordon@FreeBSD.org>
CommitDate: 2024-03-28 07:13:08 +0000

    if_wg: use proper barriers around pkt->p_state

    Without appropriate load-synchronization to pair with store barriers in
    wg_encrypt() and wg_decrypt(), the compiler and hardware are often
    allowed to reorder these loads in wg_deliver_out() and wg_deliver_in()
    such that we end up with a garbage or intermediate mbuf that we try to
    pass on.  The issue is particularly prevalent with the weaker memory
    models of !x86 platforms.

    Switch from the big-hammer wmb() to more explicit acq/rel atomics to
    both make it obvious what we're syncing up with, and to avoid somewhat
    hefty fences on platforms that don't necessarily need this.

    With this patch, my dual-iperf3 reproducer is dramatically more stable
    than it is without on aarch64.

    PR:             264115
    Reviewed by:    andrew, zlei
    Approved by:    so
    Approved by:    re (so, implicit, appease the commit-hook)
    Security:       FreeBSD-EN-24:06.wireguard

    (cherry picked from commit 3705d679a6344c957cae7a1b6372a8bfb8c44f0e)
    (cherry picked from commit 806e51f81dbae21feb6e7ddd95d2ed2a28b04f8f)

 sys/dev/wg/if_wg.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)