In qemu, using FreeBSD-13.2-RELEASE-arm-armv7-GENERICSD.img image, with out without virtio network system fails to boot and crashes on data alignment fault. With virtio VM started as: qemu-system-arm -M virt -m 2048 -nic tap -bios u-boot.bin -hda FreeBSD-13.2-RELEASE-arm-armv7-GENERICSD.img -s \ -device virtio-net-pci,netdev=network0 -netdev tap,id=network0,br=br0 Crashes the system during network initialization: DHCPDISCOVER on vtnet0 to 255.255.255.255 port 67 interval 7 Fatal kernel mode data abort: 'Alignment Fault' on read trapframe: 0xd5ec5a60 FSR=00000001, FAR=d8e4f01a, spsr=20000013 r0 =00000000, r1 =00000001, r2 =00000001, r3 =d5ec5b4c r4 =00000014, r5 =d8dd4300, r6 =d8e4f02e, r7 =00000134 r8 =00000000, r9 =00000134, r10=d8e4f01a, r11=d5ec5b90 r12=4400ffff, ssp=d5ec5af0, slr=c04a9728, pc =c04a9750 panic: Fatal abort cpuid = 0 time = 1680843368 KDB: stack backtrace: #0 0xc035786c at kdb_backtrace+0x48 #1 0xc02fdd20 at vpanic+0x140 #2 0xc02fdbe0 at vpanic+0 #3 0xc06304ac at abort_align+0 #4 0xc063052c at abort_align+0x80 #5 0xc063017c at abort_handler+0x480 #6 0xc060f480 at exception_exit+0 #7 0xc04a9750 at udp_input+0x288 #8 0xc0473f54 at ip_input+0x1e0 #9 0xc04447c0 at netisr_dispatch_src+0xf8 #10 0xc043bf2c at ether_demux+0x1a4 #11 0xc043d5e4 at ether_nh_input+0x480 #12 0xc04447c0 at netisr_dispatch_src+0xf8 #13 0xc043c404 at ether_input+0x50 #14 0xc01c0838 at vtnet_rx_vq_process+0x880 #15 0xc01b70d0 at vtpci_intx_intr+0xac #16 0xc02b87f0 at ithread_loop+0x2ec #17 0xc02b465c at fork_exit+0xc0 Uptime: 49s Quick check with gdb reveals: Breakpoint 2, udp_input (mp=<optimized out>, offp=<optimized out>, proto=17) at /usr/src/sys/netinet/udp_usrreq.c:504 504 in /usr/src/sys/netinet/udp_usrreq.c => 0xc04a9750 <udp_input+648>: 03 00 9a e8 ldm r10, {r0, r1} 0xc04a9754 <udp_input+652>: 00 20 a0 e3 mov r2, #0 0xc04a9758 <udp_input+656>: 08 30 da e5 ldrb r3, [r10, #8] (gdb) i r $r10 $cpsr r10 0xd8ebc01a -655638502 cpsr 0x20000013 536870931 (gdb) I simplified the setup and removed the tap interface from the qemu setup. Machine boots but crashes while I try to SSH to it: login: Fatal kernel mode data abort: 'Alignment Fault' on read trapframe: 0xd5ecea18 FSR=00000001, FAR=e1141032, spsr=60000013 r0 =00000014, r1 =0000003c, r2 =00000010, r3 =00000010 r4 =e114102e, r5 =d8e84200, r6 =00000028, r7 =00000000 r8 =00000014, r9 =00000000, r10=00000028, r11=d5eceb88 r12=d8e84200, ssp=d5eceaa8, slr=c04847c8, pc =c0482e58 panic: Fatal abort cpuid = 0 time = 1680843499 KDB: stack backtrace: #0 0xc035786c at kdb_backtrace+0x48 #1 0xc02fdd20 at vpanic+0x140 #2 0xc02fdbe0 at vpanic+0 #3 0xc06304ac at abort_align+0 #4 0xc063052c at abort_align+0x80 #5 0xc063017c at abort_handler+0x480 #6 0xc060f480 at exception_exit+0 #7 0xc0482e58 at tcp_input_with_port+0x4c4 #8 0xc04847c8 at tcp_input+0x10 #9 0xc0473f54 at ip_input+0x1e0 #10 0xc04447c0 at netisr_dispatch_src+0xf8 #11 0xc043bf2c at ether_demux+0x1a4 #12 0xc043d5e4 at ether_nh_input+0x480 #13 0xc04447c0 at netisr_dispatch_src+0xf8 #14 0xc043c404 at ether_input+0x50 #15 0xc01c0838 at vtnet_rx_vq_process+0x880 #16 0xc01b70d0 at vtpci_intx_intr+0xac #17 0xc02b87f0 at ithread_loop+0x2ec Uptime: 1m17s Breakpoint 3, tcp_fields_to_host (th=0xe114402e) at /usr/src/sys/netinet/tcp_var.h:1126 1126 /usr/src/sys/netinet/tcp_var.h: No such file or directory. => 0xc0482e58 <tcp_input_with_port+1220>: 03 00 94 e9 ldmib r4, {r0, r1} (gdb) i r $r4 $cpsr r4 0xe114402e -518766546 cpsr 0x60000013 1610612755 (gdb) It seems data structures are not aligned to their native boundaries.
In my first example when I checked the udp_input() argument, struct mbuf **mp: (gdb) p **(struct mbuf**)0xd5ecebc8 ... ... m_data = 0xd91e501a "E\020\001.", m_data here is struct ip* , first being used in sys/netinet/udp_usrreq.c:432: 432 ip = mtod(m, struct ip *); Crash occurs later here: 504 bcopy(((struct ipovly *)ip)->ih_x1, b, 9); I don't understand the process how the mbuf gets allocated and used later in the connection. It seems though address for the ip structure is not aligned (maybe system was in thumb mode?). I wondered if this is a driver problem, in this case virtio-net. I changed my configuration to use realtek card instead: qemu-system-arm -M virt -m 2G -hda FreeBSD-13.2-RELEASE-arm-armv7-GENERICSD.img -s -bios u-boot.bin \ -device rtl8139,netdev=network0 -netdev user,id=network And system boots and can be used without problem (at least the network tests I did were ok).
According to ARM docs: https://developer.arm.com/documentation/ddi0406/c/Application-Level-Architecture/Application-Level-Memory-Model/Alignment-support/Unaligned-data-access ldm instruction will always produce exception on unaligned data access. One of the examples how to fix the access with the hint to compiler: https://developer.arm.com/documentation/ka003038/latest Now for me it's interesting why is this happening only with virtio-net. With my realtek example m_data address is properly aligned.
I tried to track where the modification of m_data is happening that causes the issue. The mbuf in sys/sys/mbuf.h: struct mbuf { .. caddr_t m_data; /* location of data */ .. sys/sys/types.h: typedef char * caddr_t; /* core address */ The pointer to the m_data is being adjusted by m_adj() in sys/kern/uipc_mbuf.c: while (m != NULL && len > 0) { if (m->m_len <= len) { len -= m->m_len; m->m_len = 0; m = m->m_next; } else { m->m_len -= len; m->m_data += len; len = 0; } } In the crashing scenario with virtio it is adjusting the m_data by ETHER_HDR_LEN (14) pointing to an address that is not mod by 4. This is a problem later in usr/src/sys/netinet/udp_usrreq.c:409 where m_data is being accessed by ldm instruction. Shouldn't this be caught by compiler? Or is it ok in eyes of the compiler due to the fact it points to char* (and not e.g. void*). ?
I forgot to mentioned that the modification is happening in ether_demux() at sys/net/if_ethersubr.c: /* * Reset layer specific mbuf flags to avoid confusing upper layers. * Strip off Ethernet header. */ m->m_flags &= ~M_VLANTAG; m_clrprotoflags(m); m_adj(m, ETHER_HDR_LEN); Which calls the m_adj(); after this call m_data is pointing to unaligned address.
Created attachment 242085 [details] testcase: offseting the m_data by constant to avoid unaligned data access
I compared other working drivers (fxp, re) to virtio and I saw that the older ones did do a "magic" to offset the m_data by "constant" (FUDGE). While I don't personally think it's right thing to do but as part of the test I did it - I patched the vtnet_rx_alloc_buf() and adjusted m_data by VTNET_ALIGNMENT_FUDGE. It's the same value as in fxp driver. ETHER_HDR_LEN+2 = 0x10 which makes it aligned properly. I tested it in qemu with some traffic going on and so far so good.
Created attachment 242118 [details] fixed the KASSERT check I did previously Previous kassert I did didn't make sense. While it's probably not needed at all I did want to make sure I'm not subtracting more than I can.
Created attachment 242125 [details] apply the mbuf modification only if ARM arch is detected Modify this only when ARM architecture is detected.
Created attachment 242127 [details] offseting the m_data by constant to avoid unaligned data access Fix of the patch: set m_len to size if ARM arch is not detected.
https://reviews.freebsd.org/D43136 has what I think is a slightly better fix.
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=9e6d11ce9a51d75ed6a94e180f2fb4e9188a2ba4 commit 9e6d11ce9a51d75ed6a94e180f2fb4e9188a2ba4 Author: Warner Losh <imp@FreeBSD.org> AuthorDate: 2023-12-20 19:09:09 +0000 Commit: Warner Losh <imp@FreeBSD.org> CommitDate: 2023-12-21 04:16:45 +0000 vtnet: Adjust rx buffer so IP header 32-bit aligned Call madj(m, ETHER_ALIGN) to offset rx buffers when allocating them. This improves performance everywhere, and allows armv7 to work at all. PR: 271288 (PR had a different fix than I wound up with) MFC After: 3 days Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D43136 sys/dev/virtio/network/if_vtnet.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
^Triage: assgin to committer and set as optional mfc-stable13 reminder.
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=3be59adbb5a2ae7600d46432d3bc82286e507e95 commit 3be59adbb5a2ae7600d46432d3bc82286e507e95 Author: Warner Losh <imp@FreeBSD.org> AuthorDate: 2024-01-29 05:08:55 +0000 Commit: Warner Losh <imp@FreeBSD.org> CommitDate: 2024-01-29 05:08:55 +0000 vtnet: Adjust for ethernet alignment. If the header that we add to the packet's size is 0 % 4 and we're strictly aligning, then we need to adjust where we store the header so the packet that follows will have it's struct ip header properly aligned. We do this on allocation (and when we check the length of the mbufs in the lro_nomrg case). We can't just adjust the clustersz in the softc, because it's also used to allocate the mbufs and it needs to be the proper size for that. Since we otherwise use the size of the mbuf (or sometimes the smaller size of the received packet) to compute how much we can buffer, this ensures no overflows. The 2 byte adjustment also does not affect how many packets we can receive in the lro_nomrg case. PR: 271288 Sponsored by: Netflix Reviewed by: bryanv Differential Revision: https://reviews.freebsd.org/D43224 sys/dev/virtio/network/if_vtnet.c | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+)
Better fix is a better fix. Though getting no response for 9 months and then seeing patch "sponsored by netlifx" without any feedback .. I can't say it's very motivational to bug hunt @ FreeBSD then..
A commit in branch stable/14 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=a51aee0a9828238ebdf2f55448d97f1f00acc5d4 commit a51aee0a9828238ebdf2f55448d97f1f00acc5d4 Author: Warner Losh <imp@FreeBSD.org> AuthorDate: 2024-01-29 05:08:55 +0000 Commit: Warner Losh <imp@FreeBSD.org> CommitDate: 2024-02-19 04:47:16 +0000 vtnet: Adjust for ethernet alignment. If the header that we add to the packet's size is 0 % 4 and we're strictly aligning, then we need to adjust where we store the header so the packet that follows will have it's struct ip header properly aligned. We do this on allocation (and when we check the length of the mbufs in the lro_nomrg case). We can't just adjust the clustersz in the softc, because it's also used to allocate the mbufs and it needs to be the proper size for that. Since we otherwise use the size of the mbuf (or sometimes the smaller size of the received packet) to compute how much we can buffer, this ensures no overflows. The 2 byte adjustment also does not affect how many packets we can receive in the lro_nomrg case. PR: 271288 Sponsored by: Netflix Reviewed by: bryanv Differential Revision: https://reviews.freebsd.org/D43224 (cherry picked from commit 3be59adbb5a2ae7600d46432d3bc82286e507e95) sys/dev/virtio/network/if_vtnet.c | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+)
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=778eb6e1f6668864a6eddd2f6f9bcc2aa26344f3 commit 778eb6e1f6668864a6eddd2f6f9bcc2aa26344f3 Author: Warner Losh <imp@FreeBSD.org> AuthorDate: 2024-01-29 05:08:55 +0000 Commit: Warner Losh <imp@FreeBSD.org> CommitDate: 2024-02-19 04:46:43 +0000 vtnet: Adjust for ethernet alignment. If the header that we add to the packet's size is 0 % 4 and we're strictly aligning, then we need to adjust where we store the header so the packet that follows will have it's struct ip header properly aligned. We do this on allocation (and when we check the length of the mbufs in the lro_nomrg case). We can't just adjust the clustersz in the softc, because it's also used to allocate the mbufs and it needs to be the proper size for that. Since we otherwise use the size of the mbuf (or sometimes the smaller size of the received packet) to compute how much we can buffer, this ensures no overflows. The 2 byte adjustment also does not affect how many packets we can receive in the lro_nomrg case. PR: 271288 Sponsored by: Netflix Reviewed by: bryanv Differential Revision: https://reviews.freebsd.org/D43224 (cherry picked from commit 3be59adbb5a2ae7600d46432d3bc82286e507e95) sys/dev/virtio/network/if_vtnet.c | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+)
A commit in branch releng/13.3 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=fbbfcab6b850fd0125633cc87575b54f0929c1a8 commit fbbfcab6b850fd0125633cc87575b54f0929c1a8 Author: Warner Losh <imp@FreeBSD.org> AuthorDate: 2024-01-29 05:08:55 +0000 Commit: Warner Losh <imp@FreeBSD.org> CommitDate: 2024-02-19 06:59:40 +0000 vtnet: Adjust for ethernet alignment. If the header that we add to the packet's size is 0 % 4 and we're strictly aligning, then we need to adjust where we store the header so the packet that follows will have it's struct ip header properly aligned. We do this on allocation (and when we check the length of the mbufs in the lro_nomrg case). We can't just adjust the clustersz in the softc, because it's also used to allocate the mbufs and it needs to be the proper size for that. Since we otherwise use the size of the mbuf (or sometimes the smaller size of the received packet) to compute how much we can buffer, this ensures no overflows. The 2 byte adjustment also does not affect how many packets we can receive in the lro_nomrg case. PR: 271288 Sponsored by: Netflix Reviewed by: bryanv Differential Revision: https://reviews.freebsd.org/D43224 (cherry picked from commit 3be59adbb5a2ae7600d46432d3bc82286e507e95) (cherry picked from commit 778eb6e1f6668864a6eddd2f6f9bcc2aa26344f3) Approved-by: re (cperciva) sys/dev/virtio/network/if_vtnet.c | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+)
^Triage: committed and MFCed.