Bug 271288 - FreeBSD 13.2; Alignment Fault ; virtio-net
Summary: FreeBSD 13.2; Alignment Fault ; virtio-net
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: arm (show other bugs)
Version: 13.2-RELEASE
Hardware: arm Any
: --- Affects Some People
Assignee: Warner Losh
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-05-06 20:42 UTC by martin
Modified: 2024-02-19 07:00 UTC (History)
2 users (show)

See Also:
linimon: mfc-stable13?


Attachments
testcase: offseting the m_data by constant to avoid unaligned data access (541 bytes, patch)
2023-05-09 22:01 UTC, martin
no flags Details | Diff
fixed the KASSERT check I did previously (577 bytes, patch)
2023-05-11 12:35 UTC, martin
no flags Details | Diff
apply the mbuf modification only if ARM arch is detected (602 bytes, patch)
2023-05-11 20:06 UTC, martin
no flags Details | Diff
offseting the m_data by constant to avoid unaligned data access (609 bytes, patch)
2023-05-11 22:50 UTC, martin
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description martin 2023-05-06 20:42:16 UTC
In qemu, using FreeBSD-13.2-RELEASE-arm-armv7-GENERICSD.img image, with out without virtio network system fails to boot and crashes on data alignment fault.

With virtio VM started as:

qemu-system-arm -M virt -m 2048 -nic tap -bios u-boot.bin -hda FreeBSD-13.2-RELEASE-arm-armv7-GENERICSD.img -s \
-device virtio-net-pci,netdev=network0 -netdev tap,id=network0,br=br0

Crashes the system during network initialization:

DHCPDISCOVER on vtnet0 to 255.255.255.255 port 67 interval 7
Fatal kernel mode data abort: 'Alignment Fault' on read
trapframe: 0xd5ec5a60
FSR=00000001, FAR=d8e4f01a, spsr=20000013
r0 =00000000, r1 =00000001, r2 =00000001, r3 =d5ec5b4c
r4 =00000014, r5 =d8dd4300, r6 =d8e4f02e, r7 =00000134
r8 =00000000, r9 =00000134, r10=d8e4f01a, r11=d5ec5b90
r12=4400ffff, ssp=d5ec5af0, slr=c04a9728, pc =c04a9750

panic: Fatal abort
cpuid = 0
time = 1680843368
KDB: stack backtrace:
#0 0xc035786c at kdb_backtrace+0x48
#1 0xc02fdd20 at vpanic+0x140
#2 0xc02fdbe0 at vpanic+0
#3 0xc06304ac at abort_align+0
#4 0xc063052c at abort_align+0x80
#5 0xc063017c at abort_handler+0x480
#6 0xc060f480 at exception_exit+0
#7 0xc04a9750 at udp_input+0x288
#8 0xc0473f54 at ip_input+0x1e0
#9 0xc04447c0 at netisr_dispatch_src+0xf8
#10 0xc043bf2c at ether_demux+0x1a4
#11 0xc043d5e4 at ether_nh_input+0x480
#12 0xc04447c0 at netisr_dispatch_src+0xf8
#13 0xc043c404 at ether_input+0x50
#14 0xc01c0838 at vtnet_rx_vq_process+0x880
#15 0xc01b70d0 at vtpci_intx_intr+0xac
#16 0xc02b87f0 at ithread_loop+0x2ec
#17 0xc02b465c at fork_exit+0xc0
Uptime: 49s

Quick check with gdb reveals:

Breakpoint 2, udp_input (mp=<optimized out>, offp=<optimized out>, proto=17) at /usr/src/sys/netinet/udp_usrreq.c:504
504	in /usr/src/sys/netinet/udp_usrreq.c
=> 0xc04a9750 <udp_input+648>:	03 00 9a e8	ldm	r10, {r0, r1}
   0xc04a9754 <udp_input+652>:	00 20 a0 e3	mov	r2, #0
   0xc04a9758 <udp_input+656>:	08 30 da e5	ldrb	r3, [r10, #8]

(gdb) i r $r10 $cpsr
r10            0xd8ebc01a          -655638502
cpsr           0x20000013          536870931
(gdb)

I simplified the setup and removed the tap interface from the qemu setup. Machine boots but crashes while I try to SSH to it:

login: Fatal kernel mode data abort: 'Alignment Fault' on read
trapframe: 0xd5ecea18
FSR=00000001, FAR=e1141032, spsr=60000013
r0 =00000014, r1 =0000003c, r2 =00000010, r3 =00000010
r4 =e114102e, r5 =d8e84200, r6 =00000028, r7 =00000000
r8 =00000014, r9 =00000000, r10=00000028, r11=d5eceb88
r12=d8e84200, ssp=d5eceaa8, slr=c04847c8, pc =c0482e58

panic: Fatal abort
cpuid = 0
time = 1680843499
KDB: stack backtrace:
#0 0xc035786c at kdb_backtrace+0x48
#1 0xc02fdd20 at vpanic+0x140
#2 0xc02fdbe0 at vpanic+0
#3 0xc06304ac at abort_align+0
#4 0xc063052c at abort_align+0x80
#5 0xc063017c at abort_handler+0x480
#6 0xc060f480 at exception_exit+0
#7 0xc0482e58 at tcp_input_with_port+0x4c4
#8 0xc04847c8 at tcp_input+0x10
#9 0xc0473f54 at ip_input+0x1e0
#10 0xc04447c0 at netisr_dispatch_src+0xf8
#11 0xc043bf2c at ether_demux+0x1a4
#12 0xc043d5e4 at ether_nh_input+0x480
#13 0xc04447c0 at netisr_dispatch_src+0xf8
#14 0xc043c404 at ether_input+0x50
#15 0xc01c0838 at vtnet_rx_vq_process+0x880
#16 0xc01b70d0 at vtpci_intx_intr+0xac
#17 0xc02b87f0 at ithread_loop+0x2ec
Uptime: 1m17s

Breakpoint 3, tcp_fields_to_host (th=0xe114402e) at /usr/src/sys/netinet/tcp_var.h:1126
1126	/usr/src/sys/netinet/tcp_var.h: No such file or directory.
=> 0xc0482e58 <tcp_input_with_port+1220>:	03 00 94 e9	ldmib	r4, {r0, r1}

(gdb) i r $r4 $cpsr
r4             0xe114402e          -518766546
cpsr           0x60000013          1610612755
(gdb)

It seems data structures are not aligned to their native boundaries.
Comment 1 martin 2023-05-07 20:32:42 UTC
In my first example when I checked the udp_input() argument, struct mbuf **mp:

(gdb) p **(struct mbuf**)0xd5ecebc8
...
...
  m_data = 0xd91e501a "E\020\001.",

m_data here is struct ip* , first being used in sys/netinet/udp_usrreq.c:432:

 432         ip = mtod(m, struct ip *);

Crash occurs later here:

 504                         bcopy(((struct ipovly *)ip)->ih_x1, b, 9);

I don't understand the process how the mbuf gets allocated and used later in the connection. It seems though address for the ip structure is not aligned (maybe system was in thumb mode?).

I wondered if this is a driver problem, in this case virtio-net. I changed my configuration to use realtek card instead:

qemu-system-arm -M virt -m 2G -hda FreeBSD-13.2-RELEASE-arm-armv7-GENERICSD.img -s -bios u-boot.bin \
-device rtl8139,netdev=network0 -netdev user,id=network

And system boots and can be used without problem (at least the network tests I did were ok).
Comment 2 martin 2023-05-08 12:51:31 UTC
According to ARM docs: https://developer.arm.com/documentation/ddi0406/c/Application-Level-Architecture/Application-Level-Memory-Model/Alignment-support/Unaligned-data-access

ldm instruction will always produce exception on unaligned data access.

One of the examples how to fix the access with the hint to compiler: https://developer.arm.com/documentation/ka003038/latest

Now for me it's interesting why is this happening only with virtio-net. With my realtek example m_data address is properly aligned.
Comment 3 martin 2023-05-09 15:41:51 UTC
I tried to track where the modification of m_data is happening that causes the issue. 

The mbuf in sys/sys/mbuf.h:

struct mbuf {
..
        caddr_t          m_data;        /* location of data */
..


sys/sys/types.h:

typedef char *          caddr_t;        /* core address */


The pointer to the m_data is being adjusted by m_adj() in sys/kern/uipc_mbuf.c:

     while (m != NULL && len > 0) {
                        if (m->m_len <= len) {
                                len -= m->m_len;
                                m->m_len = 0;
                                m = m->m_next;
                        } else {
                                m->m_len -= len;
                                m->m_data += len;
                                len = 0;
                        }
                }

In the crashing scenario with virtio it is adjusting the m_data by ETHER_HDR_LEN (14) pointing to an address that is not mod by 4.

This is a problem later in usr/src/sys/netinet/udp_usrreq.c:409 where m_data is being accessed by ldm instruction.

Shouldn't this be caught by compiler? Or is it ok in eyes of the compiler due to the fact it points to char* (and not e.g. void*). ?
Comment 4 martin 2023-05-09 15:45:13 UTC
I forgot to mentioned that the modification is happening in ether_demux() at sys/net/if_ethersubr.c:

     /*
         * Reset layer specific mbuf flags to avoid confusing upper layers.
         * Strip off Ethernet header.
         */
        m->m_flags &= ~M_VLANTAG;
        m_clrprotoflags(m);
        m_adj(m, ETHER_HDR_LEN);

Which calls the m_adj(); after this call m_data is pointing to unaligned address.
Comment 5 martin 2023-05-09 22:01:47 UTC
Created attachment 242085 [details]
testcase: offseting the m_data by constant to avoid unaligned data access
Comment 6 martin 2023-05-09 22:03:07 UTC
I compared other working drivers (fxp, re) to virtio and I saw that the older ones did do a "magic" to offset the m_data by "constant" (FUDGE).

While I don't personally think it's right thing to do but as part of the test I did it - I patched the vtnet_rx_alloc_buf() and adjusted m_data by VTNET_ALIGNMENT_FUDGE. It's the same value as in fxp driver. ETHER_HDR_LEN+2 = 0x10 which makes it aligned properly.

I tested it in qemu with some traffic going on and so far so good.
Comment 7 martin 2023-05-11 12:35:22 UTC
Created attachment 242118 [details]
fixed the KASSERT check I did previously

Previous kassert I did didn't make sense. While it's probably not needed at all I did want to make sure I'm not subtracting more than I can.
Comment 8 martin 2023-05-11 20:06:28 UTC
Created attachment 242125 [details]
apply the mbuf modification only if ARM arch is detected

Modify this only when ARM architecture is detected.
Comment 9 martin 2023-05-11 22:50:14 UTC
Created attachment 242127 [details]
offseting the m_data by constant to avoid unaligned data access

Fix of the patch: set m_len to size if ARM arch is not detected.
Comment 10 Warner Losh freebsd_committer freebsd_triage 2023-12-20 20:49:18 UTC
https://reviews.freebsd.org/D43136 has what I think is a slightly better fix.
Comment 11 commit-hook freebsd_committer freebsd_triage 2023-12-21 04:21:48 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=9e6d11ce9a51d75ed6a94e180f2fb4e9188a2ba4

commit 9e6d11ce9a51d75ed6a94e180f2fb4e9188a2ba4
Author:     Warner Losh <imp@FreeBSD.org>
AuthorDate: 2023-12-20 19:09:09 +0000
Commit:     Warner Losh <imp@FreeBSD.org>
CommitDate: 2023-12-21 04:16:45 +0000

    vtnet: Adjust rx buffer so IP header 32-bit aligned

    Call madj(m, ETHER_ALIGN) to offset rx buffers when allocating them.
    This improves performance everywhere, and allows armv7 to work at all.

    PR:                     271288 (PR had a different fix than I wound up with)
    MFC After:              3 days
    Sponsored by:           Netflix
    Differential Revision:  https://reviews.freebsd.org/D43136

 sys/dev/virtio/network/if_vtnet.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Comment 12 Mark Linimon freebsd_committer freebsd_triage 2023-12-27 12:03:41 UTC
^Triage: assgin to committer and set as optional mfc-stable13 reminder.
Comment 13 commit-hook freebsd_committer freebsd_triage 2024-01-29 05:14:43 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=3be59adbb5a2ae7600d46432d3bc82286e507e95

commit 3be59adbb5a2ae7600d46432d3bc82286e507e95
Author:     Warner Losh <imp@FreeBSD.org>
AuthorDate: 2024-01-29 05:08:55 +0000
Commit:     Warner Losh <imp@FreeBSD.org>
CommitDate: 2024-01-29 05:08:55 +0000

    vtnet: Adjust for ethernet alignment.

    If the header that we add to the packet's size is 0 % 4 and we're
    strictly aligning, then we need to adjust where we store the header so
    the packet that follows will have it's struct ip header properly
    aligned.  We do this on allocation (and when we check the length of the
    mbufs in the lro_nomrg case). We can't just adjust the clustersz in the
    softc, because it's also used to allocate the mbufs and it needs to be
    the proper size for that. Since we otherwise use the size of the mbuf
    (or sometimes the smaller size of the received packet) to compute how
    much we can buffer, this ensures no overflows. The 2 byte adjustment
    also does not affect how many packets we can receive in the lro_nomrg
    case.

    PR:                     271288
    Sponsored by:           Netflix
    Reviewed by:            bryanv
    Differential Revision:  https://reviews.freebsd.org/D43224

 sys/dev/virtio/network/if_vtnet.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)
Comment 14 martin 2024-02-03 19:13:10 UTC
Better fix is a better fix. 

Though getting no response for 9 months and then seeing patch "sponsored by netlifx" without any feedback ..  I can't say it's very motivational to bug hunt @ FreeBSD then..
Comment 15 commit-hook freebsd_committer freebsd_triage 2024-02-19 04:49:20 UTC
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=a51aee0a9828238ebdf2f55448d97f1f00acc5d4

commit a51aee0a9828238ebdf2f55448d97f1f00acc5d4
Author:     Warner Losh <imp@FreeBSD.org>
AuthorDate: 2024-01-29 05:08:55 +0000
Commit:     Warner Losh <imp@FreeBSD.org>
CommitDate: 2024-02-19 04:47:16 +0000

    vtnet: Adjust for ethernet alignment.

    If the header that we add to the packet's size is 0 % 4 and we're
    strictly aligning, then we need to adjust where we store the header so
    the packet that follows will have it's struct ip header properly
    aligned.  We do this on allocation (and when we check the length of the
    mbufs in the lro_nomrg case). We can't just adjust the clustersz in the
    softc, because it's also used to allocate the mbufs and it needs to be
    the proper size for that. Since we otherwise use the size of the mbuf
    (or sometimes the smaller size of the received packet) to compute how
    much we can buffer, this ensures no overflows. The 2 byte adjustment
    also does not affect how many packets we can receive in the lro_nomrg
    case.

    PR:                     271288
    Sponsored by:           Netflix
    Reviewed by:            bryanv
    Differential Revision:  https://reviews.freebsd.org/D43224

    (cherry picked from commit 3be59adbb5a2ae7600d46432d3bc82286e507e95)

 sys/dev/virtio/network/if_vtnet.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)
Comment 16 commit-hook freebsd_committer freebsd_triage 2024-02-19 04:50:24 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=778eb6e1f6668864a6eddd2f6f9bcc2aa26344f3

commit 778eb6e1f6668864a6eddd2f6f9bcc2aa26344f3
Author:     Warner Losh <imp@FreeBSD.org>
AuthorDate: 2024-01-29 05:08:55 +0000
Commit:     Warner Losh <imp@FreeBSD.org>
CommitDate: 2024-02-19 04:46:43 +0000

    vtnet: Adjust for ethernet alignment.

    If the header that we add to the packet's size is 0 % 4 and we're
    strictly aligning, then we need to adjust where we store the header so
    the packet that follows will have it's struct ip header properly
    aligned.  We do this on allocation (and when we check the length of the
    mbufs in the lro_nomrg case). We can't just adjust the clustersz in the
    softc, because it's also used to allocate the mbufs and it needs to be
    the proper size for that. Since we otherwise use the size of the mbuf
    (or sometimes the smaller size of the received packet) to compute how
    much we can buffer, this ensures no overflows. The 2 byte adjustment
    also does not affect how many packets we can receive in the lro_nomrg
    case.

    PR:                     271288
    Sponsored by:           Netflix
    Reviewed by:            bryanv
    Differential Revision:  https://reviews.freebsd.org/D43224

    (cherry picked from commit 3be59adbb5a2ae7600d46432d3bc82286e507e95)

 sys/dev/virtio/network/if_vtnet.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)
Comment 17 commit-hook freebsd_committer freebsd_triage 2024-02-19 07:00:53 UTC
A commit in branch releng/13.3 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=fbbfcab6b850fd0125633cc87575b54f0929c1a8

commit fbbfcab6b850fd0125633cc87575b54f0929c1a8
Author:     Warner Losh <imp@FreeBSD.org>
AuthorDate: 2024-01-29 05:08:55 +0000
Commit:     Warner Losh <imp@FreeBSD.org>
CommitDate: 2024-02-19 06:59:40 +0000

    vtnet: Adjust for ethernet alignment.

    If the header that we add to the packet's size is 0 % 4 and we're
    strictly aligning, then we need to adjust where we store the header so
    the packet that follows will have it's struct ip header properly
    aligned.  We do this on allocation (and when we check the length of the
    mbufs in the lro_nomrg case). We can't just adjust the clustersz in the
    softc, because it's also used to allocate the mbufs and it needs to be
    the proper size for that. Since we otherwise use the size of the mbuf
    (or sometimes the smaller size of the received packet) to compute how
    much we can buffer, this ensures no overflows. The 2 byte adjustment
    also does not affect how many packets we can receive in the lro_nomrg
    case.

    PR:                     271288
    Sponsored by:           Netflix
    Reviewed by:            bryanv
    Differential Revision:  https://reviews.freebsd.org/D43224

    (cherry picked from commit 3be59adbb5a2ae7600d46432d3bc82286e507e95)
    (cherry picked from commit 778eb6e1f6668864a6eddd2f6f9bcc2aa26344f3)

    Approved-by: re (cperciva)

 sys/dev/virtio/network/if_vtnet.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)