Bug 271759 - if_ure(4) panic: unaligned access
Summary: if_ure(4) panic: unaligned access
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: arm (show other bugs)
Version: 13.2-STABLE
Hardware: arm Any
: --- Affects Some People
Assignee: freebsd-usb (Nobody)
URL: https://github.com/freebsd/freebsd-sr...
Keywords: crash
Depends on:
Blocks:
 
Reported: 2023-05-31 22:36 UTC by Vincent Milum Jr
Modified: 2023-08-06 08:00 UTC (History)
5 users (show)

See Also:


Attachments
Patch (1.64 KB, patch)
2023-06-02 08:46 UTC, Hans Petter Selasky
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Vincent Milum Jr 2023-05-31 22:36:54 UTC
https://github.com/freebsd/freebsd-src/blob/main/sys/dev/usb/net/if_ure.c#LL616C1-L617C24

ure_makembuf() has the following code:

	/* uether_newbuf does this. */
	m_adj(m, ETHER_ALIGN);

based on that comment, it looks like someone copied m_adj() over because the base usb ethernet driver has it, without really testing it or setting up the mbuf correctly.

when m_adj is called here, m->m_len is 0 at this point so the call to m_adj() does nothing.

because of this, the data is never shifted by 2 bytes, causing an unaligned access to the IP headers when byte swapping later in the stack when converting from network-order to host-order, causing a panic on 32-bit ARM systems.


NOTE: virtio-net has a similar unaligned access issue, however that driver isn't calling m_adj() at all it looks like. This is referenced over in: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=271288
Comment 1 Vincent Milum Jr 2023-05-31 22:39:06 UTC
removing the call to

m_adj(m, ETHER_ALIGN);

and instead doing

m->m_data += ETHER_ALIGN;

allows the driver to compile and function, but I know this is wrong, probably causing a buffer overflow.

someone with more knowledge of mbuf allocation and management will probably need to take a look at both of these drivers mentioned.
Comment 2 Mark Millard 2023-06-01 01:48:44 UTC
https://cgit.freebsd.org/src/commit/sys/dev/usb/net/if_ure.c?id=7d5522e16a2970ade4e79e647842841cd32a19e1

is the commit that adds the code:

author		John-Mark Gurney <jmg@FreeBSD.org>	2020-09-12 00:33:11 +0000
committer	John-Mark Gurney <jmg@FreeBSD.org>	2020-09-12 00:33:11 +0000
. . .
A major update to the ure driver.
This update adds support for:
HW VLAN tagging
HW checksum offload for IPv4 and IPv6
tx and rx aggreegation (for full gige speeds)
multiple transactions

In my testing, I am able to get 900-950Mbps depending upon
TCP or UDP, which is a significant improvement over the previous
91Mbps (~8kint/sec*1500bytes/packet*1packet/int).

Reviewed by:	hselasky
MFC after:	2 months
Differential Revision:	https://reviews.freebsd.org/D25809

The reviewer (hselasky) would also be someone good to contact
for questions.
Comment 3 Mike Karels freebsd_committer freebsd_triage 2023-06-01 02:58:36 UTC
(In reply to Vincent Milum Jr from comment #1)
This change should not cause a buffer overflow, because the allocation just above uses len + ETHER_ALIGN, leaving len left after the m_adj
Comment 4 Hans Petter Selasky freebsd_committer freebsd_triage 2023-06-01 08:41:33 UTC
Hi,

I think the right fix is to change:

	/* uether_newbuf does this. */
	m_adj(m, ETHER_ALIGN);

	m->m_pkthdr.len = len;

Into:

	m->m_pkthdr.len = len + ETHER_ALIGN;

	/* ensure proper alignment of IP header */
	m_adj(m, ETHER_ALIGN);


Can you test that?

Using m_adj() is more clean than using m_data +=, because m_len should also be updated!

--HPS
Comment 5 Hans Petter Selasky freebsd_committer freebsd_triage 2023-06-01 08:42:18 UTC
Can you also check of other USB ethernet drivers have the same pattern?

--HPS
Comment 6 Mike Karels freebsd_committer freebsd_triage 2023-06-01 12:24:29 UTC
(In reply to Hans Petter Selasky from comment #4)
I don't think that setting the pkthdr len will help; m_adj is using the individual m_len values, and only adjusting the pkthdr len.
Comment 7 Hans Petter Selasky freebsd_committer freebsd_triage 2023-06-01 17:41:23 UTC
Or set both?

m_adj() has a check for the packet header flag.

--HPS
Comment 8 Mark Millard 2023-06-01 19:22:33 UTC
So, what steps to I take to demonstrate the crash?

Context: RPi2B v1.1 (so: armv7), with an EtherNet USB dongle,

# uname -apKU
FreeBSD generic 14.0-CURRENT FreeBSD 14.0-CURRENT #0 main-n263002-743516d51fa7: Thu May 18 11:11:33 UTC 2023     root@releng1.nyi.freebsd.org:/usr/obj/usr/src/arm.armv7/sys/GENERIC arm armv7 1400088 1400088

# usbconfig show_ifdrv

. . .
ugen1.4: <Realtek USB 10/100/1000 LAN> at usbus1, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=ON (200mA)
ugen1.4.0: ure0: <Realtek USB 10/100/1000 LAN, class 0/0, rev 2.10/30.00, addr 4>
. . .

So far it seems to be working fine.
Comment 9 Mark Millard 2023-06-01 19:25:00 UTC
(In reply to Mark Millard from comment #8)

Already fixed in main/CURRENT? Some other difference in stable/13 ?

I do not have a stable/13 context handy.
Comment 10 Vincent Milum Jr 2023-06-01 19:41:13 UTC
In my case, I've tried both 13.1-RELEASE and 13.2-RELEASE on multiple RPi 1 and 2 boards. They all consistently crash with a simple "fetch" request on the command line.
Comment 11 Mark Millard 2023-06-01 20:12:36 UTC
(In reply to Vincent Milum Jr from comment #10)

I decided that I was going to replace the armv7 main snapshot
I've been using anyway, so I tried stable/13 and then main
with the specific activity of fetch involved for today's
snapshots. Both fail. Fetch works well as a test case.

Details:

# uname -apKU
FreeBSD generic 13.2-STABLE FreeBSD 13.2-STABLE stable/13-n255472-cf3a76018cad GENERIC arm armv7 1302505 1302505

# fetch http://ftp3.freebsd.org/pub/FreeBSD/snapshots/ISO-IMAGES/13.2/CHECKSUM.SHA256-FreeBSD-13.2-STABLE-arm-armv7-GENERICSD-20230601-cf3a76018cad-255472
Fatal kernel mode data abort: 'Alignment Fault' on read
. . .

Then:

# uname -apKU
FreeBSD generic 14.0-CURRENT FreeBSD 14.0-CURRENT #0 main-n263302-4f2cc73f34eb: Thu Jun  1 11:13:49 UTC 2023     root@releng1.nyi.freebsd.org:/usr/obj/usr/src/arm.armv7/sys/GENERIC arm armv7 1400089 1400089

# fetch http://ftp3.freebsd.org/pub/FreeBSD/snapshots/ISO-IMAGES/14.0/CHECKSUM.SHA256-FreeBSD-14.0-CURRENT-arm-armv7-GENERICSD-20230601-4f2cc73f34eb-263302
Fatal kernel mode data abort: 'Alignment Fault' on read
. . .
Comment 12 Mark Millard 2023-06-01 21:04:07 UTC
I tried the OrangePi+ 2e with the same boot media
and EtherNet dongle (instead of its builtin EtherNet
port) and got a somewhat different result: An initial
"Kernel page fault with the following non-sleepable
locks held" before the 'Alignment Fault' on read:

. . .
login: root
Password:Kernel page fault with the following non-sleepable locks held:
shared rm in6_ifaddr_lock (in6_ifaddr_lock) r = 0 (0xc0b5acd0) locked @ /usr/src/sys/netinet6/in6.c:1603
stack backtrace:
#0 0xc0360944 at witness_debugger+0x74
#1 0xc0361bd8 at witness_warn+0x424
#2 0xc061382c at abort_handler+0x1d8
#3 0xc05f22fc at exception_exit+0
#4 0xc049b224 at in6ifa_ifwithaddr+0x40
#5 0xc04acdd0 at ip6_input+0xd54
#6 0xc0425d34 at netisr_dispatch_src+0x100
#7 0xc041ca9c at ether_demux+0x1bc
#8 0xc041e28c at ether_nh_input+0x3dc
#9 0xc0425d34 at netisr_dispatch_src+0x100
#10 0xc041cf24 at ether_input+0xf0
#11 0xc01aab34 at uether_rxflush+0x8c
#12 0xe0813018 at $a.10+0x4d4
#13 0xc01a0ea4 at usbd_callback_wrapper+0x7e0
#14 0xc01a21b8 at usb_command_wrapper+0x128
#15 0xc01a1038 at usb_callback_proc+0x90
#16 0xc019bd40 at usb_process+0x10c
#17 0xc02a5388 at fork_exit+0xa0
Fatal kernel mode data abort: 'Alignment Fault' on read
trapframe: 0xc708ca50
FSR=00000001, FAR=e032e776, spsr=00000013
r0 =da692000, r1 =00000001, r2 =ffffffff, r3 =c0b2b548
r4 =00000000, r5 =00000000, r6 =e032e776, r7 =e032e766
r8 =c091af44, r9 =00000000, r10=e0658c00, r11=c708cb10
r12=00000000, ssp=c708cae0, slr=c02e330c, pc =c049b224
. . .
Comment 13 Mark Millard 2023-06-01 21:18:53 UTC
(In reply to Mark Millard from comment #12)

Looks like that failure is associated with my having added to
/etc/rc.conf :

ntpd_enable="YES"
ntpd_sync_on_start="YES"
ntpd_user="root"

No login attempt is required to get the failure. Just
leave it at the login prompt and wait a little bit.
Comment 14 Mike Karels freebsd_committer freebsd_triage 2023-06-02 02:17:45 UTC
(In reply to Hans Petter Selasky from comment #7)
m_len on the first mbuf could be set to ETHER_ALIGN, and pkthdr.len as you suggested.  I'm assuming the mbuf lengths are set as they are filled in, as they are apparently zero to start.
Comment 15 Hans Petter Selasky freebsd_committer freebsd_triage 2023-06-02 08:46:18 UTC
Created attachment 242554 [details]
Patch

I looked at the code and propose the attached solution.

Please test and review!
Comment 16 Hans Petter Selasky freebsd_committer freebsd_triage 2023-06-02 08:47:05 UTC
markj@ : Can you have a quick look at the patch attached to this issue?
Comment 17 Hans Petter Selasky freebsd_committer freebsd_triage 2023-06-02 08:51:18 UTC
Looking at the other USB ethernet drivers, this was the only place I could find, needing to be patched.
Comment 18 Vincent Milum Jr 2023-06-02 23:57:07 UTC
(In reply to Hans Petter Selasky from comment #15)

Looking at what m_adj does, it doesn't move the data around, it only adjusts the pointer to the data. That won't fix alignment, it only chops off data from the front of the buffer.

m_adj needs to be called before data is put into the buffer, but after the length of the buffer is declared. that's at least the impression im getting looking over all these source files and how other drivers are doing it.

https://github.com/freebsd/freebsd-src/blob/8dad5ece49479ba6cdcd5bb4c2799bbd61add3e6/sys/kern/uipc_mbuf.c#L824
Comment 19 Mark Millard 2023-06-03 02:03:37 UTC
May be it would be a good idiea to have an actual backtrace
showing an example of where the fetch style test fails
( tcp_input_with_port+0x650 ):

# fetch http://ftp3.freebsd.org/pub/FreeBSD/snapshots/ISO-IMAGES/14.0/CHECKSUM.SHA256-FreeBSD-14.0-CURRENT-arm-armv7-GENERICSD-20230601-4f2cc73f34eb-263302
Fatal kernel mode data abort: 'Alignment Fault' on read
trapframe: 0xc70c89f0
FSR=00000001, FAR=e0326d76, spsr=60000013
r0 =000000a0, r1 =00000014, r2 =00000000, r3 =c0b5df40
r4 =e0326d72, r5 =00000028, r6 =00000020, r7 =e0326d5e
r8 =00000028, r9 =00000014, r10=00000000, r11=c70c8b88
r12=e0326d00, ssp=c70c8a80, slr=c0467218, pc =c0467834

panic: Fatal abort
cpuid = 3
time = 1262304178
KDB: stack backtrace:
db_trace_self() at db_trace_self
         pc = 0xc05ef9a4  lr = 0xc007a694 (db_trace_self_wrapper+0x30)
         sp = 0xc70c87a8  fp = 0xc70c88c0
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
         pc = 0xc007a694  lr = 0xc02eb598 (vpanic+0x140)
         sp = 0xc70c88c8  fp = 0xc70c88e8
         r4 = 0x00000100  r5 = 0x00000000
         r6 = 0xc075c866  r7 = 0xc0aeaeb0
vpanic() at vpanic+0x140
         pc = 0xc02eb598  lr = 0xc02eb378 (doadump)
         sp = 0xc70c88f0  fp = 0xc70c88f4
         r4 = 0xc70c89f0  r5 = 0x00000013
         r6 = 0xe0326d76  r7 = 0x00000001
         r8 = 0x00000001  r9 = 0xda6cb000
        r10 = 0xe0326d76
doadump() at doadump
         pc = 0xc02eb378  lr = 0xc0613e58 (abort_align)
         sp = 0xc70c88fc  fp = 0xc70c8928
         r4 = 0xe0326d76  r5 = 0xc70c88f4
         r6 = 0xc02eb378 r10 = 0xc70c88fc
abort_align() at abort_align
         pc = 0xc0613e58  lr = 0xc0613ecc (abort_align+0x74)
         sp = 0xc70c8930  fp = 0xc70c8948
         r4 = 0x00000013 r10 = 0xe0326d76
abort_align() at abort_align+0x74
         pc = 0xc0613ecc  lr = 0xc0613aec (abort_handler+0x498)
         sp = 0xc70c8950  fp = 0xc70c89e8
         r4 = 0x00000000 r10 = 0xe0326d76
abort_handler() at abort_handler+0x498
         pc = 0xc0613aec  lr = 0xc05f22fc (exception_exit)
         sp = 0xc70c89f0  fp = 0xc70c8b88
         r4 = 0xe0326d72  r5 = 0x00000028
         r6 = 0x00000020  r7 = 0xe0326d5e
         r8 = 0x00000028  r9 = 0x00000014
        r10 = 0x00000000
exception_exit() at exception_exit
         pc = 0xc05f22fc  lr = 0xc0467218 (tcp_input_with_port+0x34)
         sp = 0xc70c8a80  fp = 0xc70c8b88
         r0 = 0x000000a0  r1 = 0x00000014
         r2 = 0x00000000  r3 = 0xc0b5df40
         r4 = 0xe0326d72  r5 = 0x00000028
         r6 = 0x00000020  r7 = 0xe0326d5e
         r8 = 0x00000028  r9 = 0x00000014
        r10 = 0x00000000 r12 = 0xe0326d00
tcp_input_with_port() at tcp_input_with_port+0x650
         pc = 0xc0467834  lr = 0xc046923c (tcp_input+0x10)
         sp = 0xc70c8b90  fp = 0xc70c8b90
         r4 = 0x00000004  r5 = 0x00000000
         r6 = 0xda96a800  r7 = 0xc0b5df40
         r8 = 0xe0326d5e  r9 = 0xc094cd70
        r10 = 0x00000014
tcp_input() at tcp_input+0x10
         pc = 0xc046923c  lr = 0xc0455eac (ip_input+0x454)
         sp = 0xc70c8b98  fp = 0xc70c8bd8
ip_input() at ip_input+0x454
         pc = 0xc0455eac  lr = 0xc0425d34 (netisr_dispatch_src+0x100)
         sp = 0xc70c8be0  fp = 0xc70c8c08
         r4 = 0x00000005  r5 = 0xe0326d00
         r6 = 0x00000000  r7 = 0xc0b5a2f8
         r8 = 0x00000000  r9 = 0xc5779a00
        r10 = 0x00000008
netisr_dispatch_src() at netisr_dispatch_src+0x100
         pc = 0xc0425d34  lr = 0xc041ca9c (ether_demux+0x1bc)
         sp = 0xc70c8c10  fp = 0xc70c8c28
         r4 = 0xe0326d00  r5 = 0x00000001
         r6 = 0xda965000  r7 = 0x5e4a6f28
         r8 = 0x00000000  r9 = 0xc5779a00
        r10 = 0x00000008
ether_demux() at ether_demux+0x1bc
         pc = 0xc041ca9c  lr = 0xc041e28c (ether_nh_input+0x3dc)
         sp = 0xc70c8c30  fp = 0xc70c8c58
         r4 = 0xda965000  r5 = 0xe0326d00
         r6 = 0xe0326d50 r10 = 0x00000008
ether_nh_input() at ether_nh_input+0x3dc
         pc = 0xc041e28c  lr = 0xc0425d34 (netisr_dispatch_src+0x100)
         sp = 0xc70c8c60  fp = 0xc70c8c88
         r4 = 0x0000000d  r5 = 0xe0326d00
         r6 = 0x00000000  r7 = 0xc0b5a378
         r8 = 0x5e4a6f28  r9 = 0x00000000
        r10 = 0x00000000
netisr_dispatch_src() at netisr_dispatch_src+0x100
         pc = 0xc0425d34  lr = 0xc041cf24 (ether_input+0xf0)
         sp = 0xc70c8c90  fp = 0xc70c8cc8
         r4 = 0xda965000  r5 = 0x00000000
         r6 = 0xe0326d00  r7 = 0x00000000
         r8 = 0x5e4a6f28  r9 = 0x00000000
        r10 = 0x00000000
ether_input() at ether_input+0xf0
         pc = 0xc041cf24  lr = 0xc01aab34 (uether_rxflush+0x8c)
         sp = 0xc70c8cd0  fp = 0xc70c8d00
         r4 = 0xdb63cc00  r5 = 0xe0326d00
         r6 = 0x00000000  r7 = 0x00000000
         r8 = 0xda965000  r9 = 0xc094440c
        r10 = 0xda965000
uether_rxflush() at uether_rxflush+0x8c
         pc = 0xc01aab34  lr = 0xe0713018 ($a.10+0x4d4)
         sp = 0xc70c8d08  fp = 0xc70c8d80
         r4 = 0xe072b534  r5 = 0xe05e7f60
         r6 = 0x00000068  r7 = 0x01000000
         r8 = 0xe05e71c0  r9 = 0xdb63cc00
$a.10() at $a.10+0x4d4
         pc = 0xe0713018  lr = 0xc01a0ea4 (usbd_callback_wrapper+0x7e0)
         sp = 0xc70c8d88  fp = 0xc70c8dc0
         r4 = 0xdb7dac00  r5 = 0xe05e7273
         r6 = 0xe05e71c0  r7 = 0xe05e7000
         r8 = 0xda6d2c78  r9 = 0xe05e7030
        r10 = 0xc09670bc
usbd_callback_wrapper() at usbd_callback_wrapper+0x7e0
         pc = 0xc01a0ea4  lr = 0xc01a21b8 (usb_command_wrapper+0x128)
         sp = 0xc70c8dc8  fp = 0xc70c8de0
         r4 = 0xe05e7030  r5 = 0xc09670bc
         r6 = 0xc0798d04  r7 = 0xc0747886
         r8 = 0x00000000  r9 = 0xc07ab12a
        r10 = 0xda6d2cec
usb_command_wrapper() at usb_command_wrapper+0x128
         pc = 0xc01a21b8  lr = 0xc01a1038 (usb_callback_proc+0x90)
         sp = 0xc70c8de8  fp = 0xc70c8df0
         r4 = 0xe05e7000  r5 = 0xda6d2ce4
         r6 = 0xe05e7044  r7 = 0xc076ab50
         r8 = 0x00000000  r9 = 0xc096711c
usb_callback_proc() at usb_callback_proc+0x90
         pc = 0xc01a1038  lr = 0xc019bd40 (usb_process+0x10c)
         sp = 0xc70c8df8  fp = 0xc70c8e18
         r4 = 0xda6d2cdc r10 = 0xda6d2cec
usb_process() at usb_process+0x10c
         pc = 0xc019bd40  lr = 0xc02a5388 (fork_exit+0xa0)
         sp = 0xc70c8e20  fp = 0xc70c8e38
         r4 = 0xda6cb000  r5 = 0xd79d7740
         r6 = 0xc019bc34  r7 = 0xda6d2cdc
         r8 = 0xc70c8e40  r9 = 0x00000000
        r10 = 0x00000000
fork_exit() at fork_exit+0xa0
         pc = 0xc02a5388  lr = 0xc05f2290 (swi_exit)
         sp = 0xc70c8e40  fp = 0x00000000
         r4 = 0xc019bc34  r5 = 0xda6d2cdc
         r6 = 0x00000000  r7 = 0x00000000
         r8 = 0x00000000 r10 = 0x00000000
swi_exit() at swi_exit
         pc = 0xc05f2290  lr = 0xc05f2290 (swi_exit)
         sp = 0xc70c8e40  fp = 0x00000000
KDB: enter: panic
[ thread pid 14 tid 100092 ]
Stopped at      kdb_enter+0x54: ldrb    r15, [r15, r15, ror r15]!
Comment 20 Vincent Milum Jr 2023-06-03 02:30:16 UTC
As I mentioned in the initial post, the "crash" happens specifically when byte swapping from network endian to local machine endian. This requires a 4-byte aligned access due to how the byte swapping works.

MAC headers are 14-bytes, which causes the items in the IP header to be misaligned by 2-bytes.

The point of m_adj that I can see from other drives and various notes is to start the MAC header misaligned by 2 bytes, in order to line up the IP header on the 4-byte boundary.

The various "simple" fixes proposed here, I had already tried before creating the bug fix, each having their own issues.

This is why I was hoping someone who knew more about m_buf allocations could chime in as to why it doesn't assign a m_len, and to figure out how to properly set that so things don't break.
Comment 21 Graham Perrin freebsd_committer freebsd_triage 2023-06-03 16:52:55 UTC
^Triage: make former assignee freebsd-arm@ a CC recipient.
Comment 22 Mark Millard 2023-06-03 17:59:48 UTC
Back in 2016 armv7 SCTLR was set to allow unaligned access. But
some instructions impose alignment restrictions of there own,
despite the setting.

There is the following for telling clang/gcc to always allow
misaligned (by avoiding the instructions that do not allow it,
as I understand):

# cc -x c -dM -E -munaligned-access /dev/null | grep -i unaligned
#define __ARM_FEATURE_UNALIGNED 1

that is not enabled by default in system clang:

# cc -x c -dM -E /dev/null | grep -i unaligned 
#

FreeBSD's build environment is not currently causing
-munaligned-access to always be in use for armv7.
(I'm not sure it is ever supplied.)

This might be a direction to go to avoid having to
manage alignment in special contexts (that are not
necessarily obvious up front, being dependent on the
code generation choices in the compiler).

However, misalignment could have performance
consequences so avoiding the panic might not be
considered a complete solution for if_ure.c .


For reference:


author	Ian Lepore <ian@FreeBSD.org>	2016-05-26 00:03:23 +0000
committer	Ian Lepore <ian@FreeBSD.org>	2016-05-26 00:03:23 +0000
commit	40cb6df5f3ae8aa0c9839dda49dd5a6357d3c69a (patch)
. . .
Disable alignment faults on armv6, adjust various alignment-related macros
to match the new state of affairs.  The hardware we support has always been
able to do unaligned accesses, we've just never enabled it until now.

This brings FreeBSD into line with all the other major OSes, and should help
with the growing volume of 3rd-party software that assumes unaligned access
will just work on armv6 and armv7.


From a if_ure.o.meta file:

cc -mcpu=cortex-a7 -target armv7-gnueabihf-freebsd14.0 --sysroot=/usr/obj/BUILDs/main-CA7-nodbg-clang/usr/main-src/arm.armv7/tmp -B/usr/obj/BUILDs/main-CA7-nodbg-clang/usr/main-src/arm.armv7/tmp/u
sr/bin  -O2 -pipe -fno-common  -fno-strict-aliasing -Werror -D_KERNEL -DKLD_MODULE -DKLD_TIED -nostdinc   -DHAVE_KERNEL_OPTION_HEADERS -include /usr/obj/BUILDs/main-CA7-nodbg-clang/usr/main-src/arm.ar
mv7/sys/GENERIC-NODBG-CA7/opt_global.h -I. -I/usr/main-src/sys -I/usr/main-src/sys/contrib/ck/include -fno-common -g  -funwind-tables -fdebug-prefix-map=./machine=/usr/main-src/sys/arm/include -I/usr/
obj/BUILDs/main-CA7-nodbg-clang/usr/main-src/arm.armv7/sys/GENERIC-NODBG-CA7    -march=armv7a -ffreestanding -fwrapv -fstack-protector -Wall -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -W
cast-qual -Wundef -Wno-pointer-sign -D__printf__=__freebsd_kprintf__ -Wmissing-include-dirs -fdiagnostics-show-option -Wno-unknown-pragmas -Wno-error=tautological-compare -Wno-error=empty-body -Wno-er
ror=parentheses-equality -Wno-error=unused-function -Wno-error=pointer-sign -Wno-error=shift-negative-value -Wno-address-of-packed-member -Wno-error=strict-prototypes -Wno-error=unused-but-set-variabl
e -Wno-format-zero-length   -mno-movt -mfpu=none  -std=iso9899:1999 -c /usr/main-src/sys/dev/usb/net/if_ure.c -o if_ure.o
Comment 23 Mark Millard 2023-06-03 18:06:09 UTC
(In reply to Mark Millard from comment #22)

I should have noted:

I cause the "-mcpu=cortex-a7" in my environment. It is
not part of a normal FreeBSD build. I grabbed the .meta
text from one of my builds and so it had the extra
command line option.
Comment 24 Mark Millard 2023-06-03 20:41:26 UTC
(In reply to Mark Millard from comment #22)

So much for the -munaligned-access idea: the compiler
still generates the ldmib and stmib instructions that
have the alignment requirement.

Unless there is another compiler option that would lead
to avoiding such instructions, this direction looks to
be a bust.
Comment 25 Mark Millard 2023-06-03 21:10:12 UTC
(In reply to Mark Millard from comment #24)

By the way, I did check if I had the orientation backwards:
-mno-unaligned-access also did not change the use of the
instructions that require alignment.

For reference, -mno-unaligned-access was what my initial
note should have referenced: it is what forces byte-by-byte
access for fields in packed struct's in order to allow
misaligned data in packed struct's: it avoids some forms
of unaligned access.
Comment 26 Hans Petter Selasky freebsd_committer freebsd_triage 2023-06-05 08:05:45 UTC
(In reply to Vincent Milum Jr from comment #18)

If you look carefully at my patch, I copy two bytes earlier from the USB buffer, because there is always a header first in the USB payload. And then I just strip those two bytes, and it is good!

Do you see?

--HPS
Comment 27 Hans Petter Selasky freebsd_committer freebsd_triage 2023-06-05 08:11:45 UTC
From the comments I see there may be two issues:

One issue in the USB driver (if_ure), to get the IP header aligned. Can we agree on some solution here. I think using m_adj() avoids future problems, because we have some magic mbuf types, and doing custom mbuf manipulation might not be good. Then all USB ethernet drivers should be aligned.

Second issue in the ARM area, to allow unaligned memory access. I have my doubts this is possible. Personally I think reading 32-bit memory locations in a single instruction and memory access, is the best!

--HPS
Comment 28 Hans Petter Selasky freebsd_committer freebsd_triage 2023-06-06 07:06:41 UTC
Ping - can you also test the patch on ARM? Else I will just test it on AMD64 by asserting the offset before submitting the patch.

My patch is basically:

Copy ETHER_ALIGN bytes more, and start ETHER_ALIGN bytes earlier in the USB payload. Then m_adj() when the mbuf chain is ready. The mbuf chain is already sized for that.

--HPS
Comment 29 Mark Millard 2023-06-06 17:05:01 UTC
(In reply to Hans Petter Selasky from comment #28)

It likely will not be me doing testing for now: $work instead.
Comment 30 Mike Karels freebsd_committer freebsd_triage 2023-06-06 17:52:26 UTC
(In reply to Hans Petter Selasky from comment #27)
My take on the patch: it seems a bit cumbersome, but should work.  It doesn't add any new direct manipulation of mbuf fields.  I don't have hardware to test though.

On the second issue, allowing unaligned accesses: I think that would be a bad idea, as it would force many things to use sub-optimal code.  Much better to prevent unaligned accesses here, as everywhere else already does.
Comment 31 Vincent Milum Jr 2023-06-07 22:11:33 UTC
I just tried the patch provided on my Raspberry Pi 2 B v1.1 (ARMv7)

Initial TCP connections are now working (eg: fetching HTTP content, pkg)

And now, getting a new panic elsewhere:


root@rpi2 ~# speedtest-cli
Retrieving speedtest.net configuration...
Fatal kernel mode data abort: 'Translation Fault (L1)' on read
trapframe: 0xc49a4c80
FSR=00000005, FAR=00000010, spsr=20000013
r0 =d71e8148, r1 =ddb420cc, r2 =00000000, r3 =d71e81fe
r4 =000000ba, r5 =00000000, r6 =00000200, r7 =d791fb20
r8 =000000b6, r9 =000000cc, r10=00000002, r11=c49a4d70
r12=00000000, ssp=c49a4d10, slr=c01841d8, pc =ddc129f0

panic: Fatal abort
cpuid = 2
time = 1686142856
KDB: stack backtrace:
#0 0xc0355b8c at kdb_backtrace+0x48
#1 0xc02fcb6c at vpanic+0x170
#2 0xc02fc9fc at vpanic+0
#3 0xc061bc14 at abort_align+0
#4 0xc061b704 at abort_handler+0x2a0
#5 0xc05fac98 at exception_exit+0
#6 0xddc129f0 at $a.10+0x1c4
#7 0xc019e54c at usbd_callback_wrapper+0x918
#8 0xc019ff48 at usb_command_wrapper+0xd4
#9 0xc019e968 at usb_callback_proc+0x148
#10 0xc0199648 at usb_process+0x134
#11 0xc02b4b54 at fork_exit+0xc0
#12 0xc05fac28 at swi_exit+0
Uptime: 45s
Resetting system ...
Comment 32 Hans Petter Selasky freebsd_committer freebsd_triage 2023-06-07 23:03:15 UTC
Hi Vincent,

Can you open the kernel using kgdb and then enter:

info line *(usbd_callback_wrapper+0x918)

You should get something like this (but this is from amd64, so the line number is wrong for sure).

Line 2512 of "/vol/freebsd-src/sys/dev/usb/usb_transfer.c"
   starts at address 0xffffffff8095f042 <usbd_callback_wrapper+2322>
   and ends at 0xffffffff8095f04f <usbd_callback_wrapper+2335>.

This looks like another issue. Let's see if this backtrace is correct!

--HPS
Comment 33 Graham Perrin freebsd_committer freebsd_triage 2023-07-09 09:26:37 UTC
^Triage: 

- de-tag the summary line

- slight variation to the GitHub link in comment 0

- whilst here, assign appropriately

- normalise the status for assignment to a list.
Comment 34 Mark Millard 2023-08-05 05:08:10 UTC
(In reply to Mark Millard from comment #12)

I've gotten the same sort of failure backtrace structure
with the built-in EtherNet on the same armv7 system. I
did not initially notice the commonality with here.

I had written on list . . .

While discovered via an attempted overall kyua run, the following is
sufficient to get the crash in my native armv7 context:

# /usr/bin/kyua test -k /usr/tests/Kyuafile sys/netinet6/exthdr:exthdr
sys/netinet6/exthdr:exthdr  ->  Fatal kernel mode data abort: 'Alignment Fault' on read
trapframe: 0xdfb97aa0
FSR=00000001, FAR=db43ab76, spsr=60000013
r0 =dfedd000, r1 =dfb97b34, r2 =00000000, r3 =00000000
r4 =00000000, r5 =00000000, r6 =db43ab76, r7 =db43ab66
r8 =c096383c, r9 =00000000, r10=db132400, r11=dfb97b60
r12=00000000, ssp=dfb97b30, slr=c0b4e2c0, pc =c04e6b70

panic: Fatal abort
cpuid = 0
time = 1691131498
KDB: stack backtrace:
db_trace_self() at db_trace_self
         pc = 0xc065f414  lr = 0xc007db80 (db_trace_self_wrapper+0x30)
         sp = 0xdfb97858  fp = 0xdfb97970
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
         pc = 0xc007db80  lr = 0xc031a834 (vpanic+0x140)
         sp = 0xdfb97978  fp = 0xdfb97998
         r4 = 0x00000100  r5 = 0x00000000
         r6 = 0xc07c369a  r7 = 0xc0b32e58
vpanic() at vpanic+0x140
         pc = 0xc031a834  lr = 0xc031a6f4 (vpanic)
         sp = 0xdfb979a0  fp = 0xdfb979a4
         r4 = 0xdfb97aa0  r5 = 0x00000013
         r6 = 0xdb43ab76  r7 = 0x00000001
         r8 = 0x00000001  r9 = 0xdfedd000
        r10 = 0xdb43ab76
vpanic() at vpanic
         pc = 0xc031a6f4  lr = 0xc06849dc (abort_align)
         sp = 0xdfb979ac  fp = 0xdfb979d8
         r4 = 0x00000001  r5 = 0x00000001
         r6 = 0xdfedd000  r7 = 0xdb43ab76
         r8 = 0xdfb979a4  r9 = 0xc031a6f4
        r10 = 0xdfb979ac
abort_align() at abort_align
         pc = 0xc06849dc  lr = 0xc0684a50 (abort_align+0x74)
         sp = 0xdfb979e0  fp = 0xdfb979f8
         r4 = 0x00000013 r10 = 0xdb43ab76
abort_align() at abort_align+0x74
         pc = 0xc0684a50  lr = 0xc06846a8 (abort_handler+0x45c)
         sp = 0xdfb97a00  fp = 0xdfb97a98
         r4 = 0x00000000 r10 = 0xdb43ab76
abort_handler() at abort_handler+0x45c
         pc = 0xc06846a8  lr = 0xc0661cc8 (exception_exit)
         sp = 0xdfb97aa0  fp = 0xdfb97b60
         r4 = 0x00000000  r5 = 0x00000000
         r6 = 0xdb43ab76  r7 = 0xdb43ab66
         r8 = 0xc096383c  r9 = 0x00000000
        r10 = 0xdb132400
exception_exit() at exception_exit
         pc = 0xc0661cc8  lr = 0xc0b4e2c0 (__pcpu)
         sp = 0xdfb97b30  fp = 0xdfb97b60
         r0 = 0xdfedd000  r1 = 0xdfb97b34
         r2 = 0x00000000  r3 = 0x00000000
         r4 = 0x00000000  r5 = 0x00000000
         r6 = 0xdb43ab76  r7 = 0xdb43ab66
         r8 = 0xc096383c  r9 = 0x00000000
        r10 = 0xdb132400 r12 = 0x00000000
in6ifa_ifwithaddr() at in6ifa_ifwithaddr+0x30
         pc = 0xc04e6b70  lr = 0xc04f9030 (ip6_input+0xd38)
         sp = 0xdfb97b68  fp = 0xdfb97c28
         r4 = 0xdb43ab76  r5 = 0xdb43ab5e
         r6 = 0x00000000  r7 = 0xdb43ab66
ip6_input() at ip6_input+0xd38
         pc = 0xc04f9030  lr = 0xc046d66c (netisr_dispatch_src+0xf8)
         sp = 0xdfb97c30  fp = 0xdfb97c58
         r4 = 0xdb43ab00  r5 = 0x00000006
         r6 = 0x00000007  r7 = 0xc0b49d50
         r8 = 0xdafea0c0  r9 = 0xdb43ab00
        r10 = 0x00000086
netisr_dispatch_src() at netisr_dispatch_src+0xf8
         pc = 0xc046d66c  lr = 0xc04641b0 (ether_demux+0x18c)
         sp = 0xdfb97c60  fp = 0xdfb97c78
         r4 = 0x00000006  r5 = 0x00001201
         r6 = 0xdb132400  r7 = 0x000000ff
         r8 = 0xdafea0c0  r9 = 0xdb43ab00
        r10 = 0x00000086
ether_demux() at ether_demux+0x18c
         pc = 0xc04641b0  lr = 0xc0465880 (ether_nh_input+0x490)
         sp = 0xdfb97c80  fp = 0xdfb97ce0
         r4 = 0xdb132400  r5 = 0xdb43ab00
         r6 = 0xdb43ab50 r10 = 0x00000086
ether_nh_input() at ether_nh_input+0x490
         pc = 0xc0465880  lr = 0xc046d66c (netisr_dispatch_src+0xf8)
         sp = 0xdfb97ce8  fp = 0xdfb97d10
         r4 = 0xdb43ab00  r5 = 0x00000005
         r6 = 0x0000000c  r7 = 0xc0b49d30
         r8 = 0xdafea0c0  r9 = 0xdb43ab00
        r10 = 0xc098d18f
netisr_dispatch_src() at netisr_dispatch_src+0xf8
         pc = 0xc046d66c  lr = 0xc04645c4 (ether_input+0x50)
         sp = 0xdfb97d18  fp = 0xdfb97d48
         r4 = 0xdb43ab00  r5 = 0x00000000
         r6 = 0x00008803  r7 = 0x00000000
         r8 = 0xdafea0c0  r9 = 0xdb43ab00
        r10 = 0xc098d18f
ether_input() at ether_input+0x50
         pc = 0xc04645c4  lr = 0xdffb3f08 ($a.10+0x108)
         sp = 0xdfb97d50  fp = 0xdfb97d78
         r4 = 0xdb132400  r5 = 0xdaff8b00
         r6 = 0xdaff8b10  r7 = 0x00000000
         r8 = 0x00000000 r10 = 0xc098d18f
$a.10() at $a.10+0x108
         pc = 0xdffb3f08  lr = 0xc038cb2c (taskqueue_run_locked+0x1c4)
         sp = 0xdfb97d80  fp = 0xdfb97dd8
         r4 = 0xe0145100  r5 = 0xdaff8b2c
         r6 = 0xe0145150  r7 = 0x00000001
         r8 = 0x00000000  r9 = 0xdfb97d90
        r10 = 0x00000001
taskqueue_run_locked() at taskqueue_run_locked+0x1c4
         pc = 0xc038cb2c  lr = 0xc038e4e4 (taskqueue_thread_loop+0x1b0)
         sp = 0xdfb97de0  fp = 0xdfb97e10
         r4 = 0xe0145100  r5 = 0xe0145140
         r6 = 0xc07af4c4  r7 = 0x00000000
         r8 = 0xc098d18f  r9 = 0x00000100
        r10 = 0xc0b228a0
taskqueue_thread_loop() at taskqueue_thread_loop+0x1b0
         pc = 0xc038e4e4  lr = 0xc02cdf0c (fork_exit+0xc0)
         sp = 0xdfb97e18  fp = 0xdfb97e38
         r4 = 0xdfedd000  r5 = 0xc0b224e0
         r6 = 0xc038e334  r7 = 0xdffc4f54
         r8 = 0xdfb97e40  r9 = 0xc098d191
fork_exit() at fork_exit+0xc0
         pc = 0xc02cdf0c  lr = 0xc0661c5c (swi_exit)
         sp = 0xdfb97e40  fp = 0x00000000
         r4 = 0xc038e334  r5 = 0xdffc4f54
         r6 = 0xc0b45d84  r7 = 0xd73bcba0
         r8 = 0x00000001 r10 = 0xc0b228a0
swi_exit() at swi_exit
         pc = 0xc0661c5c  lr = 0xc0661c5c (swi_exit)
         sp = 0xdfb97e40  fp = 0x00000000
KDB: enter: panic
[ thread pid 0 tid 100230 ]

For reference:

# uname -apKU
FreeBSD OPiP2E-RPi2v1p1 14.0-CURRENT FreeBSD 14.0-CURRENT armv7 1400093 #6 main-n264334-215bab7924f6-dirty: Tue Jul 25 23:11:39 PDT 2023     root@CA72-16Gp-ZFS:/usr/obj/BUILDs/main-CA7-nodbg-clang/usr/main-src/arm.armv7/sys/GENERIC-NODBG-CA7 arm armv7 1400093 1400093

The OrangePi+ 2Ed was the type of system booted and tested.
Comment 35 Mark Millard 2023-08-05 22:42:30 UTC
(In reply to Mark Millard from comment #34)

Hmm. Getting the problem on the built-in Ethernet did not
involve if_ure from what I can tell, looking at the dmsg -a
output:

. . .
awg0: <Allwinner Gigabit Ethernet> mem 0x1c30000-0x1c3ffff irq 27 on simplebus0
miibus0: <MII bus> on awg0
rgephy0: <RTL8169S/8110S/8211 1000BASE-T media interface> PHY 0 on miibus0
rgephy0:  none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX, 100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, 1000baseT-FDX-flow, 1000baseT-FDX-flow-master, auto, auto-flow
rgephy1: <RTL8169S/8110S/8211 1000BASE-T media interface> PHY 1 on miibus0
rgephy1:  none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX, 100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, 1000baseT-FDX-flow, 1000baseT-FDX-flow-master, auto, auto-flow
. . .
Autoloading module: if_ure
ure0 on uhub6
ure0: <Realtek USB 10/100/1000 LAN, class 0/0, rev 2.10/30.00, addr 2> on usbus7
miibus1: <MII bus> on ure0
rgephy2: <RTL8251/8153 1000BASE-T media interface> PHY 0 on miibus1
rgephy2:  none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT-FDX, 1000baseT-FDX-master, auto
ue0: <USB Ethernet> on ure0

if_ure would only be involved with the dongle (ure0), for which no
Ethernet cable was attached.

So I'm unsure if what I've reported for the OrangePi+2Ed context overall
is:

A) An independent problem.
vs.
B) Evidence that the original problem involves more than just potential
  if_ure issues.
Comment 36 Mark Millard 2023-08-06 08:00:19 UTC
(In reply to Mark Millard from comment #35)

Other testing with an official snapshot build (so: debug build)
indicates that the failures tied to in6ifa_ifwithaddr involve:

Kernel page fault with the following non-sleepable locks held:
shared rm in6_ifaddr_lock (in6_ifaddr_lock) r = 0 (0xc0b5acd0) locked @ /usr/src/sys/netinet6/in6.c:1620
stack backtrace:
#0 0xc035e060 at witness_debugger+0x74
#1 0xc035f2ec at witness_warn+0x41c
#2 0xc0610b58 at abort_handler+0x1d8
#3 0xc05ef6ac at exception_exit+0
#4 0xc04986b4 at in6ifa_ifwithaddr+0x40
#5 0xc04aa060 at ip6_input+0xd38
#6 0xc04235bc at netisr_dispatch_src+0x100
#7 0xc041a384 at ether_demux+0x1bc
#8 0xc041bb68 at ether_nh_input+0x3dc
#9 0xc04235bc at netisr_dispatch_src+0x100
#10 0xc041a808 at ether_input+0xec
#11 0xe183810c at $a.10+0xbc
#12 0xc03504dc at taskqueue_run_locked+0xb8
#13 0xc0351560 at taskqueue_thread_loop+0x108
#14 0xc02a384c at fork_exit+0xa0
#15 0xc05ef640 at swi_exit+0
Fatal kernel mode data abort: 'Alignment Fault' on read

I've submitted https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=272966
for this.

I've also submitted https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=272965
for a different backtrace structure that various tests produce that also
include: 'Alignment Fault' on read