https://github.com/freebsd/freebsd-src/blob/main/sys/dev/usb/net/if_ure.c#LL616C1-L617C24 ure_makembuf() has the following code: /* uether_newbuf does this. */ m_adj(m, ETHER_ALIGN); based on that comment, it looks like someone copied m_adj() over because the base usb ethernet driver has it, without really testing it or setting up the mbuf correctly. when m_adj is called here, m->m_len is 0 at this point so the call to m_adj() does nothing. because of this, the data is never shifted by 2 bytes, causing an unaligned access to the IP headers when byte swapping later in the stack when converting from network-order to host-order, causing a panic on 32-bit ARM systems. NOTE: virtio-net has a similar unaligned access issue, however that driver isn't calling m_adj() at all it looks like. This is referenced over in: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=271288
removing the call to m_adj(m, ETHER_ALIGN); and instead doing m->m_data += ETHER_ALIGN; allows the driver to compile and function, but I know this is wrong, probably causing a buffer overflow. someone with more knowledge of mbuf allocation and management will probably need to take a look at both of these drivers mentioned.
https://cgit.freebsd.org/src/commit/sys/dev/usb/net/if_ure.c?id=7d5522e16a2970ade4e79e647842841cd32a19e1 is the commit that adds the code: author John-Mark Gurney <jmg@FreeBSD.org> 2020-09-12 00:33:11 +0000 committer John-Mark Gurney <jmg@FreeBSD.org> 2020-09-12 00:33:11 +0000 . . . A major update to the ure driver. This update adds support for: HW VLAN tagging HW checksum offload for IPv4 and IPv6 tx and rx aggreegation (for full gige speeds) multiple transactions In my testing, I am able to get 900-950Mbps depending upon TCP or UDP, which is a significant improvement over the previous 91Mbps (~8kint/sec*1500bytes/packet*1packet/int). Reviewed by: hselasky MFC after: 2 months Differential Revision: https://reviews.freebsd.org/D25809 The reviewer (hselasky) would also be someone good to contact for questions.
(In reply to Vincent Milum Jr from comment #1) This change should not cause a buffer overflow, because the allocation just above uses len + ETHER_ALIGN, leaving len left after the m_adj
Hi, I think the right fix is to change: /* uether_newbuf does this. */ m_adj(m, ETHER_ALIGN); m->m_pkthdr.len = len; Into: m->m_pkthdr.len = len + ETHER_ALIGN; /* ensure proper alignment of IP header */ m_adj(m, ETHER_ALIGN); Can you test that? Using m_adj() is more clean than using m_data +=, because m_len should also be updated! --HPS
Can you also check of other USB ethernet drivers have the same pattern? --HPS
(In reply to Hans Petter Selasky from comment #4) I don't think that setting the pkthdr len will help; m_adj is using the individual m_len values, and only adjusting the pkthdr len.
Or set both? m_adj() has a check for the packet header flag. --HPS
So, what steps to I take to demonstrate the crash? Context: RPi2B v1.1 (so: armv7), with an EtherNet USB dongle, # uname -apKU FreeBSD generic 14.0-CURRENT FreeBSD 14.0-CURRENT #0 main-n263002-743516d51fa7: Thu May 18 11:11:33 UTC 2023 root@releng1.nyi.freebsd.org:/usr/obj/usr/src/arm.armv7/sys/GENERIC arm armv7 1400088 1400088 # usbconfig show_ifdrv . . . ugen1.4: <Realtek USB 10/100/1000 LAN> at usbus1, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=ON (200mA) ugen1.4.0: ure0: <Realtek USB 10/100/1000 LAN, class 0/0, rev 2.10/30.00, addr 4> . . . So far it seems to be working fine.
(In reply to Mark Millard from comment #8) Already fixed in main/CURRENT? Some other difference in stable/13 ? I do not have a stable/13 context handy.
In my case, I've tried both 13.1-RELEASE and 13.2-RELEASE on multiple RPi 1 and 2 boards. They all consistently crash with a simple "fetch" request on the command line.
(In reply to Vincent Milum Jr from comment #10) I decided that I was going to replace the armv7 main snapshot I've been using anyway, so I tried stable/13 and then main with the specific activity of fetch involved for today's snapshots. Both fail. Fetch works well as a test case. Details: # uname -apKU FreeBSD generic 13.2-STABLE FreeBSD 13.2-STABLE stable/13-n255472-cf3a76018cad GENERIC arm armv7 1302505 1302505 # fetch http://ftp3.freebsd.org/pub/FreeBSD/snapshots/ISO-IMAGES/13.2/CHECKSUM.SHA256-FreeBSD-13.2-STABLE-arm-armv7-GENERICSD-20230601-cf3a76018cad-255472 Fatal kernel mode data abort: 'Alignment Fault' on read . . . Then: # uname -apKU FreeBSD generic 14.0-CURRENT FreeBSD 14.0-CURRENT #0 main-n263302-4f2cc73f34eb: Thu Jun 1 11:13:49 UTC 2023 root@releng1.nyi.freebsd.org:/usr/obj/usr/src/arm.armv7/sys/GENERIC arm armv7 1400089 1400089 # fetch http://ftp3.freebsd.org/pub/FreeBSD/snapshots/ISO-IMAGES/14.0/CHECKSUM.SHA256-FreeBSD-14.0-CURRENT-arm-armv7-GENERICSD-20230601-4f2cc73f34eb-263302 Fatal kernel mode data abort: 'Alignment Fault' on read . . .
I tried the OrangePi+ 2e with the same boot media and EtherNet dongle (instead of its builtin EtherNet port) and got a somewhat different result: An initial "Kernel page fault with the following non-sleepable locks held" before the 'Alignment Fault' on read: . . . login: root Password:Kernel page fault with the following non-sleepable locks held: shared rm in6_ifaddr_lock (in6_ifaddr_lock) r = 0 (0xc0b5acd0) locked @ /usr/src/sys/netinet6/in6.c:1603 stack backtrace: #0 0xc0360944 at witness_debugger+0x74 #1 0xc0361bd8 at witness_warn+0x424 #2 0xc061382c at abort_handler+0x1d8 #3 0xc05f22fc at exception_exit+0 #4 0xc049b224 at in6ifa_ifwithaddr+0x40 #5 0xc04acdd0 at ip6_input+0xd54 #6 0xc0425d34 at netisr_dispatch_src+0x100 #7 0xc041ca9c at ether_demux+0x1bc #8 0xc041e28c at ether_nh_input+0x3dc #9 0xc0425d34 at netisr_dispatch_src+0x100 #10 0xc041cf24 at ether_input+0xf0 #11 0xc01aab34 at uether_rxflush+0x8c #12 0xe0813018 at $a.10+0x4d4 #13 0xc01a0ea4 at usbd_callback_wrapper+0x7e0 #14 0xc01a21b8 at usb_command_wrapper+0x128 #15 0xc01a1038 at usb_callback_proc+0x90 #16 0xc019bd40 at usb_process+0x10c #17 0xc02a5388 at fork_exit+0xa0 Fatal kernel mode data abort: 'Alignment Fault' on read trapframe: 0xc708ca50 FSR=00000001, FAR=e032e776, spsr=00000013 r0 =da692000, r1 =00000001, r2 =ffffffff, r3 =c0b2b548 r4 =00000000, r5 =00000000, r6 =e032e776, r7 =e032e766 r8 =c091af44, r9 =00000000, r10=e0658c00, r11=c708cb10 r12=00000000, ssp=c708cae0, slr=c02e330c, pc =c049b224 . . .
(In reply to Mark Millard from comment #12) Looks like that failure is associated with my having added to /etc/rc.conf : ntpd_enable="YES" ntpd_sync_on_start="YES" ntpd_user="root" No login attempt is required to get the failure. Just leave it at the login prompt and wait a little bit.
(In reply to Hans Petter Selasky from comment #7) m_len on the first mbuf could be set to ETHER_ALIGN, and pkthdr.len as you suggested. I'm assuming the mbuf lengths are set as they are filled in, as they are apparently zero to start.
Created attachment 242554 [details] Patch I looked at the code and propose the attached solution. Please test and review!
markj@ : Can you have a quick look at the patch attached to this issue?
Looking at the other USB ethernet drivers, this was the only place I could find, needing to be patched.
(In reply to Hans Petter Selasky from comment #15) Looking at what m_adj does, it doesn't move the data around, it only adjusts the pointer to the data. That won't fix alignment, it only chops off data from the front of the buffer. m_adj needs to be called before data is put into the buffer, but after the length of the buffer is declared. that's at least the impression im getting looking over all these source files and how other drivers are doing it. https://github.com/freebsd/freebsd-src/blob/8dad5ece49479ba6cdcd5bb4c2799bbd61add3e6/sys/kern/uipc_mbuf.c#L824
May be it would be a good idiea to have an actual backtrace showing an example of where the fetch style test fails ( tcp_input_with_port+0x650 ): # fetch http://ftp3.freebsd.org/pub/FreeBSD/snapshots/ISO-IMAGES/14.0/CHECKSUM.SHA256-FreeBSD-14.0-CURRENT-arm-armv7-GENERICSD-20230601-4f2cc73f34eb-263302 Fatal kernel mode data abort: 'Alignment Fault' on read trapframe: 0xc70c89f0 FSR=00000001, FAR=e0326d76, spsr=60000013 r0 =000000a0, r1 =00000014, r2 =00000000, r3 =c0b5df40 r4 =e0326d72, r5 =00000028, r6 =00000020, r7 =e0326d5e r8 =00000028, r9 =00000014, r10=00000000, r11=c70c8b88 r12=e0326d00, ssp=c70c8a80, slr=c0467218, pc =c0467834 panic: Fatal abort cpuid = 3 time = 1262304178 KDB: stack backtrace: db_trace_self() at db_trace_self pc = 0xc05ef9a4 lr = 0xc007a694 (db_trace_self_wrapper+0x30) sp = 0xc70c87a8 fp = 0xc70c88c0 db_trace_self_wrapper() at db_trace_self_wrapper+0x30 pc = 0xc007a694 lr = 0xc02eb598 (vpanic+0x140) sp = 0xc70c88c8 fp = 0xc70c88e8 r4 = 0x00000100 r5 = 0x00000000 r6 = 0xc075c866 r7 = 0xc0aeaeb0 vpanic() at vpanic+0x140 pc = 0xc02eb598 lr = 0xc02eb378 (doadump) sp = 0xc70c88f0 fp = 0xc70c88f4 r4 = 0xc70c89f0 r5 = 0x00000013 r6 = 0xe0326d76 r7 = 0x00000001 r8 = 0x00000001 r9 = 0xda6cb000 r10 = 0xe0326d76 doadump() at doadump pc = 0xc02eb378 lr = 0xc0613e58 (abort_align) sp = 0xc70c88fc fp = 0xc70c8928 r4 = 0xe0326d76 r5 = 0xc70c88f4 r6 = 0xc02eb378 r10 = 0xc70c88fc abort_align() at abort_align pc = 0xc0613e58 lr = 0xc0613ecc (abort_align+0x74) sp = 0xc70c8930 fp = 0xc70c8948 r4 = 0x00000013 r10 = 0xe0326d76 abort_align() at abort_align+0x74 pc = 0xc0613ecc lr = 0xc0613aec (abort_handler+0x498) sp = 0xc70c8950 fp = 0xc70c89e8 r4 = 0x00000000 r10 = 0xe0326d76 abort_handler() at abort_handler+0x498 pc = 0xc0613aec lr = 0xc05f22fc (exception_exit) sp = 0xc70c89f0 fp = 0xc70c8b88 r4 = 0xe0326d72 r5 = 0x00000028 r6 = 0x00000020 r7 = 0xe0326d5e r8 = 0x00000028 r9 = 0x00000014 r10 = 0x00000000 exception_exit() at exception_exit pc = 0xc05f22fc lr = 0xc0467218 (tcp_input_with_port+0x34) sp = 0xc70c8a80 fp = 0xc70c8b88 r0 = 0x000000a0 r1 = 0x00000014 r2 = 0x00000000 r3 = 0xc0b5df40 r4 = 0xe0326d72 r5 = 0x00000028 r6 = 0x00000020 r7 = 0xe0326d5e r8 = 0x00000028 r9 = 0x00000014 r10 = 0x00000000 r12 = 0xe0326d00 tcp_input_with_port() at tcp_input_with_port+0x650 pc = 0xc0467834 lr = 0xc046923c (tcp_input+0x10) sp = 0xc70c8b90 fp = 0xc70c8b90 r4 = 0x00000004 r5 = 0x00000000 r6 = 0xda96a800 r7 = 0xc0b5df40 r8 = 0xe0326d5e r9 = 0xc094cd70 r10 = 0x00000014 tcp_input() at tcp_input+0x10 pc = 0xc046923c lr = 0xc0455eac (ip_input+0x454) sp = 0xc70c8b98 fp = 0xc70c8bd8 ip_input() at ip_input+0x454 pc = 0xc0455eac lr = 0xc0425d34 (netisr_dispatch_src+0x100) sp = 0xc70c8be0 fp = 0xc70c8c08 r4 = 0x00000005 r5 = 0xe0326d00 r6 = 0x00000000 r7 = 0xc0b5a2f8 r8 = 0x00000000 r9 = 0xc5779a00 r10 = 0x00000008 netisr_dispatch_src() at netisr_dispatch_src+0x100 pc = 0xc0425d34 lr = 0xc041ca9c (ether_demux+0x1bc) sp = 0xc70c8c10 fp = 0xc70c8c28 r4 = 0xe0326d00 r5 = 0x00000001 r6 = 0xda965000 r7 = 0x5e4a6f28 r8 = 0x00000000 r9 = 0xc5779a00 r10 = 0x00000008 ether_demux() at ether_demux+0x1bc pc = 0xc041ca9c lr = 0xc041e28c (ether_nh_input+0x3dc) sp = 0xc70c8c30 fp = 0xc70c8c58 r4 = 0xda965000 r5 = 0xe0326d00 r6 = 0xe0326d50 r10 = 0x00000008 ether_nh_input() at ether_nh_input+0x3dc pc = 0xc041e28c lr = 0xc0425d34 (netisr_dispatch_src+0x100) sp = 0xc70c8c60 fp = 0xc70c8c88 r4 = 0x0000000d r5 = 0xe0326d00 r6 = 0x00000000 r7 = 0xc0b5a378 r8 = 0x5e4a6f28 r9 = 0x00000000 r10 = 0x00000000 netisr_dispatch_src() at netisr_dispatch_src+0x100 pc = 0xc0425d34 lr = 0xc041cf24 (ether_input+0xf0) sp = 0xc70c8c90 fp = 0xc70c8cc8 r4 = 0xda965000 r5 = 0x00000000 r6 = 0xe0326d00 r7 = 0x00000000 r8 = 0x5e4a6f28 r9 = 0x00000000 r10 = 0x00000000 ether_input() at ether_input+0xf0 pc = 0xc041cf24 lr = 0xc01aab34 (uether_rxflush+0x8c) sp = 0xc70c8cd0 fp = 0xc70c8d00 r4 = 0xdb63cc00 r5 = 0xe0326d00 r6 = 0x00000000 r7 = 0x00000000 r8 = 0xda965000 r9 = 0xc094440c r10 = 0xda965000 uether_rxflush() at uether_rxflush+0x8c pc = 0xc01aab34 lr = 0xe0713018 ($a.10+0x4d4) sp = 0xc70c8d08 fp = 0xc70c8d80 r4 = 0xe072b534 r5 = 0xe05e7f60 r6 = 0x00000068 r7 = 0x01000000 r8 = 0xe05e71c0 r9 = 0xdb63cc00 $a.10() at $a.10+0x4d4 pc = 0xe0713018 lr = 0xc01a0ea4 (usbd_callback_wrapper+0x7e0) sp = 0xc70c8d88 fp = 0xc70c8dc0 r4 = 0xdb7dac00 r5 = 0xe05e7273 r6 = 0xe05e71c0 r7 = 0xe05e7000 r8 = 0xda6d2c78 r9 = 0xe05e7030 r10 = 0xc09670bc usbd_callback_wrapper() at usbd_callback_wrapper+0x7e0 pc = 0xc01a0ea4 lr = 0xc01a21b8 (usb_command_wrapper+0x128) sp = 0xc70c8dc8 fp = 0xc70c8de0 r4 = 0xe05e7030 r5 = 0xc09670bc r6 = 0xc0798d04 r7 = 0xc0747886 r8 = 0x00000000 r9 = 0xc07ab12a r10 = 0xda6d2cec usb_command_wrapper() at usb_command_wrapper+0x128 pc = 0xc01a21b8 lr = 0xc01a1038 (usb_callback_proc+0x90) sp = 0xc70c8de8 fp = 0xc70c8df0 r4 = 0xe05e7000 r5 = 0xda6d2ce4 r6 = 0xe05e7044 r7 = 0xc076ab50 r8 = 0x00000000 r9 = 0xc096711c usb_callback_proc() at usb_callback_proc+0x90 pc = 0xc01a1038 lr = 0xc019bd40 (usb_process+0x10c) sp = 0xc70c8df8 fp = 0xc70c8e18 r4 = 0xda6d2cdc r10 = 0xda6d2cec usb_process() at usb_process+0x10c pc = 0xc019bd40 lr = 0xc02a5388 (fork_exit+0xa0) sp = 0xc70c8e20 fp = 0xc70c8e38 r4 = 0xda6cb000 r5 = 0xd79d7740 r6 = 0xc019bc34 r7 = 0xda6d2cdc r8 = 0xc70c8e40 r9 = 0x00000000 r10 = 0x00000000 fork_exit() at fork_exit+0xa0 pc = 0xc02a5388 lr = 0xc05f2290 (swi_exit) sp = 0xc70c8e40 fp = 0x00000000 r4 = 0xc019bc34 r5 = 0xda6d2cdc r6 = 0x00000000 r7 = 0x00000000 r8 = 0x00000000 r10 = 0x00000000 swi_exit() at swi_exit pc = 0xc05f2290 lr = 0xc05f2290 (swi_exit) sp = 0xc70c8e40 fp = 0x00000000 KDB: enter: panic [ thread pid 14 tid 100092 ] Stopped at kdb_enter+0x54: ldrb r15, [r15, r15, ror r15]!
As I mentioned in the initial post, the "crash" happens specifically when byte swapping from network endian to local machine endian. This requires a 4-byte aligned access due to how the byte swapping works. MAC headers are 14-bytes, which causes the items in the IP header to be misaligned by 2-bytes. The point of m_adj that I can see from other drives and various notes is to start the MAC header misaligned by 2 bytes, in order to line up the IP header on the 4-byte boundary. The various "simple" fixes proposed here, I had already tried before creating the bug fix, each having their own issues. This is why I was hoping someone who knew more about m_buf allocations could chime in as to why it doesn't assign a m_len, and to figure out how to properly set that so things don't break.
^Triage: make former assignee freebsd-arm@ a CC recipient.
Back in 2016 armv7 SCTLR was set to allow unaligned access. But some instructions impose alignment restrictions of there own, despite the setting. There is the following for telling clang/gcc to always allow misaligned (by avoiding the instructions that do not allow it, as I understand): # cc -x c -dM -E -munaligned-access /dev/null | grep -i unaligned #define __ARM_FEATURE_UNALIGNED 1 that is not enabled by default in system clang: # cc -x c -dM -E /dev/null | grep -i unaligned # FreeBSD's build environment is not currently causing -munaligned-access to always be in use for armv7. (I'm not sure it is ever supplied.) This might be a direction to go to avoid having to manage alignment in special contexts (that are not necessarily obvious up front, being dependent on the code generation choices in the compiler). However, misalignment could have performance consequences so avoiding the panic might not be considered a complete solution for if_ure.c . For reference: author Ian Lepore <ian@FreeBSD.org> 2016-05-26 00:03:23 +0000 committer Ian Lepore <ian@FreeBSD.org> 2016-05-26 00:03:23 +0000 commit 40cb6df5f3ae8aa0c9839dda49dd5a6357d3c69a (patch) . . . Disable alignment faults on armv6, adjust various alignment-related macros to match the new state of affairs. The hardware we support has always been able to do unaligned accesses, we've just never enabled it until now. This brings FreeBSD into line with all the other major OSes, and should help with the growing volume of 3rd-party software that assumes unaligned access will just work on armv6 and armv7. From a if_ure.o.meta file: cc -mcpu=cortex-a7 -target armv7-gnueabihf-freebsd14.0 --sysroot=/usr/obj/BUILDs/main-CA7-nodbg-clang/usr/main-src/arm.armv7/tmp -B/usr/obj/BUILDs/main-CA7-nodbg-clang/usr/main-src/arm.armv7/tmp/u sr/bin -O2 -pipe -fno-common -fno-strict-aliasing -Werror -D_KERNEL -DKLD_MODULE -DKLD_TIED -nostdinc -DHAVE_KERNEL_OPTION_HEADERS -include /usr/obj/BUILDs/main-CA7-nodbg-clang/usr/main-src/arm.ar mv7/sys/GENERIC-NODBG-CA7/opt_global.h -I. -I/usr/main-src/sys -I/usr/main-src/sys/contrib/ck/include -fno-common -g -funwind-tables -fdebug-prefix-map=./machine=/usr/main-src/sys/arm/include -I/usr/ obj/BUILDs/main-CA7-nodbg-clang/usr/main-src/arm.armv7/sys/GENERIC-NODBG-CA7 -march=armv7a -ffreestanding -fwrapv -fstack-protector -Wall -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -W cast-qual -Wundef -Wno-pointer-sign -D__printf__=__freebsd_kprintf__ -Wmissing-include-dirs -fdiagnostics-show-option -Wno-unknown-pragmas -Wno-error=tautological-compare -Wno-error=empty-body -Wno-er ror=parentheses-equality -Wno-error=unused-function -Wno-error=pointer-sign -Wno-error=shift-negative-value -Wno-address-of-packed-member -Wno-error=strict-prototypes -Wno-error=unused-but-set-variabl e -Wno-format-zero-length -mno-movt -mfpu=none -std=iso9899:1999 -c /usr/main-src/sys/dev/usb/net/if_ure.c -o if_ure.o
(In reply to Mark Millard from comment #22) I should have noted: I cause the "-mcpu=cortex-a7" in my environment. It is not part of a normal FreeBSD build. I grabbed the .meta text from one of my builds and so it had the extra command line option.
(In reply to Mark Millard from comment #22) So much for the -munaligned-access idea: the compiler still generates the ldmib and stmib instructions that have the alignment requirement. Unless there is another compiler option that would lead to avoiding such instructions, this direction looks to be a bust.
(In reply to Mark Millard from comment #24) By the way, I did check if I had the orientation backwards: -mno-unaligned-access also did not change the use of the instructions that require alignment. For reference, -mno-unaligned-access was what my initial note should have referenced: it is what forces byte-by-byte access for fields in packed struct's in order to allow misaligned data in packed struct's: it avoids some forms of unaligned access.
(In reply to Vincent Milum Jr from comment #18) If you look carefully at my patch, I copy two bytes earlier from the USB buffer, because there is always a header first in the USB payload. And then I just strip those two bytes, and it is good! Do you see? --HPS
From the comments I see there may be two issues: One issue in the USB driver (if_ure), to get the IP header aligned. Can we agree on some solution here. I think using m_adj() avoids future problems, because we have some magic mbuf types, and doing custom mbuf manipulation might not be good. Then all USB ethernet drivers should be aligned. Second issue in the ARM area, to allow unaligned memory access. I have my doubts this is possible. Personally I think reading 32-bit memory locations in a single instruction and memory access, is the best! --HPS
Ping - can you also test the patch on ARM? Else I will just test it on AMD64 by asserting the offset before submitting the patch. My patch is basically: Copy ETHER_ALIGN bytes more, and start ETHER_ALIGN bytes earlier in the USB payload. Then m_adj() when the mbuf chain is ready. The mbuf chain is already sized for that. --HPS
(In reply to Hans Petter Selasky from comment #28) It likely will not be me doing testing for now: $work instead.
(In reply to Hans Petter Selasky from comment #27) My take on the patch: it seems a bit cumbersome, but should work. It doesn't add any new direct manipulation of mbuf fields. I don't have hardware to test though. On the second issue, allowing unaligned accesses: I think that would be a bad idea, as it would force many things to use sub-optimal code. Much better to prevent unaligned accesses here, as everywhere else already does.
I just tried the patch provided on my Raspberry Pi 2 B v1.1 (ARMv7) Initial TCP connections are now working (eg: fetching HTTP content, pkg) And now, getting a new panic elsewhere: root@rpi2 ~# speedtest-cli Retrieving speedtest.net configuration... Fatal kernel mode data abort: 'Translation Fault (L1)' on read trapframe: 0xc49a4c80 FSR=00000005, FAR=00000010, spsr=20000013 r0 =d71e8148, r1 =ddb420cc, r2 =00000000, r3 =d71e81fe r4 =000000ba, r5 =00000000, r6 =00000200, r7 =d791fb20 r8 =000000b6, r9 =000000cc, r10=00000002, r11=c49a4d70 r12=00000000, ssp=c49a4d10, slr=c01841d8, pc =ddc129f0 panic: Fatal abort cpuid = 2 time = 1686142856 KDB: stack backtrace: #0 0xc0355b8c at kdb_backtrace+0x48 #1 0xc02fcb6c at vpanic+0x170 #2 0xc02fc9fc at vpanic+0 #3 0xc061bc14 at abort_align+0 #4 0xc061b704 at abort_handler+0x2a0 #5 0xc05fac98 at exception_exit+0 #6 0xddc129f0 at $a.10+0x1c4 #7 0xc019e54c at usbd_callback_wrapper+0x918 #8 0xc019ff48 at usb_command_wrapper+0xd4 #9 0xc019e968 at usb_callback_proc+0x148 #10 0xc0199648 at usb_process+0x134 #11 0xc02b4b54 at fork_exit+0xc0 #12 0xc05fac28 at swi_exit+0 Uptime: 45s Resetting system ...
Hi Vincent, Can you open the kernel using kgdb and then enter: info line *(usbd_callback_wrapper+0x918) You should get something like this (but this is from amd64, so the line number is wrong for sure). Line 2512 of "/vol/freebsd-src/sys/dev/usb/usb_transfer.c" starts at address 0xffffffff8095f042 <usbd_callback_wrapper+2322> and ends at 0xffffffff8095f04f <usbd_callback_wrapper+2335>. This looks like another issue. Let's see if this backtrace is correct! --HPS
^Triage: - de-tag the summary line - slight variation to the GitHub link in comment 0 - whilst here, assign appropriately - normalise the status for assignment to a list.
(In reply to Mark Millard from comment #12) I've gotten the same sort of failure backtrace structure with the built-in EtherNet on the same armv7 system. I did not initially notice the commonality with here. I had written on list . . . While discovered via an attempted overall kyua run, the following is sufficient to get the crash in my native armv7 context: # /usr/bin/kyua test -k /usr/tests/Kyuafile sys/netinet6/exthdr:exthdr sys/netinet6/exthdr:exthdr -> Fatal kernel mode data abort: 'Alignment Fault' on read trapframe: 0xdfb97aa0 FSR=00000001, FAR=db43ab76, spsr=60000013 r0 =dfedd000, r1 =dfb97b34, r2 =00000000, r3 =00000000 r4 =00000000, r5 =00000000, r6 =db43ab76, r7 =db43ab66 r8 =c096383c, r9 =00000000, r10=db132400, r11=dfb97b60 r12=00000000, ssp=dfb97b30, slr=c0b4e2c0, pc =c04e6b70 panic: Fatal abort cpuid = 0 time = 1691131498 KDB: stack backtrace: db_trace_self() at db_trace_self pc = 0xc065f414 lr = 0xc007db80 (db_trace_self_wrapper+0x30) sp = 0xdfb97858 fp = 0xdfb97970 db_trace_self_wrapper() at db_trace_self_wrapper+0x30 pc = 0xc007db80 lr = 0xc031a834 (vpanic+0x140) sp = 0xdfb97978 fp = 0xdfb97998 r4 = 0x00000100 r5 = 0x00000000 r6 = 0xc07c369a r7 = 0xc0b32e58 vpanic() at vpanic+0x140 pc = 0xc031a834 lr = 0xc031a6f4 (vpanic) sp = 0xdfb979a0 fp = 0xdfb979a4 r4 = 0xdfb97aa0 r5 = 0x00000013 r6 = 0xdb43ab76 r7 = 0x00000001 r8 = 0x00000001 r9 = 0xdfedd000 r10 = 0xdb43ab76 vpanic() at vpanic pc = 0xc031a6f4 lr = 0xc06849dc (abort_align) sp = 0xdfb979ac fp = 0xdfb979d8 r4 = 0x00000001 r5 = 0x00000001 r6 = 0xdfedd000 r7 = 0xdb43ab76 r8 = 0xdfb979a4 r9 = 0xc031a6f4 r10 = 0xdfb979ac abort_align() at abort_align pc = 0xc06849dc lr = 0xc0684a50 (abort_align+0x74) sp = 0xdfb979e0 fp = 0xdfb979f8 r4 = 0x00000013 r10 = 0xdb43ab76 abort_align() at abort_align+0x74 pc = 0xc0684a50 lr = 0xc06846a8 (abort_handler+0x45c) sp = 0xdfb97a00 fp = 0xdfb97a98 r4 = 0x00000000 r10 = 0xdb43ab76 abort_handler() at abort_handler+0x45c pc = 0xc06846a8 lr = 0xc0661cc8 (exception_exit) sp = 0xdfb97aa0 fp = 0xdfb97b60 r4 = 0x00000000 r5 = 0x00000000 r6 = 0xdb43ab76 r7 = 0xdb43ab66 r8 = 0xc096383c r9 = 0x00000000 r10 = 0xdb132400 exception_exit() at exception_exit pc = 0xc0661cc8 lr = 0xc0b4e2c0 (__pcpu) sp = 0xdfb97b30 fp = 0xdfb97b60 r0 = 0xdfedd000 r1 = 0xdfb97b34 r2 = 0x00000000 r3 = 0x00000000 r4 = 0x00000000 r5 = 0x00000000 r6 = 0xdb43ab76 r7 = 0xdb43ab66 r8 = 0xc096383c r9 = 0x00000000 r10 = 0xdb132400 r12 = 0x00000000 in6ifa_ifwithaddr() at in6ifa_ifwithaddr+0x30 pc = 0xc04e6b70 lr = 0xc04f9030 (ip6_input+0xd38) sp = 0xdfb97b68 fp = 0xdfb97c28 r4 = 0xdb43ab76 r5 = 0xdb43ab5e r6 = 0x00000000 r7 = 0xdb43ab66 ip6_input() at ip6_input+0xd38 pc = 0xc04f9030 lr = 0xc046d66c (netisr_dispatch_src+0xf8) sp = 0xdfb97c30 fp = 0xdfb97c58 r4 = 0xdb43ab00 r5 = 0x00000006 r6 = 0x00000007 r7 = 0xc0b49d50 r8 = 0xdafea0c0 r9 = 0xdb43ab00 r10 = 0x00000086 netisr_dispatch_src() at netisr_dispatch_src+0xf8 pc = 0xc046d66c lr = 0xc04641b0 (ether_demux+0x18c) sp = 0xdfb97c60 fp = 0xdfb97c78 r4 = 0x00000006 r5 = 0x00001201 r6 = 0xdb132400 r7 = 0x000000ff r8 = 0xdafea0c0 r9 = 0xdb43ab00 r10 = 0x00000086 ether_demux() at ether_demux+0x18c pc = 0xc04641b0 lr = 0xc0465880 (ether_nh_input+0x490) sp = 0xdfb97c80 fp = 0xdfb97ce0 r4 = 0xdb132400 r5 = 0xdb43ab00 r6 = 0xdb43ab50 r10 = 0x00000086 ether_nh_input() at ether_nh_input+0x490 pc = 0xc0465880 lr = 0xc046d66c (netisr_dispatch_src+0xf8) sp = 0xdfb97ce8 fp = 0xdfb97d10 r4 = 0xdb43ab00 r5 = 0x00000005 r6 = 0x0000000c r7 = 0xc0b49d30 r8 = 0xdafea0c0 r9 = 0xdb43ab00 r10 = 0xc098d18f netisr_dispatch_src() at netisr_dispatch_src+0xf8 pc = 0xc046d66c lr = 0xc04645c4 (ether_input+0x50) sp = 0xdfb97d18 fp = 0xdfb97d48 r4 = 0xdb43ab00 r5 = 0x00000000 r6 = 0x00008803 r7 = 0x00000000 r8 = 0xdafea0c0 r9 = 0xdb43ab00 r10 = 0xc098d18f ether_input() at ether_input+0x50 pc = 0xc04645c4 lr = 0xdffb3f08 ($a.10+0x108) sp = 0xdfb97d50 fp = 0xdfb97d78 r4 = 0xdb132400 r5 = 0xdaff8b00 r6 = 0xdaff8b10 r7 = 0x00000000 r8 = 0x00000000 r10 = 0xc098d18f $a.10() at $a.10+0x108 pc = 0xdffb3f08 lr = 0xc038cb2c (taskqueue_run_locked+0x1c4) sp = 0xdfb97d80 fp = 0xdfb97dd8 r4 = 0xe0145100 r5 = 0xdaff8b2c r6 = 0xe0145150 r7 = 0x00000001 r8 = 0x00000000 r9 = 0xdfb97d90 r10 = 0x00000001 taskqueue_run_locked() at taskqueue_run_locked+0x1c4 pc = 0xc038cb2c lr = 0xc038e4e4 (taskqueue_thread_loop+0x1b0) sp = 0xdfb97de0 fp = 0xdfb97e10 r4 = 0xe0145100 r5 = 0xe0145140 r6 = 0xc07af4c4 r7 = 0x00000000 r8 = 0xc098d18f r9 = 0x00000100 r10 = 0xc0b228a0 taskqueue_thread_loop() at taskqueue_thread_loop+0x1b0 pc = 0xc038e4e4 lr = 0xc02cdf0c (fork_exit+0xc0) sp = 0xdfb97e18 fp = 0xdfb97e38 r4 = 0xdfedd000 r5 = 0xc0b224e0 r6 = 0xc038e334 r7 = 0xdffc4f54 r8 = 0xdfb97e40 r9 = 0xc098d191 fork_exit() at fork_exit+0xc0 pc = 0xc02cdf0c lr = 0xc0661c5c (swi_exit) sp = 0xdfb97e40 fp = 0x00000000 r4 = 0xc038e334 r5 = 0xdffc4f54 r6 = 0xc0b45d84 r7 = 0xd73bcba0 r8 = 0x00000001 r10 = 0xc0b228a0 swi_exit() at swi_exit pc = 0xc0661c5c lr = 0xc0661c5c (swi_exit) sp = 0xdfb97e40 fp = 0x00000000 KDB: enter: panic [ thread pid 0 tid 100230 ] For reference: # uname -apKU FreeBSD OPiP2E-RPi2v1p1 14.0-CURRENT FreeBSD 14.0-CURRENT armv7 1400093 #6 main-n264334-215bab7924f6-dirty: Tue Jul 25 23:11:39 PDT 2023 root@CA72-16Gp-ZFS:/usr/obj/BUILDs/main-CA7-nodbg-clang/usr/main-src/arm.armv7/sys/GENERIC-NODBG-CA7 arm armv7 1400093 1400093 The OrangePi+ 2Ed was the type of system booted and tested.
(In reply to Mark Millard from comment #34) Hmm. Getting the problem on the built-in Ethernet did not involve if_ure from what I can tell, looking at the dmsg -a output: . . . awg0: <Allwinner Gigabit Ethernet> mem 0x1c30000-0x1c3ffff irq 27 on simplebus0 miibus0: <MII bus> on awg0 rgephy0: <RTL8169S/8110S/8211 1000BASE-T media interface> PHY 0 on miibus0 rgephy0: none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX, 100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, 1000baseT-FDX-flow, 1000baseT-FDX-flow-master, auto, auto-flow rgephy1: <RTL8169S/8110S/8211 1000BASE-T media interface> PHY 1 on miibus0 rgephy1: none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX, 100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, 1000baseT-FDX-flow, 1000baseT-FDX-flow-master, auto, auto-flow . . . Autoloading module: if_ure ure0 on uhub6 ure0: <Realtek USB 10/100/1000 LAN, class 0/0, rev 2.10/30.00, addr 2> on usbus7 miibus1: <MII bus> on ure0 rgephy2: <RTL8251/8153 1000BASE-T media interface> PHY 0 on miibus1 rgephy2: none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT-FDX, 1000baseT-FDX-master, auto ue0: <USB Ethernet> on ure0 if_ure would only be involved with the dongle (ure0), for which no Ethernet cable was attached. So I'm unsure if what I've reported for the OrangePi+2Ed context overall is: A) An independent problem. vs. B) Evidence that the original problem involves more than just potential if_ure issues.
(In reply to Mark Millard from comment #35) Other testing with an official snapshot build (so: debug build) indicates that the failures tied to in6ifa_ifwithaddr involve: Kernel page fault with the following non-sleepable locks held: shared rm in6_ifaddr_lock (in6_ifaddr_lock) r = 0 (0xc0b5acd0) locked @ /usr/src/sys/netinet6/in6.c:1620 stack backtrace: #0 0xc035e060 at witness_debugger+0x74 #1 0xc035f2ec at witness_warn+0x41c #2 0xc0610b58 at abort_handler+0x1d8 #3 0xc05ef6ac at exception_exit+0 #4 0xc04986b4 at in6ifa_ifwithaddr+0x40 #5 0xc04aa060 at ip6_input+0xd38 #6 0xc04235bc at netisr_dispatch_src+0x100 #7 0xc041a384 at ether_demux+0x1bc #8 0xc041bb68 at ether_nh_input+0x3dc #9 0xc04235bc at netisr_dispatch_src+0x100 #10 0xc041a808 at ether_input+0xec #11 0xe183810c at $a.10+0xbc #12 0xc03504dc at taskqueue_run_locked+0xb8 #13 0xc0351560 at taskqueue_thread_loop+0x108 #14 0xc02a384c at fork_exit+0xa0 #15 0xc05ef640 at swi_exit+0 Fatal kernel mode data abort: 'Alignment Fault' on read I've submitted https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=272966 for this. I've also submitted https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=272965 for a different backtrace structure that various tests produce that also include: 'Alignment Fault' on read