enabling ipfw with dummynet and running ipfw pipe 1 config bw 100mbit panics the machine on an unaligned memory access Fix: A bit of a kludge, basically creates a new structure in memory to force align it, and then copies it all over (after verifying length) How-To-Repeat: spar64 machine + ipfw + dummynet, then run: ipfw pipe 1 config bw 100mbit
Responsible Changed From-To: freebsd-bugs->freebsd-net Over to maintainer(s).
still reproducible with a fresh -CURRENT on sparc64, could a net person have a look into that patch?
IMHO, this bug doesn't deserve re@ special attention.
Adding dummynet author/maintainer.
I was able to replicate this on my sparc V120 machine today, running a pretty recent head (r291431). To reproduce this, I issued the following commands on the console: root@ton-128: /etc/rc.d/ipfw onestart root@ton-129: kldload dummynet DUMMYNET 0 with IPv6 initialized (100409) load_dn_sched dn_sched FIFO loaded load_dn_sched dn_sched QFQ loaded load_dn_sched dn_sched RR loaded load_dn_sched dn_sched WF2Q+ loaded load_dn_sched dn_sched PRIO loaded root@ton-130: ipfw pipe 1 config bw 100mbit panic: trap: memory address not aligned (kernel) cpuid = 0 KDB: stack backtrace: vpanic() at vpanic+0x1b4 panic() at panic+0x20 trap() at trap+0x5cc -- memory address not aligned sfar=0xfffff800015dee7c sfsr=0x40029 %o7=0xc4ac3518 -- userland() at do_config+0x59c user trace: trap %o7=0xc4ac3518 pc 0xc4ac2d9c, sp 0xee550831 done KDB: enter: panic [ thread pid 8997 tid 100587 ] Stopped at kdb_enter+0x80: ta %xcc, 1 Backtrace from the savecore run: KDB: enter: panic Reading symbols from /boot/kernel/zfs.ko...Reading symbols from /usr/lib/debug// boot/kernel/zfs.ko.debug...done. done. Loaded symbols for /boot/kernel/zfs.ko Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from /usr/lib /debug//boot/kernel/opensolaris.ko.debug...done. done. Loaded symbols for /boot/kernel/opensolaris.ko Reading symbols from /boot/kernel/geom_mirror.ko...Reading symbols from /usr/lib /debug//boot/kernel/geom_mirror.ko.debug...done. done. Loaded symbols for /boot/kernel/geom_mirror.ko Reading symbols from /boot/kernel/tmpfs.ko...Reading symbols from /usr/lib/debug //boot/kernel/tmpfs.ko.debug...done. done. Loaded symbols for /boot/kernel/ipfw.ko Reading symbols from /boot/kernel/dummynet.ko...Reading symbols from /usr/lib/debug//boot/kernel/dummynet.ko.debug...done. done. Loaded symbols for /boot/kernel/dummynet.ko #0 0x00000000c05dd3a0 in doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:295 295 savectx(&dumppcb); (kgdb) #0 0x00000000c05dd3a0 in doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:295 #1 0x00000000c011c8d0 in db_dump (dummy=3227700832, dummy2=false, dummy3=-1, dummy4=0xee5504e0 "") at /usr/src/sys/ddb/db_command.c:533 #2 0x00000000c011ba04 in db_command (last_cmdp=0xc0d11880, cmd_table=0x0, dopager=1) at /usr/src/sys/ddb/db_command.c:440 #3 0x00000000c011bd14 in db_command_loop () at /usr/src/sys/ddb/db_command.c:493 #4 0x00000000c011fb74 in db_trap (type=<value optimized out>, code=0) at /usr/src/sys/ddb/db_main.c:251 #5 0x00000000c062d70c in kdb_trap (type=107, code=0, tf=0xee5509f0) at /usr/src/sys/kern/subr_kdb.c:654 #6 0x00000000c09df4e0 in trap (tf=0xee5509f0) at /usr/src/sys/sparc64/sparc64/trap.c:344 #7 0x00000000c00b1080 in tl1_trap () #8 0x00000000c062ce60 in kdb_enter (why=0x12 <Address 0x12 out of bounds>, msg=0xc0bcfdc0 "panic") at /usr/src/sys/kern/subr_kdb.c:442 #9 0x00000000c062ce48 in kdb_enter (why=0xc0bcfdc0 "panic", msg=0xc0bcfdc0 "panic") at /usr/src/sys/kern/subr_kdb.c:441 #10 0x00000000c05dde00 in vpanic (fmt=0xc0c1b9a8 "trap: %s (kernel)", ap=0xee550dc8) at /usr/src/sys/kern/kern_shutdown.c:750 #11 0x00000000c05dde68 in panic (fmt=0xc0c1b9a8 "trap: %s (kernel)") at /usr/src/sys/kern/kern_shutdown.c:688 #12 0x00000000c09df754 in trap (tf=0xee550f30) at /usr/src/sys/sparc64/sparc64/trap.c:410 #13 0x00000000c00b1080 in tl1_trap () #14 0x00000000c4ac2d9c in do_config (p=<value optimized out>, l=0) at /usr/src/sys/modules/dummynet/../../netpfil/ipfw/ip_dummynet.c:1235 #15 0x00000000c4ac3520 in do_config (p=<value optimized out>, l=152) at /usr/src/sys/modules/dummynet/../../netpfil/ipfw/ip_dummynet.c:1540 #16 0x00000000c4ac3c40 in ip_dn_ctl (sopt=0xee551580) at /usr/src/sys/modules/dummynet/../../netpfil/ipfw/ip_dummynet.c:2112 #17 0x00000000c078a8dc in rip_ctloutput (so=0xfffff800055d2e88, sopt=0xee551580) at /usr/src/sys/netinet/raw_ip.c:661 #18 0x00000000c0688814 in sosetopt (so=0xfffff800055d2e88, sopt=0xee551580) at /usr/src/sys/kern/uipc_socket.c:2493 #19 0x00000000c0690930 in kern_setsockopt (td=0xfffff80096b0a9a0, s=3, level=0, name=49, val=0x40a18000, valseg=UIO_USERSPACE, valsize=<value optimized out>) at /usr/src/sys/kern/uipc_syscalls.c:1459 #20 0x00000000c06909e8 in sys_setsockopt (td=0xfffff80096b0a9a0, uap=0xee551768) at /usr/src/sys/kern/uipc_syscalls.c:1413 #21 0x00000000c09de68c in syscall (tf=0xee551880) at subr_syscall.c:140 #22 0x00000000c00b0e60 in tl0_intr () #23 0x0000000000000000 in ?? () (kgdb)
(In reply to Michael Moll from comment #2) That patch is overly complicated for solving the unaligned access triggered by config_link(); it's way simpler to just fix do_config() to properly align dn_link before passing it on in the first place. Moreover, the latter also is the actual culprit here as do_config() casts a random chunk of memory to a struct dn_link pointer. Note, though, that the config_link() triggered unaligned access is only one instance of dummynet(4) misaligning dn_link. The following is a complete fix in that regard: http://people.freebsd.org/~marius/dummynet_unfuck_dn_link.diff However, the same incorrect patterns are used for virtually all of dn_<foo> and additionally in some other stuff like flow ID related functions. So someone still needs to sit down and go through the entirety of ip_dummynet.c, fixing all other unaligned accesses.
Issue shouldn't be 'In Progress' without an Assignee. reset. Who wants to pick this up?
Applying Marius' patch fixes up the ip_dummynet code at least enough to make the trivial configuration panic stop happening. Applying the patch to r291431, I can now do the following: root@ton-129: kldload dummynet ipfw2 (+ipv6) initialized, divert loadable, nat loadable, default to deny, logging disabled DUMMYNET 0 with IPv6 initialized (100409) load_dn_sched dn_sched FIFO loaded load_dn_sched dn_sched QFQ loaded load_dn_sched dn_sched RR loaded load_dn_sched dn_sched WF2Q+ loaded load_dn_sched dn_sched PRIO loaded root@ton-130: /etc/rc.d/ipfw onestart ipfw: setsockopt(IP_FW_XDEL): Invalid argument ipfw: getsockopt(IP_FW_XADD): Invalid argument ipfw: getsockopt(IP_FW_XADD): Invalid argument ipfw: getsockopt(IP_FW_XADD): Invalid argument ipfw: getsockopt(IP_FW_XADD): Invalid argument ipfw: getsockopt(IP_FW_XADD): Invalid argument ipfw: getsockopt(IP_FW_XADD): Invalid argument ipfw: getsockopt(IP_FW_XADD): Invalid argument ipfw: getsockopt(IP_FW_XADD): Invalid argument ipfw: getsockopt(IP_FW_XADD): Invalid argument ipfw: getsockopt(IP_FW_XADD): Invalid argument Firewall rules loaded. root@ton-131: ipfw pipe 1 config bw 100mbit root@ton-132: /etc/rc.d/ipfw onestop net.inet.ip.fw.enable: 1 -> 0 net.inet6.ip6.fw.enable: 1 -> 0
A commit references this bug: Author: marius Date: Mon Jan 9 20:51:51 UTC 2017 New revision: 311817 URL: https://svnweb.freebsd.org/changeset/base/311817 Log: In dummynet(4), random chunks of memory are casted to struct dn_*, potentially leading to fatal unaligned accesses on architectures with strict alignment requirements. This change fixes dummynet(4) as far as accesses to 64-bit members of struct dn_* are concerned, tripping up on sparc64 with accesses to 32-bit members happening to be correctly aligned there. In other words, this only fixes the tip of the iceberg; larger parts of dummynet(4) still need to be rewritten in order to properly work on all of !x86. In principle, considering the amount of code in dummynet(4) that needs this erroneous pattern corrected, an acceptable workaround would be to declare all struct dn_* packed, forcing compilers to do byte-accesses as a side-effect. However, given that the structs in question aren't laid out well either, this would break ABI/KBI. While at it, replace all existing bcopy(9) calls with memcpy(9) for performance reasons, as there is no need to check for overlap in these cases. PR: 189219 MFC after: 5 days Changes: head/sys/netpfil/ipfw/ip_dummynet.c
A commit references this bug: Author: marius Date: Wed May 10 20:46:55 UTC 2017 New revision: 318154 URL: https://svnweb.freebsd.org/changeset/base/318154 Log: MFC: r311817 In dummynet(4), random chunks of memory are casted to struct dn_*, potentially leading to fatal unaligned accesses on architectures with strict alignment requirements. This change fixes dummynet(4) as far as accesses to 64-bit members of struct dn_* are concerned, tripping up on sparc64 with accesses to 32-bit members happening to be correctly aligned there. In other words, this only fixes the tip of the iceberg; larger parts of dummynet(4) still need to be rewritten in order to properly work on all of !x86. In principle, considering the amount of code in dummynet(4) that needs this erroneous pattern corrected, an acceptable workaround would be to declare all struct dn_* packed, forcing compilers to do byte-accesses as a side-effect. However, given that the structs in question aren't laid out well either, this would break ABI/KBI. While at it, replace all existing bcopy(9) calls with memcpy(9) for performance reasons, as there is no need to check for overlap in these cases. PR: 189219 Changes: _U stable/11/ stable/11/sys/netpfil/ipfw/ip_dummynet.c
A commit references this bug: Author: marius Date: Wed May 10 20:46:59 UTC 2017 New revision: 318155 URL: https://svnweb.freebsd.org/changeset/base/318155 Log: MFC: r311817 In dummynet(4), random chunks of memory are casted to struct dn_*, potentially leading to fatal unaligned accesses on architectures with strict alignment requirements. This change fixes dummynet(4) as far as accesses to 64-bit members of struct dn_* are concerned, tripping up on sparc64 with accesses to 32-bit members happening to be correctly aligned there. In other words, this only fixes the tip of the iceberg; larger parts of dummynet(4) still need to be rewritten in order to properly work on all of !x86. In principle, considering the amount of code in dummynet(4) that needs this erroneous pattern corrected, an acceptable workaround would be to declare all struct dn_* packed, forcing compilers to do byte-accesses as a side-effect. However, given that the structs in question aren't laid out well either, this would break ABI/KBI. While at it, replace all existing bcopy(9) calls with memcpy(9) for performance reasons, as there is no need to check for overlap in these cases. PR: 189219 Changes: _U stable/10/ stable/10/sys/netpfil/ipfw/ip_dummynet.c