Bug 189219 - [dummynet] [patch] using dummynet on sparc64 and configuring a pipe is an insta-panic
Summary: [dummynet] [patch] using dummynet on sparc64 and configuring a pipe is an ins...
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: Unspecified
Hardware: Any Any
: Normal Affects Only Me
Assignee: freebsd-net (Nobody)
URL:
Keywords: crash, needs-qa, patch
Depends on:
Blocks:
 
Reported: 2014-05-02 06:50 UTC by david
Modified: 2017-05-10 22:03 UTC (History)
5 users (show)

See Also:


Attachments
file.diff (1.59 KB, patch)
2014-05-02 06:50 UTC, david
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description david 2014-05-02 06:50:00 UTC
enabling ipfw with dummynet and running ipfw pipe 1 config bw 100mbit
panics the machine on an unaligned memory access

Fix: A bit of a kludge, basically creates a new structure in memory to force
align it, and then copies it all over (after verifying length)
How-To-Repeat: spar64 machine + ipfw + dummynet, then run:
ipfw pipe 1 config bw 100mbit
Comment 1 Mark Linimon freebsd_committer freebsd_triage 2014-05-04 03:28:32 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-net

Over to maintainer(s).
Comment 2 Michael Moll freebsd_committer freebsd_triage 2015-11-30 22:28:58 UTC
still reproducible with a fresh -CURRENT on sparc64, could a net person have a look into that patch?
Comment 3 Gleb Smirnoff freebsd_committer freebsd_triage 2015-11-30 23:50:32 UTC
IMHO, this bug doesn't deserve re@ special attention.
Comment 4 Hiren Panchasara freebsd_committer freebsd_triage 2015-12-01 06:25:28 UTC
Adding dummynet author/maintainer.
Comment 5 Kurt Lidl freebsd_committer freebsd_triage 2015-12-02 22:35:24 UTC
I was able to replicate this on my sparc V120 machine today,
running a pretty recent head (r291431).

To reproduce this, I issued the following commands on the console:

root@ton-128: /etc/rc.d/ipfw onestart
root@ton-129: kldload dummynet
DUMMYNET 0 with IPv6 initialized (100409)
load_dn_sched dn_sched FIFO loaded
load_dn_sched dn_sched QFQ loaded
load_dn_sched dn_sched RR loaded
load_dn_sched dn_sched WF2Q+ loaded
load_dn_sched dn_sched PRIO loaded
root@ton-130: ipfw pipe 1 config bw 100mbit
panic: trap: memory address not aligned (kernel)
cpuid = 0
KDB: stack backtrace:
vpanic() at vpanic+0x1b4
panic() at panic+0x20
trap() at trap+0x5cc
-- memory address not aligned sfar=0xfffff800015dee7c sfsr=0x40029 %o7=0xc4ac3518 --
userland() at do_config+0x59c
user trace: trap %o7=0xc4ac3518
pc 0xc4ac2d9c, sp 0xee550831
done
KDB: enter: panic
[ thread pid 8997 tid 100587 ]
Stopped at      kdb_enter+0x80: ta              %xcc, 1

Backtrace from the savecore run:

KDB: enter: panic

Reading symbols from /boot/kernel/zfs.ko...Reading symbols from /usr/lib/debug//
boot/kernel/zfs.ko.debug...done.
done.
Loaded symbols for /boot/kernel/zfs.ko
Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from /usr/lib
/debug//boot/kernel/opensolaris.ko.debug...done.
done.
Loaded symbols for /boot/kernel/opensolaris.ko
Reading symbols from /boot/kernel/geom_mirror.ko...Reading symbols from /usr/lib
/debug//boot/kernel/geom_mirror.ko.debug...done.
done.
Loaded symbols for /boot/kernel/geom_mirror.ko
Reading symbols from /boot/kernel/tmpfs.ko...Reading symbols from /usr/lib/debug
//boot/kernel/tmpfs.ko.debug...done.
done.
Loaded symbols for /boot/kernel/ipfw.ko
Reading symbols from /boot/kernel/dummynet.ko...Reading symbols from /usr/lib/debug//boot/kernel/dummynet.ko.debug...done.
done.
Loaded symbols for /boot/kernel/dummynet.ko
#0  0x00000000c05dd3a0 in doadump (textdump=0)
    at /usr/src/sys/kern/kern_shutdown.c:295
295             savectx(&dumppcb);
(kgdb) #0  0x00000000c05dd3a0 in doadump (textdump=0)
    at /usr/src/sys/kern/kern_shutdown.c:295
#1  0x00000000c011c8d0 in db_dump (dummy=3227700832, dummy2=false, dummy3=-1, 
    dummy4=0xee5504e0 "") at /usr/src/sys/ddb/db_command.c:533
#2  0x00000000c011ba04 in db_command (last_cmdp=0xc0d11880, cmd_table=0x0, 
    dopager=1) at /usr/src/sys/ddb/db_command.c:440
#3  0x00000000c011bd14 in db_command_loop ()
    at /usr/src/sys/ddb/db_command.c:493
#4  0x00000000c011fb74 in db_trap (type=<value optimized out>, code=0)
    at /usr/src/sys/ddb/db_main.c:251
#5  0x00000000c062d70c in kdb_trap (type=107, code=0, tf=0xee5509f0)
    at /usr/src/sys/kern/subr_kdb.c:654
#6  0x00000000c09df4e0 in trap (tf=0xee5509f0)
    at /usr/src/sys/sparc64/sparc64/trap.c:344
#7  0x00000000c00b1080 in tl1_trap ()
#8  0x00000000c062ce60 in kdb_enter (why=0x12 <Address 0x12 out of bounds>, 
    msg=0xc0bcfdc0 "panic") at /usr/src/sys/kern/subr_kdb.c:442
#9  0x00000000c062ce48 in kdb_enter (why=0xc0bcfdc0 "panic", 
    msg=0xc0bcfdc0 "panic") at /usr/src/sys/kern/subr_kdb.c:441
#10 0x00000000c05dde00 in vpanic (fmt=0xc0c1b9a8 "trap: %s (kernel)", 
    ap=0xee550dc8) at /usr/src/sys/kern/kern_shutdown.c:750
#11 0x00000000c05dde68 in panic (fmt=0xc0c1b9a8 "trap: %s (kernel)")
    at /usr/src/sys/kern/kern_shutdown.c:688
#12 0x00000000c09df754 in trap (tf=0xee550f30)
    at /usr/src/sys/sparc64/sparc64/trap.c:410
#13 0x00000000c00b1080 in tl1_trap ()
#14 0x00000000c4ac2d9c in do_config (p=<value optimized out>, l=0)
    at /usr/src/sys/modules/dummynet/../../netpfil/ipfw/ip_dummynet.c:1235
#15 0x00000000c4ac3520 in do_config (p=<value optimized out>, l=152)
    at /usr/src/sys/modules/dummynet/../../netpfil/ipfw/ip_dummynet.c:1540
#16 0x00000000c4ac3c40 in ip_dn_ctl (sopt=0xee551580)
    at /usr/src/sys/modules/dummynet/../../netpfil/ipfw/ip_dummynet.c:2112
#17 0x00000000c078a8dc in rip_ctloutput (so=0xfffff800055d2e88, 
    sopt=0xee551580) at /usr/src/sys/netinet/raw_ip.c:661
#18 0x00000000c0688814 in sosetopt (so=0xfffff800055d2e88, sopt=0xee551580)
    at /usr/src/sys/kern/uipc_socket.c:2493
#19 0x00000000c0690930 in kern_setsockopt (td=0xfffff80096b0a9a0, s=3, 
    level=0, name=49, val=0x40a18000, valseg=UIO_USERSPACE, 
    valsize=<value optimized out>) at /usr/src/sys/kern/uipc_syscalls.c:1459
#20 0x00000000c06909e8 in sys_setsockopt (td=0xfffff80096b0a9a0, 
    uap=0xee551768) at /usr/src/sys/kern/uipc_syscalls.c:1413
#21 0x00000000c09de68c in syscall (tf=0xee551880) at subr_syscall.c:140
#22 0x00000000c00b0e60 in tl0_intr ()
#23 0x0000000000000000 in ?? ()
(kgdb)
Comment 6 Marius Strobl freebsd_committer freebsd_triage 2015-12-03 01:00:21 UTC
(In reply to Michael Moll from comment #2)

That patch is overly complicated for solving the unaligned access triggered by config_link(); it's way simpler to just fix do_config() to properly align dn_link before passing it on in the first place. Moreover, the latter also is the actual culprit here as do_config() casts a random chunk of memory to a struct dn_link pointer. Note, though, that the config_link() triggered unaligned access is only one instance of dummynet(4) misaligning dn_link. The following is a complete fix in that regard:
http://people.freebsd.org/~marius/dummynet_unfuck_dn_link.diff

However, the same incorrect patterns are used for virtually all of dn_<foo> and additionally in some other stuff like flow ID related functions. So someone still needs to sit down and go through the entirety of ip_dummynet.c, fixing all other unaligned accesses.
Comment 7 Kubilay Kocak freebsd_committer freebsd_triage 2015-12-03 07:11:55 UTC
Issue shouldn't be 'In Progress' without an Assignee. reset. Who wants to pick this up?
Comment 8 Kurt Lidl freebsd_committer freebsd_triage 2015-12-04 00:53:19 UTC
Applying Marius' patch fixes up the ip_dummynet code at least enough
to make the trivial configuration panic stop happening.

Applying the patch to r291431, I can now do the following:

root@ton-129: kldload dummynet
ipfw2 (+ipv6) initialized, divert loadable, nat loadable, default to deny, logging disabled
DUMMYNET 0 with IPv6 initialized (100409)
load_dn_sched dn_sched FIFO loaded
load_dn_sched dn_sched QFQ loaded
load_dn_sched dn_sched RR loaded
load_dn_sched dn_sched WF2Q+ loaded
load_dn_sched dn_sched PRIO loaded
root@ton-130: /etc/rc.d/ipfw onestart
ipfw: setsockopt(IP_FW_XDEL): Invalid argument
ipfw: getsockopt(IP_FW_XADD): Invalid argument
ipfw: getsockopt(IP_FW_XADD): Invalid argument
ipfw: getsockopt(IP_FW_XADD): Invalid argument
ipfw: getsockopt(IP_FW_XADD): Invalid argument
ipfw: getsockopt(IP_FW_XADD): Invalid argument
ipfw: getsockopt(IP_FW_XADD): Invalid argument
ipfw: getsockopt(IP_FW_XADD): Invalid argument
ipfw: getsockopt(IP_FW_XADD): Invalid argument
ipfw: getsockopt(IP_FW_XADD): Invalid argument
ipfw: getsockopt(IP_FW_XADD): Invalid argument
Firewall rules loaded.
root@ton-131: ipfw pipe 1 config bw 100mbit
root@ton-132: /etc/rc.d/ipfw onestop
net.inet.ip.fw.enable: 1 -> 0
net.inet6.ip6.fw.enable: 1 -> 0
Comment 9 commit-hook freebsd_committer freebsd_triage 2017-01-09 20:52:28 UTC
A commit references this bug:

Author: marius
Date: Mon Jan  9 20:51:51 UTC 2017
New revision: 311817
URL: https://svnweb.freebsd.org/changeset/base/311817

Log:
  In dummynet(4), random chunks of memory are casted to struct dn_*,
  potentially leading to fatal unaligned accesses on architectures with
  strict alignment requirements. This change fixes dummynet(4) as far
  as accesses to 64-bit members of struct dn_* are concerned, tripping
  up on sparc64 with accesses to 32-bit members happening to be correctly
  aligned there. In other words, this only fixes the tip of the iceberg;
  larger parts of dummynet(4) still need to be rewritten in order to
  properly work on all of !x86.
  In principle, considering the amount of code in dummynet(4) that needs
  this erroneous pattern corrected, an acceptable workaround would be to
  declare all struct dn_* packed, forcing compilers to do byte-accesses
  as a side-effect. However, given that the structs in question aren't
  laid out well either, this would break ABI/KBI.
  While at it, replace all existing bcopy(9) calls with memcpy(9) for
  performance reasons, as there is no need to check for overlap in these
  cases.

  PR:		189219
  MFC after:	5 days

Changes:
  head/sys/netpfil/ipfw/ip_dummynet.c
Comment 10 commit-hook freebsd_committer freebsd_triage 2017-05-10 20:47:30 UTC
A commit references this bug:

Author: marius
Date: Wed May 10 20:46:55 UTC 2017
New revision: 318154
URL: https://svnweb.freebsd.org/changeset/base/318154

Log:
  MFC: r311817

  In dummynet(4), random chunks of memory are casted to struct dn_*,
  potentially leading to fatal unaligned accesses on architectures with
  strict alignment requirements. This change fixes dummynet(4) as far
  as accesses to 64-bit members of struct dn_* are concerned, tripping
  up on sparc64 with accesses to 32-bit members happening to be correctly
  aligned there. In other words, this only fixes the tip of the iceberg;
  larger parts of dummynet(4) still need to be rewritten in order to
  properly work on all of !x86.
  In principle, considering the amount of code in dummynet(4) that needs
  this erroneous pattern corrected, an acceptable workaround would be to
  declare all struct dn_* packed, forcing compilers to do byte-accesses
  as a side-effect. However, given that the structs in question aren't
  laid out well either, this would break ABI/KBI.
  While at it, replace all existing bcopy(9) calls with memcpy(9) for
  performance reasons, as there is no need to check for overlap in these
  cases.

  PR:		189219

Changes:
_U  stable/11/
  stable/11/sys/netpfil/ipfw/ip_dummynet.c
Comment 11 commit-hook freebsd_committer freebsd_triage 2017-05-10 20:47:33 UTC
A commit references this bug:

Author: marius
Date: Wed May 10 20:46:59 UTC 2017
New revision: 318155
URL: https://svnweb.freebsd.org/changeset/base/318155

Log:
  MFC: r311817

  In dummynet(4), random chunks of memory are casted to struct dn_*,
  potentially leading to fatal unaligned accesses on architectures with
  strict alignment requirements. This change fixes dummynet(4) as far
  as accesses to 64-bit members of struct dn_* are concerned, tripping
  up on sparc64 with accesses to 32-bit members happening to be correctly
  aligned there. In other words, this only fixes the tip of the iceberg;
  larger parts of dummynet(4) still need to be rewritten in order to
  properly work on all of !x86.
  In principle, considering the amount of code in dummynet(4) that needs
  this erroneous pattern corrected, an acceptable workaround would be to
  declare all struct dn_* packed, forcing compilers to do byte-accesses
  as a side-effect. However, given that the structs in question aren't
  laid out well either, this would break ABI/KBI.
  While at it, replace all existing bcopy(9) calls with memcpy(9) for
  performance reasons, as there is no need to check for overlap in these
  cases.

  PR:		189219

Changes:
_U  stable/10/
  stable/10/sys/netpfil/ipfw/ip_dummynet.c