Bug 205743 - null pointer dereference in PF running a vimage jail
Status: Closed Overcome By Events
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: amd64 Any
Importance: --- Affects Many People
Assignee: Bjoern A. Zeeb
URL:
Keywords: vimage
Depends on:
Blocks:
 
Reported: 2015-12-30 22:48 UTC by gila
Modified: 2018-11-02 14:42 UTC

Description gila 2015-12-30 22:48:59 UTC
Running the following jail on -CURRENT:

# cat /etc/jail.conf

allow.raw_sockets = "1";
allow.set_hostname = "0";
allow.sysvipc = "1";

test {
	host.hostname = "test.bsdvm";
	vnet = "new";
	vnet.interface  = "em1", "em2";
	devfs_ruleset = 4;
	allow.raw_sockets = 1;
	allow.mount.devfs = 1;
	allow.mount = 1;
	allow.sysvipc = 1;
	persist;
}

The devfs ruleset is copied from /etc/defaults and modified to expose bp* and pf* devices.

Then within the jail:

ext_if="em1"
int_if="em2"

# options
set block-policy return
set loginterface $ext_if

set skip on lo

# scrub
scrub in

# nat/rdr
nat on $ext_if inet from !($ext_if) -> ($ext_if:0)
nat-anchor "ftp-proxy/*"
rdr-anchor "ftp-proxy/*"

block in all

pass out

anchor "ftp-proxy/*"
antispoof quick for { lo $int_if }
pass in inet proto icmp all icmp-type $icmp_types
pass quick on $int_if no state


Causes a 100% reproducible panic:

Fatal double fault
rip = 0xffffffff80e484a8
rsp = 0xfffffe0230ea0fd0
rbp = 0xfffffe0230ea1000
cpuid = 4; apic id = 05
panic: double fault
cpuid = 4
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2c/frame 0xfffffe0227dd8ce0
kdb_backtrace() at kdb_backtrace+0x53/frame 0xfffffe0227dd8db0
vpanic() at vpanic+0x249/frame 0xfffffe0227dd8e80
vpanic() at vpanic/frame 0xfffffe0227dd8ee0
dblfault_handler() at dblfault_handler+0x10a/frame 0xfffffe0227dd8f30
Xdblfault() at Xdblfault+0xac/frame 0xfffffe0227dd8f30
--- trap 0x17, rip = 0xffffffff80e484a8, rsp = 0xfffffe0230ea0fd0, rbp = 0xfffffe0230ea1000 ---
vtterm_cursor() at vtterm_cursor+0x8/frame 0xfffffe0230ea1000
termteken_cursor() at termteken_cursor+0x37/frame 0xfffffe0230ea1030
teken_funcs_cursor() at teken_funcs_cursor+0x3b/frame 0xfffffe0230ea1050
teken_subr_carriage_return() at teken_subr_carriage_return+0x2c/frame 0xfffffe0230ea1070
teken_input_char() at teken_input_char+0x166/frame 0xfffffe0230ea10b0
teken_input_byte() at teken_input_byte+0x50/frame 0xfffffe0230ea10d0
teken_input() at teken_input+0x52/frame 0xfffffe0230ea1100
termcn_cnputc() at termcn_cnputc+0x1c8/frame 0xfffffe0230ea11b0
cnputc() at cnputc+0x90/frame 0xfffffe0230ea11f0
cnputs() at cnputs+0x154/frame 0xfffffe0230ea1230
putbuf() at putbuf+0x15f/frame 0xfffffe0230ea1260
putchar() at putchar+0xb0/frame 0xfffffe0230ea12a0
kvprintf() at kvprintf+0x15a/frame 0xfffffe0230ea1790
_vprintf() at _vprintf+0xb9/frame 0xfffffe0230ea1890
vprintf() at vprintf+0x2d/frame 0xfffffe0230ea18c0
printf() at printf+0x4b/frame 0xfffffe0230ea1930
trap_fatal() at trap_fatal+0xf5/frame 0xfffffe0230ea1a50
trap_pfault() at trap_pfault+0x188/frame 0xfffffe0230ea1b50
trap() at trap+0x7a9/frame 0xfffffe0230ea1e90
trap_check() at trap_check+0x4a/frame 0xfffffe0230ea1eb0
calltrap() at calltrap+0x8/frame 0xfffffe0230ea1eb0
--- trap 0xc, rip = 0xffffffff8168e17f, rsp = 0xfffffe0230ea1f80, rbp = 0xfffffe0230ea1fb0 ---
pf_begin_rules() at pf_begin_rules+0x6f/frame 0xfffffe0230ea1fb0
pfioctl() at pfioctl+0xb35a/frame 0xfffffe0230ea42e0
devfs_ioctl_f() at devfs_ioctl_f+0x19c/frame 0xfffffe0230ea4420
fo_ioctl() at fo_ioctl+0x4c/frame 0xfffffe0230ea4460
kern_ioctl() at kern_ioctl+0x3c3/frame 0xfffffe0230ea45b0
sys_ioctl() at sys_ioctl+0x2b8/frame 0xfffffe0230ea4690
syscallenter() at syscallenter+0xcfa/frame 0xfffffe0230ea4990
amd64_syscall() at amd64_syscall+0x2a/frame 0xfffffe0230ea4ab0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe0230ea4ab0

(kgdb) up 36
#36 0xffffffff8168e17f in pf_begin_rules (ticket=0xfffff801e2162404, rs_num=0x0, anchor=0xfffff801e2162004 "")
    at /usr/src/sys/netpfil/pf/pf_ioctl.c:745
745		while ((rule = TAILQ_FIRST(rs->rules[rs_num].inactive.ptr)) != NULL) {
(kgdb) l
740		if (rs_num < 0 || rs_num >= PF_RULESET_MAX)
741			return (EINVAL);
742		rs = pf_find_or_create_ruleset(anchor);
743		if (rs == NULL)
744			return (EINVAL);
745		while ((rule = TAILQ_FIRST(rs->rules[rs_num].inactive.ptr)) != NULL) {
746			pf_unlink_rule(rs->rules[rs_num].inactive.ptr, rule);
747			rs->rules[rs_num].inactive.rcount--;
748		}
749		*ticket = ++rs->rules[rs_num].inactive.ticket;

(kgdb) print rs->rules[0]
$10 = {
  queues = 0xfffffe0001f8dd28,
  active = {
    ptr = 0x0,
    ptr_array = 0x0,
    rcount = 0x0,
    ticket = 0x0,
    open = 0x0
  },
  inactive = {
    ptr = 0x0,
    ptr_array = 0x0,
    rcount = 0x0,
    ticket = 0x0,
    open = 0x0
  }
}


The TAILQ_FIRST macro tries to dereference a pointer which, per the output above, is NULL. The idea was to run PF in a jail and have it do routing for other jails.
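
To illustrate the failure mode described above, here is a minimal user-space sketch, assuming only the TAILQ macros from <sys/queue.h>. All sketch_* names are hypothetical stand-ins and not the kernel's pf structures, and the NULL check is an illustration of the missing guard, not the change made in D1944 or in the tree. TAILQ_FIRST(head) expands to a read of head->tqh_first, so a NULL head faults immediately.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-ins for the pf structures; not the kernel definitions. */
struct sketch_rule {
	TAILQ_ENTRY(sketch_rule) entries;
};
TAILQ_HEAD(sketch_rulequeue, sketch_rule);

struct sketch_ruleset {
	struct sketch_rulequeue *inactive_ptr;	/* mirrors rules[rs_num].inactive.ptr */
	unsigned int inactive_rcount;
	unsigned int inactive_ticket;
};

/*
 * Flush the inactive queue and hand out a new ticket, but refuse to touch
 * a ruleset whose queue pointer was never initialised.
 */
int
sketch_begin_rules(struct sketch_ruleset *rs, unsigned int *ticket)
{
	struct sketch_rule *rule;

	/* This is the state kgdb printed: inactive.ptr == 0x0. */
	if (rs == NULL || rs->inactive_ptr == NULL)
		return (EINVAL);

	while ((rule = TAILQ_FIRST(rs->inactive_ptr)) != NULL) {
		TAILQ_REMOVE(rs->inactive_ptr, rule, entries);
		rs->inactive_rcount--;
	}
	*ticket = ++rs->inactive_ticket;
	return (0);
}
```

With a ruleset whose inactive_ptr is NULL, the sketch returns EINVAL instead of faulting. Whether the kernel should return an error here or guarantee that the per-vnet rulesets are initialised before the ioctl can be reached is a separate design question.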

Apologies for not knowing if there are ways to "format" the pastes.
Comment 1 Michael Moll freebsd_committer freebsd_triage 2015-12-31 19:31:06 UTC
Not sure about it, but does https://reviews.freebsd.org/D1944 fix this panic?
Comment 2 gila 2015-12-31 19:48:36 UTC
I'm kind of new to bugs.freebsd -- but I assume you want me to apply the patch I get when I click "download raw diff" from that particular bug, yes? Or should I pick a particular revision?

In either case I can try this and report back, thanks.
Comment 3 Michael Moll freebsd_committer freebsd_triage 2015-12-31 20:06:28 UTC
Exactly: use "Download Raw Diff", patch your sources with it, rebuild the system and try to reproduce. Thanks!
Comment 4 gila 2015-12-31 20:56:46 UTC
I've applied the patch against 0efa1469be94566c09b9f4ce538c28e92d26026c and there is another panic.

(kgdb) bt
#0  doadump (textdump=0x1) at pcpu.h:221
During symbol reading, Incomplete CFI data; unspecified registers at 0xffffffff80a9ed76.
#1  0xffffffff80a9eaa3 in kern_reboot (howto=0x104) at /usr/src/sys/kern/kern_shutdown.c:364
#2  0xffffffff80a9f00b in vpanic (fmt=<value optimized out>, ap=<value optimized out>)
    at /usr/src/sys/kern/kern_shutdown.c:757
#3  0xffffffff80a9ee43 in panic (fmt=0x0) at /usr/src/sys/kern/kern_shutdown.c:688
#4  0xffffffff8038a3b7 in db_panic (addr=<value optimized out>, have_addr=0x0, count=0x0, modif=0x0)
    at /usr/src/sys/ddb/db_command.c:473
#5  0xffffffff8038993e in db_command (cmd_table=0x0) at /usr/src/sys/ddb/db_command.c:440
#6  0xffffffff803896d4 in db_command_loop () at /usr/src/sys/ddb/db_command.c:493
#7  0xffffffff8038c1db in db_trap (type=<value optimized out>, code=0x0) at /usr/src/sys/ddb/db_main.c:251
#8  0xffffffff80ae3803 in kdb_trap (type=0xc, code=0x0, tf=<value optimized out>) at /usr/src/sys/kern/subr_kdb.c:654
#9  0xffffffff80f8e711 in trap_fatal (frame=0xfffffe0231d4e1c0, eva=<value optimized out>)
    at /usr/src/sys/amd64/amd64/trap.c:829
#10 0xffffffff80f8e944 in trap_pfault (frame=0xfffffe0231d4e1c0, usermode=<value optimized out>)
    at /usr/src/sys/amd64/amd64/trap.c:684
#11 0xffffffff80f8e0fe in trap (frame=0xfffffe0231d4e1c0) at /usr/src/sys/amd64/amd64/trap.c:435
#12 0xffffffff80f71337 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:234
#13 0xffffffff80d22752 in pfsync_clear_states (creatorid=<value optimized out>, ifname=0x0)
    at /usr/src/sys/netpfil/pf/if_pfsync.c:1973
#14 0xffffffff80d3bac5 in pfioctl (dev=<value optimized out>, cmd=<value optimized out>, addr=0xfffff80006f62500 "",
    flags=<value optimized out>, td=<value optimized out>) at /usr/src/sys/netpfil/pf/pf_ioctl.c:1692
#15 0xffffffff8095a9ab in devfs_ioctl_f (fp=0xfffff800068e12d0, com=0xc0e04412, data=0xfffff80006f62500,
    cred=<value optimized out>, td=0xfffff8004649e000) at /usr/src/sys/fs/devfs/devfs_vnops.c:813
#16 0xffffffff80b00a3c in kern_ioctl (td=0xfffff8004649e000, fd=<value optimized out>, com=0x0,
    data=0xfffff80006f62500 "") at file.h:324
#17 0xffffffff80b005be in sys_ioctl (td=0xfffff8004649e000, uap=0xfffffe0231d4ea40)
    at /usr/src/sys/kern/sys_generic.c:723
#18 0xffffffff80f8f0e8 in amd64_syscall (td=0xfffff8004649e000, traced=0x0) at subr_syscall.c:135
#19 0xffffffff80f7161b in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:394
#20 0x0000000800de94ba in ?? ()

Now the panic occurs in pfsync_clear_states()
Comment 5 Bjoern A. Zeeb freebsd_committer freebsd_triage 2016-01-01 15:08:05 UTC
(In reply to gila from comment #4)

What's the actual panic from the latter one?
Comment 6 gila 2016-01-01 15:43:49 UTC
(In reply to Bjoern A. Zeeb from comment #5)

I pasted it in the comment, right above where I say it panics in pfsync_clear_states(). Did that not come across?

Let me repaste the relevant frames:

#12 0xffffffff80f71337 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:234
#13 0xffffffff80d22752 in pfsync_clear_states (creatorid=<value optimized out>, ifname=0x0)
    at /usr/src/sys/netpfil/pf/if_pfsync.c:1973
#14 0xffffffff80d3bac5 in pfioctl (dev=<value optimized out>, cmd=<value optimized out>, addr=0xfffff80006f62500 "",
    flags=<value optimized out>, td=<value optimized out>) at /usr/src/sys/netpfil/pf/pf_ioctl.c:1692
#15 0xffffffff8095a9ab in devfs_ioctl_f (fp=0xfffff800068e12d0, com=0xc0e04412, data=0xfffff80006f62500,
    cred=<value optimized out>, td=0xfffff8004649e000) at /usr/src/sys/fs/devfs/devfs_vnops.c:813
#16 0xffffffff80b00a3c in kern_ioctl (td=0xfffff8004649e000, fd=<value optimized out>, com=0x0,
Comment 7 gila 2016-01-01 22:00:21 UTC
I've just checked out the latest master and reapplied the patch, and I get the same panic (to be clear, this is the second panic, the one that appears after applying the patch).

1955	static void
1956	pfsync_clear_states(u_int32_t creatorid, const char *ifname)
1957	{
1958		struct pfsync_softc *sc = V_pfsyncif;
1959		struct {
1960			struct pfsync_subheader subh;
1961			struct pfsync_clr clr;

sc is NULL here and things blow up when we try to acquire the mutex at 1973:

1973		PFSYNC_LOCK(sc);
1974		pfsync_send_plus(&r, sizeof(r));
1975		PFSYNC_UNLOCK(sc);
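
As a rough illustration of why this blows up, here is a user-space sketch assuming a per-vnet softc pointer that stays NULL when no pfsync interface exists in the vnet. The sketch_* names are hypothetical, the lock is reduced to a flag, and the early return is only one possible guard, not necessarily what the tree does.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pfsync_softc; only the lock matters here. */
struct sketch_softc {
	bool locked;
	unsigned int clr_sent;	/* counts clear messages "sent" */
};

/* Hypothetical stand-in for the per-vnet V_pfsyncif pointer. */
struct sketch_softc *sketch_pfsyncif = NULL;

/*
 * Returns true if a clear message was queued, false if there is no
 * pfsync interface in this vnet and the call was skipped.
 */
bool
sketch_clear_states(void)
{
	struct sketch_softc *sc = sketch_pfsyncif;

	/* Without this check the lock below dereferences a NULL softc. */
	if (sc == NULL)
		return (false);

	sc->locked = true;	/* stands in for PFSYNC_LOCK(sc) */
	sc->clr_sent++;		/* stands in for pfsync_send_plus() */
	sc->locked = false;	/* stands in for PFSYNC_UNLOCK(sc) */
	return (true);
}
```

In a vnet without a pfsync interface the sketch simply skips the send, which matches the expectation that a jail with no use for pfsync should never reach the lock.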
Comment 8 nvass 2016-01-11 21:55:14 UTC
Hi,

I cannot reproduce it. The ruleset doesn't seem complete since $icmp_types is
nowhere defined.
1) Do you run pfsync? That function should be part of pfsync.
2) Is this 10-STABLE? My test environment is 11-CURRENT, which should be close
enough.

Could you give me more details?
Comment 9 gila 2016-01-12 20:17:51 UTC
Hi,

It seems there was a paste error.

icmp_types="echoreq"

is what should be there. I just grabbed the latest image from 11-CURRENT and verified that the bug is still there, without the patch applied.

The second panic (after patch) is indeed related to pfsync. I have no use for pfsync however, and I don't really know why it shows up. 


What's new in this release is that when I create the jail, I get a LOR (lock order reversal) message:

#0 0xffffffff80a7c5f0 at witness_debugger+0x70
#1 0xffffffff80a7c4f1 at witness_checkorder+0xe71
#2 0xffffffff809fd99b at __lockmgr_args+0xd3b
#3 0xffffffff80ac5fcc at vop_stdlock+0x3c
#4 0xffffffff80fd3860 at VOP_LOCK1_APV+0x100
#5 0xffffffff80ae6eca at _vn_lock+0x9a
#6 0xffffffff80ad7333 at vget+0x63
#7 0xffffffff808f8fcd at devfs_allocv+0xcd
#8 0xffffffff808f8a93 at devfs_root+0x43
#9 0xffffffff80ace631 at vfs_donmount+0x1521
#10 0xffffffff80acd0e2 at sys_nmount+0x72
#11 0xffffffff80e8615b at amd64_syscall+0x2db
#12 0xffffffff80e6515b at Xfast_syscall+0xfb
lock order reversal:
 1st 0xffffffff81cf4038 allprison (allprison) @ /usr/src/sys/kern/kern_jail.c:1020
 2nd 0xffffffff81d19b10 vnet_sysinit_sxlock (vnet_sysinit_sxlock) @ /usr/src/sys/net/vnet.c:573
stack backtrace:
#0 0xffffffff80a7c5f0 at witness_debugger+0x70
#1 0xffffffff80a7c4f1 at witness_checkorder+0xe71
#2 0xffffffff80a29533 at _sx_slock+0x73
#3 0xffffffff80b1ccce at vnet_alloc+0x10e
#4 0xffffffff809ed3f3 at kern_jail_set+0x1d33
#5 0xffffffff809eede1 at sys_jail_set+0x41
#6 0xffffffff80e8615b at amd64_syscall+0x2db
#7 0xffffffff80e6515b at Xfast_syscall+0xfb

The jail does start though.

Then in the jail, starting PF triggers the panic mentioned earlier. Let me know if there is anything else I can do to help.
Comment 10 nvass 2016-01-12 20:59:25 UTC
Could you try the patch and report back?

11-CURRENT with the patch should be ok, I use it all the time.
If you could try the patch with 10-STABLE, it'd be amazing.
Comment 11 gila 2016-01-13 11:53:56 UTC
Tried the latest patch against the 11-CURRENT ISO and it seems to work. I've added physical devices, epair devices and netgraph "vnics" to jails using vnet, with no issues so far. pf starts fine, but there isn't any traffic over them yet; I will test that on my physical machine, which needs to do actual routing and filtering. (This was all done in a VM first.)

I have not tried 10 yet; I can do that later, once I have confirmed that it works on my physical box. I won't have a physical box available for 10.x testing, however.
Comment 12 gila 2016-01-14 11:50:26 UTC
Currently I'm typing this with a vnet jail that does PF routing/nat and it works great. 

There are two other things which may or may not be related to this bug:

1)

"Freed UMA keg (ripcb) was not empty N items, lost X pages"

These messages seem to show up when creating epairs; I'm not sure if that's related.

2)

When destroying a jail (jail -r) that uses unionfs and nullfs mounts (aka "thinjails"), unmounting the file systems fails with EBUSY, and mount says my FS is not a unionfs (while it really is). When not using VNET in the jails, everything seems to work out just fine.

Like I said, I don't know if it's related. It does not seem like it, as my original bug was about a kernel panic, which appears to be fixed with this patch. (1) might be, however.
Comment 13 gila 2016-01-14 11:51:19 UTC
Darn, I can't seem to edit comments. It happens when *deleting* epairs, not when creating them.
Comment 14 Bjoern A. Zeeb freebsd_committer freebsd_triage 2016-01-14 15:04:30 UTC
(In reply to gila from comment #12)

#1 ("Freed ..") is not related.  It just means that the cleanup is not right yet, or that we have not convinced ourselves that it is and would rather leak some memory on vnet teardown.  It is something I am working on.
Comment 15 Bjoern A. Zeeb freebsd_committer freebsd_triage 2016-05-26 17:31:25 UTC
I will be trying to get pf with VNET more stable over the next few days.
Comment 16 Bjoern A. Zeeb freebsd_committer freebsd_triage 2016-07-05 17:09:54 UTC
Can you please try FreeBSD 11.0-ALPHA6 or later?
Comment 17 Bjoern A. Zeeb freebsd_committer freebsd_triage 2018-11-02 14:42:19 UTC
Hi,

Lacking feedback, I'll close this.  kp@ has fixed a lot of pf bugs for VIMAGE.
If this is still a problem with FreeBSD 12, please feel free to re-open the PR.