Running the following jail on -CURRENT:

# cat /etc/jail.conf
allow.raw_sockets = "1";
allow.set_hostname = "0";
allow.sysvipc = "1";

test {
    host.hostname = "test.bsdvm";
    vnet = "new";
    vnet.interface = "em1", "em2";
    devfs_ruleset = 4;
    allow.raw_sockets = 1;
    allow.mount.devfs = 1;
    allow.mount = 1;
    allow.sysvipc = 1;
    persist;
}

The devfs ruleset is copied from /etc/defaults and modified to expose bp* and pf* devices.

Then, within the jail, this PF ruleset:

ext_if="em1"
int_if="em2"

# options
set block-policy return
set loginterface $ext_if
set skip on lo

# scrub
scrub in

# nat/rdr
nat on $ext_if inet from !($ext_if) -> ($ext_if:0)
nat-anchor "ftp-proxy/*"
rdr-anchor "ftp-proxy/*"

block in all
pass out
anchor "ftp-proxy/*"
antispoof quick for { lo $int_if }
pass in inet proto icmp all icmp-type $icmp_types
pass quick on $int_if no state

causes a 100% reproducible panic:

Fatal double fault
rip = 0xffffffff80e484a8
rsp = 0xfffffe0230ea0fd0
rbp = 0xfffffe0230ea1000
cpuid = 4; apic id = 05
panic: double fault
cpuid = 4
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2c/frame 0xfffffe0227dd8ce0
kdb_backtrace() at kdb_backtrace+0x53/frame 0xfffffe0227dd8db0
vpanic() at vpanic+0x249/frame 0xfffffe0227dd8e80
vpanic() at vpanic/frame 0xfffffe0227dd8ee0
dblfault_handler() at dblfault_handler+0x10a/frame 0xfffffe0227dd8f30
Xdblfault() at Xdblfault+0xac/frame 0xfffffe0227dd8f30
--- trap 0x17, rip = 0xffffffff80e484a8, rsp = 0xfffffe0230ea0fd0, rbp = 0xfffffe0230ea1000 ---
vtterm_cursor() at vtterm_cursor+0x8/frame 0xfffffe0230ea1000
termteken_cursor() at termteken_cursor+0x37/frame 0xfffffe0230ea1030
teken_funcs_cursor() at teken_funcs_cursor+0x3b/frame 0xfffffe0230ea1050
teken_subr_carriage_return() at teken_subr_carriage_return+0x2c/frame 0xfffffe0230ea1070
teken_input_char() at teken_input_char+0x166/frame 0xfffffe0230ea10b0
teken_input_byte() at teken_input_byte+0x50/frame 0xfffffe0230ea10d0
teken_input() at teken_input+0x52/frame 0xfffffe0230ea1100
termcn_cnputc() at termcn_cnputc+0x1c8/frame 0xfffffe0230ea11b0
cnputc() at cnputc+0x90/frame 0xfffffe0230ea11f0
cnputs() at cnputs+0x154/frame 0xfffffe0230ea1230
putbuf() at putbuf+0x15f/frame 0xfffffe0230ea1260
putchar() at putchar+0xb0/frame 0xfffffe0230ea12a0
kvprintf() at kvprintf+0x15a/frame 0xfffffe0230ea1790
_vprintf() at _vprintf+0xb9/frame 0xfffffe0230ea1890
vprintf() at vprintf+0x2d/frame 0xfffffe0230ea18c0
printf() at printf+0x4b/frame 0xfffffe0230ea1930
trap_fatal() at trap_fatal+0xf5/frame 0xfffffe0230ea1a50
trap_pfault() at trap_pfault+0x188/frame 0xfffffe0230ea1b50
trap() at trap+0x7a9/frame 0xfffffe0230ea1e90
trap_check() at trap_check+0x4a/frame 0xfffffe0230ea1eb0
calltrap() at calltrap+0x8/frame 0xfffffe0230ea1eb0
--- trap 0xc, rip = 0xffffffff8168e17f, rsp = 0xfffffe0230ea1f80, rbp = 0xfffffe0230ea1fb0 ---
pf_begin_rules() at pf_begin_rules+0x6f/frame 0xfffffe0230ea1fb0
pfioctl() at pfioctl+0xb35a/frame 0xfffffe0230ea42e0
devfs_ioctl_f() at devfs_ioctl_f+0x19c/frame 0xfffffe0230ea4420
fo_ioctl() at fo_ioctl+0x4c/frame 0xfffffe0230ea4460
kern_ioctl() at kern_ioctl+0x3c3/frame 0xfffffe0230ea45b0
sys_ioctl() at sys_ioctl+0x2b8/frame 0xfffffe0230ea4690
syscallenter() at syscallenter+0xcfa/frame 0xfffffe0230ea4990
amd64_syscall() at amd64_syscall+0x2a/frame 0xfffffe0230ea4ab0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe0230ea4ab0

(kgdb) up 36
#36 0xffffffff8168e17f in pf_begin_rules (ticket=0xfffff801e2162404, rs_num=0x0, anchor=0xfffff801e2162004 "") at /usr/src/sys/netpfil/pf/pf_ioctl.c:745
745             while ((rule = TAILQ_FIRST(rs->rules[rs_num].inactive.ptr)) != NULL) {
(kgdb) l
740             if (rs_num < 0 || rs_num >= PF_RULESET_MAX)
741                     return (EINVAL);
742             rs = pf_find_or_create_ruleset(anchor);
743             if (rs == NULL)
744                     return (EINVAL);
745             while ((rule = TAILQ_FIRST(rs->rules[rs_num].inactive.ptr)) != NULL) {
746                     pf_unlink_rule(rs->rules[rs_num].inactive.ptr, rule);
747                     rs->rules[rs_num].inactive.rcount--;
748             }
749             *ticket = ++rs->rules[rs_num].inactive.ticket;
(kgdb) print rs->rules[0]
$10 = {
  queues = 0xfffffe0001f8dd28,
  active = {
    ptr = 0x0,
    ptr_array = 0x0,
    rcount = 0x0,
    ticket = 0x0,
    open = 0x0
  },
  inactive = {
    ptr = 0x0,
    ptr_array = 0x0,
    rcount = 0x0,
    ticket = 0x0,
    open = 0x0
  }
}

The TAILQ_FIRST macro tries to dereference inactive.ptr, which, per the output above, is NULL.

The idea was to run PF in a jail and have it do routing for other jails.

Apologies for not knowing if there are ways to "format" the pastes.
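For readers who want to see the failure mode in isolation: TAILQ_FIRST(head) expands to a read of (head)->tqh_first, so passing a NULL head pointer faults on that read. Below is a minimal userland sketch of the same flush loop with a defensive check in front of it. The names struct rule, struct rulequeue, struct ruleset_half and begin_rules_checked() are hypothetical stand-ins, not the pf structures, and the NULL check is only an illustration, not necessarily how the problem was addressed in the tree.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-ins for the pf rule and rule queue types. */
struct rule {
	TAILQ_ENTRY(rule) entries;
};
TAILQ_HEAD(rulequeue, rule);

/* Hypothetical stand-in for one half (active or inactive) of a ruleset. */
struct ruleset_half {
	struct rulequeue *ptr;	/* NULL until the queue has been set up */
	unsigned int rcount;
	unsigned int ticket;
};

/*
 * Same shape as the loop at pf_ioctl.c:745, but refuses to touch a
 * queue pointer that was never initialised instead of dereferencing it.
 */
int
begin_rules_checked(struct ruleset_half *inactive, unsigned int *ticket)
{
	struct rule *rule;

	assert(ticket != NULL);
	if (inactive == NULL || inactive->ptr == NULL)
		return (EINVAL);

	/* TAILQ_FIRST() reads inactive->ptr->tqh_first. */
	while ((rule = TAILQ_FIRST(inactive->ptr)) != NULL) {
		TAILQ_REMOVE(inactive->ptr, rule, entries);
		inactive->rcount--;
	}
	*ticket = ++inactive->ticket;
	return (0);
}
```

With a zeroed struct ruleset_half, as in the kgdb dump above, this returns EINVAL rather than faulting. A check like this only hides the symptom; the underlying question is why the per-jail ruleset was never initialised.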
Not sure about it, but does https://reviews.freebsd.org/D1944 fix this panic?
I'm kind of new to bugs.freebsd -- but I assume you want me to apply the patch I get when I click "download raw diff" on that particular review, yes? Or should I pick a particular revision? In either case I can try this and report back, thanks.
Exactly: "Download Raw Diff", patch your sources with it, rebuild the system and try to reproduce. Thanks!
I've applied the patch against 0efa1469be94566c09b9f4ce538c28e92d26026c and there is another panic.

(kgdb) bt
#0 doadump (textdump=0x1) at pcpu.h:221
During symbol reading, Incomplete CFI data; unspecified registers at 0xffffffff80a9ed76.
#1 0xffffffff80a9eaa3 in kern_reboot (howto=0x104) at /usr/src/sys/kern/kern_shutdown.c:364
#2 0xffffffff80a9f00b in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:757
#3 0xffffffff80a9ee43 in panic (fmt=0x0) at /usr/src/sys/kern/kern_shutdown.c:688
#4 0xffffffff8038a3b7 in db_panic (addr=<value optimized out>, have_addr=0x0, count=0x0, modif=0x0) at /usr/src/sys/ddb/db_command.c:473
#5 0xffffffff8038993e in db_command (cmd_table=0x0) at /usr/src/sys/ddb/db_command.c:440
#6 0xffffffff803896d4 in db_command_loop () at /usr/src/sys/ddb/db_command.c:493
#7 0xffffffff8038c1db in db_trap (type=<value optimized out>, code=0x0) at /usr/src/sys/ddb/db_main.c:251
#8 0xffffffff80ae3803 in kdb_trap (type=0xc, code=0x0, tf=<value optimized out>) at /usr/src/sys/kern/subr_kdb.c:654
#9 0xffffffff80f8e711 in trap_fatal (frame=0xfffffe0231d4e1c0, eva=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:829
#10 0xffffffff80f8e944 in trap_pfault (frame=0xfffffe0231d4e1c0, usermode=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:684
#11 0xffffffff80f8e0fe in trap (frame=0xfffffe0231d4e1c0) at /usr/src/sys/amd64/amd64/trap.c:435
#12 0xffffffff80f71337 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:234
#13 0xffffffff80d22752 in pfsync_clear_states (creatorid=<value optimized out>, ifname=0x0) at /usr/src/sys/netpfil/pf/if_pfsync.c:1973
#14 0xffffffff80d3bac5 in pfioctl (dev=<value optimized out>, cmd=<value optimized out>, addr=0xfffff80006f62500 "", flags=<value optimized out>, td=<value optimized out>) at /usr/src/sys/netpfil/pf/pf_ioctl.c:1692
#15 0xffffffff8095a9ab in devfs_ioctl_f (fp=0xfffff800068e12d0, com=0xc0e04412, data=0xfffff80006f62500, cred=<value optimized out>, td=0xfffff8004649e000) at /usr/src/sys/fs/devfs/devfs_vnops.c:813
#16 0xffffffff80b00a3c in kern_ioctl (td=0xfffff8004649e000, fd=<value optimized out>, com=0x0, data=0xfffff80006f62500 "") at file.h:324
#17 0xffffffff80b005be in sys_ioctl (td=0xfffff8004649e000, uap=0xfffffe0231d4ea40) at /usr/src/sys/kern/sys_generic.c:723
#18 0xffffffff80f8f0e8 in amd64_syscall (td=0xfffff8004649e000, traced=0x0) at subr_syscall.c:135
#19 0xffffffff80f7161b in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:394
#20 0x0000000800de94ba in ?? ()

Now the panic occurs in pfsync_clear_states().
(In reply to gila from comment #4) What's the actual panic from the latter one?
(In reply to Bjoern A. Zeeb from comment #5)

I pasted it in the comment, right above where I say it panics in pfsync_clear_states(). Did that not come across? Let me repaste the relevant frames:

#12 0xffffffff80f71337 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:234
#13 0xffffffff80d22752 in pfsync_clear_states (creatorid=<value optimized out>, ifname=0x0) at /usr/src/sys/netpfil/pf/if_pfsync.c:1973
#14 0xffffffff80d3bac5 in pfioctl (dev=<value optimized out>, cmd=<value optimized out>, addr=0xfffff80006f62500 "", flags=<value optimized out>, td=<value optimized out>) at /usr/src/sys/netpfil/pf/pf_ioctl.c:1692
#15 0xffffffff8095a9ab in devfs_ioctl_f (fp=0xfffff800068e12d0, com=0xc0e04412, data=0xfffff80006f62500, cred=<value optimized out>, td=0xfffff8004649e000) at /usr/src/sys/fs/devfs/devfs_vnops.c:813
#16 0xffffffff80b00a3c in kern_ioctl (td=0xfffff8004649e000, fd=<value optimized out>, com=0x0,
I've just checked out the latest master and reapplied the patch, and I get the same panic (to be clear: the second panic, the one that occurs after applying the patch).

1955 static void
1956 pfsync_clear_states(u_int32_t creatorid, const char *ifname)
1957 {
1958         struct pfsync_softc *sc = V_pfsyncif;
1959         struct {
1960                 struct pfsync_subheader subh;
1961                 struct pfsync_clr clr;

sc is NULL here and things blow up when we try to acquire the mutex at 1973:

1973         PFSYNC_LOCK(sc);
1974         pfsync_send_plus(&r, sizeof(r));
1975         PFSYNC_UNLOCK(sc);
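To make the second failure concrete, here is a small userland sketch of the pattern at lines 1973-1975 with a NULL check in front of the lock. Everything in it is hypothetical: struct softc stands in for struct pfsync_softc, an integer counter stands in for the mutex, and clear_states_checked() is not a kernel function. It only illustrates that the lock lives inside the softc, so taking it through a NULL softc pointer is itself the fault. It is not a claim about how the kernel should handle, or later handled, a vnet without a pfsync interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pfsync_softc. */
struct softc {
	int lock_depth;	/* stands in for the softc mutex */
	int msgs_sent;	/* counts simulated "clear states" messages */
};

/*
 * Returns true if a message was (notionally) sent, false if there is
 * no softc, e.g. because no pfsync interface exists in this context.
 */
bool
clear_states_checked(struct softc *sc)
{
	if (sc == NULL)
		return (false);	/* nothing to lock, nothing to send */

	sc->lock_depth++;	/* corresponds to PFSYNC_LOCK(sc) */
	sc->msgs_sent++;	/* corresponds to pfsync_send_plus(...) */
	sc->lock_depth--;	/* corresponds to PFSYNC_UNLOCK(sc) */

	assert(sc->lock_depth == 0);
	return (true);
}
```

Calling clear_states_checked(NULL) simply returns false, whereas the unguarded kernel code faults on the first access through sc.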
Hi, I cannot reproduce it. The ruleset doesn't seem complete, since $icmp_types is not defined anywhere.

1) Do you run pfsync? That function should be part of pfsync.
2) Is this 10-STABLE? My test environment is 11-CURRENT, which should be close enough.

Could you give me more details?
Hi,

It seems there was a paste error.

icmp_types="echoreq"

is what should be there. I just grabbed the latest image from 11-CURRENT and verified the bug is still there, without the patch applied. The second panic (after the patch) is indeed related to pfsync. I have no use for pfsync, however, and I don't really know why it shows up.

What's new in this release is that when I create the jail, I get a LOR message:

#0 0xffffffff80a7c5f0 at witness_debugger+0x70
#1 0xffffffff80a7c4f1 at witness_checkorder+0xe71
#2 0xffffffff809fd99b at __lockmgr_args+0xd3b
#3 0xffffffff80ac5fcc at vop_stdlock+0x3c
#4 0xffffffff80fd3860 at VOP_LOCK1_APV+0x100
#5 0xffffffff80ae6eca at _vn_lock+0x9a
#6 0xffffffff80ad7333 at vget+0x63
#7 0xffffffff808f8fcd at devfs_allocv+0xcd
#8 0xffffffff808f8a93 at devfs_root+0x43
#9 0xffffffff80ace631 at vfs_donmount+0x1521
#10 0xffffffff80acd0e2 at sys_nmount+0x72
#11 0xffffffff80e8615b at amd64_syscall+0x2db
#12 0xffffffff80e6515b at Xfast_syscall+0xfb
lock order reversal:
 1st 0xffffffff81cf4038 allprison (allprison) @ /usr/src/sys/kern/kern_jail.c:1020
 2nd 0xffffffff81d19b10 vnet_sysinit_sxlock (vnet_sysinit_sxlock) @ /usr/src/sys/net/vnet.c:573
stack backtrace:
#0 0xffffffff80a7c5f0 at witness_debugger+0x70
#1 0xffffffff80a7c4f1 at witness_checkorder+0xe71
#2 0xffffffff80a29533 at _sx_slock+0x73
#3 0xffffffff80b1ccce at vnet_alloc+0x10e
#4 0xffffffff809ed3f3 at kern_jail_set+0x1d33
#5 0xffffffff809eede1 at sys_jail_set+0x41
#6 0xffffffff80e8615b at amd64_syscall+0x2db
#7 0xffffffff80e6515b at Xfast_syscall+0xfb

The jail does start, though. Then, in the jail, starting PF triggers the panic mentioned earlier.

Let me know if there is anything else I can do to help.
Could you try the patch and report back? 11-CURRENT with the patch should be ok, I use it all the time. If you could try the patch with 10-STABLE, it'd be amazing.
Tried the latest patch against the 11-CURRENT ISO and it seems to work. I've added physical devices, epair devices and netgraph "vnics" to jails using vnet, with no issues so far. pf starts fine, but there isn't any traffic over them yet; I will test that on my physical machine, which needs to do actual routing and filtering. (This was all done in a VM first.)

I have not tried 10 yet; I can do that later, once I have confirmed that it works on my physical box. I won't have a physical box available for 10.x testing, however.
Currently I'm typing this with a vnet jail that does PF routing/NAT, and it works great. There are two other things which may or may not be related to this bug:

1) "Freed UMA keg (ripcb) was not empty N items, lost X pages". These seem to show up when creating epairs; not sure if that's related.

2) When destroying a jail (jail -r) that uses unionfs and nullfs mounts (aka "thinjails"), unmounting the file systems fails with EBUSY, and mount says my FS is not a unionfs (while it really is). When not using VNET in the jails, everything seems to work out just fine.

Like I said, I don't know if these are related. It does not seem like it, as my original bug was about a kernel panic, which, with this patch, appears to be fixed. (1) might be, however.
Darn, I can't seem to edit comments. It happens when *deleting* epairs, not when creating them.
(In reply to gila from comment #12)

#1 ("Freed ..") is not related. It just means that either the cleanup is not right yet, or we have not convinced ourselves that it is and would rather leak some memory on net teardown. It is something I am working on.
I'll be trying to get pf with VNET more stable over the next few days.
Can you please try FreeBSD 11.0-ALPHA6 or later?
Hi, lacking feedback I'll close this. kp@ has fixed a lot of pf bugs for VIMAGE. If this is still a problem with FreeBSD 12, please feel free to re-open the PR.