Created attachment 158977: console1.png
When testing the 10.2-BETA2 build from today:
FreeBSD hub.example.com 10.2-BETA2 FreeBSD 10.2-BETA2 #0: Sun Jul 19 16:29:51 CEST 2015 root@hub.example.com:/usr/obj/usr/src/sys/HUB amd64
Using a GENERIC kernel with the VIMAGE and RACCT options added, I managed to hang the OS twice when trying to kill jails that use the VIMAGE/VNET option. Despite some claims that this only happens when PF is compiled into the kernel, I was using the PF module and don't have PF in my kernel configuration. I am attaching console screenshots from both crashes.
Created attachment 158978: console2.png
I can reliably repeat this on 10.2p7 (with patch from D1944).
Crash dump:
Fatal trap 12: page fault while in kernel mode
cpuid = 4; apic id = 04
fault virtual address = 0x378
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff807b6f40
stack pointer = 0x28:0xfffffe046ac4aab0
frame pointer = 0x28:0xfffffe046ac4ab30
code segment = base rx0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 2449 (pf purge)
trap number = 12
panic: page fault
cpuid = 4
KDB: stack backtrace:
#0 0xffffffff8080aeb0 at kdb_backtrace+0x60
#1 0xffffffff807cfe46 at vpanic+0x126
#2 0xffffffff807cfd13 at panic+0x43
#3 0xffffffff80b38fab at trap_fatal+0x36b
#4 0xffffffff80b392ad at trap_pfault+0x2ed
#5 0xffffffff80b3894a at trap+0x47a
#6 0xffffffff80b1eee2 at calltrap+0x8
#7 0xffffffff807b6d7e at __mtx_lock_flags+0x5e
#8 0xffffffff81a39497 at pf_purge_expired_fragments+0x47
#9 0xffffffff81a1c165 at pf_purge_thread+0x25
#10 0xffffffff8079a83a at fork_exit+0x9a
#11 0xffffffff80b1f41e at fork_trampoline+0xe
Uptime: 18m29s
Dumping 750 out of 16350 MB:..3%..11%..22%..32%..41%..52%..62%..71%..81%..92%
Reading symbols from /boot/kernel/zfs.ko.symbols...done.
Loaded symbols for /boot/kernel/zfs.ko.symbols
Reading symbols from /boot/kernel/opensolaris.ko.symbols...done.
Loaded symbols for /boot/kernel/opensolaris.ko.symbols
Reading symbols from /boot/kernel/fdescfs.ko.symbols...done.
Loaded symbols for /boot/kernel/fdescfs.ko.symbols
Reading symbols from /boot/kernel/pflog.ko.symbols...done.
Loaded symbols for /boot/kernel/pflog.ko.symbols
Reading symbols from /boot/kernel/pf.ko.symbols...done.
Loaded symbols for /boot/kernel/pf.ko.symbols
#0 doadump (textdump=<value optimized out>) at pcpu.h:219
219 pcpu.h: No such file or directory.
in pcpu.h
(kgdb) #0 doadump (textdump=<value optimized out>) at pcpu.h:219
#1 0xffffffff807cfaa2 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:451
#2 0xffffffff807cfe85 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:758
#3 0xffffffff807cfd13 in panic (fmt=0x0) at /usr/src/sys/kern/kern_shutdown.c:687
#4 0xffffffff80b38fab in trap_fatal (frame=<value optimized out>, eva=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:851
#5 0xffffffff80b392ad in trap_pfault (frame=0xfffffe046ac4aa00, usermode=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:674
#6 0xffffffff80b3894a in trap (frame=0xfffffe046ac4aa00) at /usr/src/sys/amd64/amd64/trap.c:440
#7 0xffffffff80b1eee2 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:236
#8 0xffffffff807b6f40 in __mtx_lock_sleep (c=0xffffffff81a4a620, tid=18446735277979849888, opts=0, file=0x0, line=1791274176) at /usr/src/sys/kern/kern_mutex.c:437
#9 0xffffffff807b6d7e in __mtx_lock_flags (c=<value optimized out>, opts=<value optimized out>, file=0x0, line=0) at /usr/src/sys/kern/kern_mutex.c:224
#10 0xffffffff81a39497 in pf_purge_expired_fragments () at /usr/src/sys/modules/pf/../../netpfil/pf/pf_norm.c:239
#11 0xffffffff81a1c165 in pf_purge_thread (v=<value optimized out>) at /usr/src/sys/modules/pf/../../netpfil/pf/pf.c:1475
#12 0xffffffff8079a83a in fork_exit (callout=0xffffffff81a1c140 <pf_purge_thread>, arg=0xfffff800ddf54700, frame=0xfffffe046ac4ac00) at /usr/src/sys/kern/kern_fork.c:1018
#13 0xffffffff80b1f41e in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:611
#14 0x0000000000000000 in ?? ()
Current language: auto; currently minimal

Kernel: Patched with updates from D1944
kernel config
options CONFIG_AUTOGENERATED
ident CASSOWARY
machine amd64
cpu HAMMER
makeoptions WITH_CTF=1
makeoptions DEBUG=-g
options HYPERV
options USB_DEBUG
options SC_PIXEL_MODE
options VESA
options AHD_REG_PRETTY_PRINT
options AHC_REG_PRETTY_PRINT
options ATA_STATIC_ID
options ACPI_DMAR
options SMP
options KDB_TRACE
options KDB
options RCTL
options RACCT_DEFAULT_TO_DISABLED
options RACCT
options INCLUDE_CONFIG_FILE
options DDB_CTF
options KDTRACE_HOOKS
options KDTRACE_FRAME
options MAC
options PROCDESC
options CAPABILITIES
options CAPABILITY_MODE
options AUDIT
options HWPMC_HOOKS
options KBD_INSTALL_CDEV
options PRINTF_BUFR_SIZE=128
options _KPOSIX_PRIORITY_SCHEDULING
options SYSVSEM
options SYSVMSG
options SYSVSHM
options STACK
options KTRACE
options SCSI_DELAY=5000
options GEOM_LABEL
options GEOM_RAID
options GEOM_PART_GPT
options PSEUDOFS
options PROCFS
options CD9660
options MSDOSFS
options NFS_ROOT
options NFSLOCKD
options NFSD
options NFSCL
options MD_ROOT
options QUOTA
options UFS_GJOURNAL
options UFS_DIRHASH
options UFS_ACL
options SOFTUPDATES
options FFS
options TCP_OFFLOAD
options INET6
options INET
options PREEMPTION
options SCHED_ULE
options NULLFS
options VIMAGE
options ROUTETABLES=6
options ALTQ_NOPCC
options ALTQ_PRIQ
options ALTQ_CDNR
options ALTQ_HFSC
options ALTQ_RIO
options ALTQ_RED
options ALTQ_CBQ
options ALTQ
options NEW_PCIB
options GEOM_PART_MBR
options GEOM_PART_EBR_COMPAT
options GEOM_PART_EBR
options GEOM_PART_BSD
device isa
device mem
device io
device uart_ns8250
device epair
device if_bridge
device cpufreq
device acpi
device pci
device ahci
device ata
device mvs
device siis
device ahc
device ahd
device esp
device hptiop
device isp
device mpt
device mps
device mpr
device sym
device trm
device adv
device adw
device aic
device bt
device isci
device scbus
device ch
device da
device sa
device cd
device pass
device ses
device amr
device arcmsr
device ciss
device dpt
device hptmv
device hptnr
device hptrr
device hpt27xx
device iir
device ips
device mly
device twa
device tws
device aac
device aacp
device aacraid
device ida
device mfi
device mlx
device mrsas
device twe
device nvme
device nvd
device atkbdc
device atkbd
device psm
device kbdmux
device vga
device sc
device vt
device vt_vga
device vt_efifb
device uart
device ppc
device ppbus
device lpt
device ppi
device puc
device bxe
device de
device em
device igb
device ix
device ixv
device ixl
device ixlv
device le
device ti
device txp
device vx
device miibus
device age
device alc
device ale
device bce
device bfe
device bge
device fxp
device msk
device nfe
device nge
device pcn
device re
device rl
device sf
device sge
device sis
device sk
device ste
device stge
device tl
device tx
device vge
device loop
device random
device padlock_rng
device rdrand_rng
device ether
device vlan
device tun
device md
device gif
device firmware
device bpf
device uhci
device ohci
device ehci
device xhci
device usb
device ukbd
device umass
device sound
device snd_cmi
device snd_csa
device snd_emu10kx
device snd_es137x
device snd_hda
device snd_ich
device mmc
device mmcsd
device sdhci
device virtio
device virtio_pci
device vtnet
device virtio_blk
device virtio_scsi
device virtio_balloon
device hyperv

Jail config:
exec.prestart = "";
exec.start = "/bin/sh /etc/rc";
exec.stop = "/bin/sh /etc/rc.shutdown";
exec.poststop = "";
exec.clean;
mount.devfs;
mount.fdescfs;
mount.procfs;
vnet = new;
path = "/jail/${host.hostname}";
jtest {
    host.hostname = "jtest";
    vnet.interface = epair0b;
}

rc.conf ifconfig sections:
ifconfig_cloned_interfaces="bridge0 epair0 epair1"
ifconfig_bridge0="addm epair0a addm epair1a up"
ifconfig_bridge0_ipv6="up"
Apparently just writing Dxxx doesn't create a link; the patch applied is from https://reviews.freebsd.org/D1944
Bartek / Paul,
To get this issue the attention it needs, I'd appreciate it if you could both provide:
* Updated backtraces for this panic on the latest 10.2-RELEASE / CURRENT (for extra debugging)
* Steps to reproduce. The summary mentions a crash on 'killing' jails. What steps exactly?
* A reproduction case and system configuration that are isolated/reduced as much as possible (kernel, ifconfig, whatever)
* Hardware (and virtualization, if applicable) details. dmesg.boot should be fine for now
Note: Please use attachments for any large outputs to keep the conversation clear and easy to follow.
I'm still testing various combinations, but ALTQ seems to be the main culprit here. Once I remove the following from the kernel config, the crashes stop:
ALTQ
ALTQ_CBQ
ALTQ_RED
ALTQ_RIO
ALTQ_HFSC
ALTQ_PRIQ
Once I've refined it a bit more, I'll provide a VirtualBox image.
Just adding ALTQ to a 10-STABLE system causes a crash on jail start.
10-STABLE OVA (appliance exported as OVA2 from VirtualBox 5.0.10): https://s3.amazonaws.com/local-ami-us-east-1/freebsd/FreeBSDJailTest.ova
The jail config is /var/tmp/jail.conf (to avoid crash cycles). The 10-STABLE sources are in /usr/src/10, and the kernel config is in /usr/src/10/sys/amd64/conf/JAILTEST
login: jailtest
password: jailtest
jailtest is in wheel
To cause the crash:
jail -f /var/tmp/jail.conf -c jail1
If ALTQ is removed from the kernel, it seems to do fine; as soon as it's there, there are multiple ways to crash it. I can also confirm that 10.2-p7 (on bare metal) no longer crashes once ALTQ is removed. I am building an 11-CURRENT VM to see whether I can replicate it there.
I can also cause a crash in 11-CURRENT 20160113-r293801. I have uploaded another VM to: https://s3.amazonaws.com/local-ami-us-east-1/freebsd/FreeBSDJailTest-current.ova
user/pass: jailtest/jailtest
To reproduce:
jail -f /var/tmp/jail.conf -c jail1
service pf onestart
jail -f /var/tmp/jail.conf -rc jail1
The image has some crash dumps in /var/crash, and /var/log/messages also has stack traces.
Note that 11-CURRENT is slightly better in that the crash requires pf to be unloaded before jail start and the jail to be restarted after pf is loaded (as opposed to crashing on jail restart if pf is loaded at all).
(In reply to Paul Armstrong from comment #8) You may want to pull in a test for the 10.3 BETA if you have the time.
@Bartek, can you confirm reproduction in the latest 10.3 beta or stable/10?
(In reply to Kubilay Kocak from comment #10) I am afraid not; I don't have any 10.3 betas or 10-STABLE systems running.
Just tested 10.3B2. If ALTQ_* options are configured in the kernel and pf is running, then it will crash on jail start. If pf starts after the jail, then the kernel will crash on jail stop instead. Looks fine without ALTQ.
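For anyone trying to reproduce, the two orderings described above could be scripted roughly as follows. This is only a sketch: the jail.conf path, the jail name and the `service pf onestart` step are taken from the earlier comments, while the `repro` and `run` function names and the dry-run default are illustrative assumptions. By default it only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only: prints (or, with DRY_RUN=0, runs) the command sequence
# for each crash ordering. JAIL_CONF and JAIL_NAME come from the earlier
# comments; the function names and DRY_RUN default are hypothetical.
JAIL_CONF="${JAIL_CONF:-/var/tmp/jail.conf}"
JAIL_NAME="${JAIL_NAME:-jail1}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

repro() {
    case "$1" in
    start)
        # pf running first, then jail start: reported crash on jail start
        run service pf onestart
        run jail -f "$JAIL_CONF" -c "$JAIL_NAME"
        ;;
    stop)
        # jail first, then pf, then jail stop: reported crash on jail stop
        run jail -f "$JAIL_CONF" -c "$JAIL_NAME"
        run service pf onestart
        run jail -f "$JAIL_CONF" -r "$JAIL_NAME"
        ;;
    *)
        echo "usage: repro start|stop" >&2
        return 2
        ;;
    esac
}
```

Calling `repro start` or `repro stop` prints the sequence; setting `DRY_RUN=0` on a disposable test VM would actually execute it, which is expected to panic an affected kernel.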
(In reply to Paul Armstrong from comment #12) Thank you for the feedback Paul. Could you possibly include both crashes (with backtrace) as attachments please?
Created attachment 167364: crash on jail start (load pf, start jail, crash)
Created attachment 167365: crash on jail stop (start jail, load pf, stop jail, crash)
Core files were too large to attach; I have attached everything else.
Kernel config delta from GENERIC:
options VIMAGE
options ALTQ
options ALTQ_CBQ
options ALTQ_RED
options ALTQ_RIO
options ALTQ_HFSC
options ALTQ_CDNR
options ALTQ_PRIQ
options ALTQ_NOPCC
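To check whether a given kernel carries the same delta, something like the following might help. It is a sketch under assumptions: the function name `kernel_delta_options` is made up for illustration, and it expects kernel config text on stdin in the usual `options NAME` form.

```shell
#!/bin/sh
# Sketch only: list the VIMAGE and ALTQ* options found in kernel config
# text read from stdin. The function name is hypothetical.
kernel_delta_options() {
    awk '$1 == "options" && ($2 == "VIMAGE" || $2 ~ /^ALTQ(_|$)/) { print $2 }' |
        LC_ALL=C sort
}
```

On a kernel built with INCLUDE_CONFIG_FILE (as in the config posted earlier), `sysctl -n kern.conftxt | kernel_delta_options` should show which of these options the running kernel was built with; a config file can be piped in the same way.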
Just to confirm: this is believed to be a problem with pf only at this point, right? Is it also still true that this only happens with pf compiled in but not if loaded as a module? I have various other problems with pf and VNETs in HEAD to address, so I am not surprised. Is there any chance you could test changes on non-10.2 in the future?
As far as I know, this is only a PF (well, more specifically, ALTQ) problem (I haven't tested the others extensively). Last I checked, compiling PF in directly was a non-starter. PF must be a module and ALTQ must be compiled in (maybe that's the problem...). I'm happy to test changes. Just point me at a patchset and let me know the version it's to be applied against (or if it's post check-in, let me know which version of CURRENT it's in).
Can you please try FreeBSD 11.0-ALPHA6 or later?
Sure thing. Will probably be the weekend before I get to it.
I've tested a few scenarios and have not crashed it yet.
* pf as module and compiled in, in both cases with ALTQ compiled into custom kernel
* vnet and non-vnet jails
* pf rules loaded before/after jail start and unloaded before/after jail stop
* with and without altq rules
So, at this point I believe it to be fixed. Thank you!
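If someone wants to re-run that coverage on another branch, the scenarios listed above can be enumerated as a checklist. This is a sketch that assumes the options are fully crossed, which the comment does not state; the labels and the `test_matrix` name are illustrative, and nothing here touches pf or jails.

```shell
#!/bin/sh
# Sketch only: print one line per scenario combination (2^5 = 32 lines).
# Labels are hypothetical shorthand for the bullet points above.
test_matrix() {
    for pf in module builtin; do
        for jail in vnet non-vnet; do
            for load in before-start after-start; do
                for unload in before-stop after-stop; do
                    for altq in altq-rules no-altq-rules; do
                        echo "pf=$pf jail=$jail rules-loaded=$load rules-unloaded=$unload $altq"
                    done
                done
            done
        done
    done
}
```

Running `test_matrix` and ticking off each line gives a simple record of which combinations were actually exercised.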
Thanks a lot for your feedback! I'll leave the PR open for a bit longer to see if back-porting the changes to FreeBSD 10 is feasible.
> back-porting the changes to FreeBSD 10 is feasible
Bjoern, any conclusion on how feasible this looks?
batch change:
For bugs that match the following
- Status Is In progress
AND
- Untouched since 2018-01-01.
AND
- Affects Base System OR Documentation
DO:
Reset to open status.
Note: I did a quick pass but if you are getting this email it might be worthwhile to double check to see if this bug ought to be closed.
Hi, I am closing this. VIMAGE is supposed to be a lot more reliable in the upcoming 12 release, and the PR states that some of the problems seen were already absent in 11. Backporting any VIMAGE changes to 10 is not going to happen anymore; please try 12.
That doesn't necessarily appear to be the case. Try installing FreeBSD 12.1 in ESXi, VirtualBox, or Proxmox VE with ipfw and you'll see.