Hi,

We run a FreeBSD 11.0-RELEASE-p6 system with the GENERIC kernel, running in VMware with 32 GB of RAM and 4 cores.

open-vm-tools-nox11-1280544_16,1
linux_base-c6_64-6.8

The application is IBM's Tivoli backup client, TSM. It normally runs without problems, but over the last month it has triggered a series of panics. We believe this is due to the number of files to back up: it tries to back up a snapshot of a file system mounted using NFS, so suddenly there were four times the expected number of files. TSM does not handle large numbers of files very well. Still, you would not expect the kernel to panic.

This is not a resource problem: the memory graphs are flat until the crash, with no warnings.

Since the release binaries have no debug symbols (a separate bug?), I built a kernel to get a core dump consistent with the debug symbols. The stack seems to be corrupted, though? Is there anything more I can do with this? Any tips?

# kgdb kernel.debug /var/crash/vmcore.3
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
panic: vm_fault: fault on nofault entry, addr: fffffe0004aa9000
cpuid = 1
KDB: stack backtrace:
#0 0xffffffff80b24077 at kdb_backtrace+0x67
#1 0xffffffff80ad93e2 at vpanic+0x182
#2 0xffffffff80ad9253 at panic+0x43
#3 0xffffffff80e12601 at vm_fault_hold+0x2721
#4 0xffffffff80e0fe98 at vm_fault+0x78
#5 0xffffffff80fa0e59 at trap_pfault+0xf9
#6 0xffffffff80fa04ec at trap+0x26c
#7 0xffffffff80f84141 at calltrap+0x8
#8 0xffffffff80fa16ae at amd64_syscall+0x4ce
#9 0xffffffff80f8442b at Xfast_syscall+0xfb
Uptime: 11d0h40m52s
Dumping 2657 out of 32735 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

Reading symbols from /boot/kernel/accf_data.ko...Reading symbols from /usr/lib/debug//boot/kernel/accf_data.ko.debug...done.
done.
Loaded symbols for /boot/kernel/accf_data.ko
Reading symbols from /boot/kernel/accf_http.ko...Reading symbols from /usr/lib/debug//boot/kernel/accf_http.ko.debug...done.
done.
Loaded symbols for /boot/kernel/accf_http.ko
Reading symbols from /boot/kernel/nullfs.ko...Reading symbols from /usr/lib/debug//boot/kernel/nullfs.ko.debug...done.
done.
Loaded symbols for /boot/kernel/nullfs.ko
Reading symbols from /boot/kernel/linprocfs.ko...Reading symbols from /usr/lib/debug//boot/kernel/linprocfs.ko.debug...done.
done.
Loaded symbols for /boot/kernel/linprocfs.ko
Reading symbols from /boot/kernel/linux_common.ko...Reading symbols from /usr/lib/debug//boot/kernel/linux_common.ko.debug...done.
done.
Loaded symbols for /boot/kernel/linux_common.ko
Reading symbols from /usr/local/lib/vmware-tools/modules/drivers/vmmemctl.ko...done.
Loaded symbols for /usr/local/lib/vmware-tools/modules/drivers/vmmemctl.ko
Reading symbols from /usr/local/lib/vmware-tools/modules/drivers/vmxnet.ko...done.
Loaded symbols for /usr/local/lib/vmware-tools/modules/drivers/vmxnet.ko
Reading symbols from /usr/local/lib/vmware-tools/modules/drivers/vmblock.ko...done.
Loaded symbols for /usr/local/lib/vmware-tools/modules/drivers/vmblock.ko
Reading symbols from /usr/local/lib/vmware-tools/modules/drivers/vmhgfs.ko...done.
Loaded symbols for /usr/local/lib/vmware-tools/modules/drivers/vmhgfs.ko
Reading symbols from /boot/kernel/linux.ko...Reading symbols from /usr/lib/debug//boot/kernel/linux.ko.debug...done.
done.
Loaded symbols for /boot/kernel/linux.ko
Reading symbols from /boot/kernel/linux64.ko...Reading symbols from /usr/lib/debug//boot/kernel/linux64.ko.debug...done.
done.
Loaded symbols for /boot/kernel/linux64.ko
#0 doadump (textdump=<value optimized out>) at pcpu.h:221
221 __asm("movq %%gs:%1,%0" : "=r" (td)
(kgdb) bt
#0 doadump (textdump=<value optimized out>) at pcpu.h:221
#1 0xffffffff80ad8e69 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:366
#2 0xffffffff80ad941b in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:759
#3 0xffffffff80ad9253 in panic (fmt=0x0) at /usr/src/sys/kern/kern_shutdown.c:690
#4 0xffffffff80e12601 in vm_fault_hold (map=<value optimized out>, vaddr=<value optimized out>, fault_type=<value optimized out>, fault_flags=<value optimized out>, m_hold=<value optimized out>) at /usr/src/sys/vm/vm_fault.c:330
#5 0xffffffff80e0fe98 in vm_fault (map=0xfffff80003000000, vaddr=<value optimized out>, fault_type=1 '\001', fault_flags=<value optimized out>) at /usr/src/sys/vm/vm_fault.c:273
#6 0xffffffff80fa0e59 in trap_pfault (frame=0xfffffe085e0638d0, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:741
#7 0xffffffff80fa04ec in trap (frame=0xfffffe085e0638d0) at /usr/src/sys/amd64/amd64/trap.c:442
#8 0xffffffff80f84141 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:236
#9 0xffffffff8248785a in getdents_common (td=<value optimized out>, args=<value optimized out>, is64bit=<value optimized out>) at /usr/src/sys/modules/linux64/../../compat/linux/linux_file.c:416
#10 0xffffffff80fa16ae in amd64_syscall (td=<value optimized out>, traced=0) at subr_syscall.c:135
#11 0xffffffff80f8442b in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:396
#12 0x00000008034a8fc5 in ?? ()
Previous frame inner to this frame (corrupt stack?)
Current language: auto; currently minimal
(kgdb)
Please install the gdb712 package and use /usr/local/bin/kgdb instead of the vanilla kgdb.
grab
# /usr/local/bin/kgdb /boot/kernel/kernel /var/crash/vmcore.3
GNU gdb (GDB) 7.12 [GDB v7.12 for FreeBSD]
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd11.0".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /boot/kernel/kernel...Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug...done.
done.

Unread portion of the kernel message buffer:
panic: vm_fault: fault on nofault entry, addr: fffffe0004aa9000
cpuid = 1
KDB: stack backtrace:
#0 0xffffffff80b24077 at kdb_backtrace+0x67
#1 0xffffffff80ad93e2 at vpanic+0x182
#2 0xffffffff80ad9253 at panic+0x43
#3 0xffffffff80e12601 at vm_fault_hold+0x2721
#4 0xffffffff80e0fe98 at vm_fault+0x78
#5 0xffffffff80fa0e59 at trap_pfault+0xf9
#6 0xffffffff80fa04ec at trap+0x26c
#7 0xffffffff80f84141 at calltrap+0x8
#8 0xffffffff80fa16ae at amd64_syscall+0x4ce
#9 0xffffffff80f8442b at Xfast_syscall+0xfb
Uptime: 11d0h40m52s
Dumping 2657 out of 32735 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

Reading symbols from /boot/kernel/accf_data.ko...Reading symbols from /usr/lib/debug//boot/kernel/accf_data.ko.debug...done.
done.
Reading symbols from /boot/kernel/accf_http.ko...Reading symbols from /usr/lib/debug//boot/kernel/accf_http.ko.debug...done.
done.
Reading symbols from /boot/kernel/nullfs.ko...Reading symbols from /usr/lib/debug//boot/kernel/nullfs.ko.debug...done.
done.
Reading symbols from /boot/kernel/linprocfs.ko...Reading symbols from /usr/lib/debug//boot/kernel/linprocfs.ko.debug...done.
done.
Reading symbols from /boot/kernel/linux_common.ko...Reading symbols from /usr/lib/debug//boot/kernel/linux_common.ko.debug...done.
done.
Reading symbols from /usr/local/lib/vmware-tools/modules/drivers/vmmemctl.ko...(no debugging symbols found)...done.
Reading symbols from /usr/local/lib/vmware-tools/modules/drivers/vmxnet.ko...(no debugging symbols found)...done.
Reading symbols from /usr/local/lib/vmware-tools/modules/drivers/vmblock.ko...(no debugging symbols found)...done.
Reading symbols from /usr/local/lib/vmware-tools/modules/drivers/vmhgfs.ko...(no debugging symbols found)...done.
Reading symbols from /boot/kernel/linux.ko...Reading symbols from /usr/lib/debug//boot/kernel/linux.ko.debug...done.
done.
Reading symbols from /boot/kernel/linux64.ko...Reading symbols from /usr/lib/debug//boot/kernel/linux64.ko.debug...done.
done.
__curthread () at ./machine/pcpu.h:221
221 __asm("movq %%gs:%1,%0" : "=r" (td)
(kgdb) bt
#0 __curthread () at ./machine/pcpu.h:221
#1 doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:298
#2 0xffffffff80ad8e69 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:366
#3 0xffffffff80ad941b in vpanic (fmt=<optimized out>, ap=0xfffffe085e0634c0) at /usr/src/sys/kern/kern_shutdown.c:759
#4 0xffffffff80ad9253 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:690
#5 0xffffffff80e12601 in vm_fault_hold (map=<optimized out>, vaddr=<optimized out>, fault_type=<optimized out>, fault_flags=0, m_hold=0x0) at /usr/src/sys/vm/vm_fault.c:330
#6 0xffffffff80e0fe98 in vm_fault (map=0xfffff80003000000, vaddr=<optimized out>, fault_type=1 '\001', fault_flags=<optimized out>) at /usr/src/sys/vm/vm_fault.c:273
#7 0xffffffff80fa0e59 in trap_pfault (frame=0xfffffe085e0638d0, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:741
#8 0xffffffff80fa04ec in trap (frame=0xfffffe085e0638d0) at /usr/src/sys/amd64/amd64/trap.c:442
#9 <signal handler called>
#10 getdents_common (td=<optimized out>, args=<optimized out>, is64bit=<optimized out>) at /usr/src/sys/modules/linux64/../../compat/linux/linux_file.c:419
#11 0xffffffff80fa16ae in syscallenter (td=<optimized out>, sa=<optimized out>) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:135
#12 amd64_syscall (td=<optimized out>, traced=0) at /usr/src/sys/amd64/amd64/trap.c:942
#13 <signal handler called>
#14 0x00000008034a8fc5 in ?? ()
Backtrace stopped: Cannot access memory at address 0x877feb7c0
Hi,

We still get crashes in getdents at a rate of a few per week. Any ideas? Anything we can test?

Palle
Try to find out which directory was being listed when the panic occurred. I doubt that your panic is a Linuxulator bug; much more likely it is an issue in the underlying file system's VOP_READDIR(). That said, the list of loaded modules includes some quite suspicious names like *fs, which might be the culprit. E.g., if you do not use that module, is the panic still reproducible?
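One way to find out which directory was being listed is to walk the same tree with a small native program that records each directory before reading it. The following is only a minimal sketch, assuming Python 3 is available on the host; the function name `walk_and_log` and the choice of log file are hypothetical. Each path is flushed and fsync'ed before the directory is read, so after a panic the last line in the log should name the directory that was being listed. The log file should live on a local file system, not on the tree being walked.

```python
import os


def walk_and_log(root, log_path):
    """Walk `root`, logging every directory before it is read.

    Returns the number of non-directory entries seen.
    """
    files_seen = 0
    pending = [root]
    with open(log_path, "a") as log:
        while pending:
            directory = pending.pop()
            # Record the directory *before* listing it and force the line
            # to disk, so that it survives a panic during the read.
            log.write(directory + "\n")
            log.flush()
            os.fsync(log.fileno())
            try:
                with os.scandir(directory) as entries:
                    for entry in entries:
                        # Do not follow symlinks, to avoid loops.
                        if entry.is_dir(follow_symlinks=False):
                            pending.append(entry.path)
                        else:
                            files_seen += 1
            except OSError as exc:
                # Unreadable directories are noted and skipped.
                log.write("  error: %s\n" % exc)
    return files_seen
```

For example, `walk_and_log("/backup_opt/nfs", "/var/tmp/walk.log")` would walk the NFS tree through the nullfs mount. Since this reads directories natively rather than through the Linux getdents emulation, a panic here as well would point at the underlying file system rather than the Linuxulator.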
(In reply to Konstantin Belousov from comment #5)

By modules named *fs, do you mean these?

$ kldstat|grep fs
Id Refs Address            Size     Name
 4    1 0xffffffff82419000 665d     nullfs.ko
 5    1 0xffffffff82420000 a9f1     linprocfs.ko
10    1 0xffffffff8243c000 8eed     vmhgfs.ko

vmhgfs.ko is for VMware; it is a VMware client module. I'll try without vmhgfs.ko.

As stated above, the files are on NFS. I cannot see exactly which one it is, though. I am not sure it is the same one each time: sometimes it dies after around 2M files, sometimes after around 5M files. The last line this morning was:

2017-02-02 07.10.38 ANS1898I ***** Processed 5 708 500 files *****

# cat /var/crash/info.7
Dump header from device: /dev/da0p3
  Architecture: amd64
  Architecture Version: 2
  Dump Length: 2562928640
  Blocksize: 512
  Dumptime: Thu Feb 2 07:11:11 2017
  Hostname: XXX
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 11.0-RELEASE-p6 #0: Mon Jan 2 22:53:23 CET 2017
    girgen@XXX:/usr/obj/usr/src/sys/GENERIC
  Panic String: vm_fault: fault on nofault entry, addr: fffffe00045c6000
  Dump Parity: 664679908
  Bounds: 7
  Dump Status: good
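The dumps quoted in this thread report the same panic string but different fault addresses (fffffe0004aa9000 for vmcore.3, fffffe00045c6000 in info.7). To compare crashes over time, it may help to collect those fields from every /var/crash/info.N file. This is a minimal sketch, assuming the `Key: value` layout shown in the info.7 listing; the helper names `parse_info`, `fault_address` and `summarize` are hypothetical.

```python
import glob
import re


def parse_info(text):
    """Parse the 'Key: value' lines of a crash info file into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first ": " only, so values may contain colons.
        key, sep, value = line.strip().partition(": ")
        if sep:
            fields[key] = value.strip()
    return fields


def fault_address(panic_string):
    """Return the faulting address from a panic string as an int, or None."""
    match = re.search(r"addr: ([0-9a-f]+)", panic_string)
    return int(match.group(1), 16) if match else None


def summarize(pattern="/var/crash/info.[0-9]*"):
    """Yield (path, dump time, panic string) for every matching info file."""
    for path in sorted(glob.glob(pattern)):
        with open(path) as handle:
            fields = parse_info(handle.read())
        yield path, fields.get("Dumptime"), fields.get("Panic String")
```

Looping over `summarize()` and printing each tuple gives a quick timeline of the panics, which can then be matched against the TSM log lines.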
(In reply to Palle Girgensohn from comment #6)

I forgot to mention that the backup process that seems to kill the machine also runs through some nullfs-mounted volumes. The reason for the nullfs mounts is to make sure the Linuxulator does not back up its own files instead of the ones from the base system (since it checks /compat/linux first when looking up files).

$ df
Filesystem         1K-blocks      Used     Avail Capacity  Mounted on
/dev/da0p2          30450268  23108028   4906220    82%    /
devfs                      1         1         0   100%    /dev
/dev/da1p1         203113076  84053648 102810384    45%    /opt
/opt               203113076  84053648 102810384    45%    /backup_opt
/etc                30450268  23108028   4906220    82%    /backup_etc
/var                30450268  23108028   4906220    82%    /backup_var
procfs                     4         4         0   100%    /proc
linprocfs                  4         4         0   100%    /compat/linux/proc
nfsserver:/volume 1603534848 849065344 754469504    53%    /opt/nfs
nfsserver:/volume 1603534848 849065344 754469504    53%    /backup_opt/nfs

$ mount
/dev/da0p2 on / (ufs, local, journaled soft-updates)
devfs on /dev (devfs, local, multilabel)
/dev/da1p1 on /opt (ufs, local, journaled soft-updates)
/opt on /backup_opt (nullfs, local)
/etc on /backup_etc (nullfs, local)
/var on /backup_var (nullfs, local)
procfs on /proc (procfs, local)
linprocfs on /compat/linux/proc (linprocfs, local)
nfsserver:/volume on /opt/nfs (nfs)
nfsserver:/volume on /backup_opt/nfs (nfs)
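The mount output shows the same NFS export at two mount points (/opt/nfs, and /backup_opt/nfs below a nullfs mount). When narrowing down which layer is involved, it can help to group mount points by source and file system type. This is a minimal sketch, assuming the `SOURCE on MOUNTPOINT (TYPE, OPTIONS...)` line format shown in the mount listing and paths without spaces; the names `parse_mounts` and `sources_mounted_twice` are hypothetical.

```python
import re
from collections import defaultdict

# Matches e.g. "/opt on /backup_opt (nullfs, local)" or "host:/vol on /mnt (nfs)".
MOUNT_LINE = re.compile(r"^(\S+) on (\S+) \(([^,)]+)")


def parse_mounts(text):
    """Return a list of (source, mount point, fs type) tuples."""
    mounts = []
    for line in text.splitlines():
        match = MOUNT_LINE.match(line.strip())
        if match:
            mounts.append((match.group(1), match.group(2), match.group(3)))
    return mounts


def sources_mounted_twice(mounts):
    """Map each source seen at more than one mount point to those points."""
    by_source = defaultdict(list)
    for source, target, _fstype in mounts:
        by_source[source].append(target)
    return {src: pts for src, pts in by_source.items() if len(pts) > 1}
```

Feeding it the output of `mount` lists the nullfs mount points and any source that is reachable through more than one path, which are the candidates to leave out when testing without nullfs.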
I agree with Konstantin. Can you try without nullfs?
It's not a Linuxulator bug