| Summary: | Fatal trap 12: page fault while in kernel mode; Supervisor read data, page not present | | |
|---|---|---|---|
| Product: | Base System | Reporter: | IPTRACE <arkadiusz.majewski> |
| Component: | kern | Assignee: | freebsd-bugs (Nobody) <bugs> |
| Status: | Closed Works As Intended | | |
| Severity: | Affects Only Me | CC: | markj, theraven |
| Priority: | --- | | |
| Version: | 11.0-STABLE | | |
| Hardware: | amd64 | | |
| OS: | Any | | |
Description

IPTRACE, 2016-10-28 15:36:05 UTC
Second time the system terminated. Uptime: 3d0h23m19s.

```
Fatal trap 9: general protection fault while in kernel mode
cpuid = 0; apic id = 00
instruction pointer = 0x20:0xffffffff80e18717
stack pointer       = 0x28:0xfffffe3fc9de37e0
frame pointer       = 0x28:0xfffffe3fc9de37e0
code segment        = base rx0, limit 0xfffff, type 0x1b
                    = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags    = interrupt enabled, resume, IOPL = 0
current process     = 17 (pagedaemon)
trap number         = 9
panic: general protection fault
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff80b24747 at kdb_backtrace+0x67
#1 0xffffffff80ad9ab2 at vpanic+0x182
#2 0xffffffff80ad9923 at panic+0x43
#3 0xffffffff80fa9d51 at trap_fatal+0x351
#4 0xffffffff80fa99e8 at trap+0x768
#5 0xffffffff80f8d101 at calltrap+0x8
#6 0xffffffff80e17dc6 at bucket_cache_drain+0x136
#7 0xffffffff80e1150d at zone_drain_wait+0xed
#8 0xffffffff80e1638d at uma_reclaim_locked+0x7d
#9 0xffffffff80e16297 at uma_reclaim+0x77
#10 0xffffffff80e38042 at vm_pageout+0x502
#11 0xffffffff80e90725 at fork_exit+0x85
#12 0xffffffff80f8d63e at fork_trampoline+0xe
```

Are kernel dumps getting saved to /var/crash after the panics? If so, could you open one in kgdb and obtain the backtrace?

Unfortunately, there is nothing there. Some context that may be relevant to this error:

1. I upgraded from 10.3-RELEASE-p11 to 11.0-RELEASE-p2, and then the kernel crashes occurred.
2. This system is used as a virtual machine host (several dozen FreeBSD guests and one Windows 10 guest).
3. 2x 20-core Xeon CPUs with HT, and 256 GB RAM.

Do you have a dumpdev configured in rc.conf? If not, the kernel will have nowhere to dump core when it panics.

No. I've set it now:

```
dumpdev="AUTO"
dumpdir="/var/crash"
```

How can I initialize the dump device without a restart? Can I force the system to restart automatically after a kernel crash?

Run:

```
# service dumpon start
# dumpon -l
```

to verify that it configured the dump device correctly. The kernel should reboot automatically once it has finished dumping core.

```
% service dumpon start
No suitable dump device was found.
```
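As an aside: when `dumpdev="AUTO"` cannot find a swap partition to use, the dump device can instead be pointed at a specific unused partition. A minimal rc.conf sketch, assuming a spare partition exists (the device name `/dev/da0p3` below is a placeholder for illustration, not taken from this report):

```
# /etc/rc.conf -- explicit dump device instead of AUTO
# (/dev/da0p3 is hypothetical; substitute an unused partition
#  large enough to hold a kernel dump)
dumpdev="/dev/da0p3"
dumpdir="/var/crash"
```

With such a setting, `service dumpon start` followed by `dumpon -l` should report the chosen device rather than /dev/null, and savecore(8) will write the dump into dumpdir on the next boot after a panic.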
```
% dumpon -l
/dev/null
```

dumpdev=AUTO causes the dumpon script to select a swap partition for use as a kernel dump device. In general, there needs to be an unused partition available somewhere for kernel dumps to work.

Because of the large amount of RAM, I don't use a swap partition. Is it possible to use the /var/crash directory? /var is an independent UFS partition.

This does not appear to have anything to do with standards compliance, so assigning it to standards@ is inappropriate. Resetting assignee: it looks like a VM bug and should be assigned to someone with virtual memory expertise.

Okay, apparently I'm not, because FreeBSD Bugzilla thinks that standards@ is the appropriate default assignee for irrelevant bugs. Will follow up by email.

In both cases, we crashed in bucket_drain() when resetting bucket->ub_cnt to 0:

```
0xffffffff80e17d90 <+256>: movslq %r13d,%r13
0xffffffff80e17d93 <+259>: mov    0x18(%rbx,%r13,8),%rdi
0xffffffff80e17d98 <+264>: mov    0x10c(%r14),%esi
0xffffffff80e17d9f <+271>: callq  *0xe8(%r14)
0xffffffff80e17da6 <+278>: inc    %r13d
0xffffffff80e17da9 <+281>: movswl 0x10(%rbx),%eax
0xffffffff80e17dad <+285>: cmp    %eax,%r13d
0xffffffff80e17db0 <+288>: jl     0xffffffff80e17d90 <bucket_cache_drain+256>
0xffffffff80e17db2 <+290>: mov    0x100(%r14),%rdi
0xffffffff80e17db9 <+297>: movswl %ax,%edx
0xffffffff80e17dbc <+300>: mov    %r12,%rsi
0xffffffff80e17dbf <+303>: callq  *0xf8(%r14)
0xffffffff80e17dc6 <+310>: movw   $0x0,0x10(%rbx)   <-- faulting instruction
```

%rbx is a callee-saved register that is dereferenced after every call to uz_fini, so it seems as though the uz_release function for the zone is somehow corrupting its frame. Because this is happening in the context of uma_reclaim(), we know that this can't be a cache zone, so uz_release is zone_release().

The PCIe network card (QUAD PORT INTEL PRO1000ET PCI-E 0HM9JY) stopped working. While I was waiting for the new one (INTEL GIGABIT ET2 QUAD PORT SERVER ADAPTER E1G44ET), the system worked fine for 7 days.
Then I installed the new card, and after a dozen or so hours the system terminated.

```
Fatal trap 9: general protection fault while in kernel mode
cpuid = 33; apic id = 33
instruction pointer = 0x20:0xffffffff80b6a89e
stack pointer       = 0x28:0xfffffe3fcab4e7d0
frame pointer       = 0x28:0xfffffe3fcab4e820
code segment        = base rx0, limit 0xfffff, type 0x1b
                    = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags    = interrupt enabled, resume, IOPL = 0
current process     = 5639 (vtnet-2:0 tx)
trap number         = 9
panic: general protection fault
cpuid = 33
KDB: stack backtrace:
#0 0xffffffff80b24747 at kdb_backtrace+0x67
#1 0xffffffff80ad9ab2 at vpanic+0x182
#2 0xffffffff80ad9923 at panic+0x43
#3 0xffffffff80fa9d51 at trap_fatal+0x351
#4 0xffffffff80fa94ec at trap+0x26c
#5 0xffffffff80f8d101 at calltrap+0x8
#6 0xffffffff821e4e1e at tapwrite+0x9e
#7 0xffffffff80986677 at devfs_write_f+0xe7
#8 0xffffffff80b419a7 at dofilewrite+0x87
#9 0xffffffff80b41688 at kern_writev+0x68
#10 0xffffffff80b418f6 at sys_writev+0x36
#11 0xffffffff80faa6ae at amd64_syscall+0x4ce
#12 0xffffffff80f8d3eb at Xfast_syscall+0xfb
Uptime: 15h45m2s
```

Please look at the difference between dmesg on 10.3-RELEASE and 11.0-RELEASE. Is there a problem with the PCI bus, or something similar, on 11.0-RELEASE?
11.0-RELEASE:

```
pcib0: <ACPI Host-PCI bridge> on acpi0
pcib0: _OSC returned error 0x10
pci0: <ACPI PCI bus> on pcib0
pcib1: <ACPI Host-PCI bridge> on acpi0
pcib1: _OSC returned error 0x10
pci1: <ACPI PCI bus> on pcib1
pcib2: <ACPI Host-PCI bridge> port 0xcf8-0xcff numa-domain 0 on acpi0
pcib2: _OSC returned error 0x10
pci2: <ACPI PCI bus> numa-domain 0 on pcib2
pcib3: <ACPI PCI-PCI bridge> irq 26 at device 1.0 numa-domain 0 on pci2
pci3: <ACPI PCI bus> numa-domain 0 on pcib3
pcib4: <ACPI PCI-PCI bridge> irq 32 at device 2.0 numa-domain 0 on pci2
pci4: <ACPI PCI bus> numa-domain 0 on pcib4
pcib5: <ACPI PCI-PCI bridge> irq 32 at device 2.2 numa-domain 0 on pci2
pci5: <ACPI PCI bus> numa-domain 0 on pcib5
pcib6: <ACPI PCI-PCI bridge> irq 40 at device 3.0 numa-domain 0 on pci2
pci6: <ACPI PCI bus> numa-domain 0 on pcib6
pcib7: <ACPI PCI-PCI bridge> irq 40 at device 3.2 numa-domain 0 on pci2
pci7: <ACPI PCI bus> numa-domain 0 on pcib7
pci2: <unknown> at device 17.0 (no driver attached)
xhci0: <Intel Wellsburg USB 3.0 controller> mem 0xc7200000-0xc720ffff irq 19 at device 20.0 numa-domain 0 on pci2
pci2: <simple comms> at device 22.0 (no driver attached)
pci2: <simple comms> at device 22.1 (no driver attached)
ehci0: <Intel Wellsburg USB 2.0 controller> mem 0xc7214000-0xc72143ff irq 18 at device 26.0 numa-domain 0 on pci2
pcib8: <ACPI PCI-PCI bridge> irq 16 at device 28.0 numa-domain 0 on pci2
pci8: <ACPI PCI bus> numa-domain 0 on pcib8
pcib9: <ACPI PCI-PCI bridge> irq 18 at device 28.2 numa-domain 0 on pci2
pci9: <ACPI PCI bus> numa-domain 0 on pcib9
pcib10: <ACPI PCI-PCI bridge> at device 0.0 numa-domain 0 on pci9
pci10: <ACPI PCI bus> numa-domain 0 on pcib10
```

10.3-RELEASE:

```
pcib0: <ACPI Host-PCI bridge> on acpi0
pci255: <ACPI PCI bus> on pcib0
pcib1: <ACPI Host-PCI bridge> on acpi0
pci127: <ACPI PCI bus> on pcib1
pcib2: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib2
pcib3: <ACPI PCI-PCI bridge> irq 26 at device 1.0 on pci0
pci1: <ACPI PCI bus> on pcib3
pcib4: <ACPI PCI-PCI bridge> irq 32 at device 2.0 on pci0
pci2: <ACPI PCI bus> on pcib4
pcib5: <ACPI PCI-PCI bridge> irq 32 at device 2.2 on pci0
pci3: <ACPI PCI bus> on pcib5
pcib6: <ACPI PCI-PCI bridge> irq 40 at device 3.0 on pci0
pci4: <ACPI PCI bus> on pcib6
pcib7: <ACPI PCI-PCI bridge> irq 40 at device 3.2 on pci0
pci5: <ACPI PCI bus> on pcib7
pci0: <unknown> at device 17.0 (no driver attached)
```

I upgraded the OS to FreeBSD 11.0-RELEASE-p3 and compiled the kernel without ALTQ. The system seems to work fine. As Mark Johnston mentioned by mail, ALTQ may have been what was crashing the kernel. Is it possible to compile ALTQ in again without problems? I didn't have any problems with ALTQ on 10.3-RELEASE.
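For reference, re-enabling ALTQ later would mean adding the usual options back to a custom kernel configuration and rebuilding. A sketch, assuming a GENERIC-derived config; the file name MYKERNEL is hypothetical, and the option names are the standard ALTQ ones from the FreeBSD documentation:

```
# /usr/src/sys/amd64/conf/MYKERNEL (hypothetical config name)
include GENERIC
ident   MYKERNEL

options ALTQ
options ALTQ_CBQ        # Class Based Queuing
options ALTQ_RED        # Random Early Detection
options ALTQ_RIO        # RED In/Out
options ALTQ_HFSC       # Hierarchical Fair Service Curve scheduler
options ALTQ_PRIQ       # Priority Queuing
options ALTQ_NOPCC      # Required for SMP builds
```

The kernel would then be rebuilt and installed with `make buildkernel KERNCONF=MYKERNEL` and `make installkernel KERNCONF=MYKERNEL` from /usr/src; whether the crashes return with ALTQ enabled is exactly the open question in this report.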