Bug 247552 - vmware: FreeBSD virtual server fatal trap 12: page fault, supervisor read data, page not present
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 12.1-STABLE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-virtualization (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-06-26 03:42 UTC by Matt Grice
Modified: 2021-05-14 12:23 UTC

See Also:


Attachments
core.txt (166.38 KB, text/plain)
2020-06-26 03:50 UTC, Matt Grice
info.0 (421 bytes, text/plain)
2020-06-26 03:51 UTC, Matt Grice

Description Matt Grice 2020-06-26 03:42:37 UTC
I am running FreeBSD 12.1 with the latest updates on a VMware vSphere virtual machine. The server is a web server running nginx, uwsgi and mariadb.

Twice this morning the machine has had a kernel page fault and dumped core. The message was as follows:

Jun 26 12:58:41 ws-ag-au-17 kernel: Fatal trap 12: page fault while in kernel mode
Jun 26 12:58:41 ws-ag-au-17 kernel: cpuid = 1; apic id = 02
Jun 26 12:58:41 ws-ag-au-17 kernel: fault virtual address       = 0x8
Jun 26 12:58:41 ws-ag-au-17 kernel: fault code          = supervisor read data, page not present
Jun 26 12:58:41 ws-ag-au-17 kernel: instruction pointer = 0x20:0xffffffff80c658e0
Jun 26 12:58:41 ws-ag-au-17 kernel: stack pointer               = 0x28:0xfffffe00005cf7a0
Jun 26 12:58:41 ws-ag-au-17 kernel: frame pointer               = 0x28:0xfffffe00005cf7d0
Jun 26 12:58:41 ws-ag-au-17 kernel: code segment                = base rx0, limit 0xfffff, type 0x1b
Jun 26 12:58:41 ws-ag-au-17 kernel:                     = DPL 0, pres 1, long 1, def32 0, gran 1
Jun 26 12:58:41 ws-ag-au-17 kernel: processor eflags    = interrupt enabled, resume, IOPL = 0
Jun 26 12:58:41 ws-ag-au-17 kernel: current process             = 1330 (nginx)
Jun 26 12:58:41 ws-ag-au-17 kernel: trap number         = 12
Jun 26 12:58:41 ws-ag-au-17 kernel: panic: page fault
Jun 26 12:58:41 ws-ag-au-17 kernel: cpuid = 1
Jun 26 12:58:41 ws-ag-au-17 kernel: time = 1593140291
Jun 26 12:58:41 ws-ag-au-17 kernel: KDB: stack backtrace:
Jun 26 12:58:41 ws-ag-au-17 kernel: #0 0xffffffff80c1d297 at kdb_backtrace+0x67
Jun 26 12:58:41 ws-ag-au-17 kernel: #1 0xffffffff80bd05cd at vpanic+0x19d
Jun 26 12:58:41 ws-ag-au-17 kernel: #2 0xffffffff80bd0423 at panic+0x43
Jun 26 12:58:41 ws-ag-au-17 kernel: #3 0xffffffff810a7dcc at trap_fatal+0x39c
Jun 26 12:58:41 ws-ag-au-17 kernel: #4 0xffffffff810a7e19 at trap_pfault+0x49
Jun 26 12:58:41 ws-ag-au-17 kernel: #5 0xffffffff810a740f at trap+0x29f
Jun 26 12:58:41 ws-ag-au-17 kernel: #6 0xffffffff81081a0c at calltrap+0x8
Jun 26 12:58:41 ws-ag-au-17 kernel: #7 0xffffffff80c64a01 at sbdestroy+0x41
Jun 26 12:58:41 ws-ag-au-17 kernel: #8 0xffffffff80c67225 at sofree+0x275
Jun 26 12:58:41 ws-ag-au-17 kernel: #9 0xffffffff80c67d77 at soclose+0x2f7
Jun 26 12:58:41 ws-ag-au-17 kernel: #10 0xffffffff80b7802a at _fdrop+0x1a
Jun 26 12:58:41 ws-ag-au-17 kernel: #11 0xffffffff80b7b151 at closef+0x241
Jun 26 12:58:41 ws-ag-au-17 kernel: #12 0xffffffff80b78547 at closefp+0x97
Jun 26 12:58:41 ws-ag-au-17 kernel: #13 0xffffffff810a8984 at amd64_syscall+0x364
Jun 26 12:58:41 ws-ag-au-17 kernel: #14 0xffffffff81082330 at fast_syscall_common+0x101
Jun 26 12:58:41 ws-ag-au-17 kernel: Uptime: 1h33m41s
Jun 26 12:58:41 ws-ag-au-17 kernel: Dumping 274 out of 4057 MB:..6%..12%..24%..36%..41%..53%..65%..71%..82%..94%
Jun 26 12:58:41 ws-ag-au-17 kernel: Dump complete
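
For anyone comparing this panic against the other reports, the useful parts of the message are the panic time and the call chain. The following is a minimal sketch, not part of the original report, that pulls both out of the syslog text. The names SAMPLE, FRAME_RE, TIME_RE and parse_panic are hypothetical helpers made up for this illustration, and the sample holds only three lines copied from the log above.

```python
import re
from datetime import datetime, timezone

# Three lines copied from the log above; a real run would read the full message text.
SAMPLE = """\
Jun 26 12:58:41 ws-ag-au-17 kernel: time = 1593140291
Jun 26 12:58:41 ws-ag-au-17 kernel: #7 0xffffffff80c64a01 at sbdestroy+0x41
Jun 26 12:58:41 ws-ag-au-17 kernel: #8 0xffffffff80c67225 at sofree+0x275
"""

# "#<index> <return address> at <function>+<offset>"
FRAME_RE = re.compile(r"#(\d+) (0x[0-9a-f]+) at (\w+)\+(0x[0-9a-f]+)")
# "time = <seconds since the Unix epoch>"
TIME_RE = re.compile(r"kernel: time = (\d+)")


def parse_panic(text):
    """Return (panic time in UTC or None, list of (index, function, offset))."""
    when = None
    frames = []
    for line in text.splitlines():
        m = TIME_RE.search(line)
        if m:
            when = datetime.fromtimestamp(int(m.group(1)), tz=timezone.utc)
            continue
        m = FRAME_RE.search(line)
        if m:
            frames.append((int(m.group(1)), m.group(3), int(m.group(4), 16)))
    return when, frames


when, frames = parse_panic(SAMPLE)
print(when.isoformat())  # 2020-06-26T02:58:11+00:00
for index, func, offset in frames:
    print(f"#{index} {func}+{offset:#x}")
```

The decoded time, 02:58:11 UTC, lines up with the 12:58 syslog stamps if the host clock is set to UTC+10, which is an assumption about this machine rather than something stated in the report.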

I have just enabled the writing of core dumps to disk and will post the dump when it happens again.

The web server is usually being accessed when the panic occurs, but only lightly (a maximum of two users at once). Memory usage is 500 MB out of 4 GB, and the dual-core CPU does not seem to get above 10% while serving pages.

At this point I cannot deliberately reproduce the bug, as it seems to occur sporadically.
Comment 1 Matt Grice 2020-06-26 03:50:05 UTC
Created attachment 215950
core.txt
Comment 2 Matt Grice 2020-06-26 03:51:38 UTC
Created attachment 215951
info.0
Comment 3 Matt Grice 2020-06-28 21:30:07 UTC
Disabling the swap partition stops the constant page faults.
Comment 4 Matt Grice 2020-06-30 00:10:11 UTC
Page faults still happen with swap turned off, but at a lower rate.

Possibly related to the following bugs, as I am using nginx with uwsgi/sendfile:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=237568
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=241162
Comment 5 Matt Grice 2020-06-30 02:32:18 UTC
Also possibly related to this issue, which has already been patched:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=222259