Bug 194525 - Restart Apache Causes page fault and kernel dump
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 10.1-RELEASE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: Xin LI
 
Reported: 2014-10-22 04:13 UTC by pete
Modified: 2015-05-27 23:08 UTC

See Also:
ohauer: mfc-stable10+


Description pete 2014-10-22 04:13:41 UTC
On a 10.1-RC2 system running apache24, I can reliably trigger a kernel panic and system crash by restarting apache24.

I run mod_wsgi on my Apache instance to serve the Graphite web UI.  I have not detected any hardware defects on this system so far.

Here is the output from /var/crash/panicmail.0.  I can make all other related cores etc. available upon request:

> sudo cat panicmail.0
Dump header from device /dev/da0p3
  Architecture: amd64
  Architecture Version: 2
  Dump Length: 3512709120B (3349 MB)
  Blocksize: 512
  Dumptime: Tue Oct 21 20:47:57 2014
  Hostname: pop.rubicorp.com
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 10.1-RC2 #0 r272876: Fri Oct 10 01:12:21 UTC 2014
    root@releng1.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC
  Panic String: page fault
  Dump Parity: 3316977061
  Bounds: 0
  Dump Status: good

Backtrace:
Reading symbols from /boot/kernel/zfs.ko.symbols...done.
Loaded symbols for /boot/kernel/zfs.ko.symbols
Reading symbols from /boot/kernel/opensolaris.ko.symbols...done.
Loaded symbols for /boot/kernel/opensolaris.ko.symbols
Reading symbols from /boot/kernel/linux.ko.symbols...done.
Loaded symbols for /boot/kernel/linux.ko.symbols
Reading symbols from /boot/kernel/accf_data.ko.symbols...done.
Loaded symbols for /boot/kernel/accf_data.ko.symbols
Reading symbols from /boot/kernel/accf_http.ko.symbols...done.
Loaded symbols for /boot/kernel/accf_http.ko.symbols
Reading symbols from /boot/modules/nvidia.ko...done.
Loaded symbols for /boot/modules/nvidia.ko
Reading symbols from /boot/modules/vboxdrv.ko...done.
Loaded symbols for /boot/modules/vboxdrv.ko
Reading symbols from /boot/kernel/sem.ko.symbols...done.
Loaded symbols for /boot/kernel/sem.ko.symbols
Reading symbols from /boot/kernel/ums.ko.symbols...done.
Loaded symbols for /boot/kernel/ums.ko.symbols
Reading symbols from /boot/kernel/pflog.ko.symbols...done.
Loaded symbols for /boot/kernel/pflog.ko.symbols
Reading symbols from /boot/kernel/pf.ko.symbols...done.
Loaded symbols for /boot/kernel/pf.ko.symbols
Reading symbols from /boot/modules/vboxnetflt.ko...done.
Loaded symbols for /boot/modules/vboxnetflt.ko
Reading symbols from /boot/kernel/netgraph.ko.symbols...done.
Loaded symbols for /boot/kernel/netgraph.ko.symbols
Reading symbols from /boot/kernel/ng_ether.ko.symbols...done.
Loaded symbols for /boot/kernel/ng_ether.ko.symbols
Reading symbols from /boot/modules/vboxnetadp.ko...done.
Loaded symbols for /boot/modules/vboxnetadp.ko
#0  doadump (textdump=<value optimized out>) at pcpu.h:219
	in pcpu.h
(kgdb) #0  doadump (textdump=<value optimized out>) at pcpu.h:219
#1  0xffffffff809264b2 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:452
#2  0xffffffff80926874 in panic (fmt=<value optimized out>)
    at /usr/src/sys/kern/kern_shutdown.c:759
#3  0xffffffff80d2304f in trap_fatal (frame=<value optimized out>,
    eva=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:865
#4  0xffffffff80d23368 in trap_pfault (frame=0xfffffe07e215d6d0,
    usermode=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:676
#5  0xffffffff80d229ca in trap (frame=0xfffffe07e215d6d0)
    at /usr/src/sys/amd64/amd64/trap.c:440
#6  0xffffffff80d08882 in calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:232
#7  0xffffffff8090cb7a in lf_advlockasync (ap=0xfffffe07e215d860,
    statep=0xfffff80035c15ab8, size=<value optimized out>)
    at /usr/src/sys/kern/kern_lockf.c:745
#8  0xffffffff8090d345 in lf_advlock (ap=<value optimized out>, statep=0x0,
    size=0) at /usr/src/sys/kern/kern_lockf.c:771
#9  0xffffffff809b7549 in vop_stdadvlock (ap=0xfffffe07e215da18)
    at /usr/src/sys/kern/vfs_default.c:414
#10 0xffffffff80e42387 in VOP_ADVLOCK_APV (vop=<value optimized out>,
    a=<value optimized out>) at vnode_if.c:2531
#11 0xffffffff808e30a9 in kern_fcntl (td=<value optimized out>,
    fd=<value optimized out>, cmd=<value optimized out>,
    arg=<value optimized out>) at vnode_if.h:1041
#12 0xffffffff808e24ec in kern_fcntl_freebsd (td=0xfffff806ce129490, fd=9,
    cmd=<value optimized out>, arg=34383977672)
    at /usr/src/sys/kern/kern_descrip.c:458
#13 0xffffffff80d23981 in amd64_syscall (td=0xfffff806ce129490, traced=0)
    at subr_syscall.c:134
#14 0xffffffff80d08b6b in Xfast_syscall ()
    at /usr/src/sys/amd64/amd64/exception.S:391
#15 0x0000000801c8e33a in ?? ()
Current language:  auto; currently minimal
(kgdb)
>
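
Frames #8 through #12 of the backtrace show the fault being reached through fcntl(2) and the advisory locking path (kern_fcntl_freebsd, kern_fcntl, VOP_ADVLOCK_APV, vop_stdadvlock, lf_advlock). As a rough user-space illustration of the kind of call that enters that path, here is a minimal sketch of a POSIX record lock taken with fcntl(2). The helper name set_whole_file_lock is hypothetical, and the use of F_SETLKW is an assumption: the backtrace shows cmd as optimized out, so the actual command httpd issued cannot be read from this report.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: apply an advisory record lock of the given type
 * (F_RDLCK, F_WRLCK or F_UNLCK) to the whole file behind fd.
 * Returns 0 on success, or -1 with errno set by fcntl(2).
 */
static int
set_whole_file_lock(int fd, short type)
{
    struct flock fl = {0};

    assert(fd >= 0);
    fl.l_type = type;
    fl.l_whence = SEEK_SET;   /* offsets are relative to start of file */
    fl.l_start = 0;
    fl.l_len = 0;             /* 0 means "through end of file" */

    /* F_SETLKW blocks until the lock can be granted (assumed command). */
    return fcntl(fd, F_SETLKW, &fl);
}
```

Calling set_whole_file_lock(fd, F_WRLCK) and later set_whole_file_lock(fd, F_UNLCK) takes and releases the lock. This only illustrates the system call involved; it is not claimed to reproduce the panic.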
Comment 1 Walter Hop 2014-12-22 22:49:02 UTC
We currently experience the same problem on FreeBSD 10.1-RELEASE-p1 amd64.

On some Apache restarts, a panic follows. It has happened a few times a week since 10.1, but we are not seeing it on all servers, and it tends to happen during office hours, so it may be related to a certain use of WSGI.

Initially, disabling softupdates seemed to help, since the back-to-back panics stopped, but after 5 days another panic occurred.

The problem was not present in FreeBSD 9.2 or 10.0.

apache24-2.4.10_2              Version 2.4.x of Apache web server
ap24-mod_wsgi3-3.5             Python WSGI adapter module for Apache

The panic backtrace is the same in all cases.

Looking up the kernel function as described in https://www.freebsd.org/doc/faq/advanced.html#idp60797648 gives:
ffffffff8090d9c0 T lf_advlockasync

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address      = 0x30058
fault code         = supervisor write data, page not present
instruction pointer        = 0x20:0xffffffff8090e46a
stack pointer              = 0x28:0xfffffe000024d780
frame pointer              = 0x28:0xfffffe000024d850
code segment               = base rx0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags   = interrupt enabled, resume, IOPL = 0
current process            = 27466 (httpd)
trap number                = 12
panic: page fault
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff80963000 at kdb_backtrace+0x60
#1 0xffffffff80928125 at panic+0x155
#2 0xffffffff80d24f1f at trap_fatal+0x38f
#3 0xffffffff80d25238 at trap_pfault+0x308
#4 0xffffffff80d2489a at trap+0x47a
#5 0xffffffff80d0a782 at calltrap+0x8
#6 0xffffffff8090ec35 at lf_advlock+0x45
#7 0xffffffff809b8e69 at vop_stdadvlock+0xa9
#8 0xffffffff80e44247 at VOP_ADVLOCK_APV+0xa7
#9 0xffffffff808e4919 at kern_fcntl+0xb39
#10 0xffffffff808e3d5c at kern_fcntl_freebsd+0xac
#11 0xffffffff80d25851 at amd64_syscall+0x351
#12 0xffffffff80d0aa6b at Xfast_syscall+0xfb
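
The symbol lookup in this comment amounts to finding the last symbol whose address is at or below the faulting instruction pointer in an address-sorted symbol list (the "ffffffff8090d9c0 T lf_advlockasync" line has the shape of `nm -n` output). A minimal sketch of that search follows. The struct and function names are hypothetical, the table holds only two entries for illustration, and the lf_advlock address is derived from frame #6 (0xffffffff8090ec35 minus 0x45) rather than read from a symbol table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry of a symbol table sorted by ascending address. */
struct ksym {
    uint64_t addr;
    const char *name;
};

/*
 * Illustrative two-entry table. lf_advlockasync comes from the lookup
 * quoted above; lf_advlock is derived from frame #6 of the backtrace.
 * A real kernel symbol table has many thousands of entries.
 */
static const struct ksym ksyms[] = {
    { 0xffffffff8090d9c0ULL, "lf_advlockasync" },
    { 0xffffffff8090ebf0ULL, "lf_advlock" },
};

/*
 * Return the last symbol whose address is <= pc, or NULL if pc lies
 * before the first entry. The table must be sorted by address.
 */
static const struct ksym *
resolve_pc(const struct ksym *table, size_t n, uint64_t pc)
{
    const struct ksym *best = NULL;

    assert(table != NULL);
    for (size_t i = 0; i < n && table[i].addr <= pc; i++)
        best = &table[i];
    return best;
}
```

With the instruction pointer from the trap message, resolve_pc(ksyms, 2, 0xffffffff8090e46aULL) selects the lf_advlockasync entry, at offset 0xaaa from its start. That agrees with the kgdb backtrace in the description, where lf_advlockasync is the faulting frame called from lf_advlock.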
Comment 2 pete 2014-12-22 23:00:20 UTC
(In reply to Walter Hop from comment #1)
> We currently experience the same problem on FreeBSD 10.1-RELEASE-p1 amd64.
> 

I can verify that this crash is also happening on my system after upgrading to 10.1-RELEASE-p1.  The crash is identical, and the system actually had a clean install of FreeBSD on it.  I can reproduce it on my system at will, so please let me know if additional info is needed.
Comment 3 Sebastian YEPES F. 2015-01-12 23:01:35 UTC
Hello,

This issue has been resolved in the latest -STABLE version:

FreeBSD 10.1-STABLE (GENERIC) #3 r277044: Mon Jan 12 09:24:48 CET 2015

After upgrading to r277044 I can now restart Apache without a kernel dump ;-)
Comment 4 Walter Hop 2015-01-14 20:43:57 UTC
This bug has likely been fixed, although the fix hasn't been merged back yet. The comment disappeared due to Bugzilla downtime. I'll paste the comment again.

My schedule doesn't permit testing it right now, but if it works for you please close this bug!


A commit references this bug:

Author: delphij
Date: Sat Jan 10 06:48:36 UTC 2015
New revision: 276904
URL: https://svnweb.freebsd.org/changeset/base/276904

Log:
  Improve style and fix a possible use-after-free case introduced in r268384
  by reinitializing the 'freestate' pointer after freeing the memory.

  Obtained from:    HardenedBSD (71fab80c5dd3034b71a29a61064625018671bbeb)
  PR:        194525
  Submitted by:    Oliver Pinter <oliver.pinter@hardenedbsd.org>
  MFC after:    2 weeks

Changes:
  head/sys/kern/kern_lockf.c
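
The commit log describes the fix as reinitializing a pointer after the memory it refers to has been freed. The general pattern can be sketched in user-space C as below. This is not the code from kern_lockf.c: struct lock_state and state_release are hypothetical names, and the kernel uses its own allocator rather than the libc free() shown here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-file lock state structure. */
struct lock_state {
    int refs;
};

/*
 * Free the state that *statep points to and clear the caller's pointer,
 * so that any later check or cleanup pass sees NULL instead of a stale
 * address that would lead to a use-after-free or double free.
 */
static void
state_release(struct lock_state **statep)
{
    assert(statep != NULL);
    free(*statep);      /* free(NULL) is defined to do nothing */
    *statep = NULL;     /* reinitialize the pointer after freeing */
}
```

Because the pointer is cleared, calling state_release() a second time on the same variable is harmless, whereas without the reset the second call would free already-released memory.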
Comment 5 Olli Hauer freebsd_committer 2015-01-17 15:17:56 UTC
- restore last known PR state from 2015-01-10
  http://lists.freebsd.org/pipermail/freebsd-bugs/2015-January/059882.html
Comment 6 commit-hook freebsd_committer 2015-01-24 00:28:29 UTC
A commit references this bug:

Author: delphij
Date: Sat Jan 24 00:27:51 UTC 2015
New revision: 277625
URL: https://svnweb.freebsd.org/changeset/base/277625

Log:
  MFC r276904:

  Improve style and fix a possible use-after-free case introduced in r268384
  by reinitializing the 'freestate' pointer after freeing the memory.

  Obtained from:	HardenedBSD (71fab80c5dd3034b71a29a61064625018671bbeb)
  PR:		194525
  Submitted by:	Oliver Pinter <oliver.pinter@hardenedbsd.org>

Changes:
_U  stable/10/
  stable/10/sys/kern/kern_lockf.c
Comment 7 Xin LI freebsd_committer 2015-01-24 00:30:01 UTC
Fixed in 10.1-STABLE.
Comment 8 Steve Roome 2015-05-21 23:35:29 UTC
Could this fix perhaps be rolled into p11?

10.1-RELEASE-p10 FreeBSD 10.1-RELEASE-p10 #0: Wed May 13 06:54:13 UTC 2015     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64
Comment 9 Jason Unovitch 2015-05-27 23:08:04 UTC
As a cross reference, there is some ongoing discussion on the FreeBSD Forums regarding the issue:

https://forums.freebsd.org/threads/fatal-trap-12-page-fault-while-in-kernel-mode-on-new-server-running-freebsd-10-1-release-p10.51737/