Bug 162997

Summary: [geom] multiple gmirror cause kernel panic during shutdown
Product: Base System
Reporter: Kaho Toshikazu <kaho>
Component: kern
Assignee: freebsd-geom (Nobody) <geom>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: 10.0-CURRENT   
Hardware: Any   
OS: Any   

Description Kaho Toshikazu 2011-12-01 15:20:14 UTC
r227015 (sys/geom/geom_vfs.c) causes a kernel panic when geom providers are
destroyed. A system with only one gmirror provider is not affected.
Machines that mirror at the slice level and have multiple gmirror names
panic during shutdown. This occurs on both i386 and amd64 systems.
The i386 system shows:

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x10
fault code              = supervisor write, page not present
instruction pointer     = 0x20:0xc052e601
stack pointer           = 0x28:0xc1b0ec80
frame pointer           = 0x28:0xc1b0eca0
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 13 (g_event)

(kgdb) bt
#0  doadump (textdump=0) at pcpu.h:244
#1  0xc04a2bb3 in db_dump (dummy=-1068308991, dummy2=0, dummy3=-1, 
    dummy4=0xc1b0ea0c "") at /usr/src/sys/ddb/db_command.c:537
#2  0xc04a25f1 in db_command (last_cmdp=0xc08b003c, cmd_table=0x0, dopager=1)
    at /usr/src/sys/ddb/db_command.c:448
#3  0xc04a2755 in db_command_loop () at /usr/src/sys/ddb/db_command.c:501
#4  0xc04a47dc in db_trap (type=12, code=0) at /usr/src/sys/ddb/db_main.c:229
#5  0xc05c1bfd in kdb_trap (type=12, code=0, tf=0xc1b0ec40)
    at /usr/src/sys/kern/subr_kdb.c:625
#6  0xc07e6f9f in trap_fatal (frame=0xc1b0ec40, eva=16)
    at /usr/src/sys/i386/i386/trap.c:966
#7  0xc07e7099 in trap_pfault (frame=0xc1b0ec40, usermode=0, eva=16)
    at /usr/src/sys/i386/i386/trap.c:839
#8  0xc07e7ec7 in trap (frame=0xc1b0ec40) at /usr/src/sys/i386/i386/trap.c:558
#9  0xc07d3a6c in calltrap () at /usr/src/sys/i386/i386/exception.s:168
#10 0xc052e601 in g_vfs_orphan (cp=0xc1e79440) at atomic.h:246
#11 0xc052905d in g_run_events () at /usr/src/sys/geom/geom_event.c:211
#12 0xc052a648 in g_event_procbody (arg=0x0)
    at /usr/src/sys/geom/geom_kern.c:122
#13 0xc05631a2 in fork_exit (callout=0xc052a5e0 <g_event_procbody>, arg=0x0, 
    frame=0xc1b0ed28) at /usr/src/sys/kern/kern_fork.c:995
#14 0xc07d3ae4 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:275

Fix: 

I don't have a solution, but a kernel built with
sys/geom/geom_vfs.c at r227014 does not panic.

How-To-Repeat: set up more than 2 gmirror names and shut down
Comment 1 Mark Linimon freebsd_committer freebsd_triage 2011-12-01 16:49:44 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-geom

Over to maintainer(s).
Comment 2 Florian Smeets freebsd_committer freebsd_triage 2011-12-01 18:37:40 UTC
I was tracking down a similar problem. My sparc64 machine with multiple
gmirrors stopped rebooting: it hangs after the first mirror is
destroyed and never recovers. I have to reset it via LOM.

I can confirm the hang is caused by r227015.


Syncing disks, vnodes remaining...1 1 0 0 done
GEOM_MIRROR: Device var: provider mirror/var destroyed.
GEOM_MIRROR: Device var destroyed.
*hang*

Florian
Comment 3 dfilter service freebsd_committer freebsd_triage 2011-12-02 17:10:11 UTC
Author: mav
Date: Fri Dec  2 17:09:48 2011
New Revision: 228204
URL: http://svn.freebsd.org/changeset/base/228204

Log:
  Close race between geom destruction on g_vfs_close() when softc destroyed
  and g_vfs_orphan() call that tries to access softc, introduced at r227015.
  
  PR:		kern/162997

Modified:
  head/sys/geom/geom_vfs.c

Modified: head/sys/geom/geom_vfs.c
==============================================================================
--- head/sys/geom/geom_vfs.c	Fri Dec  2 15:47:05 2011	(r228203)
+++ head/sys/geom/geom_vfs.c	Fri Dec  2 17:09:48 2011	(r228204)
@@ -169,8 +169,10 @@ g_vfs_orphan(struct g_consumer *cp)
 	g_topology_assert();
 
 	gp = cp->geom;
-	sc = gp->softc;
 	g_trace(G_T_TOPOLOGY, "g_vfs_orphan(%p(%s))", cp, gp->name);
+	sc = gp->softc;
+	if (sc == NULL)
+		return;
 	mtx_lock(&sc->sc_mtx);
 	sc->sc_orphaned = 1;
 	destroy = (sc->sc_active == 0);
Comment 4 Kaho Toshikazu 2011-12-05 01:06:11 UTC
Hello,

I don't want to crash a real machine many times, so I built a test
environment on qemu. md0p2a is labeled gm0 and md0p2h is labeled gm1,
and both are mounted as UFS2. After setting sysctl kern.geom.debugflags=7,
I rebooted the machine. The console output just before the panic follows.

open delta:[r-1w-1e-3] old:[r2w2e6] provider:[r2w2e6] 0xc14eac00(md0)
g_post_event_x(0xc052c830, 0xc166c300, 2, 0)
  ref 0xc166c300
g_post_event_x(0xc0a03e40, 0xc1446b00, 2, 0)
g_wither_geom(0xc17ffa80(gm1.sync))
GEOM_MIRROR: Device gm1 destroyed.
g_wither_geom(0xc17ffb00(gm1))
g_orphan_register(mirror/gm1)
g_vfs_orphan(0xc1800400(ffs.mirror/gm1))
kernel trap 12 with interrupts disabled

The situation looks like this:
gm1 was destroyed in g_vfs_close(), and then g_vfs_orphan() was called to
manipulate gm1. g_vfs_close() had already freed the softc, so
g_vfs_orphan() tried to use the freed softc, which caused the
panic.

I think that calling malloc() in g_vfs_open() and free() in g_vfs_close()
for the structure that mtx_lock() operates on is not a valid approach.
Either malloc() should not be used, or free() should be called in another
function. Alternatively, the other code should be corrected so that it
never calls into a destroyed provider.
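
The ordering described above can be sketched in user space. This is only a
rough model, not the kernel code: the model_* names and struct layouts are
hypothetical, locking is omitted, and it assumes the close path clears the
geom's softc pointer before freeing it (which the fault address 0x10 and the
NULL check added in r228204 both suggest, but which is not shown in this
report).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins; not the kernel's definitions. */
struct model_softc {
	int sc_orphaned;
	int sc_active;
};

struct model_geom {
	struct model_softc *softc;
};

/* Models g_vfs_open(): allocate the per-geom state. Returns 0 on success. */
static int
model_open(struct model_geom *gp)
{
	assert(gp != NULL);
	gp->softc = calloc(1, sizeof(*gp->softc));
	return (gp->softc == NULL ? -1 : 0);
}

/* Models g_vfs_close(): free the state and clear the pointer (assumed). */
static void
model_close(struct model_geom *gp)
{
	assert(gp != NULL);
	free(gp->softc);
	gp->softc = NULL;
}

/*
 * Models g_vfs_orphan() with the r228204 check: returns 1 if the softc
 * was marked orphaned, 0 if the softc was already gone.
 */
static int
model_orphan(struct model_geom *gp)
{
	struct model_softc *sc;

	assert(gp != NULL);
	sc = gp->softc;
	if (sc == NULL)
		return (0);	/* close already ran; nothing left to touch */
	sc->sc_orphaned = 1;
	return (1);
}
```

Calling model_close() and then model_orphan() on the same geom reproduces the
order seen in the console log. Without the NULL check, the orphan step would
write through a NULL softc pointer, which is the kind of access the trap
reports.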

--
Kaho Toshikazu
Comment 5 Kaho Toshikazu 2011-12-05 03:33:10 UTC
I missed r228204. With it, the machine reboots without a panic.
Thanks.

-- 
Kaho Toshikazu
Comment 6 Alexander Motin freebsd_committer freebsd_triage 2011-12-05 08:45:17 UTC
State Changed
From-To: open->closed

Problem fixed by r228204.