Bug 170238 - [zfs] [panic] Panic when deleting data
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 9.1-PRERELEASE
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: Andriy Gapon
 
Reported: 2012-07-28 22:50 UTC by Joshua Beard
Modified: 2012-12-01 18:50 UTC

Description Joshua Beard 2012-07-28 22:50:00 UTC
I posted this to the -stable mailing list, but unfortunately didn't get any feedback there.

When attempting to delete data from a ZFS pool, the system panics.  I completely rebuilt the pools from scratch and experience the same results.

The data itself isn't anything special - it's data that is rsynced nightly from a Mac OS X server for backups.  It doesn't appear to be any specific file(s) that causes it, but it's consistently reproducible.

To delete the data, I'm running a simple "rm -rf".

I run a zpool scrub weekly, and the most recent one ran a couple of days 
ago without errors.  The issue was present both before and after that 
scrub.  Additionally, I haven't seen anything in my logs that indicates 
failing drives, but I haven't run a long smartctl test for a while.

The system has 16 GB of RAM and I'm not doing anything special for 
tuning ZFS.  The ZFS pool does use compression.

I do have crash dumps enabled, if that helps, and would be happy to 
provide any further detail and information.  Redoing the filesystem is 
an option, as the data is just duplicate backup data and can easily be 
restored.

Here's a panic log:
Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 05
fault virtual address   = 0x160
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff816bbbe6
stack pointer           = 0x28:0xffffff8465f22870
frame pointer           = 0x28:0xffffff8465f22930
code segment            = base rx0, limit 0xfffff, type 0x1b
                         = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 1646 (rm)
trap number             = 12
panic: page fault
cpuid = 3
KDB: stack backtrace:
#0 0xffffffff8091c836 at kdb_backtrace+0x66
#1 0xffffffff808e67ee at panic+0x1ce
#2 0xffffffff80bd39c0 at trap_fatal+0x290
#3 0xffffffff80bd3cfd at trap_pfault+0x1ed
#4 0xffffffff80bd431e at trap+0x3ce
#5 0xffffffff80bbef1f at calltrap+0x8
#6 0xffffffff80c637e4 at VOP_REMOVE_APV+0x34
#7 0xffffffff8098300d at kern_unlinkat+0x32d
#8 0xffffffff80bd3270 at amd64_syscall+0x590
#9 0xffffffff80bbf207 at Xfast_syscall+0xf7
Uptime: 4m50s

How-To-Repeat: Attempting to delete data using rm consistently reproduces the issue.
Comment 1 Mark Linimon freebsd_committer freebsd_triage 2012-07-29 02:37:12 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-fs

Over to maintainer(s).
Comment 2 Andriy Gapon freebsd_committer freebsd_triage 2012-12-01 18:34:32 UTC
State Changed
From-To: open->closed
Comment 3 Andriy Gapon freebsd_committer freebsd_triage 2012-12-01 18:35:33 UTC
Responsible Changed
From-To: freebsd-fs->avg

I've looked at this one.
Comment 4 Andriy Gapon freebsd_committer freebsd_triage 2012-12-01 18:42:02 UTC
As discussed on the mailing list:
http://thread.gmane.org/gmane.os.freebsd.devel.file-systems/16504/focus=16507

I believe that the problem is caused by corruption in ZFS node attributes.
In r240632 and r240345 some bugs were fixed which could lead to such corruption.

If this problem can be reproduced with data _created_ after those revisions,
then please open a new PR.
Note that data received from older FreeBSD systems or other operating systems
may still contain corruptions of the same variety.

-- 
Andriy Gapon