Bug 162010 - [geli] panic: Provider's error should be set (error=0)(device=label/feiya.eli).
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: unspecified
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: FreeBSD bugs mailing list
Reported: 2011-10-25 18:40 UTC by Fabian Keil
Modified: 2017-12-31 22:32 UTC (History)
Description Fabian Keil 2011-10-25 18:40:09 UTC
I reproducibly get a kernel panic when a geli-encrypted, labeled
umass device that acts as the single vdev of a ZFS pool disappears
while the pool is being scrubbed.

In case the description isn't clear, I'm using pools like this one:

fk@r500 /usr/crash $zpool status extreme
  pool: extreme
 state: ONLINE
  scan: scrub repaired 0 in 97h31m with 0 errors on Fri Oct 21 00:32:54 2011
config:

        NAME                 STATE     READ WRITE CKSUM
        extreme              ONLINE       0     0     0
          label/extreme.eli  ONLINE       0     0     0

errors: No known data errors
fk@r500 /usr/crash $sudo glabel list
Geom name: da0
Providers:
1. Name: label/extreme
   Mediasize: 4023385600 (3.8G)
   Sectorsize: 512
   Mode: r1w1e1
   secoffset: 0
   offset: 0
   seclength: 7858175
   length: 4023385600
   index: 0
Consumers:
1. Name: da0
   Mediasize: 4023386112 (3.8G)
   Sectorsize: 512
   Mode: r1w1e2

fk@r500 /usr/crash $sudo geli list label/extreme.eli
Geom name: label/extreme.eli
State: ACTIVE
EncryptionAlgorithm: AES-XTS
KeyLength: 256
Crypto: software
UsedKey: 0
Flags: NONE
KeysAllocated: 8
KeysTotal: 8
Providers:
1. Name: label/extreme.eli
   Mediasize: 4023385088 (3.8G)
   Sectorsize: 512
   Mode: r1w1e1
Consumers:
1. Name: label/extreme
   Mediasize: 4023385600 (3.8G)
   Sectorsize: 512
   Mode: r1w1e1

I don't know whether it matters that the device is the pool's only vdev or that labels are used.

With a GENERIC kernel the panic is:

GEOM_ELI: Device label/feiya.eli created.
GEOM_ELI: Encryption: AES-XTS 256
GEOM_ELI:     Crypto: software
(cd0:ahcich1:0:0:0): SCSI status error
(cd0:ahcich1:0:0:0): READ CAPACITY. CDB: 25 0 0 0 0 0 0 0 0 0
(cd0:ahcich1:0:0:0): CAM status: SCSI Status Error
(cd0:ahcich1:0:0:0): SCSI status: Check Condition
(cd0:ahcich1:0:0:0): SCSI sense: NOT READY asc:3a,1 (Medium not present - tray closed)
(cd0:ahcich1:0:0:0): Error 6, Unretryable error
(da0:umass-sim0:0:0:0): Request completed with CAM_REQ_CMP_ERR
(da0:umass-sim0:0:0:0): Retrying command
ugen7.2: <vendor 0x090c> at usbus7 (disconnected)
umass0: at uhub7, port 2, addr 2 (disconnected)
(da0:umass-sim0:0:0:0): Request completed with CAM_REQ_CMP_ERR
(da0:umass-sim0:0:0:0): Retrying command
(da0:umass-sim0:0:0:0): Selection timeout
(da0:umass-sim0:0:0:0): Retrying command
(da0:umass-sim0:0:0:0): Selection timeout
(da0:umass-sim0:0:0:0): Retrying command
(da0:umass-sim0:0:0:0): lost device - 1 outstanding
(pass2:umass-sim0:0:0:0): lost device
(pass2:umass-sim0:0:0:0): removing device entry
(da0:umass-sim0:0:0:0): Error 6, Retries exhausted
(da0:umass-sim0:0:0:0): oustanding 0
GEOM_ELI: Crypto WRITE request failed (error=6). label/feiya.eli[WRITE(offset=2301105664, length=33280)]
GEOM_ELI: Crypto WRITE request failed (error=6). label/feiya.eli[WRITE(offset=2301138944, length=1536)]
GEOM_ELI: Crypto WRITE request failed (error=6). label/feiya.eli[WRITE(offset=2965543936, length=5120)]
GEOM_ELI: Crypto WRITE request failed (error=6). label/feiya.eli[WRITE(offset=2965690368, length=29696)]
GEOM_ELI: Crypto WRITE request failed (error=6). label/feiya.eli[WRITE(offset=1020291072, length=29696)]
panic: Provider's error should be set (error=0)(device=label/feiya.eli).
cpuid = 0
KDB: stack backtrace:
(da0:umass-sim0:0:0:0): removing device entry
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
kdb_backtrace() at kdb_backtrace+0x37
panic() at panic+0x187
g_eli_start() at g_eli_start+0x271
g_io_schedule_down() at g_io_schedule_down+0x1e3
g_down_procbody() at g_down_procbody+0x72
fork_exit() at fork_exit+0x135
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff8000244d00, rbp = 0 ---
KDB: enter: panic
[...]
#0  doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:260
#1  0xffffffff802fa740 in db_dump (dummy=Variable "dummy" is not available.
) at /usr/src/sys/ddb/db_command.c:537
#2  0xffffffff802f9d31 in db_command (last_cmdp=0xffffffff810efd40, cmd_table=Variable "cmd_table" is not available.
) at /usr/src/sys/ddb/db_command.c:448
#3  0xffffffff802f9f80 in db_command_loop () at /usr/src/sys/ddb/db_command.c:501
#4  0xffffffff802fc0d9 in db_trap (type=Variable "type" is not available.
) at /usr/src/sys/ddb/db_main.c:229
#5  0xffffffff8085d8a1 in kdb_trap (type=3, code=0, tf=0xffffff80002449c0) at /usr/src/sys/kern/subr_kdb.c:625
#6  0xffffffff80b11036 in trap (frame=0xffffff80002449c0) at /usr/src/sys/amd64/amd64/trap.c:590
#7  0xffffffff80afb2df in calltrap () at /usr/src/sys/amd64/amd64/exception.S:228
#8  0xffffffff8085d64b in kdb_enter (why=0xffffffff80d287eb "panic", msg=0x80 <Address 0x80 out of bounds>) at cpufunc.h:63
#9  0xffffffff80827f00 in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:599
#10 0xffffffff817bd631 in g_eli_start (bp=Variable "bp" is not available.
) at /usr/src/sys/modules/geom/geom_eli/../../../geom/eli/g_eli.c:270
#11 0xffffffff807c4d03 in g_io_schedule_down (tp=Variable "tp" is not available.
) at /usr/src/sys/geom/geom_io.c:632
#12 0xffffffff807c5192 in g_down_procbody (arg=Variable "arg" is not available.
) at /usr/src/sys/geom/geom_kern.c:110
#13 0xffffffff807fcb55 in fork_exit (callout=0xffffffff807c5120 <g_down_procbody>, arg=0x0, frame=0xffffff8000244c50) at /usr/src/sys/kern/kern_fork.c:995
#14 0xffffffff80afb80e in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:602


With a custom kernel without WITNESS the backtrace is:

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x284
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff81314512
stack pointer           = 0x28:0xffffff8000244b90
frame pointer           = 0x28:0xffffff8000244bc0
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 13 (g_down)

Reading symbols from /boot/kernel/zfs.ko...Reading symbols from /boot/kernel/zfs.ko.symbols...done.
done.
[...]
Loaded symbols for /boot/kernel/fdescfs.ko
#0  doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:260
260             if (textdump && textdump_pending) {
(kgdb) where
#0  doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:260
#1  0xffffffff80333380 in db_dump (dummy=Variable "dummy" is not available.
) at /usr/src/sys/ddb/db_command.c:537
#2  0xffffffff80332cb1 in db_command (last_cmdp=0xffffffff80dfd100, cmd_table=Variable "cmd_table" is not available.
) at /usr/src/sys/ddb/db_command.c:448
#3  0xffffffff80332f00 in db_command_loop () at /usr/src/sys/ddb/db_command.c:501
#4  0xffffffff80335039 in db_trap (type=Variable "type" is not available.
) at /usr/src/sys/ddb/db_main.c:229
#5  0xffffffff806a1061 in kdb_trap (type=12, code=0, tf=0xffffff8000244ae0) at /usr/src/sys/kern/subr_kdb.c:625
#6  0xffffffff80903bbd in trap_fatal (frame=0xffffff8000244ae0, eva=Variable "eva" is not available.
) at /usr/src/sys/amd64/amd64/trap.c:813
#7  0xffffffff80903f26 in trap_pfault (frame=0xffffff8000244ae0, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:734
#8  0xffffffff8090448f in trap (frame=0xffffff8000244ae0) at /usr/src/sys/amd64/amd64/trap.c:473
#9  0xffffffff808ee1e3 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:228
#10 0xffffffff81314512 in g_eli_start (bp=0xfffffe000c44cd98) at /usr/src/sys/modules/geom/geom_eli/../../../geom/eli/g_eli.c:320
#11 0xffffffff80600304 in g_io_schedule_down (tp=Variable "tp" is not available.
) at /usr/src/sys/geom/geom_io.c:632
#12 0xffffffff8060060c in g_down_procbody (arg=Variable "arg" is not available.
) at /usr/src/sys/geom/geom_kern.c:110
#13 0xffffffff806399ef in fork_exit (callout=0xffffffff806005b0 <g_down_procbody>, arg=0x0, frame=0xffffff8000244c50) at /usr/src/sys/kern/kern_fork.c:995
#14 0xffffffff808ee70e in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:602
#15 0x0000000000000000 in ?? ()
#16 0x0000000000000000 in ?? ()
#17 0x0000000000000001 in ?? ()
#18 0x0000000000000000 in ?? ()
[...]
#39 0xffffffff80e398f0 in sleepq_chains ()
#40 0xfffffe0002713888 in ?? ()
#41 0x0000000000000000 in ?? ()
#42 0xfffffe0002713460 in ?? ()
#43 0xffffff8000244ad0 in ?? ()
#44 0xffffff8000244a78 in ?? ()
#45 0xfffffe00027138c0 in ?? ()
#46 0xffffffff80693ca0 in sched_switch (td=0xffffffff806005b0, newtd=0x0, flags=Variable "flags" is not available.
) at /usr/src/sys/kern/sched_ule.c:1853
Previous frame inner to this frame (corrupt stack?)
(kgdb) f 10
#10 0xffffffff81314512 in g_eli_start (bp=0xfffffe000c44cd98) at /usr/src/sys/modules/geom/geom_eli/../../../geom/eli/g_eli.c:320
320     }
(kgdb) p *bp
$1 = {bio_cmd = 1 '\001', bio_flags = 0 '\0', bio_cflags = 0 '\0', bio_pflags = 255 'ÿ', bio_dev = 0x0, bio_disk = 0x0, bio_offset = 4022607872, bio_bcount = 0,
  bio_data = 0xffffff8005747000 "\200*ø\f", bio_error = 0, bio_resid = 0, bio_done = 0xffffffff811a72d0 <vdev_geom_io_intr>, bio_driver1 = 0xfffffe0004c83828, bio_driver2 = 0x0,
  bio_caller1 = 0xfffffe001cf5ba50, bio_caller2 = 0x0, bio_queue = {tqe_next = 0xfffffe000c44b3a0, tqe_prev = 0xffffffff80e1c840}, bio_attribute = 0x0, bio_from = 0xfffffe000c782a80,
  bio_to = 0xfffffe000cc8ca00, bio_length = 8192, bio_completed = 0, bio_children = 1, bio_inbed = 0, bio_parent = 0x0, bio_t0 = {sec = 243, frac = 14970883905652654531},
  bio_task = 0, bio_task_arg = 0x0, bio_classifier1 = 0x0, bio_classifier2 = 0x0, bio_pblkno = 0}

How-To-Repeat: Either use a flaky USB stick as a geli-encrypted vdev and wait
for it to disappear by itself while a zpool scrub is in progress,
or use a reliable stick and unplug it manually while a scrub is running.
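
The reproduction steps above could be scripted roughly as follows. This is only a sketch and was not part of the original report: the disk name (da0), the label and pool name (scrubtest), the amount of filler data, and the geli init options are assumptions chosen to resemble the setup shown earlier. By default the script only prints the commands (DRY_RUN=1). Running it for real needs root and destroys everything on the chosen disk.

```sh
#!/bin/sh
# Sketch only: prints the commands unless DRY_RUN=0 is set.
# DISK and NAME are hypothetical placeholders, not taken from the report.
DISK="${DISK:-da0}"
NAME="${NAME:-scrubtest}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

repro_steps() {
    # Label the USB stick, then put geli on top of the label.
    run glabel label "$NAME" "$DISK"
    run geli init -e AES-XTS -l 256 -s 512 "/dev/label/$NAME"
    run geli attach "/dev/label/$NAME"
    # Single-vdev pool on the .eli provider, as in the report.
    run zpool create "$NAME" "label/$NAME.eli"
    # Write some data so the scrub runs long enough to pull the stick.
    run dd if=/dev/random of="/$NAME/filler" bs=1m count=512
    run zpool scrub "$NAME"
    echo "Now unplug $DISK while 'zpool status $NAME' still shows the scrub in progress."
}

repro_steps
```

With the defaults this prints six "+ ..." command lines followed by the unplug reminder. The unplug itself has to be done by hand (or left to a flaky stick), so the script stops after starting the scrub.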
Comment 1 Mark Linimon freebsd_committer 2011-10-26 05:21:49 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-geom

Over to maintainer(s).
Comment 2 Eitan Adler freebsd_committer freebsd_triage 2017-12-31 08:01:18 UTC
For bugs matching the following criteria:

Status: In Progress Changed: (is less than) 2014-06-01

Reset to default assignee and clear in-progress tags.

Mail being skipped