This is a brand new installation. The panic also occurs with the GENERIC kernel. The boot device is a SCSI disk on a separate controller.

Before the drive is set up for use as a ZFS device, the kernel identifies it as:

kernel: twed0: <Unit 1, JBOD, Normal> on twe0
kernel: twed0: 152627MB (312581808 sectors)
kernel: GEOM_LABEL: Label for provider twed0p1 is msdosfs/EFI.

This is a Western Digital WD1600JS SATA drive, connected to a 3ware 8002-LP card (2-port SATA, PCI). The last time the drive was used, in a different computer and OS, it was in good working order.

After the device is set up for use with ZFS (via a "zpool create ..." command), the kernel panics at the next boot when it begins to scan the attached disks (just after the "Waiting 5 seconds for SCSI devices to settle" message).

Interestingly, a different SATA drive on the same port of the same card does not cause the panic. The "good" drive is a Western Digital WD360 (SATA, 36 GB).

Boot-time kernel output:

twe1: 152627MB (312581808 sectors)
GEOM: new disk twed0
GEOM: new disk twed1

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 01
fault virtual address = 0x3f80
fault code = supervisors read, page not present
instruction pointer = 0x20:0xc06d0e2c
stack pointer = 0x28:0xe2fb4b60
frame pointer = 0x28:0xe2fb4c58
code segment = base rx0, limit 0xfffff, type 0x1b
  = DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 2 (g_event)
trap number = 12
panic: page fault
Uptime: 1s
Cannot dump. No dumpdevice defined.
Automatic reboot in 15 seconds - press a key on the console to abort

Fix:

Disconnect the drive. (Not much of a workaround. :)

How-To-Repeat:

Do "zpool create poolname $dev". Reboot the machine.
Johan A. van Zanten wrote:
>
> Fatal trap 12: page fault while in kernel mode
> [...]
> Cannot dump. No dumpdevice defined.
> Automatic reboot in 15 seconds - press a key on the console to abort
>

Hello,

Please set a dump device; see
http://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug.html
for more information. We need a crash dump to see what is going on and what is going wrong. Without it we will not be able (IMO) to resolve your problem.

Thanks for taking the time to report this, and for using FreeBSD!

Cheers,
remko

-- 
/"\   Best regards,                      | remko@FreeBSD.org
\ /   Remko Lodder                       | remko@EFnet
 X    http://www.evilcoder.org/          |
/ \   ASCII Ribbon Campaign              | Against HTML Mail and News
Remko Lodder <remko@FreeBSD.org> wrote:
> Johan A. van Zanten wrote:
> >
> > Fatal trap 12: page fault while in kernel mode
> > [...]
> > Cannot dump. No dumpdevice defined.
> > Automatic reboot in 15 seconds - press a key on the console to abort
> >
>
> Hello,
>
> Please set a dumpdevice see
> http://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug.html
> for more information. We need this to be able to see what is going on
> and what is going nuts. Without this we will not be able (imo) to
> resolve your problem.

Can you give an example of the syntax for specifying the dump device in the kernel config? The crash seems to be happening before dumpon is run. According to the web page you cite:

  Alternatively, the dump device can be hard-coded via the dump clause in the config(5) line of a kernel configuration file. This approach is deprecated and should be used only if a kernel is crashing before dumpon(8) can be executed.

But I cannot find any example of the syntax for the "dump" clause in /usr/src/sys/conf or in config(5).

Thanks,
johan
The original reporter seems to have given up on this. I have seen something very similar, and thought I could provide some more information.

I now have three disks, all in an unusable state, that cause FreeBSD to panic as soon as it sees them. Common to all is that they contained ZFS pools that were online when the computer crashed, possibly for unrelated reasons. Upon reboot, the computer would panic when it noticed the disk; in fact, immediately after printing the standard console message giving the device name and disk type.

ZFS may however be incidental to the problem: the panic happens even if I don't have zfs.ko loaded when the problem disk is plugged in. I wonder if it could be related to kern/127115 somehow?

Unfortunately I cannot get a dump: if I have activated kernel dumps (using dumpon) before triggering the panic, the console says "Dumping xxx MB" and hangs. So I compiled a debug kernel and obtained a backtrace using ddb instead. Here is the output, copied by hand from a photo of the screen:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x3f80
fault code = supervisor read data, page not present
[...]
current process = 2 (g_event)
[thread pid 2 tid 100007 ]
Stopped at bcmp+0x8: repe cmpsq (%rsi),%es:(%rdi)
db> trace
Tracing pid 2 tid 100007 td 0xffffff0001129000
bcmp() at bcmp+0x8
g_part_taste() at g_part_taste+0x252
g_new_provider_event() at g_new_provider_event+0x75
g_run_events() at g_run_events+0x1b8
g_event_procbody() at g_event_procbody+0x57
fork_exit() at fork_exit+0x11f
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffffffb3600d30, rbp = 0 ---

I am really not very familiar with ddb. Let me know if you wish me to dig deeper, but then I need a pointer as to what to look for.

- Harald
For what it's worth, assuming it is the partition table that has gotten screwed up somehow, here are the first 34 sectors of the disk that caused the panic described in my previous mail:

http://www.math.ntnu.no/~hanche/tmp/baddisk.bin

(Created by attaching the disk to a Mac and running dd bs=512 count=34 on the device file. Not sure if binary attachments are OK here.)

I forgot to mention that this is on 7.0-STABLE/amd64 as of 19 August (7.0-STABLE #3). But I also see the problem on 7.0-RELEASE/i386.

- Harald
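The figure of 34 sectors matches the usual GPT layout at the start of a disk: LBA 0 holds the protective MBR, LBA 1 the primary GPT header, and LBA 2 through 33 the primary partition entry array. The C sketch below does that arithmetic, assuming 512-byte sectors and the common default of 128 entries of 128 bytes each (other entry counts are allowed by the format). The constant and function names are illustrative, not taken from the FreeBSD sources.

```c
#include <stdint.h>

/* Assumptions: 512-byte sectors, 128 entries of 128 bytes each. */
#define SECTOR_SIZE     512u
#define GPT_ENTRIES     128u
#define GPT_ENTRY_SIZE  128u

/* Sectors needed to hold the partition entry array, rounded up. */
uint32_t gpt_table_sectors(uint32_t entries, uint32_t entry_size,
                           uint32_t sector_size)
{
    uint32_t bytes = entries * entry_size;
    return (bytes + sector_size - 1) / sector_size;
}

/* Protective MBR (LBA 0) + primary header (LBA 1) + primary entry array. */
uint32_t gpt_leading_sectors(void)
{
    return 1 + 1 + gpt_table_sectors(GPT_ENTRIES, GPT_ENTRY_SIZE, SECTOR_SIZE);
}

/* The backup (secondary) header lives in the last LBA of the disk. */
uint64_t gpt_secondary_hdr_lba(uint64_t total_sectors)
{
    return total_sectors - 1;
}

/* In the usual layout, the backup entry array sits just before it. */
uint64_t gpt_secondary_tbl_lba(uint64_t total_sectors)
{
    return gpt_secondary_hdr_lba(total_sectors) -
           gpt_table_sectors(GPT_ENTRIES, GPT_ENTRY_SIZE, SECTOR_SIZE);
}
```

With these assumptions gpt_leading_sectors() is 1 + 1 + 32 = 34, so the dd dump covers the primary copy only; the backup copy at the end of the disk is not part of it. For the 312581808-sector WD1600JS in the original report, the backup header would sit at LBA 312581807.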
I just had my biggest "duh" moment in a veeery long time. The above two "contributions" to this PR can probably be ignored. For the curious: I intended to do

#; gpt create -f da2
#; gpt add -t 6a898cc3-1dd2-11b2-99a6-080020736631 da2
#; zpool create poolname da2p1

but apparently I created the pool on da2 instead, partially overwriting the GPT. And I managed to do this (count 'em) no less than THREE times! Like I said, DUH, and my apologies for the noise.

Maybe we could turn the noise into a feature request: perhaps zpool should be smart enough to recognize that the user is about to shoot himself in the foot, and refuse to cooperate?

- Harald
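A minimal sketch of the kind of check that feature request suggests, assuming the tool reads sector 1 (LBA 1) of the whole-disk device with 512-byte sectors: a GPT header begins with the 8-byte signature "EFI PART", so its presence means the raw disk already carries a partition table. The function name looks_like_gpt_header is hypothetical and is not taken from the zpool(8) sources.

```c
#include <stddef.h>
#include <string.h>

/* The GPT header signature occupies the first 8 bytes of LBA 1. */
static const char gpt_signature[8] = { 'E', 'F', 'I', ' ', 'P', 'A', 'R', 'T' };

/*
 * Hypothetical pre-flight check: returns 1 if the buffer holding sector 1
 * of a whole-disk device starts with the GPT signature, 0 otherwise
 * (including when the buffer is missing or too short).
 */
int looks_like_gpt_header(const unsigned char *lba1, size_t len)
{
    if (lba1 == NULL || len < sizeof(gpt_signature))
        return 0;
    return memcmp(lba1, gpt_signature, sizeof(gpt_signature)) == 0;
}
```

A real tool would presumably also want to recognize other schemes (MBR, BSD labels) and offer a way to override the refusal when clobbering the table is intended.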
Harald Hanche-Olsen <hanche@math.ntnu.no> wrote:
> The original reporter seems to have given up on this. I have seen
> something very similar, and thought I could provide some more
> information.

Thanks for helping. The problem for me is that the panic occurred very early in the boot process, before the dump device is normally configured, and no one on the freebsd-help list, nor anyone reading these bug reports, seemed to know or care enough to help me get a dump device configured earlier.

I spent some time going through the source, trying to figure out a way to do this, but the time required exceeded the amount of time I had to spend on it.

-johan
+ Johan A. van Zanten <johan@giantfoo.org>:

> The problem for me is that the panic occured very early in the boot
> process, before the dump device is normally configured, and no one
> on the freebsd-help list, nor anyone reading these bug reports
> seemed to know or care enough to help me get a dump device
> configured earlier.

Well, the handbook gives a method that it says is "deprecated"

http://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug.html

(specifying a dump device in the kernel config), but these lines from /usr/src/usr.sbin/config/config.y

System_spec:
	CONFIG System_id System_parameter_list
		= { errx(1, "%s:%d: root/dump/swap specifications obsolete", yyfile, yyline);}

make me think that the handbook itself is obsolete at this point, and that the "deprecated" method is no longer available.

If you still have the disk and wish to resurrect it, you can try my method: I booted from an Ubuntu CD and erased the EFI partition table using

dd if=/dev/zero bs=512 count=1 seek=1 of=/dev/disk/by-id/...

(making VERY sure I did not clobber the wrong disk).

- Harald
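For anyone who wants to see exactly what that dd command touches before pointing it at a real disk: with bs=512 and seek=1 it writes one block of zeros at byte offset 512, which on a 512-byte-sector disk is LBA 1, where the primary GPT header lives; the backup header at the end of the disk is left alone. The C sketch below does the same thing with pwrite(2), assuming 512-byte sectors. zero_sector is a hypothetical helper, and it is meant to be tried against a disk image file first.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define SECTOR_SIZE 512  /* assumption: 512-byte sectors */

/*
 * Zero exactly one sector at the given LBA, like
 *   dd if=/dev/zero bs=512 count=1 seek=LBA of=PATH
 * with LBA = 1 for the primary GPT header. Unlike dd on a regular file
 * without conv=notrunc, pwrite() does not truncate the file afterwards.
 * Returns 0 on success, -1 on any error.
 */
int zero_sector(const char *path, off_t lba)
{
    unsigned char zeros[SECTOR_SIZE];
    memset(zeros, 0, sizeof(zeros));

    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    ssize_t n = pwrite(fd, zeros, sizeof(zeros), lba * SECTOR_SIZE);
    int rc = (n == (ssize_t)sizeof(zeros)) ? 0 : -1;
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

Calling zero_sector(path, 1) on an image leaves LBA 0 (the protective MBR) and everything from LBA 2 onward untouched.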
Harald Hanche-Olsen <hanche@math.ntnu.no> wrote:
> + Johan A. van Zanten <johan@giantfoo.org>:
>
> > The problem for me is that the panic occured very early in the boot
> > process, before the dump device is normally configured, and no one
> > on the freebsd-help list, nor anyone reading these bug reports
> > seemed to know or care enough to help me get a dump device
> > configured earlier.
>
> Well, the handbook gives a method that it says is "deprecated"
>
> http://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug.html

Yes, I think I tried this and it did not work. :(

-johan
Hi,

On 2008-09-30, Harald Hanche-Olsen wrote:
> For the curious: I intended to do
>
> #; gpt create -f da2
> #; gpt add -t 6a898cc3-1dd2-11b2-99a6-080020736631 da2
> #; zpool create poolname da2p1
>
> but apparently, I created the pool on da2 instead, partially
> overwriting the GPT.

This PR is a duplicate of kern/127115. The bug is not in the ZFS code but in the gpart GPT code: a corrupted GPT partition table can cause a panic in g_part_gpt_read().

To cause the panic, all of these conditions must hold after the tables have been read in g_part_gpt_read():

table->state[GPT_ELT_PRIHDR] == GPT_STATE_OK
pritbl == NULL
table->state[GPT_ELT_SECTBL] == GPT_STATE_OK

The panic happens at line 661 of g_part_gpt.c (r183533), when tbl is NULL.

Here is a proposed fix:

%%%
Index: sys/geom/part/g_part_gpt.c
===================================================================
--- sys/geom/part/g_part_gpt.c	(revision 183533)
+++ sys/geom/part/g_part_gpt.c	(working copy)
@@ -631,7 +631,7 @@ g_part_gpt_read(struct g_part_table *bas
 			table->state[GPT_ELT_PRIHDR] = GPT_STATE_INVALID;
 	}
 
-	if (table->state[GPT_ELT_PRIHDR] != GPT_STATE_OK) {
+	if (table->state[GPT_ELT_PRITBL] != GPT_STATE_OK) {
 		printf("GEOM: %s: the primary GPT table is corrupt or "
 		    "invalid.\n", pp->name);
 		printf("GEOM: %s: using the secondary instead -- recovery "
@@ -641,7 +641,7 @@ g_part_gpt_read(struct g_part_table *bas
 		if (pritbl != NULL)
 			g_free(pritbl);
 	} else {
-		if (table->state[GPT_ELT_SECHDR] != GPT_STATE_OK) {
+		if (table->state[GPT_ELT_SECTBL] != GPT_STATE_OK) {
 			printf("GEOM: %s: the secondary GPT table is corrupt "
 			    "or invalid.\n", pp->name);
 			printf("GEOM: %s: using the primary only -- recovery "
%%%

With the patch applied, this is what I get with the corrupted GPT table:

GEOM: ad0: the primary GPT table is corrupt or invalid.
GEOM: ad0: using the secondary instead -- recovery strongly advised.

-- 
Jaakko
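To see why branching on the header state rather than the table state can yield a NULL table, here is a simplified user-space model of the selection step. It assumes, as the conditions above describe, that a table which failed to read or verify has a non-OK state and a NULL pointer. The struct and function names are stand-ins, not the kernel's; only the GPT_STATE_* names are borrowed from the patch. It models the first hunk; the second hunk makes the matching change in the branch that warns about the secondary table.

```c
#include <stddef.h>

/* Stand-in for the kernel's per-element state (simplified). */
enum gpt_state { GPT_STATE_OK, GPT_STATE_INVALID };

/* Stand-in for what g_part_gpt_read() knows after reading both copies. */
struct gpt_read_result {
    enum gpt_state prihdr, pritbl, sechdr, sectbl;
    const void *pritbl_data;   /* NULL if the primary table was unusable */
    const void *sectbl_data;   /* NULL if the secondary table was unusable */
};

/* Before the patch: choose based on the primary *header* state. */
const void *pick_table_old(const struct gpt_read_result *r)
{
    return (r->prihdr != GPT_STATE_OK) ? r->sectbl_data : r->pritbl_data;
}

/* After the patch: choose based on the primary *table* state. */
const void *pick_table_new(const struct gpt_read_result *r)
{
    return (r->pritbl != GPT_STATE_OK) ? r->sectbl_data : r->pritbl_data;
}
```

In the panic scenario (primary header OK, primary table NULL, secondary table OK), pick_table_old() returns NULL, the pointer the kernel then dereferences, while pick_table_new() falls back to the secondary table. When both copies are healthy, both versions pick the primary.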
+ Jaakko Heinonen <jh@saunalahti.fi>:

> This PR is a duplicate of kern/127115.

As I suspected (see my earlier mail). Unfortunately I cannot test your fix, since I have already repaired my three damaged disks.

- Harald
Hi.

I can confirm this fix works on -CURRENT as of yesterday: geom_gpt recognizes the corrupted table and skips it.

-- 
Kenneth Vestergaard Schmidt
State Changed From-To: open->analyzed
Patch has been submitted and has been confirmed as fixing the problem.

Responsible Changed From-To: freebsd-bugs->freebsd-fs

Responsible Changed From-To: freebsd-fs->freebsd-geom
Jaakko Heinonen points out that this is actually a bug in geom_gpt and not in ZFS. The PR contains a patch, confirmed to fix the issue.

State Changed From-To: analyzed->patched
Fix committed in -CURRENT. MFC to happen in a week. Thanks for the analysis and patch.

Responsible Changed From-To: freebsd-geom->marcel
Fix committed in -CURRENT. MFC to happen in a week. Thanks for the analysis and patch.

State Changed From-To: patched->closed
Fix committed to 7-STABLE.