Created attachment 155528 [details]
Core dump text file

My home FreeBSD server is panicking while trying to mount one of its ZFS pools. It is a VM running on vSphere 5.5 with two LSI SAS9211-8i HBAs presented to it via PCI passthrough, and it has been quite stable in this configuration for a couple of years. It was recently updated to 10.1 (from 10), but ran successfully for a week or two before the current problem. It was also relatively recently (within the last month or two) updated from 9.3-STABLE. The ZFS pool was not upgraded to the latest features when the system was.

Trying "zpool import -f" on freshly built 10.1 or 9.3 systems causes the same panic. I have also tried importing with readonly=on. I haven't tried systems earlier than 9.3. A second zpool from the same server works without problems (i.e. I was able to mount it on the freshly built 10.1 box with "zpool import -f").

The text from the panic is:

FreeBSD freebsd 10.1-RELEASE-p9 FreeBSD 10.1-RELEASE-p9 #0: Tue Apr 7 01:09:46 UTC 2015     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64

panic: solaris assert: range_tree_space(rt) == space (0x6b34cb000 == 0x6b3525000), file: /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/space_map.c, line: 130

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
panic: solaris assert: range_tree_space(rt) == space (0x6b34cb000 == 0x6b3525000), file: /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/space_map.c, line: 130
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff80963000 at kdb_backtrace+0x60
#1 0xffffffff80928125 at panic+0x155
#2 0xffffffff81b7c22f at assfail3+0x2f
#3 0xffffffff81a836e5 at space_map_load+0x3d5
#4 0xffffffff81a69b0e at metaslab_load+0x2e
#5 0xffffffff81a6b609 at metaslab_alloc+0x6b9
#6 0xffffffff81aa9ca6 at zio_dva_allocate+0x76
#7 0xffffffff81aa7382 at zio_execute+0x162
#8 0xffffffff80971475 at taskqueue_run_locked+0xe5
#9 0xffffffff80971f08 at taskqueue_thread_loop+0xa8
#10 0xffffffff808f8b6a at fork_exit+0x9a
#11 0xffffffff80d0acfe at fork_trampoline+0xe
Uptime: 4m26s
Dumping 418 out of 8168 MB:..4%..12%..23%..31%..43%..54%..62%..73%..81%..92%

I have attached the core.txt file produced; I also have a core dump. This is from trying to import on the fresh 10.1 system.

This may be related to https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=193875 ?

It would be great if I could even get this pool mounted in a read-only state to pull some of the data off it. Nothing is irreplaceable, but restoring ~9T over the internet takes a long time.
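For clarity, the import attempts described above were along these lines; "tank" is a stand-in here, not the real pool name:

  zpool import                          # list pools available for import
  zpool import -f tank                  # forced import: panics on 10.1 and 9.3
  zpool import -f -o readonly=on tank   # read-only attempt: no luck either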
Have you tried importing with 10-STABLE or -CURRENT to see if either of them helps?
(In reply to Steven Hartland from comment #1)

Hi,

Tried with 10-STABLE this morning and no luck, though it's a different line reported in space_map.c.

root@freebsd:~ # cat /var/crash/core.txt.0
freebsd dumped core - see /var/crash/vmcore.0

Tue Apr 14 20:58:16 UTC 2015

FreeBSD freebsd 10.1-STABLE FreeBSD 10.1-STABLE #0 r281528: Tue Apr 14 20:40:23 UTC 2015     root@freebsd:/usr/obj/usr/src/sys/GENERIC  amd64

panic: solaris assert: range_tree_space(rt) == space (0x6b34cb000 == 0x6b3525000), file: /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/space_map.c, line: 131

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
panic: solaris assert: range_tree_space(rt) == space (0x6b34cb000 == 0x6b3525000), file: /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/space_map.c, line: 131
cpuid = 1
KDB: stack backtrace:
#0 0xffffffff80973b90 at kdb_backtrace+0x60
#1 0xffffffff80937c15 at panic+0x155
#2 0xffffffff81c0922f at assfail3+0x2f
#3 0xffffffff81a8c5a5 at space_map_load+0x3d5
#4 0xffffffff81a721be at metaslab_load+0x2e
#5 0xffffffff81a73da7 at metaslab_alloc+0x777
#6 0xffffffff81ab4166 at zio_dva_allocate+0x76
#7 0xffffffff81ab1542 at zio_execute+0x162
#8 0xffffffff80981f75 at taskqueue_run_locked+0xe5
#9 0xffffffff80982a08 at taskqueue_thread_loop+0xa8
#10 0xffffffff80906d6a at fork_exit+0x9a
#11 0xffffffff80d1d5de at fork_trampoline+0xe
Uptime: 54s
Dumping 408 out of 8167 MB:..4%..12%..24%..32%..44%..51%..63%..71%..83%..91%

Will compile -CURRENT and try that tonight.
(In reply to Chris Smith from comment #0)

I also tried importing the pool on a fresh 8.4 install and had the same panic. I was able to determine (based on the pool that's still OK) that the zpool version is from 8.x.
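For reference, the version check on the still-working pool was something along the lines of the following, with "backup" standing in for that pool's name:

  zpool get version backup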
(In reply to Chris Smith from comment #3) The backtrace suggests that you have a space map corruption, but the validation code is only nominally different so it's not clear to me why it won't panic on 8.x. Will the system allow you to import the pool read-only? (zpool import -o readonly=on)?
(In reply to Xin LI from comment #4)

Sorry if that wasn't clear: trying to mount the pool on a fresh 8.4 system DOES cause a panic, the same as on 9.3 and 10.1. I did try -o readonly on 10.1 and 9.3, but not on 8.4. I'm at work at the moment and can't access the system, so I'll try -o readonly on 8.4 when I get home.

Is there any chance that the -F "recovery mode" switch to zpool import might help? The data in this pool is primarily archival, so I don't mind losing recent changes.
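In other words, something like this once I'm back at the machine; "tank" is a placeholder for the pool name and I haven't confirmed the exact invocation yet:

  zpool import -f -F tank      # recovery mode: discard the last few transactions if needed
  zpool import -f -F -n tank   # dry run: report whether recovery would work without modifying the pool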
Success! I was able to mount the pool read-only on the 8.4 system. I'll back up my data from the pool first, but after that, if you want me to dump any info from the pool to help debug the problem, I'm happy to leave it for a few days before nuking and recreating the pool.
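Roughly what I'm planning for the copy: since the pool is imported read-only I can't create a fresh snapshot for a zfs send, so it will just be an rsync of the mounted filesystems to another box, something like this (hostname and paths are placeholders):

  rsync -aH --progress /tank/ backupbox:/restore/tank/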
I've retrieved all my data from the pool. Does anyone want a dump of any data before I nuke and recreate it? I'll wait 24 hours.
As with any upgrade where the on-disk state is the cause of the issue, it's not going to be clear whether the problem was caused by old code that has already been fixed. Given that, I'm not sure how helpful a dump would be, tbh. Even so, what size are we talking about?
(In reply to Steven Hartland from comment #8)

The pool itself is about 15T (~10T of actual data). I thought there might be some way for you to pull out the metadata that's causing the panic; happy to do that if it would help. Otherwise I'll just nuke and recreate. :)
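If it would help, I could try dumping the metaslab/space map details with zdb before destroying the pool, something like the following; "tank" is a placeholder, and I realise zdb may well trip over the same corruption the kernel did:

  zdb -e -mm tank > metaslabs.txt   # metaslab and space map detail from the exported pool
  zdb -e -u tank > uberblock.txt    # uberblock contents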
This _might_ be fixed by https://github.com/openzfs/zfs/commit/c7a4255f128cc493df8383cb9f1ed650191b2081 but unfortunately we were unable to tell with the available information, so marking this as overcome by events.