Summary: | [zfs] [panic] [reproducible] zfs/space_map.c: solaris assert: sm->sm_space + size <= sm->sm_size | ||
---|---|---|---|
Product: | Base System | Reporter: | Palle Girgensohn <girgen> |
Component: | kern | Assignee: | freebsd-fs (Nobody) <fs> |
Status: | Closed Feedback Timeout | ||
Severity: | Affects Only Me | CC: | delphij, seanc |
Priority: | --- | ||
Version: | 10.0-RELEASE | ||
Hardware: | amd64 | ||
OS: | Any |
Description
Palle Girgensohn
2014-09-23 14:26:35 UTC
I just tried BETA3, and it panics within minutes, just running bonnie++ -d /tank/zfs/directory. The file that bonnie managed to create was just one (1) percent the size of the files created with 10.0p3, so BETA3 is even worse in this respect? # kgdb kernel.debug /var/crash/vmcore.1 GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "amd64-marcel-freebsd"... Unread portion of the kernel message buffer: panic: solaris assert: range_tree_space(rt) + size <= sm->sm_size (0x10020000 <= 0x10000000), file: /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/space_map.c, line: 121 cpuid = 8 KDB: stack backtrace: #0 0xffffffff80946fa0 at kdb_backtrace+0x60 #1 0xffffffff8090c0c5 at panic+0x155 #2 0xffffffff81b9b22f at assfail3+0x2f #3 0xffffffff819a5665 at space_map_load+0x355 #4 0xffffffff8198bb0e at metaslab_load+0x2e #5 0xffffffff8198e615 at metaslab_preload+0x65 #6 0xffffffff8193f260 at taskq_run+0x10 #7 0xffffffff80955415 at taskqueue_run_locked+0xe5 #8 0xffffffff80955ea8 at taskqueue_thread_loop+0xa8 #9 0xffffffff808dcb0a at fork_exit+0x9a #10 0xffffffff80cf295e at fork_trampoline+0xe Uptime: 1h31m29s Dumping 1806 out of 32665 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% Reading symbols from /boot/kernel/zfs.ko.symbols...done. Loaded symbols for /boot/kernel/zfs.ko.symbols Reading symbols from /boot/kernel/opensolaris.ko.symbols...done. Loaded symbols for /boot/kernel/opensolaris.ko.symbols Reading symbols from /boot/kernel/ums.ko.symbols...done. 
Loaded symbols for /boot/kernel/ums.ko.symbols #0 doadump (textdump=<value optimized out>) at pcpu.h:219 219 __asm("movq %%gs:%1,%0" : "=r" (td) (kgdb) bt #0 doadump (textdump=<value optimized out>) at pcpu.h:219 #1 0xffffffff8090bd42 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:452 #2 0xffffffff8090c104 in panic (fmt=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:759 #3 0xffffffff81b9b22f in assfail3 (a=<value optimized out>, lv=<value optimized out>, op=<value optimized out>, rv=<value optimized out>, f=<value optimized out>, l=<value optimized out>) at /usr/src/sys/modules/opensolaris/../../cddl/compat/opensolaris/kern/opensolaris_cmn_err.c:91 #4 0xffffffff819a5665 in space_map_load (sm=0xfffff800176f9480, rt=0xfffff8001747d000, maptype=<value optimized out>) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/space_map.c:120 #5 0xffffffff8198bb0e in metaslab_load (msp=0xfffff800233c2000) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c:1295 #6 0xffffffff8198e615 in metaslab_preload (arg=0xfffff800233c2000) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c:1652 #7 0xffffffff8193f260 in taskq_run (arg=0xfffff8006ecb60f0, pending=0) at /usr/src/sys/modules/zfs/../../cddl/compat/opensolaris/kern/opensolaris_taskq.c:109 #8 0xffffffff80955415 in taskqueue_run_locked (queue=0xfffff80017822400) at /usr/src/sys/kern/subr_taskqueue.c:342 #9 0xffffffff80955ea8 in taskqueue_thread_loop (arg=<value optimized out>) at /usr/src/sys/kern/subr_taskqueue.c:563 #10 0xffffffff808dcb0a in fork_exit (callout=0xffffffff80955e00 <taskqueue_thread_loop>, arg=0xfffff80017fd9090, frame=0xfffffe085c5dfac0) at /usr/src/sys/kern/kern_fork.c:996 #11 0xffffffff80cf295e in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:606 #12 0x0000000000000000 in ?? 
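As a side note, the failed assertion compares `range_tree_space(rt) + size` against `sm->sm_size`, and the values printed in the panic show how far the space map load would overshoot the map. A quick check of the arithmetic:

```shell
# Values taken from the panic message:
#   range_tree_space(rt) + size = 0x10020000
#   sm->sm_size                 = 0x10000000
# The difference is the amount by which the load would exceed the map.
overshoot=$((0x10020000 - 0x10000000))
printf 'overshoot: %d bytes (%d KiB)\n' "$overshoot" "$((overshoot / 1024))"
```

That is, the map would be overshot by 0x20000 bytes (128 KiB), which is why the assertion fires during `space_map_load`.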
Palle Girgensohn:

(In reply to Palle Girgensohn from comment #1)
> I just tried BETA3,

That is 10.1-BETA3, of course...

Xin LI:

Is this reproducible on a newly created pool? It looks like you are using a pool formatted with the old format that has not been upgraded (DO NOT DO IT NOW!), and there may be existing damage in the space map. In that case, the only way to recover would be to copy all data off the pool, recreate it, and restore the data.

Palle Girgensohn:

(In reply to Xin LI from comment #3)
> Is this reproducible on a newly created pool? It looks like you are using a
> pool formatted with the old format that has not been upgraded (DO NOT DO IT
> NOW!), and there may be existing damage in the space map -- in that case the
> only way to recover would be to copy all data off the pool, recreate it,
> and restore the data.

Hi Xin Li, thanks for the reply!

I did not try a newly created pool; it is a large pool with data, one of two redundant systems where we use zfs send | ssh | zfs recv to keep them in sync. The other machine is still on 9.3, and we got this problem after updating one system to 10.0. So we cannot really upgrade just yet. Also, it shouldn't present such a big problem just running an old version...?

But as you say, there seems to be something fishy with the pool, and maybe there is nothing wrong with the kernel itself.

Are you sure there is no other way to fix this than to recreate the pool? There are just terabytes of data; it will take a week... :-/

Is there no zdb magic, or zpool export + scrub + zpool import with vfs.zfs.recover=1, that could help?

Xin LI:

(In reply to Palle Girgensohn from comment #4)
> I did not try a newly created pool; it is a large pool with data, one of two
> redundant systems where we use zfs send | ssh | zfs recv to keep them in
> sync. The other machine is still on 9.3, and we got this problem after
> updating one system to 10.0. So we cannot really upgrade just yet. Also, it
> shouldn't present such a big problem just running an old version...?

Since you haven't upgraded the pool, would it be possible for you to downgrade your kernel/world and see if you can reproduce the problem?

> But as you say, there seems to be something fishy with the pool, and maybe
> there is nothing wrong with the kernel itself.
>
> Are you sure there is no other way to fix this than to recreate the pool?
> There are just terabytes of data; it will take a week... :-/

Yeah, I know :(

> Is there no zdb magic, or zpool export + scrub + zpool import with
> vfs.zfs.recover=1, that could help?

Unfortunately, space map corruptions are currently fatal. In the future we may be able to implement a tool that traverses all data and rebuilds the space map from it, but that is not trivial...

Closed because of timeout. Please reopen if it recurs or you have additional information.
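For anyone hitting the same assertion, the evacuate-and-recreate path discussed in this thread can be sketched roughly as below. This is only a sketch under assumptions: the pool name `tank` comes from the report, `backuphost` and the snapshot name `@lastgood` are hypothetical, and `vfs.zfs.recover` merely relaxes some consistency checks at import time; it does not repair a damaged space map.

```shell
# 1. Attempt a read-only import in recovery mode, so the data becomes
#    reachable without new allocations. vfs.zfs.recover may need to be
#    set as a loader tunable (/boot/loader.conf) rather than at runtime.
sysctl vfs.zfs.recover=1
zpool export tank
zpool import -o readonly=on tank

# 2. Evacuate the data. A read-only pool cannot take new snapshots, so
#    either send an existing snapshot or copy at the file level:
zfs send -R tank@lastgood | ssh backuphost zfs receive -d backup
# ...or, lacking a usable snapshot:
# rsync -a /tank/ backuphost:/backup/tank/

# 3. Recreate the pool (device layout elided) and restore:
# zpool destroy tank
# zpool create tank <vdevs...>
# ssh backuphost 'zfs send -R backup/tank@lastgood' | zfs receive -d -F tank
```

Whether step 1 succeeds depends on where the damage sits; as noted above, if the space map itself is corrupt there is no in-place repair, so the goal is only to get the data readable long enough to copy it off.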