Bug 235162 - [panic] avl_find() succeeded inside avl_add() during zpool -> zpool migration
Summary: [panic] avl_find() succeeded inside avl_add() during zpool -> zpool migration
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 12.0-RELEASE
Hardware: amd64 Any
: --- Affects Only Me
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-01-23 19:51 UTC by Dave Cottlehuber
Modified: 2019-01-30 13:48 UTC (History)
0 users

See Also:


Attachments
FreeBSD 13.0-CURRENT r343518+5470c3dc0d42(master) GENERIC avl_find() succeeded inside avl_add() (48.00 KB, text/plain)
2019-01-29 08:53 UTC, Dave Cottlehuber
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Dave Cottlehuber freebsd_committer 2019-01-23 19:51:36 UTC
Dump header from device: /dev/gpt/swap
  Architecture: amd64
  Architecture Version: 2
  Dump Length: 14036127744
  Blocksize: 512
  Compression: none
  Dumptime: Wed Jan 23 18:36:11 2019
  Hostname: wintermute.skunkwerks.at
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 12.0-RELEASE r341666 GENERIC
  Panic String: avl_find() succeeded inside avl_add()
  Dump Parity: 1555455779
  Bounds: 1
  Dump Status: good

Backtrace:
Reading symbols from /boot/kernel/kernel...(no debugging symbols found)...done.
0xffffffff80bcd0bd in sched_switch ()
(kgdb) #0  0xffffffff80bcd0bd in sched_switch ()
#1  0x0000000000000007 in ?? ()
#2  0xfffffe0153d766d0 in ?? ()
#3  0xffffffff80bcbc0a in cpu_search_lowest ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(kgdb) 

I also have a vmcore.1 to go with it if these are of use.

System was doing just a big zfs transfer, using ~ 100GiB RAM and ~ 2/3 of all CPU.

  zfs send -LDevcR embiggen@repl | zfs recv -Fuvs zroot

about ~ 300MB/s transfer from 1 zfs pool to another local zfs pool, roughly transferred ~2.18TB of 2.25T planned, so very close to end of it :-(

4 datasets out of 85 missing, I was afk drinking coffee.
Comment 1 Dave Cottlehuber freebsd_committer 2019-01-27 12:46:23 UTC
  Panic String: avl_find() succeeded inside avl_add()

same panic doing similar things - large send|recv although this time within same pool:

Reading symbols from /boot/kernel/kernel...(no debugging symbols found)...done.
0xffffffff80bcd0bd in sched_switch ()
(kgdb) #0  0xffffffff80bcd0bd in sched_switch ()
#1  0xffffffff80ba6de1 in mi_switch ()
#2  0xffffffff80bf554c in sleepq_wait ()
#3  0xffffffff80ba6817 in _sleep ()
#4  0xffffffff80bfae71 in taskqueue_thread_loop ()
#5  0xffffffff80b5bf33 in fork_exit ()
#6  <signal handler called>
(kgdb)
Comment 2 Andriy Gapon freebsd_committer 2019-01-27 16:26:47 UTC
You need kernel debug symbols for kgdb to be useful.
Comment 3 Dave Cottlehuber freebsd_committer 2019-01-29 08:53:26 UTC
Created attachment 201500 [details]
FreeBSD 13.0-CURRENT r343518+5470c3dc0d42(master) GENERIC avl_find() succeeded inside avl_add()

FreeBSD 13.0-CURRENT r343518+5470c3dc0d42(master) GENERIC
avl_find() succeeded inside avl_add()

Mon Jan 28 23:26:19 UTC 2019
[767] witness_lock_list_get: witness exhausted
[3270] panic: avl_find() succeeded inside avl_add()
[3270] cpuid = 3
[3270] time = 1548721098
[3270] KDB: stack backtrace:
[3270] db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe015ae95180
[3270] vpanic() at vpanic+0x1b4/frame 0xfffffe015ae951e0
[3270] panic() at panic+0x43/frame 0xfffffe015ae95240
[3270] avl_add() at avl_add+0xce/frame 0xfffffe015ae95280
[3270] dmu_recv_end() at dmu_recv_end+0x16c/frame 0xfffffe015ae953d0
[3270] zfs_ioc_recv() at zfs_ioc_recv+0xa47/frame 0xfffffe015ae95610
[3270] zfsdev_ioctl() at zfsdev_ioctl+0x788/frame 0xfffffe015ae956b0
[3270] devfs_ioctl() at devfs_ioctl+0xca/frame 0xfffffe015ae95700
[3270] VOP_IOCTL_APV() at VOP_IOCTL_APV+0x63/frame 0xfffffe015ae95720
[3270] vn_ioctl() at vn_ioctl+0x124/frame 0xfffffe015ae95830
[3270] devfs_ioctl_f() at devfs_ioctl_f+0x1f/frame 0xfffffe015ae95850
[3270] kern_ioctl() at kern_ioctl+0x29b/frame 0xfffffe015ae958c0
[3270] sys_ioctl() at sys_ioctl+0x15d/frame 0xfffffe015ae95990
[3270] amd64_syscall() at amd64_syscall+0x276/frame 0xfffffe015ae95ab0
[3270] fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe015ae95ab0
[3270] --- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x8004bc0ea, rsp = 0x7fffffff7668, rbp = 0x7fffffff76e0 ---
[3270] KDB: enter: panic

I didn't get a coredump yet, hopefully this is more helpful. It's
somewhat disruptive to "test" this, I only get 1 chance a day.
Comment 4 Andriy Gapon freebsd_committer 2019-01-30 13:11:20 UTC
(In reply to Dave Cottlehuber from comment #3)
Looks like the crash is in add_ds_to_guidmap().
Could it be that you somehow got two datasets with the same GUID in the original pool?
I think that you can check that with 'zfs get -r guid embiggen'.
Comment 5 Dave Cottlehuber freebsd_committer 2019-01-30 13:48:55 UTC
thanks Andriy. No unique guids found via:

zfs list -o guid -r embiggen |sort -h | uniq -d

I'll try to get a coredump again tonight, it looks as if the crash is triggered by a single zvol.