Bug 209508 - zfs import assertion failed in avl_add()
Summary: zfs import assertion failed in avl_add()
Status: Closed Overcome By Events
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 10.3-RELEASE
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: freebsd-fs (Nobody)
URL:
Keywords: patch
Depends on:
Blocks:
 
Reported: 2016-05-14 21:52 UTC by Nathan Friess
Modified: 2018-08-13 01:27 UTC
CC List: 2 users

See Also:


Attachments
Proposed patch to libzfs_import.c (1.32 KB, patch)
2016-05-14 21:52 UTC, Nathan Friess

Description Nathan Friess 2016-05-14 21:52:35 UTC
Created attachment 170292
Proposed patch to libzfs_import.c

SUMMARY:

Starting with 10.3-RELEASE, running "zpool import" or "zpool import name" crashes with:

Assertion failed: (avl_find() succeeded inside avl_add()), file /usr/src/cddl/lib/libavl/../../../sys/cddl/contrib/opensolaris/common/avl/avl.c, line 649.

Previously when running 10.2-RELEASE this did not happen.



ENVIRONMENT:

This system is running 10.3-RELEASE in a Xen domU.  There are 4 disks attached at boot time, ada0-3.  These have GPT partitions, the largest of which is added to a zfs pool.

I also have two removable disks that I add to the domU at runtime from the host, but the problem above occurs with just ada0-3 present ("zpool import" hits the failed assertion instead of reporting no pools), so I don't believe the removable disks are related.  However, adding and removing those disks is the reason that I'm using zpool import.



ANALYSIS:

I believe the problem is in cddl/contrib/opensolaris/lib/libzfs/common/libzfs_import.c

Around line 1225, it appears that the code is listing all of the geom providers and adding them to the slice_cache avl tree.

Placing some printfs in that code, I can see that it is trying to add ada0* twice:

...
Adding geom to avl ada0p1
Adding geom to avl ada0p2
Adding geom to avl ada0p3
Already have geom in avl ada0p1
Already have geom in avl ada0p2
Already have geom in avl ada0p3
...
Adding geom to avl ada0
Already have geom in avl ada0
...


In svn, it looks like base r287703 added the assertion, which prevents adding duplicate entries to the avl tree.

I then modified libzfs_import.c to call avl_find() before trying to call avl_add:

avl_index_t where;
/* Only add the slice if an equal entry is not already in the tree. */
if (avl_find(&slice_cache, slice, &where) == NULL) {
	avl_add(&slice_cache, slice);
}

(and the same lower down around line 1260)
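
To illustrate the find-before-add guard outside of libzfs, here is a minimal standalone sketch. It does not use libavl: the name_cache struct and the cache_find(), cache_add() and cache_add_unique() helpers are hypothetical stand-ins, with cache_add() asserting on duplicates the way avl_add() is described as doing above. It only shows the pattern, not the actual patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CACHE_MAX 64

/* Hypothetical stand-in for the slice_cache avl tree: a flat array of names. */
struct name_cache {
	const char *names[CACHE_MAX];
	size_t count;
};

/* Return the stored entry equal to name, or NULL (the role avl_find() plays). */
static const char *
cache_find(const struct name_cache *c, const char *name)
{
	for (size_t i = 0; i < c->count; i++)
		if (strcmp(c->names[i], name) == 0)
			return (c->names[i]);
	return (NULL);
}

/* Insert name; a duplicate trips an assertion, mirroring the avl_add() failure. */
static void
cache_add(struct name_cache *c, const char *name)
{
	assert(cache_find(c, name) == NULL);
	assert(c->count < CACHE_MAX);
	c->names[c->count++] = name;
}

/* Guarded insert: returns 1 if the name was added, 0 if it was already present. */
static int
cache_add_unique(struct name_cache *c, const char *name)
{
	if (cache_find(c, name) != NULL)
		return (0);
	cache_add(c, name);
	return (1);
}
```

Feeding the provider names from the debug output above (ada0p1, ada0p2, ada0p3 and ada0, each seen twice) through cache_add_unique() stores four entries and never reaches the assertion in cache_add(), whereas calling cache_add() directly on the second ada0p1 would abort.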

This allows "zpool import" to list any available pools, and "zpool import name" to import the pool successfully.

I'm new to FreeBSD so I'm not sure if this is the correct solution, but it works in my case.  A patch is attached.
Comment 1 Jonathan Chen 2017-08-01 00:26:27 UTC
I hit this error while trying to expand a zfs partition on 11.1-RELEASE:

# gpart delete -i 4 ada0
# gpart resize -i 3 ada0
... some errors appeared which I didn't write down ...

# mkdir /tmp/z
# zpool import -f -R /tmp/z irontree
Assertion failed: (avl_find() succeeded inside avl_add()), file /usr/src/cddl/lib/libavl/../../../sys/cddl/contrib/opensolaris/common/avl/avl.c, line 649.

I had to use 10.3-RELEASE to fix this. Using the same commands, I managed to recover and expand my zpool successfully.
Comment 2 Nathan Friess 2018-08-13 01:27:28 UTC
No longer an issue in 11.2.