161968 – [zfs] [hang] renaming snapshot with -r including a zvol snapshot causes total ZFS freeze/lockup

Status:	Closed FIXED

Alias:	None

Product:	Base System
Classification:	Unclassified
Component:	kern
Version:	8.2-STABLE
Hardware:	Any Any

Importance:	Normal Affects Only Me
Assignee:	freebsd-fs (Nobody)

URL:
Keywords:

Depends on:
Blocks:

Reported:	2011-10-24 16:20 UTC by Peter Maloney
Modified:	2014-11-05 10:26 UTC
CC List:	2 users

See Also:

Description Peter Maloney 2011-10-24 16:20:00 UTC

Renaming a snapshot with -r, when the set of snapshots includes a zvol snapshot, causes a total ZFS freeze/lockup/deadlock.

After the lockup, any command that uses zfs, zpool or "sysctl -a" freezes, and so do NFS exports. "shutdown -r" does not restart the system. It only gets as far as reporting that all disks are synced, and then stops.

Pressing CTRL+T while a zfs or zpool command is hung shows the state "spa_namespace_lock". Pressing it while "sysctl -a" is hung shows "g_waitfor_event".
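The hang can also be detected from a script instead of with CTRL+T. Here is a minimal sketch, assuming a POSIX sh. It runs a harmless "zfs list" in the background and treats ZFS as hung if that command has not returned within a time limit. The helper name zfs_responds and its 10-second default are hypothetical and are not part of FreeBSD.

```shell
# zfs_responds [SECONDS]: hypothetical helper. Succeeds (0) if "zfs list"
# returns within SECONDS (default 10), fails (1) if it is still blocked,
# returns 2 if no temporary file could be created.
# POSIX sh has no "local", so limit/flag/waited are global variables.
zfs_responds() {
    limit=${1:-10}
    flag=$(mktemp "${TMPDIR:-/tmp}/zfsprobe.XXXXXX") || return 2
    rm -f "$flag"
    # The background probe creates the flag file only after zfs returns.
    ( zfs list -H -o name >/dev/null 2>&1; touch "$flag" ) >/dev/null 2>&1 &
    waited=0
    while [ ! -e "$flag" ]; do
        if [ "$waited" -ge "$limit" ]; then
            # Still blocked: the probe is left running, not killed.
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    rm -f "$flag"
    return 0
}
```

Usage: zfs_responds 30 || echo "zfs appears to be hung" >&2. A probe that is still blocked when the limit expires is left behind. The CTRL+C attempts in the How-To-Repeat output show that a process stuck in this state cannot be interrupted.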

Most of the time a simple "zfs rename" does not cause a lockup. However, on one system, renaming one specific snapshot always causes a lockup. On every other 8-STABLE system I have, my script always causes a lockup after a few loops.

My FreeBSD 8-STABLE was installed as 8.2-RELEASE plus the mps driver, and was then updated with cvsup using this cvsupfile (comments removed):

*default host=cvsup.de.FreeBSD.org
*default base=/var/db
*default prefix=/usr
*default release=cvs tag=RELENG_8
*default delete use-rel-suffix
*default date=2011.09.27.00.00.00
*default compress
src-all

(The same freeze occurs with the date changed to today, Oct. 24th.)

# zpool get all big
NAME  PROPERTY       VALUE       SOURCE
big   size           39.8G       -
big   capacity       24%         -
big   altroot        -           default
big   health         ONLINE      -
big   guid           14576708073682355899  default
big   version        28          default
big   bootfs         -           default
big   delegation     on          default
big   autoreplace    on          local
big   cachefile      -           default
big   failmode       continue    local
big   listsnapshots  on          local
big   autoexpand     off         default
big   dedupditto     0           default
big   dedupratio     1.00x       -
big   free           30.1G       -
big   allocated      9.64G       -
big   readonly       off         -

# zfs get all big
NAME  PROPERTY              VALUE                  SOURCE
big   type                  filesystem             -
big   creation              Thu Jul 21 11:48 2011  -
big   used                  4.80G                  -
big   available             14.7G                  -
big   referenced            4.80G                  -
big   compressratio         1.00x                  -
big   mounted               yes                    -
big   quota                 none                   default
big   reservation           none                   default
big   recordsize            128K                   default
big   mountpoint            /big                   default
big   sharenfs              off                    default
big   checksum              on                     default
big   compression           off                    default
big   atime                 on                     default
big   devices               on                     default
big   exec                  on                     default
big   setuid                on                     default
big   readonly              off                    default
big   jailed                off                    default
big   snapdir               visible                local
big   aclmode               discard                default
big   aclinherit            restricted             default
big   canmount              on                     default
big   xattr                 off                    temporary
big   copies                1                      default
big   version               4                      -
big   utf8only              off                    -
big   normalization         none                   -
big   casesensitivity       sensitive              -
big   vscan                 off                    default
big   nbmand                off                    default
big   sharesmb              off                    default
big   refquota              none                   default
big   refreservation        none                   default
big   primarycache          all                    default
big   secondarycache        all                    default
big   usedbysnapshots       0                      -
big   usedbydataset         4.80G                  -
big   usedbychildren        6.70M                  -
big   usedbyrefreservation  0                      -
big   logbias               latency                default
big   dedup                 off                    default
big   mlslabel                                     -
big   sync                  standard               default
big   refcompressratio      1.00x                  -

# zfs list
NAME                        USED  AVAIL  REFER  MOUNTPOINT
big                        4.80G  14.7G  4.80G  /big
big@testcrashsnap4             0      -  4.80G  -
zroot                      5.64G   109G   894M  legacy
zroot/tmp                  2.14M   109G  2.14M  /tmp
zroot/usr                  4.72G   109G  2.45G  /usr
zroot/usr/home             53.5K   109G  53.5K  /usr/home
zroot/usr/obj               922M   109G   922M  /usr/objtmp
zroot/usr/ports            1.07G   109G   941M  /usr/ports
zroot/usr/ports/distfiles   150M   109G   150M  /usr/ports/distfiles
zroot/usr/ports/packages     21K   109G    21K  /usr/ports/packages
zroot/usr/src               314M   109G   314M  /usr/src
zroot/var                  17.6M   109G   904K  /var
zroot/var/crash            22.5K   109G  22.5K  /var/crash
zroot/var/db               16.2M   109G  15.1M  /var/db
zroot/var/db/pkg           1.10M   109G  1.10M  /var/db/pkg
zroot/var/empty              21K   109G    21K  /var/empty
zroot/var/log               272K   109G   272K  /var/log
zroot/var/mail               48K   109G    48K  /var/mail
zroot/var/run                50K   109G    50K  /var/run
zroot/var/tmp                23K   109G    23K  /var/tmp

# cat /boot/loader.conf
zfs_load="YES"
vfs.root.mountfrom="zfs:zroot"

/etc/sysctl.conf contains nothing but comments.

On a virtual machine running 8.2-RELEASE (not STABLE), I have not found a way to reproduce the problem.

I also tested the latest sources fetched with cvsup today, and they freeze the same way.

All my zfs systems are amd64.


I was hoping to use a zvol for iSCSI together with snapshots, so simply avoiding snapshots on zvols is not an acceptable workaround.

How-To-Repeat: Prerequisite: 

A system running 8.2-STABLE (more specifically using *default date=2011.09.27.00.00.00 in cvsup).


(1) Create a zpool.

[root@bcnastest2 ~]# zpool status big
  pool: big
 state: ONLINE
 scan: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        big           ONLINE       0     0     0
          raidz2-0    ONLINE       0     0     0
            ad8       ONLINE       0     0     0
            ad10      ONLINE       0     0     0
            ad12      ONLINE       0     0     0
            ad16      ONLINE       0     0     0
        cache
          gpt/cache0  ONLINE       0     0     0

errors: No known data errors

(2) Create a zvol in the above zpool.

[root@bcnastest2 ~]# zfs create -V 100m big/testzvol

(3) Run this script as root. It was written for bash, but the counter uses POSIX arithmetic, so it also runs under sh. Make sure to set the dataset variable.

#-------begin script-------
dataset=big

count=0

while true; do
    echo Snapshot
    # Remove any leftover snapshot from an earlier run, then take a fresh one.
    zfs destroy -r "${dataset}@testcrashsnap" >/dev/null 2>&1
    zfs snapshot -r "${dataset}@testcrashsnap" || break

    current=""
    for next in 1 2 3 4 5; do
        echo Renaming from ${current} to ${next}
        zfs destroy -r "${dataset}@testcrashsnap${next}" >/dev/null 2>&1
        # The recursive rename is the step that eventually hangs.
        zfs rename -r "${dataset}@testcrashsnap${current}" "${dataset}@testcrashsnap${next}" || break
        current=${next}
    done

    echo Destroy
    zfs destroy -r "${dataset}@testcrashsnap${current}" || break
    # Count completed iterations (POSIX arithmetic works in both sh and bash).
    count=$((count + 1))
    echo $count
done
#-------end script-------




Result: After an arbitrary number of loops, the output stops. Here is the output, including the result of pressing CTRL+C, CTRL+Z and CTRL+T. The script was run on a Friday. The last line of CTRL+T output was captured on the following Monday.

============================================
Snapshot
Renaming from to 1
Renaming from 1 to 2
Renaming from 2 to 3
Renaming from 3 to 4
Renaming from 4 to 5
Destroy
1
Snapshot
Renaming from to 1
Renaming from 1 to 2
Renaming from 2 to 3
Renaming from 3 to 4
Renaming from 4 to 5
Destroy
2
Snapshot
Renaming from to 1
Renaming from 1 to 2
Renaming from 2 to 3
Renaming from 3 to 4
Renaming from 4 to 5
Destroy
3
Snapshot
Renaming from to 1
Renaming from 1 to 2
Renaming from 2 to 3
Renaming from 3 to 4
^C
load: 1.32  cmd: zfs 2363 [tx->tx_sync_done_cv)] 5.56r 0.00u 0.00s 0% 1696k
load: 1.32  cmd: zfs 2363 [tx->tx_sync_done_cv)] 6.07r 0.00u 0.00s 0% 1696k
load: 1.32  cmd: zfs 2363 [tx->tx_sync_done_cv)] 6.26r 0.00u 0.00s 0% 1696k
load: 1.46  cmd: zfs 2363 [tx->tx_sync_done_cv)] 13.42r 0.00u 0.00s 0% 1696k
^C^C^C
load: 1.89  cmd: zfs 2363 [tx->tx_sync_done_cv)] 36.59r 0.00u 0.00s 0% 1696k



^C^D


load: 0.01  cmd: zfs 2363 [tx->tx_sync_done_cv)] 230096.99r 0.00u 0.00s 0% 1696k
============================================

Comment 1 Mark Linimon freebsd_committer 2011-10-25 14:15:29 UTC

Responsible Changed
From-To: freebsd-amd64->freebsd-fs

reclassify and assign.

Comment 2 Peter Maloney 2012-02-10 11:11:41 UTC

I tested this again using 8-STABLE (csup'd on 2012-01-04):

FreeBSD bczfsvm1.bc.local 8.2-STABLE-20120104 FreeBSD 8.2-STABLE-20120104 #0: Mon Feb  6 12:10:32 UTC 2012 root@bczfsvm1.bc.local:/usr/obj/usr/src/sys/GENERIC  amd64

on hardware:

DELL PowerEdge 2850 - tested with a ZFS stripe, raidz1 and raidz2
and a SuperMicro dual Xeon system - tested with a ZFS mirror

And it didn't hang.

Now there are just brief pauses every 3-5 loops (instead of hangs?).

So if someone tests this in 9.0-STABLE and finds that it doesn't hang,
this PR should be closed.

-- 

--------------------------------------------
Peter Maloney
Brockmann Consult
Max-Planck-Str. 2
21502 Geesthacht
Germany
Tel: +49 4152 889 300
Fax: +49 4152 889 333
E-mail: peter.maloney@brockmann-consult.de
Internet: http://www.brockmann-consult.de
--------------------------------------------

Comment 3 Peter Maloney 2012-02-13 08:56:54 UTC

Correction: the newly tested version was csup'd on 2012-02-04 (February,
not January).

Comment 4 Peter Maloney 2012-06-11 09:23:58 UTC

I've tested this in 8.3-RELEASE and in 8.3-STABLE pulled last week. *Both hang*, even though 8.2-STABLE in Feb 2012 did not hang.

First terminal:

Snapshot
Renaming from to 1
load: 0.00  cmd: zfs 57149 [tx->tx_sync_done_cv)] 104.03r 0.00u 0.00s 0% 1920k


Second:
# zfs list
load: 0.00  cmd: zfs 58403 [spa_namespace_lock] 119.06r 0.00u 0.00s 0% 1796k

Comment 5 Richard Yao 2012-06-13 19:17:33 UTC

I tried to reproduce this issue after being contacted about it by Peter
on freenode. I had to modify his script to destroy datasets individually
instead of recursively, by running the following command:

for i in $(zfs list -t snapshot -H -o name | grep testcrashsnap${current}); do zfs destroy $i; done;

Otherwise, a "dataset is busy" failure occurs. This occurs on both
FreeBSD and Linux.
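Richard's one-liner can be wrapped as a reusable helper. This is a minimal sketch, assuming that the snapshots share one name across a dataset tree. The function name destroy_snapshots_named is hypothetical. It limits the listing to one tree with "zfs list -r" and matches the snapshot name exactly with a case pattern, because a plain grep for testcrashsnap1 would also match testcrashsnap12.

```shell
# destroy_snapshots_named DATASET SNAPNAME: hypothetical helper that destroys
# DATASET@SNAPNAME and the same-named snapshot of every descendant, issuing
# one "zfs destroy" per snapshot instead of a single "zfs destroy -r".
destroy_snapshots_named() {
    zfs list -H -r -t snapshot -o name "$1" |
    while read -r snap; do
        # The quoted part of a case pattern is matched literally, so only
        # snapshots whose name is exactly SNAPNAME are destroyed.
        case $snap in
            *@"$2") zfs destroy "$snap" ;;
        esac
    done
}
```

For example, destroy_snapshots_named rpool "testcrashsnap${current}" could stand in for the recursive "zfs destroy -r" calls in the script.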

After doing that (and changing dataset=big to dataset=rpool), I was able
to test his script in virtual machines running Gentoo FreeBSD 9-RELEASE
and Gentoo Linux. I reproduced this issue on Gentoo FreeBSD 9.0-RELEASE.
On the other hand, Gentoo Linux successfully completed 6570 iterations.
This was with the ZFSOnLinux kernel modules. The code is available on
github:

https://github.com/zfsonlinux/zfs

The actual code that I used to test was a patched version that I develop
in a separate branch. You can find it here:

https://github.com/ryao/zfs/tree/gentoo

My current focus is on ZFS support in Gentoo Linux, but I would be happy
to help my FreeBSD counterparts troubleshoot this. Please do not
hesitate to contact me with questions.

Comment 6 Shane 2012-07-18 17:29:46 UTC

I am running 9.0-RELEASE-p1 amd64 with a clang-built world and get a hang with
zfs rename -r. I find that anything already running mostly keeps running (top
started before the rename will hang), but any (uncached?) disk access
will cause running programs to hang as well. No new programs can start, not
even a console login. I need to hard reset.

Hardware is an ASUS P8H61-M LE/USB3 with a Core i5 and 8GB RAM, using a
WD10EARS-00Y5B1 (WD Green 1TB SATA2).
Partitioned with one 64k boot partition and one ZFS partition; single-disk
zpool. The volume I have is allocated to swap.

The commands I used for the volume are -
zfs create -V 16G zrp/swap0
zfs set org.freebsd:swap=on zrp/swap0
zfs set copies=1 zrp/swap0

From a clean pool with no snapshots -
zfs snapshot -r zrp@daily.01 -- works
zfs rename -r zrp@daily.01 zrp@daily.02 -- hangs

Alternatively -
zfs snapshot -r zrp@daily.01 -- works
zfs rename -r zrp/swap0@daily.01 zrp/swap0@daily.02 -- works
zfs rename -r zrp@daily.01 zrp@daily.02 -- works
zfs rename -r zrp@daily.02 zrp@daily.03 -- hangs - now renames vol

Comment 7 paavopok 2012-09-03 20:16:46 UTC

I can confirm this exists on 9.1-RC1. Simply issuing "zfs snapshot -r
pakka@test" and then "zfs rename -r pakka@test pakka@test2" hangs.
The filesystem appears to continue to work, but all zfs commands start
hanging. Only a reboot eventually helps.

CTRL-T gives me:
load: 0.00  cmd: zfs 1629 [tx->tx_sync_done_cv)] 412.62r 0.00u 0.00s 0% 2640k

The hardware is a Pentium G630T with 8G RAM on an ASUS P8H77-I, and the four disks
are WD Red 3TB. The zpool is raidz1, and I'm using it on plain disks
without partitioning. I also have swap on a zvol, which apparently also gets
snapshotted.

--
Paavo Pokkinen

Comment 8 paavopok 2012-09-03 21:25:00 UTC

I did some testing, and it appears that, at least in my case, the hang is
related to the presence of zvols. I removed my swap zvol, and renaming
snapshots appears to work fine. Then I created the zvol again (without running
swapon), and the hang appeared just as before.
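On a system without the fix, a snapshot-rotation script could check for zvols before it attempts a recursive rename. This is a minimal sketch that relies only on "zfs list -t volume". The helper name pool_has_zvols is hypothetical.

```shell
# pool_has_zvols DATASET: hypothetical helper; succeeds if DATASET or any of
# its descendants is a volume (zvol).
pool_has_zvols() {
    [ -n "$(zfs list -H -r -t volume -o name "$1" 2>/dev/null)" ]
}

# Example guard before a recursive snapshot rename:
# if pool_has_zvols zrp; then
#     echo "zrp contains zvols; 'zfs rename -r' may hang here" >&2
# fi
```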

Comment 9 Steven Hartland freebsd_committer 2013-06-19 15:23:13 UTC

I've reproduced this here. The cause is a lock-order reversal (a deadlock)
between zvol's GEOM actions and ZFS itself, involving these two locks:
db> show sleepchain
thread 100553 (pid 6, txg_thread_enter) blocked on sx "spa_namespace_lock" XLOCK
thread 100054 (pid 2, g_event) blocked on sx "dp->dp_config_rwlock" XLOCK

db>     
Tracing pid 2 tid 100054 td 0xffffff001c1d4470
sched_switch() at sched_switch+0x153
mi_switch() at mi_switch+0x1f8
sleepq_switch() at sleepq_switch+0x123
sleepq_wait() at sleepq_wait+0x4d
_sx_slock_hard() at _sx_slock_hard+0x1e2
_sx_slock() at _sx_slock+0xc9
dsl_dir_open_spa() at dsl_dir_open_spa+0xab
dsl_dataset_hold() at dsl_dataset_hold+0x3b
dsl_dataset_own() at dsl_dataset_own+0x2f
dmu_objset_own() at dmu_objset_own+0x36
zvol_first_open() at zvol_first_open+0x34
zvol_geom_access() at zvol_geom_access+0x2df
g_access() at g_access+0x1ba
g_part_taste() at g_part_taste+0xc4
g_new_provider_event() at g_new_provider_event+0xaa
g_run_events() at g_run_events+0x250
fork_exit() at fork_exit+0x135
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff92070a2bb0, rbp = 0 ---
db> bt 100553
Tracing pid 6 tid 100553 td 0xffffff002c2308e0
sched_switch() at sched_switch+0x153
mi_switch() at mi_switch+0x1f8
sleepq_switch() at sleepq_switch+0x123
sleepq_wait() at sleepq_wait+0x4d
_sx_xlock_hard() at _sx_xlock_hard+0x296
_sx_xlock() at _sx_xlock+0xb7
zvol_rename_minors() at zvol_rename_minors+0x75
dsl_dataset_snapshot_rename_sync() at dsl_dataset_snapshot_rename_sync+0x141
dsl_sync_task_group_sync() at dsl_sync_task_group_sync+0x14e
dsl_pool_sync() at dsl_pool_sync+0x47d
spa_sync() at spa_sync+0x34a
txg_sync_thread() at txg_sync_thread+0x139
fork_exit() at fork_exit+0x135
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff920e61abb0, rbp = 0 ---
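Reading the sleep chain, each thread appears to be waiting for a lock that the other holds:

* The txg sync thread wants spa_namespace_lock inside zvol_rename_minors().
* The GEOM event thread is tasting the zvol and waits on dp_config_rwlock below zvol_first_open().

The same pattern can be looked for without entering DDB. This is a rough sketch, assuming that procstat(1) itself still runs on the hung system. The helper name looks_like_pr161968 is hypothetical. It only greps kernel stacks, so it is a heuristic and not proof of this deadlock.

```shell
# looks_like_pr161968: hypothetical helper. Succeeds if the kernel stacks of
# all threads contain both functions seen in the traces above; returns 2 if
# procstat itself fails.
looks_like_pr161968() {
    stacks=$(procstat -kk -a 2>/dev/null) || return 2
    # Side 1: txg sync thread waiting for spa_namespace_lock.
    printf '%s\n' "$stacks" | grep -q 'zvol_rename_minors' || return 1
    # Side 2: GEOM event thread opening the zvol while tasting it.
    printf '%s\n' "$stacks" | grep -q 'zvol_first_open' || return 1
    return 0
}
```

Both functions also show up briefly during a normal rename or a zvol open. Seeing both of them on repeated checks over several seconds is therefore a stronger sign than a single match.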

The following steps recreate the issue on stable/8 r251496

gpart create -s GPT da3
gpart add -t freebsd-zfs da3
zpool create -f testpool da3p1
zfs create -V 150m testpool/testvol
zfs snapshot -r testpool@snap
zfs rename -r testpool@snap testpool@snap-new
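For testing without a spare disk, the same steps can be pointed at a file-backed md(4) device. This is a minimal sketch under that assumption:

* The helper name repro_on_md, the 512M image size and the image path are hypothetical.
* It relies on mdconfig printing the name of the new device (for example md0).
* On an affected kernel the final rename hangs ZFS until a reboot, so only run it on a test machine.

```shell
# repro_on_md IMAGE POOL: hypothetical helper that mirrors the steps above,
# but backs the pool with a file-backed md(4) device instead of da3.
# WARNING: on an affected kernel the last command hangs ZFS until reboot.
repro_on_md() {
    img=$1
    pool=$2
    truncate -s 512M "$img" || return 1
    md=$(mdconfig -a -t vnode -f "$img") || return 1   # prints e.g. "md0"
    gpart create -s GPT "$md" || return 1
    gpart add -t freebsd-zfs "$md" || return 1
    zpool create -f "$pool" "${md}p1" || return 1
    zfs create -V 150m "$pool/testvol" || return 1
    zfs snapshot -r "$pool@snap" || return 1
    zfs rename -r "$pool@snap" "$pool@snap-new"
}
```

Usage: repro_on_md /var/tmp/pr161968.img testpool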

I've been unable to reproduce on current r251471.

I'm not sure whether this is due to a timing difference caused by the
significant changes to ZFS sync tasks in current, or whether the issue
really no longer exists.

    Regards
    Steve

Comment 10 Shane 2014-11-05 10:23:08 UTC

I am unable to reproduce this on 10.1-RC4.

Comment 11 Steven Hartland freebsd_committer 2014-11-05 10:26:19 UTC

Thanks for the confirmation.

For the record this was fixed by r273162.