On several servers I manage, I take backups to an external HD and use mksnap_ffs to create snapshots.
I've never had troubles with this up to 11.3.
Lately I started doing this on a 12.1 server and noticed mksnap_ffs will hang the box for several minutes (services stuck, no login allowed, already established ssh sessions partially work; shutdown not feasible unless reset button is pressed).
N.B. This is an external drive, only mounted when needed and only accessed to make backups, so if *it* got stuck, it should not affect the whole system.
I decided to check this and took the external HD to my desktop (11.3): it worked perfectly.
I then upgraded my desktop to 12.1p2 and it started doing as above:
_ mksnap_ffs will work for several minutes under high I/O (the HD is 6TB); meanwhile, the system is responsive;
_ then mksnap_ffs will drastically reduce its I/O (at least as measured with top), but will keep working for some other minutes: during this phase, I cannot open any new program; ThunderBird gets stuck, already open FireFox windows still works, but I cannot open any new window; audacity keeps playing the current track, but will get stuck when moving on to the next; already open terminal windows might partially work;
_ after several minutes mksnap_ffs will exit and everything will get back to normal.
This is of course unacceptable on a production server.
I built a test machine with 12.1/amd64 with the following kernel options: KDB, KDB_TRACE, DDB, GDB, INVARIANTS, INVARIANT_SUPPORT, WITNESS, WITNESS_SKIPSPIN, DEBUG_VFS_LOCKS, LOCK_PROFILING, KTR, ALQ, KTR_ENTRIES=4096.
Such a kernel paniced immediately after launching mksnap_ffs with LOR #269.
I removed WITNESS, WITNESS_SKIPSPIN, issued a "fsck -y" on the disk and tried again.
This time I got a different panic:
panic: ffs_copyonwrite: bad copy block
cpuid = 0
time = 1581243816
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe001beef0e0
vpanic() at vpanic+0x19d/frame 0xfffffe001beef130
panic() at panic+0x43/frame 0xfffffe001beef190
ffs_copyonwrite() at ffs_copyonwrite+0x74c/frame 0xfffffe001beef230
ffs_geom_strategy() at ffs_geom_strategy+0x8c/frame 0xfffffe001beef260
ufs_strategy() at ufs_strategy+0x83/frame 0xfffffe001beef290
VOP_STRATEGY_APV() at VOP_STRATEGY_APV+0xc9/frame 0xfffffe001beef2c0
bufstrategy() at bufstrategy+0x44/frame 0xfffffe001beef2f0
bufwrite() at bufwrite+0x230/frame 0xfffffe001beef330
ffs_snapshot() at ffs_snapshot+0x8e0/frame 0xfffffe001beef630
ffs_mount() at ffs_mount+0xb3a/frame 0xfffffe001beef7d0
vfs_domount() at vfs_domount+0x8b6/frame 0xfffffe001beef9f0
vfs_donmount() at vfs_donmount+0x7e7/frame 0xfffffe001beefa90
sys_nmount() at sys_nmount+0xf2/frame 0xfffffe001beefac0
amd64_syscall() at amd64_syscall+0x281/frame 0xfffffe001beefbf0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe001beefbf0
--- syscall (378, FreeBSD ELF64, sys_nmount), rip = 0x8002d88ba, rsp = 0x7fffffffd288, rbp = 0x7fffffffeae0 ---
KDB: enter: panic
So, I also removed INVARIANTS and INVARIANT_SUPPORT (and run "fsck -y" twice) in order to be able to get snapshots.
I haven't been able to collect other data yet.
This week another box was upgraded from 11.3 to 12.1: like the first one, this hangs as soon as mksnap_ffs is run on an external USB-attached HD and only the reset button can get it out of this situaation.
The external HD is not the same, so we can exclude it's something wroing in that particular filesystem.
As a side note, I tried setting up debugging on a test machine via serial cable, but was unable to get it to work.
_ one one box, running GENERIC instead of custom kernel, still shows the problem;
_ on another box, I installed a 11.3 VM (through bhyve) and connected it to the host via ggate: doing the shapshot on the VM does not halt the system (so this has nothing to do with the data on the disk).
I also backup a filesystem from cron. Sometimes this hangs.
When this happens, the mksnap_ffs is running at 100% of one of the processors. It is also unkillable... The hapless dump(8) sits there waiting for mksnap_ffs to exit...
The filesystem becomes unusable -- the already-started processes keep running, but their attempts to access the filesystem hang.
In our case the filesystem's parameters are:
tunefs: POSIX.1e ACLs: (-a) disabled
tunefs: NFSv4 ACLs: (-N) disabled
tunefs: MAC multilabel: (-l) disabled
tunefs: soft updates: (-n) enabled
tunefs: soft update journaling: (-j) disabled
tunefs: gjournal: (-J) disabled
tunefs: trim: (-t) enabled
tunefs: maximum blocks per file in a cylinder group: (-e) 4096
tunefs: average file size: (-f) 16384
tunefs: average number of files in a directory: (-s) 64
tunefs: minimum percentage of free space: (-m) 8%
tunefs: space to hold for metadata blocks: (-k) 6408
tunefs: optimization preference: (-o) time
I cannot say, if this is a regression as this is a new machine.
Today I tried again after upgrading to 12.2.
Still the same behavious (i.e. hardware reset required).
This is becoming more serious, as 11.x is approaching its EOL, as it prevents me from upgrading several machines.
I finally was able to reproduce this systematically on a virtual machine (with INVARIANTS & co enabled).
I've got a snapshot where the VM is deadlocked and I can connect to it via remote GDB: I'd appreciate any help on diagnosing this.
I've also got a different snapshot where the described panic happens, but I've opened a different bug for that (#252821).
After some investigation this is what I found.
(Notice I'm no kernel expert, so I just hope I'm not saying stupid things).
The thread originating from mksnap_ffs is stuck in softdep_check_suspend, sleeping on mp->mnt_secondary_writes ("secwr" for userland utilities).
#0 sched_switch (td=0xfffff8000237e760, newtd=0xfffff8000212c760, flags=<optimized out>) at /usr/src/sys/kern/sched_ule.c:2143
#1 0xffffffff805a5294 in mi_switch (flags=260, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:452
#2 0xffffffff805f272b in sleepq_switch (wchan=0xfffff80004120a00, pri=119) at /usr/src/sys/kern/subr_sleepqueue.c:626
#3 0xffffffff805f25c3 in sleepq_wait (wchan=0xfffff80004120a00, pri=119) at /usr/src/sys/kern/subr_sleepqueue.c:705
#4 0xffffffff805a4a6b in _sleep (ident=0xfffff80004120a00, lock=<optimized out>, priority=631, wmesg=0xffffffff80921d1b "secwr", sbt=0, pr=0, flags=256) at /usr/src/sys/kern/kern_synch.c:217
#5 0xffffffff807b6324 in softdep_check_suspend (mp=0xfffff80004120000, devvp=0xfffff800041ff780, softdep_depcnt=15, softdep_accdepcnt=50188, secondary_writes=1, secondary_accwrites=106) at /usr/src/sys/ufs/ffs/ffs_softdep.c:14299
#6 0xffffffff807c19a6 in ffs_sync (mp=0xfffff80004120000, waitfor=4) at /usr/src/sys/ufs/ffs/ffs_vfsops.c:1620
#7 0xffffffff8067c8bf in vfs_write_suspend (mp=0xfffff80004120000, flags=0) at /usr/src/sys/kern/vfs_vnops.c:1864
#8 0xffffffff8079c0b9 in ffs_snapshot (mp=0xfffff80004120000, snapfile=0xfffff80004214780 "A\003\215\200\377\377\377\377\070\232\254\200\377\377\377\377 \b\361\035") at /usr/src/sys/ufs/ffs/ffs_snapshot.c:430
#9 0xffffffff807bfe5a in ffs_mount (mp=<unavailable>) at /usr/src/sys/ufs/ffs/ffs_vfsops.c:479
#10 0xffffffff80661a54 in vfs_domount_update (td=0xfffff80080ac9401, vp=<optimized out>, fsflags=<optimized out>, optlist=<optimized out>) at /usr/src/sys/kern/vfs_mount.c:1037
#11 vfs_domount (td=0xfffff80080ac9401, fstype=<optimized out>, fspath=<optimized out>, fsflags=<optimized out>, optlist=0xfffffe000053aa38) at /usr/src/sys/kern/vfs_mount.c:1191
#12 0xffffffff80660b27 in vfs_donmount (td=0xfffff8000237e760, fsflags=2166784, fsoptions=0xfffff80004108600) at /usr/src/sys/kern/vfs_mount.c:726
#13 0xffffffff80660312 in sys_nmount (td=0xfffff8000237e760, uap=0xfffff8000237eb20) at /usr/src/sys/kern/vfs_mount.c:431
#14 0xffffffff808418b7 in syscallenter (td=0xfffff8000237e760) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:144
#15 amd64_syscall (td=0xfffff8000237e760, traced=0) at /usr/src/sys/amd64/amd64/trap.c:1163
#16 <signal handler called>
#17 0x00000008002dcb9a in ?? ()
I *think* it should be awakened by softdep_flush thread, in function process_worklist_item.
Alas, this is stuck waiting for a buffer in bufspace_wait.
#0 sched_switch (td=0xfffff8000425f760, newtd=0xfffff800040df000, flags=<optimized out>) at /usr/src/sys/kern/sched_ule.c:2143
#1 0xffffffff805a5294 in mi_switch (flags=260, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:452
#2 0xffffffff805f272b in sleepq_switch (wchan=0xffffffff80a0a8b8 <bdomain+33208>, pri=96) at /usr/src/sys/kern/subr_sleepqueue.c:626
#3 0xffffffff805f25c3 in sleepq_wait (wchan=0xffffffff80a0a8b8 <bdomain+33208>, pri=96) at /usr/src/sys/kern/subr_sleepqueue.c:705
#4 0xffffffff805a4a6b in _sleep (ident=0xffffffff80a0a8b8 <bdomain+33208>, lock=<optimized out>, priority=96, wmesg=0xffffffff808e5676 "newbuf", sbt=0, pr=0, flags=256) at /usr/src/sys/kern/kern_synch.c:217
#5 0xffffffff8064f6ff in bufspace_wait (bd=0xffffffff80a02700 <bdomain>, vp=0xfffff800042145a0, gbflags=<optimized out>, slpflag=<optimized out>, slptimeo=<optimized out>) at /usr/src/sys/kern/vfs_bio.c:773
#6 0xffffffff8064bbfc in getnewbuf (vp=<optimized out>, slpflag=0, slptimeo=0, maxsize=32768, gbflags=0) at /usr/src/sys/kern/vfs_bio.c:3284
#7 0xffffffff80649155 in getblkx (vp=0xfffff800042145a0, blkno=<optimized out>, size=32768, slpflag=0, slptimeo=0, flags=0, bpp=0xfffffe00005853d8) at /usr/src/sys/kern/vfs_bio.c:4022
#8 0xffffffff8064b905 in getblk (vp=<unavailable>, blkno=<unavailable>, size=<unavailable>, slpflag=<unavailable>, slptimeo=<unavailable>, flags=<unavailable>) at /usr/src/sys/kern/vfs_bio.c:3802
#9 0xffffffff807c69b5 in readindir (vp=<unavailable>, lbn=<unavailable>, daddr=56528, bpp=0xfffffe0000585478) at /usr/src/sys/ufs/ufs/ufs_bmap.c:111
#10 0xffffffff807c6468 in ufs_bmaparray (vp=0xfffff800042145a0, bn=-6049804, bnp=0xfffffe0000585518, nbp=<optimized out>, runp=<optimized out>, runb=0x0) at /usr/src/sys/ufs/ufs/ufs_bmap.c:266
#11 0xffffffff807d3975 in ufs_strategy (ap=<optimized out>) at /usr/src/sys/ufs/ufs/ufs_vnops.c:2309
#12 0xffffffff808b9911 in VOP_STRATEGY_APV (vop=0xffffffff80aca540 <ufs_vnodeops>, a=0xfffffe0000585570) at vnode_if.c:2279
#13 0xffffffff80647114 in VOP_STRATEGY (vp=<unavailable>, bp=0xfffffe000090b5c0) at ./vnode_if.h:940
#14 bufstrategy (bo=<optimized out>, bp=0xfffffe000090b5c0) at /usr/src/sys/kern/vfs_bio.c:4999
#15 0xffffffff80648b1e in bstrategy (bp=<optimized out>) at /usr/src/sys/sys/buf.h:419
#16 breadn_flags (vp=<optimized out>, blkno=<optimized out>, size=<optimized out>, rablkno=0x0, rabsize=0x0, cnt=0, cred=0x0, flags=0, ckhashfunc=0x0, bpp=0xfffffe0000585780) at /usr/src/sys/kern/vfs_bio.c:2181
#17 0xffffffff807982b8 in ffs_balloc_ufs2 (vp=<optimized out>, startoffset=<optimized out>, size=<optimized out>, cred=0xfffff8000211b000, flags=<optimized out>, bpp=0xfffffe0000585820) at /usr/src/sys/ufs/ffs/ffs_balloc.c:894
#18 0xffffffff8079fb22 in ffs_snapblkfree (fs=0xfffffe0011c76000, devvp=<optimized out>, bno=48416024, size=32768, inum=5, vtype=VREG, wkhd=0xfffffe0000585950) at /usr/src/sys/ufs/ffs/ffs_snapshot.c:1790
#19 0xffffffff80790d16 in ffs_blkfree (ump=0xfffff80004116800, fs=0xfffffe0011c76000, devvp=0xfffff800041ff780, bno=48416024, size=32768, inum=5, vtype=VREG, dephd=0xfffffe0000585950, key=2) at /usr/src/sys/ufs/ffs/ffs_alloc.c:2602
#20 0xffffffff807baa4f in indir_trunc (freework=0xfffff800048ed480, dbn=<optimized out>, lbn=<optimized out>) at /usr/src/sys/ufs/ffs/ffs_softdep.c:8259
#21 0xffffffff807ba93b in indir_trunc (freework=0xfffff800048ed480, dbn=<optimized out>, lbn=<optimized out>) at /usr/src/sys/ufs/ffs/ffs_softdep.c:8240
#22 0xffffffff807ad4f9 in handle_workitem_indirblk (freework=<optimized out>) at /usr/src/sys/ufs/ffs/ffs_softdep.c:7875
#23 handle_workitem_freeblocks (freeblks=0xfffff800048eda00, flags=512) at /usr/src/sys/ufs/ffs/ffs_softdep.c:7970
#24 0xffffffff807b5ba1 in process_worklist_item (mp=0xfffff80004120000, target=10, flags=512) at /usr/src/sys/ufs/ffs/ffs_softdep.c:1806
#25 0xffffffff807a1e92 in softdep_process_worklist (mp=0xfffff80004120000, full=0) at /usr/src/sys/ufs/ffs/ffs_softdep.c:1600
#26 0xffffffff807a580f in softdep_flush (addr=0xfffff80004120000) at /usr/src/sys/ufs/ffs/ffs_softdep.c:1402
#27 0xffffffff80556f8c in fork_exit (callout=0xffffffff807a5720 <softdep_flush>, arg=0xfffff80004120000, frame=0xfffffe0000585c00) at /usr/src/sys/kern/kern_fork.c:1080
#28 <signal handler called>
The above explains why mksnap_ffs is halted; however the whole machine is hanged.
It seems almost any user thread (e.g. an "ls" *on a different filesystem*) is stuck in bufspace_wait.
Neither buf_daemon thread, nor its child bufspace_daemon are stuck (they are running their normal loop).
So far I wasn't able to pinpoint what changed between 11.4 and 12.1 to cause this.
Any hint appreciated.
What is the physical hardware where the hang occurs?
The sleep of buffer runspace most often means that io hung. I saw it many
times with USB/umass and low-quality ata/sata-usb adapters.
(In reply to Konstantin Belousov from comment #7)
I doubt it's a hardware trouble:
_ this happens on more than one physical machine;
_ I too have seen umass troubles (on other machines), but this is usually logged;
_ it only happens with 12.1/12.2, not with 11.x;
_ umass works perfectly for everything but mksnap_ffs;
_ if umass hanged, I don't think this should hang the whole system as only one process is using it; the others are depending on different storage and should not be affected (at least that's my experience);
_ the machine I'm debugging is a VirtualBox one (where I was able to recreate the problem): it's not using umass at all. It has system on /dev/ada0 and data on /dev/ada1 (of which I'm taking the snapshot). Underlying hardware is working fine.
I have no strong evidence that this is related, but can you try testing https://reviews.freebsd.org/D28901 ? The problem was introduced in 12.
(In reply to Mark Johnston from comment #9)
Any side effects?
Anyway this is gonna need some time: as the problem is not always triggered, I have to rollback the VM, upgrade the system, save state and try, try and try again until I enter the deadlock once more.
I did a test though: while the machine is locked, I put a breakpoint in bufbdflush (the only function changed in that diff) and see it never gets reached.
Is this an indication that this patch won't solve or possibly it would have made a difference earlier?
(In reply to ml from comment #10)
It would have made a difference earlier. By the time you get to your deadlock, it is too late to recover from it.
Just for the record...
While preparing to install the mentioned patch, I reached a status where:
_ issuing mksnap_ffs always locks the machine;
_ unmounting the filessytem (the one I would run mksnap_ffs on) also causes a deadlock (I guess it's the same deadlock, but I didn't analyze it);
_ shutting down, the machine cannot write all buffers and after reboot all the filesystem will require fsck;
_ the fsck on said filesystem will be problematic (hangs or panics).
So it seems that, after a lot of snapshots creation/deletion, the situation is already compromised *before* I try and take the snapshot that hangs.
(In reply to ml from comment #12)
I am unclear on the order of events
Is the snapshot hanging on a filesystem that passes fsck, or are you trying it on an unclean filesystem? Snapshots should only be done on clean filesystems.
When you say that fsck panics while trying to clean the filesystem, do you mean that the kernel panics or that fsck fails. If it is fsck that is failing, what are the flags that you are using and how does it fail? If it hangs, please type ^T to get it to report what it is doing.
I apologize for not being clear.
The problem is not deterministic: behaviour changes slightly from machine to machine and even subsequent runs starting from the same VM snapshot can differ.
In comment 12 I was talking about the last situation I was able to achieve that I think is interesting.
VirtualBox VM with two disk:
_ ada0 is the "system" disk (UFS SU+J);
_ ada1 is the "data" disk, i.e. the one I'm trying snapshots on (GELI+UFS SU, no journal as it would prevent snapshotting).
The filesystem is clean: fsck finds no errors and everything works normally.
There are already 12 snapshots.
I can take one or two further snapshots (probably more, I didn't try): still everything is fine.
However, if I *delete* an existing snapshot, trouble starts:
_ subsequent mksnap_ffs produces the deadlock I described;
_ trying to unmount the filesystem also hangs (but won't hang the whole machine): ctrl-t may say it's in "secwr", like the above or in "bo_wwait" or in "wdrain"; a LOR (ufs@vfs_mount.c:1349/devfs@ffs_vfsops.c:1610) may appear or not;
_ even shutdown doesn't work properly: vnodes remaining count seems to rise and decrease at the same time; it may reach 0, but will grow again.
What happens after an hard reset can vary: fsck -p in single user mode seems to work and recover the situation; a background fsck is usually asking for troubles, often resulting in a panic (dropping into DDB).
In the end I was perhaps looking at the wrong side of things: it might not be mksnap_ffs that hangs the machine, but *deleting* a snapshot.
I'm not sure, though: I think in the past I've seen situations where it hanged even without deleting, but I cannot reconstruct now.
Unfortunately https://reviews.freebsd.org/D28901 doesn't seem to help.
Next thing I'll do is try 13.0 (unless someone has a better suggestion).
(In reply to ml from comment #14)
Can you successfully run these operations on stable/11?
Or put another way, do these problems only show up when you try to run on stable/12?
It would be helpful to find out if the problem is fixed when running on stable/13 as that would let us review changes made in the development of stable/13 to find the likely culprit. If is not fixed in stable/13 but it works in stable/11, then we will know that the breakage happened during the development of stable/12.
To be reliably able to track down the problem, we need a disk image of the filesystem that experiences the problem (i.e. the UFS+SU filesystem on which you are having the problem) along with a set of commands that when run on that image cause the problem to happen.
I realise that you do not know exactly what the parameters are for triggering the problem. I would suggest saving an image and then use script(1) to record the set of commands that you run trying to cause the hang. When you eventually get a hang you will presumably know how to reliably recreate it. Hopefully you can narrow it down to something where you do not need to run too many commands to trigger the problem.
You can then gzip(1) the disk image and let us know where we can pick it up along with the set of cammands that cause the hang. We can then verify that we can reproduce the hang and if so finding a fix can proceed.
(In reply to Kirk McKusick from comment #15)
I've never had problems with snapshots on 11.x (or earlier), up to 11.4, altough I never tried 11-STABLE.
The problem started when either I built a new machine with 12.1 or upgraded a 11.3 to 12.1. 12.2 also shows the problem. I never tried 12.0 or 12-STABLE.
As I said, I was going to try 13.0 (should almost be the same as 13-STABLE now, shouldn't it?) If you'd like me to also try 12-STABLE, I will.
Does 11-STABLE differ that much from 11.4 (given 11.4 should be the last release on this branch)? I ask because it will be harder for me to try this.
In fact I know how to trigger the problem exactly: I have a VM I can boot and just issue "rm /mnt/.snap/1".
Unfortunately, this disk image, besides being quite large, holds data that belongs to a customer of mine, so I cannot share it.
I will try to build a new one with different data, but previously I wasn't able to trigger the problem until I "rsync"ed from there, so it will take some time and some luck.
Thanks for your help.
(In reply to ml from comment #16
It is not necessary to run on stable/12. Running on any of the point releases is fine (i.e., 11.4, 12.2, etc).
The contents of the files is very unlikely to be relevant to the crash. If you can make an image of the disk, I have a program that walks over a disk image and zeros out every data block that is claimed by a regular file or a snapshot. The resulting image has only metadata remaining. The only customer data that will be disclosed is the names of their files, the structure of their directory trees, and the sizes and ownership of their files and directories. The resulting image has the additional benefit of being very compressible. Let me know if this level of disclosure is acceptable and if so I will resurrect my clrfs program and make it available to you.
(In reply to Kirk McKusick from comment #17)
Thanks for the offer.
Unfortunately, I cannot legally share this image, even with blanked data, if the filenames remain.
I'll try to build another one with only non reserved data.
Meanwhile I upgraded to 13.0 and can confirm the problem persists :(
I'll keep investigating.
(In reply to ml from comment #18)
Can you share it if the file names get changed to just be the inode number of the file?
(In reply to Kirk McKusick from comment #19)
I guess so.
If data is gone and filenames too (including in snapshots), there should be nothing private remaining. Right?
Meanwhile I made another test: I reverted back to 11.x and saw that it's now giving the same troubles as 12.x and 13.0 do.
So my conclusion is that snapshotting/deleting snapshots on 12.x has already messed the disk in a way that fsck doesn't even detect, but hangs the machine on some operations.
Probably this is less useful than I hoped: while it would be nice to have a way to fix this image, this cannot be used to reproduce the corruption on demand, since it's already ruined. Maybe seeing what the trouble is can give some hint?
IOW I hoped to have pinpointed the situation when the trouble starts, but it's not immediately evident and I'm already past it :(
(In reply to ml from comment #20)
Having an image that reliably causes the hang when you remove a snapshot will be useful. Either it will demonstrate a corruption that should be detected by fsck in which case I can fix fsck or it will show a kernel bug which we can hopefully track down and fix. While this may not fix the whole problem, it should certainly get us a step of the way.
I will contact you directly about how to get the image to me so that you can do a direct transfer to my development lab and not have to transit it through the freebsd.org infrastructure.