Created attachment 188378 [details] fix pool deletion "try and fallback" bug Error example: /sbin/zfs snapshot -r zroot/root@daily-2017-09-03_03.42.52--1w ... DONE /sbin/zfs snapshot -r zroot/var@daily-2017-09-03_03.42.52--1w ... DONE /sbin/zfs snapshot -r zroot/usr@daily-2017-09-03_03.42.52--1w ... DONE /sbin/zfs snapshot -r zroot/home@daily-2017-09-03_03.42.52--1w ... DONE /sbin/zfs snapshot -r zroot/ps@daily-2017-09-03_03.42.52--1w ... DONE /sbin/zfs snapshot -r zroot/poudriere@daily-2017-09-03_03.42.52--1w ... DONE /sbin/zfs snapshot -r zdata/media@daily-2017-09-03_03.42.52--1w ... DONE FATAL: trying to delete zfs pool or filesystem? WTF? This is bug, we definitely don't want that. Please report it to https://github.com/graudeejs/zfSnap/issues Don't panic, nothing was deleted :) Exact problem is that rm_zfs_snapshot() uses variable "i" as storage for snapshot name. But if one uses -s/-S options, that function calls skip_pool(), and last function uses variable "i" as pool name storage. But the variable is global so its contents become overwritten with pool name instead of snapshot name. Fix is truly trivial - just make "i" local in skip_pool(). // I don't plan to report this problem upstream because zfsnap[1] development fully stopped a few years ago.
How to repeat: Just run something like "zfSnap -d -s -S -p 'hourly-' -p 'daily-' -p 'weekly-' -p 'monthly-' -p 'reboot-'" on a pool that has respective snapshots and running scrub/resilver currently.
Comment on attachment 188378 [details] fix pool deletion "try and fallback" bug Discovered that patch is incorrect because doesn't apply on "make patch" in port directory. Anyway, fix itself is correct.
Created attachment 188452 [details] revised patch
Requesting forced commit because of maintainer timeout.
A commit references this bug: Author: eugen Date: Mon Jan 22 17:05:49 UTC 2018 New revision: 459676 URL: https://svnweb.freebsd.org/changeset/ports/459676 Log: sysutils/zfsnap: fix -s/-S mode. Move creation of files from "do-patch" to "post-extract" target, so that patches from files/ directory get applied. While here, add LICENSE. PR: 223951 Submitted by: Anton Sayetsky <vsasjason@gmail.com> (based on) Approved by: yamagi@yamagi.org (maintainer timeout, 7 weeks) Changes: head/sysutils/zfsnap/Makefile head/sysutils/zfsnap/files/ head/sysutils/zfsnap/files/patch-zfSnap.sh
Committed with minor changes. Thank you for submission!
Ever since this was committed, my hourly snapshot deletion job creates a spurious snapshot every hour. # pkg query %v zfsnap 1.11.1_1 # sh -x /usr/local/sbin/zfSnap -d -p hourly- -p daily- -p weekly- -p monthly- zroot 2>&1 [...] + skip_pool zroot + local i + is_true false + return 1 + is_true false + return 1 + return 0 + echo zroot + sed -E -e s/^-// + [ zroot '=' zroot ] + zfs_snapshot='/sbin/zfs snapshot zroot@monthly-2018-02-16_02.16.26--1m' + is_false false + is_true false + return 1 + return 0 + /sbin/zfs snapshot zroot@monthly-2018-02-16_02.16.26--1m + is_true false + return 1 [...]
Anton, could you please look at the issue?
This is still not fixed. I've taken a closer look at the code, and I can tell you right away that the patch in this PR only works by luck. The rm_zfs_snapshot function takes a snapshot name as its first argument, or second if the first is $i, so it should use $1 instead of $i. It just happens to work because the caller uses $i for the snapshot name. Note that simply replacing $i with $1 in rm_zfs_snapshot is not enough since the author doesn't seem to know about the shift command, so there is a chance that $i is "-r" instead of the snapshot name. I'm not saying the patch should be reverted—it is always a good idea to declare local variables as such—but it doesn't really address the underlying issue.
(In reply to Dag-Erling Smørgrav from comment #9) PR submitter seems to neglegted it. Would you make a fix instead and/or take the PR? I do not use sysutils/zfsnap and have no intention to use it.
I have now spent a couple of hours staring at zfSnap. There are many, many bugs in the code, but my problem is not one of them. It is a documentation issue: you should never specify a pool when running the delete command, but the documentation suggests that you must always specify one. If you do specify a pool, zfSnap will first create a snapshot with semi-random parameters, and then delete old snapshots. It doesn't seem entirely deterministic — I haven't quite figured out why it sometimes does it and sometimes not — but it has nothing to do with this patch.