Bug 223951

Summary: sysutils/zfsnap tries to delete pool when removing snapshots on pool that running scrub with -s/-S enabled
Product: Ports & Packages Reporter: Anton Saietskii <vsasjason>
Component: Individual Port(s)Assignee: Eugene Grosbein <eugen>
Status: Closed FIXED    
Severity: Affects Some People CC: des, eugen, yamagi
Priority: --- Keywords: patch
Version: LatestFlags: eugen: maintainer-feedback-
Hardware: Any   
OS: Any   
Attachments:
Description Flags
fix pool deletion "try and fallback" bug
none
revised patch vsasjason: maintainer-approval?

Description Anton Saietskii 2017-11-28 22:40:39 UTC
Created attachment 188378 [details]
fix pool deletion "try and fallback" bug

Error example:
/sbin/zfs snapshot -r zroot/root@daily-2017-09-03_03.42.52--1w ... DONE
/sbin/zfs snapshot -r zroot/var@daily-2017-09-03_03.42.52--1w ... DONE
/sbin/zfs snapshot -r zroot/usr@daily-2017-09-03_03.42.52--1w ... DONE
/sbin/zfs snapshot -r zroot/home@daily-2017-09-03_03.42.52--1w ... DONE
/sbin/zfs snapshot -r zroot/ps@daily-2017-09-03_03.42.52--1w ... DONE
/sbin/zfs snapshot -r zroot/poudriere@daily-2017-09-03_03.42.52--1w ... DONE
/sbin/zfs snapshot -r zdata/media@daily-2017-09-03_03.42.52--1w ... DONE
FATAL: trying to delete zfs pool or filesystem? WTF?
  This is bug, we definitely don't want that.
  Please report it to https://github.com/graudeejs/zfSnap/issues
  Don't panic, nothing was deleted :)

Exact problem is that rm_zfs_snapshot() uses variable "i" as storage for snapshot name. But if one uses -s/-S options, that function calls skip_pool(), and last function uses variable "i" as pool name storage.
But the variable is global so its contents become overwritten with pool name instead of snapshot name.
Fix is truly trivial - just make "i" local in skip_pool().

// I don't plan to report this problem upstream because zfsnap[1] development fully stopped a few years ago.
Comment 1 Anton Saietskii 2017-11-28 22:44:03 UTC
How to repeat:
Just run something like "zfSnap -d -s -S -p 'hourly-' -p 'daily-' -p 'weekly-' -p 'monthly-' -p 'reboot-'" on a pool that has respective snapshots and running scrub/resilver currently.
Comment 2 Anton Saietskii 2017-11-28 23:00:29 UTC
Comment on attachment 188378 [details]
fix pool deletion "try and fallback" bug

Discovered that patch is incorrect because doesn't apply on "make patch" in port directory. Anyway, fix itself is correct.
Comment 3 Anton Saietskii 2017-12-01 17:06:04 UTC
Created attachment 188452 [details]
revised patch
Comment 4 Anton Saietskii 2017-12-19 17:45:19 UTC
Requesting forced commit because of maintainer timeout.
Comment 5 commit-hook freebsd_committer freebsd_triage 2018-01-22 17:06:53 UTC
A commit references this bug:

Author: eugen
Date: Mon Jan 22 17:05:49 UTC 2018
New revision: 459676
URL: https://svnweb.freebsd.org/changeset/ports/459676

Log:
  sysutils/zfsnap: fix -s/-S mode.

  Move creation of files from "do-patch" to "post-extract" target,
  so that patches from files/ directory get applied.

  While here, add LICENSE.

  PR:		223951
  Submitted by:	Anton Sayetsky <vsasjason@gmail.com> (based on)
  Approved by:	yamagi@yamagi.org (maintainer timeout, 7 weeks)

Changes:
  head/sysutils/zfsnap/Makefile
  head/sysutils/zfsnap/files/
  head/sysutils/zfsnap/files/patch-zfSnap.sh
Comment 6 Eugene Grosbein freebsd_committer freebsd_triage 2018-01-22 17:08:49 UTC
Committed with minor changes. Thank you for submission!
Comment 7 Dag-Erling Smørgrav freebsd_committer freebsd_triage 2018-02-16 02:23:57 UTC
Ever since this was committed, my hourly snapshot deletion job creates a spurious snapshot every hour.

# pkg query %v zfsnap
1.11.1_1
# sh -x /usr/local/sbin/zfSnap -d -p hourly- -p daily- -p weekly- -p monthly- zroot 2>&1
[...]
+ skip_pool zroot
+ local i
+ is_true false
+ return 1
+ is_true false
+ return 1
+ return 0
+ echo zroot
+ sed -E -e s/^-//
+ [ zroot '=' zroot ]
+ zfs_snapshot='/sbin/zfs snapshot  zroot@monthly-2018-02-16_02.16.26--1m'
+ is_false false
+ is_true false
+ return 1
+ return 0
+ /sbin/zfs snapshot zroot@monthly-2018-02-16_02.16.26--1m
+ is_true false
+ return 1
[...]
Comment 8 Eugene Grosbein freebsd_committer freebsd_triage 2018-02-16 06:21:26 UTC
Anton, could you please look at the issue?
Comment 9 Dag-Erling Smørgrav freebsd_committer freebsd_triage 2018-03-26 11:37:30 UTC
This is still not fixed.

I've taken a closer look at the code, and I can tell you right away that the patch in this PR only works by luck.  The rm_zfs_snapshot function takes a snapshot name as its first argument, or second if the first is $i, so it should use $1 instead of $i.  It just happens to work because the caller uses $i for the snapshot name.  Note that simply replacing $i with $1 in rm_zfs_snapshot is not enough since the author doesn't seem to know about the shift command, so there is a chance that $i is "-r" instead of the snapshot name.

I'm not saying the patch should be reverted—it is always a good idea to declare local variables as such—but it doesn't really address the underlying issue.
Comment 10 Eugene Grosbein freebsd_committer freebsd_triage 2018-03-26 13:47:02 UTC
(In reply to Dag-Erling Smørgrav from comment #9)

PR submitter seems to neglegted it. Would you make a fix instead and/or take the PR? I do not use sysutils/zfsnap and have no intention to use it.
Comment 11 Dag-Erling Smørgrav freebsd_committer freebsd_triage 2018-03-26 15:57:34 UTC
I have now spent a couple of hours staring at zfSnap.  There are many, many bugs in the code, but my problem is not one of them.  It is a documentation issue: you should never specify a pool when running the delete command, but the documentation suggests that you must always specify one.  If you do specify a pool, zfSnap will first create a snapshot with semi-random parameters, and then delete old snapshots.  It doesn't seem entirely deterministic — I haven't quite figured out why it sometimes does it and sometimes not — but it has nothing to do with this patch.