When running dump to create a backup it crashes. This happens only on a 4Tb filesystem (/dev/label/home, see below).

dump -0 -aL -f test.0 /dev/label/home
DUMP: Date of this level 0 dump: Thu Feb 27 11:33:46 2020
DUMP: Date of last level 0 dump: the epoch
DUMP: Dumping snapshot of /dev/label/home (/usr/home) to test.0
DUMP: mapping (Pass I) [regular files]
DUMP: mapping (Pass II) [directories]
DUMP: estimated 472120814 tape blocks.
DUMP: dumping (Pass III) [directories]
DUMP: dumping (Pass IV) [regular files]
DUMP: 5.26% done, finished in 1:30 at Thu Feb 27 13:08:56 2020
DUMP: 10.48% done, finished in 1:25 at Thu Feb 27 13:09:16 2020
DUMP: 15.44% done, finished in 1:22 at Thu Feb 27 13:11:00 2020
DUMP: 20.04% done, finished in 1:19 at Thu Feb 27 13:13:36 2020
DUMP: 26.92% done, finished in 1:07 at Thu Feb 27 13:06:43 2020
DUMP: 33.80% done, finished in 0:58 at Thu Feb 27 13:02:36 2020
DUMP: 40.61% done, finished in 0:51 at Thu Feb 27 13:00:00 2020
DUMP: 47.29% done, finished in 0:44 at Thu Feb 27 12:58:25 2020
DUMP: 54.04% done, finished in 0:38 at Thu Feb 27 12:57:06 2020
Assertion failed: (spcl.c_count + blks < TP_NINDIR), function appendextdata, file /usr/src/sbin/dump/traverse.c, line 759.
DUMP: Child 60825 returns LOB status 206

df -h
Filesystem         Size    Used   Avail Capacity  Mounted on
/dev/label/root    1.9G    391M    1.4G    21%    /
devfs              1.0K    1.0K      0B   100%    /dev
/dev/label/var     3.9G    1.2G    2.4G    33%    /var
/dev/label/tmp     3.9G    116K    3.6G     0%    /tmp
/dev/label/usr      40G     13G     24G    35%    /usr
/dev/label/home    3.5T    451G    2.8T    14%    /usr/home
/dev/label/bkup    3.5T    920G    2.3T    28%    /media/bkup

Regards,
Stefan
(In reply to Stefan Thurner from comment #0) With reference to bug #228807, I've compiled dump from base r331095. That is the version prior to the changes to tape.c and traverse.c. Now dump works as intended. Regards, Stefan
I hit this today on a ProxMox VM running FreeBSD 12.1-RELEASE-p4, on a 351G file system in its own partition. The hypervisor host has an AMD FX chip. I got the exact error reported here about 35% of the way into the file system. The dump worked on 20 April 2020 and EVERY Sunday before that, but stopped working as of 26 April 2020.

Exact command:
/sbin/dump -0u -L -b 64 -a -f /archive/agamemnon/usr.dump /usr
/archive is NFS mounted.

root@agamemnon:/sbin # file dump
dump: ELF 64-bit LSB executable, x86-64, version 1 (FreeBSD), dynamically linked, interpreter /libexec/ld-elf.so.1, for FreeBSD 12.1, FreeBSD-style, stripped

As the binary comes from an upgrade from source, I figured an md5sum would be useless. I have rebooted and updated the system from source. I will return to the previous dump if my current test fails. The only change to my configuration this week was loading telegraf. I have disabled telegraf and am running a test dump now.
Created attachment 213898 [details] Core Dump of /sbin/dump from FreeBSD 12.1-RELEASE-p4 r360175
The file system is UFS, mounted on /usr. dump fails and dumps core at about 40% every time. This started out of the blue. fsck says the file system is fine.
"Me too"

FreeBSD rwsrv08.gfn.riverwillow.net.au 12.1-RELEASE-p5 FreeBSD 12.1-RELEASE-p5 #0 r361272: Thu May 21 06:07:33 AEST 2020 john@rwsrv08.gfn.riverwillow.net.au:/build/obj/john/kits/src/amd64.amd64/sys/RWSRV08 amd64

+ dump -0aLu -C 256 -f /ubackup/nfsdata.dmp /dev/gpt/NFSDATA
DUMP: Date of this level 0 dump: Tue Jun 9 00:28:51 2020
DUMP: Date of last level 0 dump: the epoch
DUMP: Dumping snapshot of /dev/gpt/NFSDATA (/nfsdata) to /ubackup/nfsdata.dmp
DUMP: mapping (Pass I) [regular files]
DUMP: Cache 256 MB, blocksize = 65536
DUMP: mapping (Pass II) [directories]
DUMP: estimated 27202416 tape blocks.
DUMP: dumping (Pass III) [directories]
DUMP: dumping (Pass IV) [regular files]
DUMP: 36.85% done, finished in 0:08 at Tue Jun 9 00:42:26 2020
DUMP: 74.56% done, finished in 0:03 at Tue Jun 9 00:42:16 2020
Assertion failed: (spcl.c_count + blks < TP_NINDIR), function appendextdata, file /kits/src/sbin/dump/traverse.c, line 759.
DUMP: Child 11854 returns LOB status 206

+ dump -0aLu -C 256 -f /ubackup/rw2.dmp /dev/gpt/RW2
DUMP: Date of this level 0 dump: Tue Jun 9 01:39:07 2020
DUMP: Date of last level 0 dump: the epoch
DUMP: Dumping snapshot of /dev/gpt/RW2 (/rw2) to /ubackup/rw2.dmp
DUMP: mapping (Pass I) [regular files]
DUMP: Cache 256 MB, blocksize = 65536
DUMP: mapping (Pass II) [directories]
DUMP: estimated 18888847 tape blocks.
DUMP: dumping (Pass III) [directories]
DUMP: dumping (Pass IV) [regular files]
Assertion failed: (spcl.c_count + blks < TP_NINDIR), function appendextdata, file /kits/src/sbin/dump/traverse.c, line 759.
DUMP: Child 14199 returns LOB status 206

rwsrv08> df -ht ufs /nfsdata /rw2 /ubackup
Filesystem         Size    Used   Avail Capacity  Mounted on
/dev/gpt/NFSDATA    62G     26G     31G    46%    /nfsdata
/dev/gpt/RW2        22G     18G    1.7G    91%    /rw2
/dev/da0p8         310G    197G     88G    69%    /ubackup
I would also like to report having this issue on a 12.1 system. I am attempting to dump a single large root partition to an external USB drive:

dump -C16 -b64 -0uanL -h0 -f /backup/root.dump /

At some point the dump fails with:

Assertion failed: (spcl.c_count + blks < TP_NINDIR), function appendextdata, file /usr/src/sbin/dump/traverse.c, line 759.
DUMP: Child 45872 returns LOB status 206
I am having this issue on 12.1-RELEASE-p10. It previously happened on 12.1-RELEASE-p3. The problem only appeared about three weeks ago, and there had been no updates to the system at that time. It was on 12.1-RELEASE-p3, and the update to 12.1-RELEASE-p10 did not fix it. The file system was 320GB in size. It didn't need to be that big, so it was reduced to 50GB with the same contents copied over via rsync. The problem is still there. The message is subtly different, although that may not be relevant:

04:00:49 DUMP: Date of this level 9 dump: Fri Sep 25 04:01:02 2020
04:00:49 DUMP: Date of last level 0 dump: Sat Sep 12 12:19:28 2020
04:00:49 DUMP: Dumping snapshot of /dev/mirror/dxhome (/data/export/home) to standard output
04:00:49 DUMP: mapping (Pass I) [regular files]
04:00:49 DUMP: mapping (Pass II) [directories]
04:00:49 DUMP: estimated 330684 tape blocks.
04:00:49 DUMP: dumping (Pass III) [directories]
04:00:49 DUMP: dumping (Pass IV) [regular files]
04:00:49 Assertion failed: (spcl.c_count + blks < TP_NINDIR), function appendextdata, file /usr/src/sbin/dump/traverse.c, line 759.
04:00:49 DUMP: Child 54548 returns LOB status 206

This is a showstopper.
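A note on the "LOB status 206" line that appears in every report so far. The following is a minimal sketch of how that number might be decoded, assuming dump prints the low-order byte of the child's wait status in octal and that the traditional layout applies (low seven bits hold the signal number, bit 0200 is the core-dump flag). Neither assumption is stated in this thread, and `struct lob_status` and `decode_lob` are hypothetical names used only for illustration.

```c
#include <signal.h>
#include <stdbool.h>

/* Hypothetical container for the two fields packed into the low-order byte. */
struct lob_status {
    int  signo;        /* terminating signal number, 0 if none */
    bool core_dumped;  /* true if the core-dump flag is set */
};

/* Split a low-order wait-status byte into signal number and core flag.
 * Assumes the traditional encoding: bits 0-6 signal, bit 7 core flag. */
struct lob_status decode_lob(unsigned int low_byte)
{
    struct lob_status s;

    s.signo = (int)(low_byte & 0177);
    s.core_dumped = (low_byte & 0200) != 0;
    return s;
}
```

Under those assumptions, `decode_lob(0206)` yields signal 6 with the core flag set. Signal 6 is SIGABRT on FreeBSD, which would be consistent with a failed assert() and with the core dump attached in an earlier comment.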
This happened to me as well after I added a bunch of large files. I went searching for answers and eventually noticed that the assertion in question in traverse.c was different than the other assertions implemented as part of bug #228807, comment #5 implemented in base r334969. Specifically it tested for "<" TP_NINDIR instead of "<=" TP_NINDIR where other asserts implemented a <= test. On my file system "spcl.c_count + blks" reached exactly 512 at the point of failure, which is the value of TP_NINDIR. Perhaps the assert should have tested for <= rather than <. I changed the line in traverse.c and now dump seems to be working again on the file system with lots of indirect inodes. I've uploaded a diff. FYI, I'm the equivalent of a passenger trying to land the plane after the flight crew passes out, so YMMV.
Created attachment 222230 [details] Patch to traverse.c This fixes one thing but IDK if it breaks others. Review is needed.
*** Bug 253182 has been marked as a duplicate of this bug. ***
I have made the change, compiled a new /sbin/dump, and dump now works in my use case. I will continue testing, but it looks like this is fixed with the diff supplied.
I would note that this bug must be considered critical. I upgraded servers from FreeBSD 10 to 12-stable in January 2021. A few months later I discovered that all my offline backups are corrupted because of this bug. All backup scripts run and all backup files are in place, but the large backup files are smaller than they should be because dump exits somewhere in the middle of the backup process. The patch is a cure, but it is still not in the source tree. The more people try to restore from their corrupted backups, the more attention will be paid to this bug.
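Since the truncated files described above went unnoticed, one defensive measure is for the backup job to check how dump terminated rather than only whether an output file exists. A minimal sketch, assuming dump exits with a non-zero status (or is killed by a signal) in this failure mode, which this thread does not confirm. `backup_ok` and `run_backup` are hypothetical helper names.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Return true only if the wait status describes a normal exit with code 0.
 * A signal death, a non-zero exit code, or a failed system() call all
 * count as a failed backup. */
bool backup_ok(int wait_status)
{
    if (wait_status == -1)
        return false;
    return WIFEXITED(wait_status) && WEXITSTATUS(wait_status) == 0;
}

/* Run a backup command line through the shell and report success.
 * The command string is supplied by the caller. */
bool run_backup(const char *command)
{
    return backup_ok(system(command));
}
```

If the command is a shell pipeline, for example dump piped into a compressor, a POSIX shell reports the status of the last command in the pipeline by default, so a check like this would have to be applied to dump itself.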
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=efe145a7453e4208f032816ce3f80e9fb6b0e4ee

commit efe145a7453e4208f032816ce3f80e9fb6b0e4ee
Author:     Kirk McKusick <mckusick@FreeBSD.org>
AuthorDate: 2021-05-17 23:33:59 +0000
Commit:     Kirk McKusick <mckusick@FreeBSD.org>
CommitDate: 2021-05-17 23:34:53 +0000

    Correct assert added to dump program.

    The dump program was exiting with the message:

    Assertion failed: (spcl.c_count + blks < TP_NINDIR), function appendextdata, file /usr/src/sbin/dump/traverse.c, line 759.

    The problem arose when dumping external attributes. This assertion was added in this commit with no review by someone with expertise in the dump program:

    commit 2d518c6518cdb256ff6f2c463e6b115d89c104c3
    Author:     Warner Losh <imp@FreeBSD.org>
    AuthorDate: Mon Jun 11 19:32:36 2018 +0000
    Commit:     Warner Losh <imp@FreeBSD.org>
    CommitDate: Mon Jun 11 19:32:36 2018 +0000

        Add asserts to prevent overflows of c_addr.

    It is clearly wrong as the statement immediately above it in the code which is deciding if the data will fit is:

        if (spcl.c_count + blks > TP_NINDIR)
            return (0);

    As is pointed out in the bug report, the assert should be:

        (spcl.c_count + blks <= TP_NINDIR)

    This commit corrects the assert. I am sorry that it took so long to be brought to my attention and get fixed.

    Reported by:  Hampton Finger
    PR:           244470
    MFC after:    3 days
    Sponsored by: Netflix

 sbin/dump/traverse.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
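The mismatch described in the commit message can be modelled in a few lines. This is a sketch only: TP_NINDIR is taken as 512, the value reported earlier in this thread, and the three function names are hypothetical stand-ins for the checks, not code from traverse.c.

```c
#include <stdbool.h>

/* Value of TP_NINDIR as reported in this thread (an assumption here). */
#define TP_NINDIR 512

/* Models the guard quoted in the commit message: the data is rejected
 * only when the total exceeds TP_NINDIR, so a total of exactly
 * TP_NINDIR is allowed through. */
bool guard_allows(int c_count, int blks)
{
    return !(c_count + blks > TP_NINDIR);
}

/* Models the assertion as originally committed (strict "<"). */
bool old_assert_holds(int c_count, int blks)
{
    return c_count + blks < TP_NINDIR;
}

/* Models the corrected assertion ("<="), which agrees with the guard. */
bool fixed_assert_holds(int c_count, int blks)
{
    return c_count + blks <= TP_NINDIR;
}
```

When c_count + blks is exactly 512, `guard_allows` returns true while `old_assert_holds` returns false, which corresponds to the abort seen in the reports. `fixed_assert_holds` returns true for every total that the guard lets through.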
This fix has been MFC'ed to 13-stable and 12-stable
(In reply to Kirk McKusick from comment #13) Thank you!
(In reply to ant2 from comment #14) Sorry it took so long to come to my attention. Thanks for your patience.
(In reply to Kirk McKusick from comment #15) …and thanks from the rest of us too! Upgraded a server to Stable-13 yesterday and now have full dumps of ALL my volumes at last.
(In reply to commit-hook from comment #12) > … this commit with no review … Not wishing to split hairs, and no offence to anyone involved, but I _do_ see a "Reviewed by:" line in that commit.
(In reply to Graham Perrin from comment #17) Not reviewed by Kirk though. The quality of the review turned out, in hindsight, to be lacking. As the original committer, I think Kirk's characterization is largely correct and instructive to others that wish to make commits in this area w/o running it past him: It's too easy to make a subtle mistake that had awful results due to a tiny mismatch in testing...
Thanks; clearer now.
This bug still exists in FreeBSD 13.0-RELEASE-p5. /usr/mynetshare is about 400GB, and dump is used to do a full backup of it. It exits abnormally as follows:

# dump -0u -rnf - /usr/mynetshare | zstd -o /usr/largedisk/mynetshare-L0.dump.zstd
DUMP: WARNING: should use -L when dumping live read-write filesystems!
DUMP: Date of this level 0 dump: the epoch
DUMP: Date of last level 0 dump: the epoch
DUMP: Dumping /dev/mirror/gm0p4 (/usr/mynetshare) to standard output
DUMP: mapping (Pass I) [regular files]
DUMP: mapping (Pass II) [directories]
DUMP: estimated 348945219 tape blocks.
DUMP: dumping (Pass III) [directories]
Read : 56 MB ==> 3% DUMP: dumping (Pass IV) [regular files]
Read : 18072 MB ==> 82% DUMP: 5.31% done, finished in 1:29 at Mon Jan 24 19:23:23 2022
Read : 41784 MB ==> 85% DUMP: 12.26% done, finished in 1:11 at Mon Jan 24 19:10:46 2022
Read : 64504 MB ==> 85% DUMP: 18.94% done, finished in 1:04 at Mon Jan 24 19:08:26 2022
Read : 88008 MB ==> 82% DUMP: 25.84% done, finished in 0:57 at Mon Jan 24 19:06:38 2022
Read : 100016 MB ==> 81%Assertion failed: (spcl.c_count + blks < TP_NINDIR), function appendextdata, file /usr/src/sbin/dump/traverse.c, line 759.
DUMP: Child 24748 returns LOB status 206
/*stdin*\ : 81.34% (104916541440 => 85344243222 bytes, /usr/largedisk/mynetshare-L0.dump.zstd)
(In reply to lbfoo from comment #20) This bug has been fixed in 13-STABLE, but the fix has not been issued as an erratum. Unless an erratum is issued for it, the fix will not appear in FreeBSD 13.0-RELEASE; you will have to wait for the FreeBSD 13.1 release. In the meantime you can recompile your dump program with the one-line fix described in comment #12, or grab a copy of 13-STABLE and copy the /sbin/dump binary from it to your system.
(In reply to Kirk McKusick from comment #21) Thanks. The comment is very informative. I'm going to try the advice.
(In reply to lbfoo from comment #22) To expand on my previous comment, the way to get an updated copy of dump for the amd64 architecture is as follows:

fetch https://download.freebsd.org/ftp/snapshots/VM-IMAGES/13.0-STABLE/amd64/Latest/FreeBSD-13.0-STABLE-amd64.raw.xz
unxz FreeBSD-13.0-STABLE-amd64.raw.xz
mdconfig -a -u 10 -t vnode -f FreeBSD-13.0-STABLE-amd64.raw
mount /dev/md10p4 /mnt
cp /mnt/sbin/dump /sbin/dump
umount /mnt
mdconfig -d -u 10
rm FreeBSD-13.0-STABLE-amd64.raw