Bug 244470 - /sbin/dump crashes on larger filesystem
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: 12.1-RELEASE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: freebsd-bugs (Nobody)
Reported: 2020-02-27 13:44 UTC by Stefan Thurner
Modified: 2020-09-27 23:58 UTC

Attachments
Core Dump of /sbin/dump from FreeBSD 12.1-RELEASE-p4 r360175 (177.99 KB, application/x-compressed-tar)
2020-04-29 01:26 UTC, Hampton Finger

Description Stefan Thurner 2020-02-27 13:44:33 UTC
When I run dump to create a backup, it crashes. This happens only on a 4 TB filesystem (/dev/label/home, see below).

dump -0 -aL -f test.0 /dev/label/home
  DUMP: Date of this level 0 dump: Thu Feb 27 11:33:46 2020
  DUMP: Date of last level 0 dump: the epoch
  DUMP: Dumping snapshot of /dev/label/home (/usr/home) to test.0
  DUMP: mapping (Pass I) [regular files]
  DUMP: mapping (Pass II) [directories]
  DUMP: estimated 472120814 tape blocks.
  DUMP: dumping (Pass III) [directories]
  DUMP: dumping (Pass IV) [regular files]
  DUMP: 5.26% done, finished in 1:30 at Thu Feb 27 13:08:56 2020
  DUMP: 10.48% done, finished in 1:25 at Thu Feb 27 13:09:16 2020
  DUMP: 15.44% done, finished in 1:22 at Thu Feb 27 13:11:00 2020
  DUMP: 20.04% done, finished in 1:19 at Thu Feb 27 13:13:36 2020
  DUMP: 26.92% done, finished in 1:07 at Thu Feb 27 13:06:43 2020
  DUMP: 33.80% done, finished in 0:58 at Thu Feb 27 13:02:36 2020
  DUMP: 40.61% done, finished in 0:51 at Thu Feb 27 13:00:00 2020
  DUMP: 47.29% done, finished in 0:44 at Thu Feb 27 12:58:25 2020
  DUMP: 54.04% done, finished in 0:38 at Thu Feb 27 12:57:06 2020
Assertion failed: (spcl.c_count + blks < TP_NINDIR), function appendextdata, file /usr/src/sbin/dump/traverse.c, line 759.
  DUMP: Child 60825 returns LOB status 206
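
A note for anyone decoding the last line of that log. The sketch below assumes that the number in "returns LOB status 206" is the low byte of the child's wait status, printed in octal. That is an assumption about dump's message format, not something confirmed in this report. Under that assumption, 206 octal is 134 decimal, which is 128 + 6: signal 6 (SIGABRT, the signal a failed assert() raises through abort()) with the core-dump flag set. The helper name decode_lob_status is hypothetical.

```python
import signal


def decode_lob_status(octal_text):
    """Split an octal wait-status low byte into (signal number, core flag).

    Assumption: dump prints the low 8 bits of the wait status in octal.
    """
    status = int(octal_text, 8)        # parse the digits as base 8
    signum = status & 0x7F             # low seven bits: terminating signal
    core_dumped = bool(status & 0x80)  # high bit: a core file was written
    return signum, core_dumped


signum, core_dumped = decode_lob_status("206")
# Look up the symbolic name of the signal on this platform.
print(signal.Signals(signum).name, "core dumped" if core_dumped else "no core")
```

If the assumption holds, the status line is consistent with the assertion failure printed just above it, rather than being a separate error.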

df -h
Filesystem         Size    Used   Avail Capacity  Mounted on
/dev/label/root    1.9G    391M    1.4G    21%    /
devfs              1.0K    1.0K      0B   100%    /dev
/dev/label/var     3.9G    1.2G    2.4G    33%    /var
/dev/label/tmp     3.9G    116K    3.6G     0%    /tmp
/dev/label/usr      40G     13G     24G    35%    /usr
/dev/label/home    3.5T    451G    2.8T    14%    /usr/home
/dev/label/bkup    3.5T    920G    2.3T    28%    /media/bkup

Regards,
Stefan
Comment 1 Stefan Thurner 2020-03-02 06:32:20 UTC
(In reply to Stefan Thurner from comment #0)

In reference to bug #228807 I've compiled dump from base r331095. That's the version prior to changes of tape.c and traverse.c.

Now dump works as intended.

Regards,
Stefan
Comment 2 Hampton Finger 2020-04-27 21:56:28 UTC
I hit this today on a ProxMox VM running FreeBSD 12.1-RELEASE-p4, on a 351G file system in its own partition.  The hypervisor host has an AMD FX chip.  I got the exact error reported here about 35% of the way through the file system.  The dump worked on 20 April 2020 and EVERY Sunday before that, but stopped working as of 26 April 2020.

Exact Command: /sbin/dump -0u -L -b 64 -a -f /archive/agamemnon/usr.dump /usr

/archive is nfs mounted

root@agamemnon:/sbin # file dump
dump: ELF 64-bit LSB executable, x86-64, version 1 (FreeBSD), dynamically linked, interpreter /libexec/ld-elf.so.1, for FreeBSD 12.1, FreeBSD-style, stripped

As this is an upgrade from source, I figured an md5sum would be useless.

I have rebooted, and updated the system from source.

I will be returning to the previous dump if my current test fails.  I loaded telegraf on the system this week as the only delta to my configuration.  I have disabled telegraf and am running a test dump now.
Comment 3 Hampton Finger 2020-04-29 01:26:26 UTC
Created attachment 213898
Core Dump of /sbin/dump from FreeBSD 12.1-RELEASE-p4 r360175

The file system is UFS and is mounted at /usr. dump fails and dumps core at 40% every time.  This started out of the blue.  fsck says the file system is fine.
Comment 4 John Marshall 2020-06-09 01:36:17 UTC
"Me too"

FreeBSD rwsrv08.gfn.riverwillow.net.au 12.1-RELEASE-p5 FreeBSD 12.1-RELEASE-p5 #0 r361272: Thu May 21 06:07:33 AEST 2020     john@rwsrv08.gfn.riverwillow.net.au:/build/obj/john/kits/src/amd64.amd64/sys/RWSRV08  amd64

+ dump -0aLu -C 256 -f /ubackup/nfsdata.dmp /dev/gpt/NFSDATA
  DUMP: Date of this level 0 dump: Tue Jun  9 00:28:51 2020
  DUMP: Date of last level 0 dump: the epoch
  DUMP: Dumping snapshot of /dev/gpt/NFSDATA (/nfsdata) to /ubackup/nfsdata.dmp
  DUMP: mapping (Pass I) [regular files]
  DUMP: Cache 256 MB, blocksize = 65536
  DUMP: mapping (Pass II) [directories]
  DUMP: estimated 27202416 tape blocks.
  DUMP: dumping (Pass III) [directories]
  DUMP: dumping (Pass IV) [regular files]
  DUMP: 36.85% done, finished in 0:08 at Tue Jun  9 00:42:26 2020
  DUMP: 74.56% done, finished in 0:03 at Tue Jun  9 00:42:16 2020
Assertion failed: (spcl.c_count + blks < TP_NINDIR), function appendextdata, file /kits/src/sbin/dump/traverse.c, line 759.
  DUMP: Child 11854 returns LOB status 206

+ dump -0aLu -C 256 -f /ubackup/rw2.dmp /dev/gpt/RW2
  DUMP: Date of this level 0 dump: Tue Jun  9 01:39:07 2020
  DUMP: Date of last level 0 dump: the epoch
  DUMP: Dumping snapshot of /dev/gpt/RW2 (/rw2) to /ubackup/rw2.dmp
  DUMP: mapping (Pass I) [regular files]
  DUMP: Cache 256 MB, blocksize = 65536
  DUMP: mapping (Pass II) [directories]
  DUMP: estimated 18888847 tape blocks.
  DUMP: dumping (Pass III) [directories]
  DUMP: dumping (Pass IV) [regular files]
Assertion failed: (spcl.c_count + blks < TP_NINDIR), function appendextdata, file /kits/src/sbin/dump/traverse.c, line 759.
  DUMP: Child 14199 returns LOB status 206

rwsrv08> df -ht ufs /nfsdata /rw2 /ubackup
Filesystem          Size    Used   Avail Capacity  Mounted on
/dev/gpt/NFSDATA     62G     26G     31G    46%    /nfsdata
/dev/gpt/RW2         22G     18G    1.7G    91%    /rw2
/dev/da0p8          310G    197G     88G    69%    /ubackup
Comment 5 kailockwood 2020-07-19 08:42:18 UTC
I would also like to report having this issue on a 12.1 system. I am attempting to dump a large single root partition to an external USB drive:

dump -C16 -b64 -0uanL -h0 -f /backup/root.dump /

and at some point the dump fails with 

Assertion failed: (spcl.c_count + blks < TP_NINDIR), function appendextdata, file /usr/src/sbin/dump/traverse.c, line 759.
  DUMP: Child 45872 returns LOB status 206
Comment 6 bob 2020-09-27 12:30:55 UTC
I am having this issue on 12.1-RELEASE-p10. It previously happened on 12.1-RELEASE-p3.

The problem first appeared about three weeks ago, and there had been no updates to the system at that time. It was on 12.1-RELEASE-p3, and the update to 12.1-RELEASE-p10 did not fix it.

The file system was 320GB in size. It didn't need to be that big, so it was reduced to 50GB with the same contents copied over via rsync. The problem is still there.

The message is subtly different, although that may not be relevant:

04:00:49   DUMP: Date of this level 9 dump: Fri Sep 25 04:01:02 2020
04:00:49   DUMP: Date of last level 0 dump: Sat Sep 12 12:19:28 2020
04:00:49   DUMP: Dumping snapshot of /dev/mirror/dxhome (/data/export/home) to standard output
04:00:49   DUMP: mapping (Pass I) [regular files]
04:00:49   DUMP: mapping (Pass II) [directories]
04:00:49   DUMP: estimated 330684 tape blocks.
04:00:49   DUMP: dumping (Pass III) [directories]
04:00:49   DUMP: dumping (Pass IV) [regular files]
04:00:49 Assertion failed: (spcl.c_count + blks < TP_NINDIR), function appendextdata, file /usr/src/sbin/dump/traverse.c, line 759.
04:00:49   DUMP: Child 54548 returns LOB status 206

This is a showstopper.