Bug 94769

Summary: [ufs] Multiple file deletions on multi-snapshotted filesystems cause hang
Product: Base System Reporter: John Kozubik <john>
Component: kern Assignee: freebsd-bugs mailing list <bugs>
Status: Open ---    
Severity: Affects Only Me CC: chris
Priority: Normal    
Version: 6.1-BETA4   
Hardware: Any   
OS: Any   

Description John Kozubik 2006-03-21 07:10:15 UTC
If you have a UFS2 filesystem with multiple snapshots on it, and commence
several processes of rapid file deletions on that filesystem, the system
will hang.

I first produced this problem on a system that had a filesystem with six
snapshots enabled on it.  I proceeded to start three concurrent `rm`
processes of directories with large numbers of files in them (on that
filesystem).  The system hung as a result.

Further testing showed that in the exact same environment, three such
`rm` processes did not hang a system when the filesystem in question only
had two snapshots.  I incremented the number of snapshots to 3, and the
system hung when the same three `rm` processes were performed.

It is possible that any combination of a large number of snapshots and a
large number of intensive file deletions on the snapshotted filesystem
will cause the system to hang.  For instance, one snapshot with ten file
deletion processes, or ten snapshots with two file deletion processes,
might also trigger it.  I have not tested any combinations other than the
ones listed above.

I am highly confident that this behavior exists and is reproducible.
I have reproduced this behavior on both 6.0 and 6.1.

Fix: 

Do not use UFS2 snapshots.  It is unreasonable to assume that concurrent
rapid file deletions will not occur on a filesystem that has snapshots on it.
How-To-Repeat: 
(assume a filesystem mounted on /mnt/data1)

cp -R /usr/src /mnt/data1/test1
cp -R /usr/src /mnt/data1/test2
cp -R /usr/src /mnt/data1/test3

mksnap_ffs /mnt/data1 /mnt/data1/.sync/snap1
mksnap_ffs /mnt/data1 /mnt/data1/.sync/snap2
mksnap_ffs /mnt/data1 /mnt/data1/.sync/snap3

rm -rf /mnt/data1/test1 &
rm -rf /mnt/data1/test2 &
rm -rf /mnt/data1/test3 &

(system will hang)
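
To try other snapshot/deletion combinations than the ones tested above,
the steps can be generated by a small helper.  This is only a sketch: the
function name gen_repro and its arguments are made up for illustration,
and it prints the commands rather than running them, so they can be
reviewed first.

```shell
#!/bin/sh
# Print (do not run) the reproduction commands for a chosen number of
# snapshots and concurrent deletions.  gen_repro is a hypothetical helper,
# not part of FreeBSD.
# Usage: gen_repro MOUNTPOINT NSNAPS NRMS
gen_repro() {
    fs=$1 nsnaps=$2 nrms=$3

    # One copy of the source tree per deletion process.
    i=1
    while [ "$i" -le "$nrms" ]; do
        echo "cp -R /usr/src $fs/test$i"
        i=$((i + 1))
    done

    # Take the snapshots once the data is in place.
    i=1
    while [ "$i" -le "$nsnaps" ]; do
        echo "mksnap_ffs $fs $fs/.sync/snap$i"
        i=$((i + 1))
    done

    # Start all deletions in the background.
    i=1
    while [ "$i" -le "$nrms" ]; do
        echo "rm -rf $fs/test$i &"
        i=$((i + 1))
    done
}
```

For example, `gen_repro /mnt/data1 3 3` prints the same nine commands
listed above.  Only pipe the output to sh on a machine that is allowed to
hang.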
Comment 1 Bruce Cran freebsd_committer 2009-03-28 23:32:14 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-fs

Over to maintainer(s).
Comment 2 Peter Holm freebsd_committer 2009-04-02 16:51:48 UTC
With the described test scenario I was able to reproduce a deadlock
consistently on HEAD. The time of the deadlock, however, seems to be a
bit different from that described in the PR. On HEAD the deadlock
occurs during deletion of the snapshot files.

http://people.freebsd.org/~pho/stress/log/pr-94769.txt

- Peter
Comment 3 Doug Poland 2009-05-04 21:11:33 UTC
I recently started experiencing similar behavior on a 7.1-RELEASE i386
host.  In this case the deadlock occurs as soon as
sysutils/freebsd-snapshots runs its weekly cron entry:

0   *   *   *   *   root    periodic-snapshot hourly
0   0   *   *   *   root    periodic-snapshot daily
0   0   *   *   0   root    periodic-snapshot weekly

I am running a GENERIC kernel without quotas.

Here's the list of snapshots on the various filesystems:

Filesystem          User   User%     Snap   Snap%  Snapshot
/                  267MB   54.0%    704KB    0.1%  daily.0
/                  267MB   54.0%    704KB    0.1%  daily.1
/                  267MB   54.0%    704KB    0.1%  daily.2
/                  267MB   54.0%    704KB    0.1%  daily.3
/                  267MB   54.0%    704KB    0.1%  daily.4
/                  267MB   54.0%    784KB    0.2%  daily.5
/                  267MB   54.0%    784KB    0.2%  daily.6
/                  267MB   54.0%    608KB    0.1%  hourly.0
/                  267MB   54.0%    640KB    0.1%  hourly.1
/                  267MB   54.0%    656KB    0.1%  hourly.2
/                  267MB   54.0%    656KB    0.1%  hourly.3
/                  267MB   54.0%    704KB    0.1%  weekly.0
/                  267MB   54.0%    896KB    0.2%  weekly.1
/                  267MB   54.0%    928KB    0.2%  weekly.2
/                  267MB   54.0%      1MB    0.2%  weekly.3
/var              1237MB   10.4%     16MB    0.1%  daily.0
/var              1237MB   10.4%     17MB    0.1%  daily.1
/var              1237MB   10.4%     30MB    0.3%  daily.2
/var              1237MB   10.4%     32MB    0.3%  daily.3
/var              1237MB   10.4%     32MB    0.3%  daily.4
/var              1237MB   10.4%     33MB    0.3%  daily.5
/var              1237MB   10.4%    141MB    1.2%  daily.6
/var              1237MB   10.4%      7MB    0.1%  hourly.0
/var              1237MB   10.4%      7MB    0.1%  hourly.1
/var              1237MB   10.4%      7MB    0.1%  hourly.2
/var              1237MB   10.4%      8MB    0.1%  hourly.3
/var              1237MB   10.4%      8MB    0.1%  hourly.4
/var              1237MB   10.4%      8MB    0.1%  hourly.5
/var              1237MB   10.4%      8MB    0.1%  hourly.6
/var              1237MB   10.4%      8MB    0.1%  hourly.7
/var              1237MB   10.4%      8MB    0.1%  hourly.8
/var              1237MB   10.4%      8MB    0.1%  hourly.9
/var              1237MB   10.4%     25MB    0.2%  weekly.0
/usr             87545MB   74.1%    138MB    0.1%  daily.0
/usr             87545MB   74.1%    217MB    0.2%  daily.1
/usr             87545MB   74.1%    234MB    0.2%  daily.2
/usr             87545MB   74.1%    287MB    0.2%  daily.3
/usr             87545MB   74.1%    292MB    0.2%  daily.4
/usr             87545MB   74.1%    278MB    0.2%  daily.5
/usr             87545MB   74.1%    349MB    0.3%  daily.6
/usr             87545MB   74.1%     88MB    0.1%  hourly.0
/usr             87545MB   74.1%    200MB    0.2%  hourly.1
/usr             87545MB   74.1%    864MB    0.7%  weekly.0
/usr             87545MB   74.1%    721MB    0.6%  weekly.1
/usr             87545MB   74.1%    782MB    0.7%  weekly.2
/usr             87545MB   74.1%   4772MB    4.0%  weekly.3

I am eager and willing to assist in any way to investigate and eliminate
this issue.


-- 
Regards,
Doug
Comment 4 Eitan Adler freebsd_committer freebsd_triage 2017-12-31 07:58:51 UTC
For bugs matching the following criteria:

Status: In Progress Changed: (is less than) 2014-06-01

Reset to default assignee and clear in-progress tags.

Mail being skipped