Bug 194479 - [ffs] many file i/o operations hanging: softdepflush, suspfs
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 10.0-RELEASE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-fs (Nobody)
Reported: 2014-10-20 09:36 UTC by Kalten
Modified: 2025-03-04 23:04 UTC (History)
Attachments
boot and fsck (4.53 KB, text/plain)
2014-10-20 13:32 UTC, Kalten

Description Kalten 2014-10-20 09:36:42 UTC
I think this problem may have to do with the kernel.

(Maybe the closed Bug 149022 is related?)

I was building packages in poudriere when I noticed that write operations in emacs hung. Audio files are still being read from disk (musicpd running), but from a different mount point (/usr) than the one poudriere resides on (/MOUNTER/ufs_D2, which is at 93% according to df(1)).
An ls(1) on poudriere's mount point (/MOUNTER/ufs_D2) has just hung too.

I had a look at poudriere's shell, and it is stuck at
---8<---
====>> Starting/Cloning builders
--->8---

Hitting ^t leads to:
---8<---
load: 0.16  cmd: sh 98249 [suspfs] 755.26r 0.00u 0.00s 0% 784k
--->8---
According to ps(1), PID 98249 belongs to /usr/local/share/poudriere/bulk.sh
(the ls mentioned above is in the same state: [suspfs])

ps auxww | grep -E "(^USER|suspfs|softdepflush|dup)"
---8<---
USER         PID  %CPU %MEM     VSZ     RSS TT  STAT STARTED        TIME COMMAND
root          19   1.2  0.0       0      16  -  DL    9Oct14     0:54.63 [softdepflush]
root       97052   0.0  0.0   14756    2020 25  IN+  10:30AM     0:00.96 cpdup -x /usr/local/poudriere/data/build/100amd64-default/ref /MOUNTER/ufs_D2/poudriere/poudriere/data/build/100amd64-default/02
root       97053   0.0  0.0   14756    1964 25  DN+  10:30AM     0:00.84 cpdup -x /usr/local/poudriere/data/build/100amd64-default/ref /MOUNTER/ufs_D2/poudriere/poudriere/data/build/100amd64-default/03
root       97054   0.0  0.0   14756    1936 25  IN+  10:30AM     0:00.84 cpdup -x /usr/local/poudriere/data/build/100amd64-default/ref /MOUNTER/ufs_D2/poudriere/poudriere/data/build/100amd64-default/01
--->8---
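
To list every process blocked on one wait channel in a single pass, the ps output can be filtered on its wait-channel column. This is a minimal sketch, assuming the column layout produced by `ps -axo pid,state,wchan,command`; the helper name `filter_wchan` is hypothetical and not part of any tool mentioned in this report.

```shell
# Hypothetical helper: reads ps output on stdin and prints the header line
# plus every row whose third column (the wait channel) equals $1.
# Intended input, assuming this column order:
#   ps -axo pid,state,wchan,command | filter_wchan suspfs
filter_wchan() {
    awk -v w="$1" 'NR == 1 || $3 == w'
}
```

On FreeBSD, `procstat -kk <pid>` additionally prints the kernel stack of each thread of a process, which is typically useful to attach when reporting a hang of this kind.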

After pressing “m”, top shows:
---8<---
PID USERNAME     VCSW  IVCSW   READ  WRITE  FAULT  TOTAL PERCENT COMMAND
 19 root           28      0      0     24      0     24  92.31% softdepflush
--->8---

tunefs -p /MOUNTER/ufs_D2
---8<---
tunefs: POSIX.1e ACLs: (-a)                                disabled
tunefs: NFSv4 ACLs: (-N)                                   enabled
tunefs: MAC multilabel: (-l)                               enabled
tunefs: soft updates: (-n)                                 enabled
tunefs: soft update journaling: (-j)                       enabled
tunefs: gjournal: (-J)                                     disabled
tunefs: trim: (-t)                                         disabled
tunefs: maximum blocks per file in a cylinder group: (-e)  2048
tunefs: average file size: (-f)                            16384
tunefs: average number of files in a directory: (-s)       64
tunefs: minimum percentage of free space: (-m)             8%
tunefs: space to hold for metadata blocks: (-k)            0
tunefs: optimization preference: (-o)                      time
tunefs: volume label: (-L)                                 ufsD2
--->8---

sync(8) does not help either. I cannot kill the cpdup commands, etc.
The only option left is a reboot. :-(

ru,
 Kalten
Comment 1 Kalten 2014-10-20 09:38:53 UTC
Sorry, I forgot to include my uname(1) output:

uname -a
---8<---
FreeBSD freeHugin.Walhalla.Leben 10.0-RELEASE-p9 FreeBSD 10.0-RELEASE-p9 #0: Mon Sep 15 14:35:52 UTC 2014     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64
--->8---

ru,
 Kalten
Comment 2 Oleg Ginzburg 2014-10-20 10:20:11 UTC
I see the same behavior on 11.0-CURRENT/amd64 r273159M: the system hangs (in suspfs state) during active disk operations (compiling and tar -cfz).

root@gizmo:~ # tunefs -p /
tunefs: POSIX.1e ACLs: (-a)                                disabled
tunefs: NFSv4 ACLs: (-N)                                   disabled
tunefs: MAC multilabel: (-l)                               disabled
tunefs: soft updates: (-n)                                 enabled
tunefs: soft update journaling: (-j)                       enabled
tunefs: gjournal: (-J)                                     disabled
tunefs: trim: (-t)                                         disabled
tunefs: maximum blocks per file in a cylinder group: (-e)  4096
tunefs: average file size: (-f)                            16384
tunefs: average number of files in a directory: (-s)       64
tunefs: minimum percentage of free space: (-m)             8%
tunefs: space to hold for metadata blocks: (-k)            6408
tunefs: optimization preference: (-o)                      time
tunefs: volume label: (-L)
Comment 3 Kalten 2014-10-20 13:32:21 UTC
Created attachment 148501
boot and fsck

Well: rebooting took quite some work. :-(

At shutdown the system got to
---8<---
init: some processes would not die; ps axl advised
Syncing disks, vnodes remaining […] 0 0 0 0 0 done
All buffers synced.
--->8---
and then it hung; I had to switch the power off.

Now to the boot thereafter:
In this attachment (fsck.txt) you can read what I transcribed from the photos I took (I should have remounted rw and copied at least dmesg there, but I did not ;-)).
(I have separated the following points by lines of ----------- in the file)
1) The boot itself. (terrible errors)
2) »fsck -y /«
3) »fsck -y /var«
4) »fsck -y /usr«
5) »fsck -y /MOUNTER/ufs_D2«
They all differ a little bit in what went wrong.

I have turned “soft update journaling” off for now (»tunefs -j disable /« etc.).
I have not deleted the ».sujournal« files yet, in case you would like me to attach them to this report for debugging purposes.
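
To confirm the journaling state after such a change, the relevant line of the `tunefs -p` listing can be picked out. This is a minimal sketch, assuming the output format shown earlier in this report; the helper name `suj_state` is hypothetical, and the `2>&1` in the usage comment assumes tunefs writes the listing to standard error (suggested by the `tunefs:` prefix on every line).

```shell
# Hypothetical helper: reads a `tunefs -p` listing on stdin and prints the
# last field ("enabled" or "disabled") of the soft update journaling line.
# Intended use, assuming the listing goes to stderr:
#   tunefs -p /MOUNTER/ufs_D2 2>&1 | suj_state
suj_state() {
    awk '/soft update journaling:/ { print $NF }'
}
```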

ru,
 Kalten
Comment 4 Kirk McKusick freebsd_committer freebsd_triage 2025-03-04 23:04:29 UTC
Sorry that this did not get dealt with back when you submitted it.

This problem was most likely fixed by this commit:

commit 243a0eda9ace2f4d9cdd5291c352816ddc9ebdb2
Author: Kirk McKusick <mckusick@FreeBSD.org>
Date:   Fri Oct 21 11:00:00 2022 -0700

Increase the maximum size of the journaled soft-updates journal.
    
The size of the journaled soft-updates journal should be big enough to hold two minutes of filesystem metadata-update activity. The maximum size of the soft updates journal was set in the 1990s. At the time, it was assumed that disk arrays would top out at 16 drives and disk writes per drive would top out at 500 per second. Today's I/O subsystems are considerably bigger and faster than those limits.  Thus, this delta removes the hard upper limit and lets tunefs(8) and newfs(8) set the upper bound based on the size of the filesystem and its cylinder groups.
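
The sizing assumption in the commit message can be worked through as simple arithmetic. This is a minimal sketch using only the figures quoted above (16 drives, 500 writes per second per drive, a two-minute window); the function name `journal_window_writes` is hypothetical.

```shell
# Hypothetical helper: number of disk writes the journal must cover.
# Arguments: drives, writes per second per drive, window length in seconds.
journal_window_writes() {
    echo $(( $1 * $2 * $3 ))
}

# 16 drives, 500 writes/s each, 120 seconds (two minutes)
journal_window_writes 16 500 120   # prints 960000
```

Under the old assumptions that is 960,000 writes per two-minute window. The corresponding journal size in bytes depends on the journal record size, which the commit message does not state.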