Bug 224292 - processes are hanging in state ufs / possible deadlock in file system
Summary: processes are hanging in state ufs / possible deadlock in file system
Status: Closed Overcome By Events
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: CURRENT
Hardware: Any Any
: --- Affects Only Me
Assignee: freebsd-fs (Nobody)
URL:
Keywords:
Depends on:
Blocks: 104406 223491
  Show dependency treegraph
 
Reported: 2017-12-12 19:02 UTC by Wolfram Schneider
Modified: 2021-06-07 02:54 UTC (History)
11 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Wolfram Schneider freebsd_committer 2017-12-12 19:02:29 UTC
A friend and I are testing perl modules on FreeBSD, on all the major versions (12-current, 11-stable, 11-RELEASE). The tests are done in a cheap VM (1GB RAM, 40GB SSD, 1 core), apparently running on virtio/kvm.

While testing we noticed that the VM randomly crashes every 1-3 days, and we had to run a hard fsck on the console. I first thought it is a resource problem, but munin doesn’t show any shortage of RAM, disk or file descriptors.

We find out that the crashes occurred shortly after processes hanging in state “ufs”, as shown in top(1) output:

$ top -b 300 | egrep ufs
73278 root          1  31    0  8364K  2340K ufs      0:02  10.35% find
34548 cpansand      1  52   10 43828K 15348K ufs      0:00   0.00% perl
34824 cpansand      1  20    0 21060K  1960K ufs      0:00   0.00% procstat
34557 cpansand      1  52   10 13180K  2112K ufs      0:00   0.00% sh


$ ps xuawww | egrep 73278  
root       73278   0.0  0.2   8364   2340  -  D    03:01       0:02.25 find -sx / /dev/null ( ! -fstype local ) -prune -o -type f ( -perm -u+x -or -perm -g+x -or -perm -o+x ) ( -perm -u+s -or -perm -g+s ) -exec ls -liTd {} +
root       73667   0.0  0.0    412    312  8  R+   03:08       0:00.00 egrep 73278

$ ps xuawww | egrep 34548  
cpansand   34548   0.0  1.5  43828  15348  4  DN+  15:29       0:00.10 /usr/perl5.26.1p/bin/perl t/test_command.t


Another example:

$ lsof  | egrep 12710
perl5.26. 12710   cpansand  cwd     VDIR                    18446735278340920664    0 / -- home/cpansand/.cpan/build/2017112212/Fsdb-2.64-1
perl5.26. 12710   cpansand  rtd     VDIR                     7507057954083258955    0 / (/dev/vtbd0p3)
perl5.26. 12710   cpansand    0u    VCHR               0,90              0t20972   90 /dev/pts/0
perl5.26. 12710   cpansand    1u    PIPE 0xfffff8000399f168                    0      ->0xfffff8000399f000
perl5.26. 12710   cpansand    2u    PIPE 0xfffff8001c30da50                    0      ->0xfffff8001c30d8e8
perl5.26. 12710   cpansand    3u    PIPE 0xfffff8001c30ebe0                16384      ->0xfffff8001c30ed48
perl5.26. 12710   cpansand    5u    PIPE 0xfffff8000399f168                    0      ->0xfffff8000399f000
perl5.26. 12710   cpansand    6u    PIPE 0xfffff8001c30da50                    0      ->0xfffff8001c30d8e8
perl5.26. 12710   cpansand    7u    PIPE 0xfffff8000399f168                    0      ->0xfffff8000399f000
perl5.26. 12710   cpansand    8u    PIPE 0xfffff8001c30da50                    0      ->0xfffff8001c30d8e8


$ top -b 300 | egrep ufs
12358 cpansand      1  20   10   268M 12980K ufs      0:04   0.00% perl
12710 cpansand      1  20   10 21412K 13808K ufs      0:00   0.00% perl5.26.0
56111 root          1  20    0  7752K  3180K ufs      0:00   0.00% bash
56077 root          1  52    0  6332K  2312K ufs      0:00   0.00% find
12719 cpansand      1  20   10  7060K  1472K ufs      0:00   0.00% sh


$ lsof | egrep '12710|12358' | egrep home
perl      12358   cpansand  cwd     VDIR                    18446735278340920664    0 / -- home/cpansand/.cpan/build/2017112212/Fsdb-2.64-1
perl5.26. 12710   cpansand  cwd     VDIR                    18446735278340920664    0 / -- home/cpansand/.cpan/build/2017112212/Fsdb-2.64-1


Perl often hangs in a directory during a build of a CPAN module. If I try to switch to this directory, my current shell will hang too:

$ cd /home/cpansand/.cpan/build/2017112212/Fsdb-2.64-1
[hangs]

$ find /home/cpansand/
/home/cpansand/
/home/cpansand/.login_conf
/home/cpansand/.cshrc
/home/cpansand/.rhosts
/home/cpansand/.shrc
/home/cpansand/.mail_aliases
/home/cpansand/.profile
/home/cpansand/.mailrc
/home/cpansand/.login
/home/cpansand/.ssh
/home/cpansand/.ssh/authorized_keys
/home/cpansand/.ssh/known_hosts
/home/cpansand/.zshrc
/home/cpansand/bin
/home/cpansand/src
/home/cpansand/.cpan
/home/cpansand/.cpan/build
/home/cpansand/.cpan/build/2017112215
/home/cpansand/.cpan/build/2017112212
[ hangs]


$ top -b 300 | egrep ufs
12358 cpansand      1  20   10   268M 12980K ufs      0:04   0.00% perl
12710 cpansand      1  20   10 21412K 13808K ufs      0:00   0.00% perl5.26.0
56111 root          1  20    0  7752K  3180K ufs      0:00   0.00% bash
56077 root          1  52    0  6332K  2312K ufs      0:00   0.00% find
57160 root          1  20    0  6332K  1792K ufs      0:00   0.00% find
57169 root          1  20    0  6332K  1792K ufs      0:00   0.00% find
12719 cpansand      1  20   10  7060K  1472K ufs      0:00   0.00% sh


$ pstree
|-+= 00675 root /usr/sbin/cron -s
 | \-+- 55939 root cron: running job (cron)
 |   \-+= 55941 root /bin/sh - /usr/sbin/periodic daily
 |     \-+- 55943 root lockf -t 0 /var/run/periodic.daily.lock /bin/sh /usr/sbin/periodic LOCKED daily
 |       \-+- 55944 root /bin/sh /usr/sbin/periodic LOCKED daily
 |         |-+- 55953 root /bin/sh /usr/sbin/periodic LOCKED daily
 |         | \-+- 56059 root /bin/sh /etc/periodic/daily/450.status-security
 |         |   \-+- 56060 root /bin/sh - /usr/sbin/periodic security
 |         |     \-+- 56062 root lockf -t 0 /var/run/periodic.security.lock /bin/sh /usr/sbin/periodic LOCKED security
 |         |       \-+- 56063 root /bin/sh /usr/sbin/periodic LOCKED security
 |         |         |-+- 56071 root /bin/sh /usr/sbin/periodic LOCKED security
 |         |         | \-+- 56073 root /bin/sh - /etc/periodic/security/100.chksetuid
 |         |         |   |--- 56077 root find -sx / /dev/null ( ! -fstype local ) -prune -o -type f ( -perm -u+x -or -perm -g+x -or -perm -o+x ) ( -perm -u+s -or -perm -g+s ) -exec ls -liTd {} +
 |         |         |   \-+- 56078 root /bin/sh - /etc/periodic/security/100.chksetuid
 |         |         |     \--- 56080 root cat


     \--= 12358 cpansand /usr/perl5.26.0p/bin/perl5.26.0 (CPAN.pm test J/JO/JOHNH/Fsdb-2.64.tar.gz) (perl)

 |-+- 12710 cpansand /usr/perl5.26.0p/bin/perl5.26.0 t/test_command.t
 | \--- 12719 cpansand sh -c blib/script/cgi_to_db TEST/cgi_to_db_ex.in TEST/cgi_to_db_ex.in-2  < TEST/cgi_to_db_ex.in 2>&1



root       56077   0.0  0.2   6332   1936  -  D    03:01      0:00.02 find -sx / /dev/null ( ! -fstype local ) -prune -o -type f ( -perm -u+x -or -perm -g+x -or -perm -o+x ) ( -perm -u+s -or -perm -g+s ) -exec ls -liTd {} +

cpansand   12710   0.0  1.4  21412 13780  0  DN+  14:11      0:00.10 /usr/perl5.26.0p/bin/perl5.26.0 t/test_command.t


$ ps -o wchan -auxgww | grep ufs    
-        root          19   0.0  0.0      0    16  -  DL   Wed02      0:03.12 [bufspacedaemon]
ufs      root       56077   0.0  0.2   6332  1936  -  D    03:01      0:00.02 find -sx / /dev/null ( ! -fstype local ) -prune -o -type f ( -perm -u+x -or -perm -g+x -or -perm -o+x ) ( -perm -u+s -or -perm -g+s ) -exec ls -liTd {} +
ufs      cpansand   12358   0.0  0.7 274408  7280  0  DNs+ 14:11      0:03.89 /usr/perl5.26.0p/bin/perl5.26.0 (CPAN.pm test J/JO/JOHNH/Fsdb-2.64.tar.gz) (perl)
ufs      cpansand   12710   0.0  1.4  21412 13780  0  DN+  14:11      0:00.10 /usr/perl5.26.0p/bin/perl5.26.0 t/test_command.t
ufs      cpansand   12719   0.0  0.1   7060  1444  0  DN+  14:11      0:00.00 sh -c blib/script/cgi_to_db TEST/cgi_to_db_ex.in TEST/cgi_to_db_ex.in-2  < TEST/cgi_to_db_ex.in 2>&1
-        root       78756   0.0  0.0    412   304  7  R+   04:18      0:00.00 grep ufs
ufs      root       56111   0.0  0.3   7752  2688  8  Ds+  03:03      0:00.02 -bash (bash)
ufs      root       57160   0.0  0.1   6332  1416 10  D+   03:15      0:00.00 find .
ufs      root       57169   0.0  0.1   6332  1416 11  D+   03:15      0:00.00 find /home/cpansand/


The perl interpreter usually crashed while testing the CPAN module J/JO/JOHNH/Fsdb-2.64.tar.gz in test t/test_command.t
But I think this is a red herring. The module just runs fine if you test only this module. It must be a combination of other, earlier test which finally bring Fsdb-2.64.tar.gz down.

We saw this error on FreeBSD 9.x, 10.3, 11.1 Release, 11-stable, 12-current. 

My gut feeling is that we have a deadlock or forgotten lock of a directory/file in the ufs code. Some kernel thread forgot to release a lock. Any other process which need to open this directory will hang, and soon or later (2-4 hours) the machine crashes hard.

Everytime the machine crashed we had to run fsck twice, repair inodes and the superblock on the console manually. This is not fun.

We saw the errors at different cloud providers. I’m pretty sure it is not a bad setup, it is a general problem of the FreeBSD kernel/ufs filesystem.
Comment 1 Wolfram Schneider freebsd_committer 2017-12-12 19:05:00 UTC
I googled for similar bug reports and found this:

from 2007, for FreeBSD 6.x


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=104406
kern/104406: Processes get stuck in "ufs" state under persistent 	CPU load

https://forums.freebsd.org/threads/9150/
We have a problem with FreeBSD 6.2-STABLE (aw62) #2: Fri Jun 22 02:26:27 MSD 2007


some reports about dirhash tuning:

https://forums.freebsd.org/threads/7794/

Try something like procstat -f -w 1 pid_of_httpd or run top -U www -m io to see disk activity. I ran into same, in our case it the solution was dirhashing:

sysctl -a | grep dirhash

will show you current state, and most likely increasing vfs.ufs.dirhash_maxmem will solve your problems.      

sysctl -a | grep mem


https://wiki.freebsd.org/WaitChannels
ufs: Waiting to read from a file or directory on a UFS filesystem that is being written to by another thread, or waiting to write to a local file or directory that is being read or written by another thread 


https://wiki.freebsd.org/DirhashDynamicMemory

https://forums.freebsd.org/threads/34023/

Since vfs.ufs.dirhash_lowmemcount is already at 25, you should consider increase vfs.ufs.dirhash_maxmem.
Comment 2 Kirk McKusick freebsd_committer 2017-12-12 19:30:45 UTC
It is possible that this patch will fix this problem:

r326657 | kib | 2017-12-07 01:05:34 -0800 (Thu, 07 Dec 2017) | 15 lines

Fix livelock in ufsdirhash_create().

When more than one thread enters ufsdirhash_create() for the same
directory and the inode dirhash is instantiated, but the dirhash' hash
is not, all of them lock the dirhash shared and then try to upgrade.
Since there are several threads owning the lock shared, upgrade fails
and the same attempt is repeated, ad infinitum.

To break the lockstep, lock the dirhash in exclusive mode after the
failed try-upgrade.

Reported and tested by: pho
Sponsored by:   Mellanox Technologies
MFC after:      1 week

It is also known that dirhash tuning as described by Wolfram Schneider above significantly mitigates this problem.
Comment 3 Wolfram Schneider freebsd_committer 2017-12-12 20:56:24 UTC
Hi Kirk,

thanks for the fast response. I can run the tests again and see if the machine survives the next 3 days.

I'm wondering if we have a simple test program to trigger this bug? 

Any suggestion to tune dirhash, to get the the broken lock faster?

my current setup:
sysctl  -a|egrep vfs.ufs.dirhash
vfs.ufs.dirhash_reclaimpercent: 10
vfs.ufs.dirhash_lowmemcount: 188
vfs.ufs.dirhash_docheck: 0
vfs.ufs.dirhash_mem: 510163
vfs.ufs.dirhash_maxmem: 2097152
vfs.ufs.dirhash_minsize: 2560
Comment 4 Kirk McKusick freebsd_committer 2017-12-12 21:26:43 UTC
(In reply to Wolfram Schneider from comment #3)
Unfortunately I do not know the correlation of tuning to triggering the livelock bug.
Comment 5 Peter Holm freebsd_committer 2017-12-12 21:37:56 UTC
(In reply to Kirk McKusick from comment #2)
I have booted a test box with amd64, 1 CPU and 1GB of RAM to see if I can reproduce the problem.

The issue fixed by r326657 looked like this:
https://people.freebsd.org/~pho/stress/log/vfs_cache.txt
Comment 6 Wolfram Schneider freebsd_committer 2017-12-15 15:08:49 UTC
The machine crashed again, with a kernel panic. Unfortunately, the kernel dump was not configured. I can access the kernel debug with a VNC web console, and take some screenshots.

You can see the screenshots here:
https://people.freebsd.org/~wosch/tmp/pr-224292/kernel-panic-deadlock-1/

actually, on
https://people.freebsd.org/~wosch/tmp/pr-224292/kernel-panic-deadlock-1/Screen%20Shot%202017-12-15%20at%2011.57.55.png

you can see on the top procstat, sh and perl5.26.1 are in state "ufs"

and here is the perl process pid 28098 with sys_openat()
https://people.freebsd.org/~wosch/tmp/pr-224292/kernel-panic-deadlock-1/Screen%20Shot%202017-12-15%20at%2011.59.47.png
Comment 7 Wolfram Schneider freebsd_committer 2017-12-15 15:09:23 UTC
(In reply to Wolfram Schneider from comment #6)

I forgot to mention the commands I run on the kernel debugger:

show allpcpu
show alllocks
show lockedvnods
show mount
show page
ps
allt
Comment 8 Wolfram Schneider freebsd_committer 2017-12-16 09:54:51 UTC
(In reply to Wolfram Schneider from comment #6)

As fare we can see from the test logs, it crashed in the same perl module as before. I'm sorry to say, but r326657 didn't solved the issue. The witch is not dead.
Comment 9 Wolfram Schneider freebsd_committer 2017-12-17 10:26:12 UTC
Right after boot I see this messages

lock order reversal:
 1st 0xfffffe0000cbf2c0 bufwait (bufwait) @ /home/projects/freebsd/sys/kern/vfs_bio.c:3564
 2nd 0xfffff8001c445800 dirhash (dirhash) @ /home/projects/freebsd/sys/ufs/ufs/ufs_dirhash.c:289
stack backtrace:
#0 0xffffffff80ac5a23 at witness_debugger+0x73
#1 0xffffffff80ac58a2 at witness_checkorder+0xe02
#2 0xffffffff80a6a5f8 at _sx_xlock+0x68
#3 0xffffffff80d6056d at ufsdirhash_add+0x3d
#4 0xffffffff80d633f9 at ufs_direnter+0x459
#5 0xffffffff80d6b7a7 at ufs_mkdir+0x8d7
#6 0xffffffff8108a889 at VOP_MKDIR_APV+0xd9
#7 0xffffffff80b3ddc2 at kern_mkdirat+0x1e2
#8 0xffffffff80f0dd0b at amd64_syscall+0x79b
#9 0xffffffff80eec33b at Xfast_syscall+0xfb

and hours later:
lock order reversal:
 1st 0xfffff80017613068 ufs (ufs) @ /home/projects/freebsd/sys/kern/vfs_subr.c:2608
 2nd 0xfffffe0000f01100 bufwait (bufwait) @ /home/projects/freebsd/sys/ufs/ffs/ffs_vnops.c:282
 3rd 0xfffff8001c9985f0 ufs (ufs) @ /home/projects/freebsd/sys/kern/vfs_subr.c:2608
stack backtrace:
#0 0xffffffff80ac5a23 at witness_debugger+0x73
#1 0xffffffff80ac58a2 at witness_checkorder+0xe02
#2 0xffffffff80a37f63 at __lockmgr_args+0x883
#3 0xffffffff80d5b295 at ffs_lock+0xa5
#4 0xffffffff8108b309 at VOP_LOCK1_APV+0xd9
#5 0xffffffff80b40c86 at _vn_lock+0x66
#6 0xffffffff80b2f6f2 at vget+0x82
#7 0xffffffff80b21651 at vfs_hash_get+0xd1
#8 0xffffffff80d56f5f at ffs_vgetf+0x3f
#9 0xffffffff80d4ce3f at softdep_sync_buf+0x54f
#10 0xffffffff80d5c0f4 at ffs_syncvnode+0x324
#11 0xffffffff80d4c0d8 at softdep_fsync+0x4c8
#12 0xffffffff80d5b16c at ffs_fsync+0x7c
#13 0xffffffff8108a279 at VOP_FSYNC_APV+0xd9
#14 0xffffffff80b3d5d6 at kern_fsync+0x1b6
#15 0xffffffff80f0dd0b at amd64_syscall+0x79b
#16 0xffffffff80eec33b at Xfast_syscall+0xfb
Comment 10 Kirk McKusick freebsd_committer 2017-12-18 00:13:07 UTC
(In reply to Wolfram Schneider from comment #9)
These particular LOR messages are harmless, see discussion of them in https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224176
Comment 11 Eitan Adler freebsd_committer freebsd_triage 2018-05-28 19:50:15 UTC
batch change:

For bugs that match the following
-  Status Is In progress 
AND
- Untouched since 2018-01-01.
AND
- Affects Base System OR Documentation

DO:

Reset to open status.


Note:
I did a quick pass but if you are getting this email it might be worthwhile to double check to see if this bug ought to be closed.
Comment 12 Konstantin Belousov freebsd_committer 2018-05-29 15:04:55 UTC
(In reply to Wolfram Schneider from comment #8)
Look at the https://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html and gather the information as explained there.

Do you use SU+J ?  If yes, try to disable +J and see if it changes anything.
Comment 13 Ali Abdallah 2021-01-28 07:23:12 UTC
While upgrading my FreeBSD 13-CURRENT box yesterday, I ran into this issue, when compiling the kernel and then world with -j8, I was ending up always with all the cc processes stuck in biowr and ufs. At the end I've upgraded my box through NFS from my 13-CURRENT server (zfs based).

But when upgrading ports from the same server, using pkg upgrade -f, it was also getting stuck with the same symptoms. I've managed to take a kernel dump, then booted into single user mode, disabled  soft update journaling: (-j), then booted the system and upgraded my packages, all went ok.

Looking at the dump, I see lots of dd processes (launched by pkg I believe), that are in the following state:

read 345 (Thread 100752 (PID=32594: dd)):
#0  sched_switch (td=0xfffffe00fdd9be00, flags=<optimized out>) at /usr/src/sys/kern/sched_ule.c:2147
#1  0xffffffff803db4be in mi_switch (flags=flags@entry=260) at /usr/src/sys/kern/kern_synch.c:542
#2  0xffffffff8042a43d in sleepq_switch (wchan=wchan@entry=0x0, pri=pri@entry=2122752) at /usr/src/sys/kern/subr_sleepqueue.c:608
#3  0xffffffff8042a327 in sleepq_wait (wchan=<unavailable>, pri=<unavailable>) at /usr/src/sys/kern/subr_sleepqueue.c:659
#4  0xffffffff803a9f7f in sleeplk (lk=lk@entry=0xfffff8000535f810, flags=flags@entry=2122752, ilk=<optimized out>, ilk@entry=0xfffff8000535f840, wmesg=<optimized out>, wmesg@entry=0xffffffff806c8ccb "ufs", pri=<optimized out>, pri@entry=96, timo=timo@entry=51, queue=1) at /usr/src/sys/kern/kern_lock.c:301
#5  0xffffffff803a88e2 in lockmgr_slock_hard (lk=0xfffff8000535f810, flags=flags@entry=2122752, ilk=0xfffff8000535f840, ilk@entry=0xfffffe00fdaff658, file=<optimized out>, line=<optimized out>, lwa=lwa@entry=0x0) at /usr/src/sys/kern/kern_lock.c:696
#6  0xffffffff803a84e8 in lockmgr_lock_flags (lk=<optimized out>, lk@entry=0xfffff8000535f810, flags=flags@entry=2122752, ilk=ilk@entry=0xfffff8000535f840, file=<unavailable>, line=<unavailable>) at /usr/src/sys/kern/kern_lock.c:1056
#7  0xffffffff805c90e5 in ffs_lock (ap=0xfffffe00fdaff658, ap@entry=<error reading variable: value is not available>) at /usr/src/sys/ufs/ffs/ffs_vnops.c:494
#8  0xffffffff804be098 in VOP_LOCK1 (vp=0xfffff8000535f7a0, flags=2106368, file=0xffffffff806fac3e "/usr/src/sys/kern/vfs_lookup.c", line=877) at ./vnode_if.h:1096
#9  _vn_lock (vp=vp@entry=0xfffff8000535f7a0, flags=2106368, file=0xffffffff806fac3e "/usr/src/sys/kern/vfs_lookup.c", line=line@entry=877) at /usr/src/sys/kern/vfs_vnops.c:1741
#10 0xffffffff80497a18 in lookup (ndp=ndp@entry=0xfffffe00fdaff970) at /usr/src/sys/kern/vfs_lookup.c:875
#11 0xffffffff804973dd in namei (ndp=ndp@entry=0xfffffe00fdaff970) at /usr/src/sys/kern/vfs_lookup.c:631
--Type <RET> for more, q to quit, c to continue without paging--
#12 0xffffffff804bd590 in vn_open_cred (ndp=ndp@entry=0xfffffe00fdaff970, flagp=<optimized out>, flagp@entry=0xfffffe00fdaffa94, cmode=<optimized out>, vn_open_flags=vn_open_flags@entry=0, cred=<optimized out>, fp=0xfffff80003d49140) at /usr/src/sys/kern/vfs_vnops.c:252
#13 0xffffffff804bd44d in vn_open (ndp=<unavailable>, ndp@entry=0xfffffe00fdaff970, flagp=<unavailable>, flagp@entry=0xfffffe00fdaffa94, cmode=<unavailable>, fp=<unavailable>) at /usr/src/sys/kern/vfs_vnops.c:193
#14 0xffffffff804b3723 in kern_openat (td=0xfffffe00fdd9be00, fd=<optimized out>, path=<optimized out>, pathseg=<optimized out>, flags=1539, mode=<optimized out>) at /usr/src/sys/kern/vfs_syscalls.c:1143
#15 0xffffffff8068d5cc in syscallenter (td=0xfffffe00fdd9be00) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:189
#16 amd64_syscall (td=0xfffffe00fdd9be00, traced=0) at /usr/src/sys/amd64/amd64/trap.c:1156


I will enable back -j, and kernel deadlocks debugging features, to hopefully get more debugging information.
Comment 14 Konstantin Belousov freebsd_committer 2021-01-28 13:56:47 UTC
(In reply to Ali Abdallah from comment #13)
This is certainly not related to the bug you replied to.

That said, your issue _might_ be fixed by the patch series available at
https://kib.kiev.ua/git/gitweb.cgi?p=deviant3.git;a=shortlog;h=refs/heads/ufs_needsync
I.e. I am sufficiently curious if this patches from the ufs_needsync branch
make any difference for your setup.  Something similar was reported by Peter
Holm as well, I believe.

If not, I will look at your report after this branch lands.
You would need to follow the 'debugging deadlock' chapter from the
developers handbook to gather required information.  Or provide a reproduction
recipe to Peter Holm.
Comment 15 sigsys 2021-02-08 06:45:52 UTC
I'm getting occasional UFS "hangs" of sorts on a -CURRENT VM too. Must be unrelated to whatever the problem was in the original bug report but I figured I'd dump this here in case it could help figuring out more recent problems.

Last time it happened and that I saved some infos was from a little while ago (FreeBSD 13.0-CURRENT #32 main-c529869-g4f597837d531: Tue Jan 12 14:41:03 EST 2021). I'll update and try to get more infos if it happens again.

`pkg upgrade` and running kyua tests are what usually trigger it but it still happens very rarely.


load: 1.71  cmd: pkg 11799 [biowr] 331.89r 20.33u 66.48s 52% 764496k
mi_switch+0x155 sleepq_switch+0x109 _sleep+0x2b4 bufwait+0xc4 bufwrite+0x25a ffs_update+0x2ed ffs_syncvnode+0x4da ffs_fsync+0x1f softdep_prerename+0x21a ufs_rename+0x3ee VOP_RENAME_APV+0x40 kern_renameat+0x3fd amd64_syscall+0x149 fast_syscall_common+0xf8 

root@vm2:[~] # procstat -kk 11799
  PID    TID COMM                TDNAME              KSTACK                       
11799 100255 pkg                 -                   __mtx_lock_sleep+0xe8 __mtx_lock_flags+0xe5 process_worklist_item+0x63 softdep_prerename+0x4bd ufs_rename+0x3ee VOP_RENAME_APV+0x40 kern_renameat+0x3fd amd64_syscall+0x149 fast_syscall_common+0xf8 

root@vm2:[~] # procstat -kk 11799
  PID    TID COMM                TDNAME              KSTACK                       
11799 100255 pkg                 -                   mi_switch+0x155 sleepq_switch+0x109 _sleep+0x2b4 bufwait+0xc4 bufwrite+0x25a ffs_update+0x2ed ffs_syncvnode+0x4da ffs_fsync+0x1f softdep_prerename+0x21a ufs_rename+0x3ee VOP_RENAME_APV+0x40 kern_renameat+0x3fd amd64_syscall+0x149 fast_syscall_common+0xf8 

root@vm2:[~] # ps -lp 11799
UID   PID  PPID C PRI NI    VSZ    RSS MWCHAN STAT TT     TIME COMMAND
  0 11799 11798 4  52  0 828024 764512 -      R+    1  2:38.77 pkg upgrade

root@vm2:[~] # iostat -w 1 -d
           vtbd0            vtbd1            vtbd2 
 KB/t  tps  MB/s  KB/t  tps  MB/s  KB/t  tps  MB/s 
 24.4   10   0.2  33.3    2   0.1  38.2    0   0.0 
 32.0 25091 784.1   0.0    0   0.0   0.0    0   0.0 
 32.0 23305 728.3   0.0    0   0.0   0.0    0   0.0 
 32.0 23539 735.6   0.0    0   0.0   0.0    0   0.0 
 32.0 22151 692.2   0.0    0   0.0   0.0    0   0.0 
 32.0 19310 603.4   0.0    0   0.0   0.0    0   0.0 
 32.0 22848 714.0   0.0    0   0.0   0.0    0   0.0 
 32.0 24287 759.0   0.0    0   0.0   0.0    0   0.0 
 32.0 23392 731.0   0.0    0   0.0   0.0    0   0.0 
 32.0 24586 768.3   0.0    0   0.0   0.0    0   0.0 
 32.0 23980 749.4   0.0    0   0.0   0.0    0   0.0 
 32.0 23549 735.9   0.0    0   0.0   0.0    0   0.0 
 32.0 23328 729.0   0.0    0   0.0   0.0    0   0.0 
 32.0 23173 724.2   0.0    0   0.0   0.0    0   0.0 
 32.0 24906 778.3   0.0    0   0.0   0.0    0   0.0 
 32.0 23534 735.4   0.0    0   0.0   0.0    0   0.0 
 32.0 24242 757.6   0.0    0   0.0   0.0    0   0.0 
 32.0 21295 665.5   0.0    0   0.0   0.0    0   0.0 
 32.0 19002 593.8   0.0    0   0.0   0.0    0   0.0 
 32.0 18702 584.4   0.0    0   0.0   0.0    0   0.0 
 32.0 19285 602.7   0.0    0   0.0   0.0    0   0.0 
 32.0 18171 567.8   0.0    0   0.0   0.0    0   0.0 
 32.0 18603 581.3   0.0    0   0.0   0.0    0   0.0 


I think there's always rmdir/mkdir or rename in the kernel call stack of the hung processes when it happens.

And it's really going nuts with the I/O.  Makes me think it must be some kind of live lock within softupdates.

When some processes get stuck in this state, more and more processes eventually get stuck until the VM is unusable.  But running `sync` unwedges the whole thing and everything seems to be running fine after that.

I'll try to get a core dump next time.
Comment 16 Edward Tomasz Napierala freebsd_committer 2021-02-08 11:54:52 UTC
I believe I'm seeing the same problem on my T420.  It started a couple (three?) months ago.  I'm not sure it's a deadlock/livelock, though: it almost seems like a buffer is being written to disk over and over.

One particular case I remember is dd instance used to write out entropy (4k write to a file somewhere in /var).  This was the culprit according to "top -m io".  On some other occurences of this bug I've seen Firefox doing an absurd amount of IO.  Assuming we can trust top(1), this aligns with the idea of a buffer being written over and over.
Comment 17 Konstantin Belousov freebsd_committer 2021-02-08 14:06:09 UTC
Try this https://kib.kiev.ua/git/gitweb.cgi?p=deviant3.git;a=commitdiff;hp=main;h=ufs_needsync

I would be not surprised if the patch fixes the issue for you.
Comment 18 sigsys 2021-03-31 08:12:59 UTC
(In reply to Konstantin Belousov from comment #17)
This sure seems to have helped.  I was about to report that the problem is most likely gone since it hadn't happened in a while (despite running kyua in a loop for hours) after getting that patch series.

But then it happened again with chrome this time and I got a dump.  Dunno if running "sync" would have unwedged the whole thing since I made it panic instead.

There were two threads from two processes looping and doing crazy I/O: a chrome process and a zsh process.

zsh thread backtrace:

#0  sched_switch (td=td@entry=0xfffffe00aa35ce00, flags=<optimized out>, flags@entry=260) at /usr/src/sys/kern/sched_ule.c:2147
#1  0xffffffff80c1f4c9 in mi_switch (flags=flags@entry=260) at /usr/src/sys/kern/kern_synch.c:542
#2  0xffffffff80c6f929 in sleepq_switch (wchan=wchan@entry=0xfffffe00097da0a8, pri=92, pri@entry=0) at /usr/src/sys/kern/subr_sleepqueue.c:608
#3  0xffffffff80c6f7fe in sleepq_wait (wchan=<optimized out>, pri=<optimized out>) at /usr/src/sys/kern/subr_sleepqueue.c:659
#4  0xffffffff80c1e9e6 in _sleep (ident=ident@entry=0xfffffe00097da0a8, lock=<optimized out>, lock@entry=0xfffffe000863b0c0, priority=priority@entry=92, wmesg=<optimized out>, sbt=sbt@entry=0, pr=pr@entry=0, flags=256) at /usr/src/sys/kern/kern_synch.c:221
#5  0xffffffff80cd5214 in bwait (bp=0xfffffe00097da0a8, pri=92 '\\', wchan=<optimized out>) at /usr/src/sys/kern/vfs_bio.c:5020
#6  bufwait (bp=bp@entry=0xfffffe00097da0a8) at /usr/src/sys/kern/vfs_bio.c:4433
#7  0xffffffff80cd285a in bufwrite (bp=0xfffffe00097da0a8, bp@entry=<error reading variable: value is not available>) at /usr/src/sys/kern/vfs_bio.c:2305
#8  0xffffffff80f01789 in bwrite (bp=<unavailable>) at /usr/src/sys/sys/buf.h:430
#9  ffs_update (vp=vp@entry=0xfffff80004c61380, waitfor=waitfor@entry=1) at /usr/src/sys/ufs/ffs/ffs_inode.c:204
#10 0xffffffff80f2f98a in ffs_syncvnode (vp=vp@entry=0xfffff80004c61380, waitfor=<optimized out>, waitfor@entry=1, flags=<optimized out>, flags@entry=0) at /usr/src/sys/ufs/ffs/ffs_vnops.c:447
#11 0xffffffff80f0f91d in softdep_prelink (dvp=dvp@entry=0xfffff80004c61380, vp=vp@entry=0x0) at /usr/src/sys/ufs/ffs/ffs_softdep.c:3417
#12 0xffffffff80f3fee3 in ufs_makeinode (mode=33188, dvp=0xfffff80004c61380, vpp=0xfffffe00aae0a9d8, cnp=<unavailable>, callfunc=<unavailable>) at /usr/src/sys/ufs/ufs/ufs_vnops.c:2741
#13 0xffffffff80f3bfa4 in ufs_create (ap=0xfffffe00aae0a8a8) at /usr/src/sys/ufs/ufs/ufs_vnops.c:213
#14 0xffffffff8118a31d in VOP_CREATE_APV (vop=0xffffffff81b63158 <ffs_vnodeops2>, a=a@entry=0xfffffe00aae0a8a8) at vnode_if.c:244
#15 0xffffffff80d15233 in VOP_CREATE (dvp=<unavailable>, vpp=0xfffffe00aae0a9d8, cnp=0xfffffe00aae0aa00, vap=0xfffffe00aae0a7f0) at ./vnode_if.h:133
#16 vn_open_cred (ndp=ndp@entry=0xfffffe00aae0a968, flagp=flagp@entry=0xfffffe00aae0aa94, cmode=cmode@entry=420, vn_open_flags=<optimized out>, vn_open_flags@entry=0, cred=0xfffff80048d42e00, fp=0xfffff8010aeabc30) at /usr/src/sys/kern/vfs_vnops.c:285
#17 0xffffffff80d14f6d in vn_open (ndp=<unavailable>, ndp@entry=0xfffffe00aae0a968, flagp=<unavailable>, flagp@entry=0xfffffe00aae0aa94, cmode=<unavailable>, cmode@entry=420, fp=<unavailable>) at /usr/src/sys/kern/vfs_vnops.c:202
#18 0xffffffff80d08999 in kern_openat (td=0xfffffe00aa35ce00, fd=-100, path=0x8002fd420 <error: Cannot access memory at address 0x8002fd420>, pathseg=UIO_USERSPACE, flags=34306, mode=<optimized out>) at /usr/src/sys/kern/vfs_syscalls.c:1142
#19 0xffffffff810c5803 in syscallenter (td=<optimized out>) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:205
#20 amd64_syscall (td=0xfffffe00aa35ce00, traced=0) at /usr/src/sys/amd64/amd64/trap.c:1156
#21 <signal handler called>
#22 0x00000008004f223a in ?? ()

chrome thread backtrace:

#0  cpustop_handler () at /usr/src/sys/x86/x86/mp_x86.c:1475
#1  0xffffffff8108afe9 in ipi_nmi_handler () at /usr/src/sys/x86/x86/mp_x86.c:1432
#2  0xffffffff810c4256 in trap (frame=0xfffffe0009848f30) at /usr/src/sys/amd64/amd64/trap.c:201
#3  <signal handler called>
#4  vtpci_legacy_notify_vq (dev=<optimized out>, queue=0, offset=16) at /usr/src/sys/dev/virtio/pci/virtio_pci_legacy.c:485
#5  0xffffffff80a45417 in VIRTIO_BUS_NOTIFY_VQ (dev=0xfffff8000362fb00, queue=0, offset=16) at ./virtio_bus_if.h:144
#6  vq_ring_notify_host (vq=0xfffffe0063e27000) at /usr/src/sys/dev/virtio/virtqueue.c:834
#7  virtqueue_notify (vq=0xfffffe0063e27000, vq@entry=0xfffff8004de6f600) at /usr/src/sys/dev/virtio/virtqueue.c:439
#8  0xffffffff80a538c0 in vtblk_startio (sc=sc@entry=0xfffff8000362f100) at /usr/src/sys/dev/virtio/block/virtio_blk.c:1123
#9  0xffffffff80a53bed in vtblk_strategy (bp=0xfffff8004de6f600) at /usr/src/sys/dev/virtio/block/virtio_blk.c:571
#10 0xffffffff80b4bcfc in g_disk_start (bp=<optimized out>) at /usr/src/sys/geom/geom_disk.c:473
#11 0xffffffff80b4f147 in g_io_request (bp=0xfffff80021d33c00, cp=<optimized out>, cp@entry=0xfffff8000398ce80) at /usr/src/sys/geom/geom_io.c:589
#12 0xffffffff80b5b1a9 in g_part_start (bp=0xfffff8004e974900) at /usr/src/sys/geom/part/g_part.c:2332
#13 0xffffffff80b4f147 in g_io_request (bp=0xfffff8004e974900, cp=<optimized out>) at /usr/src/sys/geom/geom_io.c:589
#14 0xffffffff80cd284c in bstrategy (bp=0xfffffe0008ac5388) at /usr/src/sys/sys/buf.h:442
#15 bufwrite (bp=0xfffffe0008ac5388) at /usr/src/sys/kern/vfs_bio.c:2302
#16 0xffffffff80f01789 in bwrite (bp=0x0) at /usr/src/sys/sys/buf.h:430
#17 ffs_update (vp=vp@entry=0xfffff80139495000, waitfor=waitfor@entry=1) at /usr/src/sys/ufs/ffs/ffs_inode.c:204
#18 0xffffffff80f2f98a in ffs_syncvnode (vp=vp@entry=0xfffff80139495000, waitfor=<optimized out>, waitfor@entry=1, flags=<optimized out>, flags@entry=0) at /usr/src/sys/ufs/ffs/ffs_vnops.c:447
#19 0xffffffff80f0f86f in softdep_prelink (dvp=dvp@entry=0xfffff80139495000, vp=vp@entry=0xfffff8013c8328c0) at /usr/src/sys/ufs/ffs/ffs_softdep.c:3417
#20 0xffffffff80f3d797 in ufs_remove (ap=0xfffffe00aabdfa20) at /usr/src/sys/ufs/ufs/ufs_vnops.c:1011
#21 0xffffffff8118bf90 in VOP_REMOVE_APV (vop=0xffffffff81b63158 <ffs_vnodeops2>, a=a@entry=0xfffffe00aabdfa20) at vnode_if.c:1540
#22 0xffffffff80d0a468 in VOP_REMOVE (dvp=0x0, vp=0xfffff8013c8328c0, cnp=<optimized out>) at ./vnode_if.h:802
#23 kern_funlinkat (td=0xfffffe00aa6e3100, dfd=dfd@entry=-100, path=0x8288d40e0 <error: Cannot access memory at address 0x8288d40e0>, fd=<optimized out>, fd@entry=-200, pathseg=pathseg@entry=UIO_USERSPACE, flag=<optimized out>, flag@entry=0, oldinum=0) at /usr/src/sys/kern/vfs_syscalls.c:1927
#24 0xffffffff80d0a138 in sys_unlink (td=0xfffff8000362fb00, uap=<optimized out>) at /usr/src/sys/kern/vfs_syscalls.c:1808
#25 0xffffffff810c5803 in syscallenter (td=<optimized out>) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:205
#26 amd64_syscall (td=0xfffffe00aa6e3100, traced=0) at /usr/src/sys/amd64/amd64/trap.c:1156
#27 <signal handler called>
#28 0x000000080e40d17a in ?? ()

syncer backtrace:

#0  sched_switch (td=td@entry=0xfffffe00a5e29100, flags=<optimized out>, flags@entry=260) at /usr/src/sys/kern/sched_ule.c:2147
#1  0xffffffff80c1f4c9 in mi_switch (flags=flags@entry=260) at /usr/src/sys/kern/kern_synch.c:542
#2  0xffffffff80c6f929 in sleepq_switch (wchan=wchan@entry=0xffffffff81fa9550 <sync_wakeup>, pri=pri@entry=0) at /usr/src/sys/kern/subr_sleepqueue.c:608
#3  0xffffffff80c6fe3b in sleepq_timedwait (wchan=wchan@entry=0xffffffff81fa9550 <sync_wakeup>, pri=pri@entry=0) at /usr/src/sys/kern/subr_sleepqueue.c:690
#4  0xffffffff80ba34b0 in _cv_timedwait_sbt (cvp=0xffffffff81fa9550 <sync_wakeup>, lock=0xffffffff81fa9520 <sync_mtx>, sbt=<optimized out>, pr=<optimized out>, pr@entry=0, flags=0, flags@entry=256) at /usr/src/sys/kern/kern_condvar.c:312
#5  0xffffffff80d036dc in sched_sync () at /usr/src/sys/kern/vfs_subr.c:2739
#6  0xffffffff80bcb9a0 in fork_exit (callout=0xffffffff80d03090 <sched_sync>, arg=0x0, frame=0xfffffe006a491c00) at /usr/src/sys/kern/kern_fork.c:1077
#7  <signal handler called>

It seems like some kind of livelock involving ERELOOKUP loops. I can only guess though, softupdates' is way too complicated for me.

That's with cb0dd7e122b8936ad61a141e65ef8ef874bfebe5 merged.  This kernel has some local changes and I'm a little bit worried that this might be the problem but I think it's unlikely.  The problem happens pretty rarely and that's the only -CURRENT install on UFS that I'm working with so that's the best that I've got.  That's with a virtio disk backed by a ZFS volume on bhyve BTW.
Comment 19 Kirk McKusick freebsd_committer 2021-05-17 17:17:23 UTC
I believe that this problem has been resolved.
Comment 20 Tomasz Sowa 2021-06-07 02:54:19 UTC
I am sorry but the problem is not solved, at least for stable/13 (abd116de1d4). Today one of my server for two hours was not able to serve any www page. I had an open ssh connection and was able to call 'sync'. After that everything returns to normal behavior.

When the server was not able to respond to www requests I saw that 'systat -vmstat 1' reports 100% activity for two disks (mirror/gm0s1a). Now I am changing soft updated journaling to only soft updates.