Bug 224292 - processes are hanging in state ufs / possible deadlock in file system
Summary: processes are hanging in state ufs / possible deadlock in file system
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: freebsd-fs mailing list
URL:
Keywords:
Depends on:
Blocks: 104406 223491
Reported: 2017-12-12 19:02 UTC by Wolfram Schneider
Modified: 2018-10-04 21:21 UTC
CC: 6 users

See Also:


Description Wolfram Schneider freebsd_committer 2017-12-12 19:02:29 UTC
A friend and I are testing perl modules on FreeBSD, on all the major versions (12-current, 11-stable, 11-RELEASE). The tests are done in a cheap VM (1GB RAM, 40GB SSD, 1 core), apparently running on virtio/kvm.

While testing we noticed that the VM randomly crashes every 1-3 days, and we had to run a hard fsck on the console. I first thought it was a resource problem, but munin doesn’t show any shortage of RAM, disk or file descriptors.
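
For reference, a hedged sketch of how such a shortage can be ruled out by hand, independent of the munin graphs (base system commands only; the exact invocations are mine, not part of our monitoring):

# free memory and paging activity, 5 samples one second apart
vmstat 1 5
# swap usage
swapinfo -h
# disk space and inode usage per filesystem
df -hi
# open file descriptors vs. the system-wide limit
sysctl kern.openfiles kern.maxfiles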

We found out that the crashes occurred shortly after processes started hanging in state “ufs”, as shown in the top(1) output:

$ top -b 300 | egrep ufs
73278 root          1  31    0  8364K  2340K ufs      0:02  10.35% find
34548 cpansand      1  52   10 43828K 15348K ufs      0:00   0.00% perl
34824 cpansand      1  20    0 21060K  1960K ufs      0:00   0.00% procstat
34557 cpansand      1  52   10 13180K  2112K ufs      0:00   0.00% sh


$ ps xuawww | egrep 73278  
root       73278   0.0  0.2   8364   2340  -  D    03:01       0:02.25 find -sx / /dev/null ( ! -fstype local ) -prune -o -type f ( -perm -u+x -or -perm -g+x -or -perm -o+x ) ( -perm -u+s -or -perm -g+s ) -exec ls -liTd {} +
root       73667   0.0  0.0    412    312  8  R+   03:08       0:00.00 egrep 73278

$ ps xuawww | egrep 34548  
cpansand   34548   0.0  1.5  43828  15348  4  DN+  15:29       0:00.10 /usr/perl5.26.1p/bin/perl t/test_command.t
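
A hedged way to see where such a process is blocked inside the kernel is to dump its kernel stack and list everything sleeping on the "ufs" wait channel (assuming procstat itself does not wedge, which it apparently did here, see the procstat entry in the top output above):

# kernel stack of the hung find(1); the PID is the one from the top output above
procstat -kk 73278
# all processes currently sleeping on the "ufs" wait channel
ps -axo pid,state,wchan,comm | awk '$3 == "ufs"'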


Another example:

$ lsof  | egrep 12710
perl5.26. 12710   cpansand  cwd     VDIR                    18446735278340920664    0 / -- home/cpansand/.cpan/build/2017112212/Fsdb-2.64-1
perl5.26. 12710   cpansand  rtd     VDIR                     7507057954083258955    0 / (/dev/vtbd0p3)
perl5.26. 12710   cpansand    0u    VCHR               0,90              0t20972   90 /dev/pts/0
perl5.26. 12710   cpansand    1u    PIPE 0xfffff8000399f168                    0      ->0xfffff8000399f000
perl5.26. 12710   cpansand    2u    PIPE 0xfffff8001c30da50                    0      ->0xfffff8001c30d8e8
perl5.26. 12710   cpansand    3u    PIPE 0xfffff8001c30ebe0                16384      ->0xfffff8001c30ed48
perl5.26. 12710   cpansand    5u    PIPE 0xfffff8000399f168                    0      ->0xfffff8000399f000
perl5.26. 12710   cpansand    6u    PIPE 0xfffff8001c30da50                    0      ->0xfffff8001c30d8e8
perl5.26. 12710   cpansand    7u    PIPE 0xfffff8000399f168                    0      ->0xfffff8000399f000
perl5.26. 12710   cpansand    8u    PIPE 0xfffff8001c30da50                    0      ->0xfffff8001c30d8e8


$ top -b 300 | egrep ufs
12358 cpansand      1  20   10   268M 12980K ufs      0:04   0.00% perl
12710 cpansand      1  20   10 21412K 13808K ufs      0:00   0.00% perl5.26.0
56111 root          1  20    0  7752K  3180K ufs      0:00   0.00% bash
56077 root          1  52    0  6332K  2312K ufs      0:00   0.00% find
12719 cpansand      1  20   10  7060K  1472K ufs      0:00   0.00% sh


$ lsof | egrep '12710|12358' | egrep home
perl      12358   cpansand  cwd     VDIR                    18446735278340920664    0 / -- home/cpansand/.cpan/build/2017112212/Fsdb-2.64-1
perl5.26. 12710   cpansand  cwd     VDIR                    18446735278340920664    0 / -- home/cpansand/.cpan/build/2017112212/Fsdb-2.64-1


Perl often hangs in a directory during a build of a CPAN module. If I try to switch to this directory, my current shell will hang too:

$ cd /home/cpansand/.cpan/build/2017112212/Fsdb-2.64-1
[hangs]

$ find /home/cpansand/
/home/cpansand/
/home/cpansand/.login_conf
/home/cpansand/.cshrc
/home/cpansand/.rhosts
/home/cpansand/.shrc
/home/cpansand/.mail_aliases
/home/cpansand/.profile
/home/cpansand/.mailrc
/home/cpansand/.login
/home/cpansand/.ssh
/home/cpansand/.ssh/authorized_keys
/home/cpansand/.ssh/known_hosts
/home/cpansand/.zshrc
/home/cpansand/bin
/home/cpansand/src
/home/cpansand/.cpan
/home/cpansand/.cpan/build
/home/cpansand/.cpan/build/2017112215
/home/cpansand/.cpan/build/2017112212
[hangs]
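
A hedged way to probe a suspect directory without wedging yet another interactive shell is to do the access in a background subshell and check its state afterwards (the path is just the hung build directory from above):

# probe in the background so the interactive shell stays usable
ls -la /home/cpansand/.cpan/build/2017112212/Fsdb-2.64-1 >/dev/null 2>&1 &
probe=$!
sleep 5
# state "D" means the probe is stuck in disk/vnode wait and will not come back
ps -o pid,state,wchan -p "$probe"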


$ top -b 300 | egrep ufs
12358 cpansand      1  20   10   268M 12980K ufs      0:04   0.00% perl
12710 cpansand      1  20   10 21412K 13808K ufs      0:00   0.00% perl5.26.0
56111 root          1  20    0  7752K  3180K ufs      0:00   0.00% bash
56077 root          1  52    0  6332K  2312K ufs      0:00   0.00% find
57160 root          1  20    0  6332K  1792K ufs      0:00   0.00% find
57169 root          1  20    0  6332K  1792K ufs      0:00   0.00% find
12719 cpansand      1  20   10  7060K  1472K ufs      0:00   0.00% sh


$ pstree
|-+= 00675 root /usr/sbin/cron -s
 | \-+- 55939 root cron: running job (cron)
 |   \-+= 55941 root /bin/sh - /usr/sbin/periodic daily
 |     \-+- 55943 root lockf -t 0 /var/run/periodic.daily.lock /bin/sh /usr/sbin/periodic LOCKED daily
 |       \-+- 55944 root /bin/sh /usr/sbin/periodic LOCKED daily
 |         |-+- 55953 root /bin/sh /usr/sbin/periodic LOCKED daily
 |         | \-+- 56059 root /bin/sh /etc/periodic/daily/450.status-security
 |         |   \-+- 56060 root /bin/sh - /usr/sbin/periodic security
 |         |     \-+- 56062 root lockf -t 0 /var/run/periodic.security.lock /bin/sh /usr/sbin/periodic LOCKED security
 |         |       \-+- 56063 root /bin/sh /usr/sbin/periodic LOCKED security
 |         |         |-+- 56071 root /bin/sh /usr/sbin/periodic LOCKED security
 |         |         | \-+- 56073 root /bin/sh - /etc/periodic/security/100.chksetuid
 |         |         |   |--- 56077 root find -sx / /dev/null ( ! -fstype local ) -prune -o -type f ( -perm -u+x -or -perm -g+x -or -perm -o+x ) ( -perm -u+s -or -perm -g+s ) -exec ls -liTd {} +
 |         |         |   \-+- 56078 root /bin/sh - /etc/periodic/security/100.chksetuid
 |         |         |     \--- 56080 root cat


     \--= 12358 cpansand /usr/perl5.26.0p/bin/perl5.26.0 (CPAN.pm test J/JO/JOHNH/Fsdb-2.64.tar.gz) (perl)

 |-+- 12710 cpansand /usr/perl5.26.0p/bin/perl5.26.0 t/test_command.t
 | \--- 12719 cpansand sh -c blib/script/cgi_to_db TEST/cgi_to_db_ex.in TEST/cgi_to_db_ex.in-2  < TEST/cgi_to_db_ex.in 2>&1



root       56077   0.0  0.2   6332   1936  -  D    03:01      0:00.02 find -sx / /dev/null ( ! -fstype local ) -prune -o -type f ( -perm -u+x -or -perm -g+x -or -perm -o+x ) ( -perm -u+s -or -perm -g+s ) -exec ls -liTd {} +

cpansand   12710   0.0  1.4  21412 13780  0  DN+  14:11      0:00.10 /usr/perl5.26.0p/bin/perl5.26.0 t/test_command.t


$ ps -o wchan -auxgww | grep ufs    
-        root          19   0.0  0.0      0    16  -  DL   Wed02      0:03.12 [bufspacedaemon]
ufs      root       56077   0.0  0.2   6332  1936  -  D    03:01      0:00.02 find -sx / /dev/null ( ! -fstype local ) -prune -o -type f ( -perm -u+x -or -perm -g+x -or -perm -o+x ) ( -perm -u+s -or -perm -g+s ) -exec ls -liTd {} +
ufs      cpansand   12358   0.0  0.7 274408  7280  0  DNs+ 14:11      0:03.89 /usr/perl5.26.0p/bin/perl5.26.0 (CPAN.pm test J/JO/JOHNH/Fsdb-2.64.tar.gz) (perl)
ufs      cpansand   12710   0.0  1.4  21412 13780  0  DN+  14:11      0:00.10 /usr/perl5.26.0p/bin/perl5.26.0 t/test_command.t
ufs      cpansand   12719   0.0  0.1   7060  1444  0  DN+  14:11      0:00.00 sh -c blib/script/cgi_to_db TEST/cgi_to_db_ex.in TEST/cgi_to_db_ex.in-2  < TEST/cgi_to_db_ex.in 2>&1
-        root       78756   0.0  0.0    412   304  7  R+   04:18      0:00.00 grep ufs
ufs      root       56111   0.0  0.3   7752  2688  8  Ds+  03:03      0:00.02 -bash (bash)
ufs      root       57160   0.0  0.1   6332  1416 10  D+   03:15      0:00.00 find .
ufs      root       57169   0.0  0.1   6332  1416 11  D+   03:15      0:00.00 find /home/cpansand/


The perl interpreter usually crashed while testing the CPAN module J/JO/JOHNH/Fsdb-2.64.tar.gz in test t/test_command.t.
But I think this is a red herring. The module runs just fine if you test only this module. It must be a combination of other, earlier tests which finally brings Fsdb-2.64.tar.gz down.

We saw this error on FreeBSD 9.x, 10.3, 11.1-RELEASE, 11-stable, and 12-current.

My gut feeling is that we have a deadlock or a forgotten lock on a directory/file in the UFS code. Some kernel thread forgot to release a lock. Any other process which needs to open this directory will hang, and sooner or later (2-4 hours) the machine crashes hard.

Every time the machine crashed we had to run fsck twice and repair inodes and the superblock manually on the console. This is not fun.
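
For completeness, a hedged sketch of the kind of manual repair this means (device name taken from the lsof output above; single-user mode assumed; not a transcript of the exact session):

# run fsck until the filesystem comes back clean; here it took two passes
fsck -y /dev/vtbd0p3
fsck -y /dev/vtbd0p3
# if the primary superblock itself is damaged, point fsck_ffs at an alternate
# copy (usually block 160 for UFS2, block 32 for UFS1)
fsck_ffs -b 160 -y /dev/vtbd0p3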

We saw the errors at different cloud providers. I’m pretty sure it is not a bad setup; it is a general problem of the FreeBSD kernel/UFS filesystem.
Comment 1 Wolfram Schneider freebsd_committer 2017-12-12 19:05:00 UTC
I googled for similar bug reports and found this:

from 2007, for FreeBSD 6.x


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=104406
kern/104406: Processes get stuck in "ufs" state under persistent CPU load

https://forums.freebsd.org/threads/9150/
We have a problem with FreeBSD 6.2-STABLE (aw62) #2: Fri Jun 22 02:26:27 MSD 2007


some reports about dirhash tuning:

https://forums.freebsd.org/threads/7794/

Try something like procstat -f -w 1 pid_of_httpd or run top -U www -m io to see disk activity. I ran into the same; in our case the solution was dirhash tuning:

sysctl -a | grep dirhash

will show you the current state, and most likely increasing vfs.ufs.dirhash_maxmem will solve your problems.

sysctl -a | grep mem


https://wiki.freebsd.org/WaitChannels
ufs: Waiting to read from a file or directory on a UFS filesystem that is being written to by another thread, or waiting to write to a local file or directory that is being read or written by another thread 


https://wiki.freebsd.org/DirhashDynamicMemory

https://forums.freebsd.org/threads/34023/

Since vfs.ufs.dirhash_lowmemcount is already at 25, you should consider increasing vfs.ufs.dirhash_maxmem.
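
A hedged sketch of that tuning (the 8 MB value below is only an example, not a number taken from the threads above):

# current dirhash memory use vs. the ceiling
sysctl vfs.ufs.dirhash_mem vfs.ufs.dirhash_maxmem vfs.ufs.dirhash_lowmemcount
# raise the ceiling on the running system
sysctl vfs.ufs.dirhash_maxmem=8388608
# and make it persistent across reboots
echo 'vfs.ufs.dirhash_maxmem=8388608' >> /etc/sysctl.conf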
Comment 2 Kirk McKusick freebsd_committer 2017-12-12 19:30:45 UTC
It is possible that this patch will fix this problem:

r326657 | kib | 2017-12-07 01:05:34 -0800 (Thu, 07 Dec 2017) | 15 lines

Fix livelock in ufsdirhash_create().

When more than one thread enters ufsdirhash_create() for the same
directory and the inode dirhash is instantiated, but the dirhash' hash
is not, all of them lock the dirhash shared and then try to upgrade.
Since there are several threads owning the lock shared, upgrade fails
and the same attempt is repeated, ad infinitum.

To break the lockstep, lock the dirhash in exclusive mode after the
failed try-upgrade.

Reported and tested by: pho
Sponsored by:   Mellanox Technologies
MFC after:      1 week

It is also known that dirhash tuning as described by Wolfram Schneider above significantly mitigates this problem.
Comment 3 Wolfram Schneider freebsd_committer 2017-12-12 20:56:24 UTC
Hi Kirk,

thanks for the fast response. I can run the tests again and see if the machine survives the next 3 days.

I'm wondering if we have a simple test program to trigger this bug? 

Any suggestions for tuning dirhash to trigger the broken lock faster?

My current setup:
$ sysctl -a | egrep vfs.ufs.dirhash
vfs.ufs.dirhash_reclaimpercent: 10
vfs.ufs.dirhash_lowmemcount: 188
vfs.ufs.dirhash_docheck: 0
vfs.ufs.dirhash_mem: 510163
vfs.ufs.dirhash_maxmem: 2097152
vfs.ufs.dirhash_minsize: 2560
Comment 4 Kirk McKusick freebsd_committer 2017-12-12 21:26:43 UTC
(In reply to Wolfram Schneider from comment #3)
Unfortunately I do not know the correlation of tuning to triggering the livelock bug.
Comment 5 Peter Holm freebsd_committer 2017-12-12 21:37:56 UTC
(In reply to Kirk McKusick from comment #2)
I have booted a test box with amd64, 1 CPU and 1GB of RAM to see if I can reproduce the problem.

The issue fixed by r326657 looked like this:
https://people.freebsd.org/~pho/stress/log/vfs_cache.txt
Comment 6 Wolfram Schneider freebsd_committer 2017-12-15 15:08:49 UTC
The machine crashed again, with a kernel panic. Unfortunately, the kernel dump was not configured. I can access the kernel debugger with a VNC web console, and took some screenshots.

You can see the screenshots here:
https://people.freebsd.org/~wosch/tmp/pr-224292/kernel-panic-deadlock-1/

Actually, on
https://people.freebsd.org/~wosch/tmp/pr-224292/kernel-panic-deadlock-1/Screen%20Shot%202017-12-15%20at%2011.57.55.png

you can see in the top output that procstat, sh and perl5.26.1 are in state "ufs",

and here is the perl process with pid 28098 in sys_openat():
https://people.freebsd.org/~wosch/tmp/pr-224292/kernel-panic-deadlock-1/Screen%20Shot%202017-12-15%20at%2011.59.47.png
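
Since the kernel dump was not configured, here is a hedged sketch of enabling crash dumps for the next run, so a panic leaves more than screenshots (assumes the swap partition on vtbd0 is large enough to hold the dump):

# write panic dumps to the swap device and let savecore(8) collect them at boot
sysrc dumpdev="AUTO"
service dumpon start
# after the next panic, the vmcore and a core.txt summary should land in /var/crash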
Comment 7 Wolfram Schneider freebsd_committer 2017-12-15 15:09:23 UTC
(In reply to Wolfram Schneider from comment #6)

I forgot to mention the commands I ran in the kernel debugger:

show allpcpu
show alllocks
show lockedvnods
show mount
show page
ps
allt
Comment 8 Wolfram Schneider freebsd_committer 2017-12-16 09:54:51 UTC
(In reply to Wolfram Schneider from comment #6)

As far as we can see from the test logs, it crashed in the same perl module as before. I'm sorry to say, but r326657 didn't solve the issue. The witch is not dead.
Comment 9 Wolfram Schneider freebsd_committer 2017-12-17 10:26:12 UTC
Right after boot I see these messages:

lock order reversal:
 1st 0xfffffe0000cbf2c0 bufwait (bufwait) @ /home/projects/freebsd/sys/kern/vfs_bio.c:3564
 2nd 0xfffff8001c445800 dirhash (dirhash) @ /home/projects/freebsd/sys/ufs/ufs/ufs_dirhash.c:289
stack backtrace:
#0 0xffffffff80ac5a23 at witness_debugger+0x73
#1 0xffffffff80ac58a2 at witness_checkorder+0xe02
#2 0xffffffff80a6a5f8 at _sx_xlock+0x68
#3 0xffffffff80d6056d at ufsdirhash_add+0x3d
#4 0xffffffff80d633f9 at ufs_direnter+0x459
#5 0xffffffff80d6b7a7 at ufs_mkdir+0x8d7
#6 0xffffffff8108a889 at VOP_MKDIR_APV+0xd9
#7 0xffffffff80b3ddc2 at kern_mkdirat+0x1e2
#8 0xffffffff80f0dd0b at amd64_syscall+0x79b
#9 0xffffffff80eec33b at Xfast_syscall+0xfb

and hours later:
lock order reversal:
 1st 0xfffff80017613068 ufs (ufs) @ /home/projects/freebsd/sys/kern/vfs_subr.c:2608
 2nd 0xfffffe0000f01100 bufwait (bufwait) @ /home/projects/freebsd/sys/ufs/ffs/ffs_vnops.c:282
 3rd 0xfffff8001c9985f0 ufs (ufs) @ /home/projects/freebsd/sys/kern/vfs_subr.c:2608
stack backtrace:
#0 0xffffffff80ac5a23 at witness_debugger+0x73
#1 0xffffffff80ac58a2 at witness_checkorder+0xe02
#2 0xffffffff80a37f63 at __lockmgr_args+0x883
#3 0xffffffff80d5b295 at ffs_lock+0xa5
#4 0xffffffff8108b309 at VOP_LOCK1_APV+0xd9
#5 0xffffffff80b40c86 at _vn_lock+0x66
#6 0xffffffff80b2f6f2 at vget+0x82
#7 0xffffffff80b21651 at vfs_hash_get+0xd1
#8 0xffffffff80d56f5f at ffs_vgetf+0x3f
#9 0xffffffff80d4ce3f at softdep_sync_buf+0x54f
#10 0xffffffff80d5c0f4 at ffs_syncvnode+0x324
#11 0xffffffff80d4c0d8 at softdep_fsync+0x4c8
#12 0xffffffff80d5b16c at ffs_fsync+0x7c
#13 0xffffffff8108a279 at VOP_FSYNC_APV+0xd9
#14 0xffffffff80b3d5d6 at kern_fsync+0x1b6
#15 0xffffffff80f0dd0b at amd64_syscall+0x79b
#16 0xffffffff80eec33b at Xfast_syscall+0xfb
Comment 10 Kirk McKusick freebsd_committer 2017-12-18 00:13:07 UTC
(In reply to Wolfram Schneider from comment #9)
These particular LOR messages are harmless, see discussion of them in https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224176
Comment 11 Eitan Adler freebsd_committer freebsd_triage 2018-05-28 19:50:15 UTC
batch change:

For bugs that match the following
- Status is In Progress
AND
- Untouched since 2018-01-01.
AND
- Affects Base System OR Documentation

DO:

Reset to open status.


Note:
I did a quick pass but if you are getting this email it might be worthwhile to double check to see if this bug ought to be closed.
Comment 12 Konstantin Belousov freebsd_committer 2018-05-29 15:04:55 UTC
(In reply to Wolfram Schneider from comment #8)
Look at https://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html and gather the information as explained there.
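
For reference, a hedged sketch of what that handbook page boils down to (the kernel config name DEBUG is made up; the option names are stock FreeBSD):

# add to a custom kernel config, e.g. sys/amd64/conf/DEBUG:
#   makeoptions DEBUG=-g
#   options     WITNESS
#   options     INVARIANTS
#   options     INVARIANT_SUPPORT
#   options     DEBUG_LOCKS
#   options     DEBUG_VFS_LOCKS
#   options     DIAGNOSTIC
cd /usr/src && make buildkernel KERNCONF=DEBUG && make installkernel KERNCONF=DEBUG
# when the hang recurs, break into DDB and collect:
#   show pcpu; show allpcpu; show alllocks; show lockedvnods; show mount; ps; alltrace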

Do you use SU+J? If yes, try disabling +J and see if it changes anything.
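
A hedged sketch of checking for and disabling the SU journal (tunefs needs the filesystem unmounted or mounted read-only, so single-user mode is assumed; the device name is again taken from the lsof output above):

# show the current soft updates / journaling flags
tunefs -p /dev/vtbd0p3
# turn off journaled soft updates but keep plain soft updates enabled
tunefs -j disable /dev/vtbd0p3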