A friend and I are testing Perl modules on FreeBSD, on all the major versions (12-current, 11-stable, 11-RELEASE). The tests are done in a cheap VM (1GB RAM, 40GB SSD, 1 core), apparently running on virtio/kvm. While testing we noticed that the VM randomly crashes every 1-3 days, and we had to run a hard fsck on the console. I first thought it was a resource problem, but munin doesn't show any shortage of RAM, disk space or file descriptors. We found out that the crashes occurred shortly after processes started hanging in state "ufs", as shown in top(1) output:

$ top -b 300 | egrep ufs
73278 root 1 31 0 8364K 2340K ufs 0:02 10.35% find
34548 cpansand 1 52 10 43828K 15348K ufs 0:00 0.00% perl
34824 cpansand 1 20 0 21060K 1960K ufs 0:00 0.00% procstat
34557 cpansand 1 52 10 13180K 2112K ufs 0:00 0.00% sh

$ ps xuawww | egrep 73278
root 73278 0.0 0.2 8364 2340 - D 03:01 0:02.25 find -sx / /dev/null ( ! -fstype local ) -prune -o -type f ( -perm -u+x -or -perm -g+x -or -perm -o+x ) ( -perm -u+s -or -perm -g+s ) -exec ls -liTd {} +
root 73667 0.0 0.0 412 312 8 R+ 03:08 0:00.00 egrep 73278

$ ps xuawww | egrep 34548
cpansand 34548 0.0 1.5 43828 15348 4 DN+ 15:29 0:00.10 /usr/perl5.26.1p/bin/perl t/test_command.t

Another example:

$ lsof | egrep 12710
perl5.26. 12710 cpansand cwd VDIR 18446735278340920664 0 / -- home/cpansand/.cpan/build/2017112212/Fsdb-2.64-1
perl5.26. 12710 cpansand rtd VDIR 7507057954083258955 0 / (/dev/vtbd0p3)
perl5.26. 12710 cpansand 0u VCHR 0,90 0t20972 90 /dev/pts/0
perl5.26. 12710 cpansand 1u PIPE 0xfffff8000399f168 0 ->0xfffff8000399f000
perl5.26. 12710 cpansand 2u PIPE 0xfffff8001c30da50 0 ->0xfffff8001c30d8e8
perl5.26. 12710 cpansand 3u PIPE 0xfffff8001c30ebe0 16384 ->0xfffff8001c30ed48
perl5.26. 12710 cpansand 5u PIPE 0xfffff8000399f168 0 ->0xfffff8000399f000
perl5.26. 12710 cpansand 6u PIPE 0xfffff8001c30da50 0 ->0xfffff8001c30d8e8
perl5.26. 12710 cpansand 7u PIPE 0xfffff8000399f168 0 ->0xfffff8000399f000
perl5.26. 12710 cpansand 8u PIPE 0xfffff8001c30da50 0 ->0xfffff8001c30d8e8

$ top -b 300 | egrep ufs
12358 cpansand 1 20 10 268M 12980K ufs 0:04 0.00% perl
12710 cpansand 1 20 10 21412K 13808K ufs 0:00 0.00% perl5.26.0
56111 root 1 20 0 7752K 3180K ufs 0:00 0.00% bash
56077 root 1 52 0 6332K 2312K ufs 0:00 0.00% find
12719 cpansand 1 20 10 7060K 1472K ufs 0:00 0.00% sh

$ lsof | egrep '12710|12358' | egrep home
perl 12358 cpansand cwd VDIR 18446735278340920664 0 / -- home/cpansand/.cpan/build/2017112212/Fsdb-2.64-1
perl5.26. 12710 cpansand cwd VDIR 18446735278340920664 0 / -- home/cpansand/.cpan/build/2017112212/Fsdb-2.64-1

Perl often hangs in a directory during a build of a CPAN module.
If I try to switch to this directory, my current shell hangs too:

$ cd /home/cpansand/.cpan/build/2017112212/Fsdb-2.64-1
[hangs]

$ find /home/cpansand/
/home/cpansand/
/home/cpansand/.login_conf
/home/cpansand/.cshrc
/home/cpansand/.rhosts
/home/cpansand/.shrc
/home/cpansand/.mail_aliases
/home/cpansand/.profile
/home/cpansand/.mailrc
/home/cpansand/.login
/home/cpansand/.ssh
/home/cpansand/.ssh/authorized_keys
/home/cpansand/.ssh/known_hosts
/home/cpansand/.zshrc
/home/cpansand/bin
/home/cpansand/src
/home/cpansand/.cpan
/home/cpansand/.cpan/build
/home/cpansand/.cpan/build/2017112215
/home/cpansand/.cpan/build/2017112212
[hangs]

$ top -b 300 | egrep ufs
12358 cpansand 1 20 10 268M 12980K ufs 0:04 0.00% perl
12710 cpansand 1 20 10 21412K 13808K ufs 0:00 0.00% perl5.26.0
56111 root 1 20 0 7752K 3180K ufs 0:00 0.00% bash
56077 root 1 52 0 6332K 2312K ufs 0:00 0.00% find
57160 root 1 20 0 6332K 1792K ufs 0:00 0.00% find
57169 root 1 20 0 6332K 1792K ufs 0:00 0.00% find
12719 cpansand 1 20 10 7060K 1472K ufs 0:00 0.00% sh

$ pstree
|-+= 00675 root /usr/sbin/cron -s
| \-+- 55939 root cron: running job (cron)
| \-+= 55941 root /bin/sh - /usr/sbin/periodic daily
| \-+- 55943 root lockf -t 0 /var/run/periodic.daily.lock /bin/sh /usr/sbin/periodic LOCKED daily
| \-+- 55944 root /bin/sh /usr/sbin/periodic LOCKED daily
| |-+- 55953 root /bin/sh /usr/sbin/periodic LOCKED daily
| | \-+- 56059 root /bin/sh /etc/periodic/daily/450.status-security
| | \-+- 56060 root /bin/sh - /usr/sbin/periodic security
| | \-+- 56062 root lockf -t 0 /var/run/periodic.security.lock /bin/sh /usr/sbin/periodic LOCKED security
| | \-+- 56063 root /bin/sh /usr/sbin/periodic LOCKED security
| | |-+- 56071 root /bin/sh /usr/sbin/periodic LOCKED security
| | | \-+- 56073 root /bin/sh - /etc/periodic/security/100.chksetuid
| | | |--- 56077 root find -sx / /dev/null ( ! -fstype local ) -prune -o -type f ( -perm -u+x -or -perm -g+x -or -perm -o+x ) ( -perm -u+s -or -perm -g+s ) -exec ls -liTd {} +
| | | \-+- 56078 root /bin/sh - /etc/periodic/security/100.chksetuid
| | | \--- 56080 root cat
\--= 12358 cpansand /usr/perl5.26.0p/bin/perl5.26.0 (CPAN.pm test J/JO/JOHNH/Fsdb-2.64.tar.gz) (perl)
|-+- 12710 cpansand /usr/perl5.26.0p/bin/perl5.26.0 t/test_command.t
| \--- 12719 cpansand sh -c blib/script/cgi_to_db TEST/cgi_to_db_ex.in TEST/cgi_to_db_ex.in-2 < TEST/cgi_to_db_ex.in 2>&1

root 56077 0.0 0.2 6332 1936 - D 03:01 0:00.02 find -sx / /dev/null ( ! -fstype local ) -prune -o -type f ( -perm -u+x -or -perm -g+x -or -perm -o+x ) ( -perm -u+s -or -perm -g+s ) -exec ls -liTd {} +
cpansand 12710 0.0 1.4 21412 13780 0 DN+ 14:11 0:00.10 /usr/perl5.26.0p/bin/perl5.26.0 t/test_command.t

$ ps -o wchan -auxgww | grep ufs
- root 19 0.0 0.0 0 16 - DL Wed02 0:03.12 [bufspacedaemon]
ufs root 56077 0.0 0.2 6332 1936 - D 03:01 0:00.02 find -sx / /dev/null ( ! -fstype local ) -prune -o -type f ( -perm -u+x -or -perm -g+x -or -perm -o+x ) ( -perm -u+s -or -perm -g+s ) -exec ls -liTd {} +
ufs cpansand 12358 0.0 0.7 274408 7280 0 DNs+ 14:11 0:03.89 /usr/perl5.26.0p/bin/perl5.26.0 (CPAN.pm test J/JO/JOHNH/Fsdb-2.64.tar.gz) (perl)
ufs cpansand 12710 0.0 1.4 21412 13780 0 DN+ 14:11 0:00.10 /usr/perl5.26.0p/bin/perl5.26.0 t/test_command.t
ufs cpansand 12719 0.0 0.1 7060 1444 0 DN+ 14:11 0:00.00 sh -c blib/script/cgi_to_db TEST/cgi_to_db_ex.in TEST/cgi_to_db_ex.in-2 < TEST/cgi_to_db_ex.in 2>&1
- root 78756 0.0 0.0 412 304 7 R+ 04:18 0:00.00 grep ufs
ufs root 56111 0.0 0.3 7752 2688 8 Ds+ 03:03 0:00.02 -bash (bash)
ufs root 57160 0.0 0.1 6332 1416 10 D+ 03:15 0:00.00 find .
ufs root 57169 0.0 0.1 6332 1416 11 D+ 03:15 0:00.00 find /home/cpansand/

The perl interpreter usually crashed while testing the CPAN module J/JO/JOHNH/Fsdb-2.64.tar.gz in the test t/test_command.t, but I think this is a red herring. The module runs just fine if you test only this module; it must be a combination of other, earlier tests which finally brings Fsdb-2.64.tar.gz down. We saw this error on FreeBSD 9.x, 10.3, 11.1-RELEASE, 11-stable, 12-current.

My gut feeling is that we have a deadlock or forgotten lock of a directory/file in the UFS code. Some kernel thread forgot to release a lock. Any other process which needs to open this directory will hang, and sooner or later (2-4 hours) the machine crashes hard. Every time the machine crashed we had to run fsck twice and repair inodes and the superblock manually on the console. This is not fun. We saw the errors at different cloud providers; I'm pretty sure it is not a bad setup, it is a general problem of the FreeBSD kernel/UFS filesystem.
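For anyone trying to reproduce this, here is a minimal sketch of how the stuck processes and their in-kernel stacks can be collected with stock tools (the loop below is illustrative and not taken from our logs):

$ # list processes whose wait channel is "ufs"
$ ps -ax -o pid,wchan,comm | awk '$2 == "ufs"'
$ # grab the kernel stack of every process sleeping on "ufs"
$ for pid in $(ps -ax -o pid,wchan | awk '$2 == "ufs" { print $1 }'); do
>   procstat -kk "$pid"
> done

The procstat -kk output shows which kernel function each stuck thread is sleeping in, which is usually more useful than the top/ps state alone.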
I googled for similar bug reports and found these:

From 2007, for FreeBSD 6.x:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=104406
kern/104406: Processes get stuck in "ufs" state under persistent CPU load

https://forums.freebsd.org/threads/9150/
"We have a problem with FreeBSD 6.2-STABLE (aw62) #2: Fri Jun 22 02:26:27 MSD 2007"

Some reports about dirhash tuning:
https://forums.freebsd.org/threads/7794/
"Try something like procstat -f -w 1 pid_of_httpd or run top -U www -m io to see disk activity."
"I ran into the same; in our case the solution was dirhashing: sysctl -a | grep dirhash will show you the current state, and most likely increasing vfs.ufs.dirhash_maxmem will solve your problems. sysctl -a | grep mem"

https://wiki.freebsd.org/WaitChannels
ufs: Waiting to read from a file or directory on a UFS filesystem that is being written to by another thread, or waiting to write to a local file or directory that is being read or written by another thread

https://wiki.freebsd.org/DirhashDynamicMemory

https://forums.freebsd.org/threads/34023/
"Since vfs.ufs.dirhash_lowmemcount is already at 25, you should consider increasing vfs.ufs.dirhash_maxmem."
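Based on those suggestions, a minimal sketch of inspecting and raising the dirhash limits (the value below is only an example, not a recommendation):

$ # current dirhash usage vs. limit
$ sysctl vfs.ufs.dirhash_mem vfs.ufs.dirhash_maxmem vfs.ufs.dirhash_lowmemcount
$ # raise the limit on the running system (example value: 16MB)
$ sysctl vfs.ufs.dirhash_maxmem=16777216
$ # make it persistent across reboots
$ echo 'vfs.ufs.dirhash_maxmem=16777216' >> /etc/sysctl.conf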
It is possible that this patch will fix this problem:

r326657 | kib | 2017-12-07 01:05:34 -0800 (Thu, 07 Dec 2017) | 15 lines

Fix livelock in ufsdirhash_create().

When more than one thread enters ufsdirhash_create() for the same directory and the inode dirhash is instantiated, but the dirhash' hash is not, all of them lock the dirhash shared and then try to upgrade. Since there are several threads owning the lock shared, upgrade fails and the same attempt is repeated, ad infinitum.

To break the lockstep, lock the dirhash in exclusive mode after the failed try-upgrade.

Reported and tested by: pho
Sponsored by: Mellanox Technologies
MFC after: 1 week

It is also known that dirhash tuning as described by Wolfram Schneider above significantly mitigates this problem.
Hi Kirk, thanks for the fast response. I can run the tests again and see if the machine survives the next 3 days.

I'm wondering if we have a simple test program to trigger this bug? Any suggestion for tuning dirhash so that we hit the broken lock faster?

My current setup:

$ sysctl -a | egrep vfs.ufs.dirhash
vfs.ufs.dirhash_reclaimpercent: 10
vfs.ufs.dirhash_lowmemcount: 188
vfs.ufs.dirhash_docheck: 0
vfs.ufs.dirhash_mem: 510163
vfs.ufs.dirhash_maxmem: 2097152
vfs.ufs.dirhash_minsize: 2560
(In reply to Wolfram Schneider from comment #3) Unfortunately I do not know how dirhash tuning correlates with triggering the livelock bug.
(In reply to Kirk McKusick from comment #2) I have booted a test box with amd64, 1 CPU and 1GB of RAM to see if I can reproduce the problem. The issue fixed by r326657 looked like this: https://people.freebsd.org/~pho/stress/log/vfs_cache.txt
The machine crashed again, with a kernel panic. Unfortunately, kernel dumps were not configured. I can access the kernel debugger via a VNC web console and take some screenshots. You can see the screenshots here:

https://people.freebsd.org/~wosch/tmp/pr-224292/kernel-panic-deadlock-1/

On https://people.freebsd.org/~wosch/tmp/pr-224292/kernel-panic-deadlock-1/Screen%20Shot%202017-12-15%20at%2011.57.55.png you can see at the top that procstat, sh and perl5.26.1 are in state "ufs", and here is the perl process with pid 28098 in sys_openat():

https://people.freebsd.org/~wosch/tmp/pr-224292/kernel-panic-deadlock-1/Screen%20Shot%202017-12-15%20at%2011.59.47.png
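For the next crash, a minimal sketch of the standard way to enable kernel crash dumps, so that savecore(8) can write a vmcore to /var/crash after the reboot (AUTO just picks the swap device; adjust as needed):

$ # configure a dump device persistently and activate it now
$ sysrc dumpdev="AUTO"
$ service dumpon start
$ # after a panic and reboot, the dump ends up here
$ ls /var/crash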
(In reply to Wolfram Schneider from comment #6) I forgot to mention the commands I ran in the kernel debugger:

show allpcpu
show alllocks
show lockedvnods
show mount
show page
ps
allt
(In reply to Wolfram Schneider from comment #6) As far as we can see from the test logs, it crashed in the same perl module as before. I'm sorry to say, but r326657 didn't solve the issue. The witch is not dead.
Right after boot I see these messages:

lock order reversal:
 1st 0xfffffe0000cbf2c0 bufwait (bufwait) @ /home/projects/freebsd/sys/kern/vfs_bio.c:3564
 2nd 0xfffff8001c445800 dirhash (dirhash) @ /home/projects/freebsd/sys/ufs/ufs/ufs_dirhash.c:289
stack backtrace:
#0 0xffffffff80ac5a23 at witness_debugger+0x73
#1 0xffffffff80ac58a2 at witness_checkorder+0xe02
#2 0xffffffff80a6a5f8 at _sx_xlock+0x68
#3 0xffffffff80d6056d at ufsdirhash_add+0x3d
#4 0xffffffff80d633f9 at ufs_direnter+0x459
#5 0xffffffff80d6b7a7 at ufs_mkdir+0x8d7
#6 0xffffffff8108a889 at VOP_MKDIR_APV+0xd9
#7 0xffffffff80b3ddc2 at kern_mkdirat+0x1e2
#8 0xffffffff80f0dd0b at amd64_syscall+0x79b
#9 0xffffffff80eec33b at Xfast_syscall+0xfb

and hours later:

lock order reversal:
 1st 0xfffff80017613068 ufs (ufs) @ /home/projects/freebsd/sys/kern/vfs_subr.c:2608
 2nd 0xfffffe0000f01100 bufwait (bufwait) @ /home/projects/freebsd/sys/ufs/ffs/ffs_vnops.c:282
 3rd 0xfffff8001c9985f0 ufs (ufs) @ /home/projects/freebsd/sys/kern/vfs_subr.c:2608
stack backtrace:
#0 0xffffffff80ac5a23 at witness_debugger+0x73
#1 0xffffffff80ac58a2 at witness_checkorder+0xe02
#2 0xffffffff80a37f63 at __lockmgr_args+0x883
#3 0xffffffff80d5b295 at ffs_lock+0xa5
#4 0xffffffff8108b309 at VOP_LOCK1_APV+0xd9
#5 0xffffffff80b40c86 at _vn_lock+0x66
#6 0xffffffff80b2f6f2 at vget+0x82
#7 0xffffffff80b21651 at vfs_hash_get+0xd1
#8 0xffffffff80d56f5f at ffs_vgetf+0x3f
#9 0xffffffff80d4ce3f at softdep_sync_buf+0x54f
#10 0xffffffff80d5c0f4 at ffs_syncvnode+0x324
#11 0xffffffff80d4c0d8 at softdep_fsync+0x4c8
#12 0xffffffff80d5b16c at ffs_fsync+0x7c
#13 0xffffffff8108a279 at VOP_FSYNC_APV+0xd9
#14 0xffffffff80b3d5d6 at kern_fsync+0x1b6
#15 0xffffffff80f0dd0b at amd64_syscall+0x79b
#16 0xffffffff80eec33b at Xfast_syscall+0xfb
(In reply to Wolfram Schneider from comment #9) These particular LOR messages are harmless, see discussion of them in https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224176
batch change:

For bugs that match the following:
- Status is In Progress, AND
- Untouched since 2018-01-01, AND
- Affects Base System OR Documentation

DO: Reset to open status.

Note: I did a quick pass, but if you are getting this email it might be worthwhile to double check whether this bug ought to be closed.
(In reply to Wolfram Schneider from comment #8) Look at https://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html and gather the information as explained there.

Do you use SU+J? If yes, try to disable +J and see if it changes anything.
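For example, roughly like this (a sketch; tunefs(8) needs the filesystem unmounted or the system in single-user mode to change flags, and /dev/vtbd0p3 is only an example device taken from the lsof output earlier in this PR):

$ # show current soft updates / journaling flags
$ tunefs -p /dev/vtbd0p3
$ # disable soft updates journaling (+J) while keeping soft updates
$ tunefs -j disable /dev/vtbd0p3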
While upgrading my FreeBSD 13-CURRENT box yesterday, I ran into this issue: when compiling the kernel and then world with -j8, I always ended up with all the cc processes stuck in biowr and ufs. In the end I upgraded my box over NFS from my 13-CURRENT server (ZFS based). But when upgrading ports from the same server using pkg upgrade -f, it also got stuck with the same symptoms.

I managed to take a kernel dump, then booted into single user mode, disabled soft update journaling (-j), then booted the system and upgraded my packages; all went OK.

Looking at the dump, I see lots of dd processes (launched by pkg, I believe) that are in the following state:

read 345 (Thread 100752 (PID=32594: dd)):
#0 sched_switch (td=0xfffffe00fdd9be00, flags=<optimized out>) at /usr/src/sys/kern/sched_ule.c:2147
#1 0xffffffff803db4be in mi_switch (flags=flags@entry=260) at /usr/src/sys/kern/kern_synch.c:542
#2 0xffffffff8042a43d in sleepq_switch (wchan=wchan@entry=0x0, pri=pri@entry=2122752) at /usr/src/sys/kern/subr_sleepqueue.c:608
#3 0xffffffff8042a327 in sleepq_wait (wchan=<unavailable>, pri=<unavailable>) at /usr/src/sys/kern/subr_sleepqueue.c:659
#4 0xffffffff803a9f7f in sleeplk (lk=lk@entry=0xfffff8000535f810, flags=flags@entry=2122752, ilk=<optimized out>, ilk@entry=0xfffff8000535f840, wmesg=<optimized out>, wmesg@entry=0xffffffff806c8ccb "ufs", pri=<optimized out>, pri@entry=96, timo=timo@entry=51, queue=1) at /usr/src/sys/kern/kern_lock.c:301
#5 0xffffffff803a88e2 in lockmgr_slock_hard (lk=0xfffff8000535f810, flags=flags@entry=2122752, ilk=0xfffff8000535f840, ilk@entry=0xfffffe00fdaff658, file=<optimized out>, line=<optimized out>, lwa=lwa@entry=0x0) at /usr/src/sys/kern/kern_lock.c:696
#6 0xffffffff803a84e8 in lockmgr_lock_flags (lk=<optimized out>, lk@entry=0xfffff8000535f810, flags=flags@entry=2122752, ilk=ilk@entry=0xfffff8000535f840, file=<unavailable>, line=<unavailable>) at /usr/src/sys/kern/kern_lock.c:1056
#7 0xffffffff805c90e5 in ffs_lock (ap=0xfffffe00fdaff658, ap@entry=<error reading variable: value is not available>) at /usr/src/sys/ufs/ffs/ffs_vnops.c:494
#8 0xffffffff804be098 in VOP_LOCK1 (vp=0xfffff8000535f7a0, flags=2106368, file=0xffffffff806fac3e "/usr/src/sys/kern/vfs_lookup.c", line=877) at ./vnode_if.h:1096
#9 _vn_lock (vp=vp@entry=0xfffff8000535f7a0, flags=2106368, file=0xffffffff806fac3e "/usr/src/sys/kern/vfs_lookup.c", line=line@entry=877) at /usr/src/sys/kern/vfs_vnops.c:1741
#10 0xffffffff80497a18 in lookup (ndp=ndp@entry=0xfffffe00fdaff970) at /usr/src/sys/kern/vfs_lookup.c:875
#11 0xffffffff804973dd in namei (ndp=ndp@entry=0xfffffe00fdaff970) at /usr/src/sys/kern/vfs_lookup.c:631
#12 0xffffffff804bd590 in vn_open_cred (ndp=ndp@entry=0xfffffe00fdaff970, flagp=<optimized out>, flagp@entry=0xfffffe00fdaffa94, cmode=<optimized out>, vn_open_flags=vn_open_flags@entry=0, cred=<optimized out>, fp=0xfffff80003d49140) at /usr/src/sys/kern/vfs_vnops.c:252
#13 0xffffffff804bd44d in vn_open (ndp=<unavailable>, ndp@entry=0xfffffe00fdaff970, flagp=<unavailable>, flagp@entry=0xfffffe00fdaffa94, cmode=<unavailable>, fp=<unavailable>) at /usr/src/sys/kern/vfs_vnops.c:193
#14 0xffffffff804b3723 in kern_openat (td=0xfffffe00fdd9be00, fd=<optimized out>, path=<optimized out>, pathseg=<optimized out>, flags=1539, mode=<optimized out>) at /usr/src/sys/kern/vfs_syscalls.c:1143
#15 0xffffffff8068d5cc in syscallenter (td=0xfffffe00fdd9be00) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:189
#16 amd64_syscall (td=0xfffffe00fdd9be00, traced=0) at /usr/src/sys/amd64/amd64/trap.c:1156

I will re-enable -j and the kernel deadlock debugging features, to hopefully get more debugging information.
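For reference, the deadlock-debugging chapter of the developers handbook boils down to running a kernel with debugging options along these lines (a sketch of a kernel config excerpt; several of these are already enabled in GENERIC on -CURRENT, so only the missing ones need to be added):

options KDB                # kernel debugger framework
options DDB                # interactive kernel debugger
options INVARIANTS         # run-time consistency checks
options INVARIANT_SUPPORT
options WITNESS            # lock order verification
options WITNESS_SKIPSPIN   # skip spin mutexes to reduce overhead
options DEBUG_LOCKS        # track lockmgr lock holders
options DEBUG_VFS_LOCKS    # track vnode lock holders
options DIAGNOSTIC         # additional, more expensive checks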
(In reply to Ali Abdallah from comment #13) This is certainly not related to the bug you replied to. That said, your issue _might_ be fixed by the patch series available at https://kib.kiev.ua/git/gitweb.cgi?p=deviant3.git;a=shortlog;h=refs/heads/ufs_needsync

I am sufficiently curious whether the patches from the ufs_needsync branch make any difference for your setup. Something similar was reported by Peter Holm as well, I believe. If not, I will look at your report after this branch lands. You would need to follow the 'debugging deadlocks' chapter from the developers handbook to gather the required information, or provide a reproduction recipe to Peter Holm.
I'm getting occasional UFS "hangs" of sorts on a -CURRENT VM too. It must be unrelated to whatever the problem was in the original bug report, but I figured I'd dump this here in case it helps with figuring out more recent problems. The last time it happened and I saved some info was a little while ago (FreeBSD 13.0-CURRENT #32 main-c529869-g4f597837d531: Tue Jan 12 14:41:03 EST 2021). I'll update and try to get more info if it happens again. `pkg upgrade` and running kyua tests are what usually trigger it, but it still happens very rarely.

load: 1.71 cmd: pkg 11799 [biowr] 331.89r 20.33u 66.48s 52% 764496k
mi_switch+0x155 sleepq_switch+0x109 _sleep+0x2b4 bufwait+0xc4 bufwrite+0x25a ffs_update+0x2ed ffs_syncvnode+0x4da ffs_fsync+0x1f softdep_prerename+0x21a ufs_rename+0x3ee VOP_RENAME_APV+0x40 kern_renameat+0x3fd amd64_syscall+0x149 fast_syscall_common+0xf8

root@vm2:[~] # procstat -kk 11799
PID TID COMM TDNAME KSTACK
11799 100255 pkg - __mtx_lock_sleep+0xe8 __mtx_lock_flags+0xe5 process_worklist_item+0x63 softdep_prerename+0x4bd ufs_rename+0x3ee VOP_RENAME_APV+0x40 kern_renameat+0x3fd amd64_syscall+0x149 fast_syscall_common+0xf8

root@vm2:[~] # procstat -kk 11799
PID TID COMM TDNAME KSTACK
11799 100255 pkg - mi_switch+0x155 sleepq_switch+0x109 _sleep+0x2b4 bufwait+0xc4 bufwrite+0x25a ffs_update+0x2ed ffs_syncvnode+0x4da ffs_fsync+0x1f softdep_prerename+0x21a ufs_rename+0x3ee VOP_RENAME_APV+0x40 kern_renameat+0x3fd amd64_syscall+0x149 fast_syscall_common+0xf8

root@vm2:[~] # ps -lp 11799
UID PID PPID C PRI NI VSZ RSS MWCHAN STAT TT TIME COMMAND
0 11799 11798 4 52 0 828024 764512 - R+ 1 2:38.77 pkg upgrade

root@vm2:[~] # iostat -w 1 -d vtbd0 vtbd1 vtbd2
KB/t tps MB/s KB/t tps MB/s KB/t tps MB/s
24.4 10 0.2 33.3 2 0.1 38.2 0 0.0
32.0 25091 784.1 0.0 0 0.0 0.0 0 0.0
32.0 23305 728.3 0.0 0 0.0 0.0 0 0.0
32.0 23539 735.6 0.0 0 0.0 0.0 0 0.0
32.0 22151 692.2 0.0 0 0.0 0.0 0 0.0
32.0 19310 603.4 0.0 0 0.0 0.0 0 0.0
32.0 22848 714.0 0.0 0 0.0 0.0 0 0.0
32.0 24287 759.0 0.0 0 0.0 0.0 0 0.0
32.0 23392 731.0 0.0 0 0.0 0.0 0 0.0
32.0 24586 768.3 0.0 0 0.0 0.0 0 0.0
32.0 23980 749.4 0.0 0 0.0 0.0 0 0.0
32.0 23549 735.9 0.0 0 0.0 0.0 0 0.0
32.0 23328 729.0 0.0 0 0.0 0.0 0 0.0
32.0 23173 724.2 0.0 0 0.0 0.0 0 0.0
32.0 24906 778.3 0.0 0 0.0 0.0 0 0.0
32.0 23534 735.4 0.0 0 0.0 0.0 0 0.0
32.0 24242 757.6 0.0 0 0.0 0.0 0 0.0
32.0 21295 665.5 0.0 0 0.0 0.0 0 0.0
32.0 19002 593.8 0.0 0 0.0 0.0 0 0.0
32.0 18702 584.4 0.0 0 0.0 0.0 0 0.0
32.0 19285 602.7 0.0 0 0.0 0.0 0 0.0
32.0 18171 567.8 0.0 0 0.0 0.0 0 0.0
32.0 18603 581.3 0.0 0 0.0 0.0 0 0.0

I think there's always rmdir/mkdir or rename in the kernel call stack of the hung processes when it happens. And it's really going nuts with the I/O, which makes me think it must be some kind of livelock within softupdates. When some processes get stuck in this state, more and more processes eventually get stuck until the VM is unusable. But running `sync` unwedges the whole thing and everything seems to run fine after that. I'll try to get a core dump next time.
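If it happens again, a minimal sketch of what can be captured before unwedging things with `sync` (standard base-system tools only; the file names are arbitrary):

$ # kernel stacks of every thread, per-process I/O, and disk stats, saved before the workaround
$ procstat -kk -a > /var/tmp/procstat-kk.txt
$ top -b -m io -o total 20 > /var/tmp/top-io.txt
$ iostat -x -w 1 -c 10 > /var/tmp/iostat.txt
$ sync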
I believe I'm seeing the same problem on my T420. It started a couple (three?) months ago. I'm not sure it's a deadlock/livelock, though: it almost seems like a buffer is being written to disk over and over. One particular case I remember is a dd instance used to write out entropy (a 4k write to a file somewhere in /var); it was the culprit according to "top -m io". On some other occurrences of this bug I've seen Firefox doing an absurd amount of I/O. Assuming we can trust top(1), this aligns with the idea of a buffer being written over and over.
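For reference, a minimal sketch of how the offending process and the busy disk can be watched live (both tools are in the base system):

$ # per-process I/O sorted by writes, to spot whatever keeps rewriting the buffer
$ top -m io -o write
$ # per-provider GEOM statistics for the disks
$ gstat -p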
Try this: https://kib.kiev.ua/git/gitweb.cgi?p=deviant3.git;a=commitdiff;hp=main;h=ufs_needsync

I would not be surprised if the patch fixes the issue for you.
(In reply to Konstantin Belousov from comment #17) This sure seems to have helped. I was about to report that the problem is most likely gone, since it hadn't happened in a while (despite running kyua in a loop for hours) after getting that patch series. But then it happened again, with chrome this time, and I got a dump. I don't know if running "sync" would have unwedged the whole thing, since I made it panic instead.

There were two threads from two processes looping and doing crazy I/O: a chrome process and a zsh process.

zsh thread backtrace:
#0 sched_switch (td=td@entry=0xfffffe00aa35ce00, flags=<optimized out>, flags@entry=260) at /usr/src/sys/kern/sched_ule.c:2147
#1 0xffffffff80c1f4c9 in mi_switch (flags=flags@entry=260) at /usr/src/sys/kern/kern_synch.c:542
#2 0xffffffff80c6f929 in sleepq_switch (wchan=wchan@entry=0xfffffe00097da0a8, pri=92, pri@entry=0) at /usr/src/sys/kern/subr_sleepqueue.c:608
#3 0xffffffff80c6f7fe in sleepq_wait (wchan=<optimized out>, pri=<optimized out>) at /usr/src/sys/kern/subr_sleepqueue.c:659
#4 0xffffffff80c1e9e6 in _sleep (ident=ident@entry=0xfffffe00097da0a8, lock=<optimized out>, lock@entry=0xfffffe000863b0c0, priority=priority@entry=92, wmesg=<optimized out>, sbt=sbt@entry=0, pr=pr@entry=0, flags=256) at /usr/src/sys/kern/kern_synch.c:221
#5 0xffffffff80cd5214 in bwait (bp=0xfffffe00097da0a8, pri=92 '\\', wchan=<optimized out>) at /usr/src/sys/kern/vfs_bio.c:5020
#6 bufwait (bp=bp@entry=0xfffffe00097da0a8) at /usr/src/sys/kern/vfs_bio.c:4433
#7 0xffffffff80cd285a in bufwrite (bp=0xfffffe00097da0a8, bp@entry=<error reading variable: value is not available>) at /usr/src/sys/kern/vfs_bio.c:2305
#8 0xffffffff80f01789 in bwrite (bp=<unavailable>) at /usr/src/sys/sys/buf.h:430
#9 ffs_update (vp=vp@entry=0xfffff80004c61380, waitfor=waitfor@entry=1) at /usr/src/sys/ufs/ffs/ffs_inode.c:204
#10 0xffffffff80f2f98a in ffs_syncvnode (vp=vp@entry=0xfffff80004c61380, waitfor=<optimized out>, waitfor@entry=1, flags=<optimized out>, flags@entry=0) at /usr/src/sys/ufs/ffs/ffs_vnops.c:447
#11 0xffffffff80f0f91d in softdep_prelink (dvp=dvp@entry=0xfffff80004c61380, vp=vp@entry=0x0) at /usr/src/sys/ufs/ffs/ffs_softdep.c:3417
#12 0xffffffff80f3fee3 in ufs_makeinode (mode=33188, dvp=0xfffff80004c61380, vpp=0xfffffe00aae0a9d8, cnp=<unavailable>, callfunc=<unavailable>) at /usr/src/sys/ufs/ufs/ufs_vnops.c:2741
#13 0xffffffff80f3bfa4 in ufs_create (ap=0xfffffe00aae0a8a8) at /usr/src/sys/ufs/ufs/ufs_vnops.c:213
#14 0xffffffff8118a31d in VOP_CREATE_APV (vop=0xffffffff81b63158 <ffs_vnodeops2>, a=a@entry=0xfffffe00aae0a8a8) at vnode_if.c:244
#15 0xffffffff80d15233 in VOP_CREATE (dvp=<unavailable>, vpp=0xfffffe00aae0a9d8, cnp=0xfffffe00aae0aa00, vap=0xfffffe00aae0a7f0) at ./vnode_if.h:133
#16 vn_open_cred (ndp=ndp@entry=0xfffffe00aae0a968, flagp=flagp@entry=0xfffffe00aae0aa94, cmode=cmode@entry=420, vn_open_flags=<optimized out>, vn_open_flags@entry=0, cred=0xfffff80048d42e00, fp=0xfffff8010aeabc30) at /usr/src/sys/kern/vfs_vnops.c:285
#17 0xffffffff80d14f6d in vn_open (ndp=<unavailable>, ndp@entry=0xfffffe00aae0a968, flagp=<unavailable>, flagp@entry=0xfffffe00aae0aa94, cmode=<unavailable>, cmode@entry=420, fp=<unavailable>) at /usr/src/sys/kern/vfs_vnops.c:202
#18 0xffffffff80d08999 in kern_openat (td=0xfffffe00aa35ce00, fd=-100, path=0x8002fd420 <error: Cannot access memory at address 0x8002fd420>, pathseg=UIO_USERSPACE, flags=34306, mode=<optimized out>) at /usr/src/sys/kern/vfs_syscalls.c:1142
#19 0xffffffff810c5803 in syscallenter (td=<optimized out>) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:205
#20 amd64_syscall (td=0xfffffe00aa35ce00, traced=0) at /usr/src/sys/amd64/amd64/trap.c:1156
#21 <signal handler called>
#22 0x00000008004f223a in ?? ()

chrome thread backtrace:
#0 cpustop_handler () at /usr/src/sys/x86/x86/mp_x86.c:1475
#1 0xffffffff8108afe9 in ipi_nmi_handler () at /usr/src/sys/x86/x86/mp_x86.c:1432
#2 0xffffffff810c4256 in trap (frame=0xfffffe0009848f30) at /usr/src/sys/amd64/amd64/trap.c:201
#3 <signal handler called>
#4 vtpci_legacy_notify_vq (dev=<optimized out>, queue=0, offset=16) at /usr/src/sys/dev/virtio/pci/virtio_pci_legacy.c:485
#5 0xffffffff80a45417 in VIRTIO_BUS_NOTIFY_VQ (dev=0xfffff8000362fb00, queue=0, offset=16) at ./virtio_bus_if.h:144
#6 vq_ring_notify_host (vq=0xfffffe0063e27000) at /usr/src/sys/dev/virtio/virtqueue.c:834
#7 virtqueue_notify (vq=0xfffffe0063e27000, vq@entry=0xfffff8004de6f600) at /usr/src/sys/dev/virtio/virtqueue.c:439
#8 0xffffffff80a538c0 in vtblk_startio (sc=sc@entry=0xfffff8000362f100) at /usr/src/sys/dev/virtio/block/virtio_blk.c:1123
#9 0xffffffff80a53bed in vtblk_strategy (bp=0xfffff8004de6f600) at /usr/src/sys/dev/virtio/block/virtio_blk.c:571
#10 0xffffffff80b4bcfc in g_disk_start (bp=<optimized out>) at /usr/src/sys/geom/geom_disk.c:473
#11 0xffffffff80b4f147 in g_io_request (bp=0xfffff80021d33c00, cp=<optimized out>, cp@entry=0xfffff8000398ce80) at /usr/src/sys/geom/geom_io.c:589
#12 0xffffffff80b5b1a9 in g_part_start (bp=0xfffff8004e974900) at /usr/src/sys/geom/part/g_part.c:2332
#13 0xffffffff80b4f147 in g_io_request (bp=0xfffff8004e974900, cp=<optimized out>) at /usr/src/sys/geom/geom_io.c:589
#14 0xffffffff80cd284c in bstrategy (bp=0xfffffe0008ac5388) at /usr/src/sys/sys/buf.h:442
#15 bufwrite (bp=0xfffffe0008ac5388) at /usr/src/sys/kern/vfs_bio.c:2302
#16 0xffffffff80f01789 in bwrite (bp=0x0) at /usr/src/sys/sys/buf.h:430
#17 ffs_update (vp=vp@entry=0xfffff80139495000, waitfor=waitfor@entry=1) at /usr/src/sys/ufs/ffs/ffs_inode.c:204
#18 0xffffffff80f2f98a in ffs_syncvnode (vp=vp@entry=0xfffff80139495000, waitfor=<optimized out>, waitfor@entry=1, flags=<optimized out>, flags@entry=0) at /usr/src/sys/ufs/ffs/ffs_vnops.c:447
#19 0xffffffff80f0f86f in softdep_prelink (dvp=dvp@entry=0xfffff80139495000, vp=vp@entry=0xfffff8013c8328c0) at /usr/src/sys/ufs/ffs/ffs_softdep.c:3417
#20 0xffffffff80f3d797 in ufs_remove (ap=0xfffffe00aabdfa20) at /usr/src/sys/ufs/ufs/ufs_vnops.c:1011
#21 0xffffffff8118bf90 in VOP_REMOVE_APV (vop=0xffffffff81b63158 <ffs_vnodeops2>, a=a@entry=0xfffffe00aabdfa20) at vnode_if.c:1540
#22 0xffffffff80d0a468 in VOP_REMOVE (dvp=0x0, vp=0xfffff8013c8328c0, cnp=<optimized out>) at ./vnode_if.h:802
#23 kern_funlinkat (td=0xfffffe00aa6e3100, dfd=dfd@entry=-100, path=0x8288d40e0 <error: Cannot access memory at address 0x8288d40e0>, fd=<optimized out>, fd@entry=-200, pathseg=pathseg@entry=UIO_USERSPACE, flag=<optimized out>, flag@entry=0, oldinum=0) at /usr/src/sys/kern/vfs_syscalls.c:1927
#24 0xffffffff80d0a138 in sys_unlink (td=0xfffff8000362fb00, uap=<optimized out>) at /usr/src/sys/kern/vfs_syscalls.c:1808
#25 0xffffffff810c5803 in syscallenter (td=<optimized out>) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:205
#26 amd64_syscall (td=0xfffffe00aa6e3100, traced=0) at /usr/src/sys/amd64/amd64/trap.c:1156
#27 <signal handler called>
#28 0x000000080e40d17a in ?? ()

syncer backtrace:
#0 sched_switch (td=td@entry=0xfffffe00a5e29100, flags=<optimized out>, flags@entry=260) at /usr/src/sys/kern/sched_ule.c:2147
#1 0xffffffff80c1f4c9 in mi_switch (flags=flags@entry=260) at /usr/src/sys/kern/kern_synch.c:542
#2 0xffffffff80c6f929 in sleepq_switch (wchan=wchan@entry=0xffffffff81fa9550 <sync_wakeup>, pri=pri@entry=0) at /usr/src/sys/kern/subr_sleepqueue.c:608
#3 0xffffffff80c6fe3b in sleepq_timedwait (wchan=wchan@entry=0xffffffff81fa9550 <sync_wakeup>, pri=pri@entry=0) at /usr/src/sys/kern/subr_sleepqueue.c:690
#4 0xffffffff80ba34b0 in _cv_timedwait_sbt (cvp=0xffffffff81fa9550 <sync_wakeup>, lock=0xffffffff81fa9520 <sync_mtx>, sbt=<optimized out>, pr=<optimized out>, pr@entry=0, flags=0, flags@entry=256) at /usr/src/sys/kern/kern_condvar.c:312
#5 0xffffffff80d036dc in sched_sync () at /usr/src/sys/kern/vfs_subr.c:2739
#6 0xffffffff80bcb9a0 in fork_exit (callout=0xffffffff80d03090 <sched_sync>, arg=0x0, frame=0xfffffe006a491c00) at /usr/src/sys/kern/kern_fork.c:1077
#7 <signal handler called>

It seems like some kind of livelock involving ERELOOKUP loops. I can only guess, though; softupdates is way too complicated for me. That's with cb0dd7e122b8936ad61a141e65ef8ef874bfebe5 merged. This kernel has some local changes and I'm a little bit worried that this might be the problem, but I think it's unlikely. The problem happens pretty rarely, and that's the only -CURRENT install on UFS that I'm working with, so that's the best I've got. That's with a virtio disk backed by a ZFS volume on bhyve, BTW.
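For reference, forcing the panic (and hence the dump) from a still-responsive shell can be done with the stock debug.kdb.panic sysctl; a minimal sketch, assuming a dump device is already configured as described earlier in this PR:

$ # deliberately panic the kernel so savecore(8) picks up a vmcore on reboot
$ sysctl debug.kdb.panic=1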
I believe that this problem has been resolved.
I am sorry, but the problem is not solved, at least for stable/13 (abd116de1d4). Today one of my servers was unable to serve any www pages for two hours. I had an open ssh connection and was able to call 'sync'; after that everything returned to normal behavior. While the server was not able to respond to www requests, 'systat -vmstat 1' reported 100% activity for two disks (mirror/gm0s1a). Now I am changing soft update journaling to soft updates only.
This happened to me too after upgrading to 13-RELEASE. Mostly mv and rm processes stuck in the biowr state. After running 'sync' everything returns to normal.

FreeBSD 13.0-RELEASE-p3 #0: Tue Jun 29 19:46:20 UTC 2021 root@amd64-builder.daemonology.net:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64
(In reply to Tomasz Sowa from comment #20) (In reply to ATAG from comment #21) The fix for this problem was MFC'ed to stable/13 on Wed Jul 7 13:50:44 2021 in commit d36505493d2a876b37da0c7850ef906b1f32df08. You need to run a kernel compiled from stable/13 sources on or after this date to get the needed fix.
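A quick way to check whether a given source tree already contains that commit (a sketch, assuming /usr/src is a git checkout of stable/13):

$ # exit status 0 means the MFC commit is an ancestor of the checked-out revision
$ git -C /usr/src merge-base --is-ancestor d36505493d2a876b37da0c7850ef906b1f32df08 HEAD \
>   && echo "fix present" || echo "fix missing"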