Sometimes processes enter the D state and never leave it under FreeBSD 10.0 through 10.3. Then more and more processes enter the D state until the server hangs completely. There is no disk activity at that time.

uname -a:
FreeBSD hostname 10.3-RELEASE FreeBSD 10.3-RELEASE #0 r297264: Fri Mar 25 02:10:02 UTC 2016 root@releng1.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64

ps auxwwO wchan:
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND PID WCHAN TT STAT TIME COMMAND
www 1395 0,0 0,0 97932 7704 - D 11:12AM 0:00,02 /usr/local/sbin/ 1395 suspfs - D 0:00,02 /usr/local/sbin/httpd -f /usr/local/etc/apache24/httpd_ssl.conf -c PidFile /var/run/httpd.ssl.pid
www 2273 0,0 0,0 97932 7652 - D 11:14AM 0:00,02 /usr/local/sbin/ 2273 vofflock - D 0:00,02 /usr/local/sbin/httpd -f /usr/local/etc/apache24/httpd_ssl.conf -c PidFile /var/run/httpd.ssl.pid
www 2831 0,0 0,0 97932 7660 - D 11:14AM 0:00,02 /usr/local/sbin/ 2831 vofflock - D 0:00,02 /usr/local/sbin/httpd -f /usr/local/etc/apache24/httpd_ssl.conf -c PidFile /var/run/httpd.ssl.pid
www 3627 0,0 0,0 97932 7652 - D 11:16AM 0:00,02 /usr/local/sbin/ 3627 vofflock - D 0:00,02 /usr/local/sbin/httpd -f /usr/local/etc/apache24/httpd_ssl.conf -c PidFile /var/run/httpd.ssl.pid
www 3634 0,0 0,0 97932 7612 - D 11:16AM 0:00,01 /usr/local/sbin/ 3634 vofflock - D 0:00,01 /usr/local/sbin/httpd -f /usr/local/etc/apache24/httpd_ssl.conf -c PidFile /var/run/httpd.ssl.pid
www 3635 0,0 0,0 97932 7588 - D 11:16AM 0:00,00 /usr/local/sbin/ 3635 vofflock - D 0:00,00 /usr/local/sbin/httpd -f /usr/local/etc/apache24/httpd_ssl.conf -c PidFile /var/run/httpd.ssl.pid
www 4912 0,0 0,3 158388 71048 - D 11:18AM 0:00,05 /usr/local/sbin/ 4912 suspfs - D 0:00,05 /usr/local/sbin/httpd -f /usr/local/etc/apache24/httpd.conf -c PidFile /var/run/httpd.users.pid
www 4913 0,0 0,3 158388 70960 - D 11:18AM 0:00,05 /usr/local/sbin/ 4913 suspfs - D 0:00,05 /usr/local/sbin/httpd -f /usr/local/etc/apache24/httpd.conf -c PidFile /var/run/httpd.users.pid
www 5258 0,0 0,3 158388 71228 - D 11:19AM 0:00,09 /usr/local/sbin/ 5258 suspfs - D 0:00,09 /usr/local/sbin/httpd -f /usr/local/etc/apache24/httpd.conf -c PidFile /var/run/httpd.users.pid
www 5361 0,0 0,0 97932 7624 - D 11:19AM 0:00,01 /usr/local/sbin/ 5361 vofflock - D 0:00,01 /usr/local/sbin/httpd -f /usr/local/etc/apache24/httpd_ssl.conf -c PidFile /var/run/httpd.ssl.pid
www 5362 0,0 0,0 97932 7624 - D 11:19AM 0:00,01 /usr/local/sbin/ 5362 vofflock - D 0:00,01 /usr/local/sbin/httpd -f /usr/local/etc/apache24/httpd_ssl.conf -c PidFile /var/run/httpd.ssl.pid
www 5381 0,0 0,3 158388 70952 - D 11:19AM 0:00,03 /usr/local/sbin/ 5381 suspfs - D 0:00,03 /usr/local/sbin/httpd -f /usr/local/etc/apache24/httpd.conf -c PidFile /var/run/httpd.users.pid
www 5382 0,0 0,3 158388 70944 - D 11:19AM 0:00,04 /usr/local/sbin/ 5382 suspfs - D 0:00,04 /usr/local/sbin/httpd -f /usr/local/etc/apache24/httpd.conf -c PidFile /var/run/httpd.users.pid
latokar 5424 0,0 0,0 41736 3180 - Ds 11:19AM 0:01,55 ftpd: 31.129.249 5424 suspfs - Ds 0:01,55 ftpd: XXX.XXX.XXX.XXX: user/latokar: STOR icons.php\r\n (ftpd)
www 5431 0,0 0,3 158388 70972 - D 11:19AM 0:00,02 /usr/local/sbin/ 5431 suspfs - D 0:00,02 /usr/local/sbin/httpd -f /usr/local/etc/apache24/httpd.conf -c PidFile /var/run/httpd.users.pid
.....

procstat -kk 1395:
PID TID COMM TDNAME KSTACK
1395 102111 httpd - mi_switch+0xe1 sleepq_wait+0x3a _sleep+0x287 vn_start_write_locked+0xa7 vn_start_write+0xa3 vn_write+0xb0 vn_io_fault_doio+0x22 vn_io_fault1+0x1ac vn_io_fault+0x18b dofilewrite+0x87 kern_writev+0x68 sys_write+0x63 amd64_syscall+0x40f Xfast_syscall+0xfb

procstat -kk 2273:
PID TID COMM TDNAME KSTACK
2273 100096 httpd - mi_switch+0xe1 sleepq_wait+0x3a _sleep+0x287 foffset_lock+0xaa vn_io_fault+0x5c dofilewrite+0x87 kern_writev+0x68 sys_write+0x63 amd64_syscall+0x40f Xfast_syscall+0xfb

I don't know how to reproduce it. It happens occasionally on different servers.
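When many processes are stuck like this, it can help to see at a glance which wait channels dominate (suspfs versus vofflock in the listing above). The following is only a rough sketch, not part of the original report: the function name summarize_dstate is made up for illustration, and it assumes input in the four-column form produced by `ps -axo pid,state,wchan,command`.

```shell
#!/bin/sh
# Count processes in uninterruptible sleep (STAT starting with "D"),
# grouped by wait channel, most frequent first.
# Expects lines of the form "PID STAT WCHAN COMMAND..." on stdin,
# with a header on the first line.
summarize_dstate() {
    awk 'NR > 1 && $2 ~ /^D/ { count[$3]++ }
         END { for (w in count) print count[w], w }' | sort -rn
}

# Typical use on a live system (left commented out in this sketch):
# ps -axo pid,state,wchan,command | summarize_dstate
```

A growing count for a single wait channel between two runs suggests the same blockage is catching new processes rather than several unrelated stalls.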
Same issue here. It happens randomly, for example during tar or rsync:

procstat -kk 66610
PID TID COMM TDNAME KSTACK
66610 101290 bsdtar - mi_switch+0xe1 sleepq_wait+0x3a _sleep+0x287 vnode_create_vobject+0x100 ufs_open+0x6d VOP_OPEN_APV+0xa1 vn_open_vnode+0x234 vn_open_cred+0x36a kern_openat+0x26f amd64_syscall+0x40f Xfast_syscall+0xfb

fstat -p 66610
USER CMD PID FD MOUNT INUM MODE SZ|DV R/W
root bsdtar 66610 text /usr 903274 -r-xr-xr-x 58392 r
root bsdtar 66610 wd - 290045184 d--------- 512 r
root bsdtar 66610 root / 2 drwxr-xr-x 512 r
root bsdtar 66610 0* pipe fffff804efe05000 <-> fffff804efe05160 0 rw
root bsdtar 66610 1* pipe fffff8000e2d1730 <-> fffff8000e2d15d0 0 rw
root bsdtar 66610 2* pipe fffff804e9480448 <-> fffff804e94802e8 0 rw
root bsdtar 66610 3 - 290045184 d--------- 512 r
root bsdtar 66610 4 - 293018454 drwxr-xr-x 15111168 r
root bsdtar 66610 5 - 293018454 drwxr-xr-x 15111168 r
It's possible this issue is the same as https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204764
The same problem occurs under 11.0-RELEASE.
For me it was resolved when I upgraded to 10.3-STABLE
I mean 10-STABLE
uname -a:
FreeBSD hostname 11.0-RELEASE-p1 FreeBSD 11.0-RELEASE-p1 #0 r306420: Thu Sep 29 01:43:23 UTC 2016 root@releng2.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64

ps auxwwO wchan:
user123 21246 0,0 0,2 192920 45800 - D 18:20 0:00,19 /usr/local/bin/p 21246 suspfs - D 0:00,19 /usr/local/bin/php-cgi
root 21442 0,0 0,0 8396 2676 - Ds 18:21 0:00,01 find -H /tmp -na 21442 suspfs - Ds 0:00,01 find -H /tmp -name sess_* -mtime +1h -delete
user234 21669 0,0 0,0 11544 3420 - D 18:21 0:00,02 unzip -ao /tmp/f 21669 suspfs - D 0:00,02 unzip -ao /tmp/fm/55BA42987CD9F1D024546F689963E25C/dist.zip -d /tmp/fm/55BA42987CD9F1D024546F689963E25C
www 22188 0,0 0,2 146188 62200 - D 18:23 0:00,24 /usr/local/sbin/ 22188 suspfs - D 0:00,24 /usr/local/sbin/httpd -f /usr/local/etc/apache24/httpd.conf -c PidFile /var/run/httpd.users.pid
www 22439 0,0 0,2 146188 61772 - D 18:24 0:00,03 /usr/local/sbin/ 22439 suspfs - D 0:00,03 /usr/local/sbin/httpd -f /usr/local/etc/apache24/httpd.conf -c PidFile /var/run/httpd.users.pid
www 22442 0,0 0,2 146188 61768 - D 18:24 0:00,02 /usr/local/sbin/ 22442 suspfs - D 0:00,02 /usr/local/sbin/httpd -f /usr/local/etc/apache24/httpd.conf -c PidFile /var/run/httpd.users.pid
www 22789 0,0 0,2 146188 61780 - D 18:26 0:00,03 /usr/local/sbin/ 22789 suspfs - D 0:00,03 /usr/local/sbin/httpd -f /usr/local/etc/apache24/httpd.conf -c PidFile /var/run/httpd.users.pid
www 22790 0,0 0,2 146188 61772 - D 18:26 0:00,02 /usr/local/sbin/ 22790 suspfs - D 0:00,02 /usr/local/sbin/httpd -f /usr/local/etc/apache24/httpd.conf -c PidFile /var/run/httpd.users.pid
www 22799 0,0 0,2 146188 61800 - D 18:26 0:00,03 /usr/local/sbin/ 22799 suspfs - D 0:00,03 /usr/local/sbin/httpd -f /usr/local/etc/apache24/httpd.conf -c PidFile /var/run/httpd.users.pid
www 22802 0,0 0,2 146188 61764 - D 18:26 0:00,02 /usr/local/sbin/ 22802 suspfs - D 0:00,02 /usr/local/sbin/httpd -f /usr/local/etc/apache24/httpd.conf -c PidFile /var/run/httpd.users.pid
user345 22821 0,0 0,0 44148 3504 - Ds 18:26 0:00,12 ftpd: bzq-79-183 22821 range - Ds 0:00,12 ftpd: ???.red.bezeqint.net: user/user345: STOR pack-473a0372073c2c7baee9ef960158fa0d3fa750e4.idx\r\n (ftpd)
www 22834 0,0 0,2 146188 61764 - D 18:26 0:00,02 /usr/local/sbin/ 22834 suspfs - D 0:00,02 /usr/local/sbin/httpd -f /usr/local/etc/apache24/httpd.conf -c PidFile /var/run/httpd.users.pid
www 22935 0,0 0,2 146188 61828 - D 18:26 0:00,04 /usr/local/sbin/ 22935 vofflock - D 0:00,04 /usr/local/sbin/httpd -f /usr/local/etc/apache24/httpd.conf -c PidFile /var/run/httpd.users.pid
www 23009 0,0 0,2 146188 61772 - D 18:27 0:00,01 /usr/local/sbin/ 23009 vofflock - D 0:00,01 /usr/local/sbin/httpd -f /usr/local/etc/apache24/httpd.conf -c PidFile /var/run/httpd.users.pid
.....

procstat -kk 22188:
PID TID COMM TDNAME KSTACK
22188 100207 httpd - mi_switch+0xd2 sleepq_wait+0x3a _sleep+0x2a1 vn_start_write_locked+0xa6 vn_start_write+0xdf vn_close+0x5b vn_closefile+0x4a _fdrop+0x1a closef+0x2d4 closefp+0xb6 amd64_syscall+0x4ce Xfast_syscall+0xfb

procstat -kk 22935:
PID TID COMM TDNAME KSTACK
22935 100997 httpd - mi_switch+0xd2 sleepq_wait+0x3a _sleep+0x2a1 foffset_lock+0xda vn_io_fault+0x5a dofilewrite+0x87 kern_writev+0x68 sys_write+0x84 amd64_syscall+0x4ce Xfast_syscall+0xfb
The same problem occurs under 11.1-RELEASE.
(In reply to vvv from comment #7) Either you use journaled soft updates, or your disk controller has stopped processing I/O requests. If you do use journaling, try switching to plain soft updates.
It isn't a controller problem, because the behavior is observed randomly on different servers with different hardware. Yes, SU+J is enabled. Is it a known problem? Disabling SU+J is undesirable because fsck will take a very long time on unclean file systems. But I'll try.
Try sysctl -w vfs.lookup_shared=0. In our case it helped; the lockup was in lockmgr due to heavy NFS load.
Thanks. I'll try.
vfs.lookup_shared=0 didn't help. Trying to disable journaling and leave plain SU.
Did disabling SU+J help?
I've disabled soft update journaling (-j) and left soft updates (-n) enabled on two servers. Now they work fine.
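For anyone wanting to try the same change, the step described above might look roughly like the sketch below. It is an illustration only: the helper names run_cmd and disable_suj and the device /dev/ada0p2 are hypothetical, and tunefs expects the file system to be unmounted or mounted read-only before it is changed. With DRY_RUN=1 the commands are only printed.

```shell
#!/bin/sh
# Echo the command when DRY_RUN=1, otherwise execute it.
run_cmd() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Turn off soft update journaling on one UFS device, leaving the
# soft updates setting untouched, then print the resulting parameters.
disable_suj() {
    dev=$1
    run_cmd tunefs -j disable "$dev" &&
    run_cmd tunefs -p "$dev"
}

# Example (hypothetical device; unmount or remount read-only first):
# DRY_RUN=1 disable_suj /dev/ada0p2
```

Running it with DRY_RUN=1 first makes it easy to confirm the target device before anything is modified.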
The problem has now occurred with soft update journaling disabled and soft updates enabled:

tunefs: soft updates: (-n) enabled
tunefs: soft update journaling: (-j) disabled

So, disabling SU+J didn't help.
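To compare several servers quickly, the two tunefs lines quoted above can be reduced to a single label. This is a small sketch under assumptions: the function name su_mode is invented here, the device in the usage comment is hypothetical, and the 2>&1 redirect assumes `tunefs -p` writes its report to standard error.

```shell
#!/bin/sh
# Classify the soft updates mode from `tunefs -p` output read on stdin.
# Prints "SU+J", "SU", or "none".
su_mode() {
    awk '
        /soft updates:/           { su  = $NF }   # last field: enabled/disabled
        /soft update journaling:/ { suj = $NF }
        END {
            if (suj == "enabled")     print "SU+J"
            else if (su == "enabled") print "SU"
            else                      print "none"
        }'
}

# Example (hypothetical device name):
# tunefs -p /dev/ada0p2 2>&1 | su_mode
```

Fed the two lines from this comment, the function reports plain SU, which matches the configuration under which the hang recurred.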