Summary: | Processes hangs in D state, suspfs or vofflock wchan under FreeBSD 10.X-11.X | ||
---|---|---|---|
Product: | Base System | Reporter: | vvv <work+freebsd> |
Component: | kern | Assignee: | freebsd-bugs (Nobody) <bugs> |
Status: | New --- | ||
Severity: | Affects Many People | CC: | amd64, chris, kib, lampa |
Priority: | --- | ||
Version: | 11.2-RELEASE | ||
Hardware: | amd64 | ||
OS: | Any |
Description
vvv
2016-05-10 12:02:46 UTC
Same issue here. It happens randomly, for example during tar or rsync:

```
procstat -kk 66610
PID TID COMM TDNAME KSTACK
66610 101290 bsdtar - mi_switch+0xe1 sleepq_wait+0x3a _sleep+0x287 vnode_create_vobject+0x100 ufs_open+0x6d VOP_OPEN_APV+0xa1 vn_open_vnode+0x234 vn_open_cred+0x36a kern_openat+0x26f amd64_syscall+0x40f Xfast_syscall+0xfb
```

```
fstat -p 66610
USER CMD PID FD MOUNT INUM MODE SZ|DV R/W
root bsdtar 66610 text /usr 903274 -r-xr-xr-x 58392 r
root bsdtar 66610 wd - 290045184 d--------- 512 r
root bsdtar 66610 root / 2 drwxr-xr-x 512 r
root bsdtar 66610 0* pipe fffff804efe05000 <-> fffff804efe05160 0 rw
root bsdtar 66610 1* pipe fffff8000e2d1730 <-> fffff8000e2d15d0 0 rw
root bsdtar 66610 2* pipe fffff804e9480448 <-> fffff804e94802e8 0 rw
root bsdtar 66610 3 - 290045184 d--------- 512 r
root bsdtar 66610 4 - 293018454 drwxr-xr-x 15111168 r
root bsdtar 66610 5 - 293018454 drwxr-xr-x 15111168 r
```

This may be the same issue as https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204764

The same problem occurs under 11.0-RELEASE.
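When many processes hang at once, collecting the wchan and kernel stack of each one by hand is tedious. The following is a minimal sketch, not a base-system tool: the function names `list_dstate` and `dump_dstate_stacks` are hypothetical, and it assumes FreeBSD ps(1) with the `pid,stat,wchan,command` keywords and procstat(1) `-kk`, run as root so that other users' stacks are visible.

```shell
#!/bin/sh
# Hypothetical helper: read `ps -axo pid,stat,wchan,command` output on stdin
# and print "PID WCHAN" for every process whose state starts with D
# (uninterruptible wait). The first line is assumed to be the ps header.
list_dstate() {
  awk 'NR > 1 && $2 ~ /^D/ { print $1, $3 }'
}

# Hypothetical helper: dump the kernel stack of each D-state process.
# Needs root to read kernel stacks of other users' processes.
dump_dstate_stacks() {
  ps -axo pid,stat,wchan,command | list_dstate |
    while read -r pid wchan; do
      echo "=== PID $pid (wchan: $wchan) ==="
      procstat -kk "$pid"
    done
}
```

Running `dump_dstate_stacks > dstate.txt` while a hang is in progress should give one `procstat -kk` listing per stuck process, similar to the ones in this report.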
For me, it was resolved when I upgraded to 10-STABLE.

```
uname -a
FreeBSD hostname 11.0-RELEASE-p1 FreeBSD 11.0-RELEASE-p1 #0 r306420: Thu Sep 29 01:43:23 UTC 2016 root@releng2.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64
```

```
ps auxwwO wchan
user123 21246 0,0 0,2 192920 45800 - D 18:20 0:00,19 /usr/local/bin/p 21246 suspfs - D 0:00,19 /usr/local/bin/php-cgi
root 21442 0,0 0,0 8396 2676 - Ds 18:21 0:00,01 find -H /tmp -na 21442 suspfs - Ds 0:00,01 find -H /tmp -name sess_* -mtime +1h -delete
user234 21669 0,0 0,0 11544 3420 - D 18:21 0:00,02 unzip -ao /tmp/f 21669 suspfs - D 0:00,02 unzip -ao /tmp/fm/55BA42987CD9F1D024546F689963E25C/dist.zip -d /tmp/fm/55BA42987CD9F1D024546F689963E25C
www 22188 0,0 0,2 146188 62200 - D 18:23 0:00,24 /usr/local/sbin/ 22188 suspfs - D 0:00,24 /usr/local/sbin/httpd -f /usr/local/etc/apache24/httpd.conf -c PidFile /var/run/httpd.users.pid
www 22439 0,0 0,2 146188 61772 - D 18:24 0:00,03 /usr/local/sbin/ 22439 suspfs - D 0:00,03 /usr/local/sbin/httpd -f /usr/local/etc/apache24/httpd.conf -c PidFile /var/run/httpd.users.pid
www 22442 0,0 0,2 146188 61768 - D 18:24 0:00,02 /usr/local/sbin/ 22442 suspfs - D 0:00,02 /usr/local/sbin/httpd -f /usr/local/etc/apache24/httpd.conf -c PidFile /var/run/httpd.users.pid
www 22789 0,0 0,2 146188 61780 - D 18:26 0:00,03 /usr/local/sbin/ 22789 suspfs - D 0:00,03 /usr/local/sbin/httpd -f /usr/local/etc/apache24/httpd.conf -c PidFile /var/run/httpd.users.pid
www 22790 0,0 0,2 146188 61772 - D 18:26 0:00,02 /usr/local/sbin/ 22790 suspfs - D 0:00,02 /usr/local/sbin/httpd -f /usr/local/etc/apache24/httpd.conf -c PidFile /var/run/httpd.users.pid
www 22799 0,0 0,2 146188 61800 - D 18:26 0:00,03 /usr/local/sbin/ 22799 suspfs - D 0:00,03 /usr/local/sbin/httpd -f /usr/local/etc/apache24/httpd.conf -c PidFile /var/run/httpd.users.pid
www 22802 0,0 0,2 146188 61764 - D 18:26 0:00,02 /usr/local/sbin/ 22802 suspfs - D 0:00,02 /usr/local/sbin/httpd -f /usr/local/etc/apache24/httpd.conf -c PidFile /var/run/httpd.users.pid
user345 22821 0,0 0,0 44148 3504 - Ds 18:26 0:00,12 ftpd: bzq-79-183 22821 range - Ds 0:00,12 ftpd: ???.red.bezeqint.net: user/user345: STOR pack-473a0372073c2c7baee9ef960158fa0d3fa750e4.idx\r\n (ftpd)
www 22834 0,0 0,2 146188 61764 - D 18:26 0:00,02 /usr/local/sbin/ 22834 suspfs - D 0:00,02 /usr/local/sbin/httpd -f /usr/local/etc/apache24/httpd.conf -c PidFile /var/run/httpd.users.pid
www 22935 0,0 0,2 146188 61828 - D 18:26 0:00,04 /usr/local/sbin/ 22935 vofflock - D 0:00,04 /usr/local/sbin/httpd -f /usr/local/etc/apache24/httpd.conf -c PidFile /var/run/httpd.users.pid
www 23009 0,0 0,2 146188 61772 - D 18:27 0:00,01 /usr/local/sbin/ 23009 vofflock - D 0:00,01 /usr/local/sbin/httpd -f /usr/local/etc/apache24/httpd.conf -c PidFile /var/run/httpd.users.pid
.....
```

```
procstat -kk 22188
PID TID COMM TDNAME KSTACK
22188 100207 httpd - mi_switch+0xd2 sleepq_wait+0x3a _sleep+0x2a1 vn_start_write_locked+0xa6 vn_start_write+0xdf vn_close+0x5b vn_closefile+0x4a _fdrop+0x1a closef+0x2d4 closefp+0xb6 amd64_syscall+0x4ce Xfast_syscall+0xfb
```

```
procstat -kk 22935
PID TID COMM TDNAME KSTACK
22935 100997 httpd - mi_switch+0xd2 sleepq_wait+0x3a _sleep+0x2a1 foffset_lock+0xda vn_io_fault+0x5a dofilewrite+0x87 kern_writev+0x68 sys_write+0x84 amd64_syscall+0x4ce Xfast_syscall+0xfb
```

The same problem occurs under 11.1-RELEASE.

(In reply to vvv from comment #7) Either you use journaled soft updates, or your disk controller stopped processing I/O requests. If you do use journaling, try switching to plain soft updates.

It isn't a controller problem, because the behavior is observed randomly on different servers with different hardware. Yes, SU+J is enabled. Is this a known problem? Disabling SU+J is undesirable because fsck takes a very long time on unclean file systems. But I'll try.

Try `sysctl -w vfs.lookup_shared=0`. In our case it helped; the lockup was in lockmgr due to heavy NFS load.

Thanks. I'll try.

vfs.lookup_shared=0 didn't help.
Trying to disable journaling and leave plain SU.

Did disabling SU+J help?

I've disabled soft update journaling (-j) and left soft updates (-n) enabled on two servers. Now they work fine.

I hit the problem with soft update journaling disabled and soft updates enabled:

```
tunefs: soft updates: (-n) enabled
tunefs: soft update journaling: (-j) disabled
```

So, disabling SU+J didn't help.
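Since the thread turns on whether a file system runs plain soft updates or SU+J, a small parser for the `tunefs -p` lines quoted above can make that check scriptable across servers. This is a sketch under assumptions: `su_mode` is a hypothetical name, it assumes `tunefs -p` reports these flags via warnx(3) on stderr (hence `2>&1` in the usage line), and `/dev/ada0p2` is a placeholder device.

```shell
#!/bin/sh
# Hypothetical helper: read `tunefs -p` output on stdin and print the
# soft-updates mode as "SU+J", "SU", or "none".
# It matches the "(-n)" and "(-j)" lines and takes the last field of each,
# so the padding spaces tunefs puts before enabled/disabled do not matter.
su_mode() {
  awk '
    /soft updates: \(-n\)/           { su = $NF }
    /soft update journaling: \(-j\)/ { suj = $NF }
    END {
      if (su == "enabled" && suj == "enabled") print "SU+J"
      else if (su == "enabled") print "SU"
      else print "none"
    }'
}

# Example (placeholder device, needs read access to it):
#   tunefs -p /dev/ada0p2 2>&1 | su_mode
# Switching from SU+J to plain SU, as tried in this thread, is done with
# `tunefs -j disable` on an unmounted or read-only file system.
```

A server reported as "SU" here matches the configuration in the last comment, where the hang still occurred.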