Bug 227404

Summary: UP FreeBSD VM always hangs on reboot since 20180329-r331740
Product: Base System Reporter: Dexuan Cui <decui>
Component: kernAssignee: Tijl Coosemans <tijl>
Status: Closed FIXED    
Severity: Affects Some People CC: avg, bde, bobf, bugs, cy, jeff, markj, tijl, work+freebsd
Priority: ---    
Version: CURRENT   
Hardware: amd64   
OS: Any   

Description Dexuan Cui 2018-04-10 01:13:14 UTC
It looks on Hyper-V the issue only happens to UP VM (SMP VM is good), but the VM hangs in the generic FreeBSD kernel code, so I suspect the issue should reproduce on bare metal as well.

More info:
FreeBSD-12.0-CURRENT-amd64-20180322-r331345-disc1.iso is good, but
FreeBSD-12.0-CURRENT-amd64-20180329-r331740-disc1.iso is broken.


These are the serial messages:

Tue Apr 10 01:52:02 CST 2018

FreeBSD/amd64 (decui-b11) (ttyu0)

login: Stopping cron.
Stopping sshd.
appending output to nohup.out
Stopping devd.
Writing entropy file:.
Writing early boot entropy file:.
Terminated
.
Apr 10 01:52:10 decui-b11 syslogd: exiting on signal 15
Waiting (max 60 seconds) for system process `vnlru' to stop... done
Waiting (max 60 seconds) for system process `bufdaemon' to stop... done
Waiting (max 60 seconds) for system process `syncer' to stop...
Syncing disks, vnodes remaining... 4 NMI ISA 0, EISA ff
NMI/cpu0 ... going to debugger
[ thread pid 18 tid 100055 ]
Stopped at      witness_lock+0x1ae:     popq    %rbp
db> bt
Tracing pid 18 tid 100055 td 0xfffff8000b052000
witness_lock() at witness_lock+0x1ae/frame 0xfffffe0026553930
sched_add() at sched_add+0xe3/frame 0xfffffe0026553970
setrunnable() at setrunnable+0x8f/frame 0xfffffe0026553990
sleepq_broadcast() at sleepq_broadcast+0xe8/frame 0xfffffe00265539d0
wakeup() at wakeup+0x1d/frame 0xfffffe00265539f0
kproc_suspend_check() at kproc_suspend_check+0x58/frame 0xfffffe0026553a20
buf_daemon() at buf_daemon+0x11a/frame 0xfffffe0026553a70
fork_exit() at fork_exit+0x84/frame 0xfffffe0026553ab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0026553ab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
db>
Comment 1 Dexuan Cui 2018-04-10 01:16:23 UTC
When the issue happens, the cpu utilization of the UP VM is 100%.

While we're trying to find the first bad revision, it would be great if somebody can report if the issue also happens to bare metal or other hypervisors.
Comment 2 Dexuan Cui 2018-04-10 01:23:13 UTC
When the hang issue happens, the last serial message is the line
"Syncing disks, vnodes remaining... 4 ".

Later I sent an NMI interrupt to the VM by Hyper-V PowerShell cmdlet, and the kernel printed out the later lines showing the call-stack.
Comment 3 Andriy Gapon freebsd_committer freebsd_triage 2018-04-10 05:28:12 UTC
(In reply to Dexuan Cui from comment #0)
I have observed a similar (perhaps the same issue) on real multi-core hardware but with the SMP kernel (GENERIC) when all APs were disabled via hints (hint.lapic.%d.disabled=1).

The reboot got stuck and a thread doing it was in this state:
db> bt
Tracing pid 1 tid 100002 td 0xfffff80003230560
sched_switch() at sched_switch+0x95c/frame 0xfffffe00213b9880
mi_switch() at mi_switch+0x188/frame 0xfffffe00213b98b0
sleepq_switch() at sleepq_switch+0x10d/frame 0xfffffe00213b98f0
sleepq_timedwait() at sleepq_timedwait+0x50/frame 0xfffffe00213b9930
_sleep() at _sleep+0x2fa/frame 0xfffffe00213b99d0
kproc_suspend() at kproc_suspend+0x9a/frame 0xfffffe00213b9a00
kproc_shutdown() at kproc_shutdown+0x4e/frame 0xfffffe00213b9a20
kern_reboot() at kern_reboot+0x19e/frame 0xfffffe00213b9a60
sys_reboot() at sys_reboot+0x3ab/frame 0xfffffe00213b9ac0
amd64_syscall() at amd64_syscall+0x79b/frame 0xfffffe00213b9bf0
fast_syscall_common() at fast_syscall_common+0x105/frame 0xfffffe00213b9bf0

I didn't examine other threads much.
Comment 4 Dexuan Cui 2018-04-10 07:22:27 UTC
I think the first bad patch is this one:
https://github.com/freebsd/freebsd/commit/63a483ed5f4eaadb8979992c7a5de24c7a471c61 (Fix a bug introduced in r329612 that slowly invalidates all clean bufs.): 

Today's https://github.com/freebsd/freebsd/commit/66e8725e8d24141506bc4f458ec7d1a51e86304c is broken, but if I revert 63a483ed5f4eaadb8979992c7a5de24c7a471c61, the bug can not reproduce.

Cc bde & jeff.
Comment 5 Dexuan Cui 2018-04-10 07:26:24 UTC
(In reply to Dexuan Cui from comment #4)
When the bug reproduces, the log is:

Stopping cron.
Stopping sshd.
appending output to nohup.out
Stopping devd.
Writing entropy file:.
Writing early boot entropy file:.
Terminated
.
Apr 10 14:46:40 decui-b11 syslogd: exiting on signal 15
Waiting (max 60 seconds) for system process `vnlru' to stop... done
Waiting (max 60 seconds) for system process `bufdaemon' to stop... done
Waiting (max 60 seconds) for system process `syncer' to stop...
Syncing disks, vnodes remaining... 4
(It hangs here)


After I revert 63a483ed5f4eaadb8979992c7a5de24c7a471c61, the bug can't reproduce, despite the messages:

Apr 10 14:28:44 decui-b11 syslogd: exiting on signal 15
Waiting (max 60 seconds) for system process `vnlru' to stop... done
Waiting (max 60 seconds) for system process `bufdaemon' to stop... done
Waiting (max 60 seconds) for system process `syncer' to stop...
Syncing disks, vnodes remaining... 4 1 0 done
All buffers synced.
lock order reversal:
 1st 0xfffff80008533ba8 ufs (ufs) @ /root/bsd.git/sys/kern/vfs_mount.c:1335
 2nd 0xfffff800085d0428 syncer (syncer) @ /root/bsd.git/sys/kern/vfs_subr.c:2732
stack backtrace:
#0 0xffffffff80bccfa3 at witness_debugger+0x73
#1 0xffffffff80bcce24 at witness_checkorder+0xe34
#2 0xffffffff80b3bb9b at lockmgr_lock_fast_path+0x17b
#3 0xffffffff8119d069 at VOP_LOCK1_APV+0xd9
#4 0xffffffff80c488a6 at _vn_lock+0x66
#5 0xffffffff80c379a7 at vputx+0x157
#6 0xffffffff80c2f7d9 at dounmount+0x4d9
#7 0xffffffff80c3919b at vfs_unmountall+0x6b
#8 0xffffffff80c14a25 at bufshutdown+0x2c5
#9 0xffffffff80b66d7a at kern_reboot+0x21a
#10 0xffffffff80b66b09 at sys_reboot+0x3a9
#11 0xffffffff8102706b at amd64_syscall+0x79b
#12 0xffffffff8100191d at fast_syscall_common+0x101
lock order reversal:
 1st 0xfffff80008533ba8 ufs (ufs) @ /root/bsd.git/sys/kern/vfs_mount.c:1335
 2nd 0xfffff800085d07e8 devfs (devfs) @ /root/bsd.git/sys/ufs/ffs/ffs_vfsops.c:1371
stack backtrace:
#0 0xffffffff80bccfa3 at witness_debugger+0x73
#1 0xffffffff80bcce24 at witness_checkorder+0xe34
#2 0xffffffff80b3bb9b at lockmgr_lock_fast_path+0x17b
#3 0xffffffff8119d069 at VOP_LOCK1_APV+0xd9
#4 0xffffffff80c488a6 at _vn_lock+0x66
#5 0xffffffff80e67a93 at ffs_flushfiles+0x93
#6 0xffffffff80e4adf2 at softdep_flushfiles+0x82
#7 0xffffffff80e6a147 at ffs_unmount+0x77
#8 0xffffffff80c2f819 at dounmount+0x519
#9 0xffffffff80c3919b at vfs_unmountall+0x6b
#10 0xffffffff80c14a25 at bufshutdown+0x2c5
#11 0xffffffff80b66d7a at kern_reboot+0x21a
#12 0xffffffff80b66b09 at sys_reboot+0x3a9
#13 0xffffffff8102706b at amd64_syscall+0x79b
#14 0xffffffff8100191d at fast_syscall_common+0x101
Uptime: 2m50s
acpi0: Powering system off



BTW, when the patch is reverted, I occasionally get this when the VM boots, but I guess that's a different issue:

(da1:storvsc2:0:0:0): storvsc inquiry (6) [0 b2 0 4 1 ... ]
da2 at storvsc2 bus 0 scbus4 target 0 lun 2
da2: <Msft Virtual Disk 1.0> Fixed Direct Access SPC-3 SCSI device
da2: 300.000MB/s transfers
da2: Command Queueing enabled
da2: 51200MB (104857600 512 byte sectors)
s_debugger+0x73
#1 0xffffffff80(da2:storvsc2:0:0:2): storvsc inquiry (6) [0 b2 0 4 1 ... ]
(da1:storvsc2:0:0:0): storvsc inquiry (5) [0 b0 0 3c 0 ... ]
bce381 at witness_warn+0x461
#2 0xffffffff81026273 at trap_pfa(da2:storvsc2:0:0:2): storvsc inquiry (5) [0 b0 0 3c 0 ... ]
(da1:storvsc2:0:0:0): storvsc inquiry (5) [0 b1 0 3c 0 ... ]
da1: Delete methods: <UNMAP(*),ZERO>
ult+0x53
#3 0xffffffff81025a72 (da2:storvsc2:0:0:2): storvsc inquiry (5) [0 b1 0 3c 0 ... ]
da2: Delete methods: <UNMAP(*),ZERO>
at trap+0x2f2
#4 0xffffffff810010cc at calltrap+0x8
#5 0xffffffff80c1af78 at vfs_vmio_unwire+0x78
#6 0xffffffff80c16350 at bGEOM: new disk da2
relse+0x3c0
#7 0xffffffff80e6af3a at ffs_use_bread+0x9a
#8 0xffffffff80e6659c at ffs_sbget+0x8c
#9 0xffffffff80e69213 at ffs_mount+0xe03
#10 0xffffffff80c2e449 at vfs_domount+0x719
#11 0xffffffff80c2d727 at vfs_donmount+0x7f7
#12 0xffffffff80c30a32 at kernel_mount+0x62
#13 0xffffffff80c32ddd at parse_mount+0x43d
#14 0xffffffff80c3150c at vfs_mountroot+0x68c
#15 0xffffffff80afe567 at start_init+0x27
#16 0xffffffff80b277b4 at fork_exit+0x84
#17 0xffffffff81001dee at fork_trampoline+0xe


Fatal trap 12: page fault while in kernel mode
cpuid = 12; apic id = 0c
fault virtual address   = 0x0
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80ea6081
stack pointer           = 0x28:0xfffffe002d0b1140
frame pointer           = 0x28:0xfffffe002d0b1150
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 1 (kernel)
[ thread pid 1 tid 100002 ]
Stopped at      _vm_page_deactivate+0xb1:       cmpq    %rcx,(%rax)
db> bt
Tracing pid 1 tid 100002 td 0xfffff800032e0560
_vm_page_deactivate() at _vm_page_deactivate+0xb1/frame 0xfffffe002d0b1150
vfs_vmio_unwire() at vfs_vmio_unwire+0x78/frame 0xfffffe002d0b1170
brelse() at brelse+0x3c0/frame 0xfffffe002d0b11c0
ffs_use_bread() at ffs_use_bread+0x9a/frame 0xfffffe002d0b1200
ffs_sbget() at ffs_sbget+0x8c/frame 0xfffffe002d0b1270
ffs_mount() at ffs_mount+0xe03/frame 0xfffffe002d0b1410
vfs_domount() at vfs_domount+0x719/frame 0xfffffe002d0b1630
vfs_donmount() at vfs_donmount+0x7f7/frame 0xfffffe002d0b16e0
kernel_mount() at kernel_mount+0x62/frame 0xfffffe002d0b1740
parse_mount() at parse_mount+0x43d/frame 0xfffffe002d0b1880
vfs_mountroot() at vfs_mountroot+0x68c/frame 0xfffffe002d0b1a00
start_init() at start_init+0x27/frame 0xfffffe002d0b1a70
fork_exit() at fork_exit+0x84/frame 0xfffffe002d0b1ab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe002d0b1ab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
db>
Comment 6 Tijl Coosemans freebsd_committer freebsd_triage 2018-04-18 09:31:20 UTC
I ran into this a couple of times now on a UP machine.  The problem is that base r329612 (jeff) turned the bufspace daemon process into a thread of the buf daemon process.  Now both threads call kproc_suspend_check with the same proc argument and it contains the following while loop:

while (SIGISMEMBER(p->p_siglist, SIGSTOP)) {
	wakeup(&p->p_siglist);
	msleep(&p->p_siglist, &p->p_mtx, PPAUSE, "kpsusp", 0);
}

So one thread wakes up the other and the other wakes up the first again, starving other processes like the syncer.  On an SMP system other processes can still run on another CPU.  I suspect one or both daemons need to be converted to use kthread_suspend and kthread_suspend_check instead, but I cannot reboot this machine right now to test this.
Comment 7 Cy Schubert freebsd_committer freebsd_triage 2018-04-21 22:27:19 UTC
+1 on a Penium-M laptop. I'll try the patch sometime this weekend or early next week.
Comment 8 commit-hook freebsd_committer freebsd_triage 2018-04-22 16:05:42 UTC
A commit references this bug:

Author: tijl
Date: Sun Apr 22 16:05:30 UTC 2018
New revision: 332871
URL: https://svnweb.freebsd.org/changeset/base/332871

Log:
  Make bufdaemon and bufspacedaemon use kthread_suspend_check instead of
  kproc_suspend_check.  In r329612 bufspacedaemon was turned into a thread
  of the bufdaemon process causing both to call kproc_suspend_check with the
  same proc argument and that function contains the following while loop:

  while (SIGISMEMBER(p->p_siglist, SIGSTOP)) {
  	wakeup(&p->p_siglist);
  	msleep(&p->p_siglist, &p->p_mtx, PPAUSE, "kpsusp", 0);
  }

  So one thread wakes up the other and the other wakes up the first again,
  locking up UP machines on shutdown.

  Also register the shutdown handlers with SHUTDOWN_PRI_LAST + 100 so they
  run after the syncer has shutdown, because the syncer can cause a
  situation where bufdaemon help is needed to proceed.

  PR:		227404
  Reviewed by:	kib
  Tested by:	cy, rmacklem

Changes:
  head/sys/kern/vfs_bio.c
Comment 9 Ian Lepore freebsd_committer freebsd_triage 2018-04-22 16:31:58 UTC
*** Bug 227352 has been marked as a duplicate of this bug. ***
Comment 10 Dexuan Cui 2018-04-24 02:48:37 UTC
We can confirm this fix works for us. Thank!!!
Comment 11 Bob Frazier 2018-04-24 17:04:06 UTC
re-tested with latest -CURRENT on armv6 (where I had originally reported it)

FreeBSD pi1c 12.0-CURRENT FreeBSD 12.0-CURRENT #0 r332894: Tue Apr 24 07:43:22 PDT 2018     bobf@hack.SFT.local:/e-drive/obj.current/e-drive/src.current/arm.armv6/sys/RPI-B  arm

did not observe the hang.  however, I _did_ see this in the console:

Apr 24 16:55:56 pi1c syslogd: exiting on signal 15
Waiting (max 60 seconds) for system process `vnlru' to stop... done
Waiting (max 60 seconds) for system process `syncer' to stop... 
Syncing disks, vnodes remaining... 1 1 1 0 done
Waiting (max 60 seconds) for system thread `bufdaemon' to stop... done
Waiting (max 60 seconds) for system thread `bufspacedaemon-0' to stop... done
All buffers synced.
lock order reversal:
 1st 0xd2c34974 ufs (ufs) @ /e-drive/src.current/sys/kern/vfs_mount.c:1335
 2nd 0xd2c2fbc4 syncer (syncer) @ /e-drive/src.current/sys/kern/vfs_subr.c:2732
stack backtrace:
lock order reversal:
 1st 0xd2c34724 ufs (ufs) @ /e-drive/src.current/sys/kern/vfs_mount.c:1335
 2nd 0xd2c34034 devfs (devfs) @ /e-drive/src.current/sys/fs/msdosfs/msdosfs_vfsops.c:934
stack backtrace:
Uptime: 1m49s

The operating system has halted.
Please press any key to reboot.
Comment 12 Tijl Coosemans freebsd_committer freebsd_triage 2018-04-24 17:58:12 UTC
Yes, there are several LORs in the filesystem code (search for "LOR" in the search box at the top of this page).  The ones you see are probably harmless because at that point during shutdown no other thread will access the filesystem so deadlock cannot occur.