Summary: | Lock Order Reversals somewhere in VFS/UFS | ||
---|---|---|---|
Product: | Base System | Reporter: | lankfordandrew |
Component: | kern | Assignee: | freebsd-fs (Nobody) <fs> |
Status: | Closed Not A Bug | ||
Severity: | Affects Only Me | CC: | bjk, cem, wosch |
Priority: | --- | ||
Version: | 11.1-STABLE | ||
Hardware: | amd64 | ||
OS: | Any |
Description
lankfordandrew
2017-12-08 14:11:42 UTC
Just in case you need this:

    $ kldstat
    Id Refs Address            Size     Name
     1   28 0xffffffff80200000 1070d08  kernel
     2    1 0xffffffff81272000 4d80     coretemp.ko
     3    1 0xffffffff81421000 100df    tmpfs.ko
     4    1 0xffffffff81432000 9fa      blank_saver.ko
     5    1 0xffffffff81433000 991bf    i915kms.ko
     6    1 0xffffffff814cd000 56589    drm2.ko
     7    4 0xffffffff81524000 299f     iicbus.ko
     8    1 0xffffffff81527000 1cd7     iic.ko
     9    1 0xffffffff81529000 1e66     iicbb.ko
    $

Some more after a reboot. I'm probably getting one or several right at shutdown, but I'm unsure if I can capture them.

    [70] lock order reversal:
    [70] 1st 0xfffffe007b886f80 bufwait (bufwait) @ /usr/src/sys/kern/vfs_bio.c:3550
    [70] 2nd 0xfffff80002aa9400 dirhash (dirhash) @ /usr/src/sys/ufs/ufs/ufs_dirhash.c:281
    [70] stack backtrace:
    [70] #0 0xffffffff80580bf0 at witness_debugger+0x70
    [70] #1 0xffffffff80580a72 at witness_checkorder+0xe02
    [70] #2 0xffffffff80527908 at _sx_xlock+0x68
    [70] #3 0xffffffff807d75b7 at ufsdirhash_remove+0x37
    [70] #4 0xffffffff807da779 at ufs_dirremove+0x129
    [70] #5 0xffffffff807dfe95 at ufs_remove+0x75
    [70] #6 0xffffffff808b60e0 at VOP_REMOVE_APV+0xe0
    [70] #7 0xffffffff805ee708 at kern_unlinkat+0x1e8
    [70] #8 0xffffffff8083fd08 at amd64_syscall+0x798
    [70] #9 0xffffffff8081ea6b at Xfast_syscall+0xfb

    [218] lock order reversal:
    [218] 1st 0xfffff800638f89a0 ufs (ufs) @ /usr/src/sys/kern/vfs_subr.c:2537
    [218] 2nd 0xfffffe007b900a60 bufwait (bufwait) @ /usr/src/sys/ufs/ffs/ffs_vnops.c:277
    [218] 3rd 0xfffff800638f87c8 ufs (ufs) @ /usr/src/sys/kern/vfs_subr.c:2537
    [218] stack backtrace:
    [218] #0 0xffffffff80580bf0 at witness_debugger+0x70
    [218] #1 0xffffffff80580a72 at witness_checkorder+0xe02
    [218] #2 0xffffffff804fcd83 at __lockmgr_args+0x883
    [218] #3 0xffffffff807d1ea5 at ffs_lock+0xa5
    [218] #4 0xffffffff808b6f30 at VOP_LOCK1_APV+0xe0
    [218] #5 0xffffffff805f4546 at _vn_lock+0x66
    [218] #6 0xffffffff805e4dc2 at vget+0x82
    [218] #7 0xffffffff805d6fb1 at vfs_hash_get+0xd1
    [218] #8 0xffffffff807cdb4e at ffs_vgetf+0x3e
    [218] #9 0xffffffff807c3da9 at softdep_sync_buf+0x7d9
    [218] #10 0xffffffff807d2dc1 at ffs_syncvnode+0x321
    [218] #11 0xffffffff807c2dc7 at softdep_fsync+0x4b7
    [218] #12 0xffffffff807d1d7c at ffs_fsync+0x7c
    [218] #13 0xffffffff808b5f80 at VOP_FSYNC_APV+0xe0
    [218] #14 0xffffffff805f112c at kern_fsync+0x1bc
    [218] #15 0xffffffff8083fd08 at amd64_syscall+0x798
    [218] #16 0xffffffff8081ea6b at Xfast_syscall+0xfb

While the enthusiasm at reporting what appears to be a bug is appreciated, the reality seems to be that all of these LORs are "well-known" and presumed harmless (though, IIRC, not thoroughly analyzed and proven to be harmless). http://sources.zabbadoz.net/freebsd/lor.html is no longer up-to-date, but is still a good place to get a handle on what has been seen before.

I'll close this bug, as I don't think there is anything actionable here.

Okidoke. There seem to be lots and lots of these LORs anyway. But thanks for the link.

I must admit that these LOR kernel messages look scary if you see them for the first time. But if these messages are mostly harmless, why don't we say so?

My proposed change request: instead of

    lock order reversal:
    1st 0xfffffe0000cbf2c0 bufwait (bufwait) @ /home/projects/freebsd/sys/kern/vfs_bio.c:3564
    2nd 0xfffff8001c445800 dirhash (dirhash) @ /home/projects/freebsd/sys/ufs/ufs/ufs_dirhash.c:289
    stack backtrace:

we display

    lock order reversal (mostly harmless, see https://www.freebsd.org/doc/handbook/freebsd-glossary.html#lor-glossary )
    1st 0xfffffe0000cbf2c0 bufwait (bufwait) @ /home/projects/freebsd/sys/kern/vfs_bio.c:3564
    2nd 0xfffff8001c445800 dirhash (dirhash) @ /home/projects/freebsd/sys/ufs/ufs/ufs_dirhash.c:289
    stack backtrace:

If we could identify the harmless ones accurately, we may as well not print them at all.

(In reply to Wolfram Schneider from comment #5)

The idea is that those messages can be useful when you get and report a real problem (crash, hang, etc). Reporting the messages themselves is almost never useful if you do not have a problem.
(In reply to Andriy Gapon from comment #7)

99.9% of these kernel warnings are false positives. We scare our users. We shouldn't.

I see two options to solve the issue:

1. "If we could identify the harmless ones accurately, we may as well not print them at all." (Conrad)
2. Tell the users that they can ignore these warnings as long as the machine runs fine.

I prefer 1), but in the meantime we should do 2).
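
For readers who have not met a lock order reversal before, the following is a minimal sketch of the general idea behind such a warning, not the kernel's actual WITNESS code. The class `LockOrderChecker` and the function `demo` are hypothetical names invented for illustration, and the model is deliberately simplified: it only remembers which lock name was taken while another was held, and flags a later acquisition in the opposite order. Note that nothing deadlocks in the demo; the warning only says that two code paths disagree about ordering, which may be why so many of these reports turn out to be harmless in practice.

```python
class LockOrderChecker:
    """Toy lock-order checker (hypothetical; a simplification of the idea)."""

    def __init__(self):
        self.seen_orders = set()  # (first, second) name pairs observed so far
        self.held = []            # lock names currently held, oldest first

    def acquire(self, name):
        """Record an acquisition; return (1st, 2nd) pairs that reverse
        an order observed earlier."""
        reversals = []
        for prior in self.held:
            # The opposite order (name before prior) was seen before.
            if (name, prior) in self.seen_orders:
                reversals.append((prior, name))
            self.seen_orders.add((prior, name))
        self.held.append(name)
        return reversals

    def release(self, name):
        self.held.remove(name)


def demo():
    checker = LockOrderChecker()

    # Path A: dirhash first, then bufwait. This establishes an order.
    checker.acquire("dirhash")
    checker.acquire("bufwait")
    checker.release("bufwait")
    checker.release("dirhash")

    # Path B: bufwait first, then dirhash. Same locks, opposite order.
    checker.acquire("bufwait")
    return checker.acquire("dirhash")


if __name__ == "__main__":
    print(demo())  # [('bufwait', 'dirhash')]
```

The pair printed by the demo corresponds to the "1st bufwait / 2nd dirhash" lines in the first report above: the second path took the two locks in the opposite order from the first, even though the two paths never ran at the same time.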