Bug 195262

Summary: [lor] Possibly two LORs: entropy harvest mutex and scrlock, and entropy harvest mutex and sleepq chain
Product: Base System Reporter: Ellis H. Wilson III <ellisw>
Component: kernAssignee: freebsd-bugs (Nobody) <bugs>
Status: Closed Overcome By Events    
Severity: Affects Many People CC: delphij, ellisw, jhb, markm, rpokala
Priority: ---    
Version: 10.1-RELEASE   
Hardware: amd64   
OS: Any   
Attachments:
Description Flags
A possible fix none

Description Ellis H. Wilson III 2014-11-21 20:58:53 UTC
I get the following output from WITNESS detecting a LOR for possibly two LORs relating to the harvest mutex on bootup.  This is an almost perfectly standard boot of 10.1-RELEASE with the exception that I've enabled WITNESS.  Please note that I have NOT enabled WITNESS_SKIPSPIN.  That seems to be the widely accepted band-aid for issues that look like this.  Such a solution, if you can call it such, is not acceptable at my company.  I'm happy to do the fix for this bug, but I wanted to open this at least so it's on people's radars, and to solicit feedback from the community if this is an absolutely unavoidable LOR (and why we can't just mark the lock BLESSED if so).

Selected output from my VM running 10.1 with WITNESS:

<snip>
 atrtc0: <AT realtime clock> at port 0x70 irq 8 on isa0
 Event timer "RTC" frequency 32768 Hz quality 0
 ppc0: cannot reserve I/O port range
 Timecounters tick every 10.000 msec
 pcm0: measured ac97 link rate at 30976 Hz
 lock order reversal:
 lock order reversal:
  1st 0xffffffff81633d88 entropy harvest mutex (entropy harvest mutex) @ /usr/src/sys/dev/random/random_harvestq.c:198
  2nd 0xffffffff813b6208 scrlock (scrlock) @ /usr/src/sys/dev/syscons/syscons.c:2682
 KDB: stack backtrace:
 #0 0xffffffff808b6c40 at kdb_backtrace+0x60
#1 0xffffffff808ca56a at witness_checkorder+0xb8a
#2 0xffffffff808714ae at __mtx_lock_spin_flags+0x4e
#3 0xffffffff806db7d0 at sc_puts+0xb0
#4 0xffffffff806dee55 at sc_cnputc+0xe5
#5 0xffffffff80842c7f at cnputc+0x7f
#6 0xffffffff80842f18 at cnputs+0x58
#7 0xffffffff808bba57 at putchar+0x137
#8 0xffffffff808ba7ea at kvprintf+0xda
#9 0xffffffff808bc057 at vprintf+0x87
#10 0xffffffff808bbfc3 at printf+0x43
#11 0xffffffff808ca35e at witness_checkorder+0x97e
#12 0xffffffff808714ae at __mtx_lock_spin_flags+0x4e
#13 0xffffffff8088a29b at msleep_spin_sbt+0x5b
#14 0xffffffff8063f0b8 at random_kthread+0x78
#15 0xffffffff80858b11 at fork_exit+0x71
#16 0xffffffff80c2143e at fork_trampoline+0xe
 1st 0xffffffff81633d88 entropy harvest mutex (entropy harvest mutex) @ /usr/src/sys/dev/random/random_harvestq.c:198
 2nd 0xffffffff81424bb8 sleepq chain (sleepq chain) @ /usr/src/sys/kern/subr_sleepqueue.c:240
KDB: stack backtrace:
#0 0xffffffff808b6c40 at kdb_backtrace+0x60
#1 0xffffffff808ca56a at witness_checkorder+0xb8a
#2 0xffffffff808714ae at __mtx_lock_spin_flags+0x4e
#3 0xffffffff8088a29b at msleep_spin_sbt+0x5b
#4 0xffffffff8063f0b8 at random_kthread+0x78
#5 0xffffffff80858b11 at fork_exit+0x71
#6 0xffffffff80c2143e at fork_trampoline+0xe
usbus0: 12Mbps Full Speed USB v1.0
ugen0.1: <Apple> at usbus0
uhub0: <Apple OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0
ada0 at ata0 bus 0 scbus0 target 0 lun 0
ada0: <VBOX HARDDISK 1.0> ATA-6 device
ada0: Serial Number VB96c35aac-e62dbb5d
ada0: 33.300MB/s transfers (UDMA2, PIO 65536bytes)
ada0: 20480MB (41943040 512 byte sectors: 16H 63S/T 16383C)
ada0: Previously was known as ad0
<snip> 

I have it panicking into DDB now, so if there are suggestions on best strategies for tackling LORs I'm all ears.
Comment 1 Ellis H. Wilson III 2014-11-21 21:02:48 UTC
Also, as a brief explanation as to why I marked it "Affects Many People," I have been able to reproduce this bug on real hardware and on 10.0 as have my colleagues in our testing of FreeBSD 10.0 and 10.1.  So I am strongly led to believe the only reason Many People aren't seeing it is because of the decision to default to testing with WITNESS + WITNESS_SKIPSPIN.  That doesn't mean this bug isn't affecting each and every one of them, it sure is; it simply may not be causing any major problems yet.
Comment 2 Xin LI freebsd_committer freebsd_triage 2014-11-21 22:22:22 UTC
Created attachment 149695 [details]
A possible fix
Comment 3 Xin LI freebsd_committer freebsd_triage 2014-11-21 22:26:26 UTC
I think these are false positives.

harvest_mtx is a spinning mutex that is only acquired in random_harvestq.c, and it's already done quite carefully.

The problem is raised when calling msleep_spin_sbt(), which in turn tries to acquire sleepq chain lock (sys/kern/subr_sleepqueue.c).  In witness, the "blessed" order is sleepq chain and then entropy harvest mutex, this is not right (the system does not poke with entropy harvesting when manipulating sleepq chain), so you see the second LOR warning.

When it tries to print the LOR warning, the code eventually calls printf() which in turn would go to syscons(4) where scrlock is acquired.  The defined lock order wants scrlock be acquired before harvest_mtx and therefore you would see the first LOR warning.

Could you please try the attached patch and see if it solves the problem?  What it does is to move entropy mutex slightly higher, allowing it to be held before acquiring sleepq chain lock.  This would eliminate the (false) LOR warning and make the first LOR go away at the same time.

Adding jhb@ and markm@ for review.
Comment 4 Ellis H. Wilson III 2014-11-24 17:40:35 UTC
Many thanks to Xin Li for the rapid patch and excellent explanation!  I can confirm with the attached patch applied to my 10.1 release bits, and the kernel recompiled and reinstalled, we boot without panic.  Tested via ten reboots of my VM instance of 10.1 with the patch, with both clean and unclean reboots.

I'm not sure what protocol is for declaring this fixed or not as this is my first bug report for FreeBSD.  So I apologize if I should be marking this closed or what.  Please advise.

Last, it would be ideal in my (perhaps naive) mind if we could return to a default kernel configuration without WITNESS_SKIPSPIN so these bugs are given attention naturally, rather than people having to run a non-standard configuration in order to expose them.  Thoughts?
Comment 5 Ellis H. Wilson III 2015-02-25 21:06:05 UTC
I believe we have a patch here that works, at least per my testing.  What are the correct next steps in terms of moving this along?
Comment 6 John Baldwin freebsd_committer freebsd_triage 2015-02-25 21:47:01 UTC
If this fixes the LOR report, then this looks good to me.  Xin, can you commit this?
Comment 7 Xin LI freebsd_committer freebsd_triage 2015-02-25 23:37:03 UTC
(In reply to John Baldwin from comment #6)
The code in question have received a major refactor and the issue no longer exists (in r262496: mutex is no longer held in random_kthread before sleeping).

In the new world order, the spin lock is only acquired as a leaf mutex in random_harvestq_internal() and that would avoided all LOR cases (I would probably move get_cyclecount() out from the locked scope but that's a different issue).

So I think the _right_ fix is probably just merge the refactor back to stable/10 instead.
Comment 8 John Baldwin freebsd_committer freebsd_triage 2015-02-26 14:40:14 UTC
Ok, that sound good to me then.
Comment 9 Ravi Pokala 2016-01-11 15:32:48 UTC
Since this no longer impacts -HEAD, but *does* impact stable/10, does anyone object to me applying Xin's patch directly to stable/10? If that's okay, I'll try to do it before the 10.3 slush