Bug 65448

Summary:	[mutex] _mtx_unlock_sleep() race condition if ADAPTIVE_MUTEXES is defined
Product:	Base System	Reporter:	Stephan Uphoff <ups>
Component:	kern	Assignee:	John Baldwin <jhb>
Status:	Closed FIXED
Severity:	Affects Only Me
Priority:	Normal
Version:	Unspecified
Hardware:	Any
OS:	Any

Description Stephan Uphoff 2004-04-12 02:10:17 UTC

The following waiting loop is used in _mtx_unlock_sleep()
if both ADAPTIVE_MUTEXES and SMP is defined.

while (mtx_owner(m) == owner && TD_IS_RUNNING(owner)) {
#ifdef __i386__
	ia32_pause();
#endif
}

Problem:

The current thread has no lock or reference count
on the "owner" thread.

This means that the thread pointed to by owner
can terminate and become invalid.

If the "owner" thread terminates between the tests 
"mtx_owner(m) == owner"  and  "TD_IS_RUNNING(owner)"
the second test might access an invalid memory address.

Additional Nitpick:

Because TD_IS_RUNNING(owner) is defined as 
	     #define TD_IS_RUNNING(td) ((td)->td_state == TDS_RUNNING)
and
        The td_state field is not volatile.
and
        ia32_pause() is not marked to modify memory.

the compiler can regard the test "TD_IS_RUNNING(owner)"  as being 
loop invariant and rewrite the code fragment above as:
	
if((mtx_owner(m) == owner && TD_IS_RUNNING(owner))
      do {
            #ifdef __i386__
           	ia32_pause();
            #endif
      while (mtx_owner(m) == owner);

This would cause problems.

Comment 1 Kris Kennaway freebsd_committer

2004-04-12 14:30:58 UTC

Responsible Changed
From-To: freebsd-bugs->jhb

jhb is likely to be interested in this

Comment 2 pavel 2010-01-11 10:39:26 UTC

The same race in 8.0-STABLE amd64 (~middle of december'09).
It happens very rarely (once a week). This router runs on dual Intel Xeon X5570.

It traps in _mtx_lock_sleep 'if (TD_IS_RUNNING(owner)) {'
with owner (ecx) equal NULL (0x0), and very strange, that m->mtx_lock (eax)
is also 0x0, but why it is happened ?

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 02
fault virtual address   = 0x288
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff802ef1be
stack pointer           = 0x28:0xffffff80761d97c0
frame pointer           = 0x28:0xffffff80761d97e0
code segment            = base rx0, limit 0xfffff, type 0x1b
                         = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 4461 (bgpd)
[thread pid 4461 tid 100169 ]
Stopped at      _mtx_lock_sleep+0x4e:   movl    0x288(%rcx),%esi

db:0:kdb.enter.default>  show pcpu
cpuid        = 1
dynamic pcpu    = 0xffffff807f909c80
curthread    = 0xffffff0003439ae0: pid 4461 "bgpd"
curpcb       = 0xffffff80761d9d40
fpcurthread  = none
idlethread   = 0xffffff00014ac000: pid 10 "idle: cpu1"
curpmap         = 0
tssp            = 0xffffffff80794ee8
commontssp      = 0xffffffff80794ee8
rsp0            = 0xffffff80761d9d40
gs32p           = 0xffffffff80793d20
ldt             = 0xffffffff80793d60
tss             = 0xffffffff80793d50
db:0:kdb.enter.default>  show reg
cs                0x20  WAKEUP_efer
ds                0x3b  WAKEUP_lstar+0x3
es            0x3b003b
fs          0x288001b0013
gs           0x288001b
ss                0x28  WAKEUP_pat
rax                  0
rcx                  0
rdx         0xffffff0003439ae0
rbx         0xffffff800001c810
rsp         0xffffff80761d97c0
rbp         0xffffff80761d97e0
rsi         0xffffff0003439ae0
rdi         0xffffff800001c810
r8                   0
r9          0xffffff011b721d00
r10                0x9  WAKEUP_xpcb+0x1
r11                0x2
r12         0xffffff0003439ae0
r13         0xffffff800001c918
r14                0x3
r15         0xffffff0150480d00
rip         0xffffffff802ef1be  _mtx_lock_sleep+0x4e
rflags         0x10246
_mtx_lock_sleep+0x4e:   movl    0x288(%rcx),%esi
db:0:kdb.enter.default>  bt
Tracing pid 4461 tid 100169 td 0xffffff0003439ae0
_mtx_lock_sleep() at _mtx_lock_sleep+0x4e
netisr_queue_internal() at netisr_queue_internal+0xe1
netisr_queue_src() at netisr_queue_src+0x3c
route_output() at route_output+0x11b
sosend_generic() at sosend_generic+0x3f6
soo_write() at soo_write+0x37
dofilewrite() at dofilewrite+0x85
kern_writev() at kern_writev+0x60
writev() at writev+0x41
syscall() at syscall+0x1da
Xfast_syscall() at Xfast_syscall+0xe1
--- syscall (121, FreeBSD ELF64, writev), rip = 0x80075679c, rsp = 0x7fffffffe998, rbp = 0x8013e3040 ---



-- 
                                          best regards, Pavel Nikiforov
                                          ARTX network administrator
Mail to: pavel@co.artx.ru

Comment 3 John Baldwin freebsd_committer

2010-01-11 15:51:42 UTC

On Monday 11 January 2010 6:00:11 am Pavel Nikiforov wrote:
> The following reply was made to PR kern/65448; it has been noted by GNATS.
> 
> From: Pavel Nikiforov <pavel@artx.ru>
> To: bug-followup@FreeBSD.org, ups@tree.com
> Cc:  
> Subject: Re: kern/65448: _mtx_unlock_sleep() race condition if 
ADAPTIVE_MUTEXES
>  is defined
> Date: Mon, 11 Jan 2010 13:39:26 +0300
> 
>  The same race in 8.0-STABLE amd64 (~middle of december'09).
>  It happens very rarely (once a week). This router runs on dual Intel Xeon 
X5570.
>  
>  It traps in _mtx_lock_sleep 'if (TD_IS_RUNNING(owner)) {'
>  with owner (ecx) equal NULL (0x0), and very strange, that m->mtx_lock (eax)
>  is also 0x0, but why it is happened ?

If the mtx_lock is zero that sounds like a different bug where the memory 
containing the lock has been cleared via bzero() or the like.  Perhaps a use-
after-free bug?

-- 
John Baldwin

Comment 4 John Baldwin freebsd_committer

2015-10-20 21:46:08 UTC

I believe that https://svnweb.freebsd.org/changeset/base/289661 should address this.

Comment 5 Konstantin Belousov freebsd_committer

2015-10-20 22:14:36 UTC

I think that the r289661 fixes the first issue, about accessing the invalid memory.  The second issue, about absense of barriers in the spoin loop, seems to be already covered by the owner local variable declared volatile.