Bug 65448 - [mutex] _mtx_unlock_sleep() race condition if ADAPTIVE_MUTEXES is defined
[mutex] _mtx_unlock_sleep() race condition if ADAPTIVE_MUTEXES is defined
Status: In Progress
Product: Base System
Classification: Unclassified
Component: kern
Any Any
: Normal Affects Only Me
Assigned To: John Baldwin
Depends on:
  Show dependency treegraph
Reported: 2004-04-12 02:10 UTC by ups
Modified: 2014-05-05 04:14 UTC (History)
0 users

See Also:


Note You need to log in before you can comment on or make changes to this bug.
Description ups 2004-04-12 02:10:17 UTC
The following waiting loop is used in _mtx_unlock_sleep()
if both ADAPTIVE_MUTEXES and SMP is defined.

while (mtx_owner(m) == owner && TD_IS_RUNNING(owner)) {
#ifdef __i386__


The current thread has no lock or reference count
on the "owner" thread.

This means that the thread pointed to by owner
can terminate and become invalid.

If the "owner" thread terminates between the tests 
"mtx_owner(m) == owner"  and  "TD_IS_RUNNING(owner)"
the second test might access an invalid memory address.

Additional Nitpick:

Because TD_IS_RUNNING(owner) is defined as 
	     #define TD_IS_RUNNING(td) ((td)->td_state == TDS_RUNNING)
        The td_state field is not volatile.
        ia32_pause() is not marked to modify memory.

the compiler can regard the test "TD_IS_RUNNING(owner)"  as being 
loop invariant and rewrite the code fragment above as:
if((mtx_owner(m) == owner && TD_IS_RUNNING(owner))
      do {
            #ifdef __i386__
      while (mtx_owner(m) == owner);

This would cause problems.
Comment 1 kris freebsd_committer 2004-04-12 14:30:58 UTC
Responsible Changed
From-To: freebsd-bugs->jhb

jhb is likely to be interested in this
Comment 2 pavel 2010-01-11 10:39:26 UTC
The same race in 8.0-STABLE amd64 (~middle of december'09).
It happens very rarely (once a week). This router runs on dual Intel Xeon X5570.

It traps in _mtx_lock_sleep 'if (TD_IS_RUNNING(owner)) {'
with owner (ecx) equal NULL (0x0), and very strange, that m->mtx_lock (eax)
is also 0x0, but why it is happened ?

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 02
fault virtual address   = 0x288
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff802ef1be
stack pointer           = 0x28:0xffffff80761d97c0
frame pointer           = 0x28:0xffffff80761d97e0
code segment            = base rx0, limit 0xfffff, type 0x1b
                         = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 4461 (bgpd)
[thread pid 4461 tid 100169 ]
Stopped at      _mtx_lock_sleep+0x4e:   movl    0x288(%rcx),%esi

db:0:kdb.enter.default>  show pcpu
cpuid        = 1
dynamic pcpu    = 0xffffff807f909c80
curthread    = 0xffffff0003439ae0: pid 4461 "bgpd"
curpcb       = 0xffffff80761d9d40
fpcurthread  = none
idlethread   = 0xffffff00014ac000: pid 10 "idle: cpu1"
curpmap         = 0
tssp            = 0xffffffff80794ee8
commontssp      = 0xffffffff80794ee8
rsp0            = 0xffffff80761d9d40
gs32p           = 0xffffffff80793d20
ldt             = 0xffffffff80793d60
tss             = 0xffffffff80793d50
db:0:kdb.enter.default>  show reg
cs                0x20  WAKEUP_efer
ds                0x3b  WAKEUP_lstar+0x3
es            0x3b003b
fs          0x288001b0013
gs           0x288001b
ss                0x28  WAKEUP_pat
rax                  0
rcx                  0
rdx         0xffffff0003439ae0
rbx         0xffffff800001c810
rsp         0xffffff80761d97c0
rbp         0xffffff80761d97e0
rsi         0xffffff0003439ae0
rdi         0xffffff800001c810
r8                   0
r9          0xffffff011b721d00
r10                0x9  WAKEUP_xpcb+0x1
r11                0x2
r12         0xffffff0003439ae0
r13         0xffffff800001c918
r14                0x3
r15         0xffffff0150480d00
rip         0xffffffff802ef1be  _mtx_lock_sleep+0x4e
rflags         0x10246
_mtx_lock_sleep+0x4e:   movl    0x288(%rcx),%esi
db:0:kdb.enter.default>  bt
Tracing pid 4461 tid 100169 td 0xffffff0003439ae0
_mtx_lock_sleep() at _mtx_lock_sleep+0x4e
netisr_queue_internal() at netisr_queue_internal+0xe1
netisr_queue_src() at netisr_queue_src+0x3c
route_output() at route_output+0x11b
sosend_generic() at sosend_generic+0x3f6
soo_write() at soo_write+0x37
dofilewrite() at dofilewrite+0x85
kern_writev() at kern_writev+0x60
writev() at writev+0x41
syscall() at syscall+0x1da
Xfast_syscall() at Xfast_syscall+0xe1
--- syscall (121, FreeBSD ELF64, writev), rip = 0x80075679c, rsp = 0x7fffffffe998, rbp = 0x8013e3040 ---

                                          best regards, Pavel Nikiforov
                                          ARTX network administrator
Mail to: pavel@co.artx.ru
Comment 3 John Baldwin freebsd_committer 2010-01-11 15:51:42 UTC
On Monday 11 January 2010 6:00:11 am Pavel Nikiforov wrote:
> The following reply was made to PR kern/65448; it has been noted by GNATS.
> From: Pavel Nikiforov <pavel@artx.ru>
> To: bug-followup@FreeBSD.org, ups@tree.com
> Cc:  
> Subject: Re: kern/65448: _mtx_unlock_sleep() race condition if 
>  is defined
> Date: Mon, 11 Jan 2010 13:39:26 +0300
>  The same race in 8.0-STABLE amd64 (~middle of december'09).
>  It happens very rarely (once a week). This router runs on dual Intel Xeon 
>  It traps in _mtx_lock_sleep 'if (TD_IS_RUNNING(owner)) {'
>  with owner (ecx) equal NULL (0x0), and very strange, that m->mtx_lock (eax)
>  is also 0x0, but why it is happened ?

If the mtx_lock is zero that sounds like a different bug where the memory 
containing the lock has been cleared via bzero() or the like.  Perhaps a use-
after-free bug?

John Baldwin