Bug 194264

Summary: race between unp_dispose (called from sofree) and unp_gc
Product: Base System
Reporter: Andriy Gapon <avg>
Component: kern
Assignee: Mateusz Guzik <mjg>
Status: Closed FIXED
Severity: Affects Only Me
CC: cem, cse.cem, gleb.kurtsou, johan, mjg, pjd
Priority: ---
Version: CURRENT
Hardware: Any
OS: Any
Attachments:
Description                                          Flags
kgdb postmortem session                              none
some more debug data                                 none
log of other thread being in sofree -> unp_dispose   none

Description Andriy Gapon freebsd_committer 2014-10-09 08:29:06 UTC
Created attachment 148131
kgdb postmortem session

First, I think this panic could be related to the crash of a chromium process
that preceded it.  Perhaps the crash triggered the closing of its sockets, which
then interacted badly with the unp_gc code.

Unread portion of the kernel message buffer:
<6>pid 48502 (chrome), uid 1001: exited on signal 11 (core dumped)


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x100000021
fault code              = supervisor read data, page not present
...
(kgdb) bt
#0  doadump (textdump=1) at pcpu.h:223
#1  0xffffffff8063d9fd in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:445
#2  0xffffffff8063df3f in panic (fmt=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:621
#3  0xffffffff80861f4f in trap_fatal (frame=<value optimized out>, eva=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:866
#4  0xffffffff8086229c in trap_pfault (frame=0xfffffe01dd5d89e0, usermode=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:677
#5  0xffffffff808618be in trap (frame=0xfffffe01dd5d89e0) at /usr/src/sys/amd64/amd64/trap.c:426
#6  0xffffffff808623f7 in trap_check (frame=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:620
#7  0xffffffff80845122 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:231
#8  0xffffffff806d6668 in unp_gc (arg=0x10, pending=32) at /usr/src/sys/kern/uipc_usrreq.c:2152
#9  0xffffffff8068f465 in taskqueue_run_locked (queue=0xfffff80012294600) at /usr/src/sys/kern/subr_taskqueue.c:371
#10 0xffffffff80690258 in taskqueue_thread_loop (arg=<value optimized out>) at /usr/src/sys/kern/subr_taskqueue.c:642
#11 0xffffffff80605a1a in fork_exit (callout=0xffffffff80690190 <taskqueue_thread_loop>, arg=0xffffffff80ee17c0, frame=0xfffffe01dd5d8c00) at /usr/src/sys/kern/kern_fork.c:977
#12 0xffffffff8084565e in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:605
Comment 1 Andriy Gapon freebsd_committer 2014-10-09 08:42:45 UTC
Created attachment 148132
some more debug data

I'm not sure whether this is important, but so_state is SS_ISDISCONNECTED | SS_NBIO | SS_NOFDREF and so_rcv.sb_state is SBS_CANTRCVMORE.
Comment 2 Andriy Gapon freebsd_committer 2014-10-09 10:12:52 UTC
Created attachment 148136
log of other thread being in sofree -> unp_dispose

So, it looks like I missed the elephant in the room: another thread was running soclose -> sofree -> unp_dispose on exactly the same socket that unp_gc_process was processing. The socket is 0xfffff801e7420000 (see the previous attachment).

So, this is the race.
Comment 3 johans 2015-06-02 11:13:37 UTC
Is anything more known about this bug? We're getting the same (or a similar) panic on 10.1-RELENG (with a few custom patches backported from 10-STABLE).

#0  doadump (textdump=<value optimized out>) at pcpu.h:219
#1  0xffffffff8091e882 in kern_reboot (howto=260) at /release/usr/src/sys/kern/kern_shutdown.c:452
#2  0xffffffff8091ed75 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /release/usr/src/sys/kern/kern_shutdown.c:759
#3  0xffffffff8091edc3 in panic (fmt=0x0) at /release/usr/src/sys/kern/kern_shutdown.c:688
#4  0xffffffff80d1fc7f in trap_fatal (frame=<value optimized out>, eva=<value optimized out>) at /release/usr/src/sys/amd64/amd64/trap.c:865
#5  0xffffffff80d1f8d8 in trap (frame=<value optimized out>) at /release/usr/src/sys/amd64/amd64/trap.c:203
#6  0xffffffff80d03972 in calltrap () at /release/usr/src/sys/amd64/amd64/exception.S:232
#7  0xffffffff809a4148 in unp_gc (arg=0x8, pending=24) at /release/usr/src/sys/kern/uipc_usrreq.c:2152
#8  0xffffffff80967030 in taskqueue_run_locked (queue=0xfffff80003f99e00) at /release/usr/src/sys/kern/subr_taskqueue.c:342
#9  0xffffffff8096799b in taskqueue_thread_loop (arg=<value optimized out>) at /release/usr/src/sys/kern/subr_taskqueue.c:563
#10 0xffffffff808ee384 in fork_exit (callout=0xffffffff80967900 <taskqueue_thread_loop>, arg=0xffffffff8166c160, frame=0xfffffe03ce7e5c00) at /release/usr/src/sys/kern/kern_fork.c:996
#11 0xffffffff80d03eae in fork_trampoline () at /release/usr/src/sys/amd64/amd64/exception.S:606
#12 0x0000000000000000 in ?? ()


We have kernel dumps available if needed.
Comment 4 Mateusz Guzik freebsd_committer 2015-07-08 19:28:06 UTC
Try the patch at https://lists.freebsd.org/pipermail/freebsd-current/2015-July/056481.html
Comment 5 Conrad Meyer 2015-07-09 22:29:13 UTC
Slightly cleaned up patch here: https://reviews.freebsd.org/D3044
Comment 6 commit-hook freebsd_committer 2015-07-14 02:01:19 UTC
A commit references this bug:

Author: cem
Date: Tue Jul 14 02:00:52 UTC 2015
New revision: 285522
URL: https://svnweb.freebsd.org/changeset/base/285522

Log:
  Fix cleanup race between unp_dispose and unp_gc

  unp_dispose and unp_gc could race to teardown the same mbuf chains, which
  can lead to dereferencing freed filedesc pointers.

  This patch adds an IGNORE_RIGHTS flag on unpcbs marking the unpcb's RIGHTS
  as invalid/freed. The flag is protected by UNP_LIST_LOCK.

  To serialize against unp_gc, unp_dispose needs the socket object. Change the
  dom_dispose() KPI to take a socket object instead of an mbuf chain directly.

  PR:		194264
  Differential Revision:	https://reviews.freebsd.org/D3044
  Reviewed by:	mjg (earlier version)
  Approved by:	markj (mentor)
  Obtained from:	mjg
  MFC after:	1 month
  Sponsored by:	EMC / Isilon Storage Division

Changes:
  head/sys/kern/uipc_socket.c
  head/sys/kern/uipc_usrreq.c
  head/sys/sys/domain.h
  head/sys/sys/unpcb.h
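
The locking idea described in the commit log above can be modelled in userland. What follows is a minimal sketch under stated assumptions, not the committed kernel code: model_pcb, MODEL_IGNORE_RIGHTS, model_list_lock, model_dispose and model_gc_scan are all hypothetical names, a pthread mutex stands in for UNP_LIST_LOCK, and a plain integer stands in for the in-flight rights held in a socket buffer. It only illustrates that once the teardown path has set the flag under the list lock, a later scan skips that pcb's rights instead of touching them.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userland model; none of these names come from the kernel. */
#define MODEL_IGNORE_RIGHTS 0x1

struct model_pcb {
	int flags;    /* MODEL_IGNORE_RIGHTS once teardown has started */
	int nrights;  /* stand-in for in-flight rights in the receive buffer */
};

/* Stand-in for the list lock that protects the flag. */
static pthread_mutex_t model_list_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Teardown path: mark the rights as invalid under the lock, then release
 * them. After the flag is visible, the scan below never reads nrights.
 */
void
model_dispose(struct model_pcb *pcb)
{
	assert(pcb != NULL);

	pthread_mutex_lock(&model_list_lock);
	pcb->flags |= MODEL_IGNORE_RIGHTS;
	pthread_mutex_unlock(&model_list_lock);

	pcb->nrights = 0;  /* models freeing the rights */
}

/*
 * Scan path: holds the lock for the whole pass and skips every pcb whose
 * rights have been marked invalid. Returns the number of rights visited.
 */
int
model_gc_scan(struct model_pcb *pcbs, size_t n)
{
	int visited = 0;

	pthread_mutex_lock(&model_list_lock);
	for (size_t i = 0; i < n; i++) {
		if (pcbs[i].flags & MODEL_IGNORE_RIGHTS)
			continue;
		visited += pcbs[i].nrights;
	}
	pthread_mutex_unlock(&model_list_lock);
	return (visited);
}
```

In this model, a scan that runs after model_dispose() returns no longer counts the disposed pcb's rights, while a scan already in progress finishes before the flag can be set because it holds the lock for the whole pass. How closely this matches the lock scope in the real unp_gc is an assumption of the sketch.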
Comment 7 Andriy Gapon freebsd_committer 2015-08-24 08:08:28 UTC
The status of this bug report should be updated.
Comment 8 Andriy Gapon freebsd_committer 2018-05-21 06:40:11 UTC
*** Bug 143073 has been marked as a duplicate of this bug. ***