Bug 194264 - race between unp_dispose (called from sofree) and unp_gc
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: Mateusz Guzik
Duplicates: 143073
Depends on:
Reported: 2014-10-09 08:29 UTC by Andriy Gapon
Modified: 2018-05-21 06:40 UTC

kgdb postmortem session (6.16 KB, text/plain)
2014-10-09 08:29 UTC, Andriy Gapon
some more debug data (6.08 KB, text/plain)
2014-10-09 08:42 UTC, Andriy Gapon
log of other thread being in sofree -> unp_dispose (5.25 KB, text/plain)
2014-10-09 10:12 UTC, Andriy Gapon

Description Andriy Gapon freebsd_committer 2014-10-09 08:29:06 UTC
Created attachment 148131
kgdb postmortem session

First, I think that this panic could be related to the crash of a chromium
process that preceded it.  Perhaps the crash triggered the closing of sockets,
and that interacted badly with the unp_gc code.

Unread portion of the kernel message buffer:
<6>pid 48502 (chrome), uid 1001: exited on signal 11 (core dumped)

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x100000021
fault code              = supervisor read data, page not present
(kgdb) bt
#0  doadump (textdump=1) at pcpu.h:223
#1  0xffffffff8063d9fd in kern_reboot (howto=260) at
#2  0xffffffff8063df3f in panic (fmt=<value optimized out>) at
#3  0xffffffff80861f4f in trap_fatal (frame=<value optimized out>, eva=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:866
#4  0xffffffff8086229c in trap_pfault (frame=0xfffffe01dd5d89e0, usermode=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:677
#5  0xffffffff808618be in trap (frame=0xfffffe01dd5d89e0) at
#6  0xffffffff808623f7 in trap_check (frame=<value optimized out>) at
#7  0xffffffff80845122 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:231
#8  0xffffffff806d6668 in unp_gc (arg=0x10, pending=32) at
#9  0xffffffff8068f465 in taskqueue_run_locked (queue=0xfffff80012294600) at
#10 0xffffffff80690258 in taskqueue_thread_loop (arg=<value optimized out>) at
#11 0xffffffff80605a1a in fork_exit (callout=0xffffffff80690190
<taskqueue_thread_loop>, arg=0xffffffff80ee17c0, frame=0xfffffe01dd5d8c00) at
#12 0xffffffff8084565e in fork_trampoline () at
Comment 1 Andriy Gapon freebsd_committer 2014-10-09 08:42:45 UTC
Created attachment 148132
some more debug data

Not sure if this is important, but so_state is SS_ISDISCONNECTED | SS_NBIO | SS_NOFDREF and so_rcv.sb_state is SBS_CANTRCVMORE.
Comment 2 Andriy Gapon freebsd_committer 2014-10-09 10:12:52 UTC
Created attachment 148136
log of other thread being in sofree -> unp_dispose

So, it looks like I missed the elephant in the room. There was a thread running soclose -> sofree -> unp_dispose on exactly the same socket that unp_gc_process was processing. The socket is 0xfffff801e7420000 (see the previous attachment).

So, this is the race.
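
The ordering described above can be modeled in user space. This is only a sketch under stated assumptions, not the kernel code: `struct fake_sock`, `racy_dispose`, `racy_gc_scan` and `replay_race` are hypothetical names, the "rights" are plain integers rather than mbufs, and the two paths are replayed one after the other instead of on real threads so that the result is deterministic.

```c
#include <assert.h>
#include <stddef.h>

#define NRIGHTS 4

/* Hypothetical stand-in for a socket whose receive buffer holds in-flight rights. */
struct fake_sock {
	int rights_live[NRIGHTS];	/* 1 = slot still valid, 0 = already torn down */
};

/* Models sofree -> unp_dispose: tears down every in-flight right. */
void
racy_dispose(struct fake_sock *so)
{
	assert(so != NULL);
	for (size_t i = 0; i < NRIGHTS; i++)
		so->rights_live[i] = 0;
}

/*
 * Models a GC scan with no coordination with the dispose path. It visits
 * every slot and returns how many were already torn down. In a real kernel
 * each such visit would be a read of freed memory.
 */
int
racy_gc_scan(const struct fake_sock *so)
{
	int stale = 0;

	assert(so != NULL);
	for (size_t i = 0; i < NRIGHTS; i++)
		if (!so->rights_live[i])
			stale++;
	return (stale);
}

/* Replays the bad ordering: dispose finishes first, then the GC scans. */
int
replay_race(void)
{
	struct fake_sock so = { { 1, 1, 1, 1 } };

	racy_dispose(&so);
	return (racy_gc_scan(&so));
}
```

In this model `replay_race()` reports that every slot the scan touched was stale, because nothing tells the scanning side that the other path has already torn the socket's rights down.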
Comment 3 johans 2015-06-02 11:13:37 UTC
Is anything more known about this bug? We're getting the same (or a similar) panic on 10.1-RELENG (with a few custom patches backported from 10-STABLE).

#0  doadump (textdump=<value optimized out>) at pcpu.h:219
#1  0xffffffff8091e882 in kern_reboot (howto=260) at /release/usr/src/sys/kern/kern_shutdown.c:452
#2  0xffffffff8091ed75 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /release/usr/src/sys/kern/kern_shutdown.c:759
#3  0xffffffff8091edc3 in panic (fmt=0x0) at /release/usr/src/sys/kern/kern_shutdown.c:688
#4  0xffffffff80d1fc7f in trap_fatal (frame=<value optimized out>, eva=<value optimized out>) at /release/usr/src/sys/amd64/amd64/trap.c:865
#5  0xffffffff80d1f8d8 in trap (frame=<value optimized out>) at /release/usr/src/sys/amd64/amd64/trap.c:203
#6  0xffffffff80d03972 in calltrap () at /release/usr/src/sys/amd64/amd64/exception.S:232
#7  0xffffffff809a4148 in unp_gc (arg=0x8, pending=24) at /release/usr/src/sys/kern/uipc_usrreq.c:2152
#8  0xffffffff80967030 in taskqueue_run_locked (queue=0xfffff80003f99e00) at /release/usr/src/sys/kern/subr_taskqueue.c:342
#9  0xffffffff8096799b in taskqueue_thread_loop (arg=<value optimized out>) at /release/usr/src/sys/kern/subr_taskqueue.c:563
#10 0xffffffff808ee384 in fork_exit (callout=0xffffffff80967900 <taskqueue_thread_loop>, arg=0xffffffff8166c160, frame=0xfffffe03ce7e5c00)
    at /release/usr/src/sys/kern/kern_fork.c:996
#11 0xffffffff80d03eae in fork_trampoline () at /release/usr/src/sys/amd64/amd64/exception.S:606
#12 0x0000000000000000 in ?? ()

We have kernel dumps available if needed.
Comment 4 Mateusz Guzik freebsd_committer 2015-07-08 19:28:06 UTC
Try the patch at https://lists.freebsd.org/pipermail/freebsd-current/2015-July/056481.html
Comment 5 Conrad Meyer 2015-07-09 22:29:13 UTC
Slightly cleaned up patch here: https://reviews.freebsd.org/D3044
Comment 6 commit-hook freebsd_committer 2015-07-14 02:01:19 UTC
A commit references this bug:

Author: cem
Date: Tue Jul 14 02:00:52 UTC 2015
New revision: 285522
URL: https://svnweb.freebsd.org/changeset/base/285522

  Fix cleanup race between unp_dispose and unp_gc

  unp_dispose and unp_gc could race to teardown the same mbuf chains, which
  can lead to dereferencing freed filedesc pointers.

  This patch adds an IGNORE_RIGHTS flag on unpcbs marking the unpcb's RIGHTS
  as invalid/freed. The flag is protected by UNP_LIST_LOCK.

  To serialize against unp_gc, unp_dispose needs the socket object. Change the
  dom_dispose() KPI to take a socket object instead of an mbuf chain directly.

  PR:		194264
  Differential Revision:	https://reviews.freebsd.org/D3044
  Reviewed by:	mjg (earlier version)
  Approved by:	markj (mentor)
  Obtained from:	mjg
  MFC after:	1 month
  Sponsored by:	EMC / Isilon Storage Division
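
The flag-under-lock scheme in the commit message can be sketched in user space as follows. This is one reading of the message, not the committed code: `struct fake_unpcb`, `FAKE_IGNORE_RIGHTS` (and its value), `flagged_dispose`, `flagged_gc_scan` and the two replay helpers are hypothetical names, a pthread mutex stands in for UNP_LIST_LOCK, and the "rights" are plain integers rather than mbufs. The two paths are replayed sequentially in both orders so the result is deterministic.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NRIGHTS			4
#define FAKE_IGNORE_RIGHTS	0x1	/* hypothetical value, models IGNORE_RIGHTS */

/* Stands in for UNP_LIST_LOCK. */
static pthread_mutex_t fake_list_lock = PTHREAD_MUTEX_INITIALIZER;

struct fake_unpcb {
	int flags;
	int rights_live[NRIGHTS];	/* 1 = slot still valid, 0 = torn down */
};

/* Dispose path: mark the rights invalid under the lock, then tear down. */
void
flagged_dispose(struct fake_unpcb *unp)
{
	assert(unp != NULL);
	pthread_mutex_lock(&fake_list_lock);
	unp->flags |= FAKE_IGNORE_RIGHTS;
	pthread_mutex_unlock(&fake_list_lock);

	for (size_t i = 0; i < NRIGHTS; i++)
		unp->rights_live[i] = 0;
}

/*
 * GC path: holds the same lock for the whole scan and skips any pcb whose
 * rights are flagged invalid. Returns the number of stale slots it touched.
 */
int
flagged_gc_scan(struct fake_unpcb *unp)
{
	int stale = 0;

	assert(unp != NULL);
	pthread_mutex_lock(&fake_list_lock);
	if ((unp->flags & FAKE_IGNORE_RIGHTS) == 0) {
		for (size_t i = 0; i < NRIGHTS; i++)
			if (!unp->rights_live[i])
				stale++;
	}
	pthread_mutex_unlock(&fake_list_lock);
	return (stale);
}

/* Ordering 1: dispose runs first, so the scan sees the flag and skips. */
int
replay_dispose_then_gc(void)
{
	struct fake_unpcb unp = { 0, { 1, 1, 1, 1 } };

	flagged_dispose(&unp);
	return (flagged_gc_scan(&unp));
}

/* Ordering 2: the scan runs first and sees only live slots. */
int
replay_gc_then_dispose(void)
{
	struct fake_unpcb unp = { 0, { 1, 1, 1, 1 } };
	int stale = flagged_gc_scan(&unp);

	flagged_dispose(&unp);
	return (stale);
}
```

In both orderings the scan touches no stale slot. Because the scanning side holds the lock for its whole pass, the dispose side cannot set the flag (and so cannot start tearing down) until the scan is finished, which is the serialization the commit message refers to.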

Comment 7 Andriy Gapon freebsd_committer 2015-08-24 08:08:28 UTC
Status of this bug report should be updated.
Comment 8 Andriy Gapon freebsd_committer 2018-05-21 06:40:11 UTC
*** Bug 143073 has been marked as a duplicate of this bug. ***