Bug 219701 - crash in camperiphfree()
Summary: crash in camperiphfree()
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: CURRENT
Hardware: Any Any
: --- Affects Only Me
Assignee: Kenneth D. Merry
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-06-01 09:48 UTC by Andriy Gapon
Modified: 2017-07-03 18:24 UTC (History)
4 users (show)

See Also:


Attachments
Proposed patch to cache the periph_driver pointer. (1.37 KB, text/plain)
2017-06-16 18:39 UTC, Kenneth D. Merry
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Andriy Gapon freebsd_committer freebsd_triage 2017-06-01 09:48:22 UTC
It seems that camperiphfree() has some unsafe code that can lead to a crash when a peripheral is freed at the same time as an unrelated peripheral driver is added.

p_drv is a pointer into periph_drivers.  The pointer is used across xpt_unlock_buses + xpt_lock_buses.  While the buses lock is dropped periph_drivers could be pointed to a new memory location and the old memory location would be freed.  See periphdriver_register() for details.
Thus, p_drv can end up pointing into the freed memory.  That would lead to a crash or memory corruption while dereferencing the pointer.

I have actually experienced such a crash (happened during the kernel boot):
#3  0xffffffff80651f33 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:594
#4  0xffffffff8085a190 in trap_fatal (frame=0xfffffe050536b430, eva=0) at /usr/src/sys/amd64/amd64/trap.c:802
#5  0xffffffff80859737 in trap (frame=0xfffffe050536b430) at /usr/src/sys/amd64/amd64/trap.c:198
#6  0xffffffff8085a4ba in trap_check (frame=0xfffffe050536b430) at /usr/src/sys/amd64/amd64/trap.c:603
#7  <signal handler called>
#8  0xffffffff802a9cd8 in camperiphfree (periph=0xfffff800224f9300) at /usr/src/sys/cam/cam_periph.c:712
#9  0xffffffff802a9bb2 in cam_periph_release_locked_buses (periph=0xfffff800224f9300) at /usr/src/sys/cam/cam_periph.c:441
#10 0xffffffff802a9e7b in cam_periph_release_locked (periph=0xfffff800224f9300) at /usr/src/sys/cam/cam_periph.c:452
#11 0xffffffff802bfeac in probedone (periph=<optimized out>, done_ccb=0xfffff8000bfe1000) at /usr/src/sys/cam/scsi/scsi_xpt.c:1198
#12 0xffffffff802b13c3 in xpt_done_process (ccb_h=0xfffff8000bfe1000) at /usr/src/sys/cam/cam_xpt.c:5452
#13 0xffffffff802b2965 in xpt_done_td (arg=0xffffffff80cf3880 <cam_doneqs>) at /usr/src/sys/cam/cam_xpt.c:5479

(kgdb) fr 8
#8  0xffffffff802a9cd8 in camperiphfree (periph=0xfffff800224f9300) at /usr/src/sys/cam/cam_periph.c:712
712             TAILQ_REMOVE(&(*p_drv)->units, periph, unit_links);
(kgdb) p p_drv
$1 = (struct periph_driver **) 0xfffff80007330030
(kgdb) p *p_drv
$2 = (struct periph_driver *) 0xdeadc0dedeadc0d
(kgdb) p periph_drivers
$9 = (struct periph_driver **) 0xfffff800a81f4880

I think that we should cache the actual pointer to the peripheral driver rather than the pointer into the array of pointers.
Comment 1 Kenneth D. Merry freebsd_committer freebsd_triage 2017-06-16 18:39:03 UTC
Created attachment 183542 [details]
Proposed patch to cache the periph_driver pointer.

Proposed patch to cache the periph_driver pointer.
Comment 2 Kenneth D. Merry freebsd_committer freebsd_triage 2017-06-16 18:40:01 UTC
Can you try out the patch and see if it works for you?
Comment 3 Andriy Gapon freebsd_committer freebsd_triage 2017-06-26 08:42:38 UTC
(In reply to Kenneth D. Merry from comment #2)
I never had a way to reproduce the problem, so I can not say with 100% certainty that the problem is fixed.  The code looks like it does fix the problem.  I do not see any regressions from the patch.  And I have not seen any recurrence of the problem.
Thank you!
Comment 4 commit-hook freebsd_committer freebsd_triage 2017-06-27 19:26:48 UTC
A commit references this bug:

Author: ken
Date: Tue Jun 27 19:26:03 UTC 2017
New revision: 320421
URL: https://svnweb.freebsd.org/changeset/base/320421

Log:
  Fix a panic in camperiphfree().

  If a peripheral driver (e.g. da, sa, cd) is added or removed from the
  peripheral driver list while an unrelated peripheral driver instance (e.g.
  da0, sa5, cd2) is going away and is inside camperiphfree(), we could
  dereference an invalid pointer.

  When peripheral drivers are added or removed (see periphdriver_register()
  and periphdriver_unregister()), the peripheral driver array is resized
  and existing entries are moved.

  Although we hold the topology lock while we traverse the peripheral driver
  list, we retain a pointer to the location of the peripheral driver pointer
  and then drop the topology lock.  So we are still vulnerable to the list
  getting moved around while the lock is dropped.

  To solve the problem, cache a copy of the peripheral driver pointer.  If
  its storage location in the list changes while we have the lock dropped, it
  won't have any effect.

  This doesn't solve the issue that peripheral drivers ("da", "cd", as opposed
  to individual instances like "da0", "cd0") are not generally part of a
  reference counting scheme to guard against deregistering them while there
  are instances active.  The caller (generally the person unloading a module)
  has to be aware of active drivers and not unload something that is in use.

  sys/cam/cam_periph.c:
  	In camperiphfree(), cache a pointer to the peripheral driver
  	instance to avoid holding a pointer to an invalid memory location
  	in the event that the peripheral driver list changes while we have
  	the topology lock dropped.

  PR:		kern/219701
  Submitted by:	avg
  MFC after:	3 days
  Sponsored by:	Spectra Logic

Changes:
  head/sys/cam/cam_periph.c
Comment 5 commit-hook freebsd_committer freebsd_triage 2017-07-03 15:35:15 UTC
A commit references this bug:

Author: ken
Date: Mon Jul  3 15:34:20 UTC 2017
New revision: 320601
URL: https://svnweb.freebsd.org/changeset/base/320601

Log:
  MFC r320421:

    ------------------------------------------------------------------------
    r320421 | ken | 2017-06-27 13:26:02 -0600 (Tue, 27 Jun 2017) | 37 lines

    Fix a panic in camperiphfree().

    If a peripheral driver (e.g. da, sa, cd) is added or removed from the
    peripheral driver list while an unrelated peripheral driver instance (e.g.
    da0, sa5, cd2) is going away and is inside camperiphfree(), we could
    dereference an invalid pointer.

    When peripheral drivers are added or removed (see periphdriver_register()
    and periphdriver_unregister()), the peripheral driver array is resized
    and existing entries are moved.

    Although we hold the topology lock while we traverse the peripheral driver
    list, we retain a pointer to the location of the peripheral driver pointer
    and then drop the topology lock.  So we are still vulnerable to the list
    getting moved around while the lock is dropped.

    To solve the problem, cache a copy of the peripheral driver pointer.  If
    its storage location in the list changes while we have the lock dropped, it
    won't have any effect.

    This doesn't solve the issue that peripheral drivers ("da", "cd", as opposed
    to individual instances like "da0", "cd0") are not generally part of a
    reference counting scheme to guard against deregistering them while there
    are instances active.  The caller (generally the person unloading a module)
    has to be aware of active drivers and not unload something that is in use.

    sys/cam/cam_periph.c:
    	In camperiphfree(), cache a pointer to the peripheral driver
    	instance to avoid holding a pointer to an invalid memory location
    	in the event that the peripheral driver list changes while we have
    	the topology lock dropped.

    PR:		kern/219701
    Submitted by:	avg
    Sponsored by:	Spectra Logic

    ------------------------------------------------------------------------
  PR:		kern/219701
  Sponsored by:	Spectra Logic

Changes:
_U  stable/10/
  stable/10/sys/cam/cam_periph.c
Comment 6 commit-hook freebsd_committer freebsd_triage 2017-07-03 15:35:17 UTC
A commit references this bug:

Author: ken
Date: Mon Jul  3 15:34:20 UTC 2017
New revision: 320601
URL: https://svnweb.freebsd.org/changeset/base/320601

Log:
  MFC r320421:

    ------------------------------------------------------------------------
    r320421 | ken | 2017-06-27 13:26:02 -0600 (Tue, 27 Jun 2017) | 37 lines

    Fix a panic in camperiphfree().

    If a peripheral driver (e.g. da, sa, cd) is added or removed from the
    peripheral driver list while an unrelated peripheral driver instance (e.g.
    da0, sa5, cd2) is going away and is inside camperiphfree(), we could
    dereference an invalid pointer.

    When peripheral drivers are added or removed (see periphdriver_register()
    and periphdriver_unregister()), the peripheral driver array is resized
    and existing entries are moved.

    Although we hold the topology lock while we traverse the peripheral driver
    list, we retain a pointer to the location of the peripheral driver pointer
    and then drop the topology lock.  So we are still vulnerable to the list
    getting moved around while the lock is dropped.

    To solve the problem, cache a copy of the peripheral driver pointer.  If
    its storage location in the list changes while we have the lock dropped, it
    won't have any effect.

    This doesn't solve the issue that peripheral drivers ("da", "cd", as opposed
    to individual instances like "da0", "cd0") are not generally part of a
    reference counting scheme to guard against deregistering them while there
    are instances active.  The caller (generally the person unloading a module)
    has to be aware of active drivers and not unload something that is in use.

    sys/cam/cam_periph.c:
    	In camperiphfree(), cache a pointer to the peripheral driver
    	instance to avoid holding a pointer to an invalid memory location
    	in the event that the peripheral driver list changes while we have
    	the topology lock dropped.

    PR:		kern/219701
    Submitted by:	avg
    Sponsored by:	Spectra Logic

    ------------------------------------------------------------------------
  PR:		kern/219701
  Sponsored by:	Spectra Logic

Changes:
_U  stable/10/
  stable/10/sys/cam/cam_periph.c
Comment 7 commit-hook freebsd_committer freebsd_triage 2017-07-03 15:35:19 UTC
A commit references this bug:

Author: ken
Date: Mon Jul  3 15:34:21 UTC 2017
New revision: 320602
URL: https://svnweb.freebsd.org/changeset/base/320602

Log:
  MFC r320421:

    ------------------------------------------------------------------------
    r320421 | ken | 2017-06-27 13:26:02 -0600 (Tue, 27 Jun 2017) | 37 lines

    Fix a panic in camperiphfree().

    If a peripheral driver (e.g. da, sa, cd) is added or removed from the
    peripheral driver list while an unrelated peripheral driver instance (e.g.
    da0, sa5, cd2) is going away and is inside camperiphfree(), we could
    dereference an invalid pointer.

    When peripheral drivers are added or removed (see periphdriver_register()
    and periphdriver_unregister()), the peripheral driver array is resized
    and existing entries are moved.

    Although we hold the topology lock while we traverse the peripheral driver
    list, we retain a pointer to the location of the peripheral driver pointer
    and then drop the topology lock.  So we are still vulnerable to the list
    getting moved around while the lock is dropped.

    To solve the problem, cache a copy of the peripheral driver pointer.  If
    its storage location in the list changes while we have the lock dropped, it
    won't have any effect.

    This doesn't solve the issue that peripheral drivers ("da", "cd", as opposed
    to individual instances like "da0", "cd0") are not generally part of a
    reference counting scheme to guard against deregistering them while there
    are instances active.  The caller (generally the person unloading a module)
    has to be aware of active drivers and not unload something that is in use.

    sys/cam/cam_periph.c:
    	In camperiphfree(), cache a pointer to the peripheral driver
    	instance to avoid holding a pointer to an invalid memory location
    	in the event that the peripheral driver list changes while we have
    	the topology lock dropped.

    PR:		kern/219701
    Submitted by:	avg
    Sponsored by:	Spectra Logic

    ------------------------------------------------------------------------
  PR:		kern/219701
  Sponsored by:	Spectra Logic

Changes:
_U  stable/11/
  stable/11/sys/cam/cam_periph.c
Comment 8 commit-hook freebsd_committer freebsd_triage 2017-07-03 15:35:21 UTC
A commit references this bug:

Author: ken
Date: Mon Jul  3 15:34:21 UTC 2017
New revision: 320602
URL: https://svnweb.freebsd.org/changeset/base/320602

Log:
  MFC r320421:

    ------------------------------------------------------------------------
    r320421 | ken | 2017-06-27 13:26:02 -0600 (Tue, 27 Jun 2017) | 37 lines

    Fix a panic in camperiphfree().

    If a peripheral driver (e.g. da, sa, cd) is added or removed from the
    peripheral driver list while an unrelated peripheral driver instance (e.g.
    da0, sa5, cd2) is going away and is inside camperiphfree(), we could
    dereference an invalid pointer.

    When peripheral drivers are added or removed (see periphdriver_register()
    and periphdriver_unregister()), the peripheral driver array is resized
    and existing entries are moved.

    Although we hold the topology lock while we traverse the peripheral driver
    list, we retain a pointer to the location of the peripheral driver pointer
    and then drop the topology lock.  So we are still vulnerable to the list
    getting moved around while the lock is dropped.

    To solve the problem, cache a copy of the peripheral driver pointer.  If
    its storage location in the list changes while we have the lock dropped, it
    won't have any effect.

    This doesn't solve the issue that peripheral drivers ("da", "cd", as opposed
    to individual instances like "da0", "cd0") are not generally part of a
    reference counting scheme to guard against deregistering them while there
    are instances active.  The caller (generally the person unloading a module)
    has to be aware of active drivers and not unload something that is in use.

    sys/cam/cam_periph.c:
    	In camperiphfree(), cache a pointer to the peripheral driver
    	instance to avoid holding a pointer to an invalid memory location
    	in the event that the peripheral driver list changes while we have
    	the topology lock dropped.

    PR:		kern/219701
    Submitted by:	avg
    Sponsored by:	Spectra Logic

    ------------------------------------------------------------------------
  PR:		kern/219701
  Sponsored by:	Spectra Logic

Changes:
_U  stable/11/
  stable/11/sys/cam/cam_periph.c
Comment 9 commit-hook freebsd_committer freebsd_triage 2017-07-03 18:21:39 UTC
A commit references this bug:

Author: ken
Date: Mon Jul  3 18:20:45 UTC 2017
New revision: 320608
URL: https://svnweb.freebsd.org/changeset/base/320608

Log:
  Merge r320602 from stable/11 into releng/11.1:
    ------------------------------------------------------------------------
    r320602 | ken | 2017-07-03 09:34:21 -0600 (Mon, 03 Jul 2017) | 45 lines

    MFC r320421:

      ------------------------------------------------------------------------
      r320421 | ken | 2017-06-27 13:26:02 -0600 (Tue, 27 Jun 2017) | 37 lines

      Fix a panic in camperiphfree().

      If a peripheral driver (e.g. da, sa, cd) is added or removed from the
      peripheral driver list while an unrelated peripheral driver instance (e.g.
      da0, sa5, cd2) is going away and is inside camperiphfree(), we could
      dereference an invalid pointer.

      When peripheral drivers are added or removed (see periphdriver_register()
      and periphdriver_unregister()), the peripheral driver array is resized
      and existing entries are moved.

      Although we hold the topology lock while we traverse the peripheral driver
      list, we retain a pointer to the location of the peripheral driver pointer
      and then drop the topology lock.  So we are still vulnerable to the list
      getting moved around while the lock is dropped.

      To solve the problem, cache a copy of the peripheral driver pointer.  If
      its storage location in the list changes while we have the lock dropped, it
      won't have any effect.

      This doesn't solve the issue that peripheral drivers ("da", "cd", as opposed
      to individual instances like "da0", "cd0") are not generally part of a
      reference counting scheme to guard against deregistering them while there
      are instances active.  The caller (generally the person unloading a module)
      has to be aware of active drivers and not unload something that is in use.

      sys/cam/cam_periph.c:
      	In camperiphfree(), cache a pointer to the peripheral driver
      	instance to avoid holding a pointer to an invalid memory location
      	in the event that the peripheral driver list changes while we have
      	the topology lock dropped.

      PR:		kern/219701
      Submitted by:	avg
      Sponsored by:	Spectra Logic

      ------------------------------------------------------------------------
    ------------------------------------------------------------------------

  Approved by:	re (gjb)

Changes:
_U  releng/11.1/
  releng/11.1/sys/cam/cam_periph.c
Comment 10 Kenneth D. Merry freebsd_committer freebsd_triage 2017-07-03 18:24:11 UTC
Fixed in head, releng/11.1, stable/11, and stable/10.  Thanks for the bug report!