Bug 208275 - Kernel panic when reading from /dev/cd0 with a damaged dvd
Summary: Kernel panic when reading from /dev/cd0 with a damaged dvd
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: misc (show other bugs)
Version: CURRENT
Hardware: amd64 Any
: --- Affects Only Me
Assignee: FreeBSD bugs mailing list
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-03-25 02:26 UTC by bsd
Modified: 2016-03-29 20:00 UTC (History)
4 users (show)

See Also:


Attachments
dmesg (12.19 KB, text/plain)
2016-03-25 02:26 UTC, bsd
no flags Details
/var/crash/core.txt.0 (79.53 KB, text/plain)
2016-03-25 02:48 UTC, bsd
no flags Details
cd9660_read(): In case of read errors, don't dereference bp which may be NULL (891 bytes, patch)
2016-03-28 09:57 UTC, Fabian Keil
no flags Details | Diff
/var/log/messages kernel output on a bad dvd read (66.38 KB, text/plain)
2016-03-29 11:35 UTC, bsd
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description bsd 2016-03-25 02:26:08 UTC
Created attachment 168591 [details]
dmesg

When I mount_cd9660 /dev/cd0 /mnt and then cp -Rv /mnt/ ., I reliably get a kernel panic.  The cd is a bit scratched and old, there are some read timeout debug messages, and the it just reboots.

The machine is a Thinkpad T440p.
The version of freebsd is 11-current, FreeBSD 11.0-CURRENT #0 r297046

I initially thought it might be drm2/i915kms related, but it panics if those modules are loaded or not.

I've attached dmesg, and /var/crash/core.txt.0.
/var/crash/info.0 is:
Dump header from device: /dev/ada0p3
  Architecture: amd64
  Architecture Version: 2
  Dump Length: 856309760
  Blocksize: 512
  Dumptime: Thu Mar 24 22:11:03 2016
  Hostname: t440p.example.com
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 11.0-CURRENT #0 r297046: Sat Mar 19 02:16:14 EDT 2016
    root@t440p.example.com:/usr/obj/usr/src/sys/GENERIC
  Panic String: page fault
  Dump Parity: 1773259869
  Bounds: 0
  Dump Status: good
Comment 1 bsd 2016-03-25 02:42:33 UTC
So I can't actually add a second file as an attachment (the core.txt.0 file).  I'm guessing it's the most useful of the bunch but clicking on the 'submit' button on the add an attachment page doesn't do anything (firefox on freebsd).  I'll mail it to the list instead.
Comment 2 bsd 2016-03-25 02:48:18 UTC
Created attachment 168592 [details]
/var/crash/core.txt.0
Comment 3 bsd 2016-03-25 02:48:59 UTC
Never mind, I couldn't add crash.txt.0 because it was read only by root.
Comment 4 Fabian Keil 2016-03-28 09:57:06 UTC
Created attachment 168724 [details]
cd9660_read(): In case of read errors, don't dereference bp which may be NULL

Thanks for the report. Can you test the attached patch?
Comment 5 bsd 2016-03-29 11:35:38 UTC
Created attachment 168755 [details]
/var/log/messages kernel output on a bad dvd read
Comment 6 bsd 2016-03-29 11:41:36 UTC
With the patch applied my machine happily worked at copying most of the evening.  Since this was a reliable way to panic the kernel, this patch is a great success. I've attached a /var/log/messages with about an hour's worth of kernel complaints regarding the media, if that's helpful.  Let me know if there's any more information needed.

Thanks!
Comment 7 Fabian Keil 2016-03-29 12:41:00 UTC
Great, thanks for testing the patch.

I believe that's all the information needed.

CC'ing freebsd-current@ to see if a committer has time for this.

FYI, when copying discs with lots of corrupt sectors, reducing
kern.cam.cd.retry_count can significantly speed things up.
Comment 8 Konstantin Belousov freebsd_committer 2016-03-29 15:06:42 UTC
(In reply to Fabian Keil from comment #7)
I think the only needed part of the change is the move of n calculation after the error check.

In head, cluster_read() and bread() relibably reset bp to NULL on error, _and_ brelse ignores NULL argument.  So it may be argued that the fix also could remobe brelse() call from the error path.
Comment 9 Fabian Keil 2016-03-29 15:58:34 UTC
I agree that moving the n calculation behind the error block
is technically sufficient to prevent the panic.

I added the NULL check in front of the brelse() because the function
contains a comment that indicates that passing NULL to it is considered
incorrect.

My assumption was that the brelse() was there for a reason and that bp
would sometimes not be NULL or at least could be in the future.

cluster_read() and bread*() indeed seem to reliably reset bp to NULL
on error, but unlike breadn_flags(), cluster_read() has no comment that
explicitly mentions this, so I wasn't sure that one can depend on this
behaviour.

If the behaviour is unlikely to change in the future, I agree that the
brelse() should be removed.
Comment 10 Konstantin Belousov freebsd_committer 2016-03-29 16:09:56 UTC
(In reply to Fabian Keil from comment #9)
See the comment near the end of cluster_read(), right after the opening '{' of if (reqbp) block.  It all comes from r294954.
Comment 11 Fabian Keil 2016-03-29 16:18:34 UTC
Thanks for the pointer. Somehow I missed that comment.
Comment 12 commit-hook freebsd_committer 2016-03-29 20:00:22 UTC
A commit references this bug:

Author: kib
Date: Tue Mar 29 19:59:45 UTC 2016
New revision: 297401
URL: https://svnweb.freebsd.org/changeset/base/297401

Log:
  Do not access buffer if bread(9) or cluster_read(9) failed.  On error,
  the functions free the buffer and set the pointer to NULL.  Also
  remove useless call to brelse(9) on the error path.

  PR:	208275
  Submitted by:	Fabian Keil <fk@fabiankeil.de>
  MFC after:	2 weeks

Changes:
  head/sys/fs/cd9660/cd9660_vnops.c