150186 – [parallels] [panic] Parallels Desktop: CDROM disconnected leads to panic, eventually


Status:	Open

Alias:	None

Product:	Base System
Classification:	Unclassified
Component:	kern
Version:	Unspecified
Hardware:	Any Any

Importance:	Normal Affects Only Me
Assignee:	freebsd-virtualization (Nobody)

URL:
Keywords:	crash

Depends on:
Blocks:

Reported:	2010-09-01 14:10 UTC by Dave Evans
Modified:	2023-12-26 00:13 UTC
CC List:	1 user

See Also:


Description Dave Evans 2010-09-01 14:10:03 UTC

I've put Parallels in the subject line so that anyone
interested in Parallels Desktop will find this report
in a single-line search.  At the moment I do not think this
is a Desktop bug.

For the last few months I've had the occasional panic with
my FreeBSD 8 installation running in a VM under Parallels
Desktop for Mac versions 4.0 and 5.0. Most of the time
the panic message disappeared off the screen before I could
make a note of it. In the last few days I have finally managed
to capture the panic report.

In an effort to track down the bug I have tried various things,
such as increasing the memory or reducing the number of CPUs from
2 to 1. Nothing worked: I was lucky to get an uptime greater than
60 minutes. Finally, I removed the CDROM device from the Desktop's
list of virtual hardware. This seems to have fixed the problem.

Here is an annotated log of the panics.

--------
Machine: eight.pearl
Desktop-Name: FBSD-8-new-precious (eight)
Parallels-Version: 4.0
FreeBSD version: 8.0, 2010-05-28
--------
CPU: 2
Type: i386
Date: 2010-05-24 16:00
Crash:

Page not found
----------
Date: 2010-08-28
Information: I installed Parallels 5.0
----------
Date: 2010-08-29 23:00
CPU: 1
Crash:
Page not found after finishing cvsup ports
---------
Date: 2010-08-30
CPU: 1
Information:

Single user
Now rebuilding kernel and world with sources from the 2010-08-29 cvsup.
Single CPU, single user. No crashes even after 3 hours.

After installing the kernel, took snapshot-1.
Kernel is now dated Mon Aug 30 03:24:09 BST 2010

Installed world, then took snapshot-2.

Rebooted with CPU: 2, multiuser, and tested with repeated buildworlds.

---------
Date: 2010-08-30
Information:
FreeBSD-version: 8.1-STABLE Aug 30 03:24:09 2010

---------
Date: 2010-08-30 12:32
Crash:
CPU: 2

Spontaneous reboot during make buildworld after about 30 minutes;
not sure whether the page-not-found message appeared.

Could this be related to DHCP? The lease time is around 30 minutes.
No, I don't think so.

Action: set CPU to 1 and revert to snapshot-2, test with buildworld again.

Oops! Forgot to set CPU to 1, so:

--------
Date: 2010-08-30 13:13
Crash:
CPU: 2

During buildworld in multiuser mode,
got:

acd0: WARNING - PREVENT_ALLOW taskqueue timeout - completing request directly

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x1a4
fault code = supervisor read, page not present
current process = 12 (swi6: task queue)
trap number = 12
panic: page fault
Uptime: 33m0s


Action: set CPU: 1, revert to snapshot-2, try buildworld again.
Note: when reverting to a snapshot, you get the CPU count that was active
      when that snapshot was taken. Need to configure CPU: 1 again.
---------
Date: 2010-08-30 15:00
Crash:
CPU: 1

Still crashes during multiuser buildworld with CPU: 1; it just takes longer (about an hour).
---------
Date: 2010-08-30  18:48
Crash:
CPU: 2

During buildworld:
panic message

acd0: WARNING - unknown CMD (0x4a) taskqueue timeout - completing request directly
acd0: WARNING - unknown CMD (0x4a) freeing taskqueue zombie request

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id =00
fault virtual address = 0x1a4
fault code = supervisor read, page not present
instruction pointer   = 0x20:0xc0894f5f
stack pointer         = 0x28:0xe683d960
frame pointer         = 0x28:0xe683d978
code segment          = base rx0, limit 0xdffff, type =0x1b
                      = DPL 0, pres 1, def32 1, gran 1
processor eflags      = interrupt enabled, resume, IOPL=0
current process       = 1184 (initial thread)
trap number = 12
panic: page fault
cpuid = 0
Uptime: 57m10s

Action: try removing the CDROM device from the Config screen hardware list.
Previously I had it installed but not connected.

Saved the configuration as snapshot-3.
Seems to be working: one buildworld and a cvs checkout of ports succeeded.

-----------
Date: 2010-09-01 12:00
CPU: 2
Information:

Since removing the CDROM device I have experienced no panics.
Current uptime is 1d1h26m.

I have successfully done several buildworlds, a couple of cvsups
for ports and src, and various other tasks.

======================================

Conclusion:

Whenever I have captured a panic, it has always been preceded by a message
from the acd0 device. I had the CDROM installed but not connected to
a real CD or disk image. Uninstalling the CDROM makes the panics go
away. Having the CDROM connected to a real CD or disk image also works.
Fortuitously, I had most of my other VMs connected to a disk image.
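Since every captured panic was preceded by an acd0 (or ataN) warning, it can help to check the system log for those messages after a test run. A minimal sketch, assuming kernel messages reach /var/log/messages through syslogd (the FreeBSD default); the function names are made up for illustration, and finding a warning only flags the precursor, it does not prevent the panic.

```sh
#!/bin/sh
# Hypothetical helper: print the ata(4)/acd(4) driver warnings that, in this
# report, preceded every captured panic, e.g.
#   acd0: WARNING - READ_TOC taskqueue timeout - completing request directly
# Reads the given log files, or /var/log/messages if none are given.
ata_warnings() {
    if [ $# -eq 0 ]; then
        set -- /var/log/messages
    fi
    # -h: no file-name prefix; -E: extended regex for the acdN/ataN prefix.
    grep -hE '(acd|ata)[0-9]+: WARNING' "$@"
}

# Hypothetical helper: print just the number of such warnings (0 if none).
count_ata_warnings() {
    ata_warnings "$@" | wc -l | tr -d ' '
}
```

For example, `count_ata_warnings /var/log/messages` after a buildworld; a non-zero count on a VM with a disconnected CDROM fits the pattern described above.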

Fix: 

Workaround.
Reconnect or uninstall the CDROM.

I am currently building world on another VM with ZFS and a connected
CDROM and it seems to be ok so far.

How-To-Repeat: Install FreeBSD 8-STABLE on a Parallels Desktop 4 or 5 VM.

Add a CDROM to the Desktop list of installed hardware.  Disconnect the CDROM.

Boot FreeBSD and run some processor- and disk-intensive task, such as make buildworld.
Within a couple of hours you will get a panic.

Comment 1 Mark Linimon freebsd_committer

2010-09-06 08:35:57 UTC

Responsible Changed
From-To: freebsd-bugs->freebsd-emulation

Over to maintainer(s).

Comment 2 Jaakko Heinonen freebsd_committer

2010-09-06 15:53:47 UTC

State Changed
From-To: open->feedback

Could you try to reproduce this with the CAM(4) subsystem (ATA_CAM kernel
option)?
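A minimal sketch of one way to build such a kernel, assuming an i386 source tree under /usr/src and that config(8) accepts the `include` directive (if it does not, copy GENERIC and append the option line instead). The configuration name ATACAM and the helper function are placeholders, not something used in this report.

```sh
#!/bin/sh
# Write a kernel configuration that is GENERIC plus "options ATA_CAM".
# With ATA_CAM, ata(4) attaches its devices through CAM, so the CD drive
# shows up as cd0 rather than acd0 (and disks as ada rather than ad).
write_atacam_kernconf() {
    conf_dir=$1    # e.g. /usr/src/sys/i386/conf
    name=$2        # placeholder configuration name, e.g. ATACAM
    cat > "$conf_dir/$name" <<EOF
include GENERIC
ident   $name
options ATA_CAM
EOF
}
```

Usage would then be along the lines of `write_atacam_kernconf /usr/src/sys/i386/conf ATACAM`, followed by `cd /usr/src && make buildkernel KERNCONF=ATACAM && make installkernel KERNCONF=ATACAM` and a reboot.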

Comment 3 Dave Evans 2010-09-10 11:49:17 UTC

Summary:
--------
In a Desktop VM with the CDROM installed but not connected, with
the hald and dbus daemons running, and while running buildworld, background
fsck, or both, there is a high probability of a panic within a few minutes.
After disabling hald and dbus in /etc/rc.conf, I successfully ran
make buildworld in a loop 7 times without any problems. This amounts to
about 11 hours of runtime.
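For reference, disabling the two daemons amounts to setting their rc.conf knobs to NO (or deleting the YES lines). A sketch, assuming the variable names used by the rc.d scripts that the sysutils/hal and devel/dbus ports install:

```sh
# /etc/rc.conf (excerpt): do not start the HAL and D-Bus daemons at boot.
# hald is installed by sysutils/hal, dbus by devel/dbus.
hald_enable="NO"
dbus_enable="NO"
```

To stop them without a reboot, running `/usr/local/etc/rc.d/hald stop` and then `/usr/local/etc/rc.d/dbus stop` should do; hald is stopped first because it uses dbus.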

Environment:
------------

Parallels Desktop 5 for Mac build 5.0.9376. A slightly older version was
also used.

Mac OS X Snow Leopard 10.6.4, 4 GB of RAM, 1 GB allocated to the VM

CVS tag RELENG_8 src cvsup'ed at 2010-09-02 22:27 UTC

Ports cvsup'ed at 2010-08-31 15:05 UTC

Events so far:
--------------

My main development VM, known as eight.pearl, has been running for
the last five days without the CDROM installed. It has successfully
built world, run a major portupgrade and done a few dump(8)s without any
panics. It is far too precious to risk any data corruption, so I made a clone.

The cloned VM is known imaginatively as clone8.pearl

In clone8.pearl, I installed the CDROM and disconnected it. I then started
make buildworld.  Within a few minutes there was a panic. I rebooted and
tried another buildworld.  Again, there was soon another panic.  Each panic
appeared to be preceded by a message from /dev/acd0. Fortunately I had
enabled dumps (see below).
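For anyone trying to reproduce this, crash dumps are enabled in /etc/rc.conf. A minimal sketch, assuming a swap partition large enough to hold the dump (the dumps below were 138 and 148 MB):

```sh
# /etc/rc.conf (excerpt): let the kernel write a crash dump to the first
# suitable swap device, and have savecore(8) copy it to /var/crash at boot.
dumpdev="AUTO"
dumpdir="/var/crash"
```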

In clone8.pearl I then disabled hald and dbus in /etc/rc.conf.  I then
ran make buildworld in a loop 7 times overnight. This morning I found
the VM was still running.
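The overnight test can be scripted as a loop that reruns the build and stops at the first failure. A minimal sketch; `soak` is a made-up function name. A panic takes the script down with the VM, so the real check is still the uptime the next morning, as above.

```sh
#!/bin/sh
# Hypothetical soak-test helper: run a command up to N times, stop at the
# first failure, and print how many runs completed successfully.
soak() {
    n=$1; shift
    i=0
    while [ "$i" -lt "$n" ]; do
        if ! "$@"; then
            echo "run $((i + 1)) of $n failed" >&2
            echo "$i"
            return 1
        fi
        i=$((i + 1))
    done
    echo "$i"
}
```

For example: `cd /usr/src && soak 7 make buildworld > /var/tmp/soak.log 2>&1`. The last line of the log is then the number of completed runs.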

Additional VMs created
----------------------

I created two more VMs: cdpanic.pearl and cam.pearl. Both were
minimal installations from the November 2009 FreeBSD CD-ROM 1. I updated
the world and kernel from my local sources. No ports were installed.
cdpanic.pearl had a standard GENERIC kernel with DDB. cam.pearl also
had a standard debugging kernel, plus options ATA_CAM, as suggested by jh
earlier in this bug report.

I installed and disconnected the CDrom on both VMs and started a buildworld.
Both completed successfully with no panics.

hald and dbus
-------------
These two ports run as daemons checking the status of devices.
hald comes from sysutils/hal. dbus is from devel/dbus.  They
are the only two daemons I can see that access the CDrom device.
I am now convinced they are tickling a bug in the acd device
which causes a panic.  

To trigger the bug you need to run something disk-intensive.
make buildworld is good. So is background fsck.

The acd0 device needs to report NOT READY status when it is not connected.
This is probably a Desktop problem.

To Do
-----
I must create another clone of eight.pearl and install a CAM kernel
on it.

Dumps
-----
I managed to obtain two dumps. Here is the output of dmesg. I realise
they are not much use, but I need to hone my kernel debugging skills
to get more useful information. Both stopped at the same instruction pointer.

-------
ata1: WARNING - READ_TOC read data overrun 18>12
acd0: WARNING - READ_TOC taskqueue timeout - completing request directly


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address    = 0x1a4
fault code        = supervisor read, page not present
instruction pointer    = 0x20:0xc08a119f
stack pointer            = 0x28:0xe4521b44
frame pointer            = 0x28:0xe4521b5c
code segment        = base rx0, limit 0xfffff, type 0x1b
            = DPL 0, pres 1, def32 1, gran 1
processor eflags    = interrupt enabled, resume, IOPL = 0
current process        = 12 (swi6: task queue)
panic: from debugger
cpuid = 0
Uptime: 6m0s
Physical memory: 1011 MB
Dumping 148 MB: 133 117 101 85 69 53 37 21 5
-------------------------
acd0: WARNING - READ_TOC taskqueue timeout - completing request directly


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address    = 0x1a4
fault code        = supervisor read, page not present
instruction pointer    = 0x20:0xc08a119f
stack pointer            = 0x28:0xe4521b44
frame pointer            = 0x28:0xe4521b5c
code segment        = base rx0, limit 0xfffff, type 0x1b
            = DPL 0, pres 1, def32 1, gran 1
processor eflags    = interrupt enabled, resume, IOPL = 0
current process        = 12 (swi6: task queue)
panic: from debugger
cpuid = 0
Uptime: 21m2s
Physical memory: 1011 MB
Dumping 138 MB: 123 107 91 75 59 43 27 11
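To check observations like "both stopped at the same instruction pointer" across several saved panic messages, the relevant fields can be extracted mechanically. A minimal sketch, assuming each panic message has been saved to its own text file (the file names are hypothetical). It only compares the console text and does not analyse the dump itself.

```sh
#!/bin/sh
# Hypothetical helper: for each saved panic message, print the file name,
# the fault virtual address and the instruction pointer, tab-separated.
# Spacing around "=" varies between captures, so any whitespace is accepted.
panic_fingerprint() {
    for f in "$@"; do
        addr=$(sed -n 's/^fault virtual address[[:space:]]*=[[:space:]]*//p' "$f" | head -n 1)
        ip=$(sed -n 's/^instruction pointer[[:space:]]*=[[:space:]]*//p' "$f" | head -n 1)
        printf '%s\tfault=%s\tip=%s\n' "$f" "$addr" "$ip"
    done
}
```

Identical fault/ip columns suggest the same crash site. To go further with the vmcores, assuming the kernel was built with debug symbols, the usual route is kgdb, e.g. `kgdb /usr/obj/usr/src/sys/GENERIC/kernel.debug /var/crash/vmcore.0`, then `bt` for a backtrace and `list *0xc08a119f` to map the instruction pointer to a source line.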

Comment 4 Dave Evans 2010-09-13 12:17:28 UTC

I have now tried a kernel built with the ATA_CAM option.

I then cloned another VM from my eight.pearl development system.
I set the CDrom to installed and disconnected, and also made sure
that dbus and hald were running with the new kernel.

This VM now passes the "build the world seven times" test with no
panics. This means 14 hours of continuous running, which is far longer
than the 20 minutes it managed before.

This is good news.

Comment 5 Dave Evans 2010-09-13 12:32:11 UTC

I forgot to mention in my last posting that I had accidentally omitted
"device ada" from my kernel configuration. It did not seem to affect
anything, as ada0 etc. still appeared in my /dev/ directory.

Comment 6 Jaakko Heinonen freebsd_committer

2010-09-25 18:26:05 UTC

State Changed
From-To: feedback->analyzed

Feedback received. Seems to be an issue with acd(4) or ata(4).

Comment 7 Li-Lun Wang 2010-12-16 19:13:51 UTC


Hi,

I think I may have stumbled upon the same issue after I updated my
installed ports, including hald. I run FreeBSD 8.1-STABLE amd64 (not
the latest, but a few months old) in VirtualBox on a Windows 7 x64
host. If I disable hald or the CD-ROM device in VirtualBox, or run the
same FreeBSD installation natively, the problem doesn't seem to occur.
When the problem does occur, I get the following messages (not
necessarily in this order):

ata0: WARNING - unknown CMD (0x4a) read data overrun 18>8
ata0: WARNING - READ_TOC read data overrun 18>12
ata0: WARNING - PREVENT_ALLOW read data overrun 18>0
ata0: WARNING - TEST_UNIT_READY read data overrun 18>0

These messages repeat, seemingly at random, a few times. Eventually
the box might panic. Here is a backtrace:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address	= 0x290
fault code		= supervisor read data, page not present
instruction pointer	= 0x20:0xffffffff8025aec5
stack pointer		= 0x28:0xffffff800007bae0
frame pointer		= 0x28:0xffffff800007bb00
code segment		= base rx0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL=0
current process		= 11 (swi6: task queue)
[thread pid 11 tid 100018 ]
Stopped at	_mtx_lock_sleep+0x4e:	movl	0x290(%rcx),%esi
db> bt
Tracing pid 11 tid 100018 td 0xffffff00024bcba0
_mtx_lock_sleep() at _mtx_lock_sleep+0x4e
_sema_post() at _sema_post+0x89
ata_completed() at ata_completed+0x46e
taskqueue_run() at taskqueue_run+0x94
intr_event_execute_handlers() at intr_event_execute_handlers+0xf9
ithread_loop() at ithread_loop+0x8e
fork_exit() at fork_exit+0x118
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff800007bd30, rdp = 0 ---
db>

-- llwang

Comment 8 Eitan Adler freebsd_committer

2018-05-28 19:46:31 UTC

batch change:

For bugs that match the following
-  Status Is In progress 
AND
- Untouched since 2018-01-01.
AND
- Affects Base System OR Documentation

DO:

Reset to open status.


Note:
I did a quick pass but if you are getting this email it might be worthwhile to double check to see if this bug ought to be closed.

Comment 9 Graham Perrin freebsd_committer

2022-10-17 12:17:02 UTC

Keyword: 

    crash

– in lieu of summary line prefix: 

    [panic]

* bulk change for the keyword
* summary lines may be edited manually (not in bulk). 

Keyword descriptions and search interface: 

    <https://bugs.freebsd.org/bugzilla/describekeywords.cgi>

Comment 10 Mark Linimon freebsd_committer

2023-12-26 00:13:00 UTC

^Triage: correctly assign.

To submitter: is this ancient PR still relevant?