I've put Parallels in the subject line so that anyone interested in Parallels Desktop will find this report in a single-line search. At the moment I do not think this is a Desktop bug.

For the last few months I've had the occasional panic with my FreeBSD 8 installation running in a VM under Parallels Desktop for Mac versions 4.0 and 5.0. Most of the time the panic message disappeared off the screen before I could make a note of it. In the last few days I have finally managed to capture the panic report.

In an effort to track down the bug I have tried various things, such as increasing the memory or changing the number of CPUs from 2 to 1. Nothing worked. I was lucky to get an uptime greater than 60 minutes. Finally, I removed the CDROM device from the Desktop's list of virtual hardware. This seems to have fixed the problem.

Here is an annotated log of the panics.

--------
Machine: eight.pearl
Desktop-Name: FBSD-8-new-precious (eight)
Parallels-Version: 4.0
FreeBSD version: 8.0, 2010-05-28
--------
CPU: 2
Type: i386
Date: 2010-05-24 16:00
Crash: Page not found
----------
Date: 2010-08-28
Information: I installed Parallels 5.0
----------
Date: 2010-08-29 23:00
CPU: 1
Crash: Page not found after finishing cvsup ports
---------
Date: 2010-08-30
CPU: 1
Information: Single User
Now rebuilding kernel and world with sources from 2010-08-29 cvsup
Single CPU, Single USER. No crashes even after 3 hours.
After installing the kernel took snapshot-1
Kernel is now dated Mon Aug 30 03:24:09 BST 2010
Installed world, then took snapshot-2
Reboot CPU:2, multiuser and test by repeated buildworlds
---------
Date: 2010-08-30
Information: FreeBSD-version: 8.1-STABLE Aug 30 03:24:09 2010
---------
Date: 2010-08-30 12:32
Crash: CPU: 2
Spontaneous reboot during make buildworld after about 30 minutes, not sure whether page not found message appeared.
Could this be related to DHCP? The lease time is around 30 minutes. No, I don't think so.
Action: set CPU to 1 and revert to snapshot-2, test with buildworld again.
Oops! Forgot to set CPU to one, so:
--------
Date: 2010-08-30 13:13
Crash: CPU: 2
while buildworld in multi user mode got:

acd0: warning - PREVENT_ALLOW taskqueue timeout - completing request directly
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x1a4
fault code = supervisor read, page not present
current process = 12 (swi6: task queue)
trap number = 12
panic: page fault
uptime: 33m0s

Action: set CPU:1 revert to snapshot-2, try buildworld again
Note: when reverting to a snapshot, you get the CPUs active when that snapshot was taken. Need to configure CPU:1 again.
---------
Date: 2010-08-30 15:00
Crash: CPU: 1
Still crashes buildworld multiuser CPU=1, just takes longer (about an hour)
---------
Date: 2010-08-30 18:48
Crash: CPU: 2
During buildworld: panic message

acd0: WARNING - unknown CMD (0x4a) taskqueue timeout - completing request directly
acd0: WARNING - unknown CMD (0x4a) freeing taskqueue zombie request
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x1a4
fault code = supervisor read, page not present
instruction pointer = 0x20:0xc0894f5f
stack pointer = 0x28:0xe683d960
frame pointer = 0x28:0xe683d978
code segment = base rx0, limit 0xdffff, type =0x1b
 = DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL=0
current process = 1184 (initial thread)
trap number = 12
panic: page fault
cpuid = 0
Uptime: 57m10s

Action: try removing the CDROM device from the Config screen hardware list. Previously I had it installed but not connected.
Save the configuration as snapshot-3
Seems to be working, one buildworld and cvs co ports succeeded.
-----------
Date: 2010-09-01 12:00
CPU: 2
Information: Since removing the CDROM device I have experienced no panics. Current uptime is 1d1h26m.
I have successfully done several buildworlds, a couple of cvsups for ports and src, and various other tasks.
======================================
Conclusion:
Whenever I have captured a panic it is always preceded by a message from the acd0 device. I had the CDROM installed but not connected to a real CD or disk image. Uninstalling the CDROM makes the panics go away. Having the CDROM connected to a real CD or disk image also works. Fortuitously, I had most of my other VMs connected to a disk image.

Fix:
Workaround: reconnect or uninstall the CDROM. I am currently building world on another VM with ZFS and a connected CDROM and it seems to be OK so far.

How-To-Repeat:
Install FreeBSD 8-STABLE on a Parallels Desktop version 4 or 5 VM. Add a CDROM to the Desktop list of installed hardware. Disconnect the CDROM. Boot FreeBSD and do some processor- and disk-intensive task, such as make buildworld. Within a couple of hours you will get a panic.
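
Since every captured panic in this report was preceded by an acd0 message, it may help to watch for those warnings before the panic arrives. The following is a minimal sketch only: the function name ata_warnings is hypothetical, and the pattern is inferred from the messages quoted in this report rather than taken from the driver source.

```sh
#!/bin/sh
# ata_warnings: read kernel messages on stdin and print only the lines that
# look like ata(4)/acd(4) warnings, e.g. "acd0: WARNING - READ_TOC ...".
# The pattern is an assumption based on the messages quoted in this report.
# Matching is case-insensitive because one hand-copied message above uses
# lower-case "warning".
ata_warnings() {
    # "|| true" keeps the exit status at 0 when nothing matches.
    grep -Ei '(^|[[:space:]])(ata|acd)[0-9]+: WARNING - ' || true
}
```

It could be used as `dmesg | ata_warnings`, or against a saved log with `ata_warnings < /var/log/messages`.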
Responsible Changed From-To: freebsd-bugs->freebsd-emulation

Over to maintainer(s).
State Changed From-To: open->feedback

Could you try to reproduce this with the CAM(4) subsystem (ATA_CAM kernel option)?
Summary:
--------
In a Desktop VM with the CDrom installed but not connected, with the hald and dbus daemons running, and with buildworld or background fsck (or both) running, there is a high probability of a panic within a few minutes.

After disabling hald and dbus in /etc/rc.conf, I successfully ran make buildworld in a loop 7 times without any problems. This amounts to about 11 hours of runtime.

Environment:
------------
Parallels Desktop 5 for Mac build 5.0.9376. A slightly older version was also used.
Mac OS X Snow Leopard 10.6.4, 4G of ram, 1G allocated to VM
CVS tag RELENG_8
src cvsup'ed at 2010-09-02 22:27 UTC
Ports cvsup'ed at 2010-08-31 15:05 UTC

Events so far:
--------------
My main development VM, known as eight.pearl, has been running for the last five days without the CDROM installed. It has successfully built world, run a major portupgrade and done a few dump(8)s without any panics. It is far too precious to risk any data corruption, so I made a clone. The cloned VM is known imaginatively as clone8.pearl.

In clone8.pearl, I installed the CDROM and disconnected it. I then started make buildworld. Within a few minutes there was a panic. I rebooted and tried another buildworld. Again, there was soon another panic. Each panic appeared to be preceded by a message from /dev/acd0. Fortunately I had enabled dumps (see below).

In clone8.pearl I then disabled hald and dbus in /etc/rc.conf. I then ran make buildworld in a loop 7 times overnight. This morning I found the VM was still running.

Additional VMs created
----------------------
I created two more VMs: cdpanic.pearl and cam.pearl. Both were the minimum installation from the FreeBSD cdrom 1 of November 2009. I updated the world and kernel from my local sources. No ports were installed. cdpanic.pearl had a standard GENERIC kernel with DDB. cam.pearl also had a standard debugging kernel, with option ATA_CAM, as suggested by jh earlier in this bug report.
I installed and disconnected the CDrom on both VMs and started a buildworld. Both completed successfully with no panics.

hald and dbus
-------------
These two ports run as daemons checking the status of devices. hald comes from sysutils/hal. dbus is from devel/dbus. They are the only two daemons I can see that access the CDrom device. I am now convinced they are tickling a bug in the acd device which causes a panic.

To trigger the bug you need to run something disk-intensive. make buildworld is good. So is background fsck.

The acd0 device needs to report NOT READY status when it is not connected. This is probably a Desktop problem.

To Do
-----
I must create another clone of eight.pearl and install a CAM kernel on it.

Dumps
-----
I managed to obtain two dumps. Here is the output of dmesg. I realise they are not much use, but I need to hone my kernel debugging skills to get more useful information. Both stopped at the same instruction pointer.

-------
ata1: WARNING - READ_TOC read data overrun 18>12
acd0: WARNING - READ_TOC taskqueue timeout - completing request directly
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x1a4
fault code = supervisor read, page not present
instruction pointer = 0x20:0xc08a119f
stack pointer = 0x28:0xe4521b44
frame pointer = 0x28:0xe4521b5c
code segment = base rx0, limit 0xfffff, type 0x1b
 = DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 12 (swi6: task queue)
panic: from debugger
cpuid = 0
Uptime: 6m0s
Physical memory: 1011 MB
Dumping 148 MB: 133 117 101 85 69 53 37 21 5
-------------------------
acd0: WARNING - READ_TOC taskqueue timeout - completing request directly
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x1a4
fault code = supervisor read, page not present
instruction pointer = 0x20:0xc08a119f
stack pointer = 0x28:0xe4521b44
frame pointer = 0x28:0xe4521b5c
code segment = base rx0, limit 0xfffff, type 0x1b
 = DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 12 (swi6: task queue)
panic: from debugger
cpuid = 0
Uptime: 21m2s
Physical memory: 1011 MB
Dumping 138 MB: 123 107 91 75 59 43 27 11
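
For anyone who wants to try the same workaround, the /etc/rc.conf changes described in this comment might look roughly like the fragment below. This is a sketch under assumptions: hald_enable and dbus_enable are the usual knobs for the sysutils/hal and devel/dbus ports, but the report does not say how crash dumps were enabled, so the dumpdev and dumpdir lines are only one common way of doing it.

```sh
# /etc/rc.conf fragment (sketch, not the reporter's actual file)

# Workaround: stop the two daemons that poll the CD device.
hald_enable="NO"        # from sysutils/hal
dbus_enable="NO"        # from devel/dbus

# Assumption: enable kernel crash dumps so that a panic leaves a vmcore.
dumpdev="AUTO"          # let rc(8) choose a swap device for the dump
dumpdir="/var/crash"    # where savecore(8) stores the dump after reboot
```

The daemons stay disabled only while these lines are in place, so a test that wants to provoke the panic again would set both enable knobs back to "YES".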
I have now tried a kernel built with the ATA_CAM option. I then cloned another VM from my eight.pearl development system. I set the CDrom to installed and disconnected, and also made sure that dbus and hald were running with the new kernel. This VM now passes the "build the world seven times" test with no panics. This means 14 hours of continuous running, which is far longer than the 20 minutes it managed before. This is good news.
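
The "build the world seven times" test can be scripted along the following lines. This is a minimal sketch, not the script actually used for this report; the function name repeat_cmd is hypothetical. It runs a command a fixed number of times and stops at the first failure, so after a panic and reboot the last "run N of M" line in a saved log shows roughly how far it got.

```sh
#!/bin/sh
# repeat_cmd COUNT CMD [ARGS...]
# Run CMD COUNT times in sequence, printing a timestamped line before each
# run. Returns 1 as soon as one run fails, 0 if all runs complete.
repeat_cmd() {
    count=$1
    shift
    i=1
    while [ "$i" -le "$count" ]; do
        echo "run $i of $count: $(date)"
        "$@" || { echo "run $i failed" >&2; return 1; }
        i=$((i + 1))
    done
    echo "all $count runs completed"
}
```

A possible invocation would be `cd /usr/src && repeat_cmd 7 make buildworld`, with the output redirected to a log file on the VM's disk.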
I forgot to mention in my last posting that I had accidentally omitted "device ada" from my kernel configuration. It did not seem to affect anything, as ada0 etc. still appeared in my /dev/ directory.
State Changed From-To: feedback->analyzed

Feedback received. Seems to be an issue with acd(4) or ata(4).
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

I think I may have stumbled upon the same issue after I updated my installed ports, including hald. I run FreeBSD 8.1-STABLE amd64 (not the latest, but a few months old) in VirtualBox on a Windows 7 x64 host. If I disable hald or the cdrom device in VirtualBox, or run the same FreeBSD installation natively, the problem doesn't seem to occur.

When the problem does occur, I get the following messages (not necessarily in any particular order):

ata0: WARNING - unknown CMD (0x4a) read data overrun 18>8
ata0: WARNING - READ_TOC read data overrun 18>12
ata0: WARNING - PREVENT_ALLOW read data overrun 18>0
ata0: WARNING - TEST_UNIT_READY read data overrun 18>0

These messages repeat seemingly at random a few times. Eventually the box might panic. Here is a backtrace:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x290
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff8025aec5
stack pointer = 0x28:0xffffff800007bae0
frame pointer = 0x28:0xffffff800007bb00
code segment = base rx0, limit 0xfffff, type 0x1b
 = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL=0
current process = 11 (swi6: task queue)
[thread pid 11 tid 100018 ]
Stopped at _mtx_lock_sleep+0x4e: movl 0x290(%rcx),%esi
db> bt
Tracing pid 11 tid 100018 td 0xffffff00024bcba0
_mtx_lock_sleep() at _mtx_lock_sleep+0x4e
_sema_post() at _sema_post+0x89
ata_completed() at ata_completed+0x46e
taskqueue_run() at taskqueue_run+0x94
intr_event_execute_handlers() at intr_event_execute_handlers+0xf9
ithread_loop() at ithread_loop+0x8e
fork_exit() at fork_exit+0x118
fork_trampoline() at fork_trampoline+0xe
- --- trap 0, rip = 0, rsp = 0xffffff800007bd30, rdp = 0 ---
db>

- --
llwang
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.12 (FreeBSD)

iD8DBQFNCmTrCQM7t5B2mhARAoHkAJ9zNOQ8QApAP5gDgmSgUABt39es8wCghPyr
g82JpbmVYsjLnyGkU+/JQ3k=
=VHTI
-----END PGP SIGNATURE-----
batch change:

For bugs that match the following
- Status Is In progress
AND
- Untouched since 2018-01-01.
AND
- Affects Base System OR Documentation

DO:

Reset to open status.

Note:
I did a quick pass but if you are getting this email it might be worthwhile to double check to see if this bug ought to be closed.
Keyword: crash – in lieu of summary line prefix: [panic]

* bulk change for the keyword
* summary lines may be edited manually (not in bulk).

Keyword descriptions and search interface:

<https://bugs.freebsd.org/bugzilla/describekeywords.cgi>
^Triage: correctly assign. To submitter: is this ancient PR still relevant?