I am using VirtualBox (previously 4.1.22, now 4.2.6) to pre-test various installations. In this case, I am testing a ZFS raidz2 setup.
The host is running FreeBSD 8.2-RELEASE.
The client is running FreeBSD 9.1-RELEASE with the latest NFSE patches.
The host is exporting 7 physical disk partitions to the client:
- 1 on IDE, used for UFS / and /usr
- 6 via either a single SCSI or a single SATA controller (the problems are the same using either)
In the client, the 6 SATA-attached (or SCSI-attached) partitions are used to form a raidz2 zpool.
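For reference, the guest-side pool described above would be assembled roughly as follows. This is only a sketch: the device names ada1..ada6 and the pool name "tank" are my assumptions, not taken from the report.

```shell
# Create a raidz2 pool from the six emulated disks (names assumed).
zpool create tank raidz2 ada1 ada2 ada3 ada4 ada5 ada6
zpool status tank   # verify all six members are ONLINE
```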
The problem is that even with light disk activity, the CAM path seems to hang after only a few operations (regardless of whether the 6 zpool disks are attached via SATA or SCSI). This behavior can be triggered by something as simple as a 'zfs create'.
Typically, on the console of the client the following messages appear (ultimately, for all disks):
Jan 4 14:02:31 v904 kernel: Trying to mount root from ufs:/dev/ada0a [rw]...
Jan 4 14:02:31 v904 kernel: ZFS NOTICE: Prefetch is disabled by default if less than 4GB of RAM is present;
Jan 4 14:02:31 v904 kernel: to enable, add "vfs.zfs.prefetch_disable=0" to /boot/loader.conf.
Jan 4 14:02:31 v904 kernel: ZFS filesystem version 5
Jan 4 14:02:31 v904 kernel: ZFS storage pool version 28
Jan 4 14:02:31 v904 root: /etc/rc: WARNING: failed precmd routine for vmware_guestd
Jan 4 14:02:32 v904 kernel: .
Jan 4 14:07:06 v904 kernel: (ada5:ata6:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
Jan 4 14:07:06 v904 kernel: (ada5:ata6:0:0:0): CAM status: Command timeout
Jan 4 14:07:06 v904 kernel: (ada5:ata6:0:0:0): Retrying command
Jan 4 14:07:36 v904 kernel: (ada5:ata6:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
Jan 4 14:07:36 v904 kernel: (ada5:ata6:0:0:0): CAM status: Command timeout
Jan 4 14:07:36 v904 kernel: (ada5:ata6:0:0:0): Error 5, Retries exhausted
Jan 4 14:07:44 v904 kernel: .
Jan 4 14:07:45 v904 kernel: , 750.
(I have no idea where the lines with the single dots and the ", 750" come from.)
Some activity in the client is still possible if it does not access the zpool.
Interestingly, as soon as the problem surfaces, the VirtualBox emulation process itself also becomes stuck immediately when an action (e.g., hard reset) is attempted from its pull-down menu. The process can still be killed with a plain kill (i.e., SIGTERM/-15). The host does not seem to be adversely affected.
Because the emulation process itself is affected, I do not believe that the client OS itself is the culprit (and neither is NFSE); rather, I'd guess that it is a VirtualBox problem.
One more note: similar problems occur when running the client under a Windows 7 host. However, in that case the same physical partitions on the FreeBSD 8.2 server are accessed via iSCSI from VirtualBox running on the Windows 7 host, which might introduce additional problems (for example, I see a high rate of iSCSI disconnects/reconnects in this scenario). From this, I would guess that the problem lies in the vendor (VirtualBox) source handling multiple disks, because it occurs in a similar manner with both FreeBSD 8.2 and Windows 7 as hosts.
How-To-Repeat: See description above.
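Concretely, assuming the raidz2 pool described above exists in the guest (the pool and dataset names below are illustrative, not from the report), a single metadata operation suffices:

```shell
# In the guest, on the imported raidz2 pool (name assumed to be "tank"):
zfs create tank/test
# Watch the guest console / dmesg: within seconds, FLUSHCACHE48 requests
# start failing with "CAM status: Command timeout" and the pool hangs.
```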
Over to maintainer (via the GNATS Auto Assign Tool)
Since the bug is the same with a Windows 7 host and a FreeBSD guest, I think
this could be a general VirtualBox bug which should be reported upstream.
Remember that FreeBSD is supported as a VirtualBox guest only.
Is this PR still relevant? Back to pool.
I have just tried this with a FreeBSD 10.0 guest (on a FreeBSD 9.2 host with the aio module kldload'ed), and the behavior is still the same.
I have a hunch that src/VBox/Runtime/r3/freebsd/fileaio-freebsd.cpp is broken if using more than one (emulated) disk.
This would indeed mean that the problem has to be reported upstream unless some FreeBSD person wrote that driver.
Bug report for VirtualBox itself: https://www.virtualbox.org/ticket/12648
Thanks for mentioning the upstream PR. This really seems to be unrelated to the VirtualBox port per se.