I am using VirtualBox (previously 4.1.22, now 4.2.6) to pre-test various installations. In this case, I am testing a ZFS raidz2 setup.
The host is running FreeBSD 8.2-RELEASE.
The client is running FreeBSD 9.1-RELEASE with the latest NFSE patches.
The host is exporting 7 physical disk partitions to the client:
- 1 on IDE, used for UFS / and /usr
- 6 via either a single SCSI or a single SATA controller (the problems are the same using either)
In the client, the 6 SATA-attached (or SCSI-attached) partitions are used to form a raidz2 zpool.
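For reference, the guest-side pool described above would be assembled roughly as follows. This is only a sketch: the device names ada1..ada6 and the pool name "tank" are my assumptions, not taken from the report.

```shell
# Create a raidz2 pool from the six emulated disks (names assumed).
zpool create tank raidz2 ada1 ada2 ada3 ada4 ada5 ada6
zpool status tank   # verify all six members are ONLINE
```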
The problem is that even with light disk activity, the CAM path seems to hang after only a few operations (regardless of whether the 6 zpool disks are attached via SATA or SCSI). This behavior can be triggered by something as simple as a 'zfs create'.
Typically, on the console of the client the following messages appear (ultimately, for all disks):
Jan 4 14:02:31 v904 kernel: Trying to mount root from ufs:/dev/ada0a [rw]...
Jan 4 14:02:31 v904 kernel: ZFS NOTICE: Prefetch is disabled by default if less than 4GB of RAM is present;
Jan 4 14:02:31 v904 kernel: to enable, add "vfs.zfs.prefetch_disable=0" to /boot/loader.conf.
Jan 4 14:02:31 v904 kernel: ZFS filesystem version 5
Jan 4 14:02:31 v904 kernel: ZFS storage pool version 28
Jan 4 14:02:31 v904 root: /etc/rc: WARNING: failed precmd routine for vmware_guestd
Jan 4 14:02:32 v904 kernel: .
Jan 4 14:07:06 v904 kernel: (ada5:ata6:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
Jan 4 14:07:06 v904 kernel: (ada5:ata6:0:0:0): CAM status: Command timeout
Jan 4 14:07:06 v904 kernel: (ada5:ata6:0:0:0): Retrying command
Jan 4 14:07:36 v904 kernel: (ada5:ata6:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
Jan 4 14:07:36 v904 kernel: (ada5:ata6:0:0:0): CAM status: Command timeout
Jan 4 14:07:36 v904 kernel: (ada5:ata6:0:0:0): Error 5, Retries exhausted
Jan 4 14:07:44 v904 kernel: .
Jan 4 14:07:45 v904 kernel: , 750.
(I have no idea where the lines with the single dots and the ", 750" come from.)
Some activity in the client is still possible if it does not access the zpool.
Interestingly, as soon as the problem surfaces, the VirtualBox emulation process itself also becomes stuck immediately when an action (e.g., hard reset) is attempted from its pull-down menu. The process can still be killed with a plain kill (i.e., SIGTERM/-15). The host does not seem to be adversely affected.
Because the emulation process itself is affected, I do not believe that the client OS itself is the culprit (and neither is NFSE); rather, I'd guess that it is a VirtualBox problem.
One more note: similar problems occur when running the client under a Windows 7 host. However, in that case the same physical partitions on the FreeBSD 8.2 server are accessed via iSCSI from VirtualBox running on the Windows 7 host, which might introduce additional problems (for example, I see a high rate of iSCSI disconnects/reconnects in this scenario). From this, I would guess that the problem lies in the vendor (VirtualBox) source handling multiple disks, because it occurs in a similar manner with both FreeBSD 8.2 and Windows 7 as hosts.
How-To-Repeat: See description above.
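Concretely, assuming the raidz2 pool described above exists in the guest (the pool and dataset names below are illustrative, not from the report), a single metadata operation suffices:

```shell
# In the guest, on the imported raidz2 pool (name assumed to be "tank"):
zfs create tank/test
# Watch the guest console / dmesg: within seconds, FLUSHCACHE48 requests
# start failing with "CAM status: Command timeout" and the pool hangs.
```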
Over to maintainer (via the GNATS Auto Assign Tool)
Since the bug is the same with a Windows 7 host and a FreeBSD guest, I think
this could be a general VirtualBox bug which should be reported upstream.
Remember that FreeBSD is supported as a VirtualBox guest only.
Is this PR still relevant? Back to pool.
I have just tried this with a FreeBSD 10.0 guest (on a FreeBSD 9.2 host with the aio module kldload'ed), and the behavior is still the same.
I have a hunch that src/VBox/Runtime/r3/freebsd/fileaio-freebsd.cpp is broken if using more than one (emulated) disk.
This would indeed mean that the problem has to be reported upstream unless some FreeBSD person wrote that driver.
Bug report for VirtualBox itself: https://www.virtualbox.org/ticket/12648
Thanks for mentioning the upstream PR. This really seems to be unrelated to the VirtualBox port per se.