Bug 168298 - VirtualBox using AIO on a zvol crashes
Summary: VirtualBox using AIO on a zvol crashes
Status: Closed Works As Intended
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 11.1-RELEASE
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: freebsd-virtualization (Nobody)
Duplicates: 212128
Depends on:
Reported: 2012-05-24 11:50 UTC by Pete French
Modified: 2021-01-27 16:37 UTC
CC: 19 users

See Also:

Attachments:
update postinstall message (688 bytes, patch) 2018-05-12 16:46 UTC, rozhuk.im
turns off aio completely (3.22 KB, patch) 2020-09-18 05:42 UTC, Gleb Popov

Description Pete French 2012-05-24 11:50:03 UTC
	With AIO loaded, VirtualBox will use it to access files. Running
	VirtualBox on top of a zvol as the raw disc crashes. This may be a bug
	in zvol+aio, hence the classification above. VirtualBox produces an
	error message in its logs about AIO before crashing.


Do not load the AIO kernel module. VirtualBox is stable if AIO
	is not being used.
	Running VirtualBox over a zvol with AIO and then doing heavy
	disc write activity will provoke the problem in a few minutes. I made
	a posting to stable regarding this here:


	The zvol has compression enabled.
Comment 1 Martin Birgmeier 2014-12-31 14:05:34 UTC
In case this is still interesting: Do/did you have more than one disk attached? See bug #174968.

-- Martin
Comment 2 Pete French 2014-12-31 14:55:30 UTC
At the time only a single disc was attached. Subsequently I have added more, but do not have AIO enabled. I have moved to FreeBSD 10 these days, and haven't tested since the original bug report, but I don't really have the disc load that I used to on the 10.1 machines.

If I get a chance I will try it again, but it's not likely to be in the next few days.
Comment 3 rozhuk.im 2015-08-14 03:16:21 UTC
To fix, tune AIO.
Add to /etc/sysctl.conf:

# AIO: Async IO management
vfs.aio.target_aio_procs=4		# Preferred number of ready kernel threads for async IO
vfs.aio.max_aio_procs=4			# Maximum number of kernel threads to use for handling async IO
vfs.aio.aiod_lifetime=30000		# Maximum lifetime for idle aiod
vfs.aio.aiod_timeout=10000		# Timeout value for synchronous aio operations
vfs.aio.max_aio_queue=65536		# Maximum number of aio requests to queue, globally
vfs.aio.max_aio_queue_per_proc=65536	# Maximum queued aio requests per process (stored in the process)
vfs.aio.max_aio_per_proc=8192		# Maximum active aio requests per process (stored in the process)
vfs.aio.max_buf_aio=8192		# Maximum buf aio requests per process (stored in the process)

default values:
vfs.aio.max_aio_queue: 1024
vfs.aio.max_aio_queue_per_proc: 256

too small, and sometimes the queue in vbox exceeds 256 and then vbox fails
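A quick way to sanity-check whether a host's running limits are below the values suggested above is a comparison like the following (a sketch; the "current" numbers are hardcoded to the stock defaults quoted above for illustration — on a real FreeBSD host you would feed in `sysctl -n vfs.aio.max_aio_queue` and so on):

```shell
# Compare current AIO limits against the suggested targets and report
# which ones would need raising. Values are hardcoded for illustration:
# stock defaults vs. the suggestions in this comment.
check() {
    # usage: check NAME CURRENT WANTED
    if [ "$2" -lt "$3" ]; then
        echo "$1: $2 -> raise to $3"
    fi
}
check vfs.aio.max_aio_queue          1024 65536
check vfs.aio.max_aio_queue_per_proc  256 65536
# On FreeBSD, raise a limit at runtime with e.g.
#   sysctl vfs.aio.max_aio_queue=65536
# and persist it in /etc/sysctl.conf as shown above.
```

Both hardcoded defaults are below the suggested targets, so both lines are reported.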
Comment 4 Martin Birgmeier 2015-08-14 15:46:40 UTC
(In reply to rozhuk.im from comment #3)

Interesting... thank you.

Before I try this myself: Does this just make the issue more unlikely to happen, or is this a genuine fix?

And also, does VBox not have any method to detect when aio operations are rejected in FreeBSD due to resource limits?

Do I understand correctly that this might fix my issue https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=174968?

-- Martin
Comment 5 rkoberman 2016-08-23 01:39:41 UTC
Just for the record, this is not ZFS related. The same issue shows up with VB and UFS. I am running 11 where AIO is integrated into the kernel, so I'll try the tuning advice and see what happens.

In my case it produced several crashes in my Windows 7 guest while suspending the VM and many hangs of the VM. Worst case was when the virtual disk required a disk check which, in turn, hung repeatedly, though eventually it did complete and the system is running again.

Can AIO be disabled? I was looking at kern.features.aio.

FreeBSD rogue 11.0-BETA4 FreeBSD 11.0-BETA4 #1 r303806: Sat Aug  6 18:50:50 PDT 2016     root@rogue:/usr/obj/usr/src/sys/GENERIC.4BSD  amd64
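Regarding the question above: on 11.x AIO is compiled into GENERIC (hence kern.features.aio), so it can no longer be kept out of the system simply by not loading a module, as on older releases. A sketch for checking which situation a host is in (the printed strings are this example's own, not kernel output):

```shell
# Detect how AIO is provided by the running kernel (FreeBSD).
# kern.features.aio is present when AIO support is available; on 11.x it
# is built in and cannot be unloaded, on 10.x and earlier it is aio.ko.
aio_status() {
    if sysctl -n kern.features.aio >/dev/null 2>&1; then
        echo "AIO available in the running kernel"
    elif kldstat -q -m aio 2>/dev/null; then
        echo "aio.ko module is loaded"
    else
        echo "AIO not present (or not a FreeBSD host)"
    fi
}
aio_status
```

On a non-FreeBSD machine (or one without AIO) the last branch is taken.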
Comment 6 rkoberman 2016-08-24 23:28:20 UTC
The adjustments in comment 3 seem to work, although vfs.aio.aiod_timeout does not exist in 11.0 and vfs.aio.max_aio_procs defaults to 4, so is a noop.

Some of the others seem a bit extreme and I suspect tuning them back would be reasonable. The queue depths are being set to the maximum possible. I suspect 4096 and 1024 would be adequate.

Not really sure why maximum AIO processes was reduced to 4, but it does not seem unreasonable. Likewise the 10x increase in idle time for AIO processes.

The final two, max_aio_per_proc and max_buf_aio, also look a bit extreme; bumping them from 32 and 16 to 8K is probably overkill. I'll play around with them and see what I find.

Finally, these may require tuning for the number of VMs.

In any case, I can now run my VM without the disk lock-ups.
Comment 7 martin 2017-03-07 14:49:30 UTC

I would like to confirm that the aio sysctl settings in comment #3 fix crashes (causing SIGILL in VBoxSDL) and broken guest filesystems on virtualbox-ose-5.1.14_2 (FreeBSD 11.0). I've tried different guest operating systems and all fail, mostly with HDD problems during initial installation.

I am using simple VDI files, by the way, not any ZVOL and no compression. My host CPU is a bit slow (AMD Athlon II X3 460); it is only capable of emulating 32-bit guests.

I've used these sysctl settings (probably still overdimensioned):


Thank you.
Comment 8 rkoberman 2017-03-07 16:49:51 UTC
I have done some experimentation and have been able to modify these settings to less extreme values and still get VB to run without failing.
vfs.aio.max_aio_queue: 8192
vfs.aio.max_aio_queue_per_proc: 1024
vfs.aio.max_aio_per_proc: 128
vfs.aio.max_buf_aio: 64

I will admit that I have not tried tweaking these values for some time and I suspect some are still allowing the consumption of more resources than needed, but these are safer than those I first proposed.
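For convenience, here are comment 8's moderated values as an /etc/sysctl.conf fragment (a sketch; whether these are safe lower bounds is exactly what this PR leaves open, and per comment 6 they may need scaling with the number of VMs):

```shell
# /etc/sysctl.conf fragment: moderated AIO limits from comment 8 (FreeBSD 11.x).
# Less extreme than the comment-3 values; not verified as true lower bounds.
vfs.aio.max_aio_queue=8192
vfs.aio.max_aio_queue_per_proc=1024
vfs.aio.max_aio_per_proc=128
vfs.aio.max_buf_aio=64
```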
Comment 9 rozhuk.im 2018-05-10 23:53:13 UTC
Maybe add info about the AIO tunings to the post-install message and close this bug?
Comment 10 rozhuk.im 2018-05-12 16:46:25 UTC
Created attachment 193335 [details]
update postinstall message
Comment 11 commit-hook freebsd_committer 2018-05-12 17:12:05 UTC
A commit references this bug:

Author: pi
Date: Sat May 12 17:11:34 UTC 2018
New revision: 469742
URL: https://svnweb.freebsd.org/changeset/ports/469742

  emulators/virtualbox-ose: add pkg-message about sysctl tuning with AIO

  - New values for several sysctl vfs.aio.max* parameters are suggested

  PR:		168298
  Submitted by:	rozhuk.im@gmail.com
  Reported by:	petefrench@ingresso.co.uk

Comment 12 Kurt Jaeger freebsd_committer 2018-05-12 17:17:39 UTC
Sorry, I forgot to mention the reviews in the commit message. Now we should probably keep this PR open until someone finds a root cause and some safe lower bounds for those values?
Comment 13 rozhuk.im 2018-05-12 17:25:27 UTC
Lower bounds depend on host system load, host system speed, and guest activity.
IMHO lower bounds do nothing; these are max values, and they do not consume resources at idle.
Comment 14 Alan Somers freebsd_committer 2019-04-18 20:06:57 UTC
*** Bug 212128 has been marked as a duplicate of this bug. ***
Comment 15 rozhuk.im 2020-04-16 08:33:40 UTC
Time to close this?
Comment 16 Pete French 2020-04-16 09:05:54 UTC
(In reply to rozhuk.im from comment #15)

Yes, I would say so; this has been stable for me now for many years with the AIO tunings above. I run several machines on VirtualBox on zvols fine.
Comment 17 VVD 2020-04-16 10:10:38 UTC
(In reply to pete from comment #16)
> Yes, I would say so; this has been stable for me now for many years with the AIO tunings above. I run several machines on VirtualBox on zvols fine.
Same here too. But I'm using files on zfs, not zvols.
Comment 18 Bernhard Froehlich freebsd_committer 2020-04-16 10:18:24 UTC
Seems to be working fine nowadays so let's close it.
Comment 19 Kurt Jaeger freebsd_committer 2020-04-16 10:24:33 UTC
It's a bit strange to close a bug report with "works as intended" with a sysctl tuning fix, and not trying to find the root cause.
Comment 20 rozhuk.im 2020-04-16 10:44:39 UTC
(In reply to Kurt Jaeger from comment #19)

1. It works.
2. No one has wanted to dig into it in the past 2-3 years.

I suspect that there is some internal IO queue, probably with a limit, and the default limit in vbox exceeds the FreeBSD AIO defaults.
Comment 21 Anton Saietskii 2020-04-16 10:47:36 UTC
(In reply to rozhuk.im from comment #20)

"Overcome By Events" would fit much better as a closure code, although it's not perfect either.
Comment 22 Kurt Jaeger freebsd_committer 2020-04-16 10:59:14 UTC
It's not even overcome by events. The problem is probably still there, the sysctl fix only masks it.
Comment 23 Steve Wills freebsd_committer 2020-04-16 12:24:25 UTC


did adjust the default value of vfs.aio.max_aio_queue, but I'm not sure if that's sufficient. Seems to me what's needed is either:

* someone to evaluate if that's sufficient and propose adjusting it if appropriate

* a patch to VBox to make it cope with the lower default values

* a patch to disable AIO in VBox by default
Comment 24 Sarah 2020-07-29 09:02:03 UTC
Comment 25 Gleb Popov freebsd_committer 2020-09-18 05:42:11 UTC
Created attachment 218039 [details]
turns off aio completely

I made a patch to turn off aio_* completely and deployed it on several production machines. It seems to work great so far - no more hangs spotted.

I believe this is a more correct fix than tweaking sysctls.

How about committing it and removing the pkg-message suggestion?
Comment 26 rkoberman 2020-09-18 05:55:50 UTC
Comment on attachment 218039 [details]
turns off aio completely

This looks completely appropriate to me. I have never been comfortable with the parameters and their possible impact on systems. I was just hoping to get VB working again, and that worked, but it was probably excessive, and the whole idea of adjusting these parameters system-wide when only VB was impacted was pretty questionable.

While the patch looks good to me, I'm hardly competent to speak to its correctness.
Comment 27 Pete French 2020-09-18 09:38:11 UTC
Yes, this looks good to me too. My original suggested fix was not to load AIO if you can avoid it, but actually that's not always practical. So having AIO disabled for VirtualBox instead of tweaking loads of parameters would appeal to me as a more correct fix.

Having said that, it's been stable for me with the tweaked parameters for years now, but this looks like a more rock-solid solution.
Comment 28 commit-hook freebsd_committer 2020-10-12 15:31:50 UTC
A commit references this bug:

Author: arrowd
Date: Mon Oct 12 15:31:45 UTC 2020
New revision: 552134
URL: https://svnweb.freebsd.org/changeset/ports/552134

  emulators/virtualbox-ose: Turn off aio usage and make VirtualBox use generic Unix implementation.

  This fixes instabilities on some loads involving disk IO.

  PR:		168298, 221294
  Approved by:	vbox (timeout)

Comment 29 datafl4sh 2020-11-01 16:40:38 UTC
I stumbled on this bug today for the first time, and it looks like the proposed patch does not have the intended effect.

I am on 11.4-RELEASE-p3, ports tree updated today. I compiled emulators/virtualbox-ose-nox11 from source and fired up a virtual machine. As soon as it tries to write something to disk, with 100% repeatability, the VM freezes.

Here the relevant output in dmesg:

pid 3091 (VBoxHeadless) is attempting to use unsafe AIO requests - not logging anymore

The VM has a single, zvol-backed disk.
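That dmesg line is printed when the kernel rejects AIO on a file type it does not consider safe (a zvol device node, in this case); since FreeBSD 11 the rejection is gated by a sysctl. A sketch of the knob involved (whether enabling unsafe AIO is advisable for a zvol-backed VM is an assumption left to the reader):

```shell
# FreeBSD 11+: AIO requests on "unsafe" file types are rejected unless this
# sysctl is enabled, producing the "unsafe AIO requests" dmesg line above.
sysctl vfs.aio.enable_unsafe        # 0 = reject unsafe AIO (the default)
# To allow them (as root, at your own risk):
#   sysctl vfs.aio.enable_unsafe=1
# Persist via /etc/sysctl.conf: vfs.aio.enable_unsafe=1
```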

Comment 30 Gleb Popov freebsd_committer 2020-11-02 09:06:26 UTC
(In reply to datafl4sh from comment #29)
What exact version of virtualbox-ose-nox11 are you running?
Comment 31 datafl4sh 2020-11-02 09:23:07 UTC
(In reply to Gleb Popov from comment #30)

According to the distinfo file, version is 5.2.44.
Comment 32 Gleb Popov freebsd_committer 2020-11-02 09:24:10 UTC
(In reply to datafl4sh from comment #31)
No, run `pkg info virtualbox-ose-nox11` and check the version string there.
Comment 33 datafl4sh 2020-11-02 09:28:44 UTC
(In reply to Gleb Popov from comment #32)

Sorry, 5.2.44_4
Comment 34 Gleb Popov freebsd_committer 2020-11-02 09:33:18 UTC
(In reply to datafl4sh from comment #33)
This is strange. This version shouldn't use AIO at all. Did you build VirtualBox from source or install it from the official binary package?
Also, do you have any special tweaks in /etc/sysctl.conf and /boot/loader.conf ?
Comment 35 datafl4sh 2020-11-02 10:10:52 UTC
(In reply to Gleb Popov from comment #34)

As I said, I built it from source (the tree was updated yesterday before compiling).

There are no special tweaks in the files you indicated, not even the ones of comment #3.
Comment 36 Gleb Popov freebsd_committer 2020-11-02 10:14:35 UTC
Well, I'm out of ideas what's wrong here, sorry.
Comment 37 ykla 2020-12-22 05:51:43 UTC
These settings should be written into the configuration files automatically; people should not need to write them by hand.
Comment 38 ml 2021-01-10 11:28:16 UTC
(In reply to commit-hook from comment #28)

This patch broke VirtualBox for me (12.2/amd64, with VDI files accessed via NFSv4).
Starting any machine, I get:
VDI: error reading pre-header in '....vdi' (VERR_DEV_IO_ERROR).
VD: error VERR_VD_VDI_INVALID_HEADER opening image file '....vdi' (VERR_VD_VDI_INVALID_HEADER).
Failed to open image '....vdi' in read-write mode (VERR_VD_VDI_INVALID_HEADER).
AHCI: Failed to attach drive to Port0 (VERR_VD_VDI_INVALID_HEADER).

Result Code: NS_ERROR_FAILURE (0x80004005)
Component: ConsoleWrap
Interface: IConsole {872da645-4a9b-1727-bee2-5585105b9eed} 

I can confirm reverting this brings VBox in a working state again.
Comment 39 hjf 2021-01-27 03:58:59 UTC
(In reply to ml from comment #38)
Are you able to build this port on 12.2? I recently upgraded and can't roll back to the 5.2.34 version I was using. I tried portdowngrade to revert to r549922 but I get all sorts of errors compiling.
I think this is the problem I'm having: the VM works but it reports so many disk errors it's simply unable to boot. Back in 12.1-RELEASE I installed 5.2.34_4 from pkg's cache, and it worked. But I can't do the same with 12.2-RELEASE.
Comment 40 ml 2021-01-27 07:30:21 UTC
(In reply to hjf from comment #39)

Yes, I'm able to build it (using Poudriere), with or without disabling AIO.
However, as I said in comment #38, the latest official version does not work for me.
Re-enabling AIO is much better (I occasionally get AHCI timeouts in a FreeBSD guest, but that's fine as I don't use vbox for anything too serious, at least for now; also, it probably did that before, I'm not sure).

If you are having the problem I had (not sure, you don't say), instead of reverting to the previous version, you might just try the latest without disabling AIO, using the following:

# svn diff
Index: files/patch-src-VBox-Runtime-Makefile.kmk
--- files/patch-src-VBox-Runtime-Makefile.kmk	(revision 563032)
+++ files/patch-src-VBox-Runtime-Makefile.kmk	(working copy)
@@ -12,12 +12,3 @@
  # Unicode Specification reader used to regenerate unidata-*.cpp.
  uniread_TEMPLATE = VBoxBldProg
-@@ -1632,7 +1637,7 @@ VBoxRT_SOURCES.solaris        += \
- VBoxRT_SOURCES.darwin         += \
- 	r3/posix/fileaio-posix.cpp
- VBoxRT_SOURCES.freebsd        += \
--	r3/freebsd/fileaio-freebsd.cpp
-+	r3/posix/fileaio-posix.cpp
- VBoxRT_INCS                   := $(RuntimeR3_INCS)
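If it helps, one way to apply the diff above to a local ports checkout before rebuilding (a sketch; the ports tree location and the saved-diff filename are assumptions, and this presumes an svn-based tree as in the diff above):

```shell
# Apply the diff above (saved, for example, as ~/reenable-aio.diff) to the
# port, then rebuild; this drops the hunk that swapped the FreeBSD-native
# fileaio backend for the generic POSIX one.
cd /usr/ports/emulators/virtualbox-ose
svn patch ~/reenable-aio.diff
make clean reinstall
```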
Comment 41 hjf 2021-01-27 13:54:26 UTC
(In reply to ml from comment #40)
Thank you! This patch worked for me. I can use my VMs again

For more detail:

The Linux VMs were backed on a ZFS ZVOL. Since the upgrade to .44, I got read errors. The sort of error Linux gives you on a bad disk. ATA timeouts, etc. The errors were similar to: failed command: READ DMA EXT, status DRDY ERR , error IDNF.

After applying the patch you provided, everything is working again.

The aio kernel tuning options discussed in this thread did nothing for my case.

Can we reopen this bug? This port is obviously broken for people using zvols.
Comment 42 Pete French 2021-01-27 14:00:56 UTC
!!! I actually saw this effect when I upgraded. My Windows 10 VM booted fine after the 12.2 upgrade, but my OpenSUSE install just crashed. I ended up deleting it and installing Ubuntu instead, using the settings which worked for the Windows VM, as I assumed it was my inexperience with Linux, not a recurrence of a bug in the port. Damn, I wish I had said something at the time.
Comment 43 hjf 2021-01-27 15:51:22 UTC
(In reply to Pete French from comment #42)
Are you also using a zvol as the backing storage for your VM?
Comment 44 Pete French 2021-01-27 16:02:13 UTC
(In reply to hjf from comment #43)

Yes - indeed, I am the original reporter of this bug ;-)
Comment 45 ml 2021-01-27 16:37:17 UTC
(In reply to hjf from comment #41)

Note that I'm not using zvols, so it's not "obviously broken" only "for people using zvols".