With AIO loaded, VirtualBox will use it to access files. Running
VirtualBox on top of a zvol as the raw disc crashes. This may be a bug
in zvol+AIO, hence the classification above. VirtualBox produces an
error message about AIO in its logs before crashing.
Do not load the AIO kernel module. VirtualBox is stable if AIO
is not being used.
Running VirtualBox over a zvol with AIO and then doing heavy
disc write activity will provoke the problem in a few minutes. I made
a posting to stable regarding this here:
The zvol has compression enabled.
In case this is still interesting: do/did you have more than one disk attached? See bug #174968.
At the time only a single disc was attached. Since then I have added more, but I do not have AIO enabled. I have moved to FreeBSD 10 these days and haven't tested since the original bug report, but I don't really have the disc load that I used to have on the 10.1 machines.
If I get a chance I will try it again, but it's not likely to be in the next few days.
To fix this, tune AIO.
Add the following to /etc/sysctl.conf:
# AIO: Async IO management
vfs.aio.target_aio_procs=4 # Preferred number of ready kernel threads for async IO
vfs.aio.max_aio_procs=4 # Maximum number of kernel threads to use for handling async IO
vfs.aio.aiod_lifetime=30000 # Maximum lifetime for idle aiod
vfs.aio.aiod_timeout=10000 # Timeout value for synchronous aio operations
vfs.aio.max_aio_queue=65536 # Maximum number of aio requests to queue, globally
vfs.aio.max_aio_queue_per_proc=65536 # Maximum queued aio requests per process (stored in the process)
vfs.aio.max_aio_per_proc=8192 # Maximum active aio requests per process (stored in the process)
vfs.aio.max_buf_aio=8192 # Maximum buf aio requests per process (stored in the process)
The defaults are too small: sometimes the queue in VBox grows beyond 256 requests, and then VBox fails.
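The failure mode described in that comment can be illustrated with a toy model. This is a sketch, not FreeBSD or VirtualBox code: `BoundedAioQueue` and `rejected_in_burst` are hypothetical names, and the limit of 256 is an assumed per-process queue limit taken from the comment above. A request submitted while the queue is full is rejected with EAGAIN, which is the error POSIX specifies for aio_read()/aio_write() when a request cannot be queued due to resource limits.

```python
import errno


class BoundedAioQueue:
    """Toy model of a per-process AIO queue limit (hypothetical, not kernel code)."""

    def __init__(self, max_queue: int) -> None:
        # Stands in for a limit such as vfs.aio.max_aio_queue_per_proc.
        self.max_queue = max_queue
        self.pending = 0  # requests currently queued

    def submit(self) -> int:
        """Queue one request; return 0 on success or errno.EAGAIN if the queue is full."""
        if self.pending >= self.max_queue:
            return errno.EAGAIN
        self.pending += 1
        return 0

    def complete(self, n: int = 1) -> None:
        """Retire up to n finished requests, freeing queue slots."""
        self.pending = max(0, self.pending - n)


def rejected_in_burst(max_queue: int, depth: int) -> int:
    """Submit `depth` requests back to back with no completions; count rejections."""
    q = BoundedAioQueue(max_queue)
    results = [q.submit() for _ in range(depth)]
    return results.count(errno.EAGAIN)


if __name__ == "__main__":
    # A burst of 300 outstanding requests against the assumed 256-request limit...
    print(rejected_in_burst(256, 300))    # prints 44
    # ...versus the same burst against the 65536 limit from the tuning above.
    print(rejected_in_burst(65536, 300))  # prints 0
```

In this model a larger limit costs nothing while idle (`pending` stays at 0) and only matters once a burst exceeds it; whether VirtualBox's internal queue really behaves this way is the open question in this report.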
(In reply to rozhuk.im from comment #3)
Interesting... thank you.
Before I try this myself: Does this just make the issue more unlikely to happen, or is this a genuine fix?
And also, does VBox not have any method to detect when aio operations are rejected in FreeBSD due to resource limits?
Do I understand correctly that this might fix my issue https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=174968?
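On the question of detecting rejected requests: POSIX specifies that aio_read() and aio_write() fail with EAGAIN when a request cannot be queued because of system resource limits, so a caller can detect the condition and wait for an in-flight request to finish before retrying. The sketch below shows that pattern with the submit and completion steps passed in as callables. `submit_with_retry`, `submit`, `reap_one` and `demo` are hypothetical names standing in for aio_write() and aio_suspend()/aio_return(); this is not how VirtualBox is actually implemented.

```python
import errno
from typing import Callable, List


def submit_with_retry(
    submit: Callable[[], int],
    reap_one: Callable[[], None],
    max_retries: int = 8,
) -> int:
    """Submit one request, retrying after EAGAIN.

    submit() returns 0 on success or an errno value (stand-in for aio_write()).
    reap_one() waits for one in-flight request to finish (stand-in for
    aio_suspend() followed by aio_return()).
    Returns 0 on success, a non-EAGAIN errno on a hard error, or
    errno.EAGAIN if the queue stayed full on every attempt.
    """
    for attempt in range(max_retries + 1):
        err = submit()
        if err != errno.EAGAIN:
            return err       # 0 on success, or a real I/O error to report
        if attempt < max_retries:
            reap_one()       # free a queue slot before the next attempt
    return errno.EAGAIN      # still saturated: the caller must decide what to do


def demo(capacity: int = 2, requests: int = 5) -> List[int]:
    """Drive submit_with_retry against a fake queue holding `capacity` requests."""
    inflight = 0

    def fake_submit() -> int:
        nonlocal inflight
        if inflight >= capacity:
            return errno.EAGAIN
        inflight += 1
        return 0

    def fake_reap() -> None:
        nonlocal inflight
        inflight = max(0, inflight - 1)

    return [submit_with_retry(fake_submit, fake_reap) for _ in range(requests)]


if __name__ == "__main__":
    print(demo())  # prints [0, 0, 0, 0, 0]
```

Every request eventually succeeds because each rejection is followed by waiting for a completion. A caller that treated EAGAIN as fatal would instead surface an I/O error to the guest, which would be consistent with the disk errors reported in this thread, though that is an assumption.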
Just for the record, this is not ZFS related. The same issue shows up with VB and UFS. I am running 11 where AIO is integrated into the kernel, so I'll try the tuning advice and see what happens.
In my case it produced several crashes in my Windows 7 client while suspending the VM, and many hangs of the VM. The worst case was when the virtual disk required a disk check which, in turn, hung repeatedly, though eventually it did complete and the system is running again.
Can AIO be disabled? I was looking at kern.features.aio.
FreeBSD rogue 11.0-BETA4 FreeBSD 11.0-BETA4 #1 r303806: Sat Aug 6 18:50:50 PDT 2016 root@rogue:/usr/obj/usr/src/sys/GENERIC.4BSD amd64
The adjustments in comment 3 seem to work, although vfs.aio.aiod_timeout does not exist in 11.0 and vfs.aio.max_aio_procs already defaults to 4, so setting it is a no-op.
Some of the others seem a bit extreme and I suspect tuning them back would be reasonable. The queue depths are being set to the maximum possible. I suspect 4096 and 1024 would be adequate.
I'm not really sure why the maximum number of AIO processes is reduced to 4, but it does not seem unreasonable. Likewise the 10x increase in idle time for AIO processes.
The final two, max_aio_per_proc and max_buf_aio, also look a bit extreme. Bumping them from 32 and 16 to 8K is probably overkill. I'll play around with them and see what I find.
Finally, these may require tuning for the number of VMs.
In any case, I can now run my VM without the disk lock-ups.
I would like to confirm that the aio sysctl settings in comment #3 fix crashes (causing SIGILL in VBoxSDL) and broken guest filesystems on virtualbox-ose-5.1.14_2 (FreeBSD 11.0). I've tried different guest operating systems and all fail mostly with HDD problems during initial installation.
I am using simple VDI files, by the way, not a zvol and no compression. My host CPU is a bit slow (AMD Athlon II X3 460); it can only emulate 32-bit guests.
I've used these sysctl settings (probably still overdimensioned):
I have done some experimentation and have been able to modify these settings to less extreme values and still get VB to run without failing.
I will admit that I have not tried tweaking these values for some time, and I suspect some are still allowing the consumption of more resources than needed, but these are safer than those I first proposed.
Maybe add info about AIO tuning to the post-install message and close this bug?
Created attachment 193335
update postinstall message
A commit references this bug:
Date: Sat May 12 17:11:34 UTC 2018
New revision: 469742
emulators/virtualbox-ose: add pkg-message about sysctl tuning with AIO
- New values for several sysctl vfs.aio.max* parameters are suggested
Submitted by: firstname.lastname@example.org
Reported by: email@example.com
Sorry, I forgot to mention the reviews in the commit message. Should we keep this PR open until someone finds a root cause and some safe lower bounds for those values?
Lower bounds depend on host system load, host system speed and guest activity.
IMHO lower bounds do nothing: these are maximum values, and they do not consume resources when idle.
*** Bug 212128 has been marked as a duplicate of this bug. ***
Time to close this?
(In reply to rozhuk.im from comment #15)
Yes, I would say so. This has been stable for me for many years now with the AIO tunings above. I run several machines on VirtualBox on zvols fine.
(In reply to pete from comment #16)
> Yes, I would say so. This has been stable for me for many years now with the AIO tunings above. I run several machines on VirtualBox on zvols fine.
Same here too. But I'm using files on zfs, not zvols.
Seems to be working fine nowadays so let's close it.
It's a bit strange to close a bug report as "works as intended" with a sysctl tuning fix, without trying to find the root cause.
(In reply to Kurt Jaeger from comment #19)
1. It works.
2. No one has wanted to dig into it in the past 2-3 years.
I suspect that there is some internal IO queue in VBox, probably with a limit, and that VBox's default limit is larger than the FreeBSD AIO defaults.
(In reply to rozhuk.im from comment #20)
"Overcome By Events" as closure code fits much better, although not perfect also.
It's not even overcome by events. The problem is probably still there, the sysctl fix only masks it.
did adjust the default value of vfs.aio.max_aio_queue, but I'm not sure if that's sufficient. Seems to me what's needed is either:
* someone to evaluate if that's sufficient and propose adjusting it if appropriate
* a patch to VBox to make it cope with the lower default values
* a patch to disable AIO in VBox by default
Created attachment 218039
turns off aio completely
I made a patch to turn off aio_* completely and deployed it on several production machines. It seems to work great so far - no more hangs spotted.
I believe this is a more correct fix than tweaking sysctls.
How about committing it and removing the pkg-message suggestion?
Comment on attachment 218039
turns off aio completely
This looks completely appropriate to me. I have never been comfortable with the parameters and their possible impact on systems. I was just hoping to get VB working again and that worked, but it was probably excessive, and the whole idea of adjusting these parameters system-wide when only VB was impacted was pretty questionable.
While the patch looks good to me, I'm hardly competent to speak to its correctness.
Yes, this looks good to me too. My original suggested fix was not to load AIO if you can avoid it, but that's not always practical. So having AIO disabled for VirtualBox instead of tweaking loads of parameters appeals to me as a more correct fix.
Having said that, it's been stable for me with the tweaked parameters for years now, but this looks like a more rock-solid solution.
A commit references this bug:
Date: Mon Oct 12 15:31:45 UTC 2020
New revision: 552134
emulators/virtualbox-ose: Turn off aio usage and make VirtualBox use generic Unix implementation.
This fixes instabilities on some loads involving disk IO.
PR: 168298, 221294
Approved by: vbox (timeout)