Summary: VirtualBox using AIO on a zvol crashes
Product: Ports & Packages
Reporter: Pete French <petefrench>
Component: Individual Port(s)
Assignee: freebsd-virtualization (Nobody) <virtualization>
Status: Closed FIXED
Severity: Affects Some People
CC: KevinTaylor15.44, arrowd, bdrewery, d8zNeCFG, datafl4sh, decke, grembo, hjf, kbowling, madpilot, martin, ml, pete, pi, rkoberman, rozhuk.im, shoesoft, swills, tbr, trombik1973, trueos, vsasjason, vvd, yklaxds
Priority: Normal
Flags: madpilot: merge-quarterly?
Version: Latest
Hardware: Any
OS: Any
See Also: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221294
Description
Pete French
2012-05-24 11:50:03 UTC
In case this is still interesting: Do/did you have more than one disk attached? See bug #174968. -- Martin

At the time only a single disc was attached. Subsequently I have added more, but do not have AIO enabled. I have moved to FreeBSD 10 these days, and haven't tested since the original bug report, but I don't really have the disc load that I used to on the 10.1 machines. If I get a chance I will try it again, but it's not likely to be in the next few days.

To fix this, tune AIO. Add to /etc/sysctl.conf:

# AIO: Async IO management
vfs.aio.target_aio_procs=4 # Preferred number of ready kernel threads for async IO
vfs.aio.max_aio_procs=4 # Maximum number of kernel threads to use for handling async IO
vfs.aio.aiod_lifetime=30000 # Maximum lifetime for idle aiod
vfs.aio.aiod_timeout=10000 # Timeout value for synchronous aio operations
vfs.aio.max_aio_queue=65536 # Maximum number of aio requests to queue, globally
vfs.aio.max_aio_queue_per_proc=65536 # Maximum queued aio requests per process (stored in the process)
vfs.aio.max_aio_per_proc=8192 # Maximum active aio requests per process (stored in the process)
vfs.aio.max_buf_aio=8192 # Maximum buf aio requests per process (stored in the process)

Default values:

vfs.aio.max_aio_queue: 1024
vfs.aio.max_aio_queue_per_proc: 256

These are too small: sometimes the queue in vbox grows beyond 256, and then vbox fails.

(In reply to rozhuk.im from comment #3) Interesting... thank you. Before I try this myself: does this just make the issue less likely to happen, or is this a genuine fix? And also, does VBox not have any method to detect when AIO operations are rejected in FreeBSD due to resource limits? Do I understand correctly that this might fix my issue https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=174968? -- Martin

Just for the record, this is not ZFS related. The same issue shows up with VB and UFS. I am running 11, where AIO is integrated into the kernel, so I'll try the tuning advice and see what happens.
In my case it produced several crashes in my Windows 7 client while suspending the VM, and many hangs of the VM. The worst case was when the virtual disk required a disk check, which in turn hung repeatedly, though eventually it did complete and the system is running again. Can AIO be disabled? I was looking at kern.features.aio.

virtualbox-ose-5.0.26_1
FreeBSD rogue 11.0-BETA4 FreeBSD 11.0-BETA4 #1 r303806: Sat Aug 6 18:50:50 PDT 2016 root@rogue:/usr/obj/usr/src/sys/GENERIC.4BSD amd64

The adjustments in comment 3 seem to work, although vfs.aio.aiod_timeout does not exist in 11.0, and vfs.aio.max_aio_procs defaults to 4, so setting it is a no-op. Some of the others seem a bit extreme, and I suspect tuning them back would be reasonable. The queue depths are being set to the maximum possible; I suspect 4096 and 1024 would be adequate. I'm not really sure why the maximum number of AIO processes is reduced to 4, but it does not seem unreasonable. Likewise the 10x increase in idle time for AIO processes. The final two, max_aio_per_proc and max_buf_aio, also look a bit extreme: bumping them from 32 and 16 to 8K is probably overkill. I'll play around with them and see what I find. Finally, these may require tuning for the number of VMs. In any case, I can now run my VM without the disk lock-ups.

Hi, I would like to confirm that the aio sysctl settings in comment #3 fix crashes (causing SIGILL in VBoxSDL) and broken guest filesystems on virtualbox-ose-5.1.14_2 (FreeBSD 11.0). I've tried different guest operating systems and all fail, mostly with HDD problems during the initial installation. I am using simple VDI files, by the way, not any ZVOL, and no compression. My host CPU is a bit slow (AMD Athlon II X3 460); it is only capable of emulating 32-bit guests. I've used these sysctl settings (probably still oversized):

vfs.aio.max_aio_queue=8192
vfs.aio.max_aio_queue_per_proc=8192
vfs.aio.max_aio_per_proc=4096
vfs.aio.max_buf_aio=4096

Thank you.
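For readers wondering why the defaults quoted in comment #3 could bite, here is a toy Python model of a per-process AIO queue limit. It is not FreeBSD or VirtualBox code, and the class and function names are hypothetical. It assumes, as comment #3 suggests, that VirtualBox can keep more requests queued than vfs.aio.max_aio_queue_per_proc allows, and that submissions beyond the limit are refused with EAGAIN, which is how aio_read(2) and aio_write(2) report resource limits.

```python
import errno


class AioQueueModel:
    """Toy stand-in for a per-process AIO queue limit (hypothetical, not kernel code)."""

    def __init__(self, max_queue_per_proc):
        # Plays the role of vfs.aio.max_aio_queue_per_proc.
        self.limit = max_queue_per_proc
        self.queued = 0

    def submit(self):
        # Mimic aio_read()/aio_write(): refuse with EAGAIN once the limit is reached.
        if self.queued >= self.limit:
            return errno.EAGAIN
        self.queued += 1
        return 0

    def complete(self, count=1):
        # Completed requests free up their queue slots.
        self.queued = max(0, self.queued - count)


def rejected_in_burst(limit, depth):
    """Submit `depth` requests back to back (no completions) and count EAGAIN refusals."""
    queue = AioQueueModel(limit)
    return sum(1 for _ in range(depth) if queue.submit() == errno.EAGAIN)


if __name__ == "__main__":
    for limit in (256, 1024, 65536):
        print(limit, rejected_in_burst(limit, 300))
```

With the old per-process default of 256, an arbitrary illustrative burst of 300 requests gets 44 refusals, while limits of 1024 or 65536 absorb it. In this model raising the limit only moves the threshold; it does not make the caller handle EAGAIN any better.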
I have done some experimentation and have been able to modify these settings to less extreme values and still get VB to run without failing:

vfs.aio.max_aio_queue: 8192
vfs.aio.max_aio_queue_per_proc: 1024
vfs.aio.max_aio_per_proc: 128
vfs.aio.max_buf_aio: 64

I will admit that I have not tried tweaking these values for some time, and I suspect some are still allowing the consumption of more resources than needed, but these are safer than those I first proposed.

Maybe add info about AIO tuning to the post-install message and close this bug?

Created attachment 193335
update postinstall message
A commit references this bug:

Author: pi
Date: Sat May 12 17:11:34 UTC 2018
New revision: 469742
URL: https://svnweb.freebsd.org/changeset/ports/469742

Log:
emulators/virtualbox-ose: add pkg-message about sysctl tuning with AIO
- New values for several sysctl vfs.aio.max* parameters are suggested
PR: 168298
Submitted by: rozhuk.im@gmail.com
Reported by: petefrench@ingresso.co.uk

Changes:
head/emulators/virtualbox-ose/pkg-message

Sorry, I forgot to mention the reviews in the commit message. Now we should probably keep this PR open until someone finds a root cause and some safe lower bounds for those values?

Lower bounds depend on host system load, host system speed and guest activity. IMHO lower bounds do nothing; these are max values, and they do not consume resources when idle.

*** Bug 212128 has been marked as a duplicate of this bug. ***

Time to close this?

(In reply to rozhuk.im from comment #15) Yes, I would say so; this has been stable for me now for many years with the AIO tunings above. I run several machines on VirtualBox on zvols fine.

(In reply to pete from comment #16)
> Yes, I would say so; this has been stable for me now for many years with the AIO tunings above. I run several machines on VirtualBox on zvols fine.

Same here too. But I'm using files on ZFS, not zvols.

Seems to be working fine nowadays, so let's close it.

It's a bit strange to close a bug report as "works as intended" with a sysctl tuning fix, without trying to find the root cause.

(In reply to Kurt Jaeger from comment #19)
1. It works.
2. No one has wanted to dig into it in the past 2-3 years.
I suspect that there is some internal IO queue, probably with a limit, and that the default limit in vbox is larger than the FreeBSD AIO defaults.

(In reply to rozhuk.im from comment #20) "Overcome By Events" fits much better as a closure code, although it's not perfect either.

It's not even overcome by events. The problem is probably still there; the sysctl fix only masks it.
r324941 (https://svnweb.freebsd.org/base?view=revision&revision=324941) did adjust the default value of vfs.aio.max_aio_queue, but I'm not sure if that's sufficient. It seems to me what's needed is one of:
* someone to evaluate whether that's sufficient and propose adjusting it if appropriate
* a patch to VBox to make it cope with the lower default values
* a patch to disable AIO in VBox by default

Created attachment 218039
turns off aio completely
I made a patch to turn off aio_* completely and deployed it on several production machines. It seems to work great so far - no more hangs spotted.
I believe this is a more correct fix than tweaking sysctls.
How about committing it and removing the pkg-message suggestion?
Comment on attachment 218039
turns off aio completely
This looks completely appropriate to me. I have never been comfortable with the parameters and their possible impact on systems. I was just hoping to get VB working again, and that worked, but it was probably excessive, and the whole idea of adjusting these parameters system-wide when only VB was impacted was pretty questionable.
While the patch looks good to me, I'm hardly competent to speak to its correctness.
Yes, this looks good to me too. My original suggested fix was not to load AIO if you can avoid it, but actually that's not always practical. So having AIO disabled for VirtualBox instead of tweaking loads of parameters would appeal to me as a more correct fix. Having said that, it's been stable for me with the tweaked parameters for years now, but this looks like a more rock-solid solution.

A commit references this bug:

Author: arrowd
Date: Mon Oct 12 15:31:45 UTC 2020
New revision: 552134
URL: https://svnweb.freebsd.org/changeset/ports/552134

Log:
emulators/virtualbox-ose: Turn off aio usage and make VirtualBox use generic Unix implementation.
This fixes instabilities on some loads involving disk IO.
PR: 168298, 221294
Approved by: vbox (timeout)

Changes:
head/emulators/virtualbox-ose/Makefile
head/emulators/virtualbox-ose/files/patch-src-VBox-Runtime-Makefile.kmk

I stumbled on this bug today for the first time, and it looks like the proposed patch does not have the intended effect. I am on 11.4-RELEASE-p3, with the ports tree updated today. I compiled emulators/virtualbox-ose-nox11 from source and fired up a virtual machine. As soon as it tries to write something to disk, the VM freezes, with 100% repeatability. Here is the relevant output in dmesg:

pid 3091 (VBoxHeadless) is attempting to use unsafe AIO requests - not logging anymore

The VM has a single, zvol-backed disk. -mc.

(In reply to datafl4sh from comment #29) What exact version of virtualbox-ose-nox11 are you running?

(In reply to Gleb Popov from comment #30) According to the distinfo file, the version is 5.2.44.

(In reply to datafl4sh from comment #31) No, run `pkg info virtualbox-ose-nox11` and check the version string there.

(In reply to Gleb Popov from comment #32) Sorry, 5.2.44_4.

(In reply to datafl4sh from comment #33) This is strange. This version shouldn't use AIO at all. Did you build VirtualBox from source or install it from the official binary package?
Also, do you have any special tweaks in /etc/sysctl.conf and /boot/loader.conf?

(In reply to Gleb Popov from comment #34) As I said, I built it from source (the tree was updated yesterday before compiling). There are no special tweaks in the files you indicated, not even the ones from comment #3.

Well, I'm out of ideas about what's wrong here, sorry.

It should write

vfs.aio.max_buf_aio=8192
vfs.aio.max_aio_queue_per_proc=65536
vfs.aio.max_aio_per_proc=8192
vfs.aio.max_aio_queue=65536

into the configuration files automatically, so people don't need to write them.

(In reply to commit-hook from comment #28) This patch broke VirtualBox for me (12.2/amd64, with VDI files accessed via NFSv4). Starting any machine, I get:

VDI: error reading pre-header in '....vdi' (VERR_DEV_IO_ERROR).
VD: error VERR_VD_VDI_INVALID_HEADER opening image file '....vdi' (VERR_VD_VDI_INVALID_HEADER).
Failed to open image '....vdi' in read-write mode (VERR_VD_VDI_INVALID_HEADER).
AHCI: Failed to attach drive to Port0 (VERR_VD_VDI_INVALID_HEADER).
Result Code: NS_ERROR_FAILURE (0x80004005)
Component: ConsoleWrap
Interface: IConsole {872da645-4a9b-1727-bee2-5585105b9eed}

I can confirm that reverting this brings VBox back to a working state.

(In reply to ml from comment #38) Are you able to build this port on 12.2? I recently upgraded and can't roll back to the 5.2.34 version I was using. I tried portdowngrade to revert to r549922, but I get all sorts of errors compiling. I think this is the problem I'm having: the VM works, but it reports so many disk errors that it's simply unable to boot. Back on 12.1-RELEASE I installed 5.2.34_4 from pkg's cache, and it worked, but I can't do the same with 12.2-RELEASE.

(In reply to hjf from comment #39) Yes, I'm able to build it (using Poudriere), with or without disabling AIO. However, as I said in comment #38, the latest official version does not work for me.
Re-enabling AIO is much better (I occasionally get AHCI timeouts in a FreeBSD guest, but that's fine as I don't use vbox for anything too serious, at least for now; it probably did that before as well, I'm not sure). If you are having the problem I had (I'm not sure, you don't say), then instead of reverting to the previous version you might just try the latest without disabling AIO, using the following:

# svn diff
Index: files/patch-src-VBox-Runtime-Makefile.kmk
===================================================================
--- files/patch-src-VBox-Runtime-Makefile.kmk	(revision 563032)
+++ files/patch-src-VBox-Runtime-Makefile.kmk	(working copy)
@@ -12,12 +12,3 @@
 # Unicode Specification reader used to regenerate unidata-*.cpp.
 #
 uniread_TEMPLATE = VBoxBldProg
-@@ -1632,7 +1637,7 @@ VBoxRT_SOURCES.solaris += \
- VBoxRT_SOURCES.darwin += \
- 	r3/posix/fileaio-posix.cpp
- VBoxRT_SOURCES.freebsd += \
-- 	r3/freebsd/fileaio-freebsd.cpp
-+ 	r3/posix/fileaio-posix.cpp
- VBoxRT_INCS := $(RuntimeR3_INCS)
- VBoxRT_INCS.$(KBUILD_TARGET) := $(RuntimeR3_INCS.$(KBUILD_TARGET))
- VBoxRT_INCS.$(KBUILD_TARGET).$(KBUILD_TARGET_ARCH) := $(RuntimeR3_INCS.$(KBUILD_TARGET).$(KBUILD_TARGET_ARCH))

(In reply to ml from comment #40) Thank you! This patch worked for me. I can use my VMs again.

For more detail: the Linux VMs were backed by a ZFS zvol. Since the upgrade to .44, I got read errors, the sort of errors Linux gives you on a bad disk: ATA timeouts, etc. The errors were similar to:

failed command: READ DMA EXT, status DRDY ERR , error IDNF.

After applying the patch you provided, everything is working again. The AIO kernel tuning options discussed in this thread did nothing for my case. Can we reopen this bug? This port is obviously broken for people using zvols!!!

I actually saw this effect when I upgraded. My Windows 10 VM booted fine after the 12.2 upgrade, but my OpenSUSE install just crashed.
I ended up deleting it and installing Ubuntu instead, using the settings which worked for the Windows VM, as I assumed it was my inexperience with Linux, not a recurrence of a bug in the port. Damn, I wish I had said something at the time.

(In reply to Pete French from comment #42) Are you also using a zvol as the backing storage for your VM?

(In reply to hjf from comment #43) Yes, indeed, I am the original reporter of this bug ;-)

(In reply to hjf from comment #41) Note that I'm not using zvols, so it's not broken only "for people using zvols".

I may have found a clue towards the root cause of bug 168298 (and the duplicate 212128), where virtualbox-ose locks up or crashes doing heavy AIO. I found the location in the source code where VirtualBox picks up the sysctl value vfs.aio.max_aio_per_proc, but could not find any place in the code that actually used it to make any sort of decision. The VirtualBox AIO code would attempt to grow the local number of allowed AIO requests whenever it got EAGAIN, without any limit. When the attempt to grow went past the sysctl value, so that the attempt failed, bad things happened :).

I have attached a tiny but very ugly patch that stops the attempt to grow the local resources when the number of such resources meets the AIO sysctl value. The patch makes my VirtualBox 6.1.20 running on FreeBSD 13.0 stop crashing, even when the AIO sysctl values are left at the default installed values. It also (when adjusted for line number differences) makes my VirtualBox 5.2.34 running on FreeBSD 12.1 with AIO stable. Both versions of VirtualBox had a fairly repeatable crash or lockup with a Linux Mint 20 VM when I tried to upgrade the kernel and headers. After applying the patch I was able to upgrade the VMs with no problems and no changes to the AIO sysctl values.

I do not suggest this patch as any sort of final solution :).
Someone far more knowledgeable about the VirtualBox code than I am (the first time I looked at it was 2 weeks ago) should use it as a starting point to make the same decision about growing the number of local resources for AIO requests in a way (and style :) ) consistent with the rest of VirtualBox. The patch will do as a workaround while we all wait for the VirtualBox code gurus to improve it or come up with a better alternative.

Created attachment 225299
a small ugly patch to make vbox stop trying for more AIO resources than it can have
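The mechanism Tom describes can be sketched as a small Python model. Everything in it is hypothetical: the class name, the starting budget of 64, the fixed growth step of 64 and the limit of 256 are made up, and VirtualBox's real growth logic is C++ and may grow differently. The model only contrasts two policies on each EAGAIN: growing the local request budget without consulting vfs.aio.max_aio_per_proc (where a grow attempt past the limit fails) versus capping the budget at that limit, which is the idea behind the attached patch.

```python
from dataclasses import dataclass


@dataclass
class AioManagerModel:
    """Hypothetical model of a local AIO request budget (not VirtualBox code)."""
    max_reqs: int       # requests the manager currently allows itself
    step: int           # assumed fixed growth increment after each EAGAIN
    sysctl_limit: int   # stands in for vfs.aio.max_aio_per_proc
    capped: bool        # True models the patched behaviour

    def on_eagain(self):
        """React to an EAGAIN; return False if the attempt to grow fails."""
        wanted = self.max_reqs + self.step
        if self.capped:
            # Patched idea: never ask for more than the kernel limit allows.
            self.max_reqs = min(wanted, self.sysctl_limit)
            return True
        if wanted > self.sysctl_limit:
            # Unpatched idea: growing past the limit fails.
            return False
        self.max_reqs = wanted
        return True


def simulate(eagain_events, capped, initial=64, step=64, limit=256):
    """Feed a number of EAGAIN events; return (final budget, failed grow attempts)."""
    mgr = AioManagerModel(initial, step, limit, capped)
    failures = sum(1 for _ in range(eagain_events) if not mgr.on_eagain())
    return mgr.max_reqs, failures


if __name__ == "__main__":
    print("unpatched:", simulate(5, capped=False))
    print("patched:  ", simulate(5, capped=True))
```

With these made-up numbers, five EAGAIN events leave the unpatched policy at the limit with two failed grow attempts, (256, 2), while the capped policy returns (256, 0). What happens after a failed grow attempt (the lock-ups and crashes reported in this bug) is not modelled.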
(In reply to Tom Rushworth from comment #47) Ah, this is excellent! Is this in FreeBSD-specific code, or should it be brought to the attention of the VirtualBox people upstream? I haven't looked at that site in a while, I have to admit... But, nice job :)

(In reply to Pete French from comment #48) Yes, this patch should be sent upstream as well.

(In reply to Pete French from comment #48) Thanks for the "nice job" :). Yes, this should go upstream; it is not FreeBSD specific. As far as I could see by grepping through the entire source tree, no platform looks at the AIO limit after obtaining it. That's OK when the platform doesn't have a limit :). I've never contacted the VBox folks; does someone who has contacted them want to volunteer to pass the information on? I'd also like to repeat that the patch is pretty ugly: look at the number of data structures that have to be gone through to get the AIO limit when making the decision to grow. The upstream folks should have a much better way to actually fix things, such as copying the AIO limit to the AIO manager when the manager is created, so it is easier to get at when the decision needs to be made. I just made the patch as small and localized as possible to reduce the risk of breaking things I didn't understand :).

(In reply to Tom Rushworth from comment #50) I sent a small patch to their mailing list (it seems to be the only available channel). My intention was (and maybe I'll still try) to package some FreeBSD port patches and push them upstream. You can read the whole exchange here: https://www.virtualbox.org/pipermail/vbox-dev/2021-March/015627.html (follow the thread). I'll cite this small exchange here:

----
> BTW is there some review tool you are using? Sending patches as
> attachments via mailing list does not look very efficient for review.
And yet, the Linux kernel managed 15,477 commits between 5.10 and 5.11, for an average around 170 per day, the vast majority of which weren't attachments but up in the main body of the email.
----

So it looks like, since the Linux kernel does things this way (do they really? I've never coded for the Linux kernel), it's got to be good for everyone. I then sent my small patch inline and got no further feedback. Personally I'm not looking forward to further interfacing with them, and will do it only for patches I created.

(In reply to Guido Falsi from comment #51) I forgot to mention that their policy for accepting patches [1] is very copyright conscious. It means that it's actually impossible for anyone except the original author of a patch to contribute it (apart from lying about who the author is, but avoiding that is the whole point of copyright law).

[1] https://www.virtualbox.org/wiki/Contributor_information

(In reply to Guido Falsi from comment #51) I followed the thread, and I must admit that I actually laughed out loud when I got to the end and saw the patch was a one-liner! Ah, yes, having that as an attachment must definitely have been awful for them to deal with :-) I once submitted some code to the GNU project, which actually involved signing paper forms and sending them off to a lawyer's office in Boston (the USA one). Which was astonishing: I had to list every file I modified! So, I sympathise with you...

(In reply to Guido Falsi from comment #52) Thanks for pointing out the VBox contributor information, and all the related background :). It looks, from the licensing info there, that I have to be the one sending it upstream if anyone is going to. I guess, given who owns the project, they do have to work in a "lawyer rich environment" :). I don't have a strong stance on any particular software license, except for the belief that contributions should be licensed in the same way as the original authors did, or as close to their intent as can be determined.
The VBox contributor information page seems to offer only a legal agreement with Oracle or the MIT license, while the original project seems to be GPL. Certainly the file I patched is GPLv2. I don't think I care for the idea that future contributions to VirtualBox have to be either specifically licensed to Oracle (i.e. free of the GPL) or MIT licensed (i.e. also free of the GPL). It seems to me that would indicate a desire to gradually change the original authors' intent. I'll have to think about it for a while before I try to upstream it :(. In the meantime, I guess I'll also have to look at the copyright issues for contributions to the FreeBSD ports area :(. My intent was simply to provide help to other FreeBSD users in whatever the usual FreeBSD way was, while spending the least amount of my own time on it. It looks like I should dot a few more 'i's and cross a few more 't's before the FreeBSD port maintainers can accept it.

Anyway, until I do, if any FreeBSD user runs into this problem, please feel free to stick the patch into your /usr/ports/emulators/virtualbox-ose/files directory as a possible workaround, with whatever modifications you might want. It's working just fine for me in FreeBSD 13.0 for VBox 6.1.20 at the moment, and an earlier version of it seems to be working in FreeBSD 12.1 for VBox 5.2.34. I can't promise any quick responses if things change, but FreeBSD is my working desktop, which I try to keep fairly up to date with released FreeBSD, and I need a working emulator :).

On my 14-CURRENT, the patch works. I do not have sysctl tuning for AIO in /etc/sysctl.conf. I tried disabling the AIO option in the port, but to no avail. With the patch, guest OSes do not hang anymore while installing many packages.
(In reply to Tomoyuki Sakurai from comment #55)
> Anyway, until I do, if any FreeBSD user runs into this problem, please feel
> free to stick the patch into your /usr/ports/emulators/virtualbox-ose/files
> directory as a possible workaround, with whatever modifications you might want.
> It's working just fine for me in FreeBSD 13.0 for VBox 6.1.20 at the moment,
> and an earlier version of it seems to be working in FreeBSD 12.1 for Vbox
> 5.2.34. I can't promise any quick responses if things change, but FreeBSD is
> my working desktop, which I try to keep fairly up to date with released
> FreeBSD, and I need a working emulator :).

It would be kind of nice to actually do this until there is a solution upstream (vbox@FreeBSD.org)?

I bumped into this again at $WORK on a fresh FreeBSD 13 system with the latest package set. Any reason for not committing Tom's patch into our ports?

(In reply to Gleb Popov from comment #57) Have you been able to test the patch, and can you confirm it works for you? I'm unable to test it properly here, but if I get some confirmations that it works, I could take responsibility for committing it myself.

(In reply to Guido Falsi from comment #58) All right, I will test it tomorrow on the affected machine.

The patch definitely improves things for me. I was having problems even with /boot/loader.conf AIO adjustments, and now the VM works fine even without additional tweaking.

(In reply to Gleb Popov from comment #60) Thanks for testing this. Please give me a little more time to test it myself so I can commit it.
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=95ac4999a82d493d3d0bc0b2160e07bcc6d80ddf

commit 95ac4999a82d493d3d0bc0b2160e07bcc6d80ddf
Author: Tom Rushworth <tbr@acm.org>
AuthorDate: 2021-09-04 16:19:11 +0000
Commit: Guido Falsi <madpilot@FreeBSD.org>
CommitDate: 2021-09-04 16:22:00 +0000

emulators/virtualbox-ose(-legacy): Make VirtualBox limit AIO requests

Import patch to teach VirtualBox to check availability of AIO resources before trying to allocate more.

This prevents crashes when using AIO in VirtualBox.

PR: 168298

emulators/virtualbox-ose-legacy/Makefile | 2 +-
...MM_VMMR3_PDMAsyncCompletionFileNormal.cpp (new) | 59 ++++++++++++++++++++++
emulators/virtualbox-ose/Makefile | 2 +-
...MM_VMMR3_PDMAsyncCompletionFileNormal.cpp (new) | 59 ++++++++++++++++++++++
4 files changed, 120 insertions(+), 2 deletions(-)

A commit in branch 2021Q3 references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=146e091222ee0ad179309319ef36f02b9050a059

commit 146e091222ee0ad179309319ef36f02b9050a059
Author: Tom Rushworth <tbr@acm.org>
AuthorDate: 2021-09-04 16:19:11 +0000
Commit: Guido Falsi <madpilot@FreeBSD.org>
CommitDate: 2021-09-04 16:25:08 +0000

emulators/virtualbox-ose(-legacy): Make VirtualBox limit AIO requests

Import patch to teach VirtualBox to check availability of AIO resources before trying to allocate more.

This prevents crashes when using AIO in VirtualBox.

PR: 168298

(cherry picked from commit 95ac4999a82d493d3d0bc0b2160e07bcc6d80ddf)

emulators/virtualbox-ose-legacy/Makefile | 2 +-
...MM_VMMR3_PDMAsyncCompletionFileNormal.cpp (new) | 59 ++++++++++++++++++++++
emulators/virtualbox-ose/Makefile | 2 +-
...MM_VMMR3_PDMAsyncCompletionFileNormal.cpp (new) | 59 ++++++++++++++++++++++
4 files changed, 120 insertions(+), 2 deletions(-)

Just committed Tom's patch, which has been confirmed to solve the issue, and merged it to quarterly. Thanks for providing the patch and testing!
With the patch that is committed, are the AIO sysctls in the pkg-message still appropriate?

With the patch, I haven't seen crashes on my CURRENT since then, with no sysctl hack. It used to happen during the initial installation, when disk IO is heavy, but many packer builds have since run without a crash.