| Summary: | [bhyve] utilizing passthru breaks raw device usage with virtio-blk \| ahci-hd | | |
|---|---|---|---|
| Product: | Base System | Reporter: | Harald Schmalzbauer <bugzilla.freebsd> |
| Component: | misc | Assignee: | freebsd-virtualization (Nobody) <virtualization> |
| Status: | New | | |
| Severity: | Affects Some People | CC: | grehan |
| Priority: | --- | | |
| Version: | 11.0-STABLE | | |
| Hardware: | amd64 | | |
| OS: | Any | | |
| See Also: | https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=260178 | | |
| Attachments: | | | |
Description
Harald Schmalzbauer
2017-01-03 16:47:31 UTC
Would you be able to post a verbose dmesg (boot -v)?

Created attachment 178537 [details]
Verbose boot log part 1, listing ACPI+CPU messages

Created attachment 178538 [details]
Verbose boot log part 2, listing device probe messages

Created attachment 178539 [details]
Verbose boot log part 3, listing rest (msi assignment + consumer attaching messages)
(In reply to Peter Grehan from comment #1)
Thanks for your attention! Please find them attached, I hope my 3-part separation doesn't confuse anybody... -harry

Created attachment 182869 [details]
Verbose boot of ppt corrupting /dev/ada via bhyve-ahci
I tried to investigate further.
I can confirm that the same procedure also breaks UEFI booting:
X64 Exception Type - 000000000000000D CPU Apic ID - 00000000 !!!!
RIP - 000000007FB00FF5, CS - 0000000000000028, RFLAGS - 0000000000010002
ExceptionData - 0000000000000000
RAX - 0000000000000000, RCX - 0000000000000008, RDX - 0000000000000408
RBX - 0000000000000001, RSP - 000000007FBEF468, RBP - 000000007FBEF7C8
RSI - 000000007E549B2E, RDI - 000000007FBEF468
R8 - 000000007FBEF97C, R9 - 000000007FC16A9F, R10 - 00000000000003F8
R11 - 0000000000000040, R12 - 0000000000000000, R13 - 0000000000000000
R14 - 0000000000000000, R15 - 0000000000000000
DS - 0000000000000008, ES - 0000000000000008, FS - 0000000000000008
GS - 0000000000000008, SS - 0000000000000008
CR0 - 0000000080000033, CR2 - 0000000000000000, CR3 - 000000007FB8E000
CR4 - 0000000000000668, CR8 - 0000000000000000
DR0 - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000
DR3 - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 - 0000000000000400
GDTR - 000000007FB78E98 000000000000003F, LDTR - 0000000000000000
IDTR - 000000007F711018 0000000000000FFF, TR - 0000000000000000
FXSAVE_STATE - 000000007FBEF0C0
This happens as soon as I add a passthru device.
Attached is a verbose boot of an install-iso with bhyve-ahci (responsive; dd to /dev/null leads to _real_ disk activity, but unfortunately returns NULs only, not the disk's data).
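The symptom above (reads complete "successfully" but return only NUL bytes) can be checked with a quick dd probe. A minimal sketch; a temp file stands in for the raw device node here, since the real target would be something like /dev/adaN:

```shell
# Stand-in for the raw device node (would be /dev/adaN on a real check).
disk=$(mktemp)
printf '\0\0\0\0\0\0\0\0' > "$disk"   # simulate a read that returned only NULs
# Read one block and count the bytes that are not NUL; 0 means the
# "successful" read actually carried no data from the disk.
nonzero=$(dd if="$disk" bs=512 count=1 2>/dev/null | tr -d '\0' | wc -c)
if [ "$nonzero" -eq 0 ]; then
  echo "all NULs - read returned no real disk data"
else
  echo "real data present"
fi
rm -f "$disk"
```

Running the same probe against the guest's disk device and comparing with the host's view of the same sectors shows whether the emulated DMA actually reached the disk.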
One thing I noticed is that I always get the message "pcib0: no PRT entry for 0.5.INTA" for any passthru device, regardless of which slot I use.
Any help highly appreciated! How do others use passthru?
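For reference, combining passthru with a raw disk typically looks like the following. This is a hypothetical invocation: the slot numbers, the 2/0/0 bus/slot/function selector, the disk node, and the VM name are all placeholders, not the reporter's actual configuration:

```shell
# The device must first be reserved for ppt on the host, e.g. in
# /boot/loader.conf:
#   pptdevs="2/0/0"
#
# Then a guest with both a passthru device and a raw disk:
bhyve -c 2 -m 2G -A -H \
  -s 0,hostbridge \
  -s 3,ahci-hd,/dev/ada1 \
  -s 5,passthru,2/0/0 \
  -s 31,lpc -l com1,stdio \
  guestvm
```

It is exactly this combination (a passthru slot plus an ahci-hd or virtio-blk device backed by a raw host disk) that triggers the reported breakage.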
-harry
Is there anybody who has checked whether the steps to reproduce show the reported results? Meaning, is there anybody who can confirm correct behaviour in that case? I observed many more, at first sight completely unrelated, strange errors, but all show up as soon as one condition is true: shutting down a bhyve guest which had ppt in use. Latest example:

panic: Memory modified after free 0xfffff8002486a030(48) val=0 @ 0xfffff8002486a030
cpuid = 5
KDB: stack backtrace:
#0 0xffffffff805bf327 at kdb_backtrace+0x67
#1 0xffffffff8057f266 at vpanic+0x186
#2 0xffffffff8057f2e3 at panic+0x43
#3 0xffffffff8082eaeb at trash_ctor+0x4b
#4 0xffffffff8082aaec at uma_zalloc_arg+0x52c
#5 0xffffffff813b54a6 at zio_add_child+0x26
#6 0xffffffff813b5a05 at zio_create+0x385
#7 0xffffffff813b6de2 at zio_vdev_child_io+0x232
#8 0xffffffff81396be0 at vdev_mirror_io_start+0x370
#9 0xffffffff813bc629 at zio_vdev_io_start+0x4a9
#10 0xffffffff813b76bc at zio_execute+0x36c
#11 0xffffffff813b6868 at zio_nowait+0xb8
#12 0xffffffff81396bec at vdev_mirror_io_start+0x37c
#13 0xffffffff813bc383 at zio_vdev_io_start+0x203
#14 0xffffffff813b76bc at zio_execute+0x36c
#15 0xffffffff805d10dd at taskqueue_run_locked+0x13d
#16 0xffffffff805d1e78 at taskqueue_thread_loop+0x88
#17 0xffffffff80543844 at fork_exit+0x84

#0 doadump (textdump=<value optimized out>) at pcpu.h:222
#1 0xffffffff8057ece0 in kern_reboot (howto=260) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_shutdown.c:366
#2 0xffffffff8057f2a0 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_shutdown.c:759
#3 0xffffffff8057f2e3 in panic (fmt=<value optimized out>) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_shutdown.c:690
#4 0xffffffff8082eaeb in trash_ctor (mem=<value optimized out>, size=<value optimized out>, arg=<value optimized out>, flags=<value optimized out>) at /usr/local/share/deploy-tools/RELENG_11/src/sys/vm/uma_dbg.c:80
#5 0xffffffff8082aaec in uma_zalloc_arg (zone=0xfffff8001febc680, udata=0xfffff8001ad5f340, flags=<value optimized out>) at /usr/local/share/deploy-tools/RELENG_11/src/sys/vm/uma_core.c:2152
#6 0xffffffff813b54a6 in zio_add_child (pio=0xfffff8026f350b88, cio=0xfffff8002478b7b0) at /usr/local/share/deploy-tools/RELENG_11/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:460
#7 0xffffffff813b5a05 in zio_create (pio=0xfffff8026f350b88, spa=<value optimized out>, txg=433989, bp=<value optimized out>, data=0xfffffe0058afa000, size=1024, type=<value optimized out>, priority=ZIO_PRIORITY_ASYNC_WRITE, flags=<value optimized out>, vd=<value optimized out>, offset=<value optimized out>, zb=<value optimized out>, pipeline=<value optimized out>) at /usr/local/share/deploy-tools/RELENG_11/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:690
#8 0xffffffff813b6de2 in zio_vdev_child_io (pio=0xfffff8026f350b88, bp=<value optimized out>, vd=<value optimized out>, offset=325398016, data=<value optimized out>, size=1024, type=<value optimized out>, flags=1048704, done=<value optimized out>) at /usr/local/share/deploy-tools/RELENG_11/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1141
#9 0xffffffff81396be0 in vdev_mirror_io_start (zio=0xfffff8026f350b88) at /usr/local/share/deploy-tools/RELENG_11/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_mirror.c:488
#10 0xffffffff813bc629 in zio_vdev_io_start (zio=0xfffff8026f350b88) at /usr/local/share/deploy-tools/RELENG_11/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3143
#11 0xffffffff813b76bc in zio_execute (zio=<value optimized out>) at /usr/local/share/deploy-tools/RELENG_11/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1681
#12 0xffffffff813b6868 in zio_nowait (zio=0xfffff8026f350b88) at /usr/local/share/deploy-tools/RELENG_11/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1739
#13 0xffffffff81396bec in vdev_mirror_io_start (zio=0xfffff8026f7a7b88) at /usr/local/share/deploy-tools/RELENG_11/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_mirror.c:488
#14 0xffffffff813bc383 in zio_vdev_io_start (zio=0xfffff8026f7a7b88) at /usr/local/share/deploy-tools/RELENG_11/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3021
#15 0xffffffff813b76bc in zio_execute (zio=<value optimized out>) at /usr/local/share/deploy-tools/RELENG_11/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1681
#16 0xffffffff805d10dd in taskqueue_run_locked (queue=0xfffff8001ab5a700) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/subr_taskqueue.c:454
#17 0xffffffff805d1e78 in taskqueue_thread_loop (arg=<value optimized out>) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/subr_taskqueue.c:741
#18 0xffffffff80543844 in fork_exit (callout=0xffffffff805d1df0 <taskqueue_thread_loop>, arg=0xfffff8001aa90720, frame=0xfffffe043f609ac0) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_fork.c:1042
#19 0xffffffff808598ae in fork_trampoline () at /usr/local/share/deploy-tools/RELENG_11/src/sys/amd64/amd64/exception.S:611
#20 0x0000000000000000 in ?? ()

I consider this a severe problem, which shouldn't exist in 11.1-RELEASE. If nobody can prove my findings wrong, using passthru should be disabled in RELENG_11_1 until it can be ruled out as the source of these strange problems (some form of memory corruption). Thanks, -harry

Seems to be fixed in https://cgit.freebsd.org/src/commit/?id=246c398145674e4a9337fd933a6e6da7f160118e
Will close as soon as I have had the opportunity to do a real-world check - anybody else checking and closing is welcome.

A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=dd113f67dfb5bdaf5d8b3a87bb19924ad447494c
commit dd113f67dfb5bdaf5d8b3a87bb19924ad447494c
Author: Bjoern A. Zeeb <bz@FreeBSD.org>
AuthorDate: 2022-03-18 20:39:06 +0000
Commit: Bjoern A. Zeeb <bz@FreeBSD.org>
CommitDate: 2022-03-27 17:57:28 +0000

bhyve: Do not remove guest physical addresses from IOMMU host domain

This permits I/O devices on the host to directly access wired memory dedicated to guests using passthru devices. Note that wired memory belonging to guests that do not use passthru devices has always been accessible by I/O devices on the host.

bhyve maps guest physical addresses into the user address space of the bhyve process by mmap'ing /dev/vmm/<vmname>. Device models pass pointers derived from this mapping directly to system calls such as preadv() to minimize copies when emulating DMA. If the backing store for a device model is a raw host device (e.g. when exporting a raw disk device such as /dev/ada<n> as a drive in the guest), the host device driver (e.g. ahci for /dev/ada<n>) can itself use DMA on the host directly to the guest's memory. However, if the guest's memory is not present in the host IOMMU domain, these DMA requests by the host device will fail without raising an error visible to the host device driver or to the guest, resulting in non-working I/O in the guest.

It is unclear why guest addresses were removed from the IOMMU host domain initially, especially only for VMs with a passthru device, as the host IOMMU domain does not affect the permissions of passthru devices, only devices on the host.

A considered alternative was using bounce buffers instead (D34535 is a proof of concept), but that adds additional overhead for unclear benefit.

This solves a long-standing problem when using passthru devices and physical disks in the same VM.

Thanks to: grehan (patience and help)
Thanks to: jhb (for improving the commit message)
PR: 260178, 215740
Reviewed by: grehan, jhb
Differential Revision: https://reviews.freebsd.org/D34607
(cherry picked from commit 246c398145674e4a9337fd933a6e6da7f160118e)

sys/amd64/vmm/vmm.c | 2 --
1 file changed, 2 deletions(-)

A commit in branch releng/13.1 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=1c6abf864ecd3bbf07ace2018f9aab45b6406ce2
commit 1c6abf864ecd3bbf07ace2018f9aab45b6406ce2
Author: Bjoern A. Zeeb <bz@FreeBSD.org>
AuthorDate: 2022-03-18 20:39:06 +0000
Commit: Bjoern A. Zeeb <bz@FreeBSD.org>
CommitDate: 2022-03-30 15:33:47 +0000

bhyve: Do not remove guest physical addresses from IOMMU host domain

This permits I/O devices on the host to directly access wired memory dedicated to guests using passthru devices. Note that wired memory belonging to guests that do not use passthru devices has always been accessible by I/O devices on the host.

bhyve maps guest physical addresses into the user address space of the bhyve process by mmap'ing /dev/vmm/<vmname>. Device models pass pointers derived from this mapping directly to system calls such as preadv() to minimize copies when emulating DMA. If the backing store for a device model is a raw host device (e.g. when exporting a raw disk device such as /dev/ada<n> as a drive in the guest), the host device driver (e.g. ahci for /dev/ada<n>) can itself use DMA on the host directly to the guest's memory. However, if the guest's memory is not present in the host IOMMU domain, these DMA requests by the host device will fail without raising an error visible to the host device driver or to the guest, resulting in non-working I/O in the guest.

It is unclear why guest addresses were removed from the IOMMU host domain initially, especially only for VMs with a passthru device, as the host IOMMU domain does not affect the permissions of passthru devices, only devices on the host.

A considered alternative was using bounce buffers instead (D34535 is a proof of concept), but that adds additional overhead for unclear benefit.

This solves a long-standing problem when using passthru devices and physical disks in the same VM.

Approved by: re (gjb)
Thanks to: grehan (patience and help)
Thanks to: jhb (for improving the commit message)
PR: 260178, 215740
Reviewed by: grehan, jhb
Differential Revision: https://reviews.freebsd.org/D34607
(cherry picked from commit 246c398145674e4a9337fd933a6e6da7f160118e)
(cherry picked from commit dd113f67dfb5bdaf5d8b3a87bb19924ad447494c)

sys/amd64/vmm/vmm.c | 2 --
1 file changed, 2 deletions(-)