Created attachment 219550 [details] Patch to make bhyve pci passthru work inside a jail on FreeBSD 12.2-RELEASE is possible to run bhyve inside a jail, but testing passthru revealed that this not work inside a jail. The problem is that the jail needs r/w access to the following devices: * /dev/mem * /dev/io * /dev/pci /dev/pci is not a problem with a securelevel < 0. But the rest needs a patch. I have attached a rough patch to make bhyve pci passthrough work inside a jail,I also have attached jail.conf and devfs.rules used for testing. This was tested in jail using vnet.
Created attachment 219551 [details] devfs.rules use for testing
Created attachment 219552 [details] jail.conf used for testing
PRIV_IO access is not required only by /dev/io, it is also required for sysarch(I386_SET_IOPERM), which is otherwise available to jailed processes. So the patch definitely should not be committed. A better solution would be to extend pci(4) so that bhyve can use it to do everything required for PCI passthrough. Even then I'm not sure why it's useful to jail the bhyve process - what does it buy you?
(In reply to Mark Johnston from comment #3) > Even then I'm not sure why it's useful to jail the bhyve process - what does it buy you? It allows folks to have a production version of bhyve in the host, and to develop and test the userland components of bhyve in a jail.
Perhaps I'm missing the point, but rather than punching a giant hole in the jail security model (ie: by giving unconstrained ring 0 / kernel privileges to jailed processes), would it not be better to run these development/testing bhyve userland components in a simple chroot environment?
(In reply to Peter Wemm from comment #5) Currently illumos is able to do pci-passthrough with bhyve running inside a zone, that gives you an extra layer of security,if there is escape from the hypervisor then the attacker will land on a jail and not the host system. Here are relevant links on how is used on illumos : https://movementarian.org/blog/posts/2018-10-26-pci-pass-through-support-with-bhyve-and-smartos/ https://www.cyber-tec.org/2019/05/29/using-bhyve-pci-passthrough-on-omnios/ I think it would be nice to have this feature on FreeBSD jails, as Mark stated "better solution would be to extend pci(4) so that bhyve can use it to do everything required for PCI passthrough." I would like to explore this option any pointer on how to start would be really good.
Hi All, > Even then I'm not sure why it's useful to jail the bhyve process - what does it buy you? The idea to run bhyve inside jail is to provide an additional layer of security for potential vm-escape vulnerabilities in bhyve. This is the way VMs are executed on Linux (protected by AppArmor and SEL) and Illumos. Currently it's possible to run bhyve in jail, but not with PCI passthrough. > A better solution would be to extend pci(4) so that bhyve can use it to do everything required for PCI passthrough. Mark, could you please give us a hint on what should be done to extend pci(4) so jail changes are not needed? We are willing to implement this, but need some guidance. One more security improvement that bhyve needs is to run it without root, but that's another story for another report.
Sorry for the delayed follow up. I wrote some patches to remove the need for /dev/io: https://reviews.freebsd.org/D31307 https://reviews.freebsd.org/D31308 Testing would be appreciated. This does not remove the dependency on /dev/mem yet. I am very skeptical that jailing bhyve with PCI passthrough enabled provides any meaningful security. /dev/pci allows a jailed root to access all PCI(e) devices in the system. Jails can be a useful deployment mechanism though, so I think we should better support their integration with bhyve.
The /dev/mem ones could probably be PCIOCBARMMAP if that could be/is locked down enough? But I assume all the checks needed are in place (now) somewhere for the IO ioctl?
(In reply to Bjoern A. Zeeb from comment #9) > The /dev/mem ones could probably be PCIOCBARMMAP if that could be/is locked down enough? Yes, it seems possible. > But I assume all the checks needed are in place (now) somewhere for the IO ioctl? I'm not sure if I understand the question. The new ioctl limits accesses to the specified BAR and verifies that the accesses is within bounds. The /dev/io interface permits access to any system I/O port.
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=42375556e5b2e68746d999b43d124040b6affb91 commit 42375556e5b2e68746d999b43d124040b6affb91 Author: Mark Johnston <markj@FreeBSD.org> AuthorDate: 2021-08-14 14:42:34 +0000 Commit: Mark Johnston <markj@FreeBSD.org> CommitDate: 2021-08-14 14:59:04 +0000 bhyve: Use pci(4) to access I/O port BARs This removes the dependency on /dev/io. PR: 251046 Reviewed by: jhb MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31308 usr.sbin/bhyve/pci_passthru.c | 65 ++++++++++++++++++------------------------- 1 file changed, 27 insertions(+), 38 deletions(-)
Review for removal of /dev/mem: https://reviews.freebsd.org/D31534
(In reply to Mark Johnston from comment #12) Mark thanks for your commit. If I understand it correctly, bhyve should also be modified to not require PRIV_KMEM_WRITE when doing PPT, right?
(In reply to Anatoli from comment #13) Right, that's handled by the revision linked in comment 12.
Mark, All, > --- Comment #3 from Mark Johnston <markj@FreeBSD.org> --- > PRIV_IO access is not required only by /dev/io, it is also required for > sysarch(I386_SET_IOPERM), which is otherwise available to jailed processes. So > the patch definitely should not be committed. A better solution would be to > extend pci(4) so that bhyve can use it to do everything required for PCI > passthrough. Even then I'm not sure why it's useful to jail the bhyve process > - what does it buy you? In light of the recently patched VM-escape vulnerability in bhyve (FreeBSD-SA-21:13.bhyve fixing the CVE-2021-29631), I'd like to highlight the benefits of running bhyve under a non-root user and inside a jail by default. If it were the case, this vulnerability, instead of a complete host takeover would just have a DoS impact on the malicious VM, which is perfectly fine IMO. That's why it's extremely important to make bhyve work correctly under all situations (including PPT) inside jail so we could make it run inside jail by default. > --- Comment #8 from Mark Johnston <markj@FreeBSD.org> --- > I am very skeptical that jailing bhyve with PCI passthrough enabled provides > any meaningful security. /dev/pci allows a jailed root to access all PCI(e) > devices in the system. Jails can be a useful deployment mechanism though, so I > think we should better support their integration with bhyve. With respect to this, isn't it possible to restrict the bhyve process (maybe self-restricting via Capsicum) to just the masked PCI addresses or to the PCI addresses specified via the args so to limit the impact of a bhyve compromise to just the intended device(s)? Or, as you already proposed, to extend pci(4) so that bhyve can use it to do everything required for PPT? Regards, Anatoli
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=c53f23984220bfd34c198966874c369006ea3246 commit c53f23984220bfd34c198966874c369006ea3246 Author: Mark Johnston <markj@FreeBSD.org> AuthorDate: 2021-08-14 14:42:34 +0000 Commit: Mark Johnston <markj@FreeBSD.org> CommitDate: 2021-08-29 16:39:53 +0000 bhyve: Use pci(4) to access I/O port BARs This removes the dependency on /dev/io. PR: 251046 Reviewed by: jhb Sponsored by: The FreeBSD Foundation (cherry picked from commit 42375556e5b2e68746d999b43d124040b6affb91) usr.sbin/bhyve/pci_passthru.c | 65 ++++++++++++++++++------------------------- 1 file changed, 27 insertions(+), 38 deletions(-)
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=e002d882ac2094047a8d5a8bef9252e5006b5828 commit e002d882ac2094047a8d5a8bef9252e5006b5828 Author: Mark Johnston <markj@FreeBSD.org> AuthorDate: 2021-10-09 15:36:19 +0000 Commit: Mark Johnston <markj@FreeBSD.org> CommitDate: 2021-11-15 18:11:52 +0000 bhyve: Map the MSI-X table unconditionally for passthrough It is possible for the PBA to reside in the same page as the MSI-X table. And, while devices are not supposed to do this, at least some Intel wifi devices place registers in a page shared with the MSI-X table. To handle the first case we currently map the PBA page using /dev/mem, and the second case is not handled. Kill two birds with one stone: map the MSI-X table BAR using the PCIOCBARMMAP ioctl instead of /dev/mem, and map the entire table so that accesses beyond the bounds of the table can be emulated. Regions of the BAR not containing the table are left unmapped. PR: 251046 Reviewed by: bz, grehan, jhb Sponsored by: The FreeBSD Foundation (cherry picked from commit 7fa2335347362378322a4d27cb40f6e6cd5dd0fb) usr.sbin/bhyve/pci_emul.h | 4 +- usr.sbin/bhyve/pci_passthru.c | 186 +++++++++++++++++------------------------- 2 files changed, 76 insertions(+), 114 deletions(-)
I've removed the dependency on /dev/mem and /dev/io in bhyve on main and stable/13. See https://cgit.freebsd.org/src/commit/?id=7fa2335347362378322a4d27cb40f6e6cd5dd0fb , I failed to tag this PR in the commit.