Summary: | bhyve NVMe emulation panic after LLVM 14 import to CURRENT | ||
---|---|---|---|
Product: | Base System | Reporter: | Michael Dexter <editor> |
Component: | bhyve | Assignee: | Mark Johnston <markj> |
Status: | Closed FIXED | ||
Severity: | Affects Some People | CC: | chuck, eduardo, markj |
Priority: | --- | ||
Version: | CURRENT | ||
Hardware: | amd64 | ||
OS: | Any |
Description
Michael Dexter
2022-08-10 04:18:41 UTC
On my system, bhyve segfaults after printing

    nvme_opc_write_read command would exceed LBA range(slba=0x2ffff0 nblocks=0x1)

If I disassemble nvme_opc_write_read(), the end of the function (inlined into pci_nvme_write()) is:

    0x000000000106bfb3 <+7763>: jmp 0x106bfbc <pci_nvme_write+7772>
    0x000000000106bfb5 <+7765>: lea -0x3b292(%rip),%rsi # 0x1030d2a
    0x000000000106bfbc <+7772>: lea -0x4008a(%rip),%rdx # 0x102bf39
    0x000000000106bfc3 <+7779>: mov %r9,%rcx
    0x000000000106bfc6 <+7782>: xor %eax,%eax
    0x000000000106bfc8 <+7784>: call 0x1086010 <fprintf@plt>
    End of assembler dump.

and that fprintf() call is the warning. If I disassemble past that, I get

    (gdb) x/16i 0x000000000106bfc8
       0x106bfc8 <pci_nvme_write+7784>: call 0x1086010 <fprintf@plt>
    => 0x106bfcd: nopl (%rax)
       0x106bfd0 <pci_nvme_read>: push %rbp
       0x106bfd1 <pci_nvme_read+1>: mov %rsp,%rbp
       0x106bfd4 <pci_nvme_read+4>: push %r15
       0x106bfd6 <pci_nvme_read+6>: push %r14
       0x106bfd8 <pci_nvme_read+8>: push %r13

so we're just running off the end of the function and going into pci_nvme_read(). That's pretty weird! I thought the compiler would insert breakpoints between functions.

Maybe there is some UB happening here, but compiling bhyve with UBSAN makes the problem go away. We compile bhyve with many warnings disabled; enabling them for pci_nvme.c uncovers some actual bugs, but fixing them doesn't fix the problem. And it's really bizarre that the compiler is apparently assuming that fprintf() won't return.

I lost a word on the title. Here's hoping a rename will not break anything.

Looks like an instance of "compiler does something stupid when it sees a use of an uninitialized variable."
The compiler is failing us here, but fixing the pci_nvme code resolves the problem: https://reviews.freebsd.org/D36119

A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=b6ecef28bfd7c1c267442fae1c8f2fe0f699f617

commit b6ecef28bfd7c1c267442fae1c8f2fe0f699f617
Author: Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2022-08-14 15:57:24 +0000
Commit: Mark Johnston <markj@FreeBSD.org>
CommitDate: 2022-08-14 15:59:01 +0000

    bhyve: Address uses of uninitialized variables in pci_nvme.c

    The debug print in nvme_opc_get_log_page() would print an uninitialized local variable.

    In nvme_opc_write_read(), a failed LBA bounds check would cause pci_nvme_stats_write_read_update() to be called with an uninitialized variable as a parameter. Although the parameter is unused when the check fails (and so status != 0), LLVM 14 emits some bogus machine code in this path, which happens to result in a segfault when it gets executed.

    PR: 265749
    Reviewed by: chuck, emaste
    MFC after: 2 weeks
    Sponsored by: The FreeBSD Foundation
    Differential Revision: https://reviews.freebsd.org/D36119

 usr.sbin/bhyve/pci_nvme.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

Tested this fixes the original case. Thank you!

Closing with Chuck's permission as the original test passes. Thank you everyone who helped track down this highly-undefined behavior!

A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=b8e33d1abeae18c0441583f912ff9dc85c628180

commit b8e33d1abeae18c0441583f912ff9dc85c628180
Author: Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2022-08-14 15:57:24 +0000
Commit: Mark Johnston <markj@FreeBSD.org>
CommitDate: 2022-08-29 15:01:01 +0000

    bhyve: Address uses of uninitialized variables in pci_nvme.c

    The debug print in nvme_opc_get_log_page() would print an uninitialized local variable.

    In nvme_opc_write_read(), a failed LBA bounds check would cause pci_nvme_stats_write_read_update() to be called with an uninitialized variable as a parameter. Although the parameter is unused when the check fails (and so status != 0), LLVM 14 emits some bogus machine code in this path, which happens to result in a segfault when it gets executed.

    PR: 265749
    Reviewed by: chuck, emaste
    Sponsored by: The FreeBSD Foundation

    (cherry picked from commit b6ecef28bfd7c1c267442fae1c8f2fe0f699f617)

 usr.sbin/bhyve/pci_nvme.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)
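
For readers who want the shape of the bug without opening pci_nvme.c, here is a minimal C sketch of the pattern the commit message describes: a byte count that is only assigned after the bounds check succeeds, yet is passed to a statistics helper on the failure path as well. Everything in it is hypothetical and for illustration only (struct io_stats, stats_update(), handle_write(), and the numeric status value are not taken from bhyve), and initializing the local up front is just one way to close the hole, not necessarily what D36119 does.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-device counters; not the bhyve structures. */
struct io_stats {
    uint64_t write_commands;
    uint64_t write_bytes;
};

/* Hypothetical stats helper: ignores `bytes` whenever the command failed. */
static void
stats_update(struct io_stats *st, uint64_t bytes, uint16_t status)
{
    if (status != 0)
        return;
    st->write_commands++;
    st->write_bytes += bytes;
}

/*
 * Hypothetical write handler. In the buggy pattern, `bytes` is declared
 * without an initializer and only assigned after the range check passes,
 * so the failure path hands an indeterminate value to stats_update().
 * Reading it is undefined behavior even though the callee never uses it.
 * Here it is initialized at its declaration, so both paths are defined.
 */
static uint16_t
handle_write(struct io_stats *st, uint64_t slba, uint64_t nblocks,
    uint64_t ns_blocks, uint32_t block_size)
{
    uint64_t bytes = 0;     /* the fix in this sketch: never left unset */
    uint16_t status = 0;

    /* Overflow-safe form of "slba + nblocks > ns_blocks". */
    if (slba >= ns_blocks || nblocks > ns_blocks - slba) {
        fprintf(stderr,
            "%s command would exceed LBA range(slba=%#jx nblocks=%#jx)\n",
            __func__, (uintmax_t)slba, (uintmax_t)nblocks);
        status = 1;         /* placeholder for an NVMe error status */
        goto out;
    }

    bytes = nblocks * block_size;
out:
    stats_update(st, bytes, status);
    return (status);
}
```

Calling handle_write() with slba equal to the namespace size takes the failure path and leaves the counters untouched, while an in-range request bumps them. Compiling with -Wall is also worth trying on the uninitialized variant, since the thread notes that re-enabling warnings for pci_nvme.c uncovered actual bugs.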