Boot log from an arm64 instance on GCE:

Autoloading module: if_gve
Autoloading module: virtio_random
gve0: <gVNIC> mem 0x10203000-0x10203fff,0x10202000-0x1020203f,0x10100000-0x101fffff at device 0.0 on pci0
gve0: Unrecognized device option 0x5 not enabled.
gve0: Unrecognized device option 0x6 not enabled.
gve0: Failed to acquire any msix vectors
gve0: No irq table, nothing to free
gve0: No irq table, nothing to free
gve0: No irq table, nothing to free
Fatal data abort:
x0: 0xffffa000ea784ba0
x1: 0xffffa000ea784ba0
x2: 0x000000000000000a
x3: 0x000000000000000a
x4: 0xffff00000088eff4
x5: 0x0000000000000041
x6: 0xffff00000052cfdc
x7: 0xffff0000aca301d0
x8: 0x0000000000000000
x9: 0x0000000000000000
x10: 0x0000000000000001
x11: 0xfefefefefefefeff
x12: 0xffff00000a656572
x13: 0x0000feff01000001
x14: 0x0000000000000000
x15: 0x0000000000000002
x16: 0xffff0000ade2cdc0
x17: 0xffff00000051bb70
x18: 0xffff0000aca30330
x19: 0xffffa000ea510018
x20: 0x0000000000000000
x21: 0x0000000000000006
x22: 0x0000000080040003
x23: 0xffff000000a22944
x24: 0xffff000000a899c0
x25: 0xffff000000a08e29
x26: 0xffffa000e7a49470
x27: 0x000000006097de09
x28: 0x0000000000000000
x29: 0xffff0000aca30330
sp: 0xffff0000aca30330
lr: 0xffff0000ade176fc
elr: 0x0000000000000000
spsr: 0x0000000060400045
far: 0x0000000000000000
esr: 0x0000000086000004
panic: vm_fault failed: 0x0 error 1
cpuid = 0
time = 1716533241
KDB: stack backtrace:
#0 0xffff000000525e30 at kdb_backtrace+0x58
#1 0xffff0000004d0d4c at vpanic+0x198
#2 0xffff0000004d0bb0 at panic+0x44
#3 0xffff0000008b795c at data_abort+0x2cc
#4 0xffff000000893814 at handle_el1h_sync+0x14
Uptime: 23s
The panic is a red herring; it is just a bad error path that we should fix. The real problem is earlier in the log:

gve0: Failed to acquire any msix vectors

MSI-X interrupts in virtio_pci seem to be borked completely for whatever reason, and gve(4) can't cope with that the way, e.g., nvme(4) can.
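As an illustration of what "coping" could look like, here is a minimal user-space sketch of an interrupt-mode fallback chain (MSI-X, then MSI, then a legacy line interrupt). Every name in it (`fake_bus`, `fake_setup_intr`, the `FAKE_INTR_*` values) is hypothetical; it does not call any real FreeBSD bus API, and this thread does not establish whether gVNIC can operate without MSI-X at all.

```c
/* Hypothetical stand-ins only: this models the decision, not real bus calls. */
enum fake_intr_mode {
    FAKE_INTR_NONE,    /* nothing could be allocated: attach must fail cleanly */
    FAKE_INTR_MSIX,
    FAKE_INTR_MSI,
    FAKE_INTR_LEGACY
};

struct fake_bus {
    int msix_avail;    /* MSI-X vectors the stub bus is willing to grant */
    int msi_avail;     /* MSI vectors the stub bus is willing to grant */
    int intx_ok;       /* nonzero if a legacy line interrupt is usable */
};

/*
 * Try the preferred mechanism first and degrade step by step.
 * On return, *got holds the number of vectors actually granted.
 */
static enum fake_intr_mode
fake_setup_intr(const struct fake_bus *bus, int want, int *got)
{
    if (bus->msix_avail > 0) {
        *got = want < bus->msix_avail ? want : bus->msix_avail;
        return (FAKE_INTR_MSIX);
    }
    if (bus->msi_avail > 0) {
        *got = want < bus->msi_avail ? want : bus->msi_avail;
        return (FAKE_INTR_MSI);
    }
    if (bus->intx_ok) {
        *got = 1;
        return (FAKE_INTR_LEGACY);
    }
    *got = 0;
    return (FAKE_INTR_NONE);
}
```

The point of the sketch is the last branch: when nothing can be allocated, the caller gets an explicit "none" result and can unwind, rather than continuing with a half-initialized interrupt table.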
The panic looks like it is because the driver is calling a NULL function pointer. To track that down, it would be useful to know which function the lr register points into. Can you get the full FreeBSD boot log?
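The register dump is consistent with that reading: elr and far are both 0, and esr is 0x86000004. A small sketch that decodes the relevant ESR fields follows. The field positions (EC in bits 31:26, IL in bit 25, fault status code in bits 5:0) come from the Arm architecture; the helper names are made up for this example and are not kernel functions.

```c
#include <stdint.h>

/* Exception Class: bits [31:26] of ESR_ELx. */
static unsigned
esr_ec(uint64_t esr)
{
    return ((unsigned)((esr >> 26) & 0x3f));
}

/* Instruction Length bit: bit 25. */
static unsigned
esr_il(uint64_t esr)
{
    return ((unsigned)((esr >> 25) & 0x1));
}

/* Fault status code: bits [5:0] of the ISS, meaningful for abort classes. */
static unsigned
esr_fsc(uint64_t esr)
{
    return ((unsigned)(esr & 0x3f));
}

/* Names for the four abort classes only; everything else is "other". */
static const char *
esr_ec_name(unsigned ec)
{
    switch (ec) {
    case 0x20: return ("instruction abort, lower EL");
    case 0x21: return ("instruction abort, same EL");
    case 0x24: return ("data abort, lower EL");
    case 0x25: return ("data abort, same EL");
    default:   return ("other");
    }
}
```

For esr = 0x86000004 this yields EC 0x21 (an instruction abort taken at the same exception level) with fault status 0x04 (translation fault, level 0). Together with elr = 0, that means the CPU tried to fetch an instruction from address 0, which is what a call through a NULL function pointer looks like. The return address in lr (0xffff0000ade176fc) is then the place to look for the caller.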
Hi Andrew, sure! The full boot log:

UEFI firmware (version built at 15:54:05 on Apr 2 2024)
EMU Variable FVB Started
EMU Variable invalid PCD sizes
Found PL031 RTC @ 0x9010000
InitializeRealTimeClock: using default timezone/daylight settings
BdsDxe: loading Boot0001 "UEFI Misc Device" from PciRoot(0x0)/Pci(0x2,0x0)/NVMe(0x1,00-00-00-00-00-00-00-00)
BdsDxe: starting Boot0001 "UEFI Misc Device" from PciRoot(0x0)/Pci(0x2,0x0)/NVMe(0x1,00-00-00-00-00-00-00-00)
UEFI: Attempting to start image.
Description: UEFI Misc Device
FilePath: PciRoot(0x0)/Pci(0x2,0x0)/NVMe(0x1,00-00-00-00-00-00-00-00)
OptionNumber: 1.
Consoles: EFI console
Reading loader env vars from /efi/freebsd/loader.env
Setting currdev to disk0p1:
FreeBSD/arm64 EFI loader, Revision 1.1
Command line arguments: loader.efi
Image base: 0x13c4c0000
EFI version: 2.70
EFI Firmware: EDK II (rev 1.00)
Console: efi (0x1000)
Load Path: \EFI\BOOT\BOOTAA64.EFI
Load Device: PciRoot(0x0)/Pci(0x2,0x0)/NVMe(0x1,00-00-00-00-00-00-00-00)/HD(1,GPT,FE6C0EE6-1911-11EF-8459-0CC47ADA5F32,0x22,0x10418)
BootCurrent: 0001
BootOrder: 0001[*] 0000
BootInfo Path: PciRoot(0x0)/Pci(0x2,0x0)/NVMe(0x1,00-00-00-00-00-00-00-00)
Ignoring Boot0001: Only one DP found
Trying ESP: PciRoot(0x0)/Pci(0x2,0x0)/NVMe(0x1,00-00-00-00-00-00-00-00)/HD(1,GPT,FE6C0EE6-1911-11EF-8459-0CC47ADA5F32,0x22,0x10418)
Setting currdev to disk0p1:
Trying: PciRoot(0x0)/Pci(0x2,0x0)/NVMe(0x1,00-00-00-00-00-00-00-00)/HD(2,GPT,FE6C0EED-1911-11EF-8459-0CC47ADA5F32,0x1043A,0x200000)
Setting currdev to disk0p2:
Trying: PciRoot(0x0)/Pci(0x2,0x0)/NVMe(0x1,00-00-00-00-00-00-00-00)/HD(3,GPT,FE6C0EF0-1911-11EF-8459-0CC47ADA5F32,0x21043A,0x800000)
Setting currdev to zfs:zroot/ROOT/default:
Loading /boot/defaults/loader.conf
Loading /boot/defaults/loader.conf
Loading /boot/device.hints
Loading /boot/loader.conf
console vidconsole is unavailable
console vidconsole is unavailable
Loading /boot/loader.conf.local
Loading kernel...
/boot/kernel/kernel text=0x2a8 text=0x9db150 text=0x261054 data=0x150cb8 data=0x0+0x2bc000 0x8+0x1516b0+0x8+0x17a5c2
Loading configured modules...
can't find '/boot/entropy'
can't find 'aesni'
/boot/kernel/zfs.ko text=0xacd30 text=0x207b90 data=0x2ce30+0xaabe4 0x8+0x34db8+0x8+0x2e521
can't find '/etc/hostid'
Booting [/boot/kernel/kernel]...
No valid device tree blob found!
WARNING! Trying to fire up the kernel, but no device tree blob found!
---<<BOOT>>---
Copyright (c) 1992-2023 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 14.1-BETA1 releng/14.1-n267604-25c2d762af7a GENERIC arm64
FreeBSD clang version 18.1.4 (https://github.com/llvm/llvm-project.git llvmorg-18.1.4-0-ge6c3289804a6)
SRAT: Ignoring memory at addr 0x0
VT: init without driver.
module scmi already present!
real memory = 4294967296 (4096 MB)
avail memory = 4154679296 (3962 MB)
FreeBSD/SMP: Multiprocessor System Detected: 1 CPUs
arc4random: WARNING: initial seeding bypassed the cryptographic random device because it was not yet seeded and the knob 'bypass_before_seeding' was enabled.
random: entropy device external interface
kbd0 at kbdmux0
acpi0: <Google GOOGFACP>
acpi0: Power Button (fixed)
acpi0: Sleep Button (fixed)
acpi0: Could not update all GPEs: AE_NOT_CONFIGURED
psci0: <ARM Power State Co-ordination Interface Driver> on acpi0
gic0: <ARM Generic Interrupt Controller v3.0> iomem 0x8000000-0x800ffff,0x80a0000-0x8ffffff on acpi0
its0: <ARM GIC Interrupt Translation Service> mem 0x8080000-0x809ffff on gic0
generic_timer0: <ARM Generic Timer> irq 5,6,7 on acpi0
Timecounter "ARM MPCore Timecounter" frequency 25000000 Hz quality 1000
Event timer "ARM MPCore Eventtimer" frequency 25000000 Hz quality 1000
efirtc0: <EFI Realtime Clock>
efirtc0: registered as a time-of-day clock, resolution 1.000000s
pmu0: <Performance Monitoring Unit> on acpi0
cpu0: <ACPI CPU> on acpi0
uart0: <PrimeCell UART (PL011)> iomem 0x9000000-0x9000fff irq 0 on acpi0
uart0: console (9600,n,8,1)
uart1: <PrimeCell UART (PL011)> iomem 0x9001000-0x9001fff irq 1 on acpi0
uart2: <PrimeCell UART (PL011)> iomem 0x9002000-0x9002fff irq 2 on acpi0
uart3: <PrimeCell UART (PL011)> iomem 0x9003000-0x9003fff irq 3 on acpi0
acpi_ged0: <Generic Event Device> irq 4 on acpi0
acpi_ged0: Raw IRQ 50
acpi_button0: <Power Button> on acpi0
acpi_button1: <Sleep Button> on acpi0
pcib0: <Generic PCI host controller> on acpi0
pci0: <PCI bus> on pcib0
pci0: <network, ethernet> at device 0.0 (no driver attached)
virtio_pci0: <VirtIO PCI (legacy) Entropy adapter> mem 0x10201000-0x1020103f at device 1.0 on pci0
nvme0: <Generic NVMe Device> mem 0x10000000-0x10003fff,0x10200000-0x1020003f at device 2.0 on pci0
nvme0: unable to allocate MSI-X
armv8crypto0: <AES-CBC,AES-XTS,AES-GCM>
Timecounters tick every 1.000 msec
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
usb_needs_explore_all: no devclass
nvme0: temperature threshold not supported
CPU 0: ARM Neoverse-N1 r3p1 affinity: 0
Cache Type = <64 byte D-cacheline,64 byte I-cacheline,PIPT ICache,64 byte ERG,64 byte CWG,IDC>
Instruction Set Attributes 0 = <DP,RDM,Atomic,CRC32,SHA2,SHA1,AES+PMULL>
Instruction Set Attributes 1 = <RCPC-8.3,DCPoP>
Instruction Set Attributes 2 = <>
Processor Features 0 = <CSV3,CSV2,GIC,AdvSIMD+HP,FP+HP,EL3,EL2,EL1,EL0>
Processor Features 1 = <PSTATE.SSBS MSR>
Memory Model Features 0 = <TGran4,SNSMem,16bit ASID,256TB PA>
Memory Model Features 1 = <PAN+ATS1E1,LO,HPD,16bit VMID,HAF+DS>
Memory Model Features 2 = <32bit CCIDX,48bit VA,UAO,CnP>
Debug Features 0 = <MTPMU res0,2 CTX BKPTs,4 Watchpoints,6 Breakpoints,Debugv8>
Debug Features 1 = <>
Auxiliary Features 0 = <>
Auxiliary Features 1 = <>
AArch32 Instruction Set Attributes 5 = <>
AArch32 Media and VFP Features 0 = <>
AArch32 Media and VFP Features 1 = <>
TCP_ratelimit: Is now initialized
Trying to mount root from zfs:zroot/ROOT/default []...
nda0 at nvme0 bus 0 scbus0 target 0 lun 1
nda0: <nvme_card-pd 2 nvme_card-pd>
nda0: Serial Number nvme_card-pd
nda0: nvme version 1.0
nda0: 10240MB (20971520 512 byte sectors)
GEOM: nda0: the secondary GPT header is not in the last LBA.
Setting hostuuid: 8c7b2107-d74c-a0ba-4e31-5747c0e41d46.
Setting hostid: 0x51ecc71c.
This system supports ZFS pool feature flags.
Enabled the following features on 'zroot':
async_destroy empty_bpobj lz4_compress multi_vdev_crash_dump spacemap_histogram enabled_txg hole_birth extensible_dataset embedded_data bookmarks filesystem_limits large_blocks large_dnode sha512 skein edonr userobj_accounting encryption project_quota device_removal obsolete_counts zpool_checkpoint spacemap_v2 allocation_classes resilver_defer bookmark_v2 redaction_bookmarks redacted_datasets bookmark_written log_spacemap livelist device_rebuild zstd_compress draid zilsaxattr head_errlog blake3 block_cloning vdev_zaps_v2
Pool 'zroot' has the bootfs property set, you might need to update the boot code. See gptzfsboot(8) and loader.efi(8) for details.
Starting file system checks:
/dev/gpt/efiesp: FILESYSTEM CLEAN; SKIPPING CHECKS
Growing root partition to fill device
nda0 recovered
random: randomdev_wait_until_seeded unblock wait
random: randomdev_wait_until_seeded unblock wait
random: unblocking device.
nda0p3 resized
Mounting local filesystems:.
Autoloading module: if_gve
Autoloading module: virtio_random
gve0: <gVNIC> mem 0x10203000-0x10203fff,0x10202000-0x1020203f,0x10100000-0x101fffff at device 0.0 on pci0
gve0: Unrecognized device option 0x5 not enabled.
gve0: Unrecognized device option 0x6 not enabled.
gve0: Failed to acquire any msix vectors
gve0: No irq table, nothing to free
gve0: No irq table, nothing to free
gve0: No irq table, nothing to free
Fatal data abort:
x0: 0xffffa000ea784ba0
x1: 0xffffa000ea784ba0
x2: 0x000000000000000a
x3: 0x000000000000000a
x4: 0xffff00000088eff4
x5: 0x0000000000000041
x6: 0xffff00000052cfdc
x7: 0xffff0000aca301d0
x8: 0x0000000000000000
x9: 0x0000000000000000
x10: 0x0000000000000001
x11: 0xfefefefefefefeff
x12: 0xffff00000a656572
x13: 0x0000feff01000001
x14: 0x0000000000000000
x15: 0x0000000000000002
x16: 0xffff0000ade2cdc0
x17: 0xffff00000051bb70
x18: 0xffff0000aca30330
x19: 0xffffa000ea510018
x20: 0x0000000000000000
x21: 0x0000000000000006
x22: 0x0000000080040003
x23: 0xffff000000a22944
x24: 0xffff000000a899c0
x25: 0xffff000000a08e29
x26: 0xffffa000e7a49470
x27: 0x000000006097de09
x28: 0x0000000000000000
x29: 0xffff0000aca30330
sp: 0xffff0000aca30330
lr: 0xffff0000ade176fc
elr: 0x0000000000000000
spsr: 0x0000000060400045
far: 0x0000000000000000
esr: 0x0000000086000004
panic: vm_fault failed: 0x0 error 1
cpuid = 0
time = 1716533241
KDB: stack backtrace:
#0 0xffff000000525e30 at kdb_backtrace+0x58
#1 0xffff0000004d0d4c at vpanic+0x198
#2 0xffff0000004d0bb0 at panic+0x44
#3 0xffff0000008b795c at data_abort+0x2cc
#4 0xffff000000893814 at handle_el1h_sync+0x14
Uptime: 23s
Dumping 247 out of 4064 MB:..2%..12%..22%..31%..41%..51%..62%..72%..81%..91%
Dump complete
Automatic reboot in 15 seconds - press a key on the console to abort
================================================

I keep the original boot log and reproduction steps also here: https://docs.google.com/document/d/1iAVx83Hhb7jS9Q1goxZg8TflwLm2ZE2yagMdEalUkqw/edit
https://reviews.freebsd.org/D45489 fixes the error path panic. As for the inability to allocate msix vectors, I wonder if `hw.pci.honor_msi_blacklist` is relevant.
(In reply to shailend from comment #4) Thanks for the patch! I've built a new image based on 14.1-R: https://people.freebsd.org/~lwhsu/tmp/gce/gce.raw.zst Ilya, can you help test it?
(In reply to Li-Wen Hsu from comment #5) Thanks! Could you please also create an image with the patch plus: hw.pci.honor_msi_blacklist="0" in /boot/loader.conf?
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=b81cbb12410b000074483899e61e9e767ba3ec1d

commit b81cbb12410b000074483899e61e9e767ba3ec1d
Author: Shailend Chand <shailend@google.com>
AuthorDate: 2024-06-05 05:31:46 +0000
Commit: Xin LI <delphij@FreeBSD.org>
CommitDate: 2024-06-18 06:08:31 +0000

    gve: Make gve_free_qpls idempotent

    This fixes a panic caused by double free.

    PR: kern/279410
    MFC after: 3 days
    Differential Revision: https://reviews.freebsd.org/D45489

 sys/dev/gve/gve_qpl.c | 1 +
 1 file changed, 1 insertion(+)
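For readers unfamiliar with the term, the following user-space sketch shows the general "idempotent free" pattern the commit title refers to: after freeing, the owning pointer is cleared so a second call from another error path becomes a no-op. This is not the actual patch (the one-line change is not quoted in this thread), and `fake_priv`, `fake_alloc_qpls` and `fake_free_qpls` are hypothetical stand-ins rather than gve(4) code.

```c
#include <stdlib.h>

/* Hypothetical stand-in for driver state; not the real gve softc. */
struct fake_priv {
    int *qpls;        /* owned allocation, NULL when not allocated */
    int  num_qpls;
};

/* Allocate n zeroed entries; returns 0 on success, -1 on failure. */
static int
fake_alloc_qpls(struct fake_priv *priv, int n)
{
    priv->qpls = calloc((size_t)n, sizeof(*priv->qpls));
    if (priv->qpls == NULL)
        return (-1);
    priv->num_qpls = n;
    return (0);
}

/*
 * Safe to call any number of times: the pointer is cleared after the
 * first free, so later calls find nothing left to release.
 */
static void
fake_free_qpls(struct fake_priv *priv)
{
    if (priv->qpls == NULL)
        return;
    free(priv->qpls);
    priv->qpls = NULL;
    priv->num_qpls = 0;
}
```

This matters on attach-failure paths like the one in the log, where several cleanup routines may each try to release the same resource.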
Can someone get a verbose boot log? There are a few places where pci_alloc_msix might fail, and that would narrow down which one it is.
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=224e20ceb1212579887397b67c43b42d41108c62

commit 224e20ceb1212579887397b67c43b42d41108c62
Author: Shailend Chand <shailend@google.com>
AuthorDate: 2024-06-05 05:31:46 +0000
Commit: Xin LI <delphij@FreeBSD.org>
CommitDate: 2024-06-21 05:44:34 +0000

    gve: Make gve_free_qpls idempotent

    This fixes a panic caused by double free.

    PR: kern/279410
    Differential Revision: https://reviews.freebsd.org/D45489

    (cherry picked from commit b81cbb12410b000074483899e61e9e767ba3ec1d)

 sys/dev/gve/gve_qpl.c | 1 +
 1 file changed, 1 insertion(+)
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=14454f417201a6c1075768c1a571b22c6d4c57d2

commit 14454f417201a6c1075768c1a571b22c6d4c57d2
Author: Shailend Chand <shailend@google.com>
AuthorDate: 2024-06-05 05:31:46 +0000
Commit: Xin LI <delphij@FreeBSD.org>
CommitDate: 2024-06-21 05:45:58 +0000

    gve: Make gve_free_qpls idempotent

    This fixes a panic caused by double free.

    PR: kern/279410
    Differential Revision: https://reviews.freebsd.org/D45489

    (cherry picked from commit b81cbb12410b000074483899e61e9e767ba3ec1d)

 sys/dev/gve/gve_qpl.c | 1 +
 1 file changed, 1 insertion(+)
^Triage: committed and MFCed.