In the recent FreeBSD arm64 community preview image in Azure, I sometimes see this panic during reboot:

.
ELF ldconfig path: /lib /usr/lib /usr/lib/compat /usr/local/lib /usr/local/lib/compat/pkg /usr/local/lib/compat/pkg
lo0: link state changed to UP
Kernel page fault with the following non-sleepable locks held:
exclusive rm rib head lock (rib head lock) r = 0 (0xffffa000012278e0) locked @ /usr/src/sys/net/route/route_ctl.c:797
stack backtrace:
#0 0xffff0000004d2af4 at witness_debugger+0x5c
#1 0xffff0000004d3cf8 at witness_warn+0x400
#2 0xffff0000007f7310 at data_abort+0xa0
#3 0xffff0000007d3014 at handle_el1h_sync+0x14
  x0: 0x0000000000000001
  x1: 0x0000000000000100
  x2: 0xffffa00001ae7000
  x3: 0xffff00004031af40 ($d.2 + 0x3efee96f)
  x4: 0x0000000000000100
  x5: 0x0000000000000000
  x6: 0x000000000000003f
  x7: 0x0000000000000000
  x8: 0xffff000132c76c40
  x9: 0x0000000000000000
 x10: 0x0000000000000008
 x11: 0x0000000000000000
 x12: 0x000000000000003e
 x13: 0xffffa00001ae70fc
 x14: 0x0000000000000000
 x15: 0x0000000000000001
 x16: 0x0000000000010000
 x17: 0x0000000000000005
 x18: 0xffff00012d2f7e60
 x19: 0xffff00012d2f8080
 x20: 0xffffa00001227800
 x21: 0x0000000000000000
 x22: 0xdeadc0dedeadc0de
 x23: 0xffffa000012278e0
 x24: 0xffffa00001227800
 x25: 0xffffa0000c93ba00
 x26: 0xffff000000960582 (digits + 0x12fbf)
 x27: 0xffffa0000c9338f0
 x28: 0x0000000000000000
 x29: 0xffff00012d2f7e60
  sp: 0xffff00012d2f7e60
  lr: 0xffff0000005bf63c (rib_notify + 0x50)
 elr: 0xffffa0000c93bb00
spsr: 0x0000000060400045
 far: 0xffffa0000c93bb00
 esr: 0x000000008600000e
panic: data abort in critical section or under mutex
cpuid = 3
time = 1690047394
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
vpanic() at vpanic+0x13c
panic() at panic+0x44
data_abort() at data_abort+0x30c
handle_el1h_sync() at handle_el1h_sync+0x14
--- exception, esr 0x8600000e
(null)() at 0xffffa0000c93bb00
add_route() at add_route+0xc4
add_route_flags() at add_route_flags+0x1b0
rib_add_route() at rib_add_route+0x324
ifa_maintain_loopback_route() at ifa_maintain_loopback_route+0xf4
in6_update_ifa() at in6_update_ifa+0x994
in6_ifattach() at in6_ifattach+0x1bc
in6_if_up() at in6_if_up+0x90
if_up() at if_up+0xd8
ifhwioctl() at ifhwioctl+0xb7c
ifioctl() at ifioctl+0x860
kern_ioctl() at kern_ioctl+0x2dc
sys_ioctl() at sys_ioctl+0x118
do_el0_sync() at do_el0_sync+0x520
handle_el0_sync() at handle_el0_sync+0x44
--- exception, esr 0x56000000
KDB: enter: panic
[ thread pid 203 tid 100109 ]
Stopped at kdb_enter+0x44: str xzr, [x19, #3328]
db> bt
Tracing pid 203 tid 100109 td 0xffff000132c76c40
db_trace_self() at db_trace_self
db_stack_trace() at db_stack_trace+0x11c
db_command() at db_command+0x2d8
db_command_loop() at db_command_loop+0x54
db_trap() at db_trap+0xf8
kdb_trap() at kdb_trap+0x20c
handle_el1h_sync() at handle_el1h_sync+0x14
--- exception, esr 0xf2000000
kdb_enter() at kdb_enter+0x44
vpanic() at vpanic+0x178
panic() at panic+0x44
data_abort() at data_abort+0x30c
handle_el1h_sync() at handle_el1h_sync+0x14
--- exception, esr 0x8600000e
(null)() at 0xffffa0000c93bb00
add_route() at add_route+0xc4
add_route_flags() at add_route_flags+0x1b0
rib_add_route() at rib_add_route+0x324
ifa_maintain_loopback_route() at ifa_maintain_loopback_route+0xf4
in6_update_ifa() at in6_update_ifa+0x994
in6_ifattach() at in6_ifattach+0x1bc
in6_if_up() at in6_if_up+0x90
if_up() at if_up+0xd8
ifhwioctl() at ifhwioctl+0xb7c
ifioctl() at ifioctl+0x860
kern_ioctl() at kern_ioctl+0x2dc
sys_ioctl() at sys_ioctl+0x118
do_el0_sync() at do_el0_sync+0x520
handle_el0_sync() at handle_el0_sync+0x44
--- exception, esr 0x56000000
db>

The uname details:

14.0-CURRENT FreeBSD 14.0-CURRENT #1 main-n263931-5aee3e14d491-dirty: Mon Jul 3 14:15:14 UTC 2023 root@poudriere:/usr/obj/usr/src/arm64.aarch64/sys/GENERIC arm64

And the ifconfig details:

schakrabarti@schakrabarti-freebsd-arm:~ $ ifconfig -a
lo0: flags=1008049<UP,LOOPBACK,RUNNING,MULTICAST,LOWER_UP> metric 0 mtu 16384
	options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
	inet 127.0.0.1 netmask 0xff000000
	inet6 ::1 prefixlen 128
	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
	groups: lo
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
hn0: flags=1008843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
	options=0
	ether 00:0d:3a:1b:a5:92
	inet 10.0.0.4 netmask 0xffffff00 broadcast 10.0.0.255
	media: Ethernet 100GBase-CR4 <full-duplex,rxpause,txpause>
	status: active
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
mce0: flags=1008a43<UP,BROADCAST,RUNNING,ALLMULTI,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
	options=18a00a8<VLAN_MTU,JUMBO_MTU,VLAN_HWCSUM,NV,LINKSTATE,HWSTATS,TXRTLMT>
	ether 00:0d:3a:1b:a5:92
	media: Ethernet 100GBase-CR4 <full-duplex,rxpause,txpause>
	status: active
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
root@:/ # reboot
2023-07-22T17:45:11.347219+00:00 - init 1 - - single user shell terminated.
Waiting (max 60 seconds) for system process `vnlru' to stop... done
Waiting (max 60 seconds) for system process `syncer' to stop...
Syncing disks, vnodes remaining... 0 0 0 0 0 0 0 0 0 done
All buffers synced.
Uptime: 46s
Starting CPU 1 (1)
Starting CPU 2 (2)
Starting CPU 3 (3)
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
random: unblocking device.
random: entropy device external interface
kbd0 at kbdmux0
acpi0: <VRTUAL MICROSFT>
acpi0: Could not update all GPEs: AE_NOT_CONFIGURED
psci0: <ARM Power State Co-ordination Interface Driver> on acpi0
gic0: <ARM Generic Interrupt Controller v3.0> iomem 0xffff0000-0xffffffff,0xeffee000-0xf000dfff,0xf000e000-0xf002dfff,0xf002e000-0xf004dfff,0xf004e000-0xf006dfff on acpi0
generic_timer0: <ARM Generic Timer> irq 4,5,6 on acpi0
Timecounter "ARM MPCore Timecounter" frequency 25000000 Hz quality 1000
Event timer "ARM MPCore Eventtimer" frequency 25000000 Hz quality 1000
efirtc0: <EFI Realtime Clock>
efirtc0: registered as a time-of-day clock, resolution 1.000000s
pmu0: <Performance Monitoring Unit> on acpi0
cpu0: <ACPI CPU> on acpi0
acpi_syscontainer0: <System Container> on acpi0
vmbus0: <Hyper-V Vmbus> on acpi_syscontainer0
vmgenc0: <VM Generation Counter> on acpi0
acpi_ged0: <Generic Event Device> irq 3 on acpi0
acpi_ged0: Raw IRQ 35
uart0: <PrimeCell UART (PL011)> iomem 0xeffec000-0xeffecfff irq 0 on acpi0
uart0: console (115200,n,8,1)
uart1: <PrimeCell UART (PL011)> iomem 0xeffeb000-0xeffebfff irq 1 on acpi0
vmbus_res0: <Hyper-V Vmbus Resource> irq 2 on acpi0
armv8crypto0: <AES-CBC,AES-XTS,AES-GCM>
Timecounters tick every 10.000 msec
usb_needs_explore_all: no devclass
CPU 0: ARM Neoverse-N1 r3p1 affinity: 0
Cache Type = <64 byte D-cacheline,64 byte I-cacheline,PIPT ICache,64 byte ERG,64 byte CWG,IDC,DIC>
Instruction Set Attributes 0 = <DP,RDM,Atomic,CRC32,SHA2,SHA1,AES+PMULL>
Instruction Set Attributes 1 = <RCPC-8.3,DCPoP>
Instruction Set Attributes 2 = <>
Trying to mount root from ufs:/dev/gpt/rootfs [rw]...
Processor Features 0 = <CSV2,GIC,AdvSIMD+HP,FP+HP,EL3,EL2,EL1,EL0 32>
Processor Features 1 = <>
Memory Model Features 0 = <TGran4,TGran64,TGran16,16bit ASID,256TB PA>
Memory Model Features 1 = <PAN+ATS1E1,8bit VMID,HAF+DS>
Memory Model Features 2 = <32bit CCIDX,48bit VA,UAO>
Debug Features 0 = <DoubleLock,2 CTX BKPTs,4 Watchpoints,6 Breakpoints,PMUv3 v8.1,Debugv8>
Debug Features 1 = <>
Auxiliary Features 0 = <>
Auxiliary Features 1 = <>
AArch32 Instruction Set Attributes 5 = <RDM,CRC32,SHA2,SHA1,AES+VMULL,SEVL>
AArch32 Media and VFP Features 0 = <FPRound,FPSqrt,FPDivide,DP VFPv3+v4,SP VFPv3+v4,AdvSIMD>
AArch32 Media and VFP Features 1 = <SIMDFMAC,FPHP Arith,SIMDHP Arith,SIMDSP,SIMDInt,SIMDLS,FPDNaN,FPFtZ>
CPU 1: ARM Neoverse-N1 r3p1 affinity: 1
CPU 2: ARM Neoverse-N1 r3p1 affinity: 2
CPU 3: ARM Neoverse-N1 r3p1 affinity: 3
Release APs...done
mountroot: waiting for device /dev/gpt/rootfs...
vmbus0: irq 0x2, vector 0 end 0x2
vmbus0: the irq 18
vmbus0: version 4.0
hvshutdown0: <Hyper-V Shutdown> on vmbus0
hvtimesync0: <Hyper-V Timesync> on vmbus0
storvsc0: <Hyper-V SCSI> on vmbus0
storvsc1: <Hyper-V SCSI>da0 at storvsc0 bus 0 scbus0 target 0 lun 0
da0: <Msft Virtual Disk 1.0> Fixed Direct Access SPC-3 SCSI device
da0: 300.000MB/s transfers
da0: Command Queueing enabled
da0: 30753MB (62982144 512 byte sectors)
on vmbus0
da1 at storvsc0 bus 0 scbus0 target 0 lun 1
da1: <Msft Virtual Disk 1.0> Fixed Direct Access SPC-3 SCSI device
da1: 300.000MB/s transfers
da1: Command Queueing enabled
da1: 153600MB (314572800 512 byte sectors)
cd0 at storvsc0 bus 0 scbus0 target 0 lun 2
cd0: <Msft Virtual DVD-ROM 1.0> Removable CD-ROM SPC-3 SCSI device
cd0: 300.000MB/s transfers
cd0: Attempt to query device size failed: NOT READY, Medium not present - tray closed
hvheartbeat0: <Hyper-V Heartbeat> on vmbus0
hvkbd0: <Hyper-V KBD> on vmbus0
kbd1 at hvkbd0
hn0: <Hyper-V Network Interface> on vmbus0
Dual Console: Serial Primary, Video Secondary
No suitable dump device was found.
Setting hostuuid: eda9654c-22a6-4730-8401-25f7a3ea1e4a.
Setting hostid: 0xa1ce1a7a.
Starting file system checks:
/dev/gpt/rootfs: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/gpt/rootfs: clean, 6765546 free (130 frags, 845677 blocks, 0.0% fragmentation)
/dev/gpt/efiesp: 4 files, 31 MiB free (63844 clusters)
FIXED
/dev/gpt/efiesp: MARKING FILE SYSTEM CLEAN
Mounting local filesystems:.
Autoloading module: hv_hid
hvhid0: <Hyper-V HID device> on vmbus0
hidbus0: <HID bus> on hvhid0
hms0: <Hyper-V Tablet> on hidbus0
hms0: 5 buttons and [XYW] coordinates ID=0
Setting hostname: schakrabarti-freebsd-arm.
Setting up harvesting: PURE_VMGENID,[CALLOUT],[UMA],[FS_ATIME],SWI,INTERRUPT,NET_NG,[NET_ETHER],NET_TUN,MOUSE,KEYBOARD,ATTACH,CACHED
Feeding entropy: hn0: got notify, nvs type 128
.
ELF ldconfig path: /lib /usr/lib /usr/lib/compat /usr/local/lib /usr/local/lib/compat/pkg /usr/local/lib/compat/pkg
lo0: link state changed to UP
Kernel page fault with the following non-sleepable locks held:
exclusive rm rib head lock (rib head lock) r = 0 (0xffffa000012278e0) locked @ /usr/src/sys/net/route/route_ctl.c:797
stack backtrace:
The instruction at add_route+0xc4 (the add_route() frame in the backtrace) is a call to rib_notify().
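To help read the esr values in the trace, here is a minimal userland sketch that decodes the exception class and fault status fields of an AArch64 ESR value. The helper names (esr_ec, esr_fsc, esr_describe) are made up for illustration and are not kernel functions; only a handful of exception classes are named.

```c
#include <stdint.h>
#include <stdio.h>

/* Exception class: ESR_ELx bits [31:26]. */
static unsigned
esr_ec(uint64_t esr)
{
	return ((unsigned)((esr >> 26) & 0x3f));
}

/* Fault status code: ESR_ELx bits [5:0], meaningful for abort classes. */
static unsigned
esr_fsc(uint64_t esr)
{
	return ((unsigned)(esr & 0x3f));
}

/* Names for the few exception classes that show up in this report. */
static const char *
ec_name(unsigned ec)
{
	switch (ec) {
	case 0x15: return ("SVC from AArch64");
	case 0x20: return ("instruction abort, lower EL");
	case 0x21: return ("instruction abort, same EL");
	case 0x24: return ("data abort, lower EL");
	case 0x25: return ("data abort, same EL");
	case 0x3c: return ("BRK instruction");
	default:   return ("other");
	}
}

/* Fault kind for status codes of the form 0b00ccLL; NULL for anything else. */
static const char *
fsc_kind(unsigned fsc)
{
	switch (fsc >> 2) {
	case 1: return ("translation fault");
	case 2: return ("access flag fault");
	case 3: return ("permission fault");
	default: return (NULL);
	}
}

/* Format a one-line summary of esr into buf and return buf. */
static char *
esr_describe(uint64_t esr, char *buf, size_t len)
{
	unsigned ec = esr_ec(esr);
	unsigned fsc = esr_fsc(esr);
	int is_abort = (ec == 0x20 || ec == 0x21 || ec == 0x24 || ec == 0x25);

	if (is_abort && fsc_kind(fsc) != NULL)
		/* For these kinds the low two bits are the lookup level. */
		snprintf(buf, len, "%s: %s, level %u", ec_name(ec),
		    fsc_kind(fsc), fsc & 3);
	else if (is_abort)
		snprintf(buf, len, "%s: fault status 0x%02x", ec_name(ec), fsc);
	else
		snprintf(buf, len, "%s", ec_name(ec));
	return (buf);
}
```

Calling esr_describe() on 0x8600000e gives "instruction abort, same EL: permission fault, level 2". Together with elr and far both being 0xffffa0000c93bb00 and lr pointing at rib_notify + 0x50, that would be consistent with rib_notify() branching to an address that is not executable, though the trace alone does not say why.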
This is only seen in the Azure FreeBSD arm64 preview image, not in the x86 preview.
Also, it only happens when the reboot is issued from the serial console; a reboot from the Azure portal does not trigger it.
Is there any update on this issue? It is very easy to reproduce: provision the FreeBSD arm64 testing image from the private preview, then open the serial console and type reboot. The panic shows up within one or two attempts.
One more thing: this happens for the IPv6 address of the loopback interface.
I have added two debug printfs, one in add_route() and one in rib_notify(), and with them we are no longer hitting the panic during reboot. I suspect a race condition, with the delay from the printfs masking it. I have tried 10 reboots with this change and saw no issue.

diff --git a/sys/net/route/route_ctl.c b/sys/net/route/route_ctl.c
index 9c9b148eba19..aaf404d565a1 100644
--- a/sys/net/route/route_ctl.c
+++ b/sys/net/route/route_ctl.c
@@ -1237,6 +1237,7 @@ add_route(struct rib_head *rnh, struct rtentry *rt,
 	rc->rc_nh_new = rnd->rnd_nhop;
 	rc->rc_nh_weight = rnd->rnd_weight;
 
+	printf("add_route called before rib_notify\n");
 	rib_notify(rnh, RIB_NOTIFY_IMMEDIATE, rc);
 	return (0);
 }
diff --git a/sys/net/route/route_subscription.c b/sys/net/route/route_subscription.c
index 510b5117df1b..4a9cc0c5f800 100644
--- a/sys/net/route/route_subscription.c
+++ b/sys/net/route/route_subscription.c
@@ -58,7 +58,7 @@ rib_notify(struct rib_head *rnh, enum rib_subscription_type type,
     struct rib_cmd_info *rc)
 {
 	struct rib_subscription *rs;
-
+	printf("rnh is %s",rnh? "not null": "null");
 	CK_STAILQ_FOREACH(rs, &rnh->rnh_subscribers, next) {
 		if (rs->type == type)
 			rs->func(rnh, rc, rs->arg);
It looks like it happened again on the 11th reboot.
What does rs->func point to in rib_notify when it panics?
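One way to collect that information is to log each subscriber's callback pointer just before it is called, so the value is already on the console if the call faults. The following is a userland sketch of that idea only: the struct and function names (rib_sub_model, notify_logged, count_cb) are hypothetical stand-ins, not the kernel's definitions, and a plain linked list replaces CK_STAILQ. Since the extra printfs reported earlier in this thread seemed to change the timing, instrumentation like this may also hide the problem.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel types; not the real layouts. */
struct rib_head_model;
struct rib_cmd_model {
	int cmd;
};

typedef void rib_cb_t(struct rib_head_model *, struct rib_cmd_model *, void *);

struct rib_sub_model {
	rib_cb_t *func;			/* callback to invoke */
	void *arg;			/* opaque argument for func */
	int type;			/* subscription type to match */
	struct rib_sub_model *next;
};

struct rib_head_model {
	struct rib_sub_model *subscribers;
};

/* Sample callback for the model: counts how often it was invoked. */
static void
count_cb(struct rib_head_model *rnh, struct rib_cmd_model *rc, void *arg)
{
	(void)rnh;
	(void)rc;
	(*(int *)arg)++;
}

/*
 * Walk the subscriber list and call every callback of the given type,
 * printing the entry and its function pointer before each call.
 * Returns the number of callbacks invoked.
 */
static int
notify_logged(struct rib_head_model *rnh, int type, struct rib_cmd_model *rc)
{
	struct rib_sub_model *rs;
	int called = 0;

	for (rs = rnh->subscribers; rs != NULL; rs = rs->next) {
		if (rs->type != type)
			continue;
		/* Log first: if the call below faults, the pointer is known. */
		printf("notify: rs=%p func=%p arg=%p\n", (void *)rs,
		    (void *)(uintptr_t)rs->func, rs->arg);
		rs->func(rnh, rc, rs->arg);
		called++;
	}
	return (called);
}
```

In the panic above, the logged func value could then be compared against elr (0xffffa0000c93bb00) and against the address of the subscription entry itself.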
It looks like the bus_dma alignment patch fixes this issue as well: https://reviews.freebsd.org/D41728
So is this bug 273694 perhaps also related?
(In reply to Mina Galić from comment #11)
No, this one is specific to the VMBus of Hyper-V running on an ARM64 CPU.
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=e7a9817b8d328dda04069b65944ce2ed6f54c6f0

commit e7a9817b8d328dda04069b65944ce2ed6f54c6f0
Author:     Souradeep Chakrabarti <schakrabarti@microsoft.com>
AuthorDate: 2023-09-14 07:11:25 +0000
Commit:     Wei Hu <whu@FreeBSD.org>
CommitDate: 2023-09-14 07:11:25 +0000

    Hyper-V: vmbus: implementat bus_get_dma_tag in vmbus

    In ARM64 Hyper-V UFS filesystem is getting corruption and those corruptions are consistently happening just after hitting a page boundary. It is unable to correctly read disk blocks into buffers that are not aligned to 512-byte boundaries. It happens because storvsc needs physically contiguous memory which may not be the case when bus_dma needs to create a bounce buffer. This can happen when the destination is not cache-line aligned.

    Hyper-V VMs have VMbus synthetic devices and PCI pass-thru devices that are added dynamically via the VMbus protocol and are not represented in the ACPI DSDT. Only the top level VMbus node exists in the DSDT. As such, on ARM64 these devices don't pick up coherence information and default to not hardware coherent.

    PR: 267654, 272666
    Reviewed by: andrew, whu
    Tested by: lwhsu
    MFC after: 3 days
    Sponsored by: Microsoft
    Differential Revision: https://reviews.freebsd.org/D41728

 sys/dev/hyperv/vmbus/vmbus.c     | 33 +++++++++++++++++++++++++++++++++
 sys/dev/hyperv/vmbus/vmbus_var.h |  1 +
 2 files changed, 34 insertions(+)
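The commit message ties bounce buffering to destinations that are not cache-line aligned on devices treated as non-coherent. As a rough illustration, the sketch below tests only that alignment condition in userland. The function name is hypothetical, and the model is simplified: the kernel's bus_dma code takes more state into account (tag and map flags, for example) before deciding to bounce.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if a buffer does not start on a cache-line boundary or
 * does not have a length that is a whole number of cache lines, i.e. it
 * shares a line with neighbouring data.
 * Simplified model; cacheline must be a power of two.
 */
static bool
spans_partial_cacheline(uint64_t addr, uint64_t len, uint64_t cacheline)
{
	assert(cacheline != 0 && (cacheline & (cacheline - 1)) == 0);
	return (((addr | len) & (cacheline - 1)) != 0);
}
```

With the 64-byte D-cacheline from the boot log, a 512-byte sector buffer at address 0x1000200 passes the check, while the same buffer shifted by 16 bytes to 0x1000210 does not and would be a candidate for bouncing under this model.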
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=85bc81352e4b0d0a9da251bacec35eec130eee49

commit 85bc81352e4b0d0a9da251bacec35eec130eee49
Author:     Souradeep Chakrabarti <schakrabarti@microsoft.com>
AuthorDate: 2023-09-14 07:11:25 +0000
Commit:     Wei Hu <whu@FreeBSD.org>
CommitDate: 2023-09-18 10:26:59 +0000

    Hyper-V: vmbus: implementat bus_get_dma_tag in vmbus

    In ARM64 Hyper-V UFS filesystem is getting corruption and those corruptions are consistently happening just after hitting a page boundary. It is unable to correctly read disk blocks into buffers that are not aligned to 512-byte boundaries. It happens because storvsc needs physically contiguous memory which may not be the case when bus_dma needs to create a bounce buffer. This can happen when the destination is not cache-line aligned.

    Hyper-V VMs have VMbus synthetic devices and PCI pass-thru devices that are added dynamically via the VMbus protocol and are not represented in the ACPI DSDT. Only the top level VMbus node exists in the DSDT. As such, on ARM64 these devices don't pick up coherence information and default to not hardware coherent.

    PR: 267654, 272666
    Reviewed by: andrew, whu
    Tested by: lwhsu
    MFC after: 3 days
    Sponsored by: Microsoft
    Differential Revision: https://reviews.freebsd.org/D41728

    (cherry picked from commit e7a9817b8d328dda04069b65944ce2ed6f54c6f0)

 sys/dev/hyperv/vmbus/vmbus.c     | 33 +++++++++++++++++++++++++++++++++
 sys/dev/hyperv/vmbus/vmbus_var.h |  1 +
 2 files changed, 34 insertions(+)
A commit in branch releng/14.0 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=48a799af88c0c77963ffb23eba797f85f86751dd

commit 48a799af88c0c77963ffb23eba797f85f86751dd
Author:     Souradeep Chakrabarti <schakrabarti@microsoft.com>
AuthorDate: 2023-09-14 07:11:25 +0000
Commit:     Wei Hu <whu@FreeBSD.org>
CommitDate: 2023-09-18 14:57:57 +0000

    Hyper-V: vmbus: implementat bus_get_dma_tag in vmbus

    In ARM64 Hyper-V UFS filesystem is getting corruption and those corruptions are consistently happening just after hitting a page boundary. It is unable to correctly read disk blocks into buffers that are not aligned to 512-byte boundaries. It happens because storvsc needs physically contiguous memory which may not be the case when bus_dma needs to create a bounce buffer. This can happen when the destination is not cache-line aligned.

    Hyper-V VMs have VMbus synthetic devices and PCI pass-thru devices that are added dynamically via the VMbus protocol and are not represented in the ACPI DSDT. Only the top level VMbus node exists in the DSDT. As such, on ARM64 these devices don't pick up coherence information and default to not hardware coherent.

    Approved by: re (gjb)
    PR: 267654, 272666
    Reviewed by: andrew, whu
    Tested by: lwhsu
    Sponsored by: Microsoft
    Differential Revision: https://reviews.freebsd.org/D41728

    (cherry picked from commit e7a9817b8d328dda04069b65944ce2ed6f54c6f0)
    (cherry picked from commit 85bc81352e4b0d0a9da251bacec35eec130eee49)

 sys/dev/hyperv/vmbus/vmbus.c     | 33 +++++++++++++++++++++++++++++++++
 sys/dev/hyperv/vmbus/vmbus_var.h |  1 +
 2 files changed, 34 insertions(+)