People are seeing different call-traces with FreeBSD 11.2 and/or the latest CURRENT code recently; some call-traces are mentioned here: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220453

#1: panic in namei(): https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220453#c5

#2: panic in vmbus_msghc_exec(): https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220453#c7
It's reported that FreeBSD-12.0-CURRENT-amd64-20180618-r335317-disc1.iso also reproduces this panic.

#3: No detailed call-trace; it looks like the panic happens just after root is mounted: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220453#c14

However, it looks like I can't reproduce the issue any more after a fresh build of a kernel at the same version. I'm not sure if this is some kind of weird memory corruption issue.
Thanks, pete, for reporting the issue! No need for you to test UFS; I think we can reproduce the issue with UFS, and our internal testing team is trying to narrow it down by bisecting. In particular, symptom #2 happens in the Hyper-V vmbus driver, which hasn't been updated for 8 months except for the PTI patch. I suspect something unusual, like unexpected memory corruption, is happening here.
This is definitely a better place to discuss this; sorry for hijacking the other thread! Just so you know, I rebuilt with the latest STABLE and have just rebooted using that, loading the module at boot time, and it seems to be panicking again. Unfortunately my serial console has also stopped working on the test machine, so I can't verify that it's the same panic. I will fiddle about, see if I can get that back somehow, and let you know.
(In reply to pete from comment #2) Hi pete, can you please help to test the FreeBSD 11.2-RC3 iso (https://download.freebsd.org/ftp/releases/amd64/amd64/ISO-IMAGES/11.2/FreeBSD-11.2-RC3-amd64-dvd1.iso)? Our test team didn't find the issue with this .iso. The latest HEAD is indeed broken, and the first bad commit is https://github.com/freebsd/freebsd/commit/8222f5cb7cdf1ad98c5f281437e35ec2d5da0b96 (or SVN r335068) according to our test team, but I haven't verified it myself. It looks like Olivier Houchard fixed it with r335182: https://www.mail-archive.com/svn-src-all@freebsd.org/msg164885.html
I am not sure how I would go about booting from an ISO in Azure, as I have always done source upgrades, but I can build the latest 11.2 tree, from which that ISO was made, with no options and try that. I'll have a go tomorrow; the test machine I upgraded to -STABLE is locking up in Azure a lot and is currently inaccessible, unfortunately.
(In reply to pete from comment #4) Building the latest 11.2 tree should be equivalent to the 11.2 iso. 11.2 will be released very soon; if there is a major issue with it, we must try to fix it ASAP and request an errata kernel update. We've been short of hands to test FreeBSD and fix bugs recently... According to our test team, r335182 doesn't fix the panic issue. :-(
I am building from 11.2 now. Looking at the commits you referenced, the big change was for kernel malloc to return non-executable memory unless requested, yes? If that were causing the panic, though, would it not also affect the module when loaded after booting, which seems to work fine? By the way, when I am using this, I should only have config in rc.conf for the mlxen0 device, and not simultaneously for hn0, if I understand this right? Will let you know the test results in an hour or so, hopefully.
[FreeBSD boot loader banner and boot menu elided]
/boot/kernel/kernel text=0x153d048 data=0x143f70+0x4bc418 syms=[0x8+0x16ad00 +0x8+0x183cac]
/boot/entropy size=0x1000
/boot/kernel/mlx4.ko size 0x6a6e0 at 0x222d000
loading required module 'linuxkpi'
/boot/kernel/linuxkpi.ko size 0x26550 at 0x2298000
/boot/kernel/aesni.ko size 0xaec8 at 0x22bf000
/boot/kernel/cryptodev.ko size 0x7740 at 0x22ca000
/boot/kernel/zfs.ko size 0x37ff00 at 0x22d2000
loading required module 'opensolaris'
/boot/kernel/opensolaris.ko size 0xa380 at 0x2652000
Booting...
Copyright (c) 1992-2018 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 11.2-RELEASE #0 r335553: Fri Jun 22 11:10:58 UTC 2018
    webadmin@test-machine:/usr/obj/usr/src/sys/GENERIC amd64
FreeBSD clang version 6.0.0 (tags/RELEASE_600/final 326565) (based on LLVM 6.0.0)
SRAT: Ignoring memory at addr 0x4e0200000
SRAT: Ignoring memory at addr 0x1000000000
SRAT: Ignoring memory at addr 0x10000200000
SRAT: Ignoring memory at addr 0x20000200000
VT(vga): text 80x25
Hyper-V Version: 10.0.14393 [SP0]
  Features=0x2e7f<VPRUNTIME,TMREFCNT,SYNIC,SYNTM,APIC,HYPERCALL,VPINDEX,REFTSC,IDLE,TMFREQ>
  PM Features=0x0 [C2]
  Features3=0xed7b2<DEBUG,XMMHC,IDLE,NUMA,TMFREQ,SYNCMC,CRASH,NPIEP>
Timecounter "Hyper-V" frequency 10000000 Hz quality 2000
CPU: Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz (2394.46-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x306f2  Family=0x6  Model=0x3f  Stepping=2
  Features=0x1f83fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE,SSE2,SS,HTT>
  Features2=0xfed83223<SSE3,PCLMULQDQ,VMX,SSSE3,FMA,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND,HV>
  AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
  AMD Features2=0x21<LAHF,ABM>
  Structured Extended Features=0x23a9<FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,NFPUSG>
  XSAVE Features=0x1<XSAVEOPT>
  VT-x: PAT,HLT,PAUSE,EPT,VPID
Hypervisor: Origin = "Microsoft Hv"
real memory  = 17179869184 (16384 MB)
avail memory = 16513617920 (15748 MB)
Event timer "LAPIC" quality 100
ACPI APIC Table: <VRTUAL MICROSFT>
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s) x 2 hardware threads
ioapic0: Changing APIC ID to 0
ioapic0 <Version 1.1> irqs 0-23 on motherboard
SMP: AP CPU #3 Launched!
SMP: AP CPU #1 Launched!
SMP: AP CPU #2 Launched!
random: entropy device external interface
Timecounter "Hyper-V-TSC" frequency 10000000 Hz quality 3000
kbd1 at kbdmux0
netmap: loaded module
module_register_init: MOD_LOAD (vesa, 0xffffffff80feab30, 0) error 19
random: registering fast source Intel Secure Key RNG
random: fast provider: "Intel Secure Key RNG"
nexus0
vtvga0: <VT VGA driver> on motherboard
cryptosoft0: <software crypto> on motherboard
aesni0: <AES-CBC,AES-XTS,AES-GCM,AES-ICM> on motherboard
acpi0: <VRTUAL MICROSFT> on motherboard
acpi0: Power Button (fixed)
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
cpu2: <ACPI CPU> on acpi0
cpu3: <ACPI CPU> on acpi0
attimer0: <AT timer> port 0x40-0x43 irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
atrtc0: <AT realtime clock> port 0x70-0x71 irq 8 on acpi0
atrtc0: registered as a time-of-day clock, resolution 1.000000s
Event timer "RTC" frequency 32768 Hz quality 0
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <32-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pcib0: Length mismatch for 3 range: 20000000 vs 1fffffff
vmbus0: <Hyper-V Vmbus> on pcib0
pci0: <ACPI PCI bus> on pcib0
isab0: <PCI-ISA bridge> at device 7.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel PIIX4 UDMA33 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 7.1 on pci0
ata0: <ATA channel> at channel 0 on atapci0
ata1: <ATA channel> at channel 1 on atapci0
pci0: <bridge> at device 7.3 (no driver attached)
vgapci0: <VGA-compatible display> mem 0xf8000000-0xfbffffff irq 11 at device 8.0 on pci0
vgapci0: Boot video device
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: model IntelliMouse Explorer, device ID 4
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
uart0: console (115200,n,8,1)
uart1: <16550 or compatible> port 0x2f8-0x2ff irq 3 on acpi0
fdc0: <floppy drive controller (FDE)> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
vmbus_res0: <Hyper-V Vmbus Resource> irq 5,7 on acpi0
orm0: <ISA Option ROM> at iomem 0xc0000-0xcbfff on isa0
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
ppc0: cannot reserve I/O port range
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
Timecounters tick every 10.000 msec
usb_needs_explore_all: no devclass
vmbus0: version 3.0
hvet0: <Hyper-V event timer> on vmbus0
Event timer "Hyper-V" frequency 10000000 Hz quality 1000
storvsc0: <Hyper-V IDE> on vmbus0
storvsc1: <Hyper-V IDE> on vmbus0
hvkbd0: <Hyper-V KBD> on vmbus0
hvheartbeat0: <Hyper-V Heartbeat> on vmbus0
hvkvp0: <Hyper-V KVP> on vmbus0
hvshutdown0: <Hyper-V Shutdown> on vmbus0
hvtimesync0: <Hyper-V Timesync> on vmbus0
hvvss0: <Hyper-V VSS> on vmbus0
storvsc2: <Hyper-V SCSI> on vmbus0
da0 at blkvsc0 bus 0 scbus2 target 0 lun 0
da0: <Msft Virtual Disk 1.0> Fixed Direct Access SPC-3 SCSI device
da1 at blkvsc1 bus 0 scbus3 target 1 lun 0
da0: 300.000MB/s transfers
da0: Command Queueing enabled
da1: <Msft Virtual Disk 1.0> Fixed Direct Access SPC-3 SCSI device
da0: 30720MB (62914560 512 byte sectors)
storvsc3: da1: 300.000MB/s transfers
da1: Command Queueing enabled
da1: 32768MB (67108864 512 byte sectors)
<Hyper-V SCSI> on vmbus0
hn0: <Hyper-V Network Interface> on vmbus0
hn0: got notify, nvs type 128
hn0: Ethernet address: 00:0d:3a:20:9c:a0
hn0: link state changed to UP
pcib1: <Hyper-V PCI Express Pass Through> on vmbus0
Trying to mount root from zfs:test-zroot/ROOT/default []...
pci1: <PCI bus> on pcib1
mlx4_core0: <mlx4_core> at device 2.0 on pci1
<6>mlx4_core: Mellanox ConnectX core driver v3.4.1 (October 2017)
mlx4_core: Initializing mlx4_core
mlx4_core0: Detected virtual function - running in slave mode
mlx4_core0: Sending reset
mlx4_core0: Sending vhcr0
mlx4_core0: HCA minimum page size:512
mlx4_core0: Timestamping is not supported in slave mode

Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address = 0x1d4
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80bb3474
stack pointer = 0x28:0xfffffe03e27be9d0
frame pointer = 0x28:0xfffffe03e27be9d0
code segment = base rx0, limit 0xfffff, type 0x1b
One thing I did notice: if I boot the machine and load the module by hand, it doesn't crash, but it also no longer detects the Mellanox card, unlike what I was seeing in STABLE. I'm sorry this didn't work; I hope the above trace is useful, as you can see I did it on a clean build of 11.2-RELEASE. I won't have time to look at anything more until after the weekend, I suspect (family & kids). What I can try next week is to create a machine from one of your images and then upgrade it to 11.2 from sources and see if that works, to eliminate any possibility of this being affected by my ZFS environment and the fact I load those modules at boot too. Thanks for your attention to this so far; I really would like to get it working, but I also need ZFS.
Addendum - I created a complete new machine using the UFS image "MicrosoftOSTC:FreeBSD:11.1:latest" and did a source upgrade to 11.2-RELEASE using the version I compiled earlier. This also panics if you load the module on boot and accelerated networking is enabled. I do have the tuning parameter "CPUTYPE?=core2" in src.conf; is there the remotest possibility that this is a compiler bug which only manifests when compiling for that architecture, and that a vanilla build would work OK? I don't really have the time to do another build now, unfortunately, and I suspect I am just clutching at straws here.
(In reply to pete from comment #6)
> By the way, when I am using this, I should only have config in rc.conf
> for the mlxen0 device, and not simultaneously for hn0 if I understand
> this right ?
I think you're talking about the network configuration in /etc/rc.conf. What we should use is the opposite. :-) In rc.conf, we should only configure hn0 rather than mlxen0:
ifconfig_hn0="DHCP"
As I mentioned, thanks to sephe's work, mlxen0 can automatically work in "bond mode" with hn0, and we don't need to use lagg. Please let us know if this doesn't work. At least it's supposed to work this way, but obviously we didn't have enough resources to fully test it.
That first bad commit was reported by our test engineer. It turns out the bisection is unreliable, but I do suspect something outside of the Hyper-V drivers causes the panic, probably in the mlx driver or the generic memory management code. @pete, thanks very much for your help! It turns out both 11.2 and HEAD are broken, but in my environment somehow I can't always reproduce it. I'll try to look into this next week.
(In reply to Dexuan Cui from comment #10) Heh, yes, I suddenly realised how it was designed to work last night after building the UFS machine and seeing how that was configured. Obvious when I thought about it; nice piece of design, by the way. Is there anything else I can do to help debug this? Obviously it's not going to be fixed for 11.2, but it's a bit worrying if it's something in the generic memory management code! I could give you access to one of my images which shows the problem...
(In reply to pete from comment #12) Our test team can reproduce the issue. I'm going to dig into it.
A commit references this bug: Author: dexuan Date: Sat Jul 7 00:41:05 UTC 2018 New revision: 336054 URL: https://svnweb.freebsd.org/changeset/base/336054 Log: hyperv: Fix boot-up after malloc() returns memory of NX by default now FreeBSD VM can't boot up on Hyper-V after the recent malloc change in r335068: Make UMA and malloc(9) return non-executable memory in most cases. The hypercall page here must be executable. Fix the boot-up issue by adding M_EXEC. PR: 229167 Sponsored by: Microsoft Changes: head/sys/dev/hyperv/vmbus/hyperv.c
(In reply to pete from comment #12) I installed a 11.2 VM on my local Hyper-V host with http://ftp.freebsd.org/pub/FreeBSD/releases/ISO-IMAGES/11.2/FreeBSD-11.2-RELEASE-amd64-dvd1.iso, and can't reproduce the issue with MLX VF. I also installed a 12-CURRENT VM on my local Hyper-V host with http://ftp.freebsd.org/pub/FreeBSD/snapshots/ISO-IMAGES/12.0/FreeBSD-12.0-CURRENT-amd64-20180709-r336134-disc1.iso, and can't repro any panic issue either ("#2: panic in vmbus_msghc_exec()" has been fixed by r336054) I used mlx4en_load="YES" in "/boot/loader.conf". Need to figure out how to reproduce the issue...
(In reply to Dexuan Cui from comment #15) I also created an 11.1 VM from the MicrosoftOSTC:FreeBSD:11.1:latest image and updated it to 11.2 (11.2-RELEASE FreeBSD 11.2-RELEASE #0 r335510: Fri Jun 22 04:32:14 UTC 2018 root@releng2.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64), and can't reproduce the issue: I have mlx4en_load="YES" in /boot/loader.conf, and the mlxen0 interface appears in "ifconfig" as expected. According to the call-trace, the panic we previously saw happens in namei(), which is in VFS code. So my feeling is: either the file system is somehow corrupt, or there is a race-condition bug in the VFS code?
(In reply to Dexuan Cui from comment #16) Thats interesting, as I did the same thing I thought (built from 11.1, upgraded to 11.2) and go the panic. How did you do your upgrade ? I built mine from source, did you use the binary upgrades ? I don't have time to try this again today, but might get a chance tomorrow. Will try a binary upgrade to 11.2 if thats what you did, and if that works will then try a recompile from the source.
(In reply to pete from comment #17) I did both a source-code update and a binary update, and can't repro the issue either way. For the binary update, I followed https://www.freebsd.org/doc/handbook/updating-upgrading-freebsdupdate.html:
freebsd-update -r 11.2-RELEASE upgrade
freebsd-update install
reboot
freebsd-update install
OK, I just did this and (to my surprise) I have a running machine, with accelerated networking, on 11.2. I will try building from source on this version, and then on the version I got the panic with. Odd that when I upgraded from source it crashed, but hopefully we can narrow this down a bit now!
(In reply to Dexuan Cui from comment #18) It's ZFS. I built a UFS machine the way you did, then did a binary upgrade to 11.2, then a source upgrade to STABLE. All fine. Then I copied the data to another disk, formatted as a pool, so I have two identical drives differing only in filesystem. The UFS drive boots fine; the ZFS one panics when it loads the mlx4en driver, as we have seen before. The filesystem is not corrupt, so there's some odd interaction somewhere. Kind of worrying! By the way, I do have real hardware, with real Mellanox cards, which boots fine from ZFS, but they are only ConnectX-2 cards.
(In reply to pete from comment #20) I've never used ZFS -- I don't even know how to format a ZFS partition. :-) But previously I did reproduce the panic when the mlx VF driver loads, i.e. the panic in namei(), and now I can't reproduce it any more... BTW, can you reproduce the same panic with the latest 12-CURRENT + ZFS? I'm wondering if the issue also exists in 12-CURRENT. I'll try to reproduce the issue with ZFS, as you did.
(In reply to Dexuan Cui from comment #21) It's interesting that you got the panic on UFS previously; so did I, but I can't reproduce it any more either. However, it is there on ZFS. ZFS is superb (you have missed out if you haven't used it), but this is not the place for ZFS advocacy. :-)
A quick guide would be to simply attach a drive and partition it as you would for UFS, except using freebsd-zfs for the type of the main partition. Use /boot/gptzfsboot instead of /boot/gptboot in the boot partition, though. Then make a pool; I usually create a separate filesystem for booting, like this:
zpool create tank da2p2
zfs create tank/ROOT
zfs create tank/ROOT/default
Copy the existing system to it using cpdup:
cpdup -v -i0 / /tank/ROOT/default
Finally, set that filesystem as the one to be booted from the pool:
zpool set bootfs=tank/ROOT/default tank
And you are good to go... (typed from memory BTW! hopefully all correct...)
I am just doing a rebuild with CURRENT to see if that has the same issue. I will build and boot on UFS first, then copy to a ZFS filesystem using the above method for testing. I'll let you know how it goes.
(In reply to Dexuan Cui from comment #21) So, I rebuilt with CURRENT, and it booted fine on UFS. Then I rebooted it a couple of times, and the third time it did this:

mlxen0: Ethernet address: 00:0d:3a:2f:f9:61
<4>mlx4_en: mlx4_core0: Port 1: Using 4 TX rings
mlxen0: link state changed to DOWN
<4>mlx4_en: mlx4_core0: Port 1: Using 4 RX rings
<4>mlx4_en: mlxen0: Using 4 TX rings
hn0: link state changed to DOWN
<4>mlx4_en: mlxen0: Using 4 RX rings
<4>mlx4_en: mlxen0: Initializing port

Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address = 0x1d8
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80c6a3ea
stack pointer = 0x28:0xfffffe00004bb950
frame pointer = 0x28:0xfffffe00004bb960
code segment = base rx0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 0 (vmbusdev)
[ thread pid 0 tid 100105 ]
Stopped at vrefact+0xa: cmpl $0x4,ll+0x1b7(%r14)
db>

Hmmmm... that's the same (UFS-based) machine being rebooted, no changes, so it looks unpredictable. :-( I am going to try it again and also try with ZFS if I can, but it does look like it shows up in CURRENT, just not every time.
Also worth noting that the 3rd reboot was after a size change. Even on UFS on 11.2, I did find that sometimes the networking didn't work when it booted, and I had to redeploy a couple of times to get it working. Maybe there is some dependency on the physical host somehow? Which would be very nasty. :-(
OK, this is interesting. With CURRENT booting from UFS: it works fine if the machine size is D16s_v3, but panics, as above, if the machine size is D4s_v3.
More tests - I built a ZFS machine as D16s_v3, and that boots fine. Then I resized both of them back to D4s_v3. The UFS machine now boots (which it didn't at that size before) but the ZFS machine doesn't. So it's looking like either a race condition (so the number of CPUs makes a difference) or some dependency on the underlying hardware (as the resize may move the VM to a different host, presumably?). Either way, it's depressingly inconsistent. :-( Once it happens, it's reproducible, though. Do you need me to do any more testing?
Here is a 'bt' from the last ZFS boot attempt, in case it helps.

ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
Timecounters tick every 10.000 msec
usb_needs_explore_all: no devclass
vmbus0: version 3.0
hvet0: <Hyper-V event timer> on vmbus0
Event timer "Hyper-V" frequency 10000000 Hz quality 1000
storvsc0: <Hyper-V IDE> on vmbus0
storvsc1: <Hyper-V IDE> on vmbus0
hvkbd0: <Hyper-V KBD> on vmbus0
hvheartbeat0: <Hyper-V Heartbeat> on vmbus0
hvkvp0: <Hyper-V KVP> on vmbus0
da0 at blkvsc0 bus 0 scbus2 target 0 lun 0
da0: <Msft Virtual Disk 1.0> Fixed Direct Access SPC-3 SCSI device
da0: 300.000MB/s transfers
da0: Command Queueing enabled
da0: 30720MB (62914560 512 byte sectors)
hvshutdown0: da1 at blkvsc1 bus 0 scbus3 target 1 lun 0
da1: <Msft Virtual Disk 1.0> Fixed Direct Access SPC-3 SCSI device
da1: 300.000MB/s transfers
da1: Command Queueing enabled
da1: 32768MB (67108864 512 byte sectors)
<Hyper-V Shutdown> on vmbus0
hvtimesync0: <Hyper-V Timesync> on vmbus0
hvtimesync0: RTT
hvvss0: <Hyper-V VSS> on vmbus0
hn0: <Hyper-V Network Interface> on vmbus0
hn0: got notify, nvs type 128
hn0: Ethernet address: 00:0d:3a:2e:a3:c7
hn0: link state changed to UP
storvsc2: <Hyper-V SCSI> on vmbus0
storvsc3: <Hyper-V SCSI> on vmbus0
pcib1: <Hyper-V PCI Express Pass Through> on vmbus0
WARNING: WITNESS option enabled, expect reduced performance.
Trying to mount root from zfs:zroot/ROOT/default []...
pci1: <PCI bus> on pcib1
mlx4_core0: <mlx4_core> at device 2.0 on pci1
<6>mlx4_core: Mellanox ConnectX core driver v3.4.1 (October 2017)
mlx4_core: Initializing mlx4_core
mlx4_core0: Detected virtual function - running in slave mode
mlx4_core0: Sending reset
mlx4_core0: Sending vhcr0
mlx4_core0: HCA minimum page size:512
mlx4_core0: Timestamping is not supported in slave mode
mlx4_en mlx4_core0: Activating port:1
mlxen0: Ethernet address: 00:0d:3a:2e:a3:c7
<4>mlx4_en: mlx4_core0: Port 1: Using 4 TX rings
mlxen0: link state changed to DOWN
<4>mlx4_en: mlx4_core0: Port 1: Using 4 RX rings
hn0: link state changed to DOWN
<4>mlx4_en: mlxen0: Using 4 TX rings
<4>mlx4_en: mlxen0: Using 4 RX rings
<4>mlx4_en: mlxen0: Initializing port

Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03
fault virtual address = 0x1d8
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80c6a3ea
stack pointer = 0x28:0xfffffe00004bb950
frame pointer = 0x28:0xfffffe00004bb960
code segment = base rx0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 0 (vmbusdev)
[ thread pid 0 tid 100105 ]
Stopped at vrefact+0xa: cmpl $0x4,ll+0x1b7(%r14)
db> bt
Tracing pid 0 tid 100105 td 0xfffff80004a02580
vrefact() at vrefact+0xa/frame 0xfffffe00004bb960
namei() at namei+0x20d/frame 0xfffffe00004bba20
vn_open_cred() at vn_open_cred+0x213/frame 0xfffffe00004bbb60
linker_load_module() at linker_load_module+0x4cf/frame 0xfffffe00004bbe70
kern_kldload() at kern_kldload+0xf1/frame 0xfffffe00004bbec0
mlx4_request_modules() at mlx4_request_modules+0x93/frame 0xfffffe00004bbf80
mlx4_load_one() at mlx4_load_one+0x3196/frame 0xfffffe00004bc630
mlx4_init_one() at mlx4_init_one+0x410/frame 0xfffffe00004bc690
linux_pci_attach() at linux_pci_attach+0x411/frame 0xfffffe00004bc6f0
device_attach() at device_attach+0x3f3/frame 0xfffffe00004bc730
device_probe_and_attach() at device_probe_and_attach+0x71/frame 0xfffffe00004bc760
bus_generic_attach() at bus_generic_attach+0x18/frame 0xfffffe00004bc780
pci_attach() at pci_attach+0xd5/frame 0xfffffe00004bc7c0
device_attach() at device_attach+0x3f3/frame 0xfffffe00004bc800
device_probe_and_attach() at device_probe_and_attach+0x71/frame 0xfffffe00004bc830
bus_generic_attach() at bus_generic_attach+0x18/frame 0xfffffe00004bc850
vmbus_pcib_attach() at vmbus_pcib_attach+0x95e/frame 0xfffffe00004bc940
device_attach() at device_attach+0x3f3/frame 0xfffffe00004bc980
device_probe_and_attach() at device_probe_and_attach+0x71/frame 0xfffffe00004bc9b0
vmbus_add_child() at vmbus_add_child+0x6a/frame 0xfffffe00004bc9e0
taskqueue_run_locked() at taskqueue_run_locked+0x14c/frame 0xfffffe00004bca40
taskqueue_thread_loop() at taskqueue_thread_loop+0x88/frame 0xfffffe00004bca70
fork_exit() at fork_exit+0x84/frame 0xfffffe00004bcab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00004bcab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
db>
I do not know anything about this code or this problem, but I couldn't help noticing that the driver attempts to load a module. Does it ensure that the root filesystem is mounted?
(In reply to pete from comment #27) @pete: You have done some good tests! We now know the issue exists in 12-CURRENT as well. Thanks! I'm doing more tests too, trying to find out which exact line of code causes the panic in namei(), or namei() -> vrefact().
(In reply to Andriy Gapon from comment #28) Even if the root filesystem is not mounted when the mlx4_core driver tries to load mlx4en, I believe we should not get a panic like this. :-(
(In reply to Dexuan Cui from comment #30) Perhaps. It depends on the documented contract of kern_kldload, if it exists. In any case, kern_kldload is used very rarely and, except for mlx4 -- or rather linuxkpi, always in user-initiated actions. So, I wouldn't be surprised if it was designed to work only after the root is mounted.
(In reply to Andriy Gapon from comment #31) Unfortunately, I still can't reproduce the issue. :-(
I created the Marketplace 11.1 VM on Azure, and can't reproduce the issue:
FreeBSD decui-bsd111 11.1-RELEASE-p6 FreeBSD 11.1-RELEASE-p6 #0 r313908+f9756700e2c(releng/11.1)-dirty: Thu Jan 11 23:23:54 CST 2018 root@bsd:/usr/obj/usr/xhx/freebsd/sys/GENERIC amd64
Next, I upgraded it to the latest 11.1, and still no luck:
FreeBSD decui-bsd111 11.1-RELEASE-p13 FreeBSD 11.1-RELEASE-p13 #0: Tue Aug 14 19:34:21 UTC 2018 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64
Finally, I upgraded it to the latest 11.2, and still no luck:
FreeBSD decui-bsd111 11.2-RELEASE-p2 FreeBSD 11.2-RELEASE-p2 #0: Tue Aug 14 21:45:40 UTC 2018 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64
Please let us know if you can still reproduce the issue, especially with the same latest 11.1 and 11.2 releases.
(In reply to Dexuan Cui from comment #32) You are trying to boot from ZFS, yes? I get the panic when I have my root filesystem on ZFS. It seems to work fine on UFS. I will try to reproduce it when I get a moment, though...
(In reply to pete from comment #33) Last Friday, I happened to reproduce the issue again with the latest stable/11. UFS was used here. I'm debugging it now.
(In reply to Dexuan Cui from comment #34) So, I just saw that a new version of the Mellanox driver was imported into 12.0 yesterday; has anyone tried this yet? I am rebuilding on one of my Azure machines and can potentially try it later today.
(In reply to pete from comment #35) I guess nobody has tried this yet, and I doubt the issue will disappear automatically with the new MLX driver, as the panic happened in namei() rather than in a MLX driver function. Cc'd Wei Hu, who can help to further investigate the issue.
I hit this bug on the FreeBSD 12.1 image in Azure with a Mellanox CX3 VF. It looks like the root file system is not available at the time the mlx4 driver tries to load the mlx4en kernel module. Adding a one-second sleep in mlx4_request_modules makes the problem go away, but it doesn't look like a fix to me. At the least, namei() or vrefact() should check whether the vnode is NULL to avoid the panic. Here is the detailed troubleshooting in the debugger I did when the crash happened.

The panic on the console:
----------------------
pci1: <PCI bus> on pcib1
mlx4_core0: <mlx4_core> at device 2.0 on pci1
<6>mlx4_core: Mellanox ConnectX core driver v3.5.1 (April 2019)
mlx4_core: Initializing mlx4_core
mlx4_core0: Detected virtual function - running in slave mode
mlx4_core0: Sending reset
mlx4_core0: Sending vhcr0
mlx4_core0: HCA minimum page size:512
mlx4_core0: Timestamping is not supported in slave mode
mlx4_en mlx4_core0: Activating port:1
mlxen0: Ethernet address: 00:0d:3a:e8:16:18
<4>mlx4_en: mlx4_core0: Port 1: Using 4 TX rings
mlxen0: link state changed to DOWN
<4>mlx4_en: mlx4_core0: Port 1: Using 4 RX rings
<4>mlx4_en: mlxen0: Using 4 TX rings
hn0: link state changed to DOWN
<4>mlx4_en: mlxen0: Using 4 RX rings
<4>mlx4_en: mlxen0: Initializing port
mlx4_core0: About to load mlx4_en

Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address = 0x1d8   <- 0x1d8 is the offset of (struct vnode *)->v_type
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80cb5c34
stack pointer = 0x28:0xfffffe00004f4960
frame pointer = 0x28:0xfffffe00004f4960
code segment = base rx0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 0 (vmbusdev)
trap number = 12
panic: page fault
cpuid = 2
time = 1599838711
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00004f4610
vpanic() at vpanic+0x19d/frame 0xfffffe00004f4660
panic() at panic+0x43/frame 0xfffffe00004f46c0
trap_fatal() at trap_fatal+0x39c/frame 0xfffffe00004f4720
trap_pfault() at trap_pfault+0x49/frame 0xfffffe00004f4780
trap() at trap+0x29f/frame 0xfffffe00004f4890
calltrap() at calltrap+0x8/frame 0xfffffe00004f4890
--- trap 0xc, rip = 0xffffffff80cb5c34, rsp = 0xfffffe00004f4960, rbp = 0xfffffe00004f4960 ---
vrefact() at vrefact+0x4/frame 0xfffffe00004f4960
namei() at namei+0x172/frame 0xfffffe00004f4a20
vn_open_cred() at vn_open_cred+0x221/frame 0xfffffe00004f4b70
linker_load_module() at linker_load_module+0x480/frame 0xfffffe00004f4e90
kern_kldload() at kern_kldload+0xc3/frame 0xfffffe00004f4ee0
mlx4_request_modules() at mlx4_request_modules+0xc2/frame 0xfffffe00004f4fa0
mlx4_load_one() at mlx4_load_one+0x349c/frame 0xfffffe00004f5660
mlx4_init_one() at mlx4_init_one+0x3f0/frame 0xfffffe00004f56b0
linux_pci_attach() at linux_pci_attach+0x432/frame 0xfffffe00004f5710
device_attach() at device_attach+0x3e1/frame 0xfffffe00004f5760
bus_generic_attach() at bus_generic_attach+0x5c/frame 0xfffffe00004f5790
pci_attach() at pci_attach+0xd5/frame 0xfffffe00004f57d0
device_attach() at device_attach+0x3e1/frame 0xfffffe00004f5820
bus_generic_attach() at bus_generic_attach+0x5c/frame 0xfffffe00004f5850
vmbus_pcib_attach() at vmbus_pcib_attach+0x75e/frame 0xfffffe00004f5930
device_attach() at device_attach+0x3e1/frame 0xfffffe00004f5980
device_probe_and_attach() at device_probe_and_attach+0x42/frame 0xfffffe00004f59b0
vmbus_add_child() at vmbus_add_child+0x7b/frame 0xfffffe00004f59e0
taskqueue_run_locked() at taskqueue_run_locked+0x154/frame 0xfffffe00004f5a40
taskqueue_thread_loop() at taskqueue_thread_loop+0x98/frame 0xfffffe00004f5a70
fork_exit() at fork_exit+0x83/frame 0xfffffe00004f5ab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00004f5ab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 0 tid 100080 ]
Stopped at kdb_enter+0x3b: movq $0,kdb_why

db> x/i vrefact,10
vrefact:        pushq %rbp
vrefact+0x1:    movq %rsp,%rbp
vrefact+0x4:    cmpl $0x4,ll+0x1b7(%rdi)   <-- if (__predict_false(vp->v_type == VCHR)) in vrefact()
vrefact+0xb:    jz vrefact+0x1f
vrefact+0xd:    lock addl $0x1,ll+0x19b(%rdi)
vrefact+0x15:   lock addl $0x1,ll+0x19f(%rdi)
vrefact+0x1d:   popq %rbp
vrefact+0x1e:   ret

db> x/i namei+0x160,10
namei+0x160:    call _sx_slock_int
namei+0x165:    movq 0x10(%r13),%rdi
namei+0x169:    movq %rdi,ll+0x7(%rbx)
namei+0x16d:    call vrefact   <--- place in namei() calling vrefact()
namei+0x172:    movq 0x18(%r13),%rax
namei+0x176:    movq %rax,ll+0xf(%rbx)
namei+0x17a:    movq ll+0x77(%rbx),%rax

This is the code in namei():

        /*
         * Get starting point for the translation.
         */
        FILEDESC_SLOCK(fdp);
        ndp->ni_rootdir = fdp->fd_rdir;
        vrefact(ndp->ni_rootdir);    <--- here
        ndp->ni_topdir = fdp->fd_jdir;

fdp is a (struct filedesc *) and was assigned earlier from the current proc's p_fd:

        p = td->td_proc;
        ...
        fdp = p->p_fd;

db> show thread
Thread 100080 at 0xfffff80004a95000:
 proc (pid 0): 0xffffffff81ff2060   <--- pointer to proc
 name: vmbusdev
 stack: 0xfffffe00004f2000-0xfffffe00004f5fff
 flags: 0x4  pflags: 0x200000
 state: RUNNING (CPU 2)
 priority: 8
 container lock: sched lock 2 (0xffffffff81eb3540)
 last voluntary switch: 50 ms ago

db> x/gx 0xffffffff81ff2060,20   (struct proc)
proc0:        0  fffff80003609a60  ffffffff81ff25a0
proc0+0x18:   fffff80009866010  ffffffff81332c23
proc0+0x28:   30000  0
proc0+0x38:   0  fffff8000308ad00
proc0+0x48:   fffff800035168a0 (p_fd)  0
proc0+0x58:   fffff80003084e00  fffff8000308ac00

db> x/gx 0xfffff800035168a0,10   (struct filedesc)
0xfffff800035168a0:  fffff80003516920  0
0xfffff800035168b0:  0 (fd_rdir)  0
0xfffff800035168c0:  fffff80003516ce8  ffffffff
0xfffff800035168d0:  100000012  1
0xfffff800035168e0:  ffffffff812c4e35  2330000
0xfffff800035168f0:  0  21
0xfffff80003516900:  0  0
0xfffff80003516910:  0  0

So at this moment fd_rdir (the root directory vnode) is still NULL.
I am hitting this panic often while creating VMs on Azure with Mellanox VFs on larger-size VMs. Even with the one-second sleep Wei Hu mentioned, I still hit the panic, though the frequency has reduced. The stack trace is exactly the same as the one given in the earlier comment.
The cause of this problem is that the mlx4 driver was loaded before the root filesystem was mounted. On larger VMs the odds of hitting this increase, as more things run in parallel. Instead of adding a one-second delay in the driver, I think you can move the driver loading from /boot/loader.conf to /etc/rc.conf. Remove the line such as
mlx4en_load="YES"
in /boot/loader.conf and add the following line in /etc/rc.conf:
kld_list="mlx4en mlx4ib"
This way the mlx4 driver is loaded after the local disks are mounted, making this problem go away.
Thanks, Wei Hu. We have made the mlx4en module part of the kernel and commented out the call to mlx4_request_modules() temporarily. However, we are looking for a change in namei() or vrefact() to handle the NULL vnode case gracefully and avoid the panic.
(In reply to mammootty from comment #40) This sounds like a bug in the kernel function kern_kldload(): if it can't find a mounted root file system, it should return some error rather than panic.