When booting FreeBSD 7.0 i386 via PXE with an NFS root (/) on AMD CPUs, it seems that the kernel launches the boot loader again, which then fails and loses its root device:

-8<-8<-8<-8<-8-8<-8<-8<-8<-8<-8<-8<-8<-8<-8<-8<-8<-8<-8<-8<-8<-
PXE version 2.1, real mode entry point @94f8:00d6
BIOS 536kB.3667904kB available memory
FreeBSD/i386 bootstrap loader, Revision 1.1
(root@logan.cse.buffalo.edu, Fri Nov 16 18:54:21 UTC 2007)
pxe_open: server addr: <host IP>
pxe_open: server path: /vol/FreeBSD/RELENG_7/i386/jumpstart
pxe_open: gateway ip: <router IP>
/boot/kernel/kernel text=0x63aaf8 data=0xa5d80+0x57520 syms=[0x4+0x69ce0+0x4+0x857cb]
Consoles: internal video/keyboard
BIOS drive C: is disk0
BIOS 536kB.3667904kB available memory
FreeBSD/i386 bootstrap loader, Revision 1.1
(root@logan.cse.buffalo.edu, Fri Nov 16 18:54:21 UTC 2007)
Can't work out which disk we are booting from.
Guessed BIOS device 0xffffffff not found by probes, defaulting to disk0:
can't load 'kernel'
Type '?' for a list of commands, 'help' for more detailed help.
OK
-8<-8<-8<-8<-8-8<-8<-8<-8<-8<-8<-8<-8<-8<-8<-8<-8<-8<-8<-8<-8<-
[copied by hand: this is displayed on a remote console emulator]

uname -a (from exactly the same OS, but booted from disk):
FreeBSD amnesiac 7.0-BETA2 FreeBSD 7.0-BETA2 #0: Fri Nov 9 06:58:53 CET 2007 root@athos:/usr/obj/usr/src/sys/LAME i386

kernel conf: GENERIC without INET6 and SCTP

* With an amd64 kernel (and loader) everything works fine on the same machine.
* With the 6.2 and 6.1 releases things behave correctly.
* The problem has been present for at least a few weeks (since the beginning of October). We track the source tree every night (cvsup) and no update cured anything; neither did using a pxeboot or a kernel from a BETA* ISO (as in the transcript).
* Using another (similar) CPU leads to the same problem.
* Using the amd64 pxeboot leads to the same result.

Hardware:
+ Fujitsu/Siemens bx630 blade server
+ 2 dual core AMD opteron 870
+ 4 GB.
+ (2) 5704 Broadcom Ethernet

dmesg (booted from disk; the system may be a bit older than in the transcript):

-8<-8<-8<-8<-8-8<-8<-8<-8<-8<-8<-8<-8<-8<-8<-8<-8<-8<-8<-8<-8<-
Copyright (c) 1992-2007 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
  The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.0-BETA2 #0: Tue Nov 6 14:39:38 CET 2007
  root@athos:/usr/obj/usr/src/sys/LAME
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Dual Core AMD Opteron(tm) Processor 870 (1997.40-MHz 686-class CPU)
  Origin = "AuthenticAMD" Id = 0x20f12 Stepping = 2
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x1<SSE3>
  AMD Features=0xe2500800<SYSCALL,NX,MMX+,FFXSR,LM,3DNow!+,3DNow!>
  AMD Features2=0x3<LAHF,CMP>
  Cores per package: 2
real memory = 3756982272 (3582 MB)
avail memory = 3673055232 (3502 MB)
ACPI APIC Table: <PTLTD APIC >
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
 cpu0 (BSP): APIC ID: 0
 cpu1 (AP): APIC ID: 1
 cpu2 (AP): APIC ID: 2
 cpu3 (AP): APIC ID: 3
MADT: Forcing active-low polarity and level trigger for SCI
ioapic0 <Version 1.1> irqs 0-23 on motherboard
ioapic1 <Version 1.1> irqs 24-27 on motherboard
ioapic2 <Version 1.1> irqs 28-31 on motherboard
kbd1 at kbdmux0
ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
acpi0: <PTLTD XSDT> on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
unknown: I/O range not supported
unknown: I/O range not supported
unknown: I/O range not supported
acpi0: reservation of 400, 100 (3) failed
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0xc008-0xc00b on acpi0
cpu0: <ACPI CPU> on acpi0
powernow0: <Cool`n'Quiet K8> on cpu0
cpu1: <ACPI CPU> on acpi0
powernow1: <Cool`n'Quiet K8> on cpu1
cpu2: <ACPI CPU> on acpi0
powernow2: <Cool`n'Quiet K8> on cpu2
cpu3: <ACPI CPU> on acpi0
powernow3: <Cool`n'Quiet K8> on cpu3
acpi_button0: <Power Button> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff,0xc000-0xc07f,0xc080-0xc0ff iomem 0xd8000-0xdbfff on acpi0
pci0: <ACPI PCI bus> on pcib0
pcib1: <ACPI PCI-PCI bridge> at device 6.0 on pci0
pci1: <ACPI PCI bus> on pcib1
ohci0: <OHCI (generic) USB controller> mem 0xe8110000-0xe8110fff irq 19 at device 0.0 on pci1
ohci0: [GIANT-LOCKED]
ohci0: [ITHREAD]
usb0: OHCI version 1.0, legacy support
usb0: SMM does not respond, resetting
usb0: <OHCI (generic) USB controller> on ohci0
usb0: USB revision 1.0
uhub0: <AMD OHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb0
uhub0: 3 ports with 3 removable, self powered
ohci1: <OHCI (generic) USB controller> mem 0xe8111000-0xe8111fff irq 19 at device 0.1 on pci1
ohci1: [GIANT-LOCKED]
ohci1: [ITHREAD]
usb1: OHCI version 1.0, legacy support
usb1: SMM does not respond, resetting
usb1: <OHCI (generic) USB controller> on ohci1
usb1: USB revision 1.0
uhub1: <AMD OHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb1
uhub1: 3 ports with 3 removable, self powered
vgapci0: <VGA-compatible display> port 0x2000-0x20ff mem 0xf0000000-0xf7ffffff,0xe8100000-0xe810ffff irq 17 at device 5.0 on pci1
isab0: <PCI-ISA bridge> at device 7.0 on pci0
isa0: <ISA bus> on isab0
pci0: <bridge> at device 7.3 (no driver attached)
pcib2: <ACPI PCI-PCI bridge> at device 10.0 on pci0
pci2: <ACPI PCI bus> on pcib2
mpt0: <LSILogic SAS/SATA Adapter> port 0x3000-0x30ff mem 0xe8210000-0xe8213fff,0xe8200000-0xe820ffff irq 24 at device 4.0 on pci2
mpt0: [ITHREAD]
mpt0: MPI Version=1.5.13.0
mpt0: mpt_cam_event: 0x16
mpt0: Unhandled Event Notify Frame. Event 0x16 (ACK not required).
mpt0: mpt_cam_event: 0x12
mpt0: Unhandled Event Notify Frame. Event 0x12 (ACK not required).
mpt0: mpt_cam_event: 0x12
mpt0: Unhandled Event Notify Frame. Event 0x12 (ACK not required).
mpt0: mpt_cam_event: 0x16
mpt0: Unhandled Event Notify Frame. Event 0x16 (ACK not required).
pcib3: <ACPI PCI-PCI bridge> at device 11.0 on pci0
pci3: <ACPI PCI bus> on pcib3
pci0:3:4:0: bad VPD cksum, remain 14
bge0: <Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 0x2100> mem 0xe8300000-0xe830ffff irq 28 at device 4.0 on pci3
bge0: Ethernet address: 00:30:05:71:d5:da
bge0: [ITHREAD]
pci0:3:4:1: bad VPD cksum, remain 14
bge1: <Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 0x2100> mem 0xe8310000-0xe831ffff irq 29 at device 4.1 on pci3
bge1: Ethernet address: 00:30:05:71:d5:db
bge1: [ITHREAD]
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbd0: [ITHREAD]
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: [ITHREAD]
psm0: model IntelliMouse Explorer, device ID 4
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
sio0: [FILTER]
pmtimer0 on isa0
orm0: <ISA Option ROM> at iomem 0xc0000-0xcafff pnpid ORM0000 on isa0
fdc0: No FDOUT register!
ppc0: parallel port not found.
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounters tick every 1.000 msec
da0 at mpt0 bus 0 target 0 lun 0
da0: <LSILOGIC Logical Volume 3000> Fixed Direct Access SCSI-2 device
da0: 300.000MB/s transfers
da0: Command Queueing Enabled
da0: 34332MB (70311936 512 byte sectors: 255H 63S/T 4376C)
SMP: AP CPU #1 Launched!
SMP: AP CPU #3 Launched!
SMP: AP CPU #2 Launched!
Trying to mount root from ufs:/dev/da0a
-8<-8<-8<-8<-8-8<-8<-8<-8<-8<-8<-8<-8<-8<-8<-8<-8<-8<-8<-8<-8<-

Now, more weird things:

* Things behave correctly on Intel i386.
* On another AMD motherboard (dual core AMD Athlon 4000 with 1 GB and 5755 Broadcom), the system loads and boots but displays garbage in the NFS path name.
-8<-8<-8<-8<-8-8<-8<-8<-8<-8<-8<-8<-8<-8<-8<-8<-8<-8<-8<-8<-8<-
Copyright (c) 1992-2007 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
  The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.0-BETA2 #0: Fri Nov 9 06:58:53 CET 2007
  root@athos:/usr/obj/usr/src/sys/LAME
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: AMD Athlon(tm) 64 X2 Dual Core Processor 4000+ (2109.62-MHz 686-class CPU)
  Origin = "AuthenticAMD" Id = 0x60fb1 Stepping = 1
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x2001<SSE3,CX16>
  AMD Features=0xea500800<SYSCALL,NX,MMX+,FFXSR,RDTSCP,LM,3DNow!+,3DNow!>
  AMD Features2=0x11f<LAHF,CMP,SVM,ExtAPIC,CR8,Prefetch>
  Cores per package: 2
real memory = 1005649920 (959 MB)
avail memory = 970354688 (925 MB)
ACPI APIC Table: <PTLTD APIC >
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): APIC ID: 0
 cpu1 (AP): APIC ID: 1
MADT: Forcing active-low polarity and level trigger for SCI
ioapic0 <Version 1.1> irqs 0-23 on motherboard
kbd1 at kbdmux0
ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
acpi0: <FSC PC> on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0
acpi_hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0
Timecounter "HPET" frequency 25000000 Hz quality 900
cpu0: <ACPI CPU> on acpi0
powernow0: <PowerNow! K8> on cpu0
cpu1: <ACPI CPU> on acpi0
powernow1: <PowerNow! K8> on cpu1
acpi_button0: <Power Button> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pci0: <memory, RAM> at device 0.0 (no driver attached)
pci0: <memory, RAM> at device 0.1 (no driver attached)
pci0: <memory, RAM> at device 0.2 (no driver attached)
pci0: <memory, RAM> at device 0.3 (no driver attached)
pci0: <memory, RAM> at device 0.4 (no driver attached)
pci0: <memory, RAM> at device 0.5 (no driver attached)
pci0: <memory, RAM> at device 0.6 (no driver attached)
pci0: <memory, RAM> at device 0.7 (no driver attached)
pcib1: <ACPI PCI-PCI bridge> at device 2.0 on pci0
pci1: <ACPI PCI bus> on pcib1
pcib2: <ACPI PCI-PCI bridge> at device 3.0 on pci0
pci2: <ACPI PCI bus> on pcib2
vgapci0: <VGA-compatible display> mem 0xf1000000-0xf1ffffff,0xe0000000-0xefffffff,0xf0000000-0xf0ffffff irq 16 at device 5.0 on pci0
pci0: <memory, RAM> at device 9.0 (no driver attached)
isab0: <PCI-ISA bridge> port 0x8800-0x887f at device 10.0 on pci0
isa0: <ISA bus> on isab0
pci0: <serial bus, SMBus> at device 10.1 (no driver attached)
ohci0: <OHCI (generic) USB controller> mem 0xf2204000-0xf2204fff irq 18 at device 11.0 on pci0
ohci0: [GIANT-LOCKED]
ohci0: [ITHREAD]
usb0: OHCI version 1.0, legacy support
usb0: SMM does not respond, resetting
usb0: <OHCI (generic) USB controller> on ohci0
usb0: USB revision 1.0
uhub0: <nVidia OHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb0
uhub0: 8 ports with 8 removable, self powered
ehci0: <EHCI (generic) USB 2.0 controller> mem 0xf2208000-0xf22080ff irq 19 at device 11.1 on pci0
ehci0: [GIANT-LOCKED]
ehci0: [ITHREAD]
usb1: EHCI version 1.0
usb1: companion controller, 8 ports each: usb0
usb1: <EHCI (generic) USB 2.0 controller> on ehci0
usb1: USB revision 2.0
uhub1: <nVidia EHCI root hub, class 9/0, rev 2.00/1.00, addr 1> on usb1
uhub1: 8 ports with 8 removable, self powered
atapci0: <nVidia nForce MCP51 UDMA133 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x8c00-0x8c0f at device 13.0 on pci0
ata0: <ATA channel 0> on atapci0
ata0: [ITHREAD]
ata1: <ATA channel 1> on atapci0
ata1: [ITHREAD]
atapci1: <nVidia nForce MCP51 SATA300 controller> port 0x8c40-0x8c47,0x8c34-0x8c37,0x8c38-0x8c3f,0x8c30-0x8c33,0x8c10-0x8c1f mem 0xf2205000-0xf2205fff irq 20 at device 14.0 on pci0
atapci1: [ITHREAD]
ata2: <ATA channel 0> on atapci1
ata2: [ITHREAD]
ata3: <ATA channel 1> on atapci1
ata3: [ITHREAD]
atapci2: <nVidia nForce MCP51 SATA300 controller> port 0x8c58-0x8c5f,0x8c4c-0x8c4f,0x8c50-0x8c57,0x8c48-0x8c4b,0x8c20-0x8c2f mem 0xf2206000-0xf2206fff irq 21 at device 15.0 on pci0
atapci2: [ITHREAD]
ata4: <ATA channel 0> on atapci2
ata4: [ITHREAD]
ata5: <ATA channel 1> on atapci2
ata5: [ITHREAD]
pcib3: <ACPI PCI-PCI bridge> at device 16.0 on pci0
pci4: <ACPI PCI bus> on pcib3
pci0: <multimedia> at device 16.1 (no driver attached)
nfe0: <NVIDIA nForce 430 MCP12 Networking Adapter> port 0x8c60-0x8c67 mem 0xf2207000-0xf2207fff irq 23 at device 20.0 on pci0
miibus0: <MII bus> on nfe0
rgephy0: <RTL8169S/8110S/8211B media interface> PHY 1 on miibus0
rgephy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
nfe0: Ethernet address: 00:19:99:15:a7:4f
nfe0: [FILTER]
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbd0: [ITHREAD]
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: [ITHREAD]
psm0: model IntelliMouse, device ID 3
fdc0: <floppy drive controller> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: [FILTER]
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
sio0: [FILTER]
sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
sio1: [FILTER]
pmtimer0 on isa0
orm0: <ISA Option ROM> at iomem 0xcf000-0xcffff pnpid ORM0000 on isa0
ppc0: parallel port not found.
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounters tick every 1.000 msec
ad4: 152627MB <WDC WD1600AAJS-07PSA0 05.06H05> at ata2-master SATA300
SMP: AP CPU #1 Launched!
Trying to mount root from nfs:fas:/vol/FreeBSD/RELENG_7/i386/jumpstart
NFS ROOT: <server IP>:/vol/FreeBSD/RELENG_7/i386/jumpstart
nfe0: tx v2 error 0x6804<FORCEDINT>
nfe0: tx v2 error 0x6804<FORCEDINT>
nfe0: tx v2 error 0x6804<FORCEDINT>
nfe0: tx v2 error 0x6804<FORCEDINT>
nfe0: tx v2 error 0x6804<FORCEDINT>
-8<-8<-8<-8<-8-8<-8<-8<-8<-8<-8<-8<-8<-8<-8<-8<-8<-8<-8<-8<-8<-

The path in the line
NFS ROOT: <server IP>:/vol/FreeBSD/RELENG_7/i386/jumpstart
is garbled on screen but correct in dmesg (as shown above).

* The blade server was originally shipped with 8 GB. Reducing to 4 or 2 GB doesn't cure the problem.

How-To-Repeat:
Boot this system via PXE, with its root served over NFS, on these AMD boards.
Responsible Changed From-To: freebsd-i386->freebsd-bugs This does not seem i386 specific
Having the same problem here. Doing an lsdev likewise does not list any pxe device. This is on a Dell D430 Core 2 Duo.
I'm seeing the very same problem trying to netboot a Thinkpad X60s. As noted in http://jcmdev0.blogspot.com/2008/06/freebsd-70-on-thinkpad-x60s.html using an old pxeboot binary works. In my case, I took the pxeboot binary from a 6.2 install disk. Stefan
We have identified the problem as a heap overflow. If you instrument the code in /usr/src/sys/boot/common/interp.c::include(), you will see that at some point the call to malloc

	sp = malloc(sizeof(struct includeline) + strlen(cp) + 1);

does not return, and pxeboot is restarted.

Tracking the values returned by malloc(), we found that the last successful return is somewhere around 0x77384, with the stack dangerously close to it (somewhere around 0x77900 on the first call to include, but the function is recursive and has a 256-byte local variable). When the heap overflows, my system is processing line 1500 of file /boot/support.4th, which is 1700 lines long and is the last (third or fourth) of a set of nested includes.

Why this occurs only on AMD64 is not completely clear, but it is probably related to the BIOS making less memory available on those boards than on the i386 machines.

In any case the following patch saves enough memory for pxeboot to run to completion with our set of includes. It does this by not saving empty lines (about 200 of them in the offending file, which saves some 6k of memory) and by making a buffer static (saving another 1-2k of memory due to the recursive calls).

Clearly, this is not the way to go on a system with 2GB of memory, and we need to make the entire system more robust :)

cheers
luigi

Index: common/interp.c
===================================================================
RCS file: /home/ncvs/src/sys/boot/common/interp.c,v
retrieving revision 1.29
diff -u -r1.29 interp.c
--- common/interp.c 25 Aug 2003 23:30:41 -0000 1.29
+++ common/interp.c 18 Nov 2008 16:00:57 -0000
@@ -192,7 +192,7 @@
 include(const char *filename)
 {
     struct includeline *script, *se, *sp;
-    char input[256]; /* big enough? */
+    static char input[256]; /* big enough? */
 #ifdef BOOT_FORTH
     int res;
     char *cp;
@@ -236,6 +239,8 @@
 	}
 #endif
 	/* Allocate script line structure and copy line, flags */
+	if (*cp == '\0')
+		continue;
 	sp = malloc(sizeof(struct includeline) + strlen(cp) + 1);
 	sp->text = (char *)sp + sizeof(struct includeline);
 	strcpy(sp->text, cp);
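
For anyone who wants to try the idea outside the loader, here is a minimal stand-alone sketch of the line-saving loop with the empty-line skip from the patch above. It is not the loader code: collect_lines() and free_lines() are hypothetical helpers, the struct fields other than text are guesses modelled on the patch, and the NULL check after malloc() is an addition of mine that the patch does not contain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for the loader's record; layout is an assumption. */
struct includeline {
    char *text;                  /* points just past the struct itself */
    int line;                    /* 1-based line number in the script */
    struct includeline *next;
};

/*
 * Hypothetical helper: save every non-empty line of a NULL-terminated
 * array into a linked list, as include() does for a script file.
 * *stored receives the number of lines kept, *bytes the total number
 * of bytes requested from malloc().
 */
static struct includeline *
collect_lines(const char *const *lines, size_t *stored, size_t *bytes)
{
    struct includeline *head = NULL, *tail = NULL, *sp;
    size_t need;
    int lineno = 0;

    assert(lines != NULL && stored != NULL && bytes != NULL);
    *stored = 0;
    *bytes = 0;
    for (; *lines != NULL; lines++) {
        const char *cp = *lines;

        lineno++;
        if (*cp == '\0')
            continue;            /* empty line: nothing to run later */
        need = sizeof(struct includeline) + strlen(cp) + 1;
        sp = malloc(need);
        if (sp == NULL)
            break;               /* my addition: stop when the heap is full */
        sp->text = (char *)sp + sizeof(struct includeline);
        strcpy(sp->text, cp);
        sp->line = lineno;
        sp->next = NULL;
        if (head == NULL)
            head = sp;
        else
            tail->next = sp;
        tail = sp;
        (*stored)++;
        *bytes += need;
    }
    return head;
}

/* Release a list built by collect_lines(). */
static void
free_lines(struct includeline *head)
{
    while (head != NULL) {
        struct includeline *next = head->next;

        free(head);
        head = next;
    }
}
```

Each skipped empty line saves one struct plus the terminating NUL plus whatever the allocator adds per block, so the total saved depends on the allocator and is not something this sketch measures. Line numbers are still counted for skipped lines, so error messages keep pointing at the right place in the file.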
For bugs matching the following criteria: Status: In Progress Changed: (is less than) 2014-06-01 Reset to default assignee and clear in-progress tags. Mail being skipped
The relevant part of this patch has been applied.