FreeBSD 10.1 does not boot correctly on the Intel S1200RPv3 board. We ran some tests with mfsBSD based on FreeBSD 10.1 (http://mfsbsd.vx.sk/). The first boot works, but after a subsequent reboot it won't boot and stops at the following message: ACPI APIC Table: <INTEL DENLOW> To fix it, we need to boot a Linux system. After Linux has booted correctly, FreeBSD boots correctly the next time, but on the reboot after that it won't start again. Maybe there is an initialization problem with the ACPI table that Linux handles correctly and FreeBSD does not.
Are you using boot -v?
Created attachment 151448 Intel S1200 FreeBSD 10.1 "ACPI Denlow" I started the system with "Verbose" = "on", but no further information is shown.
Created attachment 151451 Complete Dmesg with verbose on I've added a complete dmesg of an installed FreeBSD 10.1 booting with verbose on. The system hangs for some time at the Denlow message and after that it sometimes boots through to login. But on some random reboots it hangs there forever (strange).
So one thing we are doing between those printfs is starting up the other CPUs. Can you build a custom kernel with printfs sprinkled in sys/amd64/amd64/mp_machdep.c? cpu_mp_start() and native_start_all_aps() would be good places to start.
Created attachment 151604 Bootup with printf enabled I've added some printfs in sys/amd64/amd64/mp_machdep.c to the functions cpu_mp_start() and start_all_aps() (I didn't find native_start_all_aps()...) and booted the system again. The printf messages are the comments of the code blocks inside the functions.
Comment on attachment 151604 Bootup with printf enabled So it looks like it hung trying to start up the second AP? Normally, though, we have a timeout here that fails and prints a message (and offers to panic :-/) if the AP doesn't respond in time. In the past when I've seen similar issues they have either been due to SMI# interference (we disable legacy USB mode on certain Macs early in the boot process to work around similar issues) or they were resolved via a BIOS update. Is this a new Intel platform?
(In reply to John Baldwin from comment #6) Well, it's not a new Intel platform. It is an S1200V3RPL server mainboard running a Xeon E3-1270v3 with 32GB of memory. The current BIOS version is installed; maybe it needs some more fixes. If I disable legacy USB it won't boot via PXE, so that isn't an option...
I think I saw in a thread on stable@ that you said it works fine if you disable HTT? Is that true?
(In reply to John Baldwin from comment #8) Someone else said that it worked, but in my case it doesn't help. Disabling HTT also isn't an option, because we need HTT.
Is there any news on this bug?
I assume you've let it sit for more than 5 seconds? Normally if trying to start an AP times out, it panics after waiting about 5 seconds. I'm puzzled that it is hanging forever without panicking. The reason to test with legacy USB disabled is to try to narrow down a possible cause (similar to HTT), btw. If you are up for one more test, can you also instrument the "start_ap" function to verify that it at least gets into the spin loop and is spinning when it dies? (Maybe print out something periodically in the wait loop. You could start with once a second, but if that doesn't output anything you could make it print more often to see if it is looping at all, or if it is hanging in the DELAY() call.)
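A minimal user-space sketch of the kind of instrumentation suggested above. It is not the kernel's start_ap(); `ap_is_up`, `fake_delay_ms`, `wait_for_ap`, and the `sim_*` variables are hypothetical names that only model a wait loop with periodic progress output.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical simulation state; none of these are FreeBSD identifiers. */
int sim_polls_until_up = -1;   /* -1: the simulated AP never comes up */
int sim_polls_done;

/* Simulated check for "the AP has announced itself". */
int ap_is_up(void)
{
    sim_polls_done++;
    return sim_polls_until_up >= 0 && sim_polls_done > sim_polls_until_up;
}

/* Simulated 1 ms delay; a kernel would busy-wait here (e.g. DELAY(1000)). */
void fake_delay_ms(void)
{
}

/*
 * Poll for up to max_ms milliseconds and print progress every report_ms,
 * so that a hang inside the loop can be told apart from a hang elsewhere.
 * Returns 1 if the AP came up, 0 on timeout.
 */
int wait_for_ap(int max_ms, int report_ms)
{
    assert(max_ms >= 0 && report_ms > 0);
    for (int ms = 0; ms < max_ms; ms++) {
        if (ap_is_up())
            return 1;
        if (ms % report_ms == 0)
            printf("wait ms: %d\n", ms);
        fake_delay_ms();
    }
    return 0;
}
```

If the progress line keeps appearing, the loop is still spinning; if it stops, the hang is in something the loop calls. Lowering `report_ms` narrows down where.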
Created attachment 152254 Printf in start_ap function (In reply to John Baldwin from comment #11) I've waited longer than 5 seconds; the system stayed on for a couple of hours. So I've added some printfs to the start_ap function. As you can see, it starts a couple of APs and then hangs on one. The message "wait ms: 0" is in the delay while-loop, so it does not seem to hang in the DELAY().
Regarding disabling legacy USB: I've tested that and noticed that the system hangs at the same point as with legacy USB enabled, but then it boots correctly after half a minute or so. Maybe it is related and it just does not trigger the error. I hope the provided information helps.
(In reply to Jonas Keidel from comment #12) To be clear, does the machine hang at the image you provided? If so, it appears to be hung in ipi_startup? Can you instrument ipi_startup? The calls in there to lapic_ipi_wait(-1) can potentially hang forever. One thing you can do is to change the '-1's to 1000000 and see if the machine boots. It seems that Linux does this (it just gives up waiting if the ICR bit doesn't clear).
Created attachment 152309 ipi_startup printf As you can see on my screenshot, it hangs there (sometimes) forever. So I've changed the delay time to 1000000 and it boots correctly. Should that be 10 seconds or can it be shorter? Now every boot hangs for quite a long time because of this delay...
And is it good to treat the symptoms rather than the cause? Why doesn't the ICR become ready fast enough on this system? I think this is more interesting than setting a static delay that is unnecessary on most systems...
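A minimal user-space sketch of the difference between waiting forever and giving up after a bounded number of polls. It is not the FreeBSD lapic_ipi_wait() implementation: the register is simulated, and `sim_icr_lo`, `sim_clear_after`, `read_icr_lo`, and `ipi_wait_spins` are hypothetical names. The only hardware detail assumed is that the delivery-status flag is bit 12 of the low ICR word.

```c
#include <assert.h>
#include <stdint.h>

#define DELSTAT_PENDING 0x1000u   /* bit 12 of the ICR low word (delivery status) */

/* Simulated ICR low word; in a kernel this would be a memory-mapped register. */
volatile uint32_t sim_icr_lo;
/* Reads after which the simulated bit clears; -1 means it stays stuck. */
int sim_clear_after = -1;

uint32_t read_icr_lo(void)
{
    if (sim_clear_after == 0)
        sim_icr_lo &= ~DELSTAT_PENDING;
    else if (sim_clear_after > 0)
        sim_clear_after--;
    return sim_icr_lo;
}

/*
 * Spin until the delivery-status bit clears.
 * spins == -1 waits forever; otherwise give up after `spins` polls
 * (always polling at least once).
 * Returns 1 if the IPI was dispatched, 0 if the wait gave up.
 */
int ipi_wait_spins(int spins)
{
    int polls = 0;

    assert(spins >= -1);
    for (;;) {
        if ((read_icr_lo() & DELSTAT_PENDING) == 0)
            return 1;
        if (spins != -1 && ++polls >= spins)
            return 0;
    }
}
```

With `spins == -1` and a bit that never clears, this loop never returns, which matches the hang described above. A finite count turns the same condition into a return value the caller can act on, although the real time it represents depends on CPU speed.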
To be clear, it is hanging in the DELAY() and not in the call to lapic_ipi_raw() to send the first startup IPI? (That is, your printfs are before the line in question, not after?) Making the delay longer would seem to contradict that as if it was going to hang in DELAY() it would seem to hang for a long delay the same as a short one (DELAY just spins on the TSC). The 10 millisecond wait there is what is specified in the original Intel SMP spec as the appropriate delay between INIT and STARTUP. Also, if you are hanging in the DELAY, then ICR has cleared just fine. Perhaps post a diff of your changes to mp_machdep.c just so I can be clear on where the logging has been added? Thanks.
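A minimal user-space sketch of a counter-based busy-wait, to illustrate why the length of such a delay depends only on the counter and not on the state of the ICR. It is not FreeBSD's DELAY(); the counter is simulated, and `sim_tsc`, `sim_tsc_step`, `read_counter`, and `delay_us` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated free-running counter standing in for the TSC. */
uint64_t sim_tsc;
uint64_t sim_tsc_step = 1000;   /* ticks added per read (assumed value) */

uint64_t read_counter(void)
{
    sim_tsc += sim_tsc_step;
    return sim_tsc;
}

/*
 * Busy-wait until at least `us` microseconds' worth of ticks have passed
 * on a counter running at freq_hz. Returns the ticks actually waited.
 */
uint64_t delay_us(uint64_t us, uint64_t freq_hz)
{
    assert(freq_hz >= 1000000);
    uint64_t start = read_counter();
    uint64_t ticks = us * (freq_hz / 1000000);
    uint64_t now = start;

    while (now - start < ticks)
        now = read_counter();
    return now - start;
}
```

For example, `delay_us(10000, 3500000000ULL)` models the 10 millisecond wait on an assumed 3.5 GHz counter. The loop terminates as long as the counter advances, regardless of what the APIC is doing.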
Created attachment 152311 mp_machdep.c patch with printf (In reply to John Baldwin from comment #17) This is a diff of my changes in mp_machdep.c. I always put the printf before the appropriate line.
Comment on attachment 152311 mp_machdep.c patch with printf Ahhh, so you didn't alter the arg passed to DELAY(), but you made the lapic_ipi_wait() calls time out. Ok. That is actually what Linux does too, though I think it might specify a timeout in a time unit (e.g. microseconds) rather than a simple spin count. I will think about this some more to come up with a real patch.
(In reply to John Baldwin from comment #19) That sounds nice that you might have a solution for this! I'm looking forward to the patch fixing this problem!
So it doesn't help that the original Intel MP spec and Intel's later SDM docs conflict with each other. Please try the changes in https://reviews.freebsd.org/D1719
(In reply to John Baldwin from comment #21) Thanks for the patch. I've tested it and it works very well. Sometimes it appears to take a little longer to get the IPI initialized, but it does not hang forever. Maybe there is some more room for improvement? With the Linux kernel there is no delay while initializing the IPI, and the delay does not appear on other boards either. But in its current state the patch works very well, thanks a lot.
I'm not sure what else to change really. Linux waits for up to 100 milliseconds if the ICR is stuck, but aside from that it uses the same set of operations as in this patch. Linux does increment an interrupt counter called "icr_read_retry_count" when it thinks the ICR is stuck. I'm not sure how it would export it, but perhaps you can see if it is advertised somewhere?
(In reply to John Baldwin from comment #23) Maybe there is a problem while shutting down the system. If I boot a Linux system and then boot a FreeBSD 10.1 rescue system (based on mfsBSD), there is no problem. Nothing hangs. If I reboot and start mfsBSD again, it hangs. So there might be a problem while shutting down FreeBSD. Maybe it sets some registers or something similar, which causes the hang during the second boot. Might this be a way to track down the problem?
A commit references this bug:
Author: jhb
Date: Fri Feb 6 18:20:01 UTC 2015
New revision: 278325
URL: https://svnweb.freebsd.org/changeset/base/278325
Log:
Revert the IPI startup sequence to match what is described in the Intel Multiprocessor Specification v1.4. The Intel SDM claims that the INIT IPIs here are invalid, but other systems follow the MP spec instead.
While here, fix the IPI wait routine to accept a timeout in microseconds instead of a raw spin count, and don't spin forever during AP startup. Instead, panic if a STARTUP IPI is not delivered after 20 us.
PR: 196542
Differential Revision: https://reviews.freebsd.org/D1719
MFC after: 2 weeks
Changes:
head/sys/amd64/amd64/mp_machdep.c
head/sys/i386/i386/mp_machdep.c
head/sys/x86/x86/local_apic.c
Does FreeBSD boot fine from a cold boot as well? We don't do anything super special on the APs during shutdown. One thing you can try perhaps is changing the enable_intr() in sys/amd64/amd64/vm_machdep.c cpu_reset() in the #ifdef SMP code to a disable_intr() instead.
Does Revision 278325 fix this issue?
(In reply to chris from comment #27) Yes, this fixed the panic on boot. The remaining issue is that he still sees it take a while after rebooting from FreeBSD (that and another bug report I now have after the commit in question where another machine now panics because the startup IPI takes too long)
Any idea when it will go to 10-STABLE?
(In reply to chris from comment #29) Since this broke other systems on HEAD I want to get that regression fixed before I merge the change.
A commit references this bug:
Author: jhb
Date: Wed Apr 15 16:52:35 UTC 2015
New revision: 281560
URL: https://svnweb.freebsd.org/changeset/base/281560
Log:
MFC 278325,280866: Revert the IPI startup sequence to match what is described in the Intel Multiprocessor Specification v1.4. The Intel SDM claims that
278325: Revert the IPI startup sequence to match what is described in the Intel Multiprocessor Specification v1.4. The Intel SDM claims that the INIT IPIs here are invalid, but other systems follow the MP spec instead. While here, fix the IPI wait routine to accept a timeout in microseconds instead of a raw spin count, and don't spin forever during AP startup. Instead, panic if a STARTUP IPI is not delivered after 20 us.
280866: Wait 100 microseconds for a local APIC to dispatch each startup-related IPI rather than 20. The MP 1.4 specification states in Appendix B.2: "A period of 20 microseconds should be sufficient for IPI dispatch to complete under normal operating conditions". (Note that this appears to be separate from the 10 millisecond (INIT) and 200 microsecond (STARTUP) waits after the IPIs are dispatched.) The Intel SDM is silent on this issue as far as I can tell. At least some hardware requires 60 microseconds as noted in the PR, so bump this to 100 to be on the safe side.
PR: 196542, 197756
Changes:
_U stable/10/
stable/10/sys/amd64/amd64/mp_machdep.c
stable/10/sys/i386/i386/mp_machdep.c
stable/10/sys/x86/x86/local_apic.c
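A minimal user-space sketch of why the dispatch timeout was raised from 20 to 100 microseconds. It is not the code from the commits above: the clock and the APIC are simulated, and `sim_now_us`, `sim_dispatch_latency_us`, `sim_delay_us`, `icr_idle`, and `ipi_wait_us` are hypothetical names. The 60 microsecond latency is the figure quoted in the commit log.

```c
#include <assert.h>

int sim_dispatch_latency_us;   /* simulated time the ICR stays busy */
int sim_now_us;                /* simulated clock, in microseconds */

/* Simulated delay: only advances the model clock. */
void sim_delay_us(int us)
{
    sim_now_us += us;
}

/* Simulated delivery-status check: idle once the latency has elapsed. */
int icr_idle(void)
{
    return sim_now_us >= sim_dispatch_latency_us;
}

/*
 * Wait up to timeout_us microseconds for dispatch, polling once per
 * microsecond. Returns 1 on success, 0 on timeout (the point at which
 * the commit log above says the kernel panics instead).
 */
int ipi_wait_us(int timeout_us)
{
    assert(timeout_us >= 0);
    for (int waited = 0; waited <= timeout_us; waited++) {
        if (icr_idle())
            return 1;
        sim_delay_us(1);
    }
    return 0;
}
```

In this model, hardware that needs 60 microseconds to dispatch an IPI fails `ipi_wait_us(20)` but passes `ipi_wait_us(100)`. Because the limit is expressed in time rather than loop iterations, the outcome does not change with CPU speed.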
I upgraded to 10.2 and the issue still exists.
The issue still exists with 10.3-RELEASE-p7.
The patches in question were present in both 10.2 and 10.3. You can try testing the last RC of 11.0 to see if 11 is also affected. If so, you will need to follow the same procedure used earlier in this PR to instrument the relevant functions to narrow down where the hang occurs.
I have a system running 11.0-STABLE (r307819) with similar symptoms. Every now and then the system fails to (re)boot and hangs at the line: ACPI APIC Table: <INTEL DENLOW> It seems that the code was modified quite a bit. So what patch can I test? This is a remote server that I can only access via ssh or a rescue system.
Is this a server from Hetzner?
Yes, it is.
Ask them to update the BIOS to the latest version and see if this fixes the issue.
Sorry for commenting on a closed bug report, but it seems this bug has been re-introduced (or hasn't been fully fixed). I'm stuck with the same problem on an INTEL NUC7i5BN. The BIOS has been updated to the latest version, and boot hangs for legacy & UEFI with any [11.3|12.0|12.1|13.0]-[RELEASE|STABLE|CURRENT]-mini-memstick images currently available for download... Last few lines of boot -v output:
---
[...]
ACPI APIC Table: <INTEL NUC7i5BN>
Package ID shift: 4
L3 cache ID shift: 4
L2 cache ID shift: 1
L1 cache ID shift: 1
Core ID shift: 1
_
---