Bug 121558 - Supermicro X7SB4 Fatal trap 12 when ACPI disabled
Summary: Supermicro X7SB4 Fatal trap 12 when ACPI disabled
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 7.0-STABLE
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: freebsd-acpi (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-03-10 12:20 UTC by Leon Kos
Modified: 2008-03-13 18:38 UTC
Description Leon Kos 2008-03-10 12:20:03 UTC
I am getting "Fatal trap 12: page fault while in kernel mode"
when booting a GENERIC debug kernel with ACPI disabled on a Supermicro X7SB4 motherboard with a dual-core Xeon CPU.

instruction pointer     =  0x20:0xc0a49aea
[root@cad ~]# nm -n /boot/kernel/kernel|grep c0a49ae
c0a49ae0 T ioapic_get_vector

A custom kernel faults in the same routine.

Here is dmesg after boot -v: http://pastebin.ca/936377

[root@cad ~]# sysctl hw.acpi
hw.acpi.supported_sleep_state: S1 S4 S5
hw.acpi.power_button_state: S5
hw.acpi.sleep_button_state: S1
hw.acpi.lid_switch_state: NONE
hw.acpi.standby_state: S1
hw.acpi.suspend_state: S3
hw.acpi.sleep_delay: 1
hw.acpi.s4bios: 0
hw.acpi.verbose: 1
hw.acpi.disable_on_reboot: 0
hw.acpi.handle_reboot: 0
hw.acpi.reset_video: 0
hw.acpi.cpu.cx_lowest: C1

acpidump -dt
http://www.lecad.uni-lj.si/~leon/tmp/supermicro-x7sb4.asl

Fix: 

Booting with ACPI enabled does not show this problem, although rebooting then does not work, even with hw.acpi.disable_on_reboot and hw.acpi.handle_reboot set.
How-To-Repeat: Use a Supermicro X7SB4 motherboard with the latest BIOS.
Boot GENERIC with ACPI disabled.
Comment 1 Mark Linimon freebsd_committer freebsd_triage 2008-03-10 13:22:31 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-acpi

Over to maintainer(s).
Comment 2 Mark Linimon freebsd_committer freebsd_triage 2008-03-10 22:48:23 UTC
State Changed
From-To: open->feedback

Note that submitter has been asked for feedback.
Comment 3 Volker 2008-03-11 01:03:14 UTC
Leon,

we're sorry to say so, but I think everybody on the team is bad at
guessing.

Can you please post the actual and _complete_ panic message and a backtrace
as a followup to this ticket? Without that, nobody is able to analyze this.

Please note, the pastebin link does not work.

Thanks!
Comment 4 begunje 2008-03-11 09:16:25 UTC

I am sorry, but my response was discarded as spam because my SMTP site
is multi-homed.
I am still resolving this DNS issue and am now using my alternate e-mail for
followup.

The pastebin link is OK, it was just wrongly converted.  The boot log is now
also at
http://www.lecad.uni-lj.si/~leon/other/x7sb4/936377.html

The acpidump -dt output can also be reached at
http://www.lecad.uni-lj.si/~leon/other/x7sb4/supermicro-x7sb4.asl

---------- Forwarded message ----------
Date: Mon, 10 Mar 2008 17:32:16 +0100 (CET)
From: Leon Kos <leon.kos@lecad.uni-lj.si>
To: Dan Lukes <dan@obluda.cz>
Cc: bug-followup@FreeBSD.org
Subject: Re: kern/121558: Supermicro X7SB4 Fatal trap 12 when ACPI disabled

I have added options KDB and DDB to GENERIC but don't know how to produce
a core dump at boot, so I took pictures of the screen.
I've also opened http://www.freebsd.org/cgi/query-pr.cgi?pr=121558

Photo http://www.lecad.uni-lj.si/~leon/other/x7sb4/img_1636.jpg shows
nm -n /boot/kernel/kernel |grep c0a5ae6
c0a5ae60 T ioapic_get_vector

http://www.lecad.uni-lj.si/~leon/other/x7sb4/img_1630.jpg shows the first
screen of the stack trace.
http://www.lecad.uni-lj.si/~leon/other/x7sb4/img_1631.jpg is a continuation
of the trace.
http://www.lecad.uni-lj.si/~leon/other/x7sb4/img_1632.jpg is the end of
the trace.


http://www.lecad.uni-lj.si/~leon/other/x7sb4/img_1634.jpg is the first screen
after the where command.
http://www.lecad.uni-lj.si/~leon/other/x7sb4/img_1635.jpg is the last screen
of the where command.

http://www.lecad.uni-lj.si/~leon/other/x7sb4/img_1637.jpg shows what is, in
my opinion, an unrelated problem: after the reboot command the screen gets
overprinted and the machine never reboots.
I am attaching it anyway in the hope of a suggestion on how to handle it.
CTRL-ALT-ESC does not work.


Could you instruct me on how to get a kernel core dump manually? I've set
dumpdev=/dev/da0s1b in rc.conf,
but I think this has no effect for kernels that do not reach multiuser.

Kind regards!

Leon Kos, CAD lab, Mech.Eng., University of Ljubljana, Slovenia
(http://www.lecad.uni-lj.si/~leon)

Comment 5 Dan Lukes 2008-03-11 10:53:23 UTC
> ioapic_get_vector(0,13) at ioapic_get_vector+0x0a
> mptable_pci_route_interrupt_handler(c009de3d,c1020890) at mptable_pci_route_interrupt_handler+0x35

The function implementation:

  ------------------
> ioapic_get_vector(void *cookie, u_int pin)
> {
>         struct ioapic *io;
> 
>         io = (struct ioapic *)cookie;
>         if (pin >= io->io_numintr)
...
  ------------------


I.e. ioapic_get_vector seems to be called with a NULL cookie, which is 
used as a valid pointer and dereferenced - exactly as I speculated in my 
previous email.

The function implementation:

  ------------------
> mptable_pci_route_interrupt_handler(u_char *entry, void *arg)
> {
> ...
>         /* Make sure the APIC maps to a known APIC. */
>         KASSERT(ioapics[intr->dst_apic_id] != NULL,
>             ("No I/O APIC %d to route interrupt to", intr->dst_apic_id));
> ...
>         vector = ioapic_get_vector(ioapics[intr->dst_apic_id],
>             intr->dst_apic_int);
  ------------------

	As your kernel is compiled without INVARIANTS, the KASSERT test becomes 
void and ioapic_get_vector may be called with NULL, causing the fault later.

	This happens because intr->dst_apic_id points to an APIC that doesn't 
exist (you can run a kernel with INVARIANTS to display the dst_apic_id in 
question).

	Please note that the MPTABLE generated by the BIOS MAY also change when 
ACPI is (de)activated in the BIOS. You may try to boot with ACPI enabled in 
the BIOS but disabled in the OS. It may (or may not) help.

	The problem may be in the MPTABLE itself (i.e. not in FreeBSD) or in its 
parsing (i.e. in FreeBSD). The MPTABLE is generated by the BIOS. Check whether 
the MPTABLE version can be set in the BIOS, and if so use 1.4. In particular, 
don't use "auto" as the version even if such an option is present.

	The output of the mptable command may help us.

							Dan
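
A minimal C sketch of the failure mode described in this comment, not the actual FreeBSD code: with the KASSERT compiled out, nothing stops a NULL table slot from being dereferenced, so the sketch checks the slot explicitly. The table size, the io_vector_base field and the function name route_checked are hypothetical names introduced only for illustration.

```c
#include <stddef.h>

#define MAX_APIC_ID 16		/* hypothetical table size for this sketch */

struct ioapic {
	unsigned int io_numintr;	/* number of interrupt pins on this I/O APIC */
	int io_vector_base;		/* hypothetical: vector assigned to pin 0 */
};

/* Slots stay NULL for APIC IDs that the MP Table never declared. */
static struct ioapic *ioapics[MAX_APIC_ID + 1];

/*
 * Return the vector for (apic_id, pin), or -1 when the table entry
 * names an I/O APIC that does not exist or a pin that is out of range.
 */
static int
route_checked(unsigned int apic_id, unsigned int pin)
{
	const struct ioapic *io;

	if (apic_id > MAX_APIC_ID)
		return (-1);
	io = ioapics[apic_id];
	if (io == NULL)		/* the case KASSERT only catches with INVARIANTS */
		return (-1);
	if (pin >= io->io_numintr)
		return (-1);
	return (io->io_vector_base + (int)pin);
}
```

With only I/O APIC ID 2 registered in the table, a lookup for ID 0 returns -1 instead of faulting.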
Comment 6 Leon Kos 2008-03-11 13:45:40 UTC
The mptable output of the system is located at:
http://www.lecad.uni-lj.si/~leon/other/x7sb4/mptable.txt

I could not change the version in the BIOS. The processor options supported 
by the BIOS are shown in http://www.lecad.uni-lj.si/~leon/other/x7sb4/img_1638.jpg

I have added
options     INVARIANTS
options     INVARIANT_SUPPORT
options     DIAGNOSTIC

and now GENERIC shows the following trace when ACPI is disabled:
http://www.lecad.uni-lj.si/~leon/other/x7sb4/img_1639.jpg
http://www.lecad.uni-lj.si/~leon/other/x7sb4/img_1640.jpg
http://www.lecad.uni-lj.si/~leon/other/x7sb4/img_1641.jpg


Forcing single core in the BIOS helps when running with ACPI disabled.
Toggling other BIOS options like C1 Enhanced Mode or Speed Step
does not.


Kind regards!

Leon Kos, CAD lab, Mech.Eng., University of Ljubljana, Slovenia
(http://www.lecad.uni-lj.si/~leon)


Comment 7 Volker Werth freebsd_committer freebsd_triage 2008-03-11 14:35:19 UTC
Please don't post HTML mail to a PR ticket! (ticket manually cleared
from a bunch of HTML waste)
Comment 9 Leon Kos 2008-03-11 15:11:45 UTC
I do have the latest BIOS, 1.0a.

How can I manually specify the routing?


Kind regards!

Leon Kos, CAD lab, Mech.Eng., University of Ljubljana, Slovenia
(http://www.lecad.uni-lj.si/~leon)


On Tue, 11 Mar 2008, Dan Lukes wrote:

> John Baldwin napsal/wrote, On 03/11/08 15:03:
>> Your MPTable is broken.  It has 3 entries which use an I/O APIC ID of 0, 
>> but you don't have an I/O APIC with an ID of 0:
>
>> You can work around this by manually specifying the routing
>
> 	You shall update BIOS also unless you have the latest version (1.0a)
>
> 						Dan
>
>
Comment 10 Leon Kos 2008-03-12 08:14:28 UTC
I have added

  hw.pci13.0.INTA.irq="16"
  hw.pci15.0.INTA.irq="17"
  hw.pci5.0.INTA.irq="19"

to /boot/loader.conf and to /boot/device.hints and saw no effect from
these options when looking at the mptable output. Then I created CAD.hints with
  hw.pci13.0.INTA.irq=16
  hw.pci15.0.INTA.irq=17
  hw.pci5.0.INTA.irq=19

and included this in my kernel config with 
hints          "CAD.hints"

Now this kernel does not boot. So there is some progress in this. I've also 
tried to prepend hint. to the options without notable difference. So I 
suspect that adding hints to the kernel works, just that the config is wrong. 
I've put the dmesg of boot -v at

http://www.lecad.uni-lj.si/~leon/other/x7sb4/boot-v.txt

in case it is of any value for more precise instructions on the above 
settings. The kernel that does not boot with the above settings outputs just 
a single | after a screen blink.

Thank you for all suggestions so far!

Leon Kos, CAD lab, Mech.Eng., University of Ljubljana, Slovenia
(http://www.lecad.uni-lj.si/~leon)


On Tue, 11 Mar 2008, John Baldwin wrote:

> On Tuesday 11 March 2008 09:50:03 am Leon Kos wrote:
>> The following reply was made to PR kern/121558; it has been noted by GNATS.
>>
>> From: Leon Kos <leon.kos@lecad.uni-lj.si>
>> To: Dan Lukes <dan@obluda.cz>
>> Cc: freebsd-acpi@freebsd.org, bug-followup@freebsd.org
>> Subject: Re: kern/121558: Supermicro X7SB4 Fatal trap 12 when ACPI disabled
>> Date: Tue, 11 Mar 2008 14:45:40 +0100 (CET)
>>
>>  mptable output of the system is located at:
>>  http://www.lecad.uni-lj.si/~leon/other/x7sb4/mptable.txt
>
> Your MPTable is broken.  It has 3 entries which use an I/O APIC ID of 0, but
> you don't have an I/O APIC with an ID of 0:
>
> I/O APICs:	APIC ID	Version	State		Address
> 		 2	 0x20	 usable		 0xfec00000
> 		 3	 0x20	 usable		 0xfecc0000
> 		 4	 0x20	 usable		 0xfecc0400
>
> --
> I/O Ints:	Type	Polarity    Trigger	Bus ID	 IRQ	APIC ID	PIN#
> ...
> 		INT	active-lo       level	    13	 0:A	      0	  16
> 		INT	active-lo       level	    15	 0:A	      0	  17
> 		INT	active-lo       level	     5	 0:A	      0	  19
>
> You can work around this by manually specifying the routing for these devices
> with hints.  E.g. to use I/O APIC 2, you would do:
>
> hw.pci13.0.INTA.irq=16
> hw.pci15.0.INTA.irq=17
> hw.pci5.0.INTA.irq=19
>
> To use one of the other I/O APICs you will need to examine the dmesg to find
> the first IRQ for the I/O APIC (boot verbose might help) and add that to 16,
> 17, 19, etc. to come up with the appropriate IRQ number.
>
> In this case after looking at your dmesg, the BIOS uses the same GSI layout
> for the I/O APICs that FreeBSD's MP Table code uses, so you can just use the
> IRQs from the ACPI kernel.  From your dmesg ACPI is using the settings above
> (i.e. all 3 devices are using I/O APIC 2).
>
> --
> John Baldwin
>
Comment 11 Leon Kos 2008-03-12 10:02:09 UTC
I must amend my previous findings: a kernel with CAD.hints entries like

hw.pci13.0.INTA.irq=40
hw.pci15.0.INTA.irq=41
hw.pci5.0.INTA.irq=43

or

hw.pci13.0.INTA.irq=64
hw.pci15.0.INTA.irq=65
hw.pci5.0.INTA.irq=67

does not get past one character [|/\] of boot progress after issuing 
boot -v. The IRQ offsets were taken from the lines

MADT: Found IO APIC ID 3, Interrupt 24 at 0xfecc0000
MADT: Found IO APIC ID 4, Interrupt 48 at 0xfecc0400

Kind regards!

Leon Kos, CAD lab, Mech.Eng., University of Ljubljana, Slovenia
(http://www.lecad.uni-lj.si/~leon)
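
The arithmetic behind these hint values can be sketched as follows, assuming (as the MADT lines above suggest) that a device's IRQ is the first interrupt number of the chosen I/O APIC plus the pin number. The base of 0 for I/O APIC 2 is inferred from John Baldwin's suggested values 16, 17 and 19; the struct and function names are hypothetical.

```c
#include <stddef.h>

struct ioapic_base {
	int apic_id;	/* I/O APIC ID as reported at boot */
	int first_irq;	/* first interrupt number handled by that I/O APIC */
};

/* Values taken from the thread; the entry for ID 2 is an inference. */
static const struct ioapic_base bases[] = {
	{ 2, 0 },
	{ 3, 24 },
	{ 4, 48 },
};

/* Return first_irq + pin for a known I/O APIC, or -1 for an unknown ID. */
static int
irq_for_pin(int apic_id, int pin)
{
	size_t i;

	for (i = 0; i < sizeof(bases) / sizeof(bases[0]); i++) {
		if (bases[i].apic_id == apic_id)
			return (bases[i].first_irq + pin);
	}
	return (-1);
}
```

Pins 16, 17 and 19 on I/O APIC 3 give 40, 41 and 43, and on I/O APIC 4 give 64, 65 and 67, which matches the two sets of hints tried above.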
Comment 12 Volker 2008-03-12 13:51:56 UTC
Leon Kos wrote:
> I have added
> 
>  hw.pci13.0.INTA.irq="16"
>  hw.pci15.0.INTA.irq="17"
>  hw.pci5.0.INTA.irq="19"
> 
> to /boot/loader.conf and to /boot/device.hints without and face no effect
> of this options when looking mptable. Then I've created CAD.hints
>  hw.pci13.0.INTA.irq=16
>  hw.pci15.0.INTA.irq=17
>  hw.pci5.0.INTA.irq=19
> 
> and included this in my kernel config with hints          "CAD.hints"

This is from memory: I think I investigated the hints include
mechanism a while ago and found that, when it is used, the default hints
file is no longer processed.

If you're going to use 'hints "CAD.hints"', make sure it includes all the
settings from the default hints file (I suspect you've
only put your additions in that file).

Again, this is from memory and I welcome corrections if I am
remembering it wrong.

Volker
Comment 13 Leon Kos 2008-03-12 15:15:07 UTC
It really looks as if /boot/device.hints is ignored when hints are 
statically compiled in. After adding GENERIC.hints to CAD.hints my
kernel booted. But not with ACPI disabled. Same issue with Fatal trap 12.

I've added to /boot/device.hints
hw.pci13.0.INTA.irq="16"
hw.pci15.0.INTA.irq="17"
hw.pci5.0.INTA.irq="17"

With GENERIC, these hints do not help when booting with ACPI disabled, even 
if they are statically compiled in.  Is the syntax correct? How can I 
verify this? mptable is not useful for this, as John Baldwin said?

I am also sad to say that Supermicro's response on the issue was:
Hello Sir,

We didn't validate FreeBSD with our X7SB4, can you please install one of the 
validated OS'es and see if error is still the same?

http://www.supermicro.nl/support/resources/OS/X7S.cfm

Met vriendelijke groet / Best regards / Mit freundlichem gruß,

Peter Maas
Senior Application Engineer

Kind regards,

Leon Kos, CAD lab, Mech.Eng., University of Ljubljana, Slovenia
(http://www.lecad.uni-lj.si/~leon)
Comment 14 John Baldwin freebsd_committer freebsd_triage 2008-03-12 15:45:40 UTC
On Wednesday 12 March 2008 11:15:07 am Leon Kos wrote:
> It really looks like that if one uses hints to be statically compiled, then
> /boot/device.hints are ignored. After adding GENERIC.hints to CAD.hints my
> kernel booted. But not with ACPI disabled. Same issue with Fatal trap 12.
>
> I've added to /boot/device.hints
> hw.pci13.0.INTA.irq="16"
> hw.pci15.0.INTA.irq="17"
> hw.pci5.0.INTA.irq="17"

How about just removing hints from your kernel config completely and just
putting hints in /boot/device.hints.  That is, remove CAD.hints and just
leave relevant hints in /boot/device.hints.

> With GENERIC and this hints does not help when booting ACPI disabled. Even
> if they are statically compiled in.  Is the syntax correct? How can I
> verify this? mptable is not useful as John Baldwin said?
>
> I am also sad to say, that Supermicro response on the issue was:
> Hello Sir,
>
> We didn't validate FreeBSD with our X7SB4, can you please install one of the
> validated OS'es and see if error is still the same?
>
> http://www.supermicro.nl/support/resources/OS/X7S.cfm
>
> Met vriendelijke groet / Best regards / Mit freundlichem gruß,
>
> Peter Maas
> Senior Application Engineer
>
> Kind regards,
>
> Leon Kos, CAD lab, Mech.Eng., University of Ljubljana, Slovenia
> (http://www.lecad.uni-lj.si/~leon)

If they support Linux, boot Linux with ACPI disabled.

-- 
John Baldwin
Comment 15 Leon Kos 2008-03-13 10:07:41 UTC
Previously I had added the hints to /boot/loader.conf and booted GENERIC with 
ACPI disabled. Moving the hints to /boot/device.hints does not help!
That's why I asked whether the syntax:
hw.pci13.0.INTA.irq="16"
hw.pci15.0.INTA.irq="17"
hw.pci5.0.INTA.irq="19"
is correct?

I am still getting "No I/O APIC 0 to route interrupt to" as shown in
http://www.lecad.uni-lj.si/~leon/other/x7sb4/img_1650.jpg

I've also tried to boot OpenSUSE 10.3 that has kernel 2.6.21.5-31 and it 
boots with or without ACPI.
http://www.lecad.uni-lj.si/~leon/other/x7sb4/img_1651.jpg shows dmesg and 
/proc/interrupts with acpi=off
http://www.lecad.uni-lj.si/~leon/other/x7sb4/img_1652.jpg is the same with 
enabled ACPI (default)

Linux appears to work well with this board. It even handles reboot well, while 
FreeBSD 7.0 after the upgrade does not, as I stated before and as shown in the photo
http://www.lecad.uni-lj.si/~leon/other/x7sb4/img_1637.jpg

Kind regards!

Leon Kos, CAD lab, Mech.Eng., University of Ljubljana, Slovenia
(http://www.lecad.uni-lj.si/~leon)
Comment 16 Leon Kos 2008-03-13 12:44:49 UTC
From sys/dev/pci/pci.c I see that the tunable syntax in pci_assign_interrupt() 
for 7.0-STABLE is different from the one provided:

         /* Let the user override the IRQ with a tunable. */
         irq = PCI_INVALID_IRQ;
         snprintf(tunable_name, sizeof(tunable_name),
             "hw.pci%d.%d.%d.INT%c.irq",
             cfg->domain, cfg->bus, cfg->slot, cfg->intpin + 'A' - 1);
         if (TUNABLE_INT_FETCH(tunable_name, &irq) && (irq >= 255 || irq <= 0))
                 irq = PCI_INVALID_IRQ;


This is the reason the hints are not getting fetched. How should I change the 
hints?

As for the Linux logs not showing the ethernet cards,
http://www.lecad.uni-lj.si/~leon/other/x7sb4/img_1652.jpg
shows that the ethernet devices are not configured and that dmesg outputs 
just 3 lines. Maybe this is why Linux boots in either case.

The suggestion that EHCI is the cause of the reboot problems was correct.
I have disabled USB on the motherboard and now it reboots! It is in my 
nature to disable things that I do not need, but after the BIOS upgrade and 
the consequent BIOS reset to defaults I overlooked this. So I am taking back 
my statement that reboot worked in 6.3-STABLE and does not work in 7.0-STABLE.
I had simply disabled USB previously and forgot to re-disable 
it for 7.0.

But we can say that the Supermicro X7SB4 board has a broken EHCI controller,
not to mention the troubles with the onboard AIC7901 Ultra320 SCSI adapter, 
which spills out a bunch of "Invalid Sequencer interrupt occurred" messages 
when trying to run at full speed.

  Leon Kos, CAD lab, Mech.Eng., University of Ljubljana, Slovenia
(http://www.lecad.uni-lj.si/~leon)


On Thu, 13 Mar 2008, John Baldwin wrote:

> On Thursday 13 March 2008 06:10:02 am Leon Kos wrote:
>> The following reply was made to PR kern/121558; it has been noted by GNATS.
>>
>> From: Leon Kos <leon.kos@lecad.uni-lj.si>
>> To: John Baldwin <jhb@freebsd.org>
>> Cc: Volker <volker@vwsoft.com>, freebsd-acpi@freebsd.org,
>>         bug-followup@freebsd.org
>> Subject: Re: kern/121558: Supermicro X7SB4 Fatal trap 12 when ACPI disabled
>> Date: Thu, 13 Mar 2008 11:07:41 +0100 (CET)
>>
>>  Previously I've added hints to /boot/loader.conf and booted GENERIC with
>>  ACPI disabled. Moving hints to /boot/device.hints does not help!
>>  That's why I've asked if the syntax:
>>  hw.pci13.0.INTA.irq="16"
>>  hw.pci15.0.INTA.irq="17"
>>  hw.pci5.0.INTA.irq="19"
>>  is correct?
>
> Yes.
>
> The code looks like this:
>
>        /* Let the user override the IRQ with a tunable. */
>        irq = PCI_INVALID_IRQ;
>        snprintf(tunable_name, sizeof(tunable_name), "hw.pci%d.%d.INT%c.irq",
>            cfg->bus, cfg->slot, cfg->intpin + 'A' - 1);
>        if (TUNABLE_INT_FETCH(tunable_name, &irq) && (irq >= 255 || irq <= 0))
>                irq = PCI_INVALID_IRQ;
>
>        /*
>         * If we didn't get an IRQ via the tunable, then we either use the
>         * IRQ value in the intline register or we ask the bus to route an
>         * interrupt for us.  If force_route is true, then we only use the
>         * value in the intline register if the bus was unable to assign an
>         * IRQ.
>         */
>        if (!PCI_INTERRUPT_VALID(irq)) {
>                if (!PCI_INTERRUPT_VALID(cfg->intline) || force_route)
>                        irq = PCI_ASSIGN_INTERRUPT(bus, dev);
>                if (!PCI_INTERRUPT_VALID(irq))
>                        irq = cfg->intline;
>        }
>
> The PCI_ASSIGN_INTERRUPT routine is the one that ends up invoking the
> mptable_pci_route_interrupt() function.
>
>>  I am still getting "No I/O APIC 0 to route interrupt to" as shown in
>>  http://www.lecad.uni-lj.si/~leon/other/x7sb4/img_1650.jpg
>
> I would add printfs to the code above to make sure the tunable is being
> triggered.
>
>>  I've also tried to boot OpenSUSE 10.3 that has kernel 2.6.21.5-31 and it
>>  boots with or without ACPI.
>>  http://www.lecad.uni-lj.si/~leon/other/x7sb4/img_1651.jpg shows dmesg and
>>  /proc/interrupts with acpi=off
>>  http://www.lecad.uni-lj.si/~leon/other/x7sb4/img_1652.jpg is the same with
>>  enabled ACPI (default)
>
> Neither of the /proc/interrupts show the eth devices for any of the IRQs.
> Perhaps it is just not setting up interrupts at all for the eth devices in
> this case?  You would need the dmesg lines for the actual eth devices to see
> what IRQs they are using.
>
>>  Linux appears to work well with this board. Even handles reboot well while
>>  FreeBSD 7.0 after upgrade does not as I staded before and shown in photo
>>  http://www.lecad.uni-lj.si/~leon/other/x7sb4/img_1637.jpg
>
> You can debug why it hangs, but you will need to do some work to figure it
> out.  I would start by adding printfs around the 'device_shutdown()' of
> root_bus in sys/kern/subr_bus.c as well as printfs for in
> bus_generic_shutdown() of each device name before invoking its shutdown
> routine to see if it hangs on a device driver's shutdown routine.  I
> committed a hang on reboot fix yesterday to HEAD involving some busted BIOSes
> handling of ehci(4) controllers.
>
> --
> John Baldwin
>
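
The selection order in the quoted pci_assign_interrupt() excerpt can be modelled in a few lines of C. This is a simplified sketch under stated assumptions, not the kernel code: INVALID_IRQ stands in for PCI_INVALID_IRQ, and choose_irq, route_fn and fake_route are hypothetical names. The point it illustrates is that a valid tunable short-circuits the bus routing call, so the MP Table routing code is never reached for that device.

```c
#include <stdbool.h>

#define INVALID_IRQ 255	/* stand-in for PCI_INVALID_IRQ in this sketch */

typedef int (*route_fn)(void);

/* Hypothetical stand-in for the bus routing call; counts invocations. */
static int route_calls;

static int
fake_route(void)
{
	route_calls++;
	return (20);		/* arbitrary illustrative IRQ */
}

static int
choose_irq(bool have_tunable, int tunable, int intline, bool force_route,
    route_fn route)
{
	int irq = INVALID_IRQ;

	/* A fetched tunable is accepted only in the range 1..254. */
	if (have_tunable && tunable > 0 && tunable < 255)
		irq = tunable;

	if (irq == INVALID_IRQ) {
		/* No usable tunable: ask the bus unless intline is good enough. */
		if (intline == INVALID_IRQ || force_route)
			irq = route();
		/* Routing failed or was skipped: fall back to intline. */
		if (irq == INVALID_IRQ)
			irq = intline;
	}
	return (irq);
}
```

In this model, choose_irq(true, 16, INVALID_IRQ, true, fake_route) returns 16 without ever calling fake_route, whereas a missing or out-of-range tunable falls through to the routing call.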
Comment 17 John Baldwin freebsd_committer freebsd_triage 2008-03-13 13:10:06 UTC
On Thursday 13 March 2008 08:44:49 am Leon Kos wrote:
> From sys/dev/pci/pci.c I see that syntax in pci_assign_interrupt() for
> 7.0-STABLE is different that one provided:
>
>          /* Let the user override the IRQ with a tunable. */
>          irq = PCI_INVALID_IRQ;
>          snprintf(tunable_name, sizeof(tunable_name),
>              "hw.pci%d.%d.%d.INT%c.irq",
>              cfg->domain, cfg->bus, cfg->slot, cfg->intpin + 'A' - 1);
>          if (TUNABLE_INT_FETCH(tunable_name, &irq) && (irq >= 255 || irq <= 0))
>                  irq = PCI_INVALID_IRQ;
>
>
> This is the reason for hints not getting fetched. How should I change hints
> to?

Argh, yes.  It probably should support the old format for domain == 0 devices.  
Change them to each be 'hw.pci0.13.0.INTA.irq' vs 'hw.pci13.0.INTA.irq'.

> For linux logs and not seeing ethernet cards, it is shown in
> http://www.lecad.uni-lj.si/~leon/other/x7sb4/img_1652.jpg
> that ethernet devices are not configured and that dmesg outputs just 3
> lines. Maybe this is the reason for a bootable linux in any case.
>
> Suggestion that EHCI is a case for rebooting problems was correct.
> I have disabled USB on the motherboard and now it reboots! It is in my
> nature to disable things that I do not need, but after BIOS upgrade and
> consequent BIOS reset to defaults I've overlooked this. So, I am taking
> back my statement that reboot worked in 6.3-STABLE and not working in
> 7.0-STABLE. It was just that I've had disabled USB previously and forgot to
> re-disable it for 7.0.

I would try grabbing my last commit to ehci_pci.c and seeing if it fixes your 
reboot hang.

-- 
John Baldwin
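
The difference between the two tunable name formats quoted in this thread can be reproduced with the same snprintf format strings. This is only an illustrative sketch: the wrapper function names are hypothetical, and the arguments stand in for the cfg->domain, cfg->bus, cfg->slot and cfg->intpin fields used in the kernel.

```c
#include <stddef.h>
#include <stdio.h>

/* Older format quoted in the thread: hw.pci<bus>.<slot>.INT<pin>.irq */
static int
tunable_name_old(char *buf, size_t len, int bus, int slot, int intpin)
{
	return (snprintf(buf, len, "hw.pci%d.%d.INT%c.irq",
	    bus, slot, intpin + 'A' - 1));
}

/* 7.0-STABLE format quoted in the thread: the PCI domain comes first. */
static int
tunable_name_new(char *buf, size_t len, int domain, int bus, int slot,
    int intpin)
{
	return (snprintf(buf, len, "hw.pci%d.%d.%d.INT%c.irq",
	    domain, bus, slot, intpin + 'A' - 1));
}
```

For bus 13, slot 0, pin 1 (INTA), the older format yields hw.pci13.0.INTA.irq, while the newer one with domain 0 yields hw.pci0.13.0.INTA.irq, so hints written in the older form are never looked up by a 7.0-STABLE kernel.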
Comment 18 Leon Kos 2008-03-13 15:13:36 UTC
I've added

hw.pci0.13.0.INTA.irq="16"
hw.pci0.15.0.INTA.irq="17"
hw.pci0.5.0.INTA.irq="19"

and now it boots with ACPI disabled. I have also tried moving the hints into 
/boot/loader.conf and this also works.

Now I have

hw.pci0.13.0.INTA.irq="40"
hw.pci0.15.0.INTA.irq="41"
hw.pci0.5.0.INTA.irq="43"

in /boot/loader.conf and I see the following lines in dmesg log:
em3: <Intel(R) PRO/1000 Network Connection Version - 6.7.3> port 0x5000-0x501f mem 0xd8400000-0xd841ffff irq 40 at device 0.0 on pci13
em4: <Intel(R) PRO/1000 Network Connection Version - 6.7.3> port 0x6000-0x601f mem 0xd8500000-0xd851ffff irq 41 at device 0.0 on pci15
em2: <Intel(R) PRO/1000 Network Connection Version - 6.7.3> port 0x4000-0x401f mem 0xd8320000-0xd833ffff,0xd8300000-0xd831ffff irq 43 at device 0.0 on pci5

that I plan to stick with.

I've replaced src/sys/dev/usb/ehci_pci.c with revision 1.3 from trunk and 
now reboot is also handled well.

Thank you for all the support; I suggest closing the ticket.
We'll see whether Supermicro upgrades the BIOS. For now, the above workaround 
is the only cure for this and similar boards.

Kind regards,

Leon Kos, CAD lab, Mech.Eng., University of Ljubljana, Slovenia
(http://www.lecad.uni-lj.si/~leon)
Comment 19 John Baldwin freebsd_committer freebsd_triage 2008-03-13 18:38:08 UTC
State Changed
From-To: feedback->closed

Submitter reports that problem is fixed/worked around.