Bug 152250 - [acpi] [patch] Kernel panic when hw.ciss.expose_hidden_physical is set
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 7.2-RELEASE
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: Sean Bruno
Reported: 2010-11-14 20:20 UTC by loic-freebsd
Modified: 2013-04-04 16:13 UTC

Attachments
file.diff (333 bytes, patch)
2010-11-14 20:20 UTC, loic-freebsd

Description loic-freebsd 2010-11-14 20:20:08 UTC
HP ProLiant DL360 G6 server with an HP StorageWorks MSL4048 Tape Library

# grep ciss /boot/loader.conf 
hw.ciss.expose_hidden_physical=1


When the tunable hw.ciss.expose_hidden_physical is set at boot time, the kernel panics:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address	= 0x8
fault code		= supervisor read data, page not present
instruction pointer	= 0x8:0xffffffff80201686
stack pointer	        = 0x10:0xffffff807c6ab930
frame pointer	        = 0x10:0x400
code segment		= base rx0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 77 (sysctl)
trap number		= 12
panic: page fault
cpuid = 0
Uptime: 6s
Physical memory: 4073 MB
Dumping 1230 MB:

Backtrace from the core dump:

(kgdb) bt
#0  doadump () at pcpu.h:195
#1  0x0000000000000004 in ?? ()
#2  0xffffffff8054cff9 in boot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:418
#3  0xffffffff8054d402 in panic (fmt=0x104 <Address 0x104 out of bounds>)
    at /usr/src/sys/kern/kern_shutdown.c:574
#4  0xffffffff80812563 in trap_fatal (frame=0xffffff0003eb4390, eva=Variable "eva" is not available.
)
    at /usr/src/sys/amd64/amd64/trap.c:756
#5  0xffffffff80812935 in trap_pfault (frame=0xffffff807c6ab880, usermode=0)
    at /usr/src/sys/amd64/amd64/trap.c:672
#6  0xffffffff80813274 in trap (frame=0xffffff807c6ab880)
    at /usr/src/sys/amd64/amd64/trap.c:443
#7  0xffffffff807fd2ce in calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:218
#8  0xffffffff80201686 in acpi_child_pnpinfo_str_method (cbdev=Variable "cbdev" is not available.
)
    at /usr/src/sys/dev/acpica/acpi.c:850
#9  0xffffffff805753c9 in device_sysctl_handler (oidp=Variable "oidp" is not available.
)
    at /usr/src/sys/kern/subr_bus.c:260
#10 0xffffffff8055654f in sysctl_root (oidp=Variable "oidp" is not available.
)
    at /usr/src/sys/kern/kern_sysctl.c:1419
#11 0xffffffff805578c5 in userland_sysctl (td=0x0, name=0xffffff807c6abac0, 
    namelen=4, old=0x0, oldlenp=Variable "oldlenp" is not available.
) at /usr/src/sys/kern/kern_sysctl.c:1522
#12 0xffffffff80557ad2 in __sysctl (td=0xffffff0003eb4390, 
    uap=0xffffff807c6abbf0) at /usr/src/sys/kern/kern_sysctl.c:1449
#13 0xffffffff80812bb7 in syscall (frame=0xffffff807c6abc80)
    at /usr/src/sys/amd64/amd64/trap.c:899
#14 0xffffffff807fd4db in Xfast_syscall ()
    at /usr/src/sys/amd64/amd64/exception.S:339
#15 0x0000000800719cac in ?? ()
Previous frame inner to this frame (corrupt stack?)

Faulty instruction:
(kgdb) x/i 0xffffffff80201686
0xffffffff80201686 <acpi_child_pnpinfo_str_method+70>:  mov    0x8(%rbx),%edx

Fix: The last function called before the trap is acpi_child_pnpinfo_str_method, in sys/dev/acpica/acpi.c:

static int
acpi_child_pnpinfo_str_method(device_t cbdev, device_t child, char *buf,
    size_t buflen)
{
    ACPI_BUFFER adbuf = {ACPI_ALLOCATE_BUFFER, NULL};
    ACPI_DEVICE_INFO *adinfo;
    struct acpi_device *dinfo = device_get_ivars(child);
    char *end;
    int error;

    error = AcpiGetObjectInfo(dinfo->ad_handle, &adbuf);
    adinfo = (ACPI_DEVICE_INFO *) adbuf.Pointer;
    if (error)
        snprintf(buf, buflen, "unknown");
    else
        snprintf(buf, buflen, "_HID=%s _UID=%lu",
                 (adinfo->Valid & ACPI_VALID_HID) ?
                 adinfo->HardwareId.Value : "none",
                 (adinfo->Valid & ACPI_VALID_UID) ?
                 strtoul(adinfo->UniqueId.Value, &end, 10) : 0);
    if (adinfo)
        AcpiOsFree(adinfo);

    return (0);
}

buf is filled in according to the value of "error".

I found that adbuf.Pointer was 0x0 even though "error" was zero (success). The else branch therefore dereferences adinfo with a base address of 0x0, which is consistent with the faulting read at virtual address 0x8.
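
The failure mode described above (a zero status paired with a NULL result pointer) can be sketched in isolation. This is only an illustration, not the code from acpi.c: struct dev_info, INFO_VALID_HID, both lookup functions, and format_pnpinfo are hypothetical names, and the struct layout is an assumption chosen so that the flags word sits at offset 8, like the faulting read.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the device-info structure (assumed layout). */
struct dev_info {
    uint32_t type;
    uint32_t name;
    uint32_t valid;     /* offset 8 here: a NULL base would fault at 0x8 */
    char     hid[16];
};

#define INFO_VALID_HID 0x1u

/* Mimics the reported behaviour: status 0 ("success") but no buffer. */
int
lookup_success_without_data(struct dev_info **out)
{
    *out = NULL;
    return (0);
}

/* A well-behaved lookup, for comparison; the HID value is just an example. */
int
lookup_ok(struct dev_info **out)
{
    static struct dev_info info;

    info.valid = INFO_VALID_HID;
    strcpy(info.hid, "PNP0A03");
    *out = &info;
    return (0);
}

/* Caller that checks both the status and the pointer before dereferencing. */
void
format_pnpinfo(int (*lookup)(struct dev_info **), char *buf, size_t buflen)
{
    struct dev_info *info = NULL;
    int error = lookup(&info);

    if (error != 0 || info == NULL)
        snprintf(buf, buflen, "unknown");
    else
        snprintf(buf, buflen, "_HID=%s",
            (info->valid & INFO_VALID_HID) ? info->hid : "none");
}
```

With this guard, calling format_pnpinfo with lookup_success_without_data falls back to "unknown" instead of reading through a NULL pointer. It is a defensive measure on the caller side only; the wrong status still has to be corrected where it is produced.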

"error" value is not set correctly. Let's see why in AcpiGetObjectInfo, in sys/contrib/dev/acpica/nsxfname.c

Node = AcpiNsMapHandleToNode (Handle);
if (!Node)
{
    (void) AcpiUtReleaseMutex (ACPI_MTX_NAMESPACE);
    goto Cleanup;
}
(...)
Cleanup:
    ACPI_FREE (Info);
    if (CidList)
    {
        ACPI_FREE (CidList);
    }
    return (Status);

If AcpiNsMapHandleToNode fails, the code releases the mutex and jumps to Cleanup:, which returns Status without updating it.
Status therefore still holds the value from the earlier AcpiUtAcquireMutex call, which is wrong.

Setting Status to AE_BAD_PARAMETER before jumping to Cleanup fixes the issue (I found that AE_BAD_PARAMETER is used elsewhere in the kernel in similar flows when AcpiNsMapHandleToNode is called).
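
The stale-status pattern and the proposed fix can be reduced to a small standalone sketch. Everything here is a hypothetical stand-in rather than ACPICA code: ST_OK and ST_BAD_PARAMETER play the role of the status codes, and acquire_lock, release_lock, and map_handle replace the mutex and handle-mapping calls.

```c
#include <stddef.h>

#define ST_OK            0
#define ST_BAD_PARAMETER 1      /* stand-in for AE_BAD_PARAMETER */

struct node { int id; };

/* Hypothetical stand-ins for the mutex and handle-mapping calls. */
int acquire_lock(void) { return (ST_OK); }
void release_lock(void) { }
struct node *map_handle(struct node *h) { return (h); } /* NULL in, NULL out */

/* Buggy flow: the error path leaves status at the lock's ST_OK. */
int
get_info_buggy(struct node *handle, int *id_out)
{
    struct node *n;
    int status;

    status = acquire_lock();
    if (status != ST_OK)
        goto cleanup;

    n = map_handle(handle);
    if (n == NULL) {
        release_lock();
        goto cleanup;           /* status is stale: still ST_OK */
    }
    *id_out = n->id;
    release_lock();
cleanup:
    return (status);
}

/* Fixed flow: set an error status before jumping to cleanup. */
int
get_info_fixed(struct node *handle, int *id_out)
{
    struct node *n;
    int status;

    status = acquire_lock();
    if (status != ST_OK)
        goto cleanup;

    n = map_handle(handle);
    if (n == NULL) {
        release_lock();
        status = ST_BAD_PARAMETER;
        goto cleanup;
    }
    *id_out = n->id;
    release_lock();
cleanup:
    return (status);
}
```

With a NULL handle, get_info_buggy reports ST_OK without ever writing *id_out, which is the same contract violation the caller trips over; get_info_fixed reports ST_BAD_PARAMETER, so a caller that checks the status takes its error branch.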

Releases 7.0 through 7.3 are affected; a patch is attached.

Hope I'm right :)

Patch attached with submission follows:
How-To-Repeat: With the same hardware, put hw.ciss.expose_hidden_physical=1 in loader.conf and reboot.
Comment 1 Mark Linimon freebsd_committer 2010-11-15 13:15:23 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-scsi

Over to maintainer(s).
Comment 2 loic-freebsd 2011-03-10 20:04:24 UTC
Hello,

Is any information missing or unclear?
If my report is not usable, please tell me how to improve it :)

Regards,
Loïc
Comment 3 Sean Bruno freebsd_committer 2013-01-12 01:24:52 UTC
Responsible Changed
From-To: freebsd-scsi->sbruno

Taking ticket as this is in my universe ish
Comment 4 seanbru 2013-01-12 01:27:59 UTC
This looks correct to me, but the ticket is misfiled as "ciss" and not
"acpi" because of gnats.

Sean
Comment 5 loic-freebsd 2013-04-02 22:32:31 UTC
Hello Sean,

Category has been changed to acpi.

Cheers,
Loic
Comment 6 seanwbruno 2013-04-03 22:48:24 UTC
This is only applicable to stable/7 so I'll go ahead and commit this as
it is appropriate.

stable/8 and newer use different code paths and this problem does not
exist.

Sean
Comment 7 dfilter freebsd_committer 2013-04-04 00:11:33 UTC
Author: sbruno
Date: Wed Apr  3 23:11:15 2013
New Revision: 249073
URL: http://svnweb.freebsd.org/changeset/base/249073

Log:
  Resolve kernel panic that occurs on callback from sysctl when setting
  hw.ciss.expose_hidden_physical=1 on a HP ProLiant DL360 G6 (and possibly
  others) due to mishandling of error value in acpica on stable/7
  
  Note that this is a direct commit as this code has been fixed in stable/8
  (8.4 included) and higher release for quite some time.
  
  PR:	kern/152250
  Submitted by:	Loic Pefferkorn <loic-freebsd@loicp.eu>
  Reviewed by:	avg@

Modified:
  stable/7/sys/contrib/dev/acpica/nsxfname.c

Modified: stable/7/sys/contrib/dev/acpica/nsxfname.c
==============================================================================
--- stable/7/sys/contrib/dev/acpica/nsxfname.c	Wed Apr  3 22:37:40 2013	(r249072)
+++ stable/7/sys/contrib/dev/acpica/nsxfname.c	Wed Apr  3 23:11:15 2013	(r249073)
@@ -361,6 +361,7 @@ AcpiGetObjectInfo (
     if (!Node)
     {
         (void) AcpiUtReleaseMutex (ACPI_MTX_NAMESPACE);
+        Status = AE_BAD_PARAMETER;
         goto Cleanup;
     }
 
Comment 8 Sean Bruno freebsd_committer 2013-04-04 16:12:13 UTC
State Changed
From-To: open->closed

Ticket is resolved on stable/7 as of svn r249073 and does not apply to any 
currently supported branch.