Bug 28736

Summary: sysctl -a: kernel trap 12.
Product: Base System
Reporter: Thomas Quinot <thomas>
Component: kern
Assignee: dd <dd>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: 4.3-STABLE   
Hardware: Any   
OS: Any   

Description Thomas Quinot 2001-07-05 21:40:01 UTC
	Running 'sysctl -a' reproducibly causes a null pointer dereference
	in the kernel:

Script started on Thu Jul  5 22:31:02 2001
(root@melusine) /var/crash # gdb -k /usr/obj/usr/src/sys/MELUSINE/kernel.debug vmcore.3
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-unknown-freebsd"...

IdlePTD 4087808
initial pcb at 343a60
panicstr: page fault
panic messages:
---
Fatal trap 12: page fault while in kernel mode
fault virtual address	= 0x14
fault code		= supervisor read, page not present
instruction pointer	= 0x8:0xc0176bf9
stack pointer	        = 0x10:0xc8f28e00
frame pointer	        = 0x10:0xc8f28e10
code segment		= base rx0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, def32 1, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 1633 (sysctl)
interrupt mask		= none
trap number		= 12
panic: page fault

syncing disks... 13 2 1 1 
done
Uptime: 24m20s

dumping to dev #ad/0x20009, offset 270360
dump ata0: resetting devices .. done
127 126 125 124 123 122 121 120 119 118 117 116 115 114 113 112 111 110 109 108 107 106 105 104 103 102 101 100 99 98 97 96 95 94 93 92 91 90 89 88 87 86 85 84 83 82 81 80 79 78 77 76 75 74 73 72 71 70 69 68 67 66 65 64 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 
---
#0  dumpsys () at /usr/src/sys/kern/kern_shutdown.c:472
472		if (dumping++) {
(kgdb) bt
#0  dumpsys () at /usr/src/sys/kern/kern_shutdown.c:472
#1  0xc016d765 in boot (howto=256) at /usr/src/sys/kern/kern_shutdown.c:312
#2  0xc016dafd in panic (fmt=0xc02ec7cf "page fault")
    at /usr/src/sys/kern/kern_shutdown.c:559
#3  0xc02a4506 in trap_fatal (frame=0xc8f28dc0, eva=20)
    at /usr/src/sys/i386/i386/trap.c:951
#4  0xc02a41c5 in trap_pfault (frame=0xc8f28dc0, usermode=0, eva=20)
    at /usr/src/sys/i386/i386/trap.c:844
#5  0xc02a3d6b in trap (frame={tf_fs = -923664368, tf_es = -1071382512, 
      tf_ds = -1070333936, tf_edi = 20, tf_esi = -923627932, 
      tf_ebp = -923628016, tf_isp = -923628052, tf_ebx = -1060872140, 
      tf_edx = -1070599904, tf_ecx = -1, tf_eax = -1060872192, tf_trapno = 12, 
      tf_err = 0, tf_eip = -1072206855, tf_cs = 8, tf_eflags = 66054, 
      tf_esp = 2, tf_ss = -923627932}) at /usr/src/sys/i386/i386/trap.c:443
#6  0xc0176bf9 in sysctl_disks (oidp=0xc02ff120, arg1=0x0, arg2=0, 
    req=0xc8f28e64) at /usr/src/sys/kern/subr_disk.c:149
#7  0xc017223d in sysctl_root (oidp=0x0, arg1=0xc8f28ef4, arg2=2, 
    req=0xc8f28e64) at /usr/src/sys/kern/kern_sysctl.c:1033
#8  0xc0172401 in userland_sysctl (p=0xc857f440, name=0xc8f28ef4, namelen=2, 
    old=0x0, oldlenp=0xbfbfeedc, inkernel=0, new=0x0, newlen=0, 
    retval=0xc8f28ef0) at /usr/src/sys/kern/kern_sysctl.c:1126
#9  0xc01722a8 in __sysctl (p=0xc857f440, uap=0xc8f28f80)
    at /usr/src/sys/kern/kern_sysctl.c:1062
#10 0xc02a47a6 in syscall2 (frame={tf_fs = 47, tf_es = 47, tf_ds = 47, 
      tf_edi = 0, tf_esi = 2, tf_ebp = -1077940600, tf_isp = -923627564, 
      tf_ebx = -1077938344, tf_edx = -1077940516, tf_ecx = 0, tf_eax = 202, 
      tf_trapno = 12, tf_err = 2, tf_eip = 134559840, tf_cs = 31, 
      tf_eflags = 643, tf_esp = -1077940660, tf_ss = 47})
    at /usr/src/sys/i386/i386/trap.c:1150
#11 0xc0296015 in Xint0x80_syscall ()
#12 0x8048ab5 in ?? ()
#13 0x8048fe7 in ?? ()
#14 0x8048307 in ?? ()
#15 0x8048137 in ?? ()
(kgdb) fr 6
#6  0xc0176bf9 in sysctl_disks (oidp=0xc02ff120, arg1=0x0, arg2=0, 
    req=0xc8f28e64) at /usr/src/sys/kern/subr_disk.c:149
149			error = SYSCTL_OUT(req, disk->d_dev->si_name, strlen(disk->d_dev->si_name));
(kgdb) print disk
$1 = (struct disk *) 0x0
(kgdb) quit

The following dmesg snippet might be relevant (perhaps some of the disk
structures corresponding to the SCSI CD burner get removed?)

ad0: 6149MB <QUANTUM FIREBALL EX6.4A> [13328/15/63] at ata0-master UDMA33
ad1: 14324MB <QUANTUM FIREBALLlct15 15> [29104/16/63] at ata0-slave UDMA33
acd0: DVD-ROM <Pioneer DVD-ROM ATAPIModel DVD-105S 0122> at ata1-master using PIO4
Waiting 3 seconds for SCSI devices to settle
sa0 at sym0 bus 0 target 5 lun 0
sa0: <HP C1533A HP00> Removable Sequential Access SCSI-2 device 
sa0: 10.000MB/s transfers (10.000MHz, offset 8)
Mounting root from ufs:/dev/ad1s1a
WARNING: / was not properly dismounted
da0 at sym0 bus 0 target 6 lun 0
da0: <IBM DDRS-39130W S71D> Fixed Direct Access SCSI-2 device 
da0: 40.000MB/s transfers (20.000MHz, offset 15, 16bit), Tagged Queueing Enabled
da0: 8715MB (17850000 512 byte sectors: 255H 63S/T 1111C)
(cd0:sym0:0:2:0): got CAM status 0x4c
(cd0:sym0:0:2:0): fatal error, failed to attach to device
(cd0:sym0:0:2:0): lost device
(cd0:sym0:0:2:0): removing device entry

See also PR kern/24596 for a similar problem noted in sysinstall.
At this site, the above CAM messages probably first appeared at about
the same time as the panics.

Fix: 

None known so far.
How-To-Repeat: 	sysctl -a
Comment 1 dima 2001-07-06 02:10:41 UTC
Thomas Quinot <thomas@cuivre.fr.eu.org> writes:
> >Description:
> 	Running 'sysctl -a' reproducibly causes a null pointer dereference
> 	in kernel:

This was only broken for a period of about a day and a half.  Upgrade
to a more recent -stable.
Comment 2 Thomas Quinot 2001-07-06 08:29:35 UTC
On 2001-07-06, Dima Dorfman wrote:

> > 	Running 'sysctl -a' reproducibly causes a null pointer dereference
> > 	in kernel:
> This was only broken for a period of about a day and a half.  Upgrade
> to a more recent -stable.

I cvsupped to today's -stable, rebuilt the kernel, and reproduced the crash.
Is a 'make world' required as well?

(also note that this is not the same problem as PR misc/27706,
if that's the one you were referring to: here we're having
a kernel trap 12, not a freeze.)

Thomas.

-- 
    Thomas.Quinot@Cuivre.FR.EU.ORG
Comment 3 dima 2001-07-06 09:41:29 UTC
[ Ernst, I'm cc'ing you regarding PR 24596; I'd like you to try the
patch attached below and see if it helps that problem. ]

Thomas Quinot <thomas@cuivre.fr.eu.org> writes:
> On 2001-07-06, Dima Dorfman wrote:
> 
> > > 	Running 'sysctl -a' reproducibly causes a null pointer dereference
> > > 	in kernel:
> > This was only broken for a period of about a day and a half.  Upgrade
> > to a more recent -stable.
> 
> I cvsupped to today's -stable, rebuilt the kernel, and reproduced the crash.
> Is a 'make world' required as well?
> 
> (also note that this is not the same problem as PR misc/27706,
> if that's the one you were referring to: here we're having
> a kernel trap 12, not a freeze.)

Okay, I read your PR too fast; it is indeed a different problem.  You
are right, this is related to cd0 not being present.  The CAM code
doesn't properly clean up after itself in this case (or so I think).
If you can reproduce the problem, please try the patch below and see
if it helps (and you should be able to reproduce it).

Thanks,

					Dima Dorfman
					dima@unixfreak.org


Index: scsi_cd.c
===================================================================
RCS file: /stl/src/FreeBSD/src/sys/cam/scsi/scsi_cd.c,v
retrieving revision 1.51
diff -u -r1.51 scsi_cd.c
--- scsi_cd.c	2001/05/08 08:30:47	1.51
+++ scsi_cd.c	2001/07/06 08:40:55
@@ -487,6 +487,9 @@
 	}
 	devstat_remove_entry(&softc->device_stats);
 	cam_extend_release(cdperiphs, periph->unit_number);
+	if (softc->dev) {
+		disk_destroy(softc->dev);
+	}
 	free(softc, M_DEVBUF);
 	splx(s);
 }
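
To illustrate why the missing disk_destroy() call matters, here is a
minimal userland sketch of the suspected failure mode. It is not kernel
code: every model_* name and both structures are hypothetical stand-ins
rather than the real layouts, and the model only marks a softc as
released instead of freeing it, so that the stale entry can be inspected
safely. The assumption being modelled is that a failed attach releases
the softc while its disk is still linked on the global list that
'sysctl -a' walks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real kernel structures. */
struct model_disk {
	const char *dev_name;		/* NULL once the owner is gone */
	struct model_disk *next;	/* link in the global disk list */
};

struct model_softc {
	struct model_disk disk;
	int released;			/* models free(softc, M_DEVBUF) */
};

static struct model_disk *model_disk_list;

/* Register a disk on the global list (loosely models disk_create()). */
static void
model_disk_create(struct model_disk *d, const char *name)
{
	assert(name != NULL);
	d->dev_name = name;
	d->next = model_disk_list;
	model_disk_list = d;
}

/* Unlink a disk from the global list (loosely models disk_destroy()). */
static void
model_disk_destroy(struct model_disk *d)
{
	struct model_disk **pp;

	for (pp = &model_disk_list; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == d) {
			*pp = d->next;
			break;
		}
	}
	d->dev_name = NULL;
	d->next = NULL;
}

/* Cleanup after a failed attach; apply_fix selects the patched behaviour. */
static void
model_cleanup(struct model_softc *sc, int apply_fix)
{
	if (apply_fix && sc->disk.dev_name != NULL)
		model_disk_destroy(&sc->disk);
	/* The softc is released either way; its device is no longer valid. */
	sc->disk.dev_name = NULL;
	sc->released = 1;
}

/* Walk the list as an enumerator would; count entries whose device is gone. */
static int
model_count_stale(void)
{
	const struct model_disk *d;
	int stale = 0;

	for (d = model_disk_list; d != NULL; d = d->next) {
		if (d->dev_name == NULL)
			stale++;
	}
	return (stale);
}

/* Run one failed-attach scenario; return the number of stale list entries. */
static int
model_failed_attach(int apply_fix)
{
	struct model_softc good = { { NULL, NULL }, 0 };
	struct model_softc bad = { { NULL, NULL }, 0 };
	int stale;

	model_disk_list = NULL;
	model_disk_create(&good.disk, "da0");
	model_disk_create(&bad.disk, "cd0");
	model_cleanup(&bad, apply_fix);	/* cd0 failed to attach */
	stale = model_count_stale();
	model_disk_list = NULL;		/* do not leave pointers to locals */
	return (stale);
}
```

In this model, model_failed_attach(0) leaves one entry on the list whose
device is gone, which is the kind of entry an enumerator would trip over,
while model_failed_attach(1) leaves none.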
Comment 4 Thomas Quinot 2001-07-06 16:12:06 UTC
On 2001-07-06, Dima Dorfman wrote:

> If you can reproduce the problem, please try the patch below and see
> if it helps (and you should be able to reproduce it).

The patch seems to fix the problem, thanks!

> +	if (softc->dev) {
> +		disk_destroy(softc->dev);
> +	}

I rewrote this as
	if (softc->disk.d_dev) {
		disk_destroy(softc->disk.d_dev);
	}
as I am using -STABLE. Apparently my guess was correct: the machine
rebooted correctly, and the sysctl -a went fine.

Next issue will be: why is the attach failing? I'll try to clarify
this and submit a separate PR.

-- 
    Thomas.Quinot@Cuivre.FR.EU.ORG
Comment 5 dd freebsd_committer freebsd_triage 2001-07-10 08:15:43 UTC
Responsible Changed
From-To: freebsd-bugs->dd

I have a patch to fix this.
Comment 6 dd freebsd_committer freebsd_triage 2001-07-11 06:16:49 UTC
State Changed
From-To: open->analyzed

Patch as discussed applied to -current; will MFC in a week or so.  Thanks!
Comment 7 dd freebsd_committer freebsd_triage 2001-07-23 11:09:50 UTC
State Changed
From-To: analyzed->closed

MFC'd