| Summary: | Reproducable PANIC in -stable and .current | ||
|---|---|---|---|
| Product: | Base System | Reporter: | Stefan Eßer <se> |
| Component: | kern | Assignee: | Dag-Erling Smørgrav <des> |
| Status: | Closed FIXED | ||
| Severity: | Affects Only Me | CC: | se |
| Priority: | Normal | ||
| Version: | 4.4-STABLE | ||
| Hardware: | Any | ||
| OS: | Any | ||
Further debugging reveals, that the panic is caused by PROCFS (on my -c=
urrent
box running either an non-SMP or SMP kernel).
In order to check for the presence of some process %PID%, nessusd perfo=
rms an
opendir("/proc/%PID%"), which can lead to a panic in fstafs(), if the
corresponding process just finishes just before the call to fstatfs ...=
/*
* First method : attempt to open /proc/<pid>
* (we first check that we can open /proc/ourpid because
* we may run on some weird system)
*/
sprintf(procname, "/proc/%d", getpid());
dir =3D opendir(procname);
if(dir) /* The /proc trick worked for us */
{
closedir(dir);
sprintf(procname, "/proc/%d", pid);
dir =3D opendir(procname);
if(dir){
closedir(dir);
return 1;
}
else return 0;
}
Truss output caused by opendir("/proc/%PID%"). Seems that either the ar=
gument
printing does not work for the specific system calls (or there is some =
other
problem), but the opendir call had been issued with a valid directory n=
ame (see
debugging output below):
stat("D=F2=BF=BFqyp(=A8=ACr(=FFt",0xbfbff004) =3D 0 (0x0=
)
open("qyp(=A8=ACr(=FFt",4,027757771104) =3D 13 (0xd)
fstat(13,0xbfbff004) =3D 0 (0x0)
fcntl(0xd,0x2,0x1) =3D 0 (0x0)
fstatfs(0xd,0xbfbfef04) =3D 0 (0x0)
Debugging output:
905 sprintf(procname, "/proc/%d", getpid());
(gdb)
getpid() =3D 724 (0x2d4)
906 dir =3D opendir(procname);
(gdb)
907 if(dir) /* The /proc trick worked for us */
(gdb) p procname
$7 =3D "/proc/724\000"
(gdb) s
909 closedir(dir);
(gdb)
close(13) =3D 0 (0x0)
910 sprintf(procname, "/proc/%d", pid);
=
Responsible Changed From-To: freebsd-bugs->des Fortunately, we have a procfs maintainer. :-) The following patch works for me on -current:
Index: vfs_syscalls.c
===================================================================
RCS file: /usr/cvs/src/sys/kern/vfs_syscalls.c,v
retrieving revision 1.215
diff -u -3 -r1.215 vfs_syscalls.c
--- vfs_syscalls.c 5 Nov 2001 10:33:45 -0000 1.215
+++ vfs_syscalls.c 11 Dec 2001 15:07:18 -0000
@@ -795,6 +795,8 @@
if ((error = getvnode(td->td_proc->p_fd, SCARG(uap, fd), &fp)) != 0)
return (error);
mp = ((struct vnode *)fp->f_data)->v_mount;
+ if (mp == NULL)
+ return (ENOENT);
sp = &mp->mnt_stat;
error = VFS_STATFS(mp, sp, td);
if (error)
Stefan.Esser@o-tel-o.de writes: > In order to check for the presence of some process %PID%, nessusd performs an > opendir("/proc/%PID%"), which can lead to a panic in fstafs(), if the > corresponding process just finishes just before the call to fstatfs ... Ah, OK, I see. First, if nessus needs to check for the existence of a process with a given PID, it should kill(pid, 0) instead of relying on procfs. Second, the problem seems to be insufficient locking - possibly in pseudofs, but equally possibly in the VFS system. When a process terminates, pseudofs automatically reclaims all vnodes associated with it, which leads to an obvious race condition which can only be avoided through proper locking. Looking at the pseudofs code, it doesn't seem to be at fault. When the process exits, pfs_exit() removes all vnodes associated with it from pseudofs' vnode cache, and vgone()s them. I'm starting to think that the race might be in namei(), actually - since the namei() call from fstatfs() succeeds, it means namei() found a vnode, but that vnode's v_mount is NULL, which means it's been reclaimed, which can't happen if it's referenced (can it?), and namei() is supposed to reference the vnode before it returns it. My bet is that there is a small race between namei() finding a vnode and referencing it, but I'm getting out of my depth. Does anybody else have any ideas? DES -- Dag-Erling Smorgrav - des@ofug.org State Changed From-To: open->closed Fix provided in PR has been commited (with EBADF instead of ENOENT). |
I've been suffering from panics during a security scan of a large company LAN. With nessus and 20 simultanous addresses being scanned, I get panics within minutes, which forced me to lower the parallelism to 5 addresses at a time (the whole network will be finished in about 4 years, at the current pace ;-) Since the operational system doing these scans is two hours of travel away, I decided to check, whether the same problem exists in -current on identical hardware at my location: it's there! Since I don't have access to the console of the remote -stable box, I did all my testing on a similar box running -current. The symptoms are identical: Trap 12 within minutes of running the latest nessus-devel (set to test 10 addresses in parallel). I've tried to identify the location of the fstatfs() call in nessusd, but did not find a single occurence. The source of problem appears to be a value of NULL resulting from: mp = ((struct vnode *)fp->f_data)->v_mount; which is dereferenced in the next progam line ... System log of the -current system (no SCSI drives, additional i4B, else identical to the -stable system): Copyright (c) 1992-2001 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD 5.0-CURRENT #1: Mon Dec 10 18:00:17 CET 2001 se@server:/usr/src/sys/i386/compile/TYAN Preloaded elf kernel "/boot/kernel/kernel" at 0xc049d000. Timecounter "i8254" frequency 1193182 Hz CPU: Pentium II/Pentium II Xeon/Celeron (400.91-MHz 686-class CPU) Origin = "GenuineIntel" Id = 0x652 Stepping = 2 Features=0x183fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR> real memory = 134217728 (131072K bytes) avail memory = 125427712 (122488K bytes) Programming 24 pins in IOAPIC #0 IOAPIC #0 intpin 2 -> irq 0 IOAPIC #0 intpin 16 -> irq 10 IOAPIC #0 intpin 18 -> irq 9 IOAPIC #0 intpin 19 -> irq 11 FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs cpu0 (BSP): apic id: 0, version: 0x00040011, at 0xfee00000 cpu1 (AP): apic id: 1, version: 0x00040011, at 0xfee00000 io0 (APIC): apic id: 2, version: 0x00170011, at 0xfec00000 Pentium Pro MTRR support enabled npx0: <math processor> on motherboard npx0: INT 16 interface pcib0: <Intel 82443GX host to PCI bridge> at pcibus 0 on motherboard pci0: <PCI bus> on pcib0 agp0: <Intel 82443GX host to PCI bridge> mem 0xf4000000-0xf7ffffff at device 0.0 on pci0 pcib1: <PCI-PCI bridge> at device 1.0 on pci0 pci1: <PCI bus> on pcib1 pci1: <display, VGA> at device 0.0 (no driver attached) isab0: <PCI-ISA bridge> at device 7.0 on pci0 isa0: <ISA bus> on isab0 atapci0: <Intel PIIX4 ATA33 controller> port 0xffa0-0xffaf at device 7.1 on pci0 ata0: at 0x1f0 irq 14 on atapci0 ata1: at 0x170 irq 15 on atapci0 pci0: <serial bus, USB> at device 7.2 (no driver attached) intpm0: <Intel 82371AB Power management controller> port 0x440-0x44f irq 9 at device 7.3 on pci0 intpm0: I/O mapped 440 intpm0: intr IRQ 9 enabled revision 0 smbus0: <System Management Bus> on intsmb0 smb0: <SMBus general purpose I/O> on smbus0 intpm0: PM I/O mapped 400 pcib2: <PCI-PCI bridge> at device 16.0 on pci0 pci2: <PCI bus> on pcib2 fxp0: <Intel Pro 10/100B/100+ Ethernet> port 0xdf80-0xdf9f mem 0xfe600000-0xfe6fffff,0xf37ff000-0xf37fffff irq 9 at device 6.0 on pci2 fxp0: Ethernet address 00:90:27:14:c2:0d inphy0: <i82555 10/100 media interface> on miibus0 inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto isic0: <ELSA MicroLink ISDN/PCI> port 0xdff4-0xdff7,0xdc00-0xdc7f mem 0xfe4fff80-0xfe4 fffff irq 11 at device 7.0 on pci2 isic0: passive stack unit 0 fxp1: <Intel Pro 10/100B/100+ Ethernet> port 0xef40-0xef5f mem 0xfea00000-0xfeafffff,0xfb8ff000-0xfb8fffff irq 11 at device 17.0 on pci0 fxp1: Ethernet address 00:e0:81:10:27:2a inphy1: <i82555 10/100 media interface> on miibus1 inphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto ahc0: <Adaptec aic7895 Ultra SCSI adapter> port 0xe400-0xe4ff mem 0xfebfe000-0xfebfefff irq 10 at device 18.0 on pci0 aic7895C: Ultra Wide Channel A, SCSI Id=7, 32/255 SCBs ahc1: <Adaptec aic7895 Ultra SCSI adapter> port 0xe800-0xe8ff mem 0xfebff000-0xfebfffff irq 10 at device 18.1 on pci0 aic7895C: Ultra Wide Channel B, SCSI Id=7, 32/255 SCBs ata-: ata0 already exists, skipping it ata-: ata1 already exists, skipping it sc-: sc0 already exists, skipping it vga-: vga0 already exists, skipping it orm0: <Option ROMs> at iomem 0xc0000-0xc7fff,0xc8000-0xc87ff on isa0 atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0 atkbd0: <AT Keyboard> flags 0x1 irq 1 on atkbdc0 psm0: <PS/2 Mouse> irq 12 on atkbdc0 psm0: model Generic PS/2 mouse, device ID 0 fdc0: <NEC 72065B or clone> at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0 fdc0: FIFO enabled, 8 bytes threshold fd0: <1440-KB 3.5" drive> on fdc0 drive 0 ppc0: <Parallel port> at port 0x378-0x37f irq 7 on isa0 ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode ppc0: FIFO with 16/16/8 bytes threshold lpt0: <Printer> on ppbus0 lpt0: Interrupt-driven port sc0: <System console> at flags 0x100 on isa0 sc0: VGA <16 virtual consoles, flags=0x300> sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0 sio0: type 16550A sio1 at port 0x2f8-0x2ff irq 3 on isa0 sio1: type 16550A vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0 sbc0: <Creative ViBRA16X> at port 0x220-0x22f,0x330-0x331,0x388-0x38b irq 5 drq 1,3 on isa0 pcm0: <SB16 DSP 4.16 (ViBRA16X)> on sbc0 unknown: <PNP0303> can't assign resources unknown: <PNP0c02> can't assign resources unknown: <PNP0501> can't assign resources unknown: <PNP0501> can't assign resources unknown: <PNP0401> can't assign resources unknown: <PNP0700> can't assign resources unknown: <PNP0f13> can't assign resources APIC_IO: Testing 8254 interrupt delivery APIC_IO: routing 8254 via IOAPIC #0 intpin 2 i4brbch: 4 raw B channel access device(s) attached i4bipr: 4 IP over raw HDLC ISDN device(s) attached (VJ header compression) i4bctl: ISDN system control port attached IPv6 packet filtering initialized, logging limited to 100 packets/entry i4btrc: 4 ISDN trace device(s) attached DUMMYNET initialized (011031) i4b: ISDN call control device attached IP packet filtering initialized, divert disabled, rule-based forwarding enabled, default to deny, logging limited to 100 packets/entry by default i4btel: 2 ISDN telephony interface device(s) attached IPsec: Initialized Security Association Processing. ad0: 12970MB <Maxtor 91360D8> [26353/16/63] at ata0-master UDMA33 Waiting 2 seconds for SCSI devices to settle Mounting root from ufs:/dev/ad0s1a SMP: AP CPU #1 Launched! IdlePTD 4964352 initial pcb at 3c7920 panicstr: from debugger panic messages: --- Fatal trap 12: page fault while in kernel mode cpuid = 1; lapic.id = 01000000 fault virtual address = 0x8 fault code = supervisor read, page not present instruction pointer = 0x8:0xc0226c3f stack pointer = 0x10:0xd25b9be4 frame pointer = 0x10:0xd25b9cf4 code segment = base rx0, limit 0xfffff, type 0x1b = DPL 0, pres 1, def32 1, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 369 (nessusd) kernel trap 12 with interrupts disabled Fatal trap 12: page fault while in kernel mode cpuid = 1; lapic.id = 01000000 fault virtual address = 0xc0226c3 fault code = supervisor read, page not present instruction pointer = 0x8:0xc02e92f0 stack pointer = 0x10:0xd25b99bc frame pointer = 0x10:0xd25b99c0 code segment = base rx0, limit 0xfffff, type 0x1b = DPL 0, pres 1, def32 1, gran 1 processor eflags = resume, IOPL = 0 current process = 369 (nessusd) Fatal trap 12: page fault while in kernel mode cpuid = 1; lapic.id = 01000000 fault virtual address = 0x5708438b fault code = supervisor read, page not present instruction pointer = 0x8:0xc02e92f0 stack pointer = 0x10:0xd25b99bc frame pointer = 0x10:0xd25b99c0 code segment = base rx0, limit 0xfffff, type 0x1b = DPL 0, pres 1, def32 1, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 369 (nessusd) (kgdb) where #0 dumpsys () at ../../../kern/kern_shutdown.c:492 #1 0xc01e8b7b in boot (howto=260) at ../../../kern/kern_shutdown.c:335 #2 0xc01e901d in panic (fmt=0xc031dfaa "from debugger") at ../../../kern/kern_shutdown.c:634 #3 0xc014679d in db_panic (addr=-1071485889, have_addr=0, count=-1, modif=0xd25b9a58 "") at ../../../ddb/db_command.c:452 #4 0xc014673b in db_command (last_cmdp=0xc036f798, cmd_table=0xc036f5b8, aux_cmd_tablep=0xc0365e6c, aux_cmd_tablep_end=0xc0365e70) at ../../../ddb/db_command.c:348 #5 0xc0146807 in db_command_loop () at ../../../ddb/db_command.c:474 #6 0xc0148ba3 in db_trap (type=12, code=0) at ../../../ddb/db_trap.c:72 #7 0xc02e91b2 in kdb_trap (type=12, code=0, regs=0xd25b9ba4) at ../../../i386/i386/db_interface.c:167 #8 0xc02fb6ec in trap_fatal (frame=0xd25b9ba4, eva=8) at ../../../i386/i386/trap.c:945 #9 0xc02fb435 in trap_pfault (frame=0xd25b9ba4, usermode=0, eva=8) at ../../../i386/i386/trap.c:864 #10 0xc02fae43 in trap (frame={tf_fs = 24, tf_es = 16, tf_ds = 16, tf_edi = -767445244, tf_esi = 88, tf_ebp = -765747980, tf_isp = -765748272, tf_ebx = 0, tf_edx = -1005341568, tf_ecx = 4, tf_eax = -764857728, tf_trapno = 12, tf_err = 0, tf_eip = -1071485889, tf_cs = 8, tf_eflags = 66118, tf_esp = -767445504, tf_ss = 2}) at ../../../i386/i386/trap.c:416 #11 0xc0226c3f in fstatfs4 (td=0xd241b704, uap=0xd25b9d20) at ../../../kern/vfs_syscalls.c:798 #12 0xc02fbb61 in syscall (frame={tf_fs = 134610991, tf_es = 47, tf_ds = -1078001617, tf_edi = 134656288, tf_esi = 136399488, tf_ebp = -1077938528, tf_isp = -765747852, tf_ebx = 672379112, tf_edx = 13, tf_ecx = 672455364, tf_eax = 158, tf_trapno = 0, tf_err = 2, tf_eip = 671985175, tf_cs = 31, tf_eflags = 663, tf_esp = -1077938956, tf_ss = 47}) at ../../../i386/i386/trap.c:1140 /sys/kern/vfs_syscalls: if ((error = getvnode(td->td_proc->p_fd, SCARG(uap, fd), &fp)) != 0) return (error); mp = ((struct vnode *)fp->f_data)->v_mount; 798: sp = &mp->mnt_stat; 0xc0226c24 <fstatfs+32>: call 0xc022a864 <getvnode> 0xc0226c29 <fstatfs+37>: add $0xc,%esp 0xc0226c2c <fstatfs+40>: test %eax,%eax 0xc0226c2e <fstatfs+42>: jne 0xc0226cae <fstatfs+170> 0xc0226c30 <fstatfs+44>: mov 0xfffffefc(%ebp),%eax 0xc0226c36 <fstatfs+50>: mov 0x30(%eax),%eax 0xc0226c39 <fstatfs+53>: mov 0x14(%eax),%ebx 0xc0226c3c <fstatfs+56>: lea 0x58(%ebx),%esi 0xc0226c3f <fstatfs+59>: mov 0x8(%ebx),%eax <== TRAP 12 0xc0226c42 <fstatfs+62>: push %edi 0xc0226c43 <fstatfs+63>: push %esi 0xc0226c44 <fstatfs+64>: push %ebx 0xc0226c45 <fstatfs+65>: mov 0x14(%eax),%eax 0xc0226c48 <fstatfs+68>: call *%eax (kgdb) info reg eax 0x0 0 ecx 0x0 0 edx 0x0 0 ebx 0x0 0 esp 0xd25b99d8 0xd25b99d8 ebp 0xd25b9cf4 0xd25b9cf4 esi 0x58 88 edi 0xd241b704 -767445244 eip 0xc02ea278 0xc02ea278 eflags 0x0 0 cs 0x0 0 ss 0x0 0 ds 0x0 0 es 0x0 0 fs 0x0 0 gs 0x2f 47 Fix: No idea. Locking issue (vnode information becomes stale while being accessed) ??? How-To-Repeat: Run nessus configured to scan 20 hosts at a time. Allow a few minutes for delivery of a trap 12 ;-)