| Summary: | Crash with FIFOs (named pipes) and truncate() | ||
|---|---|---|---|
| Product: | Base System | Reporter: | Kirk Russell <kirk> |
| Component: | kern | Assignee: | Maxim Konovalov <maxim> |
| Status: | Closed FIXED | ||
| Severity: | Affects Only Me | ||
| Priority: | Normal | ||
| Version: | 6.1-RELEASE | ||
| Hardware: | Any | ||
| OS: | Any | ||
Copyright (c) 1992-2006 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 6.1-RELEASE #0: Sun May 7 18:29:20 UTC 2006
root@ds10.freebie.xs4all.nl:/usr/obj/usr/src/sys/GENERIC
DEC AlphaServer 2100
AlphaServer 2100 5/300, 291MHz
8192 byte page size, 2 processors.
CPU: EV5 (21164) major=5 minor=5
OSF PAL rev: 0x4000100020116
real memory = 937353216 (893 MB)
avail memory = 909377536 (867 MB)
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
cpu0 (BSP): PAL ID: 0
cpu1 (AP): PAL ID: 1
t20: using interrupt type 0 on pci bus 0
t20: <T2 Core Logic chipset>
pcib0: <T2 PCI host bus adapter> on t20
pci0: <PCI bus> on pcib0
de0: <Digital 21040 Ethernet> port 0x10200-0x1027f mem 0x81009100-0x8100917f irq 2 at device 0.0 on pci0
de0: DEC 21040 [10Mb/s] pass 2.3
de0: Ethernet address: 08:00:2b:e2:8f:02
sym0: <810> port 0x10000-0x100ff mem 0x81009000-0x810090ff irq 1 at device 1.0 on pci0
sym0: No NVRAM, ID 7, Fast-10, SE, parity checking
sym0: [GIANT-LOCKED]
eisab0: <PCI-EISA bridge> at device 2.0 on pci0
eisa0: <EISA bus> on eisab0
mainboard0: <DEC2a01 (System Board)> on eisa0 slot 0
eisa0: unknown card CPQ3011 (0x0e113011) at slot 1
isa0: <ISA bus> on eisab0
ahc0: <Adaptec 2940 SCSI adapter> port 0x10100-0x101ff mem 0x81008000-0x81008fff irq 0 at device 6.0 on pci0
ahc0: [GIANT-LOCKED]
aic7870: Single Channel A, SCSI Id=7, 16/253 SCBs
de1: <Digital 21040 Ethernet> port 0x10280-0x102ff mem 0x81009180-0x810091ff irq 4 at device 7.0 on pci0
de1: DEC 21040 [10Mb/s] pass 2.3
de1: Ethernet address: 08:00:2b:e4:d2:7c
orm0: <ISA Option ROM> at iomem 0xc0000-0xc67ff on isa0
ata0 at port 0x1f0-0x1f7,0x3f6 irq 14 on isa0
ata1 at port 0x170-0x177,0x376 irq 15 on isa0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
atkbd0: [GIANT-LOCKED]
fdc0: <Enhanced floppy controller> at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0
fdc0: [FAST]
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
mcclock0: <MC146818A real time clock> at port 0x70-0x71 on isa0
ppc0: <Parallel port> at port 0x3bc-0x3c3 irq 7 on isa0
ppc0: Generic chipset (NIBBLE-only) in COMPATIBLE mode
ppbus0: <Parallel port bus> on ppc0
lpt0: <Printer> on ppbus0
lpt0: Polled port
ppi0: <Parallel I/O> on ppbus0
sc0: <System console> on isa0
sc0: VGA <16 virtual consoles, flags=0x0>
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A, console
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
sio1 at port 0x2f8-0x2ff irq 3 flags 0x80 on isa0
sio1: type 16550A
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounter "i8254" frequency 1193182 Hz quality 0
Timecounters tick every 0.976 msec
Waiting 5 seconds for SCSI devices to settle
SMP: AP CPU #1 Launched!
cd0 at sym0 bus 0 target 5 lun 0
cd0: <DEC RRD43 (C) DEC 1084> Removable CD-ROM SCSI-2 device
cd0: 4.166MB/s transfers (4.166MHz, offset 8)
cd0: cd present [49632 x 512 byte records]
da0 at sym0 bus 0 target 0 lun 0
da0: <QUANTUM XP39100S LYK8> Fixed Direct Access SCSI-2 device
da0: 10.000MB/s transfers (10.000MHz, offset 8), Tagged Queueing Enabled
da0: 8682MB (17781520 512 byte sectors: 255H 63S/T 1106C)
Trying to mount root from ufs:/dev/da0a
On Sun, 28 May 2006, Kirk Russell wrote:
[... lots, but I inadvertently deleted the mail that I meant to reply to]
I have used used the following fixes in this area for many years. They
make truncate() on a fifo and some other file types always succeed
instead of wandering off into UFS_TRUNCATE() (which is always (?)
ffs_truncate()) and tending to cause panics there.
% Index: ufs_vnops.c
% ===================================================================
% RCS file: /home/ncvs/src/sys/ufs/ufs/ufs_vnops.c,v
% retrieving revision 1.239
% diff -u -2 -r1.239 ufs_vnops.c
% --- ufs_vnops.c 7 Apr 2004 03:47:20 -0000 1.239
% +++ ufs_vnops.c 15 Feb 2005 09:56:50 -0000
% @@ -542,7 +573,6 @@
% if (vap->va_size != VNOVAL) {
% /*
% - * Disallow write attempts on read-only filesystems;
% - * unless the file is a socket, fifo, or a block or
% - * character device resident on the filesystem.
% + * XXX most of this checking should be in callers instead
% + * of in N filesystems. The VDIR check mostly already is.
% */
% switch (vp->v_type) {
The comment didn't even echo the code, since writes are now also disallowed
for snapshots and block devices aren't supported. I moved it and expanded
it (except for not trying to list the file types in the "unless" case),
though I think comments that echo code are negatively useful.
% @@ -551,4 +581,15 @@
% case VLNK:
% case VREG:
% + /*
% + * Truncation should have an effect in these cases.
% + * Disallow it if the filesystem is read-only or
% + * the file is being snapshotted.
% + *
% + * XXX unfortunately the snapshot check can't be
% + * more global since we want to check other things
% + * first so as to return better error codes. But
% + * we miss several cases (file flags and ownership
% + * changes at least) by not doing a central check.
After writing this I learned that allowing changing the attributes of
snapshots for the "missed" cases is intentional. I haven't seen any
documentation or rationale of what can be changed. The strangest
restrictions are that certain chmod()s are not allowed and all utimes()
are not allowed. All chown()s are allowed, so the ctimes of snapshots
can be changed to the current time using a null chown().
% + */
% if (vp->v_mount->mnt_flag & MNT_RDONLY)
% return (EROFS);
% @@ -557,5 +598,17 @@
% break;
% default:
% - break;
% + /*
% + * According to POSIX, the result is unspecified
% + * for file types other than regular files,
% + * directories and shared memory objects. We
% + * don't support shared memory objects in the file
% + * system, and have dubious support for truncating
% + * symlinks (I think it implements panic(3) if
% + * DIAGNOSTIC is enabled.) Just ignore the request
% + * in other cases.
The support for the VLNK case is bogus. ffs_truncate() has a fair amount
of code to support this case, but if DIAGNOSTIC is configured the
ffs_truncate() just panics for symlinks. This patch doesn't change that.
However, I think the VLNK case is unreachable even here. truncate(2)
follows links so it is impossible to truncate(2) a symlink, and open(2)
follows symlinks so it is impossible to ftruncate(2) a symlink. So only
the kernel could call here for a symlink, and I think it doesn't.
% + *
% + * XXX update the man page.
% + */
% + return (0);
This return makes truncate() on a fifo do nothing.
% }
% if ((error = UFS_TRUNCATE(vp, vap->va_size, IO_NORMAL,
All this should be handled using vnops in individual file systems and no
comments here.
Bruce
Hi Bruce,
> I have used used the following fixes in this area for many years. They
> make truncate() on a fifo and some other file types always succeed
> instead of wandering off into UFS_TRUNCATE() (which is always (?)
> ffs_truncate()) and tending to cause panics there.
[...]
Why doesn't RELENG_4 suffer from this? The code of ufs_setattr() is
very similar there.
--
Maxim Konovalov
On Mon, 29 May 2006, Maxim Konovalov wrote: >> I have used used the following fixes in this area for many years. They >> make truncate() on a fifo and some other file types always succeed >> instead of wandering off into UFS_TRUNCATE() (which is always (?) >> ffs_truncate()) and tending to cause panics there. > [...] > > Why doesn't RELENG_4 suffer from this? The code of ufs_setattr() is > very similar there. Hmm, my fixes are for ~5.2 and they may be unnecessary there too. I don't remember noticing this particular problem. Perhaps some changes in -current resulted in ffs_update() doing more and happening to do something bad. Unfortunately, the PR doesn't contain much debugging info so it isn't clear that the problem is in ffs_update(). I just found another problem. ffs_truncate() panics for read-only file systems. The unpatched ufs_setattr() only returns early for read-only file systems in the VREG, VLNK and VDIR cases. So ffs_truncate() on a fifo on a read-only file system can apparently very easily panic. The panic in ffs_truncate() is strangely placed after doing some things, both null things and things that attempt to write to the file system. These things are mainly: - handling extended attributes. I don't understand the details. - the !IO_NORMAL case (whatever that is). Returns success without checking for read-only. - new size too large. Returns failure early. - a subcase of the VLNK case. Attempts to write even to read-only file systems. Such writes are handled bogusly at lower levels and hopefully don't reach the disks (I use printfs to detect these and others. Most involve setting file times for read-only file systems or directives to set file times that aren't handled properly on changing from rw to ro. Here there is a file size change so not writing to disk might cause more serious problems). Returns early. In my previous reply I said that the VLNK case is unreachable and it panics if DIAGNOSTIC is configured. Actually, only a subcase of the subcase panics. - null truncates (ones which don't change the size). These are not quite null. They attempt to set the file times even for read-only file systems. Such writes are handled bogusly as above. Returns early. If ffs_update() gets through these things without returning, it always panics in the read-only case. The test program truncates to a nonzero length, presumably to get past the null truncate case, and its panic is not for the read-only case so it must be getting past the panic for that too. Bruce On Tue, 30 May 2006, 00:04+1000, Bruce Evans wrote: > On Mon, 29 May 2006, Maxim Konovalov wrote: > > > > I have used used the following fixes in this area for many years. They > > > make truncate() on a fifo and some other file types always succeed > > > instead of wandering off into UFS_TRUNCATE() (which is always (?) > > > ffs_truncate()) and tending to cause panics there. > > [...] > > > > Why doesn't RELENG_4 suffer from this? The code of ufs_setattr() is > > very similar there. > > Hmm, my fixes are for ~5.2 and they may be unnecessary there too. I don't > remember noticing this particular problem. Perhaps some changes in -current > resulted in ffs_update() doing more and happening to do something bad. > Unfortunately, the PR doesn't contain much debugging info so it isn't > clear that the problem is in ffs_update(). Here is a backtrace: Unread portion of the kernel message buffer: No strategy for buffer at 0xcc6a8b70 vnode 0xc2c6c514: tag ufs, type VFIFO usecount 1, writecount 0, refcount 2 mountedhere 0 flags () lock type ufs: EXCL (count 1) by thread 0xc28ca6c0 (pid 1039) ino 141854, on dev ad0s1e Fatal trap 12: page fault while in kernel mode fault virtual address = 0xc fault code = supervisor read, page not present instruction pointer = 0x20:0xc047cf95 stack pointer = 0x28:0xd565f798 frame pointer = 0x28:0xd565f798 code segment = base rx0, limit 0xfffff, type 0x1b = DPL 0, pres 1, def32 1, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 1039 (q) panic: from debugger Uptime: 31s Physical memory: 494 MB Dumping 48 MB: 33 17 1 #0 doadump () at pcpu.h:166 166 __asm __volatile("movl %%fs:0,%0" : "=r" (td)); gdb% bt #0 doadump () at pcpu.h:166 #1 0xc04b4de4 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409 #2 0xc04b508f in panic (fmt=0xc05e237c "from debugger") at /usr/src/sys/kern/kern_shutdown.c:565 #3 0xc0444f25 in db_panic (addr=-1069035627, have_addr=0, count=-1, modif=0xd565f578 "") at /usr/src/sys/ddb/db_command.c:426 #4 0xc0444ebc in db_command (last_cmdp=0xc0640704, cmd_table=0x0) at /usr/src/sys/ddb/db_command.c:395 #5 0xc0444f7a in db_command_loop () at /usr/src/sys/ddb/db_command.c:446 #6 0xc0446b91 in db_trap (type=12, code=0) at /usr/src/sys/ddb/db_main.c:221 #7 0xc04cfb35 in kdb_trap (type=12, code=0, tf=0x0) at /usr/src/sys/kern/subr_kdb.c:481 #8 0xc05bfa14 in trap_fatal (frame=0xd565f758, eva=12) at /usr/src/sys/i386/i386/trap.c:861 #9 0xc05bf77f in trap_pfault (frame=0xd565f758, usermode=0, eva=12) at /usr/src/sys/i386/i386/trap.c:778 #10 0xc05bf3bd in trap (frame= {tf_fs = -1027211256, tf_es = 65576, tf_ds = -714801112, tf_edi = 67584, tf_esi = -1027160812, tf_ebp = -714737768, tf_isp = -714737788, tf_ebx = -1027336392, tf_edx = 0, tf_ecx = -1056878592, tf_eax = 0, tf_trapno = 12, tf_err = 0, tf_eip = -1069035627, tf_cs = 32, tf_eflags = 590406, tf_esp = -714737748, tf_ss = -1068030820}) at /usr/src/sys/i386/i386/trap.c:463 #11 0xc05b035a in calltrap () at /usr/src/sys/i386/i386/exception.s:138 ---Type <return> to continue, or q <return> to quit--- #12 0xc047cf95 in fifo_printinfo (vp=0x0) at /usr/src/sys/fs/fifofs/fifo_vnops.c:448 #13 0xc057249c in ufs_print (ap=0x0) at /usr/src/sys/ufs/ufs/ufs_vnops.c:1965 #14 0xc05d1c70 in VOP_PRINT_APV (vop=0x0, a=0xd565f7cc) at vnode_if.c:1899 #15 0xc050cafa in vn_printf (vp=0xc2c6c514, fmt=0xc05e2816 "%s\n") at vnode_if.h:971 #16 0xc0502a4e in vop_nostrategy (ap=0xd565f8a8) at /usr/src/sys/kern/vfs_default.c:195 #17 0xc05d1b05 in VOP_STRATEGY_APV (vop=0xc062a280, a=0xd565f8a8) at vnode_if.c:1797 #18 0xc0568920 in ffsext_strategy (ap=0xd565f8a8) at /usr/src/sys/ufs/ffs/ffs_vnops.c:1291 #19 0xc05d1b05 in VOP_STRATEGY_APV (vop=0xc0631520, a=0xd565f8a8) at vnode_if.c:1797 #20 0xc04ff6fd in bufstrategy (bo=0x0, bp=0xcc6a8b70) at vnode_if.h:928 #21 0xc04fa81e in bufwrite (bp=0xcc6a8b70) at buf.h:419 #22 0xc04fae51 in bawrite (bp=0x0) at buf.h:405 #23 0xc0552581 in ffs_truncate (vp=0xc2c6c514, length=16000, flags=67584, cred=0xc2be3e80, td=0xc28ca6c0) at /usr/src/sys/ufs/ffs/ffs_inode.c:304 #24 0xc056ffe9 in ufs_setattr (ap=0x0) at /usr/src/sys/ufs/ufs/ufs_vnops.c:532 #25 0xc05d0ae6 in VOP_SETATTR_APV (vop=0x0, a=0xd565fb3c) at vnode_if.c:586 #26 0xc0513195 in kern_truncate (td=0xc28ca6c0, path=0x0, pathseg=UIO_USERSPACE, length=16000) at vnode_if.h:314 ---Type <return> to continue, or q <return> to quit--- #27 0xc0512fac in truncate (td=0xc28ca6c0, uap=0x0) at /usr/src/sys/kern/vfs_syscalls.c:3018 #28 0xc05bfd2a in syscall (frame= {tf_fs = 59, tf_es = 59, tf_ds = 59, tf_edi = -1077941208, tf_esi = -1077941216, tf_ebp = -1077941336, tf_isp = -714736284, tf_ebx = 672492728, tf_edx = 16, tf_ecx = 2, tf_eax = 198, tf_trapno = 12, tf_err = 2, tf_eip = 672423415, tf_cs = 51, tf_eflags = 534, tf_esp = -1077941380, tf_ss = 59}) at /usr/src/sys/i386/i386/trap.c:1016 #29 0xc05b03af in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:191 #30 0x00000033 in ?? () Previous frame inner to this frame (corrupt stack?) gdb% -- Maxim Konovalov On Mon, 29 May 2006, Maxim Konovalov wrote: > On Tue, 30 May 2006, 00:04+1000, Bruce Evans wrote: > > > On Mon, 29 May 2006, Maxim Konovalov wrote: > > > Why doesn't RELENG_4 suffer from this? The code of ufs_setattr() is > > > very similar there. > > > > Hmm, my fixes are for ~5.2 and they may be unnecessary there too. I don't > > remember noticing this particular problem. Perhaps some changes in -current > > resulted in ffs_update() doing more and happening to do something bad. > > Unfortunately, the PR doesn't contain much debugging info so it isn't > > clear that the problem is in ffs_update(). > > Here is a backtrace: > > Unread portion of the kernel message buffer: > No strategy for buffer at 0xcc6a8b70 > vnode > 0xc2c6c514: tag ufs, type VFIFO > usecount 1, writecount 0, refcount 2 mountedhere 0 > flags () > lock type ufs: EXCL (count 1) by thread 0xc28ca6c0 (pid 1039) > ino 141854, on dev ad0s1e > ... > #10 0xc05bf3bd in trap (frame= > {tf_fs = -1027211256, tf_es = 65576, tf_ds = -714801112, tf_edi = 67584, tf_esi = -1027160812, tf_ebp = -714737768, tf_isp = -714737788, tf_ebx = -1027336392, tf_edx = 0, tf_ecx = -1056878592, tf_eax = 0, tf_trapno = 12, tf_err = 0, tf_eip = -1069035627, tf_cs = 32, tf_eflags = 590406, tf_esp = -714737748, tf_ss = -1068030820}) at /usr/src/sys/i386/i386/trap.c:463 > #11 0xc05b035a in calltrap () at /usr/src/sys/i386/i386/exception.s:138 > ---Type <return> to continue, or q <return> to quit--- > #12 0xc047cf95 in fifo_printinfo (vp=0x0) > at /usr/src/sys/fs/fifofs/fifo_vnops.c:448 > #13 0xc057249c in ufs_print (ap=0x0) at /usr/src/sys/ufs/ufs/ufs_vnops.c:1965 > #14 0xc05d1c70 in VOP_PRINT_APV (vop=0x0, a=0xd565f7cc) at vnode_if.c:1899 > #15 0xc050cafa in vn_printf (vp=0xc2c6c514, fmt=0xc05e2816 "%s\n") > at vnode_if.h:971 > #16 0xc0502a4e in vop_nostrategy (ap=0xd565f8a8) > at /usr/src/sys/kern/vfs_default.c:195 It has paniced in the debugging code in vop_nostrategy(). That code is apparently broken. vop_nostrategy() starts with a printf() that works. It then calls vprint() which panics. ap=0x0 in ufs_print and vp=0x0 in fifo_printinfo() seem to have been corrupted between printing some info about the fifo and panicing. Instead of panicing, vprint() is supposed to return. Then vop_nostrategy() is supposed to do nothing except fail the i/o. > #17 0xc05d1b05 in VOP_STRATEGY_APV (vop=0xc062a280, a=0xd565f8a8) > at vnode_if.c:1797 > #18 0xc0568920 in ffsext_strategy (ap=0xd565f8a8) > at /usr/src/sys/ufs/ffs/ffs_vnops.c:1291 > #19 0xc05d1b05 in VOP_STRATEGY_APV (vop=0xc0631520, a=0xd565f8a8) > at vnode_if.c:1797 > #20 0xc04ff6fd in bufstrategy (bo=0x0, bp=0xcc6a8b70) at vnode_if.h:928 > #21 0xc04fa81e in bufwrite (bp=0xcc6a8b70) at buf.h:419 ffs_truncate() has done bad things to reach here. It has reached the "Lengthen the size of the file" code and succeeded in allocating some blocks, despite the fifo file type not being capable of having any blocks. Perhaps the allocation works because the blocks are virtual and no indirect blocks are required. The test program only allocates 16K IIRC, and that many virtual blocks are represented as 0's in the inode alone. A larger extension might cause more problems by allocating indirect blocks. Now ffs_truncate() is trying to write the buffer for which it has just allocated blocks. The write is done via vop_nostrategy() which should just fail. I suppose the write would be null for the case of all blocks virtual if it were done by a non-null strategy routine. ffs_truncate() continues to do bad things by not checking for errors in the write even for the IO_SYNC case where the write function returns an error. ffs_truncate() has committed to the write succeeding, and has updated the size in the in-core inode. > #22 0xc04fae51 in bawrite (bp=0x0) at buf.h:405 > #23 0xc0552581 in ffs_truncate (vp=0xc2c6c514, length=16000, flags=67584, > cred=0xc2be3e80, td=0xc28ca6c0) at /usr/src/sys/ufs/ffs/ffs_inode.c:304 > #24 0xc056ffe9 in ufs_setattr (ap=0x0) at /usr/src/sys/ufs/ufs/ufs_vnops.c:532 > #25 0xc05d0ae6 in VOP_SETATTR_APV (vop=0x0, a=0xd565fb3c) at vnode_if.c:586 > #26 0xc0513195 in kern_truncate (td=0xc28ca6c0, path=0x0, > pathseg=UIO_USERSPACE, length=16000) at vnode_if.h:314 > ---Type <return> to continue, or q <return> to quit--- > #27 0xc0512fac in truncate (td=0xc28ca6c0, uap=0x0) > at /usr/src/sys/kern/vfs_syscalls.c:3018 This may explain why RELENG_4 doesn't panic. The ufs and ffs layers might do the same bad things without anyone noticing if the debugging code doesn't panic or if the wrong strategy routine is used. You can tell if the wrong strategy routine is used by checking for debugging output. vop_strategy() hasn't changed much, but the plumbing for strategy routines has. Larger extensions might cause more obvious problems in RELENG_4 too. If I'm right, then the bugs would also show up as fifos with a nonzero size (st_size in struct stat?) and maybe as allocated unwriteable physical blocks for sizes that require indirect blocks. Anyway, the fix is to return (0) long before reaching "Lengthen the size of the file" in ffs_truncate(). Bruce [...] > > #12 0xc047cf95 in fifo_printinfo (vp=0x0) > > at /usr/src/sys/fs/fifofs/fifo_vnops.c:448 > > #13 0xc057249c in ufs_print (ap=0x0) at > > #/usr/src/sys/ufs/ufs/ufs_vnops.c:1965 > > #14 0xc05d1c70 in VOP_PRINT_APV (vop=0x0, a=0xd565f7cc) at vnode_if.c:1899 > > #15 0xc050cafa in vn_printf (vp=0xc2c6c514, fmt=0xc05e2816 "%s\n") > > at vnode_if.h:971 > > #16 0xc0502a4e in vop_nostrategy (ap=0xd565f8a8) > > at /usr/src/sys/kern/vfs_default.c:195 > > It has paniced in the debugging code in vop_nostrategy(). That code > is apparently broken. vop_nostrategy() starts with a printf() that > works. It then calls vprint() which panics. ap=0x0 in ufs_print and > vp=0x0 in fifo_printinfo() seem to have been corrupted between printing > some info about the fifo and panicing. Instead of panicing, vprint() > is supposed to return. Then vop_nostrategy() is supposed to do nothing > except fail the i/o. It appears this stack trace is not correct. Actially it panics at fifo_printinfo() when tries to dereference vp->v_fifoinfo. I added a simple check (unneeded in normal situation I believe) and as expected panic moved to VOP_STRATEGY assert. --- fifo_vnops.c 15 Mar 2006 10:15:35 -0000 1.134 +++ fifo_vnops.c 30 May 2006 20:09:26 -0000 @@ -449,4 +449,5 @@ fifo_printinfo(vp) - printf(", fifo with %ld readers and %ld writers", - fip->fi_readers, fip->fi_writers); + if (fip != NULL) + printf(", fifo with %ld readers and %ld writers", + fip->fi_readers, fip->fi_writers); return (0); My traces show namei(&nd) in kern_truncate() returns nd with ni_vp->v_fifoinfo = 0x0, ni_vp->v_type = VFIFO. That is why kernel panics in fifo_printinfo() later. But how can it happen? For the record: for non-INVARIANT kernel with the patch above a test case gave: $ ls -l afifo prw-r--r-- 1 maxim maxim 16000 31 ÍÁÊ 01:08 afifo as you predicted :-) -- Maxim Konovalov On Wed, 31 May 2006, Maxim Konovalov wrote: > [...] >> It has paniced in the debugging code in vop_nostrategy(). That code >> is apparently broken. vop_nostrategy() starts with a printf() that >> works. It then calls vprint() which panics. ap=0x0 in ufs_print and >> vp=0x0 in fifo_printinfo() seem to have been corrupted between printing >> some info about the fifo and panicing. Instead of panicing, vprint() >> is supposed to return. Then vop_nostrategy() is supposed to do nothing >> except fail the i/o. > > It appears this stack trace is not correct. Actially it panics at > fifo_printinfo() when tries to dereference vp->v_fifoinfo. I added a > simple check (unneeded in normal situation I believe) and as expected > panic moved to VOP_STRATEGY assert. > > --- fifo_vnops.c 15 Mar 2006 10:15:35 -0000 1.134 > +++ fifo_vnops.c 30 May 2006 20:09:26 -0000 > @@ -449,4 +449,5 @@ fifo_printinfo(vp) > > - printf(", fifo with %ld readers and %ld writers", > - fip->fi_readers, fip->fi_writers); > + if (fip != NULL) > + printf(", fifo with %ld readers and %ld writers", > + fip->fi_readers, fip->fi_writers); > return (0); > > My traces show namei(&nd) in kern_truncate() returns nd with > ni_vp->v_fifoinfo = 0x0, ni_vp->v_type = VFIFO. That is why kernel > panics in fifo_printinfo() later. But how can it happen? fip is always null for non-open fifos, and truncate(2) doesn't open the file. Change the printing to print just ", fifo" or ", closed fifo" in the fip == NULL case. I would have expected RELENG_4 to panic here too. Maybe the vnop plumbing is broken in RELENG_4, resulting in fifo_print() not being called. Bruce On Wed, 31 May 2006, 23:06+1000, Bruce Evans wrote: > On Wed, 31 May 2006, Maxim Konovalov wrote: > > > [...] > > > It has paniced in the debugging code in vop_nostrategy(). That code > > > is apparently broken. vop_nostrategy() starts with a printf() that > > > works. It then calls vprint() which panics. ap=0x0 in ufs_print and > > > vp=0x0 in fifo_printinfo() seem to have been corrupted between printing > > > some info about the fifo and panicing. Instead of panicing, vprint() > > > is supposed to return. Then vop_nostrategy() is supposed to do nothing > > > except fail the i/o. > > > > It appears this stack trace is not correct. Actially it panics at > > fifo_printinfo() when tries to dereference vp->v_fifoinfo. I added a > > simple check (unneeded in normal situation I believe) and as expected > > panic moved to VOP_STRATEGY assert. > > > > --- fifo_vnops.c 15 Mar 2006 10:15:35 -0000 1.134 > > +++ fifo_vnops.c 30 May 2006 20:09:26 -0000 > > @@ -449,4 +449,5 @@ fifo_printinfo(vp) > > > > - printf(", fifo with %ld readers and %ld writers", > > - fip->fi_readers, fip->fi_writers); > > + if (fip != NULL) > > + printf(", fifo with %ld readers and %ld writers", > > + fip->fi_readers, fip->fi_writers); > > return (0); > > > > My traces show namei(&nd) in kern_truncate() returns nd with > > ni_vp->v_fifoinfo = 0x0, ni_vp->v_type = VFIFO. That is why kernel > > panics in fifo_printinfo() later. But how can it happen? > > fip is always null for non-open fifos, and truncate(2) doesn't open > the file. > > Change the printing to print just ", fifo" or ", closed fifo" in the > fip == NULL case. Oh my. Now I can't imagine why I miss so simple explanation. > I would have expected RELENG_4 to panic here too. Maybe the vnop plumbing > is broken in RELENG_4, resulting in fifo_print() not being called. Thanks, Bruce! Your help is much appreciated. -- Maxim Konovalov State Changed From-To: open->patched Fixed in HEAD. Thanks for the test case! Responsible Changed From-To: freebsd-bugs->maxim Feedback trap. State Changed From-To: patched->closed Merged to RELENG_6. |
I am guessing you can reproduce this issue on i386, so it should be a generic kernel issue and not specific to the alpha. When you call truncate() on a fifo, the kernel will crash. How-To-Repeat: $ cat bstg0001.c #include <unistd.h> #include <err.h> #include <errno.h> #include <fcntl.h> #include <string.h> #include <sys/stat.h> int main() { unlink("afifo"); if (mkfifo("afifo", 0666) == -1) { errx(1, "%s: %s", "mkfifo", strerror(errno)); } truncate("afifo", 16000); return 0; } $ cc -Wall bstg0001.c $ ./a.out No strategy for buffer at 0xfffffe000dbe8e28 vnode 0xfffffc0027a001f0: tag ufs, type VFIFO usecount 1, writecount 0, refcount 2 mountedhere 0 flags () lock type ufs: EXCL (count 1) by thread 0xfffffc002aaaafc0 (pid 516) ino 306188, on dev da0f fatal kernel trap: trap entry = 0x2 (memory management fault) cpuid = 1 faulting va = 0x10 type = access violation cause = load instructon pc = 0xfffffc00004b6af8 ra = 0xfffffc00007241cc sp = 0xfffffe00188e95d0 usp = 0x11ffec20 curthread = 0xfffffc002aaaafc0 pid = 516, comm = a.out panic: trap cpuid = 1 Uptime: 4m47s Cannot dump. No dump device defined. Automatic reboot in 15 seconds - press a key on the console to abort