| Summary: | reboot after panic: backgroundwritedone: lost buffer | ||
|---|---|---|---|
| Product: | Base System | Reporter: | Andre Albsmeier <Andre.Albsmeier> |
| Component: | kern | Assignee: | freebsd-bugs (Nobody) <bugs> |
| Status: | Closed FIXED | ||
| Severity: | Affects Only Me | ||
| Priority: | Normal | ||
| Version: | 4.2-STABLE | ||
| Hardware: | Any | ||
| OS: | Any | ||
In message <200012231336.eBNDaMp03117@curry.mchp.siemens.de>, Andre Albsmeier writes:
>>Synopsis: reboot after panic: backgroundwritedone: lost buffer

This is interesting, since it relates to the discussions which
preceded revision 1.267 of vfs_bio.c. There was a piece of code in
brelse() that was supposed to ensure that buffers with associated
background writes would remain until the background write completed.

However the logic of this code had always been wrong so it never
functioned, and Matt removed it in vfs_bio.c revision 1.267 (and
in 4.x revision 1.242.2.3). This panic may indicate that this check
really does need to be added (correctly this time).

What is the version number of /usr/src/sys/kern/vfs_bio.c that you
are using?

I think to re-introduce the test for background writes we need
something like the patch below, though I haven't looked too carefully.

Ian

    Index: vfs_bio.c
    ===================================================================
    RCS file: /home/iedowse/CVS/src/sys/kern/vfs_bio.c,v
    retrieving revision 1.268
    diff -u -r1.268 vfs_bio.c
    --- vfs_bio.c 2000/12/15 20:08:19 1.268
    +++ vfs_bio.c 2000/12/23 14:54:44
    @@ -1009,6 +1009,7 @@
       * background write.
       */
      if ((bp->b_flags & B_VMIO)
    +     && !(bp->b_xflags & BX_BKGRDINPROG)
          && !(bp->b_vp->v_tag == VT_NFS &&
          !vn_isdisk(bp->b_vp, NULL) &&
          (bp->b_flags & B_DELWRI))

:This is interesting, since it relates to the discussions which
:preceded revision 1.267 of vfs_bio.c. There was a piece of code in
:brelse() that was supposed to ensure that buffers with associated
:background writes would remain until the background write completed.
:
:However the logic of this code had always been wrong so it never
:functioned, and Matt removed it in vfs_bio.c revision 1.267 (and
:in 4.x revision 1.242.2.3). This panic may indicate that this check
:really does need to be added (correctly this time).
First and foremost... Andre, do you have that crash dump still? Do
you have a debug kernel to go with it? (or, if not that, the kernel
binary the savecore program saved to /var/crash along with the dump)?
If I could download that crash dump it would make things a whole lot
easier to track down.
-
Ian's synopsis is essentially correct. The particular panic reported
is one Kirk already had in there, and is an indication of probable
corruption due to improperly freeing a buffer. I'll spend some time
on Sunday to go over the problem and your proposed patch in more detail.
-Matt
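
For readers following the thread, the effect of the one-line guard in Ian's patch can be sketched as a standalone model. This is only an illustration under stated assumptions: `struct model_buf`, the `on_nfs_vnode` field, the function names, and the flag bit values are hypothetical stand-ins, not the kernel's definitions, and only the part of the condition visible in the hunk is modelled. What the guarded block in brelse() actually does is not shown in the thread, so the sketch evaluates the predicate and nothing more.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder bit values for illustration only; NOT the kernel's definitions. */
#define B_DELWRI        0x0001u
#define B_VMIO          0x0002u
#define BX_BKGRDINPROG  0x0001u

/* Hypothetical reduced stand-in for struct buf: only what the condition reads. */
struct model_buf {
    unsigned b_flags;
    unsigned b_xflags;
    /* Stands in for: b_vp->v_tag == VT_NFS && !vn_isdisk(b_vp, NULL) */
    bool on_nfs_vnode;
};

/* The NFS delayed-write exclusion that is already part of the condition. */
static bool nfs_delayed_write(const struct model_buf *bp)
{
    return bp->on_nfs_vnode && (bp->b_flags & B_DELWRI) != 0;
}

/* Condition as visible in the hunk before the patch. */
static bool vmio_branch_before(const struct model_buf *bp)
{
    return (bp->b_flags & B_VMIO) != 0 && !nfs_delayed_write(bp);
}

/* Condition with the proposed BX_BKGRDINPROG test added. */
static bool vmio_branch_after(const struct model_buf *bp)
{
    return (bp->b_flags & B_VMIO) != 0
        && !(bp->b_xflags & BX_BKGRDINPROG)
        && !nfs_delayed_write(bp);
}

/* Usage: compare both conditions on an idle and a background-busy buffer. */
void check_guard(void)
{
    struct model_buf idle = { B_VMIO, 0, false };
    struct model_buf busy = { B_VMIO, BX_BKGRDINPROG, false };

    /* No background write: both versions take the branch. */
    assert(vmio_branch_before(&idle) && vmio_branch_after(&idle));
    /* Background write in progress: only the patched version skips it. */
    assert(vmio_branch_before(&busy));
    assert(!vmio_branch_after(&busy));
}
```

In this model the only case where the two conditions differ is a VMIO buffer whose background write is still in progress, which is the case the thread suspects of being released too early.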
On Sun, 24-Dec-2000 at 02:33:47 +0000, Ian Dowse wrote:
> In message <200012231336.eBNDaMp03117@curry.mchp.siemens.de>, Andre Albsmeier writes:
> >>Synopsis: reboot after panic: backgroundwritedone: lost buffer
>
> This is interesting, since it relates to the discussions which
> preceded revision 1.267 of vfs_bio.c. There was a piece of code in
> brelse() that was supposed to ensure that buffers with associated
> background writes would remain until the background write completed.
>
> However the logic of this code had always been wrong so it never
> functioned, and Matt removed it in vfs_bio.c revision 1.267 (and
> in 4.x revision 1.242.2.3). This panic may indicate that this check
> really does need to be added (correctly this time).
>
> What is the version number of /usr/src/sys/kern/vfs_bio.c that you
> are using?

The version I am using is 1.242.2.4, which is the newest on -STABLE.

> I think to re-introduce the test for background writes we need
> something like the patch below, though I haven't looked too carefully.

I can test it if you want. However, we will have to wait until I am
back at work on the 27th.

Thanks,

-Andre

> Ian
>
> Index: vfs_bio.c
> ===================================================================
> RCS file: /home/iedowse/CVS/src/sys/kern/vfs_bio.c,v
> retrieving revision 1.268
> diff -u -r1.268 vfs_bio.c
> --- vfs_bio.c 2000/12/15 20:08:19 1.268
> +++ vfs_bio.c 2000/12/23 14:54:44
> @@ -1009,6 +1009,7 @@
>    * background write.
>    */
>   if ((bp->b_flags & B_VMIO)
> +     && !(bp->b_xflags & BX_BKGRDINPROG)
>       && !(bp->b_vp->v_tag == VT_NFS &&
>       !vn_isdisk(bp->b_vp, NULL) &&
>       (bp->b_flags & B_DELWRI))

On Sun, 24-Dec-2000 at 02:26:23 -0800, Matt Dillon wrote:
>
> :This is interesting, since it relates to the discussions which
> :preceded revision 1.267 of vfs_bio.c. There was a piece of code in
> :brelse() that was supposed to ensure that buffers with associated
> :background writes would remain until the background write completed.
> :
> :However the logic of this code had always been wrong so it never
> :functioned, and Matt removed it in vfs_bio.c revision 1.267 (and
> :in 4.x revision 1.242.2.3). This panic may indicate that this check
> :really does need to be added (correctly this time).
>
> First and foremost... Andre, do you have that crash dump still? Do
> you have a debug kernel to go with it? (or, if not that, the kernel
> binary the savecore program saved to /var/crash along with the dump)?
>
> If I could download that crash dump it would make things a whole lot
> easier to track down.

Matt, I have just sent you an email where you can find the files!

Thanks,

-Andre

> -
>
> Ian's synopsis is essentially correct. The particular panic reported
> is one Kirk already had in there, and is an indication of probable
> corruption due to improperly freeing a buffer. I'll spend some time
> on sunday to go over the problem and your proposed patch in more detail.
>
> -Matt

Please close this one, it is fixed...

-Andre

State Changed
From-To: open->closed

Closed at submitter's request.

A single 400MB file was written to the 2GB jaz drive with tar. The jaz media contains a single UFS filesystem using softupdates. As soon as the tar command returned, the unmount command was issued in order to be able to eject the media. The flushing of the buffers to the drive could be heard for about 1 second. Then the machine started to dump and rebooted.

The crashdump didn't show a lot:

    root@bali:/server/FreeBSD/crash/1>gdb -k kernel.debug vmcore.4
    GNU gdb 4.18
    Copyright 1998 Free Software Foundation, Inc.
    GDB is free software, covered by the GNU General Public License, and you are
    welcome to change it and/or distribute copies of it under certain conditions.
    Type "show copying" to see the conditions.
    There is absolutely no warranty for GDB. Type "show warranty" for details.
    This GDB was configured as "i386-unknown-freebsd"...
    IdlePTD 3489792
    initial pcb at 2c9460
    panicstr: backgroundwritedone: lost buffer
    panic messages:
    ---
    dmesg: kvm_read: invalid address (f0403000)
    ---
    #0 dumpsys () at /src/src-4/sys/kern/kern_shutdown.c:469
    469 if (dumping++) {
    (kgdb) where
    #0 dumpsys () at /src/src-4/sys/kern/kern_shutdown.c:469
    #1 0x0 in ?? ()

Fix: unknown

How-To-Repeat: The machine runs rock solid otherwise. I have seen the problem happening a few times now but only when running the procedure as written above. It does not happen always. I don't know if it is softupdates related. I will disable softupdates to see if it helps.