Bug 23794

Summary: reboot after panic: backgroundwritedone: lost buffer
Product: Base System
Reporter: Andre Albsmeier <Andre.Albsmeier>
Component: kern
Assignee: freebsd-bugs (Nobody) <bugs>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: 4.2-STABLE   
Hardware: Any   
OS: Any   

Description Andre Albsmeier 2000-12-23 13:40:00 UTC
A single 400MB file was written to the 2GB jaz drive with tar. The jaz media
contains a single UFS filesystem using softupdates. As soon as the tar command
returned, the unmount command was issued in order to be able to eject the media.
The flushing of the buffers to the drive could be heard for about 1 second.
Then the machine started to dump and rebooted.

The crashdump didn't show a lot:

root@bali:/server/FreeBSD/crash/1>gdb -k kernel.debug vmcore.4 
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-unknown-freebsd"...
IdlePTD 3489792
initial pcb at 2c9460
panicstr: backgroundwritedone: lost buffer
panic messages:
---
dmesg: kvm_read: invalid address (f0403000)
---
#0  dumpsys () at /src/src-4/sys/kern/kern_shutdown.c:469
469             if (dumping++) {
(kgdb) where
#0  dumpsys () at /src/src-4/sys/kern/kern_shutdown.c:469
#1  0x0 in ?? ()

Fix:

unknown

How-To-Repeat:

The machine runs rock solid otherwise. I have seen the problem happening
a few times now but only when running the procedure as written above. It
does not happen always.

I don't know if it is softupdates related. I will disable softupdates
to see if it helps.
Comment 1 iedowse 2000-12-24 02:33:47 UTC
In message <200012231336.eBNDaMp03117@curry.mchp.siemens.de>, Andre Albsmeier writes:
>>Synopsis:       reboot after panic: backgroundwritedone: lost buffer

This is interesting, since it relates to the discussions which
preceded revision 1.267 of vfs_bio.c. There was a piece of code in
brelse() that was supposed to ensure that buffers with associated
background writes would remain until the background write completed.

However the logic of this code had always been wrong so it never
functioned, and Matt removed it in vfs_bio.c revision 1.267 (and
in 4.x revision 1.242.2.3). This panic may indicate that this check
really does need to be added (correctly this time).

What is the version number of /usr/src/sys/kern/vfs_bio.c that you
are using?

I think to re-introduce the test for background writes we need
something like the patch below, though I haven't looked too carefully.

Ian


Index: vfs_bio.c
===================================================================
RCS file: /home/iedowse/CVS/src/sys/kern/vfs_bio.c,v
retrieving revision 1.268
diff -u -r1.268 vfs_bio.c
--- vfs_bio.c	2000/12/15 20:08:19	1.268
+++ vfs_bio.c	2000/12/23 14:54:44
@@ -1009,6 +1009,7 @@
 	 * background write.
 	 */
 	if ((bp->b_flags & B_VMIO)
+	    && !(bp->b_xflags & BX_BKGRDINPROG)
 	    && !(bp->b_vp->v_tag == VT_NFS &&
 		 !vn_isdisk(bp->b_vp, NULL) &&
 		 (bp->b_flags & B_DELWRI))
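
To make the intent of the added line easier to follow, here is a minimal user-space sketch of the same kind of test. It is only an illustration, not the kernel code: the struct, the function name may_release_pages() and the MODEL_* flag values are made up for this example, and the NFS/B_DELWRI clause of the real condition is left out. It assumes the added check is meant to keep the release path away from a buffer while its background write is still in progress.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-ins for the kernel flag bits. The values are
 * invented for this sketch; the real ones live in the kernel headers.
 */
#define MODEL_B_VMIO          0x0001u /* models B_VMIO in b_flags */
#define MODEL_BX_BKGRDINPROG  0x0002u /* models BX_BKGRDINPROG in b_xflags */

/* Hypothetical, heavily reduced model of a buffer header. */
struct model_buf {
	unsigned int b_flags;
	unsigned int b_xflags;
};

/*
 * Reduced form of the patched condition: a VMIO buffer may have its
 * pages released only if no background write is in progress on it.
 */
bool
may_release_pages(const struct model_buf *bp)
{
	return ((bp->b_flags & MODEL_B_VMIO) != 0 &&
	    (bp->b_xflags & MODEL_BX_BKGRDINPROG) == 0);
}
```

With this model, a buffer that has only the VMIO bit set passes the test, while the same buffer with the background-write bit also set does not, which is the behaviour the one-line patch is trying to restore.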
Comment 2 dillon 2000-12-24 10:26:23 UTC
:This is interesting, since it relates to the discussions which
:preceded revision 1.267 of vfs_bio.c. There was a piece of code in
:brelse() that was supposed to ensure that buffers with associated
:background writes would remain until the background write completed.
:
:However the logic of this code had always been wrong so it never
:functioned, and Matt removed it in vfs_bio.c revision 1.267 (and
:in 4.x revision 1.242.2.3). This panic may indicate that this check
:really does need to be added (correctly this time).

    First and foremost... Andre, do you have that crash dump still?  Do
    you have a debug kernel to go with it?  (or, if not that, the kernel
    binary the savecore program saved to /var/crash along with the dump)?

    If I could download that crash dump it would make things a whole lot
    easier to track down.

    -

    Ian's synopsis is essentially correct.   The particular panic reported
    is one Kirk already had in there, and is an indication of probable
    corruption due to improperly freeing a buffer.  I'll spend some time
    on Sunday to go over the problem and your proposed patch in more detail.

					-Matt
Comment 3 Andre Albsmeier 2000-12-24 11:36:00 UTC
On Sun, 24-Dec-2000 at 02:33:47 +0000, Ian Dowse wrote:
> In message <200012231336.eBNDaMp03117@curry.mchp.siemens.de>, Andre Albsmeier writes:
> >>Synopsis:       reboot after panic: backgroundwritedone: lost buffer
> 
> This is interesting, since it relates to the discussions which
> preceded revision 1.267 of vfs_bio.c. There was a piece of code in
> brelse() that was supposed to ensure that buffers with associated
> background writes would remain until the background write completed.
> 
> However the logic of this code had always been wrong so it never
> functioned, and Matt removed it in vfs_bio.c revision 1.267 (and
> in 4.x revision 1.242.2.3). This panic may indicate that this check
> really does need to be added (correctly this time).
> 
> What is the version number of /usr/src/sys/kern/vfs_bio.c that you
> are using?

The version I am using is 1.242.2.4, which is the newest on -STABLE.

> I think to re-introduce the test for background writes we need
> something like the patch below, though I haven't looked too carefully.

I can test it if you want. However, we will have to wait until
I am back at work on 27th.

Thanks,

	-Andre

> 
> Ian
> 
> 
> Index: vfs_bio.c
> ===================================================================
> RCS file: /home/iedowse/CVS/src/sys/kern/vfs_bio.c,v
> retrieving revision 1.268
> diff -u -r1.268 vfs_bio.c
> --- vfs_bio.c	2000/12/15 20:08:19	1.268
> +++ vfs_bio.c	2000/12/23 14:54:44
> @@ -1009,6 +1009,7 @@
>  	 * background write.
>  	 */
>  	if ((bp->b_flags & B_VMIO)
> +	    && !(bp->b_xflags & BX_BKGRDINPROG)
>  	    && !(bp->b_vp->v_tag == VT_NFS &&
>  		 !vn_isdisk(bp->b_vp, NULL) &&
>  		 (bp->b_flags & B_DELWRI))
Comment 4 Andre Albsmeier 2000-12-24 11:52:38 UTC
On Sun, 24-Dec-2000 at 02:26:23 -0800, Matt Dillon wrote:
> 
> :This is interesting, since it relates to the discussions which
> :preceded revision 1.267 of vfs_bio.c. There was a piece of code in
> :brelse() that was supposed to ensure that buffers with associated
> :background writes would remain until the background write completed.
> :
> :However the logic of this code had always been wrong so it never
> :functioned, and Matt removed it in vfs_bio.c revision 1.267 (and
> :in 4.x revision 1.242.2.3). This panic may indicate that this check
> :really does need to be added (correctly this time).
> 
>     First and foremost... Andre, do you have that crash dump still?  Do
>     you have a debug kernel to go with it?  (or, if not that, the kernel
>     binary the savecore program saved to /var/crash along with the dump)?
> 
>     If I could download that crash dump it would make things a whole lot
>     easier to track down.

Matt,

I have just sent you an email telling you where you can find the files!

Thanks,

	-Andre

> 
>     -
> 
>     Ian's synopsis is essentially correct.   The particular panic reported
>     is one Kirk already had in there, and is an indication of probable
>     corruption due to improperly freeing a buffer.  I'll spend some time
>     on Sunday to go over the problem and your proposed patch in more detail.
> 
> 					-Matt
Comment 5 Andre Albsmeier 2001-02-12 14:54:47 UTC
Please close this one, it is fixed...

	-Andre
Comment 6 dwmalone freebsd_committer freebsd_triage 2001-02-12 15:10:57 UTC
State Changed
From-To: open->closed

Closed at submitter's request.