Bug 33941

Summary: /usr/sbin/dev_mkdb dumps core
Product: Base System    Reporter: Ryan Dooley <dooleyr>
Component: bin    Assignee: Yar Tikhiy <yar>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: Unspecified   
Hardware: Any   
OS: Any   

Description Ryan Dooley 2002-01-16 15:00:01 UTC
	Dell Optiplex GX1
	FreeBSD 4.5-RC (cvsup'd 15-Jan-2002)
	
	
	dev_mkdb dumps core when run

How-To-Repeat: 	Can't.  I've got two other -STABLE machines which don't exhibit
        the same behavior. 

From GDB: dev_mkdb was compiled with -g here.

Program received signal SIGSEGV, Segmentation fault.
0x280cc8df in __free_ovflpage () from /usr/lib/libc.so.4
(gdb) where
#0  0x280cc8df in __free_ovflpage () from /usr/lib/libc.so.4
#1  0x280ccf9f in __big_delete () from /usr/lib/libc.so.4
#2  0x280cb8e5 in __delpair () from /usr/lib/libc.so.4
#3  0x280ce5c8 in __hash_open () from /usr/lib/libc.so.4
#4  0x280ce284 in __hash_open () from /usr/lib/libc.so.4
#5  0x8048a10 in main (argc=1, argv=0xbfbffc28)
    at /usr/src/usr.sbin/dev_mkdb/dev_mkdb.c:153
#6  0x8048739 in _start ()
Comment 1 ru freebsd_committer freebsd_triage 2002-01-16 16:04:04 UTC
On Wed, Jan 16, 2002 at 08:52:41AM -0600, Ryan Dooley wrote:
> 
> From GDB: dev_mkdb was compiled with -g here.
> 
> Program received signal SIGSEGV, Segmentation fault.
> 0x280cc8df in __free_ovflpage () from /usr/lib/libc.so.4
> (gdb) where
> #0  0x280cc8df in __free_ovflpage () from /usr/lib/libc.so.4
> #1  0x280ccf9f in __big_delete () from /usr/lib/libc.so.4
> #2  0x280cb8e5 in __delpair () from /usr/lib/libc.so.4
> #3  0x280ce5c8 in __hash_open () from /usr/lib/libc.so.4
> #4  0x280ce284 in __hash_open () from /usr/lib/libc.so.4
> #5  0x8048a10 in main (argc=1, argv=0xbfbffc28)
>     at /usr/src/usr.sbin/dev_mkdb/dev_mkdb.c:153
> #6  0x8048739 in _start ()
> 
Try compiling a debugging version of libc and linking dev_mkdb
statically with it.  You can run a stripped down version of it,
and the dump could still be used with the unstripped version
for post mortem analysis.
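For a 4.x-era /usr/src tree, the steps Ruslan suggests might look roughly like
the sketch below. The make knobs and paths are assumptions, not confirmed from
this thread; check make.conf(5) and the Makefiles in your own tree.

```shell
# Sketch only -- paths and knobs assumed for a FreeBSD 4.x /usr/src tree.
# 1. Build libc with debug symbols:
#      cd /usr/src/lib/libc && make DEBUG_FLAGS=-g all
# 2. Relink dev_mkdb statically against it:
#      cd /usr/src/usr.sbin/dev_mkdb && make DEBUG_FLAGS=-g NOSHARED=yes
# 3. Keep the unstripped binary, run a stripped copy, and analyze any core
#    against the unstripped one:
#      cp dev_mkdb dev_mkdb.debug && strip dev_mkdb
#      ./dev_mkdb ; gdb ./dev_mkdb.debug dev_mkdb.core
```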


Cheers,
-- 
Ruslan Ermilov		Oracle Developer/DBA,
ru@sunbay.com		Sunbay Software AG,
ru@FreeBSD.org		FreeBSD committer,
+380.652.512.251	Simferopol, Ukraine

http://www.FreeBSD.org	The Power To Serve
http://www.oracle.com	Enabling The Information Age
Comment 2 Ryan Dooley 2002-01-17 19:56:29 UTC
> Try compiling a debugging version of libc and linking dev_mkdb
> statically with it.  You can run a stripped down version of it,
> and the dump could still be used with the unstripped version
> for post mortem analysis.

Unfortunately, I don't have that option offhand right now.  The curious
thing is that I just installed on a similar workstation and the problem
doesn't exist there.

The only difference(s) are: (new vs. old where problem exists)

1) Generic Mach32 video card vs. 3Dfx Voodoo3 PCI video card,
2) 128MB ram vs. 384MB ram, and
3) generic newfs options vs. -b 32768 and -f 4096.

Ryan
Comment 3 Sheldon Hearn 2002-01-17 20:55:44 UTC
On Thu, 17 Jan 2002 12:00:05 PST, Ryan Dooley wrote:

>  The only difference(s) are: (new vs. old where problem exists)
>  
>  1) Generic Mach32 video card vs. 3Dfx Voodoo3 PCI video card,
>  2) 128MB ram vs. 384MB ram, and
>  3) generic newfs options vs. -b 32768 and -f 4096.

I'd be _very_ careful trying any block size larger than 16384.
I've heard horrible things about larger block sizes.  I'm pretty sure
Matt Dillon warned that >16384 block sizes would cause undesirable
behaviour in the VM system.

Certainly, VM problems could account for your SEGV.

Matt?  Am I smoking crack, or did you say Very Bad Things about the VM
system and block sizes >16384?

Ciao,
Sheldon.
Comment 4 Ryan Dooley 2002-01-17 21:13:34 UTC
	Hey,

> I'd be _very_ careful trying any block size larger than 16384.
> I've heard horrible things about larger block sizes.  I'm pretty sure
> Matt Dillon warned that >16384 block sizes would cause undesirable
> behaviour in the VM system.
>
> Certainly, VM problems could account for your SEGV.
>
> Matt?  Am I smoking crack, or did you say Very Bad Things about the VM
> system and block sizes >16384?

	Uh oh,

	I have a server then with 65536/8192 (bs,fr) for a 953GB
	fiber channel raid.  I've not noticed anything bad offhand
	(it was CVSup'd on Saturday around midnight CST).

	This actually concerns me more than my workstation :-)

	We changed the block/frag size to speed up file system checks
	when we had to fsck that partition (which holds 41000+ user
	home directories).  We went from fsck's taking about 120 minutes
	to 15 minutes, which we desperately needed.

	I have had reports that reads over NFS (a client running a program
	on a file (SAS data)) took two or three attempts to initially
	access the file before SAS ran with it.  Sounded like an NFS cache
	issue, but I couldn't reproduce the error myself. (AIX client to
	FreeBSD server).

	As the semester is about to start, I can't reformat that array
	right now.

		Ryan
Comment 5 Matthew Dillon 2002-01-17 21:41:53 UTC
:On Thu, 17 Jan 2002 12:00:05 PST, Ryan Dooley wrote:
:
:>  The only difference(s) are: (new vs. old where problem exists)
:>  
:>  1) Generic Mach32 video card vs. 3Dfx Voodoo3 PCI video card,
:>  2) 128MB ram vs. 384MB ram, and
:>  3) generic newfs options vs. -b 32768 and -f 4096.
:
:I'd be _very_ careful trying any block size larger than 16384.
:I've heard horrible things about larger block sizes.  I'm pretty sure
:Matt Dillon warned that >16384 block sizes would cause undesirable
:behaviour in the VM system.
:
:Certainly, VM problems could account for your SEGV.
:
:Matt?  Am I smoking crack, or did you say Very Bad Things about the VM
:system and block sizes >16384?
:
:Ciao,
:Sheldon.

    It should work fine as long as the filesystem frag ratio is 8:1.  The
    buffer cache is optimized for 16384 byte buffers and can become
    fragmented if larger block sizes are used, leading to inefficient
    operation, but should have no other adverse effects.   I would not use
    a block size greater than 65536 though, because you start to hit up
    against internal limitations.  Remember, the buffer cache has to reserve
    KVA for each buffer, so the system's cache efficiency is going to drop
    as the buffer size increases.

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>
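As a quick sanity check on the 8:1 ratio Matt describes, both configurations
mentioned in this thread satisfy it (a sketch; the numbers are taken from the
messages above):

```python
# Block/fragment ratios for the two newfs configurations in this report.
configs = {
    "workstation (-b 32768 -f 4096)": (32768, 4096),
    "raid server (65536/8192)": (65536, 8192),
}
for name, (block_size, frag_size) in configs.items():
    ratio = block_size // frag_size
    print(f"{name}: {ratio}:1")   # both print 8:1
```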
Comment 6 Sheldon Hearn 2002-01-17 21:42:54 UTC
On Thu, 17 Jan 2002 15:13:34 CST, Ryan Dooley wrote:

> 	I have a server then with 65536/8192 (bs,fr) for a 953GB
> 	fiber channel raid.  I've not noticed anything bad offhand
> 	(it was CVSup'd on Saturday around midnight CST).
> 
> 	This actually concerns me more than my workstation :-)

Wait for feedback from Matt.  I might just be horribly confused.

Ciao,
Sheldon.
Comment 7 Matthew Dillon 2002-01-17 21:45:07 UTC
:	Uh oh,
:
:	I have a server then with 65536/8192 (bs,fr) for a 953GB
:	fiber channel raid.  I've not noticed anything bad offhand
:	(it was CVSup'd on Saturday around midnight CST).

    That should be fine.

:	This actually concerns me more than my workstation :-)
:
:	We changed the block/frag size to speed up file system checks
:	when we had to fsck that partition (which holds 41000+ user
:	home directories).  We went from fsck's taking about 120 minutes
:	to 15 minutes, which we desperately needed.
:
:	I have had reports that reads over NFS (a client running a program
:	on a file (SAS data)) took two or three attempts to initially
:	access the file before SAS ran with it.  Sounded like an NFS cache
:	issue, but I couldn't reproduce the error myself. (AIX client to
:	FreeBSD server).
:
:	As the semester is about to start, I can't reformat that array
:	right now.
:
:		Ryan

    This would depend on the NFS block size, which is independent of the
    filesystem block size.  Even a standard NFS block size of 8K requires
    7 IP fragments to construct a packet (with a standard Ethernet MTU).
    A larger NFS block size would result in even more fragments and
    potentially overload the client's packet buffers.

    It is usually possible to mitigate NFS 'packet storm' issues by using
    TCP NFS mounts rather than UDP.

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>
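The fragment arithmetic can be sketched as follows. This is a rough estimate:
real datagrams also carry RPC/NFS headers of varying size, so the exact count
depends on that overhead.

```python
import math

MTU = 1500                      # standard Ethernet
IP_HEADER = 20
FRAG_PAYLOAD = MTU - IP_HEADER  # 1480 bytes of UDP data per IP fragment

def ip_fragments(udp_datagram_bytes: int) -> int:
    """Number of IP fragments needed to carry one UDP datagram."""
    return math.ceil(udp_datagram_bytes / FRAG_PAYLOAD)

# An 8K NFS read plus the 8-byte UDP header alone needs 6 fragments;
# RPC headers push it toward the 7 cited above.  A 32K block balloons
# to roughly two dozen fragments, and losing any one wastes them all.
print(ip_fragments(8192 + 8))    # -> 6
print(ip_fragments(32768 + 8))   # -> 23
```

On FreeBSD of that vintage, a TCP mount is requested with mount_nfs's `-T`
flag (e.g. `mount_nfs -T server:/export /mnt`).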
Comment 8 Sheldon Hearn 2002-01-17 21:48:43 UTC
On Thu, 17 Jan 2002 13:41:53 PST, Matthew Dillon wrote:

>     It should work fine as long as the filesystem frag ratio is 8:1.  The
>     buffer cache is optimized for 16384 byte buffers and can become 
>     fragmented if larger block sizes are used, leading to inefficient
>     operation, but should have no other adverse effects.

Okay, so there are no nasty surprises beyond what's already documented
in newfs(8) and tuning(7).

Sorry for the false alarm, Ryan.  Back to the drawing board on trying to
find your problem.

It'd be interesting to see whether the problematic box makes it through
a buildworld without any problems.

Ciao,
Sheldon.
Comment 9 Ryan Dooley 2002-01-17 21:51:31 UTC
>     That should be fine.

	*whew* :-)

>     This would depend on the NFS block size, which is independent of the
>     filesystem block size.  Even a standard NFS block size of 8K requires
>     7 IP fragments to construct a packet (with a standard Ethernet MTU).
>     A larger NFS block size would result in even more fragments and
>     potentially overload the client's packet buffers.

	Right, we saw this with 32k packet sizes and we just left the
	default 8k.

>     It is usually possible to mitigate NFS 'packet storm' issues by using
>     TCP NFS mounts rather then UDP.

	For our IRIX and AIX clients that nfsv3/tcp works just fine.  With
	Linux however, the only thing we've got is nfsv3/udp....  That
	darn linux :-)

	Thanks for getting back with me on this.

	Cheers,
	Ryan
Comment 10 Ryan Dooley 2002-01-17 22:13:43 UTC
> Okay, so there are no nasty surprises beyond what's already documented
> in newfs(8) and tuning(7).

	:-)

> Sorry for the false alarm, Ryan.  Back to the drawing board on trying to
> find your problem.

	No problem.  I'd rather know if there was something up.  I could
	have engineered a little down time at oh-dark-thirty during
	our maintenance window.  The server has a twin machine (different
	size raid though) and rsync's over gigE don't take too long :-)

> It'd be interesting to see whether the problematic box makes it through
> a buildworld without any problems.

	Actually, it doesn't have any issues doing a buildworld (but the
	system drive has the defaults set for bs/fr).

	Cheers,
	Ryan
Comment 11 Matthew Dillon 2002-01-17 22:30:47 UTC
:
:
:> Okay, so there are no nasty surprises beyond what's already documented
:> in newfs(8) and tuning(7).
:
:	:-)
:
:> Sorry for the false alarm, Ryan.  Back to the drawing board on trying to
:> find your problem.
:
:	No problem.  I'd rather know if there was something up.  I could
:	have engineered a little down time at oh-dark-thirty during
:	our maintenance window.  The server has a twin machine (different
:	size raid though) and rsync's over gigE don't take too long :-)
:
:> It'd be interesting to see whether the problematic box makes it through
:> a buildworld without any problems.
:
:	Actually, it doesn't have any issues doing a buildworld (but the
:	system drive has the defaults set for bs/fr).
:
:	Cheers,
:	Ryan

    Ryan, if you can make the dev_mkdb core dump and (-g compiled) binary
    available for download, I will take a look at it.

    Also check for duplicate device nodes in /dev, or device nodes that
    exist on the machine exhibiting the problem that do not exist on
    machines that do not exhibit the problem.

    It sounds like a program bug to me rather than an OS bug.

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>
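One way to run the duplicate-node check Matt suggests (a sketch; the sample
listing below is fabricated for illustration -- point the real check at /dev
on the affected box):

```shell
# uniq -d prints only names that occur more than once in sorted input.
# Demo on a fabricated device listing:
printf 'ttyv0\nttyv1\nttyv0\n' | sort | uniq -d    # prints: ttyv0

# Against the live system, and to compare two machines:
# ls /dev | sort | uniq -d
# ls /dev | sort > dev.good     # on the healthy machine, then:
# ls /dev | sort | diff - dev.good
```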
Comment 12 Ryan Dooley 2002-01-17 22:37:04 UTC
>     Ryan, if you can make the dev_mkdb core dump and (-g compiled) binary
>     available for download I will take a look at it.

	I'll see what I can do.

>     Also check for duplicate device nodes in /dev or device nodes that
>     exist on the machine exhibiting the problem that do not exist on
>     machines that do not exhibit the problem.

	No dup /dev entries (and they both have the same list).

>     It sounds like a program bug to me rather then an OS bug.
>
	Yeah.  The system itself runs as expected.

	Cheers,
	Ryan
Comment 13 ru freebsd_committer freebsd_triage 2002-01-18 08:39:46 UTC
On Thu, Jan 17, 2002 at 01:56:29PM -0600, Ryan Dooley wrote:
> > Try compiling a debugging version of libc and linking dev_mkdb
> > statically with it.  You can run a stripped down version of it,
> > and the dump could still be used with the unstripped version
> > for post mortem analysis.
> 
> Unfortunately, I don't have that option offhand right now.  The curious
> thing is that I just installed on a similar workstation and the problem
> doesn't exist there.
> 
> The only difference(s) are: (new vs. old where problem exists)
> 
> 1) Generic Mach32 video card vs. 3Dfx Voodoo3 PCI video card,
> 2) 128MB ram vs. 384MB ram, and
> 3) generic newfs options vs. -b 32768 and -f 4096.
> 
This is unlikely to be the case.


Cheers,
-- 
Ruslan Ermilov		Oracle Developer/DBA,
ru@sunbay.com		Sunbay Software AG,
ru@FreeBSD.org		FreeBSD committer,
+380.652.512.251	Simferopol, Ukraine

http://www.FreeBSD.org	The Power To Serve
http://www.oracle.com	Enabling The Information Age
Comment 14 Yar Tikhiy freebsd_committer freebsd_triage 2007-08-17 07:17:41 UTC
State Changed
From-To: open->closed

dev_mkdb has been removed from all supported branches 
after the advent of devfs. 


Comment 15 Yar Tikhiy freebsd_committer freebsd_triage 2007-08-17 07:17:41 UTC
Responsible Changed
From-To: freebsd-bugs->yar

So I can see feedback.