Bug 19355

Summary: fstat gives signal 10 (SIGBUS) when outputting data
Product: Base System
Reporter: greg <greg>
Component: bin
Assignee: freebsd-bugs (Nobody) <bugs>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: 3.4-STABLE   
Hardware: Any   
OS: Any   

Description greg 2000-06-17 18:50:00 UTC
	This problem popped up when I was running sockstat, as stated earlier, and was 
	then isolated to fstat specifically. It appeared to core when listing information
	for a specific user (it was the last user in the list that appeared onscreen when
	doing a plain 'fstat' before it SIGBUS'd), and the behaviour repeats when I use
	fstat -u username. Example follows with gdb output.

[root@voyager] /usr/src/usr.bin/fstat: make clean all
[root@voyager] /usr/src/usr.bin/fstat: cd /usr/obj/usr/src/usr.bin/fstat
[root@voyager] /usr/obj/usr/src/usr.bin/fstat: gdb ./fstat
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-unknown-freebsd"...
(gdb) run -u bin2ooo
Starting program: /usr/obj/usr/src/usr.bin/fstat/./fstat -u bin2ooo
USER     CMD          PID   FD MOUNT      INUM MODE         SZ|DV R/W
bin2ooo  bnc        10558 root /             2 drwxr-xr-x    1024  r
bin2ooo  bnc        10558   wd /home    1714378 drwxr-xr-x     512  r
bin2ooo  bnc        10558 text /usr     3190287 -rwxr-xr-x   79658  r
bin2ooo  bnc        10558    0 /          6766 crw--w----   ttyp2 rw
bin2ooo  bnc        10558    1 /          6766 crw--w----   ttyp2 rw
bin2ooo  bnc        10558    2 /          6766 crw--w----   ttyp2 rw
bin2ooo  bnc        10558    3* internet stream tcp dc5b9180
bin2ooo  bnc        10558    4 /home    1714190 -rw-r--r--   25717  w
bin2ooo  bnc        10558    6* internet stream tcp dc5d4840
bin2ooo  bnc        10558    7* internet stream tcp dc5ff2a0
bin2ooo  bash       10547 text /usr     3190357 -rwxr-xr-x  367780  r

Program received signal SIGBUS, Bus error.
0x280cd832 in bcopy () from /usr/lib/libc.so.3
(gdb) bt
#0  0x280cd832 in bcopy () from /usr/lib/libc.so.3
#1  0x5 in ?? ()
#2  0x8048e80 in main (argc=3, argv=0xbfbfdbe8)
    at /usr/src/usr.bin/fstat/fstat.c:265
#3  0x80489f5 in _start ()
(gdb) up
#1  0x5 in ?? ()
(gdb) up
#2  0x8048e80 in main (argc=3, argv=0xbfbfdbe8)
    at /usr/src/usr.bin/fstat/fstat.c:265
265                     dofiles(p);
(gdb) list
260                     putchar('\n');
261
262             for (plast = &p[cnt]; p < plast; ++p) {
263                     if (p->kp_proc.p_stat == SZOMB)
264                             continue;
265                     dofiles(p);
266             }
267             exit(0);
268     }
269
(gdb) quit
The program is running.  Exit anyway? (y or n) y
[root@voyager] /usr/obj/usr/src/usr.bin/fstat: ps uwxU bin2ooo
USER      PID %CPU %MEM   VSZ  RSS  TT  STAT STARTED      TIME COMMAND
bin2ooo 10547  0.0  0.0     0    0  p2  IEs+ -         0:00.00  (bash)
bin2ooo 10558  0.0  0.1  1000  576  ??  Is   Thu08PM   0:03.63 bnc
[root@voyager] /usr/obj/usr/src/usr.bin/fstat:

Fix: 

I'm wondering if this is a memory failure somewhere in fstat? I'm
	no FreeBSD hacker, so I don't have the slightest clue.
How-To-Repeat: 
	I'm not sure whether this can be reproduced on other systems. I haven't
	been able to track down where it fails, so I can't reproduce it
	elsewhere, but for the last ten minutes the same action has triggered
	it every time. More info available upon request (that is, if it's
	still failing when you request it :))
Comment 1 greg 2000-06-17 19:28:16 UTC
Hey .. I was playing around a little more with gdb, isolated it a little
more to the exact line, and have some variable context information ..

(gdb) step
350                     bcopy(filed0.fd_dfiles, ofiles,
(filed.fd_lastfile+1) * FPSIZE);
(gdb) p filed0
$3 = {fd_fd = {fd_ofiles = 0xc8128d80, fd_ofileflags = 0x0, fd_cdir = 0x0,
    fd_rdir = 0x0, fd_nfiles = 0, fd_lastfile = 6922, fd_freefile = 12635,
    fd_cmask = 12859, fd_refcnt = 29236}, fd_dfiles = {0x32325b1b, 0x1b48313b,
    0x20204b5b, 0x20202020, 0x20202020, 0x20202020, 0x20202020, 0x2f232020,
    0x4057753c, 0x23205469, 0x57753c2f, 0x20546940, 0x753c2f23, 0x54694057,
    0x3c2f2320, 0x69405775, 0x2f232054, 0x4057753c, 0x23205469, 0x57753c2f},
  fd_dfileflags = "@iT\e[K\e[1;22r\e[22;1H"}
(gdb) p filed0.fd_fd.fd_lastfile
$4 = 6922
(gdb) p ofiles
$5 = (struct file **) 0x8068000
(gdb) p *ofiles
$6 = (struct file *) 0x0
(gdb) p (filed0.fd_fd.fd_lastfile+1)
$7 = 6923
[note: FPSIZE must be a #define; I got several errors when I tried to print
the whole expression]
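
[Since gdb could not evaluate the full size expression, here is a rough sketch of the arithmetic in C. It assumes FPSIZE is the size of one file pointer (4 bytes on this i386 build) and that fd_dfiles holds the 20 entries gdb printed; the names FPSIZE_I386, NDFILE_SEEN, copy_bytes, dfiles_bytes and copy_fits are made up for illustration and are not taken from fstat.c.]

```c
#include <stddef.h>

/* Assumption: FPSIZE is sizeof(struct file *), i.e. 4 bytes on i386. */
#define FPSIZE_I386 4u
/* Number of fd_dfiles entries gdb printed for filed0. */
#define NDFILE_SEEN 20u

/* Bytes the bcopy() at line 350 is asked to copy for a given fd_lastfile. */
size_t copy_bytes(int lastfile)
{
    return ((size_t)lastfile + 1) * FPSIZE_I386;
}

/* Bytes actually present in the inline fd_dfiles array. */
size_t dfiles_bytes(void)
{
    return NDFILE_SEEN * FPSIZE_I386;
}

/* Hypothetical sanity check: nonzero if the copy stays inside fd_dfiles. */
int copy_fits(int lastfile)
{
    return lastfile >= 0 && copy_bytes(lastfile) <= dfiles_bytes();
}
```

[Under those assumptions, fd_lastfile = 6922 asks bcopy to read 27692 bytes from an 80-byte array, so copy_fits(6922) is false. Reading that far past the array could plausibly be what raises the SIGBUS, but nothing in this report confirms it.]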
(gdb) p filed0.fd_dfiles
$8 = {0x32325b1b, 0x1b48313b, 0x20204b5b, 0x20202020, 0x20202020, 0x20202020,
  0x20202020, 0x2f232020, 0x4057753c, 0x23205469, 0x57753c2f, 0x20546940,
  0x753c2f23, 0x54694057, 0x3c2f2320, 0x69405775, 0x2f232054, 0x4057753c,
  0x23205469, 0x57753c2f}
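
The fd_dfiles values do not look like kernel pointers. A small sketch in C that unpacks the first three words into bytes, least significant byte first as on i386, makes this easier to see. The helper name words_to_bytes and the array dfiles_head are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Unpack 32-bit words into bytes, least significant byte first (i386 order).
 * out must have room for 4 * n bytes. */
void words_to_bytes(const uint32_t *words, size_t n, unsigned char *out)
{
    for (size_t i = 0; i < n; i++)
        for (size_t b = 0; b < 4; b++)
            out[4 * i + b] = (unsigned char)(words[i] >> (8 * b));
}

/* First three fd_dfiles words from the gdb output above. */
const uint32_t dfiles_head[3] = { 0x32325b1b, 0x1b48313b, 0x20204b5b };
```

Unpacked this way, the three words are the bytes ESC [ 2 2 ; 1 H ESC [ K followed by two spaces, which reads like terminal escape sequences and matches the text gdb shows in fd_dfileflags. That suggests the structure fstat read holds screen output rather than a valid file table, though how it got there is not established here.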

I'm still puzzled. If no requests for more information come in within the
next 30 minutes to an hour, I'm going to kill the offending pid and end
this. (Note: I tracked down the pid by running fstat on each of the user's
processes.)

/gp

Comment 2 iedowse freebsd_committer freebsd_triage 2001-08-25 23:45:46 UTC
State Changed
From-To: open->feedback


Have you seen this problem occur again since? I seem to remember 
seeing something similar quite a while ago, but I never attempted 
to track it down.
Comment 3 iedowse freebsd_committer freebsd_triage 2002-01-13 18:38:45 UTC
State Changed
From-To: feedback->closed


Feedback timeout.