Bug 25266

Summary: fdesc file system in -STABLE locks up during nightly builds
Product: Base System Reporter: mwm
Component: kern Assignee: chris <chris>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: 4.2-STABLE   
Hardware: Any   
OS: Any   

Description mwm 2001-02-21 23:20:00 UTC
The system locks up every night a few seconds into the daily periodic
run.

The stack trace from ddb starts in the debugger, back through sc &
atkbd, thence through the interrupt (presumably from my invoking DDB
at the console). From there, it's:

	fdesc_readdir+0xe6(<address>, <address>, 0, <address>, 4, 0)
	getdirentries+0xf4(5 <addresses>)
	Xint0x80_syscall+0x2b

According to gdb on the core dump, the fdesc_readdir is:

0xc019223e is in fdesc_readdir (../../miscfs/fdesc/fdesc_vnops.c:614).
609                     while (i < sizeof(rootent) / sizeof(rootent[0]) &&
610                         uio->uio_resid >= UIO_MX) {
611                             dt = &rootent[i];
612                             switch (dt->d_fileno) {
613                             case FD_CTTY:
614                                     if (cttyvp(uio->uio_procp) == NULL)
615                                             continue;
616                                     break;
617
618                             case FD_STDIN:

On the face of it, the while loop in fdesc_readdir is simply
broken. If you hit one of the continues in the loop (there are others
further down), you skip everything in the loop that might change
either i or uio->uio_resid, so the loop condition never changes and
the loop doesn't terminate. It may be waiting on other events to
change it, but somehow I doubt it for the fdesc code. Further, note
that the code that is hitting the continue in this case is checking
for a controlling terminal, which would explain the difference in
behavior between running periodic from cron vs from the
command line.
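
To make the failure mode concrete, here is a small user-space sketch of the
same loop shape. It is not the kernel code: the table, the ENT_* constants,
REC_SIZE, and both function names are hypothetical stand-ins for rootent[],
the FD_* values, and UIO_MX, and the has_ctty flag stands in for the
cttyvp() check. The first function reproduces the stuck loop (with an
iteration cap so the demo terminates); the second shows one possible way to
restructure it so that a skipped entry still advances the index. It is not
necessarily what the -CURRENT version does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the FD_* file numbers and UIO_MX. */
enum { ENT_ROOT, ENT_CTTY, ENT_STDIN, ENT_STDOUT };
#define REC_SIZE 16

struct entry { int fileno; };

static const struct entry table[] = {
	{ ENT_ROOT }, { ENT_CTTY }, { ENT_STDIN }, { ENT_STDOUT },
};
#define NENTS (sizeof(table) / sizeof(table[0]))

/*
 * Same shape as the reported loop: "continue" bypasses the i++ at the
 * bottom. max_iter is a safety cap added for this demo only; the return
 * value is the number of iterations actually executed.
 */
size_t
buggy_iterations(int has_ctty, size_t max_iter)
{
	size_t i = 0, iter = 0;

	while (i < NENTS && iter < max_iter) {
		iter++;
		if (table[i].fileno == ENT_CTTY && !has_ctty)
			continue;	/* i unchanged: same entry again */
		i++;
	}
	return (iter);
}

/*
 * One possible restructuring: the for-loop increment runs even when an
 * entry is skipped. Visible entries are copied to out[]; returns how
 * many were emitted.
 */
size_t
list_entries(int has_ctty, size_t resid, int *out)
{
	size_t i, n = 0;

	assert(out != NULL);
	for (i = 0; i < NENTS && resid >= REC_SIZE; i++) {
		if (table[i].fileno == ENT_CTTY && !has_ctty)
			continue;	/* safe: i++ still happens */
		out[n++] = table[i].fileno;
		resid -= REC_SIZE;
	}
	return (n);
}
```

With a controlling terminal (has_ctty nonzero) the buggy loop finishes after
one pass over the table; without one it spins on the ENT_CTTY entry until
the cap is reached, which matches the cron vs. command line difference
described above.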

Fix: 

It's not clear how to fix the code. I note that the last MFC of this
code is over a year old, and the very next commit simply removed this
code. Possibly MFC'ing a more recent version would solve the problem.

How-To-Repeat: 

set daily_clean_disks_enable to YES in /etc/periodic.conf on a 4.2 box,
mount fdesc, and watch it lock up.
Comment 1 Kris Kennaway freebsd_committer freebsd_triage 2001-02-24 07:47:47 UTC
Responsible Changed
From-To: freebsd-bugs->chris

chris is the fdescfs maintainer
Comment 2 chris freebsd_committer freebsd_triage 2001-03-19 22:16:32 UTC
State Changed
From-To: open->analyzed

Looking into MFCing the changes I made in -CURRENT after the release is 
over with.  A workaround for now would be not to use fdesc in its current 
state.
Comment 3 kazarov 2001-04-20 21:27:40 UTC
On my system (4.2-STABLE SMP) it does not lock up; instead, the system
panics when executing this code (I extracted it from /etc/security, which
is what panics the system):

# mount | fgrep fdesc
fdesc on /dev (fdesc, noatime, union)

# cat > security
echo 'Checking for uids of 0:'
n=$(awk -F: '$3==0 {print $1,$3}' /etc/master.passwd |
    tee /dev/stderr |
    sed -e '/^root 0$/d' -e '/^toor 0$/d' |
    wc -l)
# sh security 2>&1 | sendmail root

The panic message reports 'tee' as the current process.

BTW: I'm getting the error "ln: /dev/vga: Read-only file system" on system
boot. IMHO mounting fdesc on /dev is not the best solution for other devices.

Dmitry
Comment 4 chris freebsd_committer freebsd_triage 2001-10-23 00:39:04 UTC
State Changed
From-To: analyzed->closed

Merged the -CURRENT version into -STABLE.