Bug 28566

Summary: nullfs: mount_null loopbacks can hang startx temporarily [4.3]
Product: Base System
Reporter: Tony Maher <tonym>
Component: kern
Assignee: Boris Popov <bp>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: 4.3-STABLE   
Hardware: Any   
OS: Any   

Description Tony Maher 2001-07-01 07:10:01 UTC
  Having seen the merges that fix null_fs (and yes, I note the warning is
  still in mount_null.8), I decided to try it out: I removed a couple of
  symlinks and replaced them with mount_null loopback mounts, e.g.:

mount_null /space/usr/src /usr/src
mount_null /var/obj /usr/obj

  I successfully did a make world cycle, fsck'ed the disks, and all
  was OK.
  So I decided to remove all the symlinks and use mount_null loopbacks
  instead. My /etc/fstab looks like this:

/space/home/staff   /home/staff null    rw      0   0
/space/db/mysql     /db/mysql   null    rw      0   0
/space/db/pgsql     /db/pgsql   null    rw      0   0
/space/usr/doc      /usr/doc    null    rw      0   0
/space/usr/ports    /usr/ports  null    rw      0   0
/space/usr/src      /usr/src    null    rw      0   0
/var/obj            /usr/obj    null    rw      0   0

  Everything appeared to start OK and I can log in fine.
  When I tried to use startx it hung (something to do with xauth),
  but it eventually worked after a minute or so.
  Truss shows a couple of differences around here
  (sorry, the output is extra wide to show the diffs):


           Hanging version                                                      Normal version                                              

pipe()                                           = 3 (0x3)      pipe()                                           = 3 (0x3)
fork()                                           = 484 (0x1e4 | fork()                                           = 402 (0x192
close(4)                                         = 0 (0x0)      close(4)                                         = 0 (0x0)
fork()                                           = 485 (0x1e5 | fork()                                           = 403 (0x193
                                                              > SIGNAL 20
                                                              > SIGNAL 20
close(3)                                         = 0 (0x0)      close(3)                                         = 0 (0x0)
close(-1)                                        ERR#9 'Bad f   close(-1)                                        ERR#9 'Bad f
getpgrp()                                        = 469 (0x1d5 | getpgrp()                                        = 387 (0x183
xauth: (argv):1:  bad display name "dt.home:0" in "list" comm | wait4(0xffffffff,0xbfbff8e0,0x2,0x0)             = 403 (0x193
SIGNAL 20                                                     | wait4(0xffffffff,0xbfbff8e0,0x2,0x0)             = 402 (0x192
wait4(0xffffffff,0xbfbff8e0,0x2,0x0)             = 485 (0x1e5 | fork()                                           = 404 (0x194
wait4(0xffffffff,0xbfbff8e0,0x2,0x0)             = 484 (0x1e4 | getpgrp()                                        = 387 (0x183
xauth: (argv):1:  bad display name "dt.home:0" in "add" comma <
fork()                                           = 486 (0x1e6 <
SIGNAL 20                                                       SIGNAL 20
getpgrp()                                        = 469 (0x1d5 | wait4(0xffffffff,0xbfbff850,0x2,0x0)             = 404 (0x194
wait4(0xffffffff,0xbfbff850,0x2,0x0)             = 486 (0x1e6 <
read(0xa,0x80bb600,0x3ff)                        = 107 (0x6b)   read(0xa,0x80bb600,0x3ff)                        = 107 (0x6b)

Fix: 

  The strange thing was that if I added the noauto option to the
  /space/home/staff entry in fstab and then manually mounted it, startx
  worked perfectly. Modifying /etc/rc to do the explicit mount after
  "mount -a" also worked.
  I decided it probably had to do with it being the first mount_null entry.
  So I swapped the first and second entries so that the /space/home/staff
  entry became the second entry, and it worked perfectly.
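  A minimal sketch of the noauto workaround described above, assuming the
  fstab layout shown earlier. The exact placement inside /etc/rc is an
  assumption: the report only says the explicit mount was done after
  "mount -a".

```shell
# /etc/fstab: mark the /home/staff nullfs entry noauto so that
# "mount -a" skips it at boot (all other entries unchanged)
/space/home/staff   /home/staff null    rw,noauto   0   0

# /etc/rc: somewhere after the existing "mount -a" step (placement is an
# assumption), mount the skipped entry explicitly; with a single argument,
# mount(8) looks up the device, type and options in /etc/fstab
mount /home/staff
```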
  
  I then decided to put in a dummy mount_null entry that is not actually
  used for anything, rather than risk strange behaviour on another working
  mount point (keeping /space/home/staff as the second entry), e.g.:
  
/space/hack         /usr/hack   null    rw      0   0
/space/home/staff   /home/staff null    rw      0   0
/space/db/mysql     /db/mysql   null    rw      0   0

  and the hanging problem *was* present again!
  On a whim (well, actually to match the format of the original second
  entry, /space/db/mysql, which I had successfully swapped and tested
  previously) I changed it to:

/space/hack/hack    /usr/hack   null    rw      0   0
/space/home/staff   /home/staff null    rw      0   0
/space/db/mysql     /db/mysql   null    rw      0   0

  and the hanging problem was *not* there!

  Everything seems to be OK with this hack, but I have not
  done any heavy-duty work with it.

  This is on my laptop (Dell Inspiron 3500), which I mention because it
  had weird behaviour with mount_smbfs (this was reported to bp in
  private emails last year).
How-To-Repeat: 
  Use mount_null for home directories and try to startx.
  (See under Fix: the home directory entry has to be the first mount_null
   entry in /etc/fstab.)
Comment 1 dwmalone freebsd_committer freebsd_triage 2001-07-01 21:26:46 UTC
Responsible Changed
From-To: freebsd-bugs->bp

bp has been doing a lot of the nullfs fixes.
Comment 2 Mark Linimon freebsd_committer freebsd_triage 2004-07-16 06:02:11 UTC
State Changed
From-To: open->feedback

Is this still a problem with more modern versions of FreeBSD?
Comment 3 Mark Linimon freebsd_committer freebsd_triage 2004-07-16 17:27:29 UTC
State Changed
From-To: feedback->closed

Submitter notes that as of 5.2 the problem does not recur. If anyone
is still experiencing this problem on -STABLE, now is the time to speak
up so that I can reopen this PR.