| Summary: | nullfs: mount_null loopbacks can hang startx temporarily [4.3] | ||
|---|---|---|---|
| Product: | Base System | Reporter: | Tony Maher <tonym> |
| Component: | kern | Assignee: | Boris Popov <bp> |
| Status: | Closed FIXED | ||
| Severity: | Affects Only Me | ||
| Priority: | Normal | ||
| Version: | 4.3-STABLE | ||
| Hardware: | Any | ||
| OS: | Any | ||
Responsible Changed From-To: freebsd-bugs->bp
bp has been doing a lot of the nullfs fixes.

State Changed From-To: open->feedback
Is this still a problem with more modern versions of FreeBSD?

State Changed From-To: feedback->closed
Submitter notes that as of 5.2 the problem does not recur. If anyone is still experiencing this problem on -STABLE, the time to speak up is now, so that I can reopen this PR.

Seeing merges to fix null_fs (and yes, I note the warning is still in mount_null.8), I decided to try it out by removing a couple of symlinks and using mount_null loopbacks in their place, e.g.

    mount_null /space/usr/src /usr/src
    mount_null /var/obj /usr/obj

I successfully did a make world cycle and fsck'ed the disks, and all was OK. So I decided to remove all symlinks and use mount_null loopbacks instead. My /etc/fstab looks like this:

    /space/home/staff  /home/staff  null  rw  0  0
    /space/db/mysql    /db/mysql    null  rw  0  0
    /space/db/pgsql    /db/pgsql    null  rw  0  0
    /space/usr/doc     /usr/doc     null  rw  0  0
    /space/usr/ports   /usr/ports   null  rw  0  0
    /space/usr/src     /usr/src     null  rw  0  0
    /var/obj           /usr/obj     null  rw  0  0

Everything appeared to start OK and I can log in fine. When I tried to use startx it hung (something to do with xauth), but it eventually worked after a minute or so. Truss shows a couple of differences around this point (sorry, the output is extra wide to show the diffs):

    Hanging version                                                Normal version
    pipe() = 3 (0x3)                                               pipe() = 3 (0x3)
    fork() = 484 (0x1e4                                          | fork() = 402 (0x192
    close(4) = 0 (0x0)                                             close(4) = 0 (0x0)
    fork() = 485 (0x1e5                                          | fork() = 403 (0x193
                                                                 > SIGNAL 20
                                                                 > SIGNAL 20
    close(3) = 0 (0x0)                                             close(3) = 0 (0x0)
    close(-1) ERR#9 'Bad f                                         close(-1) ERR#9 'Bad f
    getpgrp() = 469 (0x1d5                                       | getpgrp() = 387 (0x183
    xauth: (argv):1: bad display name "dt.home:0" in "list" comm | wait4(0xffffffff,0xbfbff8e0,0x2,0x0) = 403 (0x193
    SIGNAL 20                                                    | wait4(0xffffffff,0xbfbff8e0,0x2,0x0) = 402 (0x192
    wait4(0xffffffff,0xbfbff8e0,0x2,0x0) = 485 (0x1e5            | fork() = 404 (0x194
    wait4(0xffffffff,0xbfbff8e0,0x2,0x0) = 484 (0x1e4            | getpgrp() = 387 (0x183
    xauth: (argv):1: bad display name "dt.home:0" in "add" comma <
    fork() = 486 (0x1e6                                          <
    SIGNAL 20                                                      SIGNAL 20
    getpgrp() = 469 (0x1d5                                       | wait4(0xffffffff,0xbfbff850,0x2,0x0) = 404 (0x194
    wait4(0xffffffff,0xbfbff850,0x2,0x0) = 486 (0x1e6            <
    read(0xa,0x80bb600,0x3ff) = 107 (0x6b)                         read(0xa,0x80bb600,0x3ff) = 107 (0x6b)

Fix:

The strange thing was that if I added the noauto option to the /space/home/staff entry in fstab and then manually mounted it, startx worked perfectly. Modifying /etc/rc to do the explicit mount after "mount -a" also worked. I decided it probably had to do with it being the first mount_null entry. Swapping the first and second entries, so that the /space/home/staff entry became the second entry, made it work perfectly.

I then decided to put in a dummy mount_null entry that is not actually used for anything, rather than risk strange behaviour on another working mount point (keeping /space/home/staff as the second entry), e.g.

    /space/hack        /usr/hack    null  rw  0  0
    /space/home/staff  /home/staff  null  rw  0  0
    /space/db/mysql    /db/mysql    null  rw  0  0

and the hanging problem *was* present again! On a whim (well, actually to match the format of the original second entry, /space/db/mysql, which I had successfully swapped and tested previously) I changed it to

    /space/hack/hack   /usr/hack    null  rw  0  0
    /space/home/staff  /home/staff  null  rw  0  0
    /space/db/mysql    /db/mysql    null  rw  0  0

and the hanging problem was *not* there! Everything seems to be OK with this hack, but I have not done any heavy-duty work with it.

This is on my laptop (Dell Inspiron 3500), which I mention because it had weird behaviour with mount_smbfs (reported to bp in private emails last year).

How-To-Repeat:

Use mount_null for home directories and try to startx (see under Fix: the home directories have to be the first mount_null entry).
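
Since the hang appears to depend on which mount_null entry comes first in fstab, a small helper that lists the null entries in file order may make it easier to check a given layout. This is only a sketch and not part of the original report: the function name list_null_entries is hypothetical, it assumes that "mount -a" handles entries in the order they appear in the file, and it matches the type "nullfs" as well as "null" on the assumption that later releases spell the type that way.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): print the null-mount
# entries of an fstab-format file, numbered in file order.
list_null_entries() {
    # $1: path to an fstab-format file; defaults to /etc/fstab
    awk '
        $1 ~ /^#/ { next }                               # skip comment lines
        NF >= 3 && ($3 == "null" || $3 == "nullfs") {    # field 3 is the fs type
            n++
            # position, source directory, mount point, options
            printf "%d %s %s %s\n", n, $1, $2, $4
        }
    ' "${1:-/etc/fstab}"
}
```

Running `list_null_entries /etc/fstab` prints one line per null entry; per the report above, the entry numbered 1 is the one to look at if startx hangs.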