Hi,

I ran into this bug on two different systems:
- one runs an r324861 amd64 GENERIC-NODEBUG kernel and userland with ZFS,
- the other runs an r324918 arm RPI-B kernel and userland with UFS2.

The directory hierarchy loop occurs on both systems, so I think it is not
related to the underlying filesystem.

After installing freebsd-uucp from ports and configuring it, a new directory
tree /var/spool/uucp/lyxys/[DX]. exists. At that moment the contents of
lyxys/D. and lyxys/X. looked as expected. But some time later both looked to
ls(1) like a copy of /var/spool/uucp. And indeed ls(1) showed the same inode
number for all three directories:

63747 drwxrwxr-x  10 uucp  uucp  512 Feb  7  2017 /var/spool/uucp
95493 drwxr-xr-x   4 uucp  uucp  512 Oct 24 23:53 /var/spool/uucp/lyxys
63747 drwxrwxr-x  10 uucp  uucp  512 Feb  7  2017 /var/spool/uucp/lyxys/D.
63747 drwxrwxr-x  10 uucp  uucp  512 Feb  7  2017 /var/spool/uucp/lyxys/X.

fsdb(8) showed me different inode numbers than ls(1) for the D. and X.
directories, but that may be my mistake, as it was my first use of fsdb.
fsck_ufs does not report any error or try to fix anything. As ZFS does not
have an fsck, I tried "zpool scrub", which also did not report any error. So
the internal structures of both filesystems seem to be OK.

Steps tried to reproduce the problem:

With ZFS all I can do is roll back to an earlier snapshot without the
/var/spool/uucp directory and start again. As it is easier for me to
experiment with the arm system, I used the Raspberry Pi for the following
tests.

Trying to get closer to the moment this happens, I set up a date(1)/ls(1)
loop on a freshly set up [1] UFS. It showed that the inode change in the
directory information happened sometime between 02:57 and 03:57 CEST:

Wed Oct 25 02:57:07 CEST 2017
 95237 drwxr-xr-x  25 root  wheel  512 Oct 24 23:38 /var
102372 drwxr-xr-x  11 root  wheel  512 Mar  7  2017 /var/spool/
102701 drwxrwxr-x  10 uucp  uucp   512 Feb  7  2017 /var/spool/uucp
102711 drwxr-xr-x   4 uucp  uucp   512 Oct 24 23:53 /var/spool/uucp/lyxys
102684 drwxr-xr-x   2 uucp  uucp   512 Oct 24 23:53 /var/spool/uucp/lyxys/D.
102685 drwxr-xr-x   2 uucp  uucp   512 Oct 24 23:53 /var/spool/uucp/lyxys/X.

Wed Oct 25 03:57:08 CEST 2017
 95237 drwxr-xr-x  25 root  wheel  512 Oct 24 23:38 /var
102372 drwxr-xr-x  11 root  wheel  512 Mar  7  2017 /var/spool/
102701 drwxrwxr-x  10 uucp  uucp   512 Feb  7  2017 /var/spool/uucp
102711 drwxr-xr-x   4 uucp  uucp   512 Oct 24 23:53 /var/spool/uucp/lyxys
102701 drwxrwxr-x  10 uucp  uucp   512 Feb  7  2017 /var/spool/uucp/lyxys/D.
102701 drwxrwxr-x  10 uucp  uucp   512 Feb  7  2017 /var/spool/uucp/lyxys/X.

As the system was not used actively, the only difference between the hours
before and after the change was cron running periodic daily at 03:01:00.

In the next step I searched for a faster way to set up the UFS for testing.
First I ran "dump -0a -f /usr/local/root_dump /" on the arm system, then
moved the SD card to the amd64 and restored the filesystem with

newfs -UEt /dev/da0s2a
mount /dev/da0s2a /mnt
mount -r /dev/da0s3a /tmp/da0s3a/
cd /mnt
restore rf /tmp/da0s3a/root_dump
rm restoresymtable

The output of ls(1) showed everything is OK:

# ls -lid var/spool/uucp var/spool/uucp/lyxys var/spool/uucp/lyxys/*
95491 drwxrwxr-x  10 uucp  uucp  512 Feb  7  2017 var/spool/uucp
  261 drwxr-xr-x   4 uucp  uucp  512 Oct 24 23:53 var/spool/uucp/lyxys
32005 drwxr-xr-x   2 uucp  uucp  512 Oct 24 23:53 var/spool/uucp/lyxys/D.
63746 drwxr-xr-x   2 uucp  uucp  512 Oct 24 23:53 var/spool/uucp/lyxys/X.
#

But after booting this filesystem on the arm:

# cd /; ls -lid var/spool/uucp var/spool/uucp/lyxys var/spool/uucp/lyxys/*
95491 drwxrwxr-x  10 uucp  uucp  512 Feb  7  2017 var/spool/uucp
  261 drwxr-xr-x   4 uucp  uucp  512 Oct 24 23:53 var/spool/uucp/lyxys
95491 drwxrwxr-x  10 uucp  uucp  512 Feb  7  2017 var/spool/uucp/lyxys/D.
95491 drwxrwxr-x  10 uucp  uucp  512 Feb  7  2017 var/spool/uucp/lyxys/X.
#

Mounted again on the amd64, it showed the same output.

Next I tried just restoring, unmounting and mounting the UFS on the amd64,
and it looks like the change happens immediately with the unmount:

# rm restoresymtable
# ls -lid var/spool/uucp var/spool/uucp/lyxys var/spool/uucp/lyxys/*
63747 drwxrwxr-x  10 uucp  uucp  512 Feb  7  2017 var/spool/uucp
95493 drwxr-xr-x   4 uucp  uucp  512 Oct 24 23:53 var/spool/uucp/lyxys
  261 drwxr-xr-x   2 uucp  uucp  512 Oct 24 23:53 var/spool/uucp/lyxys/D.
32002 drwxr-xr-x   2 uucp  uucp  512 Oct 24 23:53 var/spool/uucp/lyxys/X.
# cd; umount /mnt; mount /dev/da0s2a /mnt; cd /mnt
# ls -lid var/spool/uucp var/spool/uucp/lyxys var/spool/uucp/lyxys/*
63747 drwxrwxr-x  10 uucp  uucp  512 Feb  7  2017 var/spool/uucp
95493 drwxr-xr-x   4 uucp  uucp  512 Oct 24 23:53 var/spool/uucp/lyxys
63747 drwxrwxr-x  10 uucp  uucp  512 Feb  7  2017 var/spool/uucp/lyxys/D.
63747 drwxrwxr-x  10 uucp  uucp  512 Feb  7  2017 var/spool/uucp/lyxys/X.
#

So I have two ways of repeating this behaviour:
- waiting until periodic daily somehow triggers it, or calling it manually;
- restoring the filesystem from dump(8) and unmounting it.

Just to complete the information and to demonstrate the loop, this is what
the ls(1) output looks like without -d:

# ls -li var/spool/uucp var/spool/uucp/lyxys var/spool/uucp/lyxys/*
var/spool/uucp:
total 44
95491 drwxr-xr-x  3 uucp  uucp   512 Sep  4  2013 .Failed
  259 drwxrwxr-x  2 uucp  uucp   512 Feb  7  2017 .Preserve
63748 drwxr-xr-x  2 uucp  uucp   512 Sep  4  2013 .Received
95492 drwxrwxr-x  2 uucp  uucp   512 Feb  7  2017 .Sequence
  260 drwxrwxr-x  2 uucp  uucp   512 Sep  4  2013 .Status
32001 drwxrwxr-x  2 uucp  uucp   512 Feb  7  2017 .Temp
63749 drwxrwxr-x  2 uucp  uucp   512 Feb  7  2017 .Xqtdir
67117 -rw-r--r--  1 uucp  uucp  5959 Nov 24  2014 Log
67118 -rw-r--r--  1 uucp  uucp  3728 Nov 24  2014 Stats
95493 drwxr-xr-x  4 uucp  uucp   512 Oct 24 23:53 lyxys

var/spool/uucp/lyxys:
total 8
63747 drwxrwxr-x  10 uucp  uucp  512 Feb  7  2017 D.
63747 drwxrwxr-x  10 uucp  uucp  512 Feb  7  2017 X.
var/spool/uucp/lyxys/D.:
total 44
95491 drwxr-xr-x  3 uucp  uucp   512 Sep  4  2013 .Failed
  259 drwxrwxr-x  2 uucp  uucp   512 Feb  7  2017 .Preserve
63748 drwxr-xr-x  2 uucp  uucp   512 Sep  4  2013 .Received
95492 drwxrwxr-x  2 uucp  uucp   512 Feb  7  2017 .Sequence
  260 drwxrwxr-x  2 uucp  uucp   512 Sep  4  2013 .Status
32001 drwxrwxr-x  2 uucp  uucp   512 Feb  7  2017 .Temp
63749 drwxrwxr-x  2 uucp  uucp   512 Feb  7  2017 .Xqtdir
67117 -rw-r--r--  1 uucp  uucp  5959 Nov 24  2014 Log
67118 -rw-r--r--  1 uucp  uucp  3728 Nov 24  2014 Stats
95493 drwxr-xr-x  4 uucp  uucp   512 Oct 24 23:53 lyxys

var/spool/uucp/lyxys/X.:
total 44
95491 drwxr-xr-x  3 uucp  uucp   512 Sep  4  2013 .Failed
  259 drwxrwxr-x  2 uucp  uucp   512 Feb  7  2017 .Preserve
63748 drwxr-xr-x  2 uucp  uucp   512 Sep  4  2013 .Received
95492 drwxrwxr-x  2 uucp  uucp   512 Feb  7  2017 .Sequence
  260 drwxrwxr-x  2 uucp  uucp   512 Sep  4  2013 .Status
32001 drwxrwxr-x  2 uucp  uucp   512 Feb  7  2017 .Temp
63749 drwxrwxr-x  2 uucp  uucp   512 Feb  7  2017 .Xqtdir
67117 -rw-r--r--  1 uucp  uucp  5959 Nov 24  2014 Log
67118 -rw-r--r--  1 uucp  uucp  3728 Nov 24  2014 Stats
95493 drwxr-xr-x  4 uucp  uucp   512 Oct 24 23:53 lyxys
#

A friend pointed me to the 64-bit inode change as a possible cause, but as I
do not have a backup old enough, I cannot verify his idea easily.

Ralf

[1] newfs, rsync from backup without /var/spool/uucp, port reinstalled,
    configured.
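[Editorial note: the duplicate-inode check done above with "ls -lid" can also
be scripted. The following is a minimal sketch using stat(2); the hard-coded
paths are the ones from the transcript, and the program itself is
illustrative, not part of the original report.]

/*
 * Compare each directory against its ancestors; a directory whose
 * (st_dev, st_ino) pair equals that of an ancestor has become an alias
 * for it -- exactly what ls -lid shows for D. and X. above.
 */
#include <sys/stat.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
	const char *paths[] = {
		"/var/spool/uucp",
		"/var/spool/uucp/lyxys",
		"/var/spool/uucp/lyxys/D.",
		"/var/spool/uucp/lyxys/X.",
	};
	struct stat sb[4];
	size_t i, j;

	for (i = 0; i < 4; i++) {
		if (stat(paths[i], &sb[i]) == -1) {
			perror(paths[i]);
			exit(1);
		}
	}
	for (i = 1; i < 4; i++)
		for (j = 0; j < i; j++)
			if (sb[i].st_dev == sb[j].st_dev &&
			    sb[i].st_ino == sb[j].st_ino)
				printf("loop: %s has the same inode (%ju) as %s\n",
				    paths[i], (uintmax_t)sb[i].st_ino, paths[j]);
	return (0);
}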
Another hypothesis is that the name cache may have something to do with it.
(In reply to Andriy Gapon from comment #1)

To expand on my previous comment: your report is the first of its kind, and
the affected directories have rather peculiar names of the form "?.". So my
suspicion is that some generic lookup or cache code has a flaw that
misinterprets those names as "..".
And when you know what you are looking for:

static __noinline int
cache_lookup_nomakeentry(struct vnode *dvp, struct vnode **vpp,
    struct componentname *cnp, struct timespec *tsp, int *ticksp)
{
	struct namecache *ncp;
	struct rwlock *blp;
	struct mtx *dvlp, *dvlp2;
	uint32_t hash;
	int error;

	if (cnp->cn_namelen == 2 && cnp->cn_nameptr[1] == '.') {

There is a similar condition in cache_lookup() as well. I do not see any code
to pre-check that cnp->cn_nameptr[0] == '.'. Seems like bad refactoring in
r324378.
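[Editorial note: to illustrate the flaw outside the kernel -- the condition
only looks at the second character, so any two-character name ending in '.',
such as the UUCP spool directories "D." and "X.", is treated as "..". The
sketch below is a userland demonstration of the predicate only, not of the
kernel code paths themselves.]

#include <stdio.h>
#include <string.h>

/* The check as committed in r324378: tests only the second byte. */
static int
looks_like_dotdot_buggy(const char *name, size_t len)
{
	return (len == 2 && name[1] == '.');
}

/* The check with the missing first-byte test restored. */
static int
looks_like_dotdot_fixed(const char *name, size_t len)
{
	return (len == 2 && name[0] == '.' && name[1] == '.');
}

int
main(void)
{
	const char *names[] = { "..", "D.", "X.", "Xq" };
	size_t i;

	/* "D." and "X." match the buggy check but not the fixed one. */
	for (i = 0; i < 4; i++)
		printf("%-2s  buggy=%d  fixed=%d\n", names[i],
		    looks_like_dotdot_buggy(names[i], strlen(names[i])),
		    looks_like_dotdot_fixed(names[i], strlen(names[i])));
	return (0);
}

In the kernel, taking that branch sends "D." and "X." down the ".." handling
path, which is consistent with both resolving to their parent directory's
vnode and inode, as seen in the ls(1) output above.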
Wow, that is an embarrassingly silly bug if I've ever seen one. Sorry for the
inconvenience.

So, to be clear, does this solve the problem for you?

diff --git a/sys/kern/vfs_cache.c b/sys/kern/vfs_cache.c
index 3da1c97..23c80a9 100644
--- a/sys/kern/vfs_cache.c
+++ b/sys/kern/vfs_cache.c
@@ -1126,7 +1126,8 @@ cache_lookup_nomakeentry(struct vnode *dvp, struct vnode **vpp,
 	uint32_t hash;
 	int error;
 
-	if (cnp->cn_namelen == 2 && cnp->cn_nameptr[1] == '.') {
+	if (cnp->cn_namelen == 2 &&
+	    cnp->cn_nameptr[0] == '.' && cnp->cn_nameptr[1] == '.') {
 		counter_u64_add(dotdothits, 1);
 		dvlp = VP2VNODELOCK(dvp);
 		dvlp2 = NULL;
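[Editorial note: a quick smoke test one might run on a patched kernel, along
the lines of the reporter's reproduction. The paths are made up for the
example, and since the original report needed an umount/mount cycle (or
periodic daily) to trigger the bad lookup reliably, a clean run here is only
indicative, not proof of the fix.]

#include <sys/stat.h>
#include <err.h>
#include <stdint.h>
#include <stdio.h>

int
main(void)
{
	struct stat parent, child;

	/* Create a two-character "?."-named directory inside a fresh parent. */
	if (mkdir("/tmp/dotdot-test", 0755) == -1)
		err(1, "mkdir /tmp/dotdot-test");
	if (mkdir("/tmp/dotdot-test/D.", 0755) == -1)
		err(1, "mkdir /tmp/dotdot-test/D.");
	if (stat("/tmp/dotdot-test", &parent) == -1 ||
	    stat("/tmp/dotdot-test/D.", &child) == -1)
		err(1, "stat");
	/* On a patched kernel "D." must keep its own inode. */
	if (parent.st_ino == child.st_ino)
		errx(1, "D. resolved to its parent's inode %ju",
		    (uintmax_t)parent.st_ino);
	printf("ok: parent inode %ju, D. inode %ju\n",
	    (uintmax_t)parent.st_ino, (uintmax_t)child.st_ino);
	return (0);
}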
The obvious bug is fixed in https://svnweb.freebsd.org/changeset/base/325268
Sorry for the wait. I can confirm that base r325268 fixes the bug; I tested
with the faster restore-umount-mount case. Compiling for the arm system will
take a few more hours, but I do not expect any surprise there.

So I would like to close the bug as fixed.

Thank you all.

Ralf