Created attachment 246245
patch to enable building with NFS-mounted /usr/src and /usr/obj
Scenario:
- FreeBSD stable/14 latest (e4fb49e8)
- /usr/src and /usr/obj on another machine via NFS
- "cd /usr/src ; make buildworld buildkernel KERNCONF=..."
Result:
- Build fails with "cp: chflags: .../stable/14/amd64.amd64/tmp/legacy/bin/basename: Operation not supported"
Expected result:
- Building was possible using this setup until not too long ago and should continue to work with NFS-mounted src and obj directories
Patch attached.
-- Martin
The root cause probably is that files such as /usr/bin/basename have uarch set in stable/14, but not in releng/12.4.
Alex, Jess - it seems reasonable to me to copy host tools without -p; I'm happy to land this patch if you agree. Martin, would you attach a git format patch so that author metadata can be preserved?
Created attachment 246393
[PATCH] enable building with NFS-mounted /usr/obj
commit 0e533c72bc300bd90390e950ea9e473b38eeb409
Author: Warner Losh <imp@FreeBSD.org>
Date:   Mon Aug 24 16:06:11 2020 +0000
    When copying over the binaries, use '-p' to preserve date/time
    Although I can't reproduce it, others are seeing different lex/yacc programs always regenerated after my change to copy rather than symlink the files. The reported fix is to add '-p' to the copies. Since it doesn't hurt, go head and add it, though the reasons for this mattering remain at best obscure and poorly articulated.
    Notes: svn path=/head/; revision=364701
Given that commit, I am rather hesitant to say we should just go ahead and drop the -p without first understanding why it was needed in the first place.
Iirc, copying without -p would not preserve the file's times, which led to some dependency breaking. Removing that will break things again. Iirc, there were other problems fixed that were discovered after the commit... it greatly surprised me at the time. And it was a series of events that led to the copying as well that I need to see if I can recall...
Wtf is uarch being set on basename? It's not on any of the systems I checked. So it's not all NFS setups that would fail.. this has got to be limited to systems running some third party backup program... But if ever there was a file flag to not set, it's uarch.
Do not land this patch. I'll try to recreate this here.
But the best way to fix this would be to import NetBSD's cp's -N flag: don't copy flags. We don't need them. And I can think of no good that can come from copying them... Though I'd really like to understand how they came to be set on the stable/14 system... If FreeBSD's installer or things are doing it, then we need to stop them from doing it. If it's a 3rd party backup program, we need to add a note to the release notes. If it can't be discovered why, then I'm unsure what to do.
NetBSD's cp -N: https://reviews.freebsd.org/D42673
Also NetBSD's import of BSDI patch to ignore EOPNOTSUPP when flags == 0 https://reviews.freebsd.org/D42674
(In reply to Jessica Clarke from comment #4)
Could this be related to bug 249957? There seem to be races in what make is doing when timestamps are either truncated to seconds (over NFS) or otherwise very close together (as mentioned in the commit you reproduced).
However, for the first case (bug 249957) I tried to add a sleep after the config(8) which produces opt_global.h, to no avail. Maybe it was one of the opt_global.h which are specially generated in the various kernel module obj dirs.
-- Martin
[0]# pwd
/z/OBJ/FreeBSD/amd64/src/MBi/stable/14/amd64.amd64/sys/XYZZY_SMP
[0]# find . -name opt_global.h
./modules/z/SRC/FreeBSD/src/MBi/stable/14/sys/modules/pfsync/opt_global.h
./modules/z/SRC/FreeBSD/src/MBi/stable/14/sys/modules/pflog/opt_global.h
./modules/z/SRC/FreeBSD/src/MBi/stable/14/sys/modules/dtrace/fasttrap/opt_global.h
./modules/z/SRC/FreeBSD/src/MBi/stable/14/sys/modules/pf/opt_global.h
./opt_global.h
[0]#
The -p change was to fix meta mode. Now that I've had a chance to sleep on it I remember: it was for meta mode. filemon catches EVERYTHING that's used to build a target, including the binaries that are used. So, if the date changes on them, anything built by them gets regenerated. I don't think it's an opt_global.h thing; that was an actual failure.
I think cp -N is the way to go. But a case might be made for using cpio -p to copy them as well, since that will be more portable (at least it will work without needing to bootstrap cp) and won't preserve flags either. The current code is a bit too "sophisticated" for that to be an easy drop-in replacement (there are cases like awk where we need to link it to nawk, and the list of things to use is source directories, and we need to transform each element into a destination, plus we assume only one destination, which is a poor fit for cpio, though it could be manageable with symlinks).
cp replaced the ln -sf because mid-way through installworld we'd run new binaries with old libraries where the new binaries expected something new that wasn't in the old libraries. We replaced the even older cp with ln for the macos build anyway (though it's not quite as straightforward as that, and maybe the real bug here is that we're copying too many things too early since the bootstrap phase shouldn't need much of anything on a real FreeBSD system). So the quick hack may be cp -N, but a more nuanced analysis is likely needed for this to be solved.
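For anyone following along, here is a minimal sketch of the timestamp behaviour that the -p discussion is about. It assumes nothing about the real build: the temporary directory and the file names `tool`, `plain` and `preserved` are made up for illustration. It only shows that a plain copy gets a fresh modification time while `cp -p` keeps the source's, which is the property meta mode is said above to depend on.

```sh
#!/bin/sh
# Sketch only: temporary files stand in for a host tool and its copies.
workdir=$(mktemp -d "${TMPDIR:-/tmp}/cpdemo.XXXXXX")
src="$workdir/tool"
printf '#!/bin/sh\nexit 0\n' > "$src"
# Give the "host tool" an old, whole-second mtime so the effect is visible.
touch -t 202001010000 "$src"

cp -f  "$src" "$workdir/plain"       # new file, so its mtime is "now"
cp -pf "$src" "$workdir/preserved"   # mtime (and mode) taken from the source

# [ a -nt b ] is true when a has a newer modification time than b.
plain_is_newer=no
if [ "$workdir/plain" -nt "$src" ]; then plain_is_newer=yes; fi
preserved_is_newer=no
if [ "$workdir/preserved" -nt "$src" ]; then preserved_is_newer=yes; fi

echo "plain copy newer than source:     $plain_is_newer"
echo "preserved copy newer than source: $preserved_is_newer"
rm -rf "$workdir"
```

On a local filesystem the first line should report yes and the second no. The sketch does not touch file flags at all, so it says nothing about the chflags failure itself.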
(In reply to Warner Losh from comment #11) How about doing the copy only if the existing target differs from the source? Something like "cmp -s a b || cp a b". Would that be sufficient to satisfy meta mode? -- Martin
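A rough sketch of the "cmp -s a b || cp a b" idea, written as a shell helper. The function name `copy_if_different` and the temporary paths are hypothetical and do not come from Makefile.inc1. Whether this would be enough for meta mode is exactly the open question above, since the first copy of each tool still gets a new timestamp.

```sh
#!/bin/sh
# Sketch only: copy_if_different is a made-up helper name.
copy_if_different() {
    src=$1
    dst=$2
    # cmp -s prints nothing and exits 0 only when both files have identical bytes.
    if cmp -s "$src" "$dst"; then
        return 0             # unchanged: leave dst and its mtime alone
    fi
    cp -f "$src" "$dst"      # no -p, so cp is not asked to preserve flags
}

# Small demonstration with temporary files.
workdir=$(mktemp -d "${TMPDIR:-/tmp}/cmpcp.XXXXXX")
printf 'v1\n' > "$workdir/src"
copy_if_different "$workdir/src" "$workdir/dst"   # dst missing: copies
copy_if_different "$workdir/src" "$workdir/dst"   # identical: does nothing
printf 'v2\n' > "$workdir/src"
copy_if_different "$workdir/src" "$workdir/dst"   # differs: copies again
cat "$workdir/dst"
rm -rf "$workdir"
```

The second call is the interesting one: because the destination is not rewritten, its modification time stays whatever the first copy left it at.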
(In reply to Martin Birgmeier from comment #12) We still don't know why the flags are set on /bin/basename... Until I understand where they are coming from, or in what environment we see them, I'm loath to change anything... The cmp || cp might work. I'll have to take a look... We have a big problem with bmake using the time of the link rather than the time of the link's target for rebuilding, so there's always a lot of noise in a world build to plow through...
(In reply to Warner Losh from comment #13)
On two machines with /usr/bin/basename on UFS, the uarch bit is not set. On one machine with a ZFS root, uarch is set on /usr/bin/basename.
Also, to minimize changing files after a buildworld, I first install into a temporary destination and then copy over only the files which differ. I do this using tar:
( cd <tmp dir> && tar cfT - <list of changed files> ) | ( cd / && tar xfp - )
Before this copy, the old files in / are simply removed. There are additional steps to deal with schg files and removed files, but this is not important here.
Summarizing, it is probably tar and/or root-on-ZFS that sets uarch.
-- Martin
Btw tar does not complain if it cannot set flags, although it does set them if possible (but not on NFS, obviously).
On the zfs-on-root system I looked for one of the (few) files without uarch set and used cp to copy it to root's home dir. The copy has uarch set. -- Martin
(In reply to Martin Birgmeier from comment #16)
So we could use tar, with the following super ugly (and likely not very portable) formulation:
diff --git a/Makefile.inc1 b/Makefile.inc1
index f47b9f66b69e..cc2bdb315a58 100644
--- a/Makefile.inc1
+++ b/Makefile.inc1
@@ -2584,6 +2584,7 @@ _bootstrap_tools_links+=${_links:S/,/ /g}
 # the host version is known to be compatible into ${WORLDTMP}/legacy
 # We do this before building any of the bootstrap tools in case they depend on
 # the presence of any of the links (e.g. as m4/lex/awk)
+# Use tar because it ignores flags, which we don't want to copy
 ${_bt}-links: .PHONY
 
 .for _tool in ${_bootstrap_tools_links}
@@ -2593,7 +2594,7 @@ ${_bt}-link-${_tool}: .PHONY
 	if [ ! -e "$${source_path}" ] ; then \
 		echo "Cannot find host tool '${_tool}'"; false; \
 	fi; \
-	cp -pf "$${source_path}" "${WORLDTMP}/legacy/bin/${_tool}"
+	tar -cf - --absolute-paths "$${source_path}" | tar -C "${WORLDTMP}/legacy/bin/" -xf - -s '=.*/==g'
 ${_bt}-links: ${_bt}-link-${_tool}
 .endfor
 
But that kinda argues for the cmp route, though we'd need to modify the logic a bit as well...
I'm surprised the copies in ZFS have this set... But reading the man page, we see in chflags(2):
    For instance, ZFS tracks changes to files and will set this bit when a file is updated.
which is, imho, decidedly unhelpful.... But that explains how it got set and the scope of the problem... And I can confirm that my ZFS root machines (which I hadn't checked) do, indeed, have this set. That argues, imho, that we should mask this bit in cp, but that's a bigger fight....
I think I need to figure out how to recreate the reason I added -p in the first place...
Is there any news on this? I use this setup (src/obj on NFS) to update Tier-2 platforms via NFS and manually removed the "-p" option as suggested in the first patch above. Are there any adverse effects to be expected?
+1 on needing this fixed, subscribing
Also happening on recent -current n281766
I thought it might be worthwhile mentioning the complete circumstances I'm seeing the problem in:
1. rpi2b+ (16:current) accessing 16:current content
2. as an NFS client, /usr/src mounted from a stable/15 amd64 NFS server on a 1Gb network
3. /usr/obj and /var/cache/ccache are also mounted the same way
The problem only happens when make buildworld starts on the rpi2b+ machine. This machine successfully built and installed the kernel. This machine is UFS. It's a new install and has never run a backup of any type. This machine uses meta mode and ccache.
(In reply to Martin Birgmeier from comment #0) This problem might be caused by an older NFSv4 server, although I have no idea why the file has uarch set. Let me explain... Commit 3b6d4c6 added support for the NFSv4 Archive attribute (uarch) to the NFSv4 client/server. This implies that the NFSv4 client now checks to see if the NFSv4 server supports the Archive attribute. However, if the NFSv4 server is an older one (prior to the above commit done to main on Oct. 21), the server will say that Archive is not supported. Is your NFS mount using NFSv4 and is your NFSv4 server a FreeBSD one that isn't completely up to date? If so, I think I have to come up with a better way to handle an NFSv4 server that does not support Archive (uarch). (I'm away from home for a couple of weeks, so I won't be able to come up with any patch for a while.)
(In reply to Rick Macklem from comment #22) I now see that the version affected is 14.0. Does that mean the system doing the building is 14.0 or newer? Same question for your NFS server? (and what does the output of "nfsstat -m" on the client show the mount version as?)
(In reply to Rick Macklem from comment #23)
Hi Rick,
The NFS server is NFSv3. It runs a raidz2 pool with the sharenfs property on selected vdevs. The OS is 15.0-ALPHA2 stable/15-n280129-deaa609d065d GENERIC amd64 1500064 1500064. It was source built on 14th Sept.
The NFS client is on a rpi2b+ running 16:current of 3rd Nov.
Does the uarch issue affect NFSv3 as well? Here's nfsstat -m. Thank you for looking at this.
# nfsstat -m
192.168.1.102:/data/src-rpi2b on /usr/src
nfsv3,tcp,resvport,nconnect=1,hard,cto,lockd,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=65536,wsize=65536,readdirsize=65536,readahead=16,wcommitsize=4194304,timeout=120,retrans=2
192.168.1.102:/data/ccache-rpi2b on /var/cache/ccache
nfsv3,tcp,resvport,nconnect=1,hard,cto,lockd,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=65536,wsize=65536,readdirsize=65536,readahead=16,wcommitsize=4194304,timeout=120,retrans=2
192.168.1.102:/data/ports-rpi2b on /usr/ports
nfsv3,tcp,resvport,nconnect=1,hard,cto,lockd,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=65536,wsize=65536,readdirsize=65536,readahead=16,wcommitsize=4194304,timeout=120,retrans=2
192.168.1.102:/data/obj-rpi2b on /usr/obj
nfsv3,tcp,resvport,nconnect=1,hard,cto,lockd,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=65536,wsize=65536,readdirsize=65536,readahead=16,wcommitsize=4194304,timeout=120,retrans=2
(In reply to void from comment #24) No, I do not believe that the commit would have affected NFSv3 behaviour. As far as I can see, NFSv3 would have always reported EOPNOTSUPP when a chflags() of uarch was done.
(In reply to Rick Macklem from comment #25) I'll bring the NFS server up to date with the latest stable/15 anyway, as 15-alpha is, well, alpha ;) and report back.
(In reply to Rick Macklem from comment #23) All NFS mounts go via the automounter. I do not know whether this uses NFSv4. All machines (NFS server and client) run the exact same FreeBSD build. -- Martin
(In reply to Rick Macklem from comment #22)
The NFS server is now at 15.0-STABLE stable/15-n281149-ff9dbbc2c6f4 GENERIC amd64 1500501 1500501.
The problem is still happening with buildworld when obj is NFS mounted.
(In reply to void from comment #28) The patch for the Archive attribute is not in releng/15.0 (it missed the feature deadline) and went into stable/15 on Nov. 4. (I have no idea how to change one of those nXXX numbers into a date.) Also, the NFS client still does an NFSv3 mount unless there is an explicit "nfsv4" or "vers=4" option on the mount. (I've never changed the default, since I've assumed that would be a POLA violation.) Having said the above, it sounds like the fix is adding "cp -N" so that it doesn't get upset when it cannot copy the chflags.
(In reply to Rick Macklem from comment #29) The stable/15 was built from sources downloaded via git in the early hrs UTC. How/where would cp -N be added?
(In reply to void from comment #30) oops... that's the early hrs UTC today 2025-11-12. Around 0230
(In reply to void from comment #31) And you were using a NFSv4.2 mount? (You can check by doing "nfsstat -m" on the client.)
(In reply to Rick Macklem from comment #32) No, nfsv3. See comment #24
I made an nfs server on OpenBSD 7.8 arm64 and served freebsd obj from there which sidesteps the problem for the time being.
Is it normal for the build process to set uarch? For example, here on armv7:
# ls -lo /usr/obj/usr/src/arm.armv7/tmp/legacy/bin/awk
-r-xr-xr-x 1 root wheel uarch 218196 Nov 3 09:20 /usr/obj/usr/src/arm.armv7/tmp/legacy/bin/awk
It doesn't happen on amd64:
% ls -lo /usr/obj/usr/src/amd64.amd64/tmp/legacy/bin/awk
-r-xr-xr-x 1 root wheel - 233184 Nov 3 05:35 /usr/obj/usr/src/amd64.amd64/tmp/legacy/bin/awk
Same 16:current, no NFS involved.
What filesystems are each of those systems using?
(In reply to void from comment #34)
OK, the FreeBSD NFS server thing is a red herring. The problem also happens with an OpenBSD 7.8 NFS server.
Specifically:
1. FreeBSD 16:current armv7 rpi2+ NFS client with NFS-mounted /usr/obj
2. any NFSv3 server
The problem does not happen when:
- building world on amd64 with NFS-mounted obj
- building kernel (either armv7 or amd64) with NFS-mounted obj
What doesn't fix it:
- chflags -R 0 on /usr/src or /usr/obj
- make cleanworld && make cleandir && make clean
- deleting /usr/obj and starting again
(In reply to Jessica Clarke from comment #36) the rpi2 uses ufs2 on mmcsd & nfs mounted from a freebsd box and another nfs mounted from an openbsd machine (the latter at the moment due to testing)
Basically, buildworld fails for armv7 with this "cp: chflags: /usr/obj/usr/src/arm.armv7/tmp/legacy/bin/basename: Operation not supported" error if /usr/obj is nfs mounted and it doesn't matter if the nfs server is freebsd or openbsd. buildkernel does not fail. amd64 does not fail *at all*. It seems specific to armv7.
(In reply to Jessica Clarke from comment #36)
Both of the clients are using ufs2 for /usr.
On the rpi2 I'm trying to use NFS for /usr/obj for space and speed reasons.
For testing I used amd64 and made /usr/obj on NFS, and everything worked.
The OpenBSD NFS server uses ffs2.
The FreeBSD NFS server serves NFS as a ZFS sharenfs property.
All are NFSv3.
Our cp has had -N for a while now, but that doesn't really help. The cp command used to copy bootstrap tools is the host cp. Replacing `cp -pf` with `cp -Npf` will fix building with /usr/obj on NFS on recent FreeBSD, but break building on older FreeBSD or non-FreeBSD hosts (except recent macOS) whose cp command does not have a -N option. A better solution is probably to replace `cp -pf` with `install -p`. We already assume that the host has a working install command. https://reviews.freebsd.org/D53751
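A small sketch of what `install -p` is being relied on for here, namely keeping the source's modification time on the copy. This is not the actual change in D53751: the temporary directory, the `legacy-bin` subdirectory (standing in for ${WORLDTMP}/legacy/bin) and the file name `tool` are all assumptions made for illustration.

```sh
#!/bin/sh
# Sketch only: a temporary directory stands in for ${WORLDTMP}/legacy/bin.
workdir=$(mktemp -d "${TMPDIR:-/tmp}/instdemo.XXXXXX")
src="$workdir/tool"
printf '#!/bin/sh\nexit 0\n' > "$src"
touch -t 202001010000 "$src"         # old, whole-second mtime
mkdir "$workdir/legacy-bin"

# -p asks install to preserve the modification time of the source.
install -p "$src" "$workdir/legacy-bin/tool"

# Equal mtimes means neither file is "newer than" the other.
same_mtime=yes
if [ "$src" -nt "$workdir/legacy-bin/tool" ] ||
   [ "$workdir/legacy-bin/tool" -nt "$src" ]; then
    same_mtime=no
fi
echo "mtime preserved by install -p: $same_mtime"
rm -rf "$workdir"
```

The sketch runs on a local filesystem and does not exercise file flags, so it only illustrates the timestamp half of the argument, not the NFS chflags failure.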
(In reply to Dag-Erling Smørgrav from comment #41)
Applied your patch (armv7). Seems to be buildworlding :D
Well, it's no longer failing in the same place. It'll be quite a while until it completes though.
Thank you!
A commit in branch main references this bug:
URL: https://cgit.FreeBSD.org/src/commit/?id=f3cf4c0af5af6ce95065a615f998117ec1cd63aa
commit f3cf4c0af5af6ce95065a615f998117ec1cd63aa
Author:     Dag-Erling Smørgrav <des@FreeBSD.org>
AuthorDate: 2025-11-15 03:18:35 +0000
Commit:     Dag-Erling Smørgrav <des@FreeBSD.org>
CommitDate: 2025-11-15 03:18:35 +0000
    Use install instead of cp to copy bootstrap tools
    We need to preserve modification times on bootstrap tools, but `cp -p` also tries to preserve flags, which fails if OBJROOT is on NFS. A -N option was added to cp for this purpose, but trying to use that would break cross-building on hosts that don't have that option. The best remaining option is `install -p`, which we already assume is present.
    PR: 275030
    Reviewed by: imp, emaste
    Differential Revision: https://reviews.freebsd.org/D53751
 Makefile.inc1        | 2 +-
 tools/build/Makefile | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
A commit in branch stable/15 references this bug:
URL: https://cgit.FreeBSD.org/src/commit/?id=c67643712dc6ccd02ea7807729fc53288db47a87
commit c67643712dc6ccd02ea7807729fc53288db47a87
Author:     Dag-Erling Smørgrav <des@FreeBSD.org>
AuthorDate: 2025-11-15 03:18:35 +0000
Commit:     Dag-Erling Smørgrav <des@FreeBSD.org>
CommitDate: 2025-11-20 11:22:33 +0000
    Use install instead of cp to copy bootstrap tools
    We need to preserve modification times on bootstrap tools, but `cp -p` also tries to preserve flags, which fails if OBJROOT is on NFS. A -N option was added to cp for this purpose, but trying to use that would break cross-building on hosts that don't have that option. The best remaining option is `install -p`, which we already assume is present.
    PR: 275030
    Reviewed by: imp, emaste
    Differential Revision: https://reviews.freebsd.org/D53751
    (cherry picked from commit f3cf4c0af5af6ce95065a615f998117ec1cd63aa)
 Makefile.inc1        | 2 +-
 tools/build/Makefile | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
A commit in branch stable/14 references this bug:
URL: https://cgit.FreeBSD.org/src/commit/?id=fcca6e11694dbfc24abed7b484fe8cc8fc9affa1
commit fcca6e11694dbfc24abed7b484fe8cc8fc9affa1
Author:     Dag-Erling Smørgrav <des@FreeBSD.org>
AuthorDate: 2025-11-15 03:18:35 +0000
Commit:     Dag-Erling Smørgrav <des@FreeBSD.org>
CommitDate: 2025-11-20 11:25:15 +0000
    Use install instead of cp to copy bootstrap tools
    We need to preserve modification times on bootstrap tools, but `cp -p` also tries to preserve flags, which fails if OBJROOT is on NFS. A -N option was added to cp for this purpose, but trying to use that would break cross-building on hosts that don't have that option. The best remaining option is `install -p`, which we already assume is present.
    PR: 275030
    Reviewed by: imp, emaste
    Differential Revision: https://reviews.freebsd.org/D53751
    (cherry picked from commit f3cf4c0af5af6ce95065a615f998117ec1cd63aa)
 Makefile.inc1        | 2 +-
 tools/build/Makefile | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)