I described it in mailinglist with more verbose details https://lists.freebsd.org/archives/freebsd-stable/2023-April/001271.html https://lists.freebsd.org/archives/freebsd-stable/2023-May/001289.html In some corner cases etcupdate installs 80+ empty files in to /etc/ which makes the system unbootable. All files that should be just automatically updated were installed empty. Vital files like /etc/login.conf, scripts in /etc/rc.d/ and so on. etcupdate contains function install_new() which uses "cp -Rp" to copy files to /etc/ but it does not check if copy failed or not and silently continue. This leads to unbootable system in some corner cases. The function clearly does not do what it should according to comment above the function. It always returns 0. # Install the "new" version of a file. Returns true if it succeeds # and false otherwise. # # $1 - pathname of the file to install (relative to DESTDIR) install_new() { if ! install_dirs $NEWTREE "$DESTDIR" $1; then return 1 fi log "cp -Rp ${NEWTREE}$1 ${DESTDIR}$1" if [ -z "$dryrun" ]; then cp -Rp ${NEWTREE}$1 ${DESTDIR}$1 >&3 2>&1 fi post_install_file $1 return 0 } In my case the problem of failing cp -Rp was caused by running etcupdate on upgraded system from 12.3 to 13.2 without reboot so every call of cp -Rp failed with "Function not implemented", but it is not user visible error. etcupdate shows nothing wrong. Those messages are logged in to /var/db/etcupdate/log but if you run "etcupdate" and because it shows success then "etcupdate resolve" and "etcupdate status" then the log file does not contain anything useful because it is overwritten by each new run of etcupdate. I propose appending to log file instead of overwriting it. Or use some kind of rotation of at least 5 history files. It was really hard to find what / where is going wrong after the system cannot boot.
Created attachment 242040 [details] patch for main branch Check for error of cp -Rp and exit on failure
Created attachment 242041 [details] patch for stable/13 branch Check for error of cp -Rp and exit on failure Only line numbers differs.
Nice work tracking this down! It looks good to me, personally. I do like to use `set -e` when I can, but that may or may not be relevant here.
To be clear, in this case cp was failing after it had opened the destination with O_TRUNC which is why the files were truncated and the goal of the panic in this case is that if cp(1) is busted it's best to bail out early after only possibly trashing one file than many?
(In reply to John Baldwin from comment #4) Yes. Maybe use better wording in the panic message to tell the user "This file is broken now, you must fix it manually" too. Thinking about it again, this kind of error message fits this specific case only but the cp can fail for many other reasons - /etc/ can be read only - and in this case the target file will be intact but user still need to know that cp failed. The biggest problem now is that etcupdate do not show any errors from the cp. I was not noticed about anything and ended up with more than 80 files broken. If etcupdate stop on the first copy error, tell the user "Houston, we have a problem", then user can take some actions to fix it before reboot.