Bug 271287 - etcupdate silently installs empty files / unhandled error in install_new()
Summary: etcupdate silently installs empty files / unhandled error in install_new()
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: bin (show other bugs)
Version: 13.2-RELEASE
Hardware: Any Any
: --- Affects Many People
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-05-06 20:23 UTC by Miroslav Lachman
Modified: 2023-12-21 19:51 UTC (History)
7 users (show)

See Also:


Attachments
patch for main branch (462 bytes, patch)
2023-05-07 17:40 UTC, Miroslav Lachman
no flags Details | Diff
patch for stable/13 branch (462 bytes, patch)
2023-05-07 17:41 UTC, Miroslav Lachman
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Miroslav Lachman 2023-05-06 20:23:31 UTC
I described it in mailinglist with more verbose details
https://lists.freebsd.org/archives/freebsd-stable/2023-April/001271.html
https://lists.freebsd.org/archives/freebsd-stable/2023-May/001289.html

In some corner cases etcupdate installs 80+ empty files in to /etc/ which makes the system unbootable. All files that should be just automatically updated were installed empty. Vital files like /etc/login.conf, scripts in /etc/rc.d/ and so on.

etcupdate contains function install_new() which uses "cp -Rp" to copy files to /etc/ but it does not check if copy failed or not and silently continue. This leads to unbootable system in some corner cases. The function clearly does not do what it should according to comment above the function. It always returns 0.

# Install the "new" version of a file.  Returns true if it succeeds
# and false otherwise.
#
# $1 - pathname of the file to install (relative to DESTDIR)
install_new()
{

    if ! install_dirs $NEWTREE "$DESTDIR" $1; then
        return 1
    fi
    log "cp -Rp ${NEWTREE}$1 ${DESTDIR}$1"
    if [ -z "$dryrun" ]; then
        cp -Rp ${NEWTREE}$1 ${DESTDIR}$1 >&3 2>&1
    fi
    post_install_file $1
    return 0
} 


In my case the problem of failing cp -Rp was caused by running etcupdate on upgraded system from 12.3 to 13.2 without reboot so every call of cp -Rp failed with "Function not implemented", but it is not user visible error. etcupdate shows nothing wrong. Those messages are logged in to /var/db/etcupdate/log but if you run "etcupdate" and because it shows success then "etcupdate resolve" and "etcupdate status" then the log file does not contain anything useful because it is overwritten by each new run of etcupdate.
I propose appending to log file instead of overwriting it. Or use some kind of rotation of at least 5 history files. It was really hard to find what / where is going wrong after the system cannot boot.
Comment 1 Miroslav Lachman 2023-05-07 17:40:36 UTC
Created attachment 242040 [details]
patch for main branch

Check for error of cp -Rp and exit on failure
Comment 2 Miroslav Lachman 2023-05-07 17:41:57 UTC
Created attachment 242041 [details]
patch for stable/13 branch

Check for error of cp -Rp and exit on failure
Only line numbers differs.
Comment 3 Henrich Hartzer 2023-05-08 17:26:04 UTC
Nice work tracking this down! It looks good to me, personally. I do like to use `set -e` when I can, but that may or may not be relevant here.
Comment 4 John Baldwin freebsd_committer freebsd_triage 2023-05-08 17:48:16 UTC
To be clear, in this case cp was failing after it had opened the destination with O_TRUNC which is why the files were truncated and the goal of the panic in this case is that if cp(1) is busted it's best to bail out early after only possibly trashing one file than many?
Comment 5 Miroslav Lachman 2023-05-08 17:56:24 UTC
(In reply to John Baldwin from comment #4)
Yes. Maybe use better wording in the panic message to tell the user "This file is broken now, you must fix it manually" too. 
Thinking about it again, this kind of error message fits this specific case only but the cp can fail for many other reasons - /etc/ can be read only - and in this case the target file will be intact but user still need to know that cp failed.
The biggest problem now is that etcupdate do not show any errors from the cp. I was not noticed about anything and ended up with more than 80 files broken.
If etcupdate stop on the first copy error, tell the user "Houston, we have a problem", then user can take some actions to fix it before reboot.