Created attachment 227752
pkg log messages
I did a "pkg upgrade" on a desktop system using a pkgbase repo. When upgrading to a repo built from commit c751d067c166db71ce8bf3a323c62ac3428bd32a, a number of packages were upgraded but at some point post-install scripts started failing due to a missing cap_mkdb(1).
I don't have the pkg output anymore, though I will try upgrading another host to see if I can reproduce this. I attached a snippet of /var/log/messages containing pkg messages. The FreeBSD-libbsdxml and FreeBSD-libregex packages were uninstalled, I guess because of commits 30975efbaff0a021545e81bd9fa09d848edfaafa and dfa9131d709121b2e502a82ff66cf3e376654942. Then, FreeBSD-utilities-14.snap20210907220446 is installed, not upgraded. Later, FreeBSD-utilities-14.snap20210827204353 is uninstalled, which seems to be where the problem starts.
This sounds like a bug some of us tripped over in pkg a few weeks ago -- I don't think it ever got tracked down but it seemed to be related to an already-installed package becoming a dependency of a new package.
In any case, "package has new version installed and then old version uninstalled" is almost certainly a pkg issue, not a pkgbase issue.
Yes, this is a pkg bug, and again I cannot reproduce it here.
Both libbsdxml and libregex conflict with the new utilities and so are removed.
I'm running pkg-devel (so 18.104.22.168) but that shouldn't change a lot.
Created attachment 227761
script to reproduce the issue
Using the attached script to try to reproduce here (packages are still building for my machines).
(In reply to Emmanuel Vadot from comment #4)
I was able to reproduce it using a variation on your script: instead of installing only FreeBSD-utilities from the old repo, install all packages, i.e., "pkg install -g 'FreeBSD-*'".
I can try digging into it later today, or if there is some debug info you'd like, I can provide it.
(In reply to Mark Johnston from comment #5)
This is with pkg 1.17.1, btw.
I managed to reproduce it, but not 100% of the time.
Anyway, this is a bug in pkg's solver (I think), and I don't understand a thing about that code.
Maybe an interesting data point: while trying to narrow things down, I modified a test script to cache the downloaded packages with -o PKG_CACHEDIR, since I blow away the fake rootdir on each attempt and would otherwise have to re-download them. But when both package sets are already cached locally, the problem no longer reproduces. That seems surprising; I wouldn't expect that to influence solver behaviour...
(In reply to Mark Johnston from comment #8)
... or not. It still repros with cached packages, sometimes. I have no idea why it is intermittent.
In any case, based on debug logs, the problem is related to pkg_jobs_set_execute_priority(), which may split an upgrade job into separate install/delete jobs upon conflicts. Conflicts are reported in this case, as the new FreeBSD-utilities contains files from the old FreeBSD-libbsdxml etc. There is some logic to update job priorities in this case (and presumably the problem would not occur if the new FreeBSD-utilities were installed after the old one is deleted), but I don't quite understand the code yet, in particular this "remote"/"local" package abstraction.
I dug into this a bit more and was able to fix it locally. Basically, when splitting an upgrade into separate remove and install jobs, we assign the remove job a special type, PKG_SOLVED_UPGRADE_REMOVE, not PKG_SOLVED_DELETE. I'm not sure exactly why that's needed, as they are mostly handled identically outside of the job prioritization code, and I still don't really understand that code.
After splitting jobs and assigning priorities, we sort the job list according to the numeric priority. When two jobs have identical priority, higher precedence is given to jobs of type not equal to PKG_SOLVED_DELETE. If I change this check so that PKG_SOLVED_UPGRADE_REMOVE is treated the same as PKG_SOLVED_DELETE (i.e., the type test becomes PKG_SOLVED_DELETE || PKG_SOLVED_UPGRADE_REMOVE), then the job list gets sorted in such a way that the problem doesn't occur.
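To make the tie-break change concrete, here is a minimal sketch in C. It is not pkg's code: the job struct, the JOB_* enum (standing in for the PKG_SOLVED_* types), and every function name are hypothetical. It also assumes that a higher numeric priority sorts earlier and that "higher precedence" means "earlier in the list"; the real comparator's sign convention is not shown in this report and may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified job model; these are not pkg's real structures. */
enum job_type {
	JOB_INSTALL,
	JOB_DELETE,         /* stands in for PKG_SOLVED_DELETE */
	JOB_UPGRADE_REMOVE  /* stands in for PKG_SOLVED_UPGRADE_REMOVE */
};

struct job {
	const char *name;
	enum job_type type;
	int priority;
};

typedef int (*delete_pred)(const struct job *);

/* Original check: only plain deletes count as delete jobs. */
static int
is_delete_old(const struct job *j)
{
	return (j->type == JOB_DELETE);
}

/* Modified check: the remove half of a split upgrade counts as well. */
static int
is_delete_new(const struct job *j)
{
	return (j->type == JOB_DELETE || j->type == JOB_UPGRADE_REMOVE);
}

/*
 * Higher numeric priority sorts first (assumption).  On a tie, a job the
 * predicate does not classify as a delete sorts before one it does
 * (assumed direction, taken from the wording of the comment above).
 */
static int
compare_jobs(const struct job *a, const struct job *b, delete_pred is_delete)
{
	if (a->priority != b->priority)
		return (a->priority > b->priority ? -1 : 1);
	return (is_delete(a) - is_delete(b));
}

/* qsort(3) adapters for the two variants of the tie-break. */
static int
cmp_old(const void *a, const void *b)
{
	return (compare_jobs(a, b, is_delete_old));
}

static int
cmp_new(const void *a, const void *b)
{
	return (compare_jobs(a, b, is_delete_new));
}

/* Sort a job array in place using the modified tie-break. */
static void
sort_jobs_new(struct job *jobs, size_t n)
{
	assert(jobs != NULL);
	qsort(jobs, n, sizeof(*jobs), cmp_new);
}
```

In this model, an install job and an upgrade-remove job with equal priority compare as equal under cmp_old, so the tie-break never applies to them. Under cmp_new the same pair gets a definite relative order, which is the kind of difference the change described above introduces.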
I submitted https://github.com/freebsd/pkg/pull/1986