Bug 246066 - Portsnap continuously fails with "snapshot corrupt"
Status: Open
Alias: None
Product: Services
Classification: Unclassified
Component: Portsnap
Version: unspecified
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: Colin Percival
Reported: 2020-04-30 19:54 UTC by Patrick McMunn
Modified: 2020-09-25 23:49 UTC
Description Patrick McMunn 2020-04-30 19:54:53 UTC
I've run "rm -rf /var/db/portsnap/*" and "rm -rf /usr/ports" to start with a clean slate. Then I ran "portsnap fetch extract". The extraction always fails at the same point:

...
/usr/ports/devel/py-oslo.versionedobjects/
/usr/ports/devel/py-oslo.versionedobjects1/
/usr/ports/devel/py-oslo.vmware/
/usr/ports/devel/py-oslo.vmware2/
files/472400e8914a19db7ad90d65cb635af1cbf07292271ea6b2e68801de91a6186b.gz not found -- snapshot corrupt.

I've had this "snapshot corrupt" issue several times in the past few months. Usually, deleting portsnap's database and the ports tree and then refetching and extracting the snapshot resolves it (at least temporarily), but that method has not worked for the past 2 or 3 days.
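
The "usual method" described above can be sketched as a small shell function. This is only an illustration of the steps in the report, not an official recovery procedure: the function name and the PORTSNAP override variable are hypothetical, and the defaults are the paths named in the description. It relies on portsnap's -d (working directory) and -p (ports directory) options.

```shell
#!/bin/sh
# Sketch: wipe portsnap's working directory and the ports tree, then refetch.
# PORTSNAP is a hypothetical override so the function can be exercised
# without touching a real system.
portsnap_clean_refetch() {
    workdir="${1:-/var/db/portsnap}"
    portsdir="${2:-/usr/ports}"
    portsnap_cmd="${PORTSNAP:-portsnap}"

    # Refuse to run rm -rf against the root directory.
    case "$workdir" in /) echo "refusing workdir '$workdir'" >&2; return 1 ;; esac
    case "$portsdir" in /) echo "refusing portsdir '$portsdir'" >&2; return 1 ;; esac

    # Same two deletions as in the report, with quoting and empty-value guards.
    rm -rf "${workdir:?}"/* || return 1
    rm -rf "${portsdir:?}" || return 1

    # Download a fresh snapshot and extract it into the ports directory.
    "$portsnap_cmd" -d "$workdir" -p "$portsdir" fetch extract
}

# Usage (as root, on the affected host):
#   portsnap_clean_refetch
```

Keeping the deletions and the refetch in one function makes it harder to delete the tree and then forget which working directory the new snapshot went into.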

I tried pinging ec2-eu-west-1.portsnap.freebsd.org, sourcefire.portsnap.freebsd.org, your-org.portsnap.freebsd.org, ec2-ap-southeast-2.portsnap.freebsd.org, ec2-ap-northeast-1.portsnap.freebsd.org, and ec2-sa-east-1.portsnap.freebsd.org to see which one was the fastest, but the your-org site was the only one that even responded (this is the one which portsnap uses automatically if I don't specify one). I don't know if this is why portsnap defaults to your-org, but I do seem to be able to use other sites if I specify them on the command line. However, snapshots from other sites still fail (users in the forums who have had the "corrupt snapshot" issue sometimes claim that using a different server resolves the issue, but it didn't work for me).
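
Trying each mirror explicitly, as described above, could look roughly like the following. This is a sketch under assumptions: the function name and the PORTSNAP override are hypothetical, and only portsnap's -s option (fetch from the named server) is used. A mirror that does not answer ping may simply be filtering ICMP and can still serve snapshots, which would be consistent with the other sites being usable when named on the command line.

```shell
#!/bin/sh
# Sketch: run "portsnap fetch" against each given mirror in turn and stop at
# the first one that succeeds. PORTSNAP is a hypothetical override used only
# so the loop can be tested without a real portsnap binary.
fetch_from_first_working_mirror() {
    portsnap_cmd="${PORTSNAP:-portsnap}"
    for server in "$@"; do
        echo "Trying $server" >&2
        # Send portsnap's own output to stderr so that stdout carries
        # nothing but the name of the mirror that worked.
        if "$portsnap_cmd" -s "$server" fetch >&2; then
            echo "$server"
            return 0
        fi
    done
    return 1
}

# Usage, with mirror names taken from the report:
#   fetch_from_first_working_mirror \
#       ec2-eu-west-1.portsnap.freebsd.org \
#       your-org.portsnap.freebsd.org
```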
Comment 1 Patrick McMunn 2020-05-02 02:19:07 UTC
Apparently the tree was updated and the latest snapshot was correct, because today I deleted the portsnap database and the ports tree, ran "portsnap auto", and got a ports tree with no errors. This seems to indicate there was an upstream issue that has since been resolved (at least for this particular incident). But if it is an upstream issue, I am still concerned because of how frequently it has been occurring over the past few months.
Comment 2 Kurt Jaeger freebsd_committer 2020-05-03 19:09:21 UTC
How often did it occur in the last few months?
Comment 3 Kurt Jaeger freebsd_committer 2020-05-03 19:10:36 UTC
And: can you check/report which portsnap site is used?

My poudriere reports where I get it:

[00:00:00] Updating portstree "default" with portsnap...Looking up portsnap.FreeBSD.org mirrors... 6 mirrors found.
Fetching snapshot tag from ec2-eu-west-1.portsnap.freebsd.org... done.
Comment 4 Kurt Jaeger freebsd_committer 2020-05-03 19:19:40 UTC
Sorry, found your report detail. Ignore my last comment 8-}
Comment 5 Li-Wen Hsu freebsd_committer 2020-07-01 18:03:39 UTC
Feedback timeout. Please feel free to reopen this if you still see the same issue.
Comment 6 Patrick McMunn 2020-07-20 23:49:45 UTC
Reopening. I apologize for not following up in a timely manner. I must have missed the email notifications that there were any responses. The issue also cleared up for a while until a day or two ago when it occurred again. So I deleted my ports tree and Portsnap database then ran "portsnap auto". It downloaded and extracted just fine. But then I tried to update the tree this afternoon, and it happened again. The thing that baffles me is the errors about "no such file or directory" for /dev/stdout even though I can clearly see the device when I run ls. Is it possible that the snapshots themselves are fine but there is some error happening locally that causes them to not work? Of course, that's just pure conjecture on my part. Below is the output from this afternoon.

root@dl380g7:/usr/ports # portsnap auto
Looking up portsnap.FreeBSD.org mirrors... 4 mirrors found.
Fetching snapshot tag from ipv4.aws.portsnap.freebsd.org... done.
Fetching snapshot metadata... done.
Updating from Sat Jul 18 13:07:10 CDT 2020 to Mon Jul 20 17:11:43 CDT 2020.
Fetching 4 metadata patches./usr/sbin/portsnap: cannot create /dev/stdout: No such file or directory
 done.
Applying metadata patches... done.
cut: /dev/stdin: No such file or directory
Fetching 4 metadata files... done.
Fetching 173 patches. 
/usr/sbin/portsnap: cannot create /dev/stdout: No such file or directory
(0/173) 0.00%  done.                               
done.
Applying patches... 
done.pping 72bb2fb5de1924af3bc0273c48386c96b466f2d3697823a0966ccb75712cd2fb-4ae46e0984ba5f14903914d5a25d1f403550735cbe46654dc97b79f27d088867 (173 of      173 patchlist).
cut: /dev/stdin: No such file or directory
Fetching 173 new ports or files... done.
Removing old files and directories... done.
Extracting new files:
/usr/ports/MOVED
/usr/ports/Mk/Scripts/
/usr/ports/Mk/Uses/
/usr/ports/UPDATING
/usr/ports/archivers/pecl-lzf/
/usr/ports/astro/Makefile
/usr/ports/astro/py-astropy-helpers/
files/7f8bf65c9fc6fac66ac8e2bbfbe5633ac1dfde7f9df090eb4dff5d646ec2d9ae.gz not found -- snapshot corrupt.
Comment 7 Colin Percival freebsd_committer 2020-07-21 00:53:57 UTC
There's definitely something going on here beyond just portsnap issues.  Are you running inside a chroot or jail?  Can you check that /dev/ is mounted?  Which version of FreeBSD (especially the kernel) are you running?
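
The three questions above can be answered in one go with something like the following. It is a rough sketch, not part of portsnap: both function names are hypothetical, and the check uses security.jail.jailed, the FreeBSD sysctl that reports 1 inside a jail.

```shell
#!/bin/sh
# Sketch: gather the facts asked for above (jail, /dev nodes, kernel version).

can_open_for_write() {
    # Approximates the failing operation (the shell opening a path for
    # output). Appending is used so an existing regular file is not truncated.
    ( : >> "$1" ) 2>/dev/null
}

report_environment() {
    for node in /dev/stdin /dev/stdout /dev/fd/1; do
        if [ -e "$node" ]; then
            echo "$node: present"
        else
            echo "$node: missing"
        fi
    done

    if can_open_for_write /dev/stdout; then
        echo "/dev/stdout: can be opened for writing"
    else
        echo "/dev/stdout: cannot be opened for writing"
    fi

    # FreeBSD-specific: 1 means this shell is running inside a jail.
    if command -v sysctl >/dev/null 2>&1; then
        echo "jailed: $(sysctl -n security.jail.jailed 2>/dev/null || echo unknown)"
    fi

    # Show whether devfs and fdescfs are mounted, then the kernel version.
    mount | grep -E '^(devfs|fdescfs) '
    uname -a
}

# Usage (run in the same shell and environment where portsnap fails):
#   report_environment
```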
Comment 8 Patrick McMunn 2020-07-22 01:31:27 UTC
Although I tinkered around with jails a bit and may have some cbsd stuff in /etc/rc.conf.local, I'm not running any of these commands inside a chroot or jail. I'm just running this as root on the system.

This is the output of "mount" which shows that /dev is mounted:

zroot/ROOT/default on / (zfs, local, noatime, nfsv4acls)
devfs on /dev (devfs, local, multilabel)
linprocfs on /compat/linux/proc (linprocfs, local)
linsysfs on /compat/linux/sys (linsysfs, local)
tmpfs on /compat/linux/dev/shm (tmpfs, local)
fdescfs on /dev/fd (fdescfs)
procfs on /proc (procfs, local)
zroot/GoG on /GoG (zfs, local, noatime, nfsv4acls)

This system was originally installed from a 12.1-RELEASE USB image, and I eventually migrated to 12-STABLE, which I update from source roughly once a month, so I keep it pretty current. It's currently running 12.1-STABLE r363327 (kernel and world). But these issues with Portsnap were occurring even while I was running 12.1-RELEASE. I make sure to run "mergemaster -Ui", "make delete-old", and "make delete-old-libs" when I upgrade, so there shouldn't be any stale files causing issues.
Comment 9 Colin Percival freebsd_committer 2020-07-22 01:42:19 UTC
Hi Patrick,

Can you check whether this issue goes away if you unmount /dev/fd?  Most people don't have that, and devfs provides stdin/stdout and /dev/fd/[012] already; I'm wondering if fdescfs is breaking somehow.

Colin Percival
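
To see whether fdescfs is mounted before trying the suggestion above, the output of "mount" can be filtered. This is a minimal sketch: the function name is hypothetical, and it assumes the "source on mountpoint (type, ...)" line format shown in comment 8, with no spaces in the mount point.

```shell
#!/bin/sh
# Sketch: print the mount point of every fdescfs filesystem, reading
# "mount"-style lines ("<source> on <mountpoint> (<type>, ...)") from stdin.
fdescfs_mountpoints() {
    awk '$2 == "on" && $4 ~ /^\(fdescfs[,)]/ { print $3 }'
}

# Usage:
#   mount | fdescfs_mountpoints     # e.g. prints /dev/fd if it is mounted
#   umount /dev/fd                  # as root, to test without fdescfs
```

Unmounting this way lasts only until the next reboot if the filesystem is also listed in /etc/fstab.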
Comment 10 Patrick McMunn 2020-07-22 02:50:57 UTC
It doesn't appear to have made a difference. I ran "portsnap auto" for the first time today. It downloaded the latest patch, emitted errors about /dev/stdout, and failed with a "snapshot corrupt" message. I then unmounted /dev/fd and reran "portsnap auto" to see if anything changed. It didn't emit any errors about /dev/stdout on the subsequent attempt, but it still failed with "snapshot corrupt". I don't think the lack of messages about /dev/stdout on the subsequent attempt has anything to do with /dev/fd; I think they normally only appear the first time a patch is downloaded.

As an aside, I have /dev/fd in my fstab because the installation message for OpenJDK says to put it there. Are the ports which direct users to add this to fstab incorrect, and do they need to be updated?
Comment 11 Colin Percival freebsd_committer 2020-07-22 04:36:15 UTC
Right, once portsnap gets into the state where it doesn't have the files it thinks it has, you'll get errors -- at that point the only thing you can do is get rid of /var/db/portsnap and start over.

What I was wondering was whether unmounting fdescfs prevented the errors when you were first downloading the bits.

The openjdk pkg-message is probably correct, so I'm not suggesting that you run without fdescfs in the long term -- I'm just wondering if this can be reproduced without fdescfs since knowing that will help us to pin down what's going wrong.
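
The state described above, where portsnap "doesn't have the files it thinks it has", can be checked for without running an extract. The sketch below is hedged accordingly: it assumes the working directory contains an INDEX file with one "path|hash" entry per line and that each entry is stored as files/<hash>.gz, which matches the "files/<hash>.gz not found" message in the report but is otherwise an assumption. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: list INDEX entries whose compressed file is missing or empty.
# ASSUMPTION: INDEX holds "path|hash" lines and data lives in files/<hash>.gz.
# Returns 0 if nothing is missing, 1 if something is, 2 if INDEX is unreadable.
list_missing_snapshot_files() {
    workdir="${1:-/var/db/portsnap}"
    if [ ! -r "$workdir/INDEX" ]; then
        echo "no readable INDEX in $workdir" >&2
        return 2
    fi

    missing=0
    while IFS='|' read -r path hash; do
        # Skip blank or malformed lines that carry no hash field.
        [ -n "$hash" ] || continue
        if [ ! -s "$workdir/files/$hash.gz" ]; then
            echo "$path files/$hash.gz"
            missing=1
        fi
    done < "$workdir/INDEX"
    return "$missing"
}

# Usage:
#   list_missing_snapshot_files /var/db/portsnap
```

If this prints anything right after a "portsnap fetch" that emitted /dev/stdout errors, it would suggest the download step, not the upstream snapshot, left the database incomplete.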
Comment 12 Patrick McMunn 2020-07-22 20:42:49 UTC
Well, after last night, I deleted my Portsnap database and my ports tree, and I downloaded a fresh copy. Today, after seeing your reply, I removed fdescfs from fstab and rebooted so I know I'm working with a clean slate. I ran "portsnap auto" and updated the tree with no errors. Of course this error is intermittent (I went over two and a half months before it cropped up again), so I guess only time can tell for sure.
Comment 13 Jochen Neumeister freebsd_committer 2020-07-24 09:05:46 UTC
As we've been added, how can ports-secteam help here?
Comment 14 Christos Chatzaras 2020-07-26 13:52:04 UTC
Over the last few weeks I have noticed the same issue.

I have some jails, and I also have /dev mounted in them because I need /dev/random:

/dev/mirror/gm0p2 on / (ufs, local, noatime)
devfs on /dev (devfs, local, multilabel)
/dev/mirror/gm0p7 on /home (ufs, local, noatime, nosuid, with quotas, soft-updates, acls)
/dev/mirror/gm0p8 on /home2 (ufs, local, noatime, noexec, nosuid, soft-updates)
/dev/mirror/gm0p4 on /tmp (ufs, local, noatime, nosuid, soft-updates)
/dev/mirror/gm0p6 on /usr (ufs, local, noatime, soft-updates)
/dev/mirror/gm0p5 on /var (ufs, local, noatime, soft-updates)
tmpfs on /tmpfs (tmpfs, local, nosuid)
/home/www on /home/jail/php72/home/www (nullfs, local, noatime, nosuid)
/tmp on /home/jail/php72/tmp (nullfs, local, noatime, nosuid)
/tmpfs on /home/jail/php72/tmpfs (nullfs, local, noatime, nosuid)
devfs on /home/jail/php72/dev (devfs, local, multilabel)
/home/www on /home/jail/php71/home/www (nullfs, local, noatime, nosuid)
/tmp on /home/jail/php71/tmp (nullfs, local, noatime, nosuid)
/tmpfs on /home/jail/php71/tmpfs (nullfs, local, noatime, nosuid)
devfs on /home/jail/php71/dev (devfs, local, multilabel)
/home/www on /home/jail/php56/home/www (nullfs, local, noatime, nosuid)
/tmp on /home/jail/php56/tmp (nullfs, local, noatime, nosuid)
/tmpfs on /home/jail/php56/tmpfs (nullfs, local, noatime, nosuid)
devfs on /home/jail/php56/dev (devfs, local, multilabel)
/tmp on /home/jail/redis/tmp (nullfs, local, noatime, nosuid)
devfs on /home/jail/redis/dev (devfs, local, multilabel)
Comment 15 Patrick McMunn 2020-09-25 23:49:15 UTC
I just wanted to follow up. I didn't realize it has been so long since I last posted, but it has been almost 5 months since I followed Colin's advice in comment 9 by unmounting /dev/fd, and the problem has not cropped up again.