From time to time I see messages such as this one when running "make -C /usr/ports fetchindex":

/usr/ports/INDEX-11.bz2    31% of 2093 kB  396 kBps 00m02s
fetch: /usr/ports/INDEX-11.bz2 appears to be truncated: 673899/2143836 bytes
*** Error code 1

I thought the INDEX files were updated using a temporary file which is later renamed. Or maybe the scheduling is wrong, so that two or more instances interfere with each other.

Also, I was unsure of which category to pick. Individual Port? Nah. Package Infrastructure? Maybe. Ports Framework? Maybe.
This is on www.freebsd.org, outside of our purview.
I can confirm the bug. I see it sometimes for the documentation project during the build of the ports HTML pages:

doc/en_US.ISO8859-1/htdocs/ports$ make
/usr/bin/fetch -am -o INDEX.bz2 https://www.FreeBSD.org/ports/INDEX-11.bz2
INDEX.bz2    34% of 2156 kB 1009 kBps 00m01s
fetch: INDEX.bz2 appears to be truncated: 766037/2208356 bytes
*** Error code 1
We are using fetch -a:

    -a, --retry
        Automatically retry the transfer upon soft failures.

so it is unlikely to be a network issue between the client and the server.

I guess this is an issue with the front-end varnish daemon on www.freebsd.org. Under some conditions (low memory, high memory pressure) the server returns truncated results. Sad.
Running "make -C /usr/ports fetchindex" repeatly shows fetch(1) coming closer and closer to the actual file length. That led me to conclude that the INDEX file is updated in-place. Maybe that's simply false. Python flavors and DEFAULT_VERSIONS=blah-blah-blah and EMACS_PORT_NAME=emacs-nox11, both in /etc/make.conf, has forced me to run "make -C /usr/ports index" instead.
You can see the actual file size, last modification time, and age of the object in the cache with:

$ lynx -head -source https://www.FreeBSD.org/ports/INDEX-12.bz2
HTTP/1.1 200 OK
Date: Fri, 05 Jan 2018 15:32:27 GMT
Content-Type: application/octet-stream
Content-Length: 2209757
Connection: close
Server: nginx/1.12.2
Last-Modified: Fri, 05 Jan 2018 15:14:13 GMT
ETag: "5a4f9645-21b7dd"
X-Varnish: 216949192 233144374
Age: 74
Via: 1.1 wfe0.ysv.FreeBSD.org
X-Cache: HIT

I'm pretty sure that an update of the INDEX files is atomic. You will get an old or a new file, but never something in between, or a growing file while downloading.
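Speaking of Content-Length: a quick way to check for truncation from the client side is to compare the size the server advertises with the bytes actually received. fetch(1) already does this internally (that's where the "appears to be truncated" message comes from), but a standalone check can be handy in a cron job. A minimal sketch using only base-system tools (fetch -s prints the advertised size without downloading; stat -f %z prints the size of the local file):

#!/bin/sh
# Compare the advertised size of the remote INDEX with what we received.
url=https://www.FreeBSD.org/ports/INDEX-12.bz2
expected=$(fetch -s "$url")          # advertised Content-Length, in bytes
fetch -q -o INDEX.bz2 "$url" || exit 1
actual=$(stat -f %z INDEX.bz2)       # size of the file on disk (BSD stat)
if [ "$actual" -ne "$expected" ]; then
    echo "truncated: got $actual of $expected bytes" >&2
    exit 1
fi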
(In reply to Wolfram Schneider from comment #5) Should we close this PR, or should it remain open?
(In reply to Trond.Endrestol from comment #6)
The PR should remain open until we are sure it is fixed.
(In reply to Wolfram Schneider from comment #3)
> We are using fetch -a:
>
>     -a, --retry
>         Automatically retry the transfer upon soft failures.
>
> so it is unlikely to be a network issue between the client and the server.
>
> I guess this is an issue with the front-end varnish daemon on
> www.freebsd.org. Under some conditions (low memory, high memory pressure)
> the server returns truncated results. Sad.

I've run into trouble with this on the FreeBSD site in other areas (man page archives, for example). I think unless they either fix varnish or FreeBSD uses something else, this will *always* be a problem. Until then, this pr(1) is likely to grow quite large.

--Chris
(In reply to Chris Hutchinson from comment #8)
FWIW, this bug has existed for at least 3 years, and it is not related to a specific varnish or FreeBSD version. I don't believe we are the only ones who have this problem.
As far as I understand varnish: the front end has a fast connection to the backend, and the backend delivers the data fast (>1 GBit/s). Varnish needs to allocate memory to cache the data from the backend. If the malloc() call fails, varnish may return truncated results.

If the exact file size is known in advance, the client will report an error message due to the mismatch between the content-length and the received bytes. For dynamic content, you will get rotten results - that's horrible.

I do not understand why varnish fails to cache a 2 MB "big" file. The machine has several GB of RAM reserved for varnish. I would expect that in case of a malloc() failure varnish falls back to streaming and trashes the dirty cache. Or it should break the TCP connection and do a reset, so the client gets an error at the lower level and can handle it (retry).
I quite agree that varnish is probably not well suited for dynamic content. In the context of this pr(1), it appears that varnish lacks effective checksumming. IOW, it isn't comparing the file(s) after creation, so it blindly serves its cached version without first having determined that the version it created was, in fact, correct -- EBADFD? This is what it looks like from the outside in, anyway.

--Chris
Will this get fixed? It happens all the time now. Maybe add something to the handbook or the ports man page stating that it doesn't work?
(In reply to tech-lists from comment #12) Well, we should fix the mess, not document it.
(In reply to tech-lists from comment #12)
If this isn't going to be repaired, people who track the ports tree with svn should be told in the documentation that they can build their own indexes, for example on a FreeBSD-11 machine:

cd /usr/ports
/usr/local/bin/svn update
/usr/bin/make index                     (i.e., to build INDEX-11)
/usr/bin/make index INDEXFILE=INDEX-10
/usr/bin/make index INDEXFILE=INDEX-12

This is substantially more time consuming than just fetching the index, but it will get you a reliable index file every time. It's unfortunate that this is necessary, because the building of the index has already been done elsewhere, but since a good index file isn't reliably available I've been building indexes like this for a long time.
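For convenience, the same procedure can be wrapped in a small sh loop; this is just a sketch of the commands above, and the list of index files is whatever your machines need:

#!/bin/sh
# Rebuild local INDEX files from an svn-checked-out ports tree.
cd /usr/ports || exit 1
/usr/local/bin/svn update
/usr/bin/make index                  # index for the running release, e.g. INDEX-11
for idx in INDEX-10 INDEX-12; do     # additional versions, adjust to taste
    /usr/bin/make index INDEXFILE=$idx
done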
I've mitigated the issue with a simple wrapper:

until make -C /usr/ports fetchindex; do sleep 0.1; done

It's not perfect, but it works.
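A variant of the same wrapper with a bounded number of attempts, so a real outage doesn't turn it into an endless loop (the limit of 20 attempts and the 1-second pause are arbitrary choices on my part, not anything the ports framework prescribes):

#!/bin/sh
n=0
until make -C /usr/ports fetchindex; do
    n=$((n + 1))
    if [ "$n" -ge 20 ]; then
        echo "fetchindex: giving up after $n attempts" >&2
        exit 1
    fi
    sleep 1
done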
To be honest I don't understand how this can still be a thing. It affects everybody. Where is this implemented and can we take a look at it?
This sounds like it might be something related to the wfe* front-end machines. I don't know how those work but I can take a look.
Maybe the INDEX file could be added to a new git repository in order to not clutter the commit history of ports. Then the 'fetchindex' target could be adjusted.
(In reply to Philip Paeps from comment #17) Did you figure out what the problem was?
I did not. Thanks for the nudge! This slipped off my radar.
I can still reproduce the error:

$ for i in $(seq 0 999); do printf "."; /usr/bin/fetch -q -a -o INDEX.bz2 https://www.FreeBSD.org/ports/INDEX-12.bz2; done
...................................................................................................................................................................................................................................................................................................................................................fetch: INDEX.bz2 appears to be truncated: 2157442/2383323 bytes
.......................................................................................................fetch: INDEX.bz2 appears to be truncated: 839819/2383323 bytes
.fetch: INDEX.bz2 appears to be truncated: 1758337/2383323 bytes
..................................................................................................................................................................................................................................................................................................................................................................................................fetch: INDEX.bz2 appears to be truncated: 1571602/2383323 bytes
......................................fetch: INDEX.bz2 appears to be truncated: 2118718/2383323 bytes
...................................................................
[...]
In case anyone is interested, I am mirroring the INDEX files for major versions 11, 12, 13, and 14, and I update them every 10 minutes. Feel free to use the mirror by adding the following line to /etc/make.conf:

MASTER_SITE_INDEX=http://snow.critical.ch/fbsd-ports-index/
So this keeps coming up. I'll try to find out where this file is generated. I'm assigning it to myself so it goes a little higher on my list.
Not sure if it helps... but apparently the size I get on a quick succession of retries increases each time, and fetch restarts from scratch (where, by contrast, wget -c would pick up where it left off). The final fetch was successful.

Very wild guessing and stupid question: Is this a live view of a cache or reverse proxy or load balancer slowly downloading a file and serving the part of it that it already has, without atomic replacement of a prior copy?

/usr/ports/INDEX-13.bz2    30% of 2353 kB 1215 kBps 02s
fetch: /usr/ports/INDEX-13.bz2 appears to be truncated: 746972/2409975 bytes
*** Error code 1

Stop.
make: stopped in /usr/ports
/usr/bin/env fetch -am -o /usr/ports/INDEX-13.bz2 https://www.FreeBSD.org/ports/INDEX-13.bz2
/usr/ports/INDEX-13.bz2    49% of 2353 kB 1583 kBps 01s
fetch: /usr/ports/INDEX-13.bz2 appears to be truncated: 1181373/2409975 bytes
*** Error code 1

Stop.
make: stopped in /usr/ports
/usr/bin/env fetch -am -o /usr/ports/INDEX-13.bz2 https://www.FreeBSD.org/ports/INDEX-13.bz2
/usr/ports/INDEX-13.bz2    62% of 2353 kB 1838 kBps 00s
fetch: /usr/ports/INDEX-13.bz2 appears to be truncated: 1503200/2409975 bytes
*** Error code 1

Stop.
make: stopped in /usr/ports
/usr/bin/env fetch -am -o /usr/ports/INDEX-13.bz2 https://www.FreeBSD.org/ports/INDEX-13.bz2
/usr/ports/INDEX-13.bz2    76% of 2353 kB 2130 kBps 01s
fetch: /usr/ports/INDEX-13.bz2 appears to be truncated: 1851401/2409975 bytes
*** Error code 1

Stop.
make: stopped in /usr/ports
/usr/bin/env fetch -am -o /usr/ports/INDEX-13.bz2 https://www.FreeBSD.org/ports/INDEX-13.bz2
/usr/ports/INDEX-13.bz2    90% of 2353 kB 2418 kBps 01s
fetch: /usr/ports/INDEX-13.bz2 appears to be truncated: 2179716/2409975 bytes
*** Error code 1

Stop.
make: stopped in /usr/ports
/usr/bin/env fetch -am -o /usr/ports/INDEX-13.bz2 https://www.FreeBSD.org/ports/INDEX-13.bz2
/usr/ports/INDEX-13.bz2    2353 kB 2482 kBps 01s
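For what it's worth, fetch(1) does have a -r flag to restart a previously interrupted transfer, roughly analogous to wget -c. A sketch of a resuming retry (-R keeps the partial file on error so there is something to resume from; whether the front end honours range requests for this object is an assumption on my part):

#!/bin/sh
url=https://www.FreeBSD.org/ports/INDEX-13.bz2
out=/usr/ports/INDEX-13.bz2
# First attempt; on failure, retry with -r to resume from the partial file.
fetch -R -o "$out" "$url" || fetch -r -o "$out" "$url"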
It does look like an atomicity problem, but I'm not sure where it occurs. I suspect this rsync on bit0:

nice lockf -s -t 0 /home/bitmir/.index.lock \
    /usr/local/bin/rsync -vaH --delete --time-limit=9 \
    rsync://portsindexbuild.nyi.FreeBSD.org/ports-index/ \
    /home/bit/ports-index/ >> /var/log/sync-ports-index.log 2>&1

Investigating ...
(In reply to Philip Paeps from comment #25) Suggestion: Use rsync's --bwlimit=RATE flag to make the rsync take longer so you have more time to see what's happening?
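Combining that with rsync's --delay-updates might also be worth a try: it stages every updated file and moves them all into place only at the end of the transfer, which narrows the window in which a half-written INDEX is visible. A sketch of the invocation above with both options added (the 500 KB/s limit is an arbitrary value for observation, not a recommendation):

nice lockf -s -t 0 /home/bitmir/.index.lock \
    /usr/local/bin/rsync -vaH --delete --time-limit=9 \
    --delay-updates --bwlimit=500 \
    rsync://portsindexbuild.nyi.FreeBSD.org/ports-index/ \
    /home/bit/ports-index/ >> /var/log/sync-ports-index.log 2>&1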
clusteradm made several improvements to the web frontend infrastructure yesterday. We believe that these finally fixed this problem. I'll leave this bug open for a couple more days to let others confirm (or deny) that this is the case.
(In reply to Philip Paeps from comment #27)
I can confirm it's fixed; the last three days have been without any issue. Thank you!
I'll mark this as fixed now. Sorry this took so long to resolve.
The last time I saw this happen was on October 23rd; thanks for the fix!