Bug 212065

Summary: fetch -r fails on a complete file, even with -S 12345 (OS version independent)
Product: Base System Reporter: Matthias Andree <mandree>
Component: binAssignee: Dag-Erling Smørgrav <des>
Status: Closed FIXED    
Severity: Affects Some People CC: cem, felixstella
Priority: --- Keywords: patch
Version: 10.3-RELEASEFlags: des: mfc-stable11+
des: mfc-stable10+
Hardware: Any   
OS: Any   
Attachments:
Description Flags
Extremely naive patch to treat all 416 errors as successes
none
Assume that a 416 with non-zero offset means EOF none

Description Matthias Andree freebsd_committer freebsd_triage 2016-08-22 21:40:12 UTC
When fetch(1) is used to resume (-r) on a completely downloaded file, it erroneously fails with exit status 1. It should instead detect that condition "file is complete and same size as on server" and exit with status 0 (= success).

Adding to the -r option also a -S 12345 option, where -S is the same size as the file locally has, and as the server thinks its file is, does NOT help.  Using "-m" is not an option if the file date isn't reported by the server.

This bug has been reported on #bsdports (IRC) against 9.3 with portsnap, and can be verified without portsnap on 10.3-RELEASE-p7. Commented verbose fetch output on a hacked "portsnap fetch" downloading the initial snapshot as of today.  First, see we have the file:

$ ls -l *.tgz
-rw-r--r--  1 root  wheel  74481909 Aug 22 23:03 80c30ef3c99969de68fad7d92a47c4e69cd6f1c8135504152d794c21305b1245.tgz

Next, ask server for the size:

$ fetch -s http://ec2-eu-west-1.portsnap.freebsd.org/s/80c30ef3c99969de68fad7d92a47c4e69cd6f1c8135504152d794c21305b1245.tgz ; echo $?
74481909
0

So, sizes match. File is complete.
Now, try to resume the transfer, verbose mode:

$ fetch -rvv http://ec2-eu-west-1.portsnap.freebsd.org/s/80c30ef3c99969de68fad7d92a47c4e69cd6f1c8135504152d794c21305b1245.tgz ; echo $?
[...]
---> ec2-eu-west-1.portsnap.freebsd.org:80
looking up ec2-eu-west-1.portsnap.freebsd.org
connecting to ec2-eu-west-1.portsnap.freebsd.org:80
requesting http://ec2-eu-west-1.portsnap.freebsd.org/s/80c30ef3c99969de68fad7d92a47c4e69cd6f1c8135504152d794c21305b1245.tgz
>>> GET /s/80c30ef3c99969de68fad7d92a47c4e69cd6f1c8135504152d794c21305b1245.tgz HTTP/1.1
>>> Host: ec2-eu-west-1.portsnap.freebsd.org
>>> Accept: */*
>>> User-Agent: fetch libfetch/2.0
>>> Range: bytes=74481909-
>>> Connection: close
>>> 
<<< HTTP/1.1 416 Requested Range Not Satisfiable
<<< Content-Type: text/html
<<< Accept-Ranges: bytes
<<< Content-Length: 389
<<< Connection: close
content length: [389]
<<< Date: Mon, 22 Aug 2016 21:07:00 GMT
<<< Server: lighttpd/1.4.33
<<< 
fetch: http://ec2-eu-west-1.portsnap.freebsd.org/s/80c30ef3c99969de68fad7d92a47c4e69cd6f1c8135504152d794c21305b1245.tgz: Requested Range Not Satisfiable
1


OOPS! See that -S doesn't help, same size as above:


$ fetch -S 74481909 -r http://ec2-eu-west-1.portsnap.freebsd.org/s/80c30ef3c99969de68fad7d92a47c4e69cd6f1c8135504152d794c21305b1245.tgz ; echo $?
fetch: http://ec2-eu-west-1.portsnap.freebsd.org/s/80c30ef3c99969de68fad7d92a47c4e69cd6f1c8135504152d794c21305b1245.tgz: Requested Range Not Satisfiable
1

NOTE: for proper robustness, fetch might want to issue a HEAD request to obtain the file size unconditionally when resuming, so as to be able to tell the condition "file shrunk on server" apart from "file has been completely downloaded".


Other tools:
* curl 7.50.1 from ports fails in a similar way, and to add insult to injury, wrongly concludes that the server did not support byte ranges when in fact it does (but cannot start a download beyond the end of a file).
* wget 1.18 also sees the 416 code from the server, but concludes "The file is already fully retrieved; nothing to do." and exits with status 0.
* lftp 4.7.3 with "get -c" does the right thing and exits silently if the file is complete.
Comment 1 Conrad Meyer freebsd_committer freebsd_triage 2017-01-19 16:39:40 UTC
libfetch does appear to recognize that 416 errors might just be empty ranges:

1806                 case HTTP_BAD_RANGE:
1807                         /*
1808                          * This can happen if we ask for 0 bytes because
1809                          * we already have the whole file.  Consider this
1810                          * a success for now, and check sizes later.
1811                          */
1812                         break;

And if the HTTP server returns a Content-Range header along with the 416, it treats that as no error:

1926                 /* requested range not satisfiable */
1927                 if (conn->err == HTTP_BAD_RANGE) {
1928                         if (url->offset == size && url->length == 0) {
1929                                 /* asked for 0 bytes; fake it */
1930                                 offset = url->offset;
1931                                 clength = -1;
1932                                 conn->err = HTTP_OK;
1933                                 break;

Otherwise, it treats it as an error:

1934                         } else {
1935                                 http_seterr(conn->err);
1936                                 goto ouch;
1937                         }

https://tools.ietf.org/html/rfc2616#section-14.16
>   The Content-Range entity-header is sent with a partial entity-body to
>   specify where in the full entity-body the partial body should be
>   applied.

>   A server sending a response with status code 416 (Requested range not
>   satisfiable) SHOULD include a Content-Range field with a byte-range-
>   resp-spec of "*". The instance-length specifies the current length of
>   the selected resource. 

The byte-range-resp-spec is the left side of the A-B/C style response of Content-Range.  So httpds should respond to zero byte range requests with 416 + Content-Range: bytes */1234, if the total file size is 1234.

It appears that ec2's web server isn't following this SHOULD recommendation.  That's allowed, it's not a MUST.  So maybe libfetch needs to be a little more tolerant of such responses.
Comment 2 Conrad Meyer freebsd_committer freebsd_triage 2017-01-19 16:43:36 UTC
Created attachment 179070 [details]
Extremely naive patch to treat all 416 errors as successes

This patch probably shouldn't be applied as is.  It just removes the Content-Range validation on 416 responses to range requests and treats them all as completed resumes.

I think your suggestion of following these missing "Content-Range" requests up with a HEAD request is more wise, but a less straightforward patch.
Comment 3 Dag-Erling Smørgrav freebsd_committer freebsd_triage 2017-01-20 15:55:50 UTC
Created attachment 179140 [details]
Assume that a 416 with non-zero offset means EOF

The solution is actually quite simple.

Since 416 is an error code, any Content-Range header in the response would refer to the error message, not the requested document, so relying on the value of size when you know you got a 416 is wrong.  And that scenario shouldn't happen anyway, so size should always be -1, and the if will always hit the else clause.  Simply checking for url->offset > 0 on line 1928 instead of comparing it to size should fix the problem.  The question is whether it introduces others; I believe the answer is no, but I'm not absolutely certain.
Comment 4 commit-hook freebsd_committer freebsd_triage 2017-03-05 12:07:21 UTC
A commit references this bug:

Author: des
Date: Sun Mar  5 12:06:46 UTC 2017
New revision: 314701
URL: https://svnweb.freebsd.org/changeset/base/314701

Log:
  Fix partial requests (used by fetch -r) when the requested file is
  already complete.

  Since 416 is an error code, any Content-Range header in the response
  would refer to the error message, not the requested document, so
  relying on the value of size when we know we got a 416 is wrong.
  Instead, just verify that offset == 0 and assume that we've reached
  the end of the document (if offset > 0, we did not request a range,
  and the server is screwing with us).  Note that we cannot distinguish
  between reaching the end and going past it, but that is a flaw in the
  protocol, not in the code, so we just have to assume that the caller
  knows what it's doing.  A smart caller would request an offset
  slightly before what it believes is the end and compare the result to
  what is already in the file.

  PR:		212065
  Reported by:	mandree
  MFC after:	3 weeks

Changes:
  head/lib/libfetch/http.c
Comment 5 commit-hook freebsd_committer freebsd_triage 2017-03-24 14:20:23 UTC
A commit references this bug:

Author: des
Date: Fri Mar 24 14:19:53 UTC 2017
New revision: 315902
URL: https://svnweb.freebsd.org/changeset/base/315902

Log:
  MFH (r313974,r314596): open .netrc early in case we want to drop privs
  MFH (r314396,r315143): fix a crash caused by an incorrect format string
  MFH (r314701): fix handling of 416 errors when requesting a range
  MFH (r315455): fix parsing of IP literals (square brackets)

  PR:		212065, 217723

Changes:
_U  stable/11/
  stable/11/lib/libfetch/common.c
  stable/11/lib/libfetch/common.h
  stable/11/lib/libfetch/fetch.c
  stable/11/lib/libfetch/fetch.h
  stable/11/lib/libfetch/http.c
Comment 6 commit-hook freebsd_committer freebsd_triage 2017-03-24 14:26:33 UTC
A commit references this bug:

Author: des
Date: Fri Mar 24 14:26:01 UTC 2017
New revision: 315904
URL: https://svnweb.freebsd.org/changeset/base/315904

Log:
  MFH (r313974,r314596): open .netrc early in case we want to drop privs
  MFH (r314396,r315143): fix a crash caused by an incorrect format string
  MFH (r314701): fix handling of 416 errors when requesting a range
  MFH (r315455): fix parsing of IP literals (square brackets)

  PR:		212065, 217723

Changes:
_U  stable/10/
  stable/10/lib/libfetch/common.c
  stable/10/lib/libfetch/common.h
  stable/10/lib/libfetch/fetch.c
  stable/10/lib/libfetch/fetch.h
  stable/10/lib/libfetch/http.c
Comment 7 FStl 2017-06-20 07:00:32 UTC
The fix that has been applied for this bug renders the -rR options meaningless in those cases where the server does not add a Content-Range header in its 416 response. In such cases, fetch re-downloads the whole file(s) which defeats the purpose of the -rR options.

Please apply a more comprehensive fix for this issue.