Bug 117277 - [patch] fetch(1): fetch's resume mode doesn't verify that it actually got partial content
Summary: [patch] fetch(1): fetch's resume mode doesn't verify that it actually got par...
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: bin (show other bugs)
Version: unspecified
Hardware: Any Any
: Normal Affects Only Me
Assignee: Dag-Erling Smørgrav
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-10-17 18:50 UTC by Fabian Keil
Modified: 2012-11-05 13:03 UTC (History)
0 users

See Also:


Attachments
file.diff (588 bytes, patch)
2007-10-17 18:50 UTC, Fabian Keil
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Fabian Keil 2007-10-17 18:50:01 UTC
When resuming a file over HTTP, fetch(1) treats complete retransmits
like partial content and appends the whole response to the
already existing part that was fetched earlier.

The problem shows if the server doesn't support Range requests
or if the proxy strips out the Range header to prevent range
mismatches (which could happen if the proxy modified the first
response):

fk@TP51 /tank/fetch-tests $fetch -r http://10.0.0.1/BrooksDavis.EuroBSDCon.2007.avi
BrooksDavis.EuroBSDCon.2007.avi                30% of   92 MB 8440 kBps^C
fetch: transfer interrupted

fk@TP51 /tank/fetch-tests $fetch -r http://10.0.0.1/BrooksDavis.EuroBSDCon.2007.avi
BrooksDavis.EuroBSDCon.2007.avi               100% of   92 MB   17 MBps
fk@TP51 /tank/fetch-tests $ls -lh
total 123649
-rw-r--r--  1 fk  wheel   121M Sep 25 21:01 BrooksDavis.EuroBSDCon.2007.avi

Fix: While fetch(3) doesn't pass the HTTP status code to fetch(1),
the problem can be solved by checking the content offset.

If the offset is zero, the response contains the whole file
and the already existing part of it has to be overwritten.

With the attached patch it works as expected:

fk@TP51 /tank/fetch-tests $rm BrooksDavis.EuroBSDCon.2007.avi 
fk@TP51 /tank/fetch-tests $fetch -r http://10.0.0.1/BrooksDavis.EuroBSDCon.2007.avi
BrooksDavis.EuroBSDCon.2007.avi                58% of   92 MB   15 MBps^C
fetch: transfer interrupted

fk@TP51 /tank/fetch-tests $fetch -r http://10.0.0.1/BrooksDavis.EuroBSDCon.2007.avi
BrooksDavis.EuroBSDCon.2007.avi               100% of   92 MB   13 MBps
fk@TP51 /tank/fetch-tests $ls -lh
total 95081
-rw-r--r--  1 fk  wheel    93M Sep 25 21:01 BrooksDavis.EuroBSDCon.2007.avi


Patch attached with submission follows:
How-To-Repeat: Resume a file from a HTTP server that doesn't support
Range requests or with a proxy that removes the Range
header.
Comment 1 Xin LI freebsd_committer 2007-10-17 20:17:03 UTC
Responsible Changed
From-To: freebsd-bugs->des

I think Dag-Erling will be more authoritive on libfetch(3) related stuff.
Comment 2 des 2007-10-18 08:48:59 UTC
Fabian Keil <fk@fabiankeil.de> writes:
> --- .zfs/snapshot/2007-10-15/usr.bin/fetch/fetch.c	2006-11-10 23:05:41.00=
0000000 +0100
> +++ usr.bin/fetch/fetch.c	2007-10-16 14:21:20.221581714 +0200
> @@ -488,8 +488,11 @@
>  	if (o_stdout) {
>  		/* output to stdout */
>  		of =3D stdout;
> -	} else if (r_flag && sb.st_size !=3D -1) {
> -		/* resume mode, local file exists */
> +	} else if (r_flag && sb.st_size !=3D -1 && url->offset) {
> +		/*
> +		 * resume mode, local file exists and we
> +		 * actually received partial content as requested
> +		 */
>  		if (!F_flag && us.mtime && sb.st_mtime !=3D us.mtime) {
>  			/* no match! have to refetch */
>  			fclose(f);

For extra credit, the code should check that url->offset is actually in
the (0, sb.st_size) range, and lseek to url->offset before resuming the
transfer.

DES
--=20
Dag-Erling Sm=C3=B8rgrav - des@des.no
Comment 3 Fabian Keil 2007-10-18 20:14:46 UTC
Dag-Erling Sm=F8rgrav <des@des.no> wrote:

> Fabian Keil <fk@fabiankeil.de> writes:
> > --- .zfs/snapshot/2007-10-15/usr.bin/fetch/fetch.c	2006-11-10
> > 23:05:41.000000000 +0100 +++ usr.bin/fetch/fetch.c	2007-10-16
> > 14:21:20.221581714 +0200 @@ -488,8 +488,11 @@
> >  	if (o_stdout) {
> >  		/* output to stdout */
> >  		of =3D stdout;
> > -	} else if (r_flag && sb.st_size !=3D -1) {
> > -		/* resume mode, local file exists */
> > +	} else if (r_flag && sb.st_size !=3D -1 && url->offset) {
> > +		/*
> > +		 * resume mode, local file exists and we
> > +		 * actually received partial content as requested
> > +		 */
> >  		if (!F_flag && us.mtime && sb.st_mtime !=3D us.mtime) {
> >  			/* no match! have to refetch */
> >  			fclose(f);
>=20
> For extra credit, the code should check that url->offset is actually in
> the (0, sb.st_size) range, and lseek to url->offset before resuming the
> transfer.

Due to ENOTIME I have to let this unique
opportunity for extra credit pass by.

Fabian
Comment 4 Dag-Erling Smørgrav freebsd_committer 2011-05-13 08:58:58 UTC
State Changed
From-To: open->analyzed

Currently testing a patch.
Comment 5 dfilter service freebsd_committer 2011-09-15 23:50:46 UTC
Author: des
Date: Thu Sep 15 22:50:31 2011
New Revision: 225599
URL: http://svn.freebsd.org/changeset/base/225599

Log:
  When resuming an HTTP download, we failed to verify that the range
  returned by the server matched what we requested, and blindly appended
  what we received to what we already had.  This could go two ways: if the
  delivered offset was higher than expected, the local file would contain
  duplicate data, while if it was lower than expected, there would be data
  missing from the middle of the file.  Furthermore, if the transfer was
  interrupted again, each subsequent attempt would compound the error.
  Fix the first problem by restarting the transfer from scratch if there
  is a gap, and the second by explicitly seeking to the correct location
  in the local file so as to overwrite any duplicated data.
  
  PR:		bin/117277
  Approved by:	re (kib)
  MFC after:	3 weeks

Modified:
  head/usr.bin/fetch/fetch.c

Modified: head/usr.bin/fetch/fetch.c
==============================================================================
--- head/usr.bin/fetch/fetch.c	Thu Sep 15 22:14:35 2011	(r225598)
+++ head/usr.bin/fetch/fetch.c	Thu Sep 15 22:50:31 2011	(r225599)
@@ -522,6 +522,12 @@ fetch(char *URL, const char *path)
 				    "does not match remote", path);
 				goto failure_keep;
 			}
+		} else if (url->offset > sb.st_size) {
+			/* gap between what we asked for and what we got */
+			warnx("%s: gap in resume mode", URL);
+			fclose(of);
+			of = NULL;
+			/* picked up again later */
 		} else if (us.size != -1) {
 			if (us.size == sb.st_size)
 				/* nothing to do */
@@ -551,6 +557,14 @@ fetch(char *URL, const char *path)
 				fclose(of);
 				of = NULL;
 				sb = nsb;
+				/* picked up again later */
+			}
+			/* seek to where we left off */
+			if (of != NULL && fseek(of, url->offset, SEEK_SET) != 0) {
+				warn("%s: fseek()", path);
+				fclose(of);
+				of = NULL;
+				/* picked up again later */
 			}
 		}
 	} else if (m_flag && sb.st_size != -1) {
_______________________________________________
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
Comment 6 Dag-Erling Smørgrav freebsd_committer 2011-09-16 00:24:40 UTC
State Changed
From-To: analyzed->patched

Fixed in head
Comment 7 Fabian Keil 2011-09-19 10:52:05 UTC
The fix in HEAD doesn't seem to address the issue I reported.

Complete responses are still appended to the partial content
fetched previously, resulting in a broken file with duplicated
data.

The submitted patch still applies and works for me, though.

Fabian
Comment 8 des 2011-09-22 13:13:33 UTC
Fabian Keil <fk@fabiankeil.de> writes:
> Complete responses are still appended to the partial content
> fetched previously, resulting in a broken file with duplicated
> data.

I think I see the error.  Can you take the } on line 568 and move it up
to right before line 562 ("seek to where we left off") and test again?

Actually, that } should probably move up to right before line 547
("check that it didn't move under our feet"), but one thing at a time...

DES
--=20
Dag-Erling Sm=C3=B8rgrav - des@des.no
Comment 9 Fabian Keil 2011-09-24 09:40:04 UTC
Dag-Erling Smørgrav <des@des.no> wrote:

> Fabian Keil <fk@fabiankeil.de> writes:
> > Complete responses are still appended to the partial content
> > fetched previously, resulting in a broken file with duplicated
> > data.
> 
> I think I see the error.  Can you take the } on line 568 and move it up
> to right before line 562 ("seek to where we left off") and test again?


This:

--- a/usr.bin/fetch/fetch.c
+++ b/usr.bin/fetch/fetch.c
@@ -559,13 +559,13 @@ fetch(char *URL, const char *path)
                                sb = nsb;
                                /* picked up again later */
                        }
-                       /* seek to where we left off */
-                       if (of != NULL && fseek(of, url->offset, SEEK_SET) != 0) {
-                               warn("%s: fseek()", path);
-                               fclose(of);
-                               of = NULL;
-                               /* picked up again later */
-                       }
+               }
+               /* seek to where we left off */
+               if (of != NULL && fseek(of, url->offset, SEEK_SET) != 0) {
+                       warn("%s: fseek()", path);
+                       fclose(of);
+                       of = NULL;
+                       /* picked up again later */
                }

doesn't seem to work:

fk@r500 /tank/scratch $/usr/obj/usr/src/usr.bin/fetch/fetch -aRr http://10.0.0.1:81/1G-file
1G-file                                        35% of 1024 MB   40 MBps^C
fetch: transfer interrupted

fk@r500 /tank/scratch $/usr/obj/usr/src/usr.bin/fetch/fetch -aRr http://10.0.0.1:81/1G-file
1G-file                                       100% of 1024 MB   39 MBps 00m00s
fk@r500 /tank/scratch $ls -lh 1G-file
-rw-r-----  1 fk  fk   1.4G Sep 24 09:48 1G-file

I think this is because the file is opened in append mode.
Quoting fopen(3):
|     ``a''   Open for writing.  The file is created if it does not exist.  The
|             stream is positioned at the end of the file.  Subsequent writes
|             to the file will always end up at the then current end of file,
|             irrespective of any intervening fseek(3) or similar.

I guess this means the fseek() had no effect in the previous
location either, although I didn't test it.

Fabian
Comment 10 Fabian Keil 2011-09-24 12:17:09 UTC
Fabian Keil <fk@fabiankeil.de> wrote:

> Dag-Erling Smørgrav <des@des.no> wrote:
> 
> > Fabian Keil <fk@fabiankeil.de> writes:
> > > Complete responses are still appended to the partial content
> > > fetched previously, resulting in a broken file with duplicated
> > > data.
> > 
> > I think I see the error.  Can you take the } on line 568 and move it up
> > to right before line 562 ("seek to where we left off") and test again?
> 
> This:

[...]
> doesn't seem to work:


This does (with and without relocating the fseek() block first):

diff --git a/usr.bin/fetch/fetch.c b/usr.bin/fetch/fetch.c
index a165501..6c5ece5 100644
--- a/usr.bin/fetch/fetch.c
+++ b/usr.bin/fetch/fetch.c
@@ -540,7 +540,7 @@ fetch(char *URL, const char *path)
 				goto failure;
 			}
 			/* we got it, open local file */
-			if ((of = fopen(path, "a")) == NULL) {
+			if ((of = fopen(path, "r+")) == NULL) {
 				warn("%s: fopen()", path);
 				goto failure;
 			}

Fabian
Comment 11 des 2011-09-24 12:45:32 UTC
Fabian Keil <fk@fabiankeil.de> writes:
> I think this is because the file is opened in append mode.
> Quoting fopen(3):
> |     ``a''   Open for writing.  The file is created if it does not exist=
.  The
> |             stream is positioned at the end of the file.  Subsequent wr=
ites
> |             to the file will always end up at the then current end of f=
ile,
> |             irrespective of any intervening fseek(3) or similar.
>
> I guess this means the fseek() had no effect in the previous
> location either, although I didn't test it.

Doh!  I had forgotten that "a" broke fseek.  Thank you.

DES
--=20
Dag-Erling Sm=C3=B8rgrav - des@des.no
Comment 12 dfilter service freebsd_committer 2011-09-27 16:57:22 UTC
Author: des
Date: Tue Sep 27 15:57:13 2011
New Revision: 225800
URL: http://svn.freebsd.org/changeset/base/225800

Log:
  Followup to r225599: the fseek() was a no-op since the file was opened
  in append mode.  Open it in read-write mode instead.  Also move the
  fseek up one level to cover the (unlikely but not impossible) case where
  the server accepts ranges but does not send a Content-Size header.
  
  PR:		bin/117277
  MFC after:	3 weeks

Modified:
  head/usr.bin/fetch/fetch.c

Modified: head/usr.bin/fetch/fetch.c
==============================================================================
--- head/usr.bin/fetch/fetch.c	Tue Sep 27 15:08:59 2011	(r225799)
+++ head/usr.bin/fetch/fetch.c	Tue Sep 27 15:57:13 2011	(r225800)
@@ -540,7 +540,7 @@ fetch(char *URL, const char *path)
 				goto failure;
 			}
 			/* we got it, open local file */
-			if ((of = fopen(path, "a")) == NULL) {
+			if ((of = fopen(path, "r+")) == NULL) {
 				warn("%s: fopen()", path);
 				goto failure;
 			}
@@ -559,13 +559,13 @@ fetch(char *URL, const char *path)
 				sb = nsb;
 				/* picked up again later */
 			}
-			/* seek to where we left off */
-			if (of != NULL && fseek(of, url->offset, SEEK_SET) != 0) {
-				warn("%s: fseek()", path);
-				fclose(of);
-				of = NULL;
-				/* picked up again later */
-			}
+		}
+		/* seek to where we left off */
+		if (of != NULL && fseek(of, url->offset, SEEK_SET) != 0) {
+			warn("%s: fseek()", path);
+			fclose(of);
+			of = NULL;
+			/* picked up again later */
 		}
 	} else if (m_flag && sb.st_size != -1) {
 		/* mirror mode, local file exists */
_______________________________________________
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
Comment 13 dfilter service freebsd_committer 2011-10-19 13:14:24 UTC
Author: des
Date: Wed Oct 19 12:14:14 2011
New Revision: 226540
URL: http://svn.freebsd.org/changeset/base/226540

Log:
  MFH r225599,225800,225805: improve handling of resumed http transfers
  
  PR:		bin/117277

Modified:
  stable/8/usr.bin/fetch/fetch.c
Directory Properties:
  stable/8/usr.bin/fetch/   (props changed)

Modified: stable/8/usr.bin/fetch/fetch.c
==============================================================================
--- stable/8/usr.bin/fetch/fetch.c	Wed Oct 19 11:49:14 2011	(r226539)
+++ stable/8/usr.bin/fetch/fetch.c	Wed Oct 19 12:14:14 2011	(r226540)
@@ -522,6 +522,12 @@ fetch(char *URL, const char *path)
 				    "does not match remote", path);
 				goto failure_keep;
 			}
+		} else if (url->offset > sb.st_size) {
+			/* gap between what we asked for and what we got */
+			warnx("%s: gap in resume mode", URL);
+			fclose(of);
+			of = NULL;
+			/* picked up again later */
 		} else if (us.size != -1) {
 			if (us.size == sb.st_size)
 				/* nothing to do */
@@ -534,7 +540,7 @@ fetch(char *URL, const char *path)
 				goto failure;
 			}
 			/* we got it, open local file */
-			if ((of = fopen(path, "a")) == NULL) {
+			if ((of = fopen(path, "r+")) == NULL) {
 				warn("%s: fopen()", path);
 				goto failure;
 			}
@@ -551,8 +557,16 @@ fetch(char *URL, const char *path)
 				fclose(of);
 				of = NULL;
 				sb = nsb;
+				/* picked up again later */
 			}
 		}
+		/* seek to where we left off */
+		if (of != NULL && fseeko(of, url->offset, SEEK_SET) != 0) {
+			warn("%s: fseeko()", path);
+			fclose(of);
+			of = NULL;
+			/* picked up again later */
+		}
 	} else if (m_flag && sb.st_size != -1) {
 		/* mirror mode, local file exists */
 		if (sb.st_size == us.size && sb.st_mtime == us.mtime)
_______________________________________________
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
Comment 14 Dag-Erling Smørgrav freebsd_committer 2012-11-05 13:03:34 UTC
State Changed
From-To: patched->closed

fixed