Bug 21476

Summary: ftp in 4.1-STABLE fails on http:// URLs
Product: Base System Reporter: brett <brett>
Component: bin Assignee: freebsd-bugs (Nobody) <bugs>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: 4.1-STABLE   
Hardware: Any   
OS: Any   

Description brett 2000-09-22 16:45:10 UTC
>Number:         21476
>Category:       bin
>Synopsis:       ftp in 4.1-STABLE fails on http:// URLs
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          closed
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Sep 22 08:50:01 PDT 2000
>Closed-Date:
>Last-Modified:
>Originator:     Brett Glass
>Release:        4.1-STABLE of 9/16/2000
>Organization:
>Environment:
>Description:
ftp utility fails to retrieve some (not all!) files via HTTP. 
A 404 error is reported. A memory error is also reported
after the failure.
>How-To-Repeat:
%ftp http://www.ben-tech.com/projects/noattach.tar.gz
Requesting http://www.ben-tech.com/projects/noattach.tar.gz
ftp: Error retrieving file: 404 Not Found

ftp in free(): warning: chunk is already free.
>Fix:
Haven't investigated a fix yet.

>Release-Note:
>Audit-Trail:
>Unformatted:


Comment 1 brett 2000-09-22 16:50:01 UTC
ftp utility fails to retrieve some (not all!) files via HTTP. 
A 404 error is reported. A memory error is also reported
after the failure.

Fix: 

Haven't investigated a fix yet.
How-To-Repeat: %ftp http://www.ben-tech.com/projects/noattach.tar.gz
Requesting http://www.ben-tech.com/projects/noattach.tar.gz
ftp: Error retrieving file: 404 Not Found

ftp in free(): warning: chunk is already free.
Comment 2 Ruslan Ermilov 2000-09-25 12:42:39 UTC
On Fri, Sep 22, 2000 at 08:45:10AM -0700, brett@lariat.org wrote:
> 
> ftp utility fails to retrieve some (not all!) files via HTTP. 
> A 404 error is reported. A memory error is also reported
> after the failure.

> %ftp http://www.ben-tech.com/projects/noattach.tar.gz
> Requesting http://www.ben-tech.com/projects/noattach.tar.gz
> ftp: Error retrieving file: 404 Not Found
> 
The server www.ben-tech.com is violating RFC1945 by requiring that
the Host: header be present in any HTTP/1.0 request.  Compare:

$ telnet www.ben-tech.com 80
Trying 204.249.185.211...
Connected to www.ben-tech.com.
Escape character is '^]'.
HEAD /projects/noattach.tar.gz HTTP/1.0

HTTP/1.1 404 Not Found
Date: Mon, 25 Sep 2000 11:03:01 GMT
Server: Apache/1.3.12 Ben-SSL/1.39 (Unix)
Connection: close
Content-Type: text/html

Connection closed by foreign host.

... and ...

$ telnet www.ben-tech.com 80
Trying 204.249.185.211...
Connected to www.ben-tech.com.
Escape character is '^]'.
HEAD /projects/noattach.tar.gz HTTP/1.0
Host: www.ben-tech.com

HTTP/1.1 200 OK
Date: Mon, 25 Sep 2000 11:30:21 GMT
Server: Apache/1.3.12 Ben-SSL/1.39 (Unix)
Last-Modified: Fri, 22 Sep 2000 15:10:25 GMT
ETag: "3e003-665-39cb7661"
Accept-Ranges: bytes
Content-Length: 1637
Connection: close
Content-Type: application/x-tar
Content-Encoding: x-gzip

Connection closed by foreign host.
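
The two sessions above differ only in the Host: line. As a rough illustration, the Python sketch below builds both requests from the URL in this PR. The helper name build_request and its send_host flag are hypothetical, chosen for this example; they are not taken from ftp(1), fetch(1) or libfetch.

```python
from urllib.parse import urlsplit


def build_request(url, method="HEAD", send_host=True):
    """Return the raw bytes of an HTTP/1.0 request for url.

    Hypothetical helper for illustration only; not code from ftp(1).
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    lines = [f"{method} {path} HTTP/1.0"]
    if send_host:
        # Host carries the name from the URL, plus the port if one was given.
        host = parts.hostname
        if parts.port:
            host += f":{parts.port}"
        lines.append(f"Host: {host}")
    # A blank line terminates the header block.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


url = "http://www.ben-tech.com/projects/noattach.tar.gz"
without_host = build_request(url, send_host=False)  # first session above
with_host = build_request(url)                      # second session above
```

Sending the Host: line costs a client one extra header and, going by the transcripts, is what separates the 404 from the 200 on this server.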


> ftp in free(): warning: chunk is already free.
> 
This one I have fixed in src/usr.bin/ftp/fetch.c,v 1.15.

-- 
Ruslan Ermilov		Oracle Developer/DBA,
ru@sunbay.com		Sunbay Software AG,
ru@FreeBSD.org		FreeBSD committer,
+380.652.512.251	Simferopol, Ukraine

http://www.FreeBSD.org	The Power To Serve
http://www.oracle.com	Enabling The Information Age
Comment 3 Ruslan Ermilov 2000-09-25 14:58:07 UTC
On Mon, Sep 25, 2000 at 03:14:15PM +0200, Dag-Erling Smorgrav wrote:
> Ruslan Ermilov <ru@sunbay.com> writes:
> >  The server www.ben-tech.com is violating RFC1945 by requiring that
> >  the Host: header be present in any HTTP/1.0 request.  Compare:
> 
> RFC1945 was never a standard. The closest thing to a standard is
> RFC2616, which basically says the Host: header is required in HTTP/1.1
> requests (it's slightly more complicated than that; see sections 5.2,
> 9, 14.23 and especially 19.6.1.1).
> 
> Yes, you made an HTTP/1.0 request, but www.ben-tech.com runs Apache
> 1.3.12, which is what the RFC calls an "HTTP/1.1 origin server", and
> as such (quoting section 3.1),
> 
>                                      SHOULD use an HTTP-Version of
>    "HTTP/1.1" in their messages, and MUST do so for any message that is
>    not compatible with HTTP/1.0. For more details on when to send
>    specific HTTP-Version values, see RFC 2145 [36].
> 
> (RFC2145 is "Use and interpretation of HTTP version numbers", which
> explains what version number to use when communicating with servers or
> clients that implement a different HTTP version)
> 
> Furthermore, section 19.6.1.1 ("Changes to Simplify Multi-homed Web
> Servers and Conserve IP Addresses") states:
> 
>    The requirements that clients and servers support the Host request-
>    header, report an error if the Host request-header (section 14.23) is
>    missing from an HTTP/1.1 request, and accept absolute URIs (section
>    5.1.2) are among the most important changes defined by this
>    specification.
> 
>                                        [...] Given the rate of growth of
>    the Web, and the number of servers already deployed, it is extremely
>    important that all implementations of HTTP (including updates to
>    existing HTTP/1.0 applications) correctly implement these
>    requirements:
> 
>       - Both clients and servers MUST support the Host request-header.
> 
>       - A client that sends an HTTP/1.1 request MUST send a Host header.
> 
>       - Servers MUST report a 400 (Bad Request) error if an HTTP/1.1
>         request does not include a Host request-header.
> 
>       - Servers MUST accept absolute URIs.
> 
> In summary, this says FreeBSD's ftp(1) has been in error since June
> 1999, when RFC2616 was published.
> 
[fetch(1) propaganda removed :-)]
> 
> (Oh, and Apache was correct in this instance, but it isn't always.
> Amongst other crimes, it will insist on using chunked encoding in
> conversations with HTTP/1.0 clients, in violation of RFC2145, and it
> will in some instances choke on absolute URIs in requests, in
> violation of RFC2616)
> 
Forgive my ignorance, but where in the above does it follow that an
HTTP/1.0 request must include the Host: header, and that a request
without it should be answered with a 404 reply?  For example,

$ telnet www.apache.org 80
Trying 63.211.145.10...
Connected to www.apache.org.
Escape character is '^]'.
HEAD /index.html HTTP/1.0

HTTP/1.1 200 OK
Date: Mon, 25 Sep 2000 13:52:35 GMT
Server: Apache/1.3.13-dev (Unix) tomcat/1.0
Connection: close
Content-Type: text/html

Connection closed by foreign host.


Cheers,
Comment 4 des 2000-09-25 15:23:08 UTC
Ruslan Ermilov <ru@sunbay.com> writes:
> Forgive my ignorance, but where in the above does it follow that an
> HTTP/1.0 request must include the Host: header, and that a request
> without it should be answered with a 404 reply?  For example,

Read section 19.6.1.1 of RFC2616. Pay special attention to the part
that says "including updates to existing HTTP/1.0 applications" and
the part that says "Both clients and servers MUST support the Host
request-header".

Or just forget the RFC and consider that if ftp(1) does not send a
Host: header, it simply will not work with the thousands if not
millions of web sites out there that are actually virtual hosts.

Regarding the fact that not sending a Host: header sometimes works,
and sometimes doesn't, even on servers that host a single site - this
is a consequence of Apache striving to be backwards compatible wrt the
format of its configuration file (to be more specific, there are two
different ways to configure a single-site server, and one of them
causes requests without a Host: header to fail)

Regarding what you call "fetch(1) propaganda", consider that these and
other issues that ftp(1) has trouble with have already been addressed
- a long time ago! - in libfetch, and that whoever maintains ftp(1)
will now have to solve these problems *again*, in a different code
base, for no added gain. Do one thing and do it well! Ftp(1) is very
good at interactive or scripted FTP sessions, but sucks at HTTP. Fetch
(or rather libfetch) is (or strives to be) very good at simple FTP and
HTTP requests, but does not try to support interactive or otherwise
complicated FTP sessions.

BTW, if you want a list of things libfetch's HTTP code supports but
ftp(1) doesn't, here goes:

  - virtual hosts (which is vital these days!)
  - authentication (including proxy authentication)
  - redirects (including redirects to FTP documents)
  - partial fetches
  - chunked encoding

Need I go on?

DES
-- 
Dag-Erling Smorgrav - des@ofug.org
Comment 5 Ruslan Ermilov 2000-09-25 15:42:24 UTC
On Mon, Sep 25, 2000 at 04:23:08PM +0200, Dag-Erling Smorgrav wrote:
> Ruslan Ermilov <ru@sunbay.com> writes:
> > Forgive my ignorance, but where in the above does it follow that an
> > HTTP/1.0 request must include the Host: header, and that a request
> > without it should be answered with a 404 reply?  For example,
> 
> Read section 19.6.1.1 of RFC2616. Pay special attention to the part
> that says "including updates to existing HTTP/1.0 applications" and
> the part that says "Both clients and servers MUST support the Host
> request-header".
> 
> Or just forget the RFC and consider that if ftp(1) does not send a
> Host: header, it simply will not work with the thousands if not
> millions of web sites out there that are actually virtual hosts.
> 
This is the case here -- the server does virtual hosting and returns
404 because either the "default" virtual host does not have this document
or there is no "default" virtual host at all.

> Regarding the fact that not sending a Host: header sometimes works,
> and sometimes doesn't, even on servers that host a single site - this
> is a consequence of Apache striving to be backwards compatible wrt the
> format of its configuration file (to be more specific, there are two
> different ways to configure a single-site server, and one of them
> causes requests without a Host: header to fail)
> 
It is actually because the RFC does not say that an HTTP/1.0 request without
a Host: header MUST be answered with a 400 error.  It may, however, be
answered with a 404 error (like in this particular case).
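
The behaviour described in this comment can be modelled in a few lines of Python. This is only a sketch under stated assumptions: the site table, the empty "default" site and the function name status_for are all made up for illustration, and the code does not reflect how Apache actually resolves virtual hosts.

```python
# Hypothetical site table: Host name -> set of paths that site serves.
VHOSTS = {
    "www.ben-tech.com": {"/projects/noattach.tar.gz"},
}
# Assumption: the "default" site (used when no Host matches) has no documents.
DEFAULT_DOCS = set()


def status_for(version, path, host=None):
    """Return the status code a name-based virtual host server might send."""
    if host is None and version == "HTTP/1.1":
        # RFC 2616 mandates 400 only for HTTP/1.1 requests without Host.
        return 400
    # Without a (known) Host, the request falls through to the default site.
    docs = VHOSTS.get(host, DEFAULT_DOCS)
    return 200 if path in docs else 404


path = "/projects/noattach.tar.gz"
no_host_10 = status_for("HTTP/1.0", path)
with_host_10 = status_for("HTTP/1.0", path, "www.ben-tech.com")
no_host_11 = status_for("HTTP/1.1", path)
```

In this model the HTTP/1.0 request without Host: is looked up on the default site and gets a 404, the same request with Host: gets a 200, and only the HTTP/1.1 request without Host: gets the 400 that the RFC requires.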


Cheers,
Comment 6 des 2000-09-25 17:21:42 UTC
Ruslan Ermilov <ru@sunbay.com> writes:
> It is actually because the RFC does not say that an HTTP/1.0 request without
> a Host: header MUST be answered with a 400 error.  It may, however, be
> answered with a 404 error (like in this particular case).

The bottom line being that your claim that Apache was not standards-
conforming was not correct in this case.

Regarding the PR, I'd suggest that the originator consider using
fetch(1) instead of ftp(1) for HTTP as well as for simple,
non-interactive FTP requests.

I'd also like to suggest that the URL functionality and the HTTP code
be ripped out of ftp(1), but I'd probably be flamed to hell and back
by all the armchair generals out there, so I won't.

DES
-- 
Dag-Erling Smorgrav - des@ofug.org
Comment 7 Ruslan Ermilov 2000-09-26 08:13:23 UTC
On Mon, Sep 25, 2000 at 06:21:42PM +0200, Dag-Erling Smorgrav wrote:
> Ruslan Ermilov <ru@sunbay.com> writes:
> > It is actually because the RFC does not say that an HTTP/1.0 request without
> > a Host: header MUST be answered with a 400 error.  It may, however, be
> > answered with a 404 error (like in this particular case).
> 
> The bottom line being that your claim that Apache was not standards-
> conforming was not correct in this case.
> 
Yes, I was wrong.

> Regarding the PR, I'd suggest that the originator consider using
> fetch(1) instead of ftp(1) for HTTP as well as for simple,
> non-interactive FTP requests.
> 
> I'd also like to suggest that the URL functionality and the HTTP code
> be ripped out of ftp(1), but I'd probably be flamed to hell and back
> by all the armchair generals out there, so I won't.
> 
I am in favor of ripping this out of ftp(1), though one might consider
merging NetBSD's latest fetch.c, which does:

/*
 * Retrieve URL, via a proxy if necessary, using HTTP.
 * If proxyenv is set, use that for the proxy, otherwise try ftp_proxy or
 * http_proxy as appropriate.
 * Supports HTTP redirects.
 * Returns -1 on failure, 0 on completed xfer, 1 if ftp connection
 * is still open (e.g, ftp xfer with trailing /)
 */

And yes, it sends Host: header in requests.
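
For readers unfamiliar with what "supports HTTP redirects" involves on the client side, here is a small Python sketch of the general idea. It is not NetBSD's code: the function follow_redirects, the caller-supplied fetch callback and the canned URLs are all hypothetical, and only 301/302 responses are treated as redirects.

```python
def follow_redirects(url, fetch, max_hops=5):
    """Follow 301/302 responses until a final response or the hop limit.

    fetch is a caller-supplied (hypothetical) function returning
    (status, location) for a URL; location is None unless redirected.
    """
    for _ in range(max_hops + 1):
        status, location = fetch(url)
        if status not in (301, 302) or location is None:
            return url, status
        url = location
    # Guard against redirect loops instead of spinning forever.
    raise RuntimeError("too many redirects")


# Canned responses standing in for a real server (made-up URLs).
CANNED = {
    "http://old.example/file": (302, "http://new.example/file"),
    "http://new.example/file": (200, None),
}
final_url, final_status = follow_redirects(
    "http://old.example/file", CANNED.__getitem__
)
```

The hop limit is a design choice of this sketch; without some such limit, two URLs redirecting to each other would keep the client looping.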


Comment 8 Joseph Mallett 2001-08-24 20:29:01 UTC
Can this PR get closed? The bottom line is that if you want good, reliable 
HTTP retrieval, fetch and wget are great; ftp might not always be, and 
it shouldn't be expected to be either. If you want an http file grabber 
we could call it 'http' or 'httpget' or 'htget' or something, or... I 
know... 'fetch'.

Or, if someone feels this functionality is really really vital to the 
FreeBSD base system, in ftp, I could probably send a patch in the next few 
days.

-- 
Joseph Mallett, jmallett@xMach.org
xMach Core Team, http://xMach.org/
Resume: http://srcsys.org/resume.txt
Comment 9 iedowse freebsd_committer freebsd_triage 2001-11-17 14:04:27 UTC
State Changed
From-To: open->feedback


Is the suggestion to use fetch or wget instead acceptable to you?
Comment 10 n_hibma 2001-11-22 09:29:30 UTC
I agree with DES. http:// and ftp:// functionality should be removed
from ftp. It is in fetch, fetch does a good job of it and it is part of
the base system. ftp obviously is buggy.
Comment 11 ashp freebsd_committer freebsd_triage 2002-02-09 18:59:43 UTC
State Changed
From-To: feedback->closed

Risk being flamed and close this PR.  Everyone agrees that 
fetch is the right solution here.