Bug 125350 - [libfetch] [patch] src/lib/libfetch add support for deflate and gzip encoded http downloads
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: unspecified
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: Dag-Erling Smørgrav
Reported: 2008-07-07 00:20 UTC by kamikaze
Modified: 2010-11-15 13:51 UTC (History)
Attachments
file.diff (17.10 KB, patch)
2008-07-07 00:20 UTC, kamikaze

Description kamikaze 2008-07-07 00:20:00 UTC
The patch adds support for gzip and deflate compression of HTTP downloads to libfetch.

This is work in progress. Things yet to be done in order of preference:
1. Add support for compress encoding.
2. Implement random access layer (not required/useful for libfetch).
3. Clean up http.c (a seemingly monumental task).

Fix: Patch attached with submission follows:
Comment 1 Mark Linimon freebsd_committer freebsd_triage 2008-07-07 03:53:22 UTC
Responsible Changed
From-To: freebsd-bugs->des

Over to maintainer.
Comment 2 Dag-Erling Smørgrav 2008-07-07 10:01:38 UTC
[resent]

I dislike the idea of nested funopen(); I would much prefer using a
single code path for transfer and content encoding.

openRandom(), httpDecodeRandom() are bad ideas and should not be
implemented.

Support for COMPRESS encoding is unnecessary.  I am not aware of any
HTTP server that implements it, nor of any HTTP server that does not
support DEFLATE (except for those that do not support compression at
all).
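
For illustration, here is a minimal sketch of a Content-Encoding parser restricted to the codings discussed above. It is not taken from the patch or from libfetch: the name parse_content_encoding and the ENCODING_UNSUPPORTED value are assumptions made for this example. It compares tokens without regard to case, since RFC 2616 section 3.5 defines content-coding values as case-insensitive, and it reports unknown codings instead of treating them as raw data.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>
#include <strings.h>

/* Hypothetical constants, mirroring the values used in the patch. */
#define ENCODING_RAW		0
#define ENCODING_DEFLATE	1
#define ENCODING_GZIP		2
#define ENCODING_UNSUPPORTED	(-1)	/* assumption: not in the patch */

/*
 * Map a Content-Encoding header value to an encoding constant.
 * Leading and trailing whitespace is ignored and the token is
 * compared without regard to case.
 */
static int
parse_content_encoding(const char *p)
{
	size_t len;

	assert(p != NULL);
	while (isspace((unsigned char)*p))
		p++;
	len = strlen(p);
	while (len > 0 && isspace((unsigned char)p[len - 1]))
		len--;
	/* An empty value or "identity" means the body is not encoded. */
	if (len == 0 || (len == 8 && strncasecmp(p, "identity", 8) == 0))
		return (ENCODING_RAW);
	if (len == 4 && strncasecmp(p, "gzip", 4) == 0)
		return (ENCODING_GZIP);
	if (len == 7 && strncasecmp(p, "deflate", 7) == 0)
		return (ENCODING_DEFLATE);
	/* Anything else, including "compress", is reported as unsupported. */
	return (ENCODING_UNSUPPORTED);
}
```

A caller could treat ENCODING_UNSUPPORTED as a protocol error rather than passing compressed bytes through unchanged.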

The new code needs to conform to the style of the existing code,
i.e. style(9).  This includes changing function names and removing
javadoc metadata.
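
As a rough illustration of the conventions asked for here, the sketch below shows a buffer-growing helper written in the manner of the existing libfetch code: lower-case names with underscores, the return type on its own line, plain block comments instead of javadoc tags, and return values in parentheses. The struct and function names are hypothetical, and the helper uses realloc(3) where the patch's moveBuffer() uses malloc, memmove and free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, cut-down stand-in for the patch's struct zlibStream. */
struct inflate_state {
	char	*buf;		/* read buffer for encoded data */
	size_t	 bufsize;	/* allocated size of buf */
	size_t	 avail;		/* bytes of pending input at the start of buf */
};

/*
 * Grow the read buffer to at least newsize bytes, preserving its
 * contents.  Returns 0 on success and -1 on failure, in which case
 * the old buffer is left untouched.
 */
static int
inflate_grow_buffer(struct inflate_state *st, size_t newsize)
{
	char *tmp;

	assert(st != NULL);
	if (newsize <= st->bufsize)
		return (0);
	if ((tmp = realloc(st->buf, newsize)) == NULL)
		return (-1);
	st->buf = tmp;
	st->bufsize = newsize;
	return (0);
}
```

Because a failed realloc(3) leaves the old buffer valid, the caller can keep decoding with the smaller buffer, which matches the patch's intent of treating the initial resize as optional.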

Adding new .c and .h files is unnecessary; the new code can go in
common.[ch].

Cleaning up http.c is more trouble (and risk) than it's worth.  If it
works, don't fix it.

DES
-- 
Dag-Erling Smørgrav - des@des.no
Comment 3 kamikaze 2008-07-07 13:17:32 UTC
Here is a newer version with some serious bugs fixed.

diff -Pur /usr/src/lib/libfetch.orig/Makefile /usr/src/lib/libfetch/Makefile
--- /usr/src/lib/libfetch.orig/Makefile	2008-07-07 00:56:11.000000000 +0200
+++ /usr/src/lib/libfetch/Makefile	2008-07-07 00:56:36.000000000 +0200
@@ -4,7 +4,7 @@

  LIB=		fetch
  CFLAGS+=	-I.
-SRCS=		fetch.c common.c ftp.c http.c file.c \
+SRCS=		fetch.c common.c ftp.c http.c httpdecode.c file.c \
  		ftperr.h httperr.h
  INCS=		fetch.h
  MAN=		fetch.3
@@ -20,6 +20,8 @@
  LDADD=		-lssl -lcrypto
  .endif

+LDADD+=		-lz
+
  CFLAGS+=	-DFTP_COMBINE_CWDS

  CSTD?=		c99
diff -Pur /usr/src/lib/libfetch.orig/http.c /usr/src/lib/libfetch/http.c
--- /usr/src/lib/libfetch.orig/http.c	2008-07-07 00:56:11.000000000 +0200
+++ /usr/src/lib/libfetch/http.c	2008-07-07 10:09:29.000000000 +0200
@@ -82,6 +82,7 @@
  #include "fetch.h"
  #include "common.h"
  #include "httperr.h"
+#include "httpdecode.h"

  /* Maximum number of redirects to follow */
  #define MAX_REDIRECT 5
@@ -336,6 +337,7 @@
  	hdr_error = -1,
  	hdr_end = 0,
  	hdr_unknown = 1,
+	hdr_content_encoding,
  	hdr_content_length,
  	hdr_content_range,
  	hdr_last_modified,
@@ -349,6 +351,7 @@
  	hdr_t		 num;
  	const char	*name;
  } hdr_names[] = {
+	{ hdr_content_encoding,		"Content-Encoding" },
  	{ hdr_content_length,		"Content-Length" },
  	{ hdr_content_range,		"Content-Range" },
  	{ hdr_last_modified,		"Last-Modified" },
@@ -496,6 +499,21 @@
  }

  /*
+ * Parse a content-encoding header
+ */
+static int
+http_parse_encoding(const char *p)
+{
+	if (strcmp("gzip", p) == 0)
+		return(ENCODING_GZIP);
+	if (strcmp("deflate", p) == 0)
+		return(ENCODING_DEFLATE);
+	if (strcmp("compress", p) == 0)
+		return(ENCODING_COMPRESS);
+	return(ENCODING_RAW);
+}
+
+/*
   * Parse a content-length header
   */
  static int
@@ -800,14 +818,17 @@
  	conn_t *conn;
  	struct url *url, *new;
  	int chunked, direct, need_auth, noredirect, verbose;
-	int e, i, n, val;
+	int e, i, n, val, encoding;
  	off_t offset, clength, length, size;
  	time_t mtime;
  	const char *p;
-	FILE *f;
+	FILE *f, *d;
  	hdr_t h;
  	char hbuf[MAXHOSTNAMELEN + 7], *host;

+	f = NULL;
+	d = NULL;
+
  	direct = CHECK_FLAG('d');
  	noredirect = CHECK_FLAG('A');
  	verbose = CHECK_FLAG('v');
@@ -834,6 +855,7 @@
  		length = -1;
  		size = -1;
  		mtime = 0;
+		encoding = ENCODING_RAW;

  		/* check port */
  		if (!url->port)
@@ -919,6 +941,7 @@
  			http_cmd(conn, "User-Agent: %s " _LIBFETCH_VER, getprogname());
  		if (url->offset > 0)
  			http_cmd(conn, "Range: bytes=%lld-", (long long)url->offset);
+		http_cmd(conn, "Accept-Encoding: gzip,deflate");
  		http_cmd(conn, "Connection: close");
  		http_cmd(conn, "");

@@ -999,6 +1022,9 @@
  			case hdr_error:
  				http_seterr(HTTP_PROTOCOL_ERROR);
  				goto ouch;
+			case hdr_content_encoding:
+				encoding = http_parse_encoding(p);
+				break;
  			case hdr_content_length:
  				http_parse_length(p, &clength);
  				break;
@@ -1119,7 +1145,9 @@

  	/* fill in stats */
  	if (us) {
-		us->size = size;
+		/* we can only predict the size of unencoded streams */
+		if (encoding == ENCODING_RAW)
+			us->size = size;
  		us->atime = us->mtime = mtime;
  	}

@@ -1139,6 +1167,13 @@
  		goto ouch;
  	}

+	/* wrap the decoder around it */
+	if ((d = httpDecode(f, encoding, size)) == NULL) {
+		fetch_syserr();
+		fclose(f);
+		goto ouch;
+	}
+
  	if (url != URL)
  		fetchFreeURL(url);
  	if (purl)
@@ -1150,7 +1185,7 @@
  		f = NULL;
  	}

-	return (f);
+	return (d);

  ouch:
  	if (url != URL)
diff -Pur /usr/src/lib/libfetch.orig/httpdecode.c /usr/src/lib/libfetch/httpdecode.c
--- /usr/src/lib/libfetch.orig/httpdecode.c	1970-01-01 01:00:00.000000000 +0100
+++ /usr/src/lib/libfetch/httpdecode.c	2008-07-07 14:12:56.000000000 +0200
@@ -0,0 +1,404 @@
+/*
+ * I wrote this and I say you can do whatever you want with it. Period.
+ * However, I'd love to hear from you what you've done.
+ *
+ * Dominic Fandrey <kamikaze@bsdforen.de>
+ */
+
+/**
+ * \file httpdecode.c
+ *
+ * This file contains the implementation of the prototypes defined in
+ * httpdecode.h.
+ *
+ * @brief
+ *	HTTP content decoding implementation.
+ * @see
+ *	httpdecode.h
+ * @author
+ *	Dominic Fandrey <kamikaze@bsdforen.de>
+ * @version
+ *	0.1.99.2008.07.07
+ */
+
+/* LINTLIBRARY */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <errno.h>
+#include <zlib.h>
+#include "httpdecode.h"
+
+/* PRIVATE STRUCTS */
+
+/**
+ * @brief
+ *	The necessary data to maintain a zlib decoding stream.
+ */
+struct zlibStream {
+	/**
+	 * @brief
+	 *	The original stream with the encoded data.
+	 */
+	FILE * source;
+
+	/**
+	 * @brief
+	 *	A read buffer for the encoded stream.
+	 */
+	char * buffer;
+
+	/**
+	 * This specifies the encoding of the data. The values
+	 *  ENCODING_GZIP and ENCODING_DEFLATE are possible.
+	 *
+	 * @brief
+	 *	The encoding type of the stream.
+	 */
+	int encoding;
+
+	/**
+	 * @brief
+	 *	The stream data used by zlib.
+	 */
+	z_stream stream;
+
+	/**
+	 * The length of the source stream. The value 0 means that the length
+	 * is unknown; higher values will be used to automatically close the
+	 * stream. This prevents overreading and allows the continued use
+	 * of the underlying HTTP stream.
+	 *
+	 * @brief
+	 *	The length of the encoded source stream.
+	 */
+	size_t length;
+
+	/**
+	 * @brief
+	 *	The amount of data that has been read.
+	 */
+	size_t read;
+
+	/**
+	 * @brief
+	 *	The size of the buffer for encoded data.
+	 */
+	size_t bufferSize;
+};
+
+/* PRIVATE PROTOTYPES */
+void moveBuffer(struct zlibStream * cookie, char * newBuffer, size_t size);
+FILE * zlibOpen(struct zlibStream * cookie);
+size_t zlibRead(struct zlibStream * cookie, char * buffer, size_t length);
+int zlibClose(struct zlibStream * cookie);
+/* TODO
+FILE * compressOpen(struct zlibStream * cookie);
+size_t compressRead(struct zlibStream * cookie, char * buffer, size_t length);
+int compressClose(struct zlibStream * cookie);
+FILE * randomOpen(struct zlibStream * cookie);
+size_t randomRead(struct zlibStream * cookie, char * buffer, size_t length);
+int randomSeek(struct zlibStream * cookie, off_t offset, int whence);
+int randomClose(struct zlibStream * cookie);
+*/
+
+/* PUBLIC FUNCTIONS */
+
+/**
+ * Opens a given stream for decoding and returns a FILE handle that can be
+ * used with the fread and fclose function. Internally funopen is used
+ * to achieve this.
+ *
+ * In case of failure NULL is returned and errno is set to EINVAL for
+ * invalid parameters and ENOMEM for insufficient memory.
+ *
+ * @brief
+ *	Open a FILE stream to read decoded data from.
+ * @param source
+ *	The stream to read the encoded data from.
+ * @param encoding
+ *	The encoding type of the source stream.
+ * @param length
+ *	The length of the source stream. Use 0 if unknown.
+ * @return
+ *	Returns a FILE handle to read an encoded stream.
+ * @see
+ *	funopen(3)
+ * @see
+ *	fread(3)
+ * @see
+ *	fclose(3)
+ */
+FILE * httpDecode(FILE * source, int encoding, size_t length) {
+	struct zlibStream * zlibCookie;
+
+	switch (encoding) {
+	case ENCODING_RAW:
+		return(source);
+	case ENCODING_GZIP: case ENCODING_DEFLATE:
+		zlibCookie = malloc(sizeof(struct zlibStream));
+		if (zlibCookie == NULL) /* errno == ENOMEM */
+			return(NULL);
+		zlibCookie->buffer = NULL;
+		zlibCookie->bufferSize = 0;
+		zlibCookie->source = source;
+		zlibCookie->length = length;
+		zlibCookie->read = 0;
+		zlibCookie->encoding = encoding;
+		return(zlibOpen(zlibCookie));
+	case ENCODING_COMPRESS:
+		return(NULL);
+	}
+
+	return(NULL);
+}
+
+/**
+ * This function is a wrapper around httpDecode that allows random access
+ * by writing the stream into a temporary file. The file is buffered
+ * by a given number of buffers in memory.
+ * Buffers are overwritten in LRU order.
+ *
+ * @brief
+ *	A file backed wrapper around httpDecode for random access.
+ * @param source
+ *	The stream to read the encoded data from.
+ * @param encoding
+ *	The encoding type of the source stream.
+ * @param length
+ *	The length of the source stream. Use 0 if unknown.
+ * @param bufferSize
+ *	The size of a buffer.
+ * @param buffers
+ *	The number of buffers.
+ * @return
+ *	Returns a FILE handle to read an encoded stream.
+ */
+/* TODO
+FILE * httpDecodeRandom(FILE * source, int encoding, size_t length,
+	size_t bufferSize, size_t buffers) {
+	return(source);
+}
+*/
+
+/* PRIVATE FUNCTIONS */
+
+/**
+ * This function replaces the read buffer in the cookie with the new buffer.
+ * The old buffer is freed but the contents are saved in the new buffer.
+ * However, no security checks are performed.
+ * That means that newBuffer must at least have the same size as the old one.
+ *
+ * @brief
+ *	Replace the current read buffer.
+ * @param cookie
+ *	Contains all the data necessary to maintain the stream.
+ * @param newBuffer
+ *	The new buffer to use.
+ */
+void moveBuffer(struct zlibStream * cookie, char * newBuffer, size_t size) {
+	memmove(newBuffer, cookie->buffer, cookie->stream.avail_in);
+	free(cookie->buffer);
+	cookie->buffer = newBuffer;
+	cookie->bufferSize = size;
+}
+
+/**
+ * This function initializes a zlib stream and creates the file handler
+ * that will later be used to pull data from the stream.
+ *
+ * Upon any kind of failure errno is set to one of the following values:
+ * EINVAL	This can either indicate that an unsupported encoding
+ *		was given or that this code and the used zlib implementation
+ *		are incompatible.
+ * ENOMEM	Indicates that the available memory is insufficient for
+ *		the decode buffer, zlib or funopen.
+ *
+ * @brief
+ *	Open a zlib stream.
+ * @param cookie
+ *	Contains all the data necessary to maintain the stream.
+ * @return
+ *	A FILE* pointer or NULL in case of failure.
+ */
+FILE * zlibOpen(struct zlibStream * cookie) {
+	int wbits;
+	z_stream * stream = &(cookie->stream);
+
+	/* Set window bits for the selected encoding. */
+	switch(cookie->encoding) {
+	case ENCODING_DEFLATE:
+		wbits = -MAX_WBITS;
+		break;
+	case ENCODING_GZIP:
+		wbits = MAX_WBITS + 16;
+		break;
+	default:
+		errno = EINVAL;
+		return(NULL);
+	}
+
+	/* Create the decoding buffer. */
+	cookie->bufferSize = 512;
+	cookie->buffer = malloc(cookie->bufferSize);
+	if (cookie->buffer == NULL)
+		return(NULL); /* errno == ENOMEM */
+
+	/* Initialize zlib stream data. */
+	stream->zalloc = Z_NULL;
+	stream->zfree = Z_NULL;
+	stream->opaque = Z_NULL;
+	stream->avail_in = 0;
+	stream->next_in = (Bytef *) cookie->buffer;
+
+	/* Initialize stream for decoding. */
+	switch(inflateInit2(stream, wbits)) {
+	case Z_OK:
+		errno = 0;
+		break;
+	case Z_MEM_ERROR:
+		errno = ENOMEM;
+		break;
+	case Z_STREAM_ERROR: /* This is not supposed to happen. */
+		errno = EINVAL;
+		break;
+	}
+	if (errno) {
+		free(cookie->buffer);
+		return(NULL);
+	}
+
+	/* Create the file stream to return. */
+	return(funopen(cookie,(int (*)(void *, char *, int)) zlibRead,
+		NULL, NULL, (int (*)(void *)) zlibClose));
+}
+
+/**
+ * Writes a chunk of decoded data to the given buffer.
+ *
+ * In case of an error (size_t) -1 is returned to indicate to the funopen
+ * wrapper that an error occurred. In such a case errno is set to EIO.
+ *
+ * An error does not cause the stream to be closed.
+ *
+ * @brief
+ *	Read decoded data from the encoded stream.
+ * @param cookie
+ *	Contains all the data necessary to maintain the stream.
+ * @param buffer
+ *	The buffer to write the decoded data to.
+ * @param length
+ *	The space available in the buffer.
+ * @return
+ *	The number of bytes written to the buffer or (size_t) -1 in case of
+ *	failure.
+ */
+size_t zlibRead(struct zlibStream * cookie, char * buffer, size_t length) {
+	char * tmpBuffer;
+	size_t growth, maxRead, bufferAvailable, flushed;
+	int zlibStatus;
+	z_stream * stream = &(cookie->stream);
+
+	/*
+	 * Adjust buffer size if the target buffer is larger than 2 times
+	 * the source buffer.
+	 */
+	if ((length >> 1) > cookie->bufferSize) {
+		tmpBuffer = malloc(length >> 1);
+
+		/*
+		 * If creating a new buffer fails pretend never to have
+		 * attempted it.
+		 */
+		if (tmpBuffer == NULL)
+			errno = 0;
+		else
+			/* Move data from the old buffer to the new one. */
+			moveBuffer(cookie, tmpBuffer, length >> 1);
+	}
+
+	/* Run until the target buffer has been filled. */
+	flushed = 0;
+	while (length > 0) {
+		/* If the input buffer is not full, fill it. */
+		growth = 0;
+		bufferAvailable = cookie->bufferSize - stream->avail_in;
+		if (cookie->length) {
+			maxRead = cookie->length - cookie->read;
+			bufferAvailable = (maxRead < bufferAvailable \
+				? maxRead : bufferAvailable);
+		}
+		if (bufferAvailable > 0) {
+			growth = fread(cookie->buffer + stream->avail_in, \
+				sizeof(char), bufferAvailable, cookie->source);
+			/* Forward errors. */
+			if (ferror(cookie->source))
+				return((size_t) -1);
+			stream->avail_in += growth;
+			cookie->read += growth;
+		}
+
+		/* Decode data from the read to the target buffer. */
+		stream->next_in = (Bytef *) cookie->buffer;
+		stream->avail_out = length;
+		stream->next_out = (Bytef *) buffer;
+		zlibStatus = inflate(stream, Z_SYNC_FLUSH);
+
+		/* The amount of data just written to the target buffer. */
+		growth = length - stream->avail_out;
+
+		/* Adjust the read buffer. */
+		memmove(cookie->buffer, stream->next_in, \
+			(size_t) stream->avail_in);
+		stream->next_in = (Bytef *) cookie->buffer;
+
+		/* Adjust the target buffer. */
+		flushed += growth;
+		buffer += growth;
+		length = stream->avail_out;
+
+		/* Deal with errors. */
+		switch (zlibStatus) {
+		case Z_OK:
+			break;
+		case Z_STREAM_END:
+			length = 0;
+			break;
+		case Z_BUF_ERROR:
+			/* The read buffer is too small, try to double it. */
+			tmpBuffer = malloc(cookie->bufferSize << 1);
+			if (!tmpBuffer) /* errno == ENOMEM */
+				return((size_t) -1);
+			moveBuffer(cookie, tmpBuffer, cookie->bufferSize << 1);
+			break;
+		case Z_NEED_DICT: case Z_DATA_ERROR: case Z_STREAM_ERROR:
+			errno = EIO;
+			return((size_t) -1);
+		case Z_MEM_ERROR:
+			errno = ENOMEM;
+			return((size_t) -1);
+		}
+	}
+
+	return(flushed);
+}
+
+/**
+ * Closes the decoding stream and frees all buffers.
+ *
+ * @brief
+ *	Closes the decoding stream.
+ * @param cookie
+ *	Contains all the data necessary to maintain the stream.
+ * @return
+ *	Always 0 for success.
+ */
+int zlibClose(struct zlibStream * cookie) {
+	inflateEnd(&(cookie->stream));
+	free(cookie->buffer);
+	free(cookie);
+	return(0);
+}
+
diff -Pur /usr/src/lib/libfetch.orig/httpdecode.h /usr/src/lib/libfetch/httpdecode.h
--- /usr/src/lib/libfetch.orig/httpdecode.h	1970-01-01 01:00:00.000000000 +0100
+++ /usr/src/lib/libfetch/httpdecode.h	2008-07-07 00:56:36.000000000 +0200
@@ -0,0 +1,116 @@
+/*
+ * I wrote this and I say you can do whatever you want with it. Period.
+ * However, I'd love to hear from you what you've done.
+ *
+ * Dominic Fandrey <kamikaze@bsdforen.de>
+ */
+
+#ifndef HTTPDECODE_H
+#define HTTPDECODE_H
+
+/**
+ * \file httpdecode.h
+ *
+ * This file contains the public prototypes and defines required to read
+ * compressed data streams. Supported formats are those listed in
+ * RFC2616 section 3.5 (HTTP 1.1 content encodings). Compress decoding is
+ * not yet implemented.
+ *
+ * @brief
+ *	Public defines and prototypes to decode encoded HTTP streams.
+ * @see
+ *	http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.5
+ * @see
+ *	httpdecode.c
+ * @author
+ *	Dominic Fandrey <kamikaze@bsdforen.de>
+ * @version
+ *	0.3.99.2008.07.07
+ */
+
+/**
+ * This can be used to abuse httpDecodeRandom as a random (read) access layer
+ * for any FILE stream.
+ *
+ * @brief
+ *	The source stream is not encoded.
+ */
+#define ENCODING_RAW		0
+
+/**
+ * @brief
+ *	The source stream is deflate encoded.
+ * @see
+ *	zlib(3)
+ */
+#define ENCODING_DEFLATE	1
+
+/**
+ * @brief
+ *	The source stream is gzip encoded.
+ * @see
+ *	gzip(1)
+ * @see
+ *	zlib(3)
+ */
+#define ENCODING_GZIP		2
+
+/**
+ * @brief
+ *	The source stream is compress encoded.
+ * @see
+ *	compress(1)
+ */
+#define ENCODING_COMPRESS	3
+
+
+/**
+ * Opens a given stream for decoding and returns a FILE handle that can be
+ * used with the read and close function. Internally funopen is used
+ * to achieve this.
+ *
+ * @param source
+ *	The stream to read the encoded data from.
+ * @param encoding
+ *	The encoding type of the source stream.
+ * @param length
+ *	The length of the source stream. Use 0 if unknown.
+ * @return
+ *	Returns a FILE handle to read an encoded stream.
+ * @see
+ *	funopen(3)
+ * @see
+ *	fread(3)
+ * @see
+ *	fclose(3)
+ */
+FILE * httpDecode(FILE * source, int encoding, size_t length);
+
+/**
+ * This function is a wrapper around httpDecode that allows random access
+ * by writing the stream into a temporary file. The file is buffered
+ * by a given number of buffers in memory.
+ * Buffers are overwritten in LRU order.
+ *
+ * @brief
+ *	A file backed wrapper around httpDecode for random access.
+ * @param source
+ *	The stream to read the encoded data from.
+ * @param encoding
+ *	The encoding type of the source stream.
+ * @param length
+ *	The length of the source stream. Use 0 if unknown.
+ * @param bufferSize
+ *	The size of a buffer.
+ * @param buffers
+ *	The number of buffers.
+ * @return
+ *	Returns a FILE handle to read an encoded stream.
+ */
+/* TODO
+FILE * httpDecodeRandom(FILE * source, int encoding, size_t length,
+	size_t bufferSize, size_t buffers);
+*/
+
+#endif /* HTTPDECODE_H */
+
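
One detail of the patch above that a sketch may make clearer is the length convention. http.c initialises size (an off_t) to -1, while httpDecode() documents 0 as "unknown" for its size_t length parameter. The following is a minimal sketch, not part of the patch, of one way to make that conversion explicit and of the read clamp that zlibRead() computes inline; decode_length and read_budget are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/*
 * Hypothetical helper: convert an off_t length, where a negative value
 * means "unknown", to the size_t convention documented for httpDecode(),
 * where 0 means "unknown".  Values too large for size_t are also
 * reported as unknown.
 */
static size_t
decode_length(off_t len)
{
	if (len <= 0 || (uintmax_t)len > SIZE_MAX)
		return (0);
	return ((size_t)len);
}

/*
 * Number of encoded bytes that may still be read into the input buffer:
 * the free buffer space, capped by what is left of a known length.
 */
static size_t
read_budget(size_t buf_free, size_t length, size_t consumed)
{
	size_t left;

	if (length == 0)		/* length unknown: no cap */
		return (buf_free);
	assert(consumed <= length);
	left = length - consumed;
	return (left < buf_free ? left : buf_free);
}
```

With these helpers the call site could pass decode_length(size) instead of relying on the implicit conversion of -1 to a very large size_t. Note that a body of exactly zero bytes is indistinguishable from "unknown" under this convention.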
Comment 4 Dag-Erling Smørgrav 2008-07-07 15:04:59 UTC
Dominic Fandrey <kamikaze@bsdforen.de> writes:
> Here is a newer version with some serious bugs fixed.

It addresses none of the issues I raised.

DES
-- 
Dag-Erling Smørgrav - des@des.no
Comment 5 kamikaze 2008-07-07 20:34:04 UTC
I know of no issues you raised. Anyway, this one hopefully comes without
the destructive adding of newlines that happens between thunderbird and
GNATS.


Files src/lib/libfetch.orig/.http.c.swp and src/lib/libfetch/.http.c.swp differ
diff -Pur src/lib/libfetch.orig/Makefile src/lib/libfetch/Makefile
--- src/lib/libfetch.orig/Makefile	2008-07-07 00:56:11.000000000 +0200
+++ src/lib/libfetch/Makefile	2008-07-07 00:56:36.000000000 +0200
@@ -4,7 +4,7 @@

 LIB=		fetch
 CFLAGS+=	-I.
-SRCS=		fetch.c common.c ftp.c http.c file.c \
+SRCS=		fetch.c common.c ftp.c http.c httpdecode.c file.c \
 		ftperr.h httperr.h
 INCS=		fetch.h
 MAN=		fetch.3
@@ -20,6 +20,8 @@
 LDADD=		-lssl -lcrypto
 .endif

+LDADD+=		-lz
+
 CFLAGS+=	-DFTP_COMBINE_CWDS

 CSTD?=		c99
diff -Pur src/lib/libfetch.orig/http.c src/lib/libfetch/http.c
--- src/lib/libfetch.orig/http.c	2008-07-07 00:56:11.000000000 +0200
+++ src/lib/libfetch/http.c	2008-07-07 21:29:43.000000000 +0200
@@ -82,6 +82,7 @@
 #include "fetch.h"
 #include "common.h"
 #include "httperr.h"
+#include "httpdecode.h"

 /* Maximum number of redirects to follow */
 #define MAX_REDIRECT 5
@@ -336,6 +337,7 @@
 	hdr_error = -1,
 	hdr_end = 0,
 	hdr_unknown = 1,
+	hdr_content_encoding,
 	hdr_content_length,
 	hdr_content_range,
 	hdr_last_modified,
@@ -349,6 +351,7 @@
 	hdr_t		 num;
 	const char	*name;
 } hdr_names[] = {
+	{ hdr_content_encoding,		"Content-Encoding" },
 	{ hdr_content_length,		"Content-Length" },
 	{ hdr_content_range,		"Content-Range" },
 	{ hdr_last_modified,		"Last-Modified" },
@@ -496,6 +499,21 @@
 }

 /*
+ * Parse a content-encoding header
+ */
+static int
+http_parse_encoding(const char *p)
+{
+	if (strcmp("gzip", p) == 0)
+		return(ENCODING_GZIP);
+	if (strcmp("deflate", p) == 0)
+		return(ENCODING_DEFLATE);
+	if (strcmp("compress", p) == 0)
+		return(ENCODING_COMPRESS);
+	return(ENCODING_RAW);
+}
+
+/*
  * Parse a content-length header
  */
 static int
@@ -800,7 +818,7 @@
 	conn_t *conn;
 	struct url *url, *new;
 	int chunked, direct, need_auth, noredirect, verbose;
-	int e, i, n, val;
+	int e, i, n, val, encoding;
 	off_t offset, clength, length, size;
 	time_t mtime;
 	const char *p;
@@ -834,6 +852,7 @@
 		length = -1;
 		size = -1;
 		mtime = 0;
+		encoding = ENCODING_RAW;

 		/* check port */
 		if (!url->port)
@@ -919,6 +938,7 @@
 			http_cmd(conn, "User-Agent: %s " _LIBFETCH_VER, getprogname());
 		if (url->offset > 0)
 			http_cmd(conn, "Range: bytes=%lld-", (long long)url->offset);
+		http_cmd(conn, "Accept-Encoding: gzip,deflate,compress");
 		http_cmd(conn, "Connection: close");
 		http_cmd(conn, "");

@@ -999,6 +1019,9 @@
 			case hdr_error:
 				http_seterr(HTTP_PROTOCOL_ERROR);
 				goto ouch;
+			case hdr_content_encoding:
+				encoding = http_parse_encoding(p);
+				break;
 			case hdr_content_length:
 				http_parse_length(p, &clength);
 				break;
@@ -1119,7 +1142,9 @@

 	/* fill in stats */
 	if (us) {
-		us->size = size;
+		/* we can only predict the size of unencoded streams */
+		if (encoding == ENCODING_RAW)
+			us->size = size;
 		us->atime = us->mtime = mtime;
 	}

@@ -1139,6 +1164,12 @@
 		goto ouch;
 	}

+	/* wrap the decoder around it */
+	if ((f = httpDecode(f, encoding, size, SOURCE_CLOSE)) == NULL) {
+		fetch_syserr();
+		goto ouch;
+	}
+
 	if (url != URL)
 		fetchFreeURL(url);
 	if (purl)
diff -Pur src/lib/libfetch.orig/httpdecode.c src/lib/libfetch/httpdecode.c
--- src/lib/libfetch.orig/httpdecode.c	1970-01-01 01:00:00.000000000 +0100
+++ src/lib/libfetch/httpdecode.c	2008-07-07 19:35:25.000000000 +0200
@@ -0,0 +1,431 @@
+/*
+ * I wrote this and I say you can do whatever you want with it. Period.
+ * However, I'd love to hear from you what you've done.
+ *
+ * Dominic Fandrey <kamikaze@bsdforen.de>
+ */
+
+/**
+ * \file httpdecode.c
+ *
+ * This file contains the implementation of the prototypes defined in
+ * httpdecode.h.
+ *
+ * @brief
+ *	HTTP content decoding implementation.
+ * @see
+ *	httpdecode.h
+ * @author
+ *	Dominic Fandrey <kamikaze@bsdforen.de>
+ * @version
+ *	0.1.99.2008.07.07
+ */
+
+/* LINTLIBRARY */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <errno.h>
+#include <zlib.h>
+#include "httpdecode.h"
+
+/* PRIVATE STRUCTS */
+
+/**
+ * @brief
+ *	The necessary data to maintain a zlib decoding stream.
+ */
+struct zlibStream {
+	/**
+	 * @brief
+	 *	The original stream with the encoded data.
+	 */
+	FILE * source;
+
+	/**
+	 * @brief
+	 *	A read buffer for the encoded stream.
+	 */
+	char * buffer;
+
+	/**
+	 * This specifies the encoding of the data. The values
+	 *  ENCODING_GZIP and ENCODING_DEFLATE are possible.
+	 *
+	 * @brief
+	 *	The encoding type of the stream.
+	 */
+	int encoding;
+
+	/**
+	 * @brief
+	 *	The stream data used by zlib.
+	 */
+	z_stream stream;
+
+	/**
+	 * The length of the source stream. The value 0 means that the length
+	 * is unknown; higher values will be used to automatically close the
+	 * stream. This prevents overreading and allows the continued use
+	 * of the underlying HTTP stream.
+	 *
+	 * @brief
+	 *	The length of the encoded source stream.
+	 */
+	size_t length;
+
+	/**
+	 * @brief
+	 *	The amount of data that has been read.
+	 */
+	size_t read;
+
+	/**
+	 * @brief
+	 *	The size of the buffer for encoded data.
+	 */
+	size_t bufferSize;
+
+	/**
+	 * If set to SOURCE_CLOSE the source stream will be closed together
+	 * with the decoded stream in zlibClose. To preserve the source stream
+	 * this should be set to SOURCE_KEEP.
+	 *
+	 * @brief
+	 *	Controls whether the source stream may be closed.
+	 */
+	int sourceControl;
+};
+
+/* PRIVATE PROTOTYPES */
+void moveBuffer(struct zlibStream * cookie, char * newBuffer, size_t size);
+FILE * zlibOpen(struct zlibStream * cookie);
+size_t zlibRead(struct zlibStream * cookie, char * buffer, size_t length);
+int zlibClose(struct zlibStream * cookie);
+/* TODO
+FILE * compressOpen(struct compressStream * cookie);
+size_t compressRead(struct compressStream * cookie, char * buffer, \
+	size_t length);
+int compressClose(struct compressStream * cookie);
+FILE * randomOpen(struct randomStream * cookie);
+size_t randomRead(struct randomStream * cookie, char * buffer, size_t length);
+int randomSeek(struct randomStream * cookie, off_t offset, int whence);
+int randomClose(struct randomStream * cookie);
+*/
+
+/* PUBLIC FUNCTIONS */
+
+/**
+ * Opens a given stream for decoding and returns a FILE handle that can be
+ * used with the fread and fclose function. Internally funopen is used
+ * to achieve this.
+ *
+ * In case of failure NULL is returned and errno is set to EINVAL for
+ * invalid parameters and ENOMEM for insufficient memory.
+ *
+ * @brief
+ *	Open a FILE stream to read decoded data from.
+ * @param source
+ *	The stream to read the encoded data from.
+ * @param encoding
+ *	The encoding type of the source stream.
+ * @param length
+ *	The length of the source stream. Use 0 if unknown.
+ * @param sourceControl
+ *	Set this to SOURCE_CLOSE or SOURCE_KEEP.
+ * @return
+ *	Returns a FILE handle to read an encoded stream.
+ * @see
+ *	funopen(3)
+ * @see
+ *	fread(3)
+ * @see
+ *	fclose(3)
+ */
+FILE * httpDecode(FILE * source, int encoding, size_t length, \
+	int sourceControl) {
+	struct zlibStream * zlibCookie;
+	FILE * stream = NULL;
+
+	switch (encoding) {
+	case ENCODING_RAW:
+		return(source);
+	case ENCODING_GZIP: case ENCODING_DEFLATE:
+		zlibCookie = malloc(sizeof(struct zlibStream));
+		if (zlibCookie == NULL) /* errno == ENOMEM */
+			break;
+		zlibCookie->source = source;
+		zlibCookie->length = length;
+		zlibCookie->read = 0;
+		zlibCookie->encoding = encoding;
+		zlibCookie->sourceControl = sourceControl;
+		stream = zlibOpen(zlibCookie);
+		if (stream != NULL)
+			return(stream);
+		break;
+	case ENCODING_COMPRESS:
+		errno = EIO;
+		break;
+	}
+
+	/* This is where we end up in case of an error. */
+	if (sourceControl == SOURCE_CLOSE)
+		fclose(source);
+	return(NULL);
+}
+
+/**
+ * This function is a wrapper around httpDecode that allows random access
+ * by writing the stream into a temporary file. The file is buffered
+ * by a given number of buffers in memory.
+ * Buffers are overwritten in LRU order.
+ *
+ * @brief
+ *	A file backed wrapper around httpDecode for random access.
+ * @param source
+ *	The stream to read the encoded data from.
+ * @param encoding
+ *	The encoding type of the source stream.
+ * @param length
+ *	The length of the source stream. Use 0 if unknown.
+ * @param bufferSize
+ *	The size of a buffer.
+ * @param buffers
+ *	The number of buffers.
+ * @param sourceControl
+ *	Set this to 1 to close the source stream with the decoded stream.
+ * @return
+ *	Returns a FILE handle to read an encoded stream.
+ */
+/* TODO
+FILE * httpDecodeRandom(FILE * source, int encoding, size_t length,
+	size_t bufferSize, size_t buffers, int sourceControl) {
+	return(source);
+}
+*/
+
+/* PRIVATE FUNCTIONS */
+
+/**
+ * This function replaces the read buffer in the cookie with the new buffer.
+ * The old buffer is freed but the contents are saved in the new buffer.
+ * However, no security checks are performed.
+ * That means that newBuffer must at least have the same size as the old one.
+ *
+ * @brief
+ *	Replace the current read buffer.
+ * @param cookie
+ *	Contains all the data necessary to maintain the stream.
+ * @param newBuffer
+ *	The new buffer to use.
+ */
+void moveBuffer(struct zlibStream * cookie, char * newBuffer, size_t size) {
+	memmove(newBuffer, cookie->buffer, (size_t) cookie->stream.avail_in);
+	free(cookie->buffer);
+	cookie->buffer = newBuffer;
+	cookie->bufferSize = size;
+}
+
+/**
+ * This function initializes a zlib stream and creates the file handler
+ * that will later be used to pull data from the stream.
+ *
+ * Upon any kind of failure errno is set to one of the following values:
+ * EINVAL	This can either indicate that an unsupported encoding
+ *		was given or that this code and the used zlib implementation
+ *		are incompatible.
+ * ENOMEM	Indicates that the available memory is insufficient for
+ *		the decode buffer, zlib or funopen.
+ *
+ * @brief
+ *	Open a zlib stream.
+ * @param cookie
+ *	Contains all the data necessary to maintain the stream.
+ * @return
+ *	A FILE* pointer or NULL in case of failure.
+ */
+FILE * zlibOpen(struct zlibStream * cookie) {
+	int wbits;
+	z_stream * stream = &(cookie->stream);
+
+	/* Set window bits for the selected encoding. */
+	switch(cookie->encoding) {
+	case ENCODING_DEFLATE:
+		wbits = -MAX_WBITS;
+		break;
+	case ENCODING_GZIP:
+		wbits = MAX_WBITS + 16;
+		break;
+	default:
+		errno = EINVAL;
+		return(NULL);
+	}
+
+	/* Create the decoding buffer. */
+	cookie->bufferSize = 512;
+	cookie->buffer = malloc(cookie->bufferSize);
+	if (cookie->buffer == NULL)
+		return(NULL); /* errno == ENOMEM */
+
+	/* Initialize zlib stream data. */
+	stream->zalloc = Z_NULL;
+	stream->zfree = Z_NULL;
+	stream->opaque = Z_NULL;
+	stream->avail_in = 0;
+	stream->next_in = (Bytef *) cookie->buffer;
+
+	/* Initialize stream for decoding. */
+	switch(inflateInit2(stream, wbits)) {
+	case Z_OK:
+		errno = 0;
+		break;
+	case Z_MEM_ERROR:
+		errno = ENOMEM;
+		break;
+	case Z_STREAM_ERROR: /* This is not supposed to happen. */
+		errno = EINVAL;
+		break;
+	}
+	if (errno) {
+		free(cookie->buffer);
+		return(NULL);
+	}
+
+	/* Create the file stream to return. */
+	return(funopen(cookie,(int (*)(void *, char *, int)) zlibRead,
+		NULL, NULL, (int (*)(void *)) zlibClose));
+}
+
+/**
+ * Writes a chunk of decoded data to the given buffer.
+ *
+ * In case of an error (size_t) -1 is returned to indicate to the funopen
+ * wrapper that an error occurred. In such a case errno is set to EIO.
+ *
+ * An error does not cause the stream to be closed.
+ *
+ * @brief
+ *	Read decoded data from the encoded stream.
+ * @param cookie
+ *	Contains all the data necessary to maintain the stream.
+ * @param buffer
+ *	The buffer to write the decoded data to.
+ * @param length
+ *	The space available in the buffer.
+ * @return
+ *	The number of bytes written to the buffer or (size_t) -1 in case of
+ *	failure.
+ */
+size_t zlibRead(struct zlibStream * cookie, char * buffer, size_t length) {
+	char * tmpBuffer;
+	size_t growth, maxRead, bufferAvailable, flushed;
+	int zlibStatus;
+	z_stream * stream = &(cookie->stream);
+
+	/*
+	 * Adjust buffer size if the target buffer is larger than 2 times
+	 * the source buffer.
+	 */
+	if ((length >> 1) > cookie->bufferSize) {
+		tmpBuffer = malloc(length >> 1);
+
+		/*
+		 * If creating a new buffer fails pretend never to have
+		 * attempted it.
+		 */
+		if (tmpBuffer == NULL)
+			errno = 0;
+		else
+			/* Move data from the old buffer to the new one. */
+			moveBuffer(cookie, tmpBuffer, length >> 1);
+	}
+
+	/* Run until the target buffer has been filled. */
+	flushed = 0;
+	while (length > 0) {
+		/* If the input buffer is not full, fill it. */
+		growth = 0;
+		bufferAvailable = cookie->bufferSize - stream->avail_in;
+		if (cookie->length) {
+			maxRead = cookie->length - cookie->read;
+			bufferAvailable = (maxRead < bufferAvailable \
+				? maxRead : bufferAvailable);
+		}
+		if (bufferAvailable > 0) {
+			growth = fread(cookie->buffer + stream->avail_in, \
+				sizeof(char), bufferAvailable, cookie->source);
+			/* Forward errors. */
+			if (ferror(cookie->source))
+				return((size_t) -1);
+			stream->avail_in += growth;
+			cookie->read += growth;
+			if (feof(cookie->source))
+				cookie->length = cookie->read;
+		}
+
+		/* Decode data from the read to the target buffer. */
+		stream->next_in = (Bytef *) cookie->buffer;
+		stream->avail_out = length;
+		stream->next_out = (Bytef *) buffer;
+		zlibStatus = inflate(stream, Z_SYNC_FLUSH);
+
+		/* The amount of data just written to the target buffer. */
+		growth = length - stream->avail_out;
+
+		/* Adjust the read buffer. */
+		memmove(cookie->buffer, stream->next_in, \
+			(size_t) stream->avail_in);
+		stream->next_in = (Bytef *) cookie->buffer;
+
+		/* Adjust the target buffer. */
+		flushed += growth;
+		buffer += growth;
+		length = stream->avail_out;
+
+		/* Deal with errors. */
+		switch (zlibStatus) {
+		case Z_OK:
+			break;
+		case Z_STREAM_END:
+			length = 0;
+			break;
+		case Z_BUF_ERROR:
+			/* The read buffer is too small, try to double it. */
+			tmpBuffer = malloc(cookie->bufferSize << 1);
+			if (!tmpBuffer) /* errno == ENOMEM */
+				return((size_t) -1);
+			moveBuffer(cookie, tmpBuffer, cookie->bufferSize << 1);
+			break;
+		case Z_NEED_DICT: case Z_DATA_ERROR: case Z_STREAM_ERROR:
+			errno = EIO;
+			return((size_t) -1);
+		case Z_MEM_ERROR:
+			errno = ENOMEM;
+			return((size_t) -1);
+		}
+	}
+
+	return(flushed);
+}
+
+/**
+ * Closes the decoding stream and frees all buffers.
+ *
+ * @brief
+ *	Closes the decoding stream.
+ * @param cookie
+ *	Contains all the data necessary to maintain the stream.
+ * @return
+ *	Always 0 for success.
+ */
+int zlibClose(struct zlibStream * cookie) {
+	inflateEnd(&(cookie->stream));
+	if (cookie->sourceControl == SOURCE_CLOSE)
+		fclose(cookie->source);
+	free(cookie->buffer);
+	free(cookie);
+	return(0);
+}
+
diff -Pur src/lib/libfetch.orig/httpdecode.h src/lib/libfetch/httpdecode.h
--- src/lib/libfetch.orig/httpdecode.h	1970-01-01 01:00:00.000000000 +0100
+++ src/lib/libfetch/httpdecode.h	2008-07-07 19:04:26.000000000 +0200
@@ -0,0 +1,134 @@
+/*
+ * I wrote this and I say you can do whatever you want with it. Period.
+ * However, I'd love to hear from you what you've done.
+ *
+ * Dominic Fandrey <kamikaze@bsdforen.de>
+ */
+
+#ifndef HTTPDECODE_H
+#define HTTPDECODE_H
+
+/**
+ * \file httpdecode.h
+ *
+ * This file contains the public prototypes and defines required to read
+ * compressed data streams. Supported formats are those listed in
+ * RFC2616 section 3.5 (HTTP 1.1 content encodings). Compress decoding is
+ * not yet implemented.
+ *
+ * @brief
+ *	Public defines and prototypes to decode encoded HTTP streams.
+ * @see
+ *	http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.5
+ * @see
+ *	httpdecode.c
+ * @author
+ *	Dominic Fandrey <kamikaze@bsdforen.de>
+ * @version
+ *	0.3.99.2008.07.07
+ */
+
+/**
+ * This can be used to abuse httpDecodeRandom as a random (read) access layer
+ * for any FILE stream.
+ *
+ * @brief
+ *	The source stream is not encoded.
+ */
+#define ENCODING_RAW		0
+
+/**
+ * @brief
+ *	The source stream is deflate encoded.
+ * @see
+ *	zlib(3)
+ */
+#define ENCODING_DEFLATE	1
+
+/**
+ * @brief
+ *	The source stream is gzip encoded.
+ * @see
+ *	gzip(1)
+ * @see
+ *	zlib(3)
+ */
+#define ENCODING_GZIP		2
+
+/**
+ * @brief
+ *	The source stream is compress encoded.
+ * @see
+ *	compress(1)
+ */
+#define ENCODING_COMPRESS	3
+
+/**
+ * @brief
+ *	The source stream should be kept open.
+ */
+#define SOURCE_KEEP		0
+
+/**
+ * @brief
+ *	The source stream should be closed with the decoded stream.
+ */
+#define SOURCE_CLOSE		1
+
+
+
+/**
+ * Opens a given stream for decoding and returns a FILE handle that can be
+ * used with the read and close function. Internally funopen is used
+ * to achieve this.
+ *
+ * @param source
+ *	The stream to read the encoded data from.
+ * @param encoding
+ *	The encoding type of the source stream.
+ * @param length
+ *	The length of the source stream. Use 0 if unknown.
+ * @param sourceControl
+ *	Set this to SOURCE_CLOSE or SOURCE_KEEP.
+ * @return
+ *	Returns a FILE handle to read an encoded stream.
+ * @see
+ *	funopen(3)
+ * @see
+ *	fread(3)
+ * @see
+ *	fclose(3)
+ */
+FILE * httpDecode(FILE * source, int encoding, size_t length, \
+	int sourceControl);
+
+/**
+ * This function is a wrapper around httpDecode that allows random access
+ * by writing the stream into a temporary file. The file is buffered
+ * by a given number of buffers in memory.
+ * Buffers are overwritten in LRU order.
+ *
+ * @brief
+ *	A file backed wrapper around httpDecode for random access.
+ * @param source
+ *	The stream to read the encoded data from.
+ * @param encoding
+ *	The encoding type of the source stream.
+ * @param length
+ *	The length of the source stream. Use 0 if unknown.
+ * @param bufferSize
+ *	The size of a buffer.
+ * @param buffers
+ *	The number of buffers.
+ * @param sourceControl
+ *	Set this to SOURCE_CLOSE or SOURCE_KEEP.
+ * @return
+ *	Returns a FILE handle to read an encoded stream.
+ */
+/* TODO
+FILE * httpDecodeRandom(FILE * source, int encoding, size_t length,
+	size_t bufferSize, size_t buffers, int sourceControl);
+*/
+
+#endif /* HTTPDECODE_H */
+
Comment 6 kamikaze 2008-07-08 20:38:39 UTC
The code is for my personal use, adding it to libfetch is only a bonus.
So to me httpDecodeRandom() is a good idea. There's no reason to include
it in libfetch, though.
I can adjust the style when the implementation is ready, this does not
concern me yet. In that stage I can as well include the code into http.c.

About function naming, there is no consistent function naming style in
libfetch. Should I follow the style of fetch.c or http.c?

I WANT to do compress, so you either have to take what I give or patch it
away when I'm done. I have even received requests for the inclusion of
compress. It simply seems to make things more /complete/ to me.

Avoiding nested funopen would certainly require an overhaul of http.c.
But this doesn't really matter, because I see no reasonable way of
achieving RFC2616 conformity without it, because it allows nested
encodings (§14.11) and thus requires nested decoding.
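
To make the nesting point concrete: RFC 2616 §14.11 requires that multiple content codings be listed in the order in which they were applied, so a client has to undo them last to first. The following is only a minimal sketch of that ordering, not code from the patch or from libfetch; `parse_codings`, `coding_from_token`, `enum coding` and `MAX_CODINGS` are hypothetical names chosen for illustration.

```c
#include <ctype.h>
#include <stddef.h>
#include <strings.h>

#define MAX_CODINGS 8	/* hypothetical limit, for illustration only */

enum coding { CODING_UNKNOWN, CODING_DEFLATE, CODING_GZIP, CODING_COMPRESS };

/* Map one token (not NUL-terminated) to a coding; names are case-insensitive. */
static enum coding
coding_from_token(const char *tok, size_t len)
{
	if (len == 7 && strncasecmp(tok, "deflate", len) == 0)
		return (CODING_DEFLATE);
	if ((len == 4 && strncasecmp(tok, "gzip", len) == 0) ||
	    (len == 6 && strncasecmp(tok, "x-gzip", len) == 0))
		return (CODING_GZIP);
	if ((len == 8 && strncasecmp(tok, "compress", len) == 0) ||
	    (len == 10 && strncasecmp(tok, "x-compress", len) == 0))
		return (CODING_COMPRESS);
	return (CODING_UNKNOWN);
}

/*
 * Split a Content-Encoding value such as "deflate, gzip" and store the
 * codings in decode order, i.e. the reverse of the order listed.
 * Returns the number of codings found, or -1 if there are too many.
 */
static int
parse_codings(const char *value, enum coding out[MAX_CODINGS])
{
	enum coding applied[MAX_CODINGS];
	const char *p = value, *start;
	int n = 0, i;

	while (*p != '\0') {
		/* Skip separators and whitespace between tokens. */
		while (*p == ',' || isspace((unsigned char)*p))
			p++;
		if (*p == '\0')
			break;
		start = p;
		while (*p != '\0' && *p != ',' && !isspace((unsigned char)*p))
			p++;
		if (n == MAX_CODINGS)
			return (-1);
		applied[n++] = coding_from_token(start, (size_t)(p - start));
	}
	/* The coding applied last must be removed first. */
	for (i = 0; i < n; i++)
		out[i] = applied[n - 1 - i];
	return (n);
}
```

For a header value of "deflate, gzip" the sketch yields gzip first and deflate second, which is the order in which decoding layers would have to be stacked on the source stream.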

Also funopen offers a transparent view on the data that allows me to
add the features without a deep understanding of the underlying layers.
Or in other words, I'm too lazy to take another approach.
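
The layering that funopen(3) makes possible can be sketched without funopen itself, which is BSD-specific. The example below is a hedged illustration only: it reuses the shape of a funopen read callback, `int (*)(void *, char *, int)`, and stacks a trivial ROT13 transform (a stand-in for inflate) on top of a memory source. `struct layer`, `struct mem_source`, `mem_read` and `rot13_read` are hypothetical names and do not appear in the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical reader layer with the same shape as a funopen(3) readfn. */
struct layer {
	int	(*readfn)(void *cookie, char *buf, int len);
	void	*cookie;
};

/* Bottom layer state: a fixed region of memory acting as the source. */
struct mem_source {
	const char	*data;
	size_t		 len;
	size_t		 pos;
};

/* Bottom layer: copy up to len bytes from memory; returns 0 at end of data. */
static int
mem_read(void *cookie, char *buf, int len)
{
	struct mem_source *src = cookie;
	size_t n = src->len - src->pos;

	if (len <= 0)
		return (0);
	if (n > (size_t)len)
		n = (size_t)len;
	memcpy(buf, src->data + src->pos, n);
	src->pos += n;
	return ((int)n);
}

/*
 * Upper layer: read from the layer below and undo ROT13 in place.
 * It never needs to know what kind of source sits underneath.
 */
static int
rot13_read(void *cookie, char *buf, int len)
{
	struct layer *below = cookie;
	int i, n;

	n = below->readfn(below->cookie, buf, len);
	for (i = 0; i < n; i++) {
		char c = buf[i];

		if (c >= 'a' && c <= 'z')
			buf[i] = (char)('a' + (c - 'a' + 13) % 26);
		else if (c >= 'A' && c <= 'Z')
			buf[i] = (char)('A' + (c - 'A' + 13) % 26);
	}
	return (n);
}
```

Each layer only calls the read function of the layer beneath it, so further decoders can be stacked without touching the lower ones. That is the transparency argument made above; the cost, as noted in the review, is a separate code path per layer.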
Comment 7 Dag-Erling Smørgrav 2008-07-09 09:46:08 UTC
Dominic Fandrey <kamikaze@bsdforen.de> writes:
> The code is for my personal use, adding it to libfetch is only a
> bonus.

Your personal use is of no interest to me.  You have submitted a patch
to FreeBSD, so I must evaluate it in the context of FreeBSD.

> I can adjust the style when the implementation is ready, this does not
> concern me yet. In that stage I can as well include the code into
> http.c.

What is the point of submitting a patch that you know isn't even close
to being committable?

> About function naming, there is no consistent function naming style in
> libfetch. Should I follow the style of fetch.c or http.c?

There is a consistent style for internal functions.

> I WANT to do compress,

why?

> It simply seems to make things more /complete/ to me.

"the only thing that is worse than generalizing from one example is
generalizing from zero examples"

How are you going to test COMPRESS?  Apache doesn't support it, nor does
IIS (to the best of my knowledge).

DES
-- 
Dag-Erling Smørgrav - des@des.no
Comment 8 kamikaze 2008-07-09 17:45:48 UTC
I have web applications that support compress.

It would be nice to get CCed.
Comment 9 kamikaze 2010-04-10 10:08:15 UTC
I abandoned this endeavour for the sake of more pressing matters.

Feel free to close this.

Thank you for your time and feedback.
Comment 10 Alexander Best freebsd_committer 2010-11-15 13:51:04 UTC
State Changed
From-To: open->closed

Originator requested that this PR should be closed.