sendfile(2) currupts data for files on tmpfs Fix: Not known. How-To-Repeat: 1. Create tmpfs and write test file # mount -t tmpfs tmpfs /spool/tmpfs # echo 'this is test file' > /spool/tmpfs/tt # cat /spool/tmpfs/tt this is test file 2. Transfer this file using sendfile. E. g. using web server from port www/nginx # cat nginx.conf events { worker_connections 1024; } http { default_type application/octet-stream; access_log off; sendfile on; server { listen 8081; location / { root /spool/tmpfs; } } } # nginx -c nginx.conf Content of fetched file will be different garbage on each request: # fetch -qo - http://127.0.0.1:8081/tt | hd 00000000 d4 98 7a 00 08 00 00 00 dc 98 7a 00 08 00 00 00 |Ôz.....Üz.....| 00000010 e4 98 |ä| 00000012 # fetch -qo - http://127.0.0.1:8081/tt | md5 ff035bff2dcf972ee7dfd023455997ef # fetch -qo - http://127.0.0.1:8081/tt | md5 bb07cd1a0ede3c7af24e4a86a14b0493 # fetch -qo - http://127.0.0.1:8081/tt | md5 e84539257c2650da866a186435acc93d With sendfile disbled in nginx.conf file is not corrupted, but we need sendfile support for other application (where only sendfile supported).
Responsible Changed From-To: freebsd-bugs->freebsd-fs Over to maintainer(s).
Hi, I investigated this a bit. First, note that this bug has some security implications, because it appears that the garbage written by sendfile is kernel memory contents, which could contain something sensitive. It is sufficient for an attacker to have read access to a file on a mounted tmpfs. So it should really get fixed. I'm not terribly familiar with vfs or vm internals, but it appears that sendfile causes VOP_READ to be called with the IO_VMIO flag and a dummy uio. tmpfs_read (in sys/fs/tmpfs/tmpfs_vnops.c) doesn't handle this correctly; it always just copies the data to the supplied uio, which in this case does nothing. It looks like the data is supposed to make it into vn->v_object, and tmpfs_read doesn't do that. (If I understand it correctly, on a normal filesystem this is taken care of by bread().) I am not sure what the correct semantics of IO_VMIO are supposed to be, so I don't know what the correct fix would be. However, a quick fix is to not have a v_object at all; remove the call to vnode_create_vobject in tmpfs_open. This seems to be legal since procfs, etc, work that way. It does however mean that sendfile doesn't work at all. I am curious what was the point of having a v_object in the first place, since the data is already in virtual memory. Unless the goal was just to make sendfile work, which evidently wasn't successful. Incidentally, to the initial reporter, what application do you have that requires sendfile? In my experience, most things will fall back to a read/write loop if sendfile fails, since sendfile isn't available on all systems or under all circumstances. So if you apply the quick fix so that sendfile always fails, it might at least get your application working again. -- Nate Eldredge neldredge@math.ucsd.edu
Hello, On Mon, 6 Oct 2008, 06:40-0000, Nate Eldredge wrote: [...] > Incidentally, to the initial reporter, what application do you have > that requires sendfile? In my experience, most things will fall > back to a read/write loop if sendfile fails, since sendfile isn't > available on all systems or under all circumstances. So if you > apply the quick fix so that sendfile always fails, it might at > least get your application working again. > As stated in the PR Andrey used nginx (ports/www/nginx). But I could easily reproduce the bug with the stock ftpd(8) with the ftproot on tmpfs. -- Maxim Konovalov
> Incidentally, to the initial reporter, what application do you have that > requires sendfile? We want to use tmpfs with our homegrown application, which can work only using sendfile(). Currently we use md+ufs, but with md data present in memory twice - in md and in VM cache. -- Anton Yuzhaninov
On Mon, 6 Oct 2008, Maxim Konovalov wrote: > Hello, > > On Mon, 6 Oct 2008, 06:40-0000, Nate Eldredge wrote: > > [...] >> Incidentally, to the initial reporter, what application do you have >> that requires sendfile? In my experience, most things will fall >> back to a read/write loop if sendfile fails, since sendfile isn't >> available on all systems or under all circumstances. So if you >> apply the quick fix so that sendfile always fails, it might at >> least get your application working again. >> > As stated in the PR Andrey used nginx (ports/www/nginx). But I could > easily reproduce the bug with the stock ftpd(8) with the ftproot on > tmpfs. To simplify matters further, here is the testcase I used when testing this, which uses sendfile to send some data over a unix domain socket. Do: ./server /tmpfs/data mysocket & ./client mysocket >data.out cmp /tmpfs/data data.out If things work right, data and data.out should be identical. But if data is a file on a tmpfs, data.out contains apparently random kernel memory contents. # This is a shell archive. Save it in a file, remove anything before # this line, and then unpack it by entering "sh file". Note, it may # create directories; files and directories will be owned by you and # have default permissions. # # This archive contains: # # Makefile # client.c # server.c # util.c # util.h # echo x - Makefile sed 's/^X//' >Makefile << 'END-of-Makefile' XCC = gcc XCFLAGS = -Wall -W -g X Xall : server client X Xserver : server.o util.o X $(CC) -o $@ $> X Xclient : client.o util.o X $(CC) -o $@ $> X Xserver.o client.o util.o : util.h X Xclean : X rm -f server client *.o END-of-Makefile echo x - client.c sed 's/^X//' >client.c << 'END-of-client.c' X#include <stdio.h> X#include <fcntl.h> X#include <unistd.h> X#include <stdlib.h> X#include "util.h" X Xint main(int argc, char *argv[]) { X int s; X if (argc < 2) { X fprintf(stderr, "Usage: %s socketname\n", argv[0]); X exit(1); X } X if ((s = connect_unix_socket(argv[1])) < 0) { X exit(1); X } X fake_sendfile(s, 1); X return 0; X} X X X END-of-client.c echo x - server.c sed 's/^X//' >server.c << 'END-of-server.c' X#include <stdio.h> X#include <fcntl.h> X#include <unistd.h> X#include <stdlib.h> X#include "util.h" X Xint main(int argc, char *argv[]) { X int f, listener, connection; X if (argc < 3) { X fprintf(stderr, "Usage: %s filename socketname\n", argv[0]); X exit(1); X } X if ((f = open(argv[1], O_RDONLY)) < 0) { X perror(argv[1]); X exit(1); X } X if ((listener = listen_unix_socket(argv[2])) < 0) { X exit(1); X } X if ((connection = accept_unix_socket(listener)) >= 0) { X real_sendfile(f, connection); X } X return 0; X} X X X END-of-server.c echo x - util.c sed 's/^X//' >util.c << 'END-of-util.c' X/* send data from file to unix domain socket */ X X#include <stdio.h> X#include <time.h> X#include <signal.h> X#include <errno.h> X#include <sys/types.h> X#include <sys/socket.h> X#include <sys/un.h> X#include <string.h> X#include <stdlib.h> X#include <unistd.h> X Xint create_unix_socket(void) { X int fd; X if ((fd = socket(PF_LOCAL, SOCK_STREAM, 0)) < 0) { X perror("socket"); X return -1; X } X return fd; X} X Xint make_unix_sockaddr(const char *pathname, struct sockaddr_un *sa) { X memset(sa, 0, sizeof(*sa)); X sa->sun_family = PF_LOCAL; X if (strlen(pathname) + 1 > sizeof(sa->sun_path)) { X fprintf(stderr, "%s: pathname too long (max %lu)\n", X pathname, sizeof(sa->sun_path)); X errno = ENAMETOOLONG; X return -1; X } X strcpy(sa->sun_path, pathname); X return 0; X} X Xstatic char *sockname; Xvoid delete_socket(void) { X unlink(sockname); X} X Xint listen_unix_socket(const char *path) { X int fd; X struct sockaddr_un sa; X if (make_unix_sockaddr(path, &sa) < 0) X return -1; X if ((fd = create_unix_socket()) < 0) X return -1; X if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) { X perror("bind"); X close(fd); X return -1; X } X sockname = strdup(path); X atexit(delete_socket); X X if (listen(fd, 5) < 0) { X perror("listen"); X close(fd); X return -1; X } X return fd; X} X Xint accept_unix_socket(int fd) { X int s; X if ((s = accept(fd, NULL, 0)) < 0) { X perror("accept"); X return -1; X } X return s; X} X Xint connect_unix_socket(const char *path) { X int fd; X struct sockaddr_un sa; X if (make_unix_sockaddr(path, &sa) < 0) X return -1; X if ((fd = create_unix_socket()) < 0) X return -1; X if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) { X perror("connect"); X return -1; X } X return fd; X} X X#define BUFSIZE 65536 X Xint fake_sendfile(int from, int to) { X char buf[BUFSIZE]; X int v; X int sent = 0; X while ((v = read(from, buf, BUFSIZE)) > 0) { X int d = 0; X while (d < v) { X int w = write(to, buf, v - d); X if (w <= 0) { X perror("write"); X return -1; X } X d += w; X sent += w; X } X } X if (v != 0) { X perror("read"); X return -1; X } X return sent; X} X Xint real_sendfile(int from, int to) { X int v; X v = sendfile(from, to, 0, 0, NULL, NULL, 0); X if (v < 0) { X perror("sendfile"); X } X return v; X} X X END-of-util.c echo x - util.h sed 's/^X//' >util.h << 'END-of-util.h' X/* send data from file to unix domain socket */ X X#include <stdio.h> X#include <time.h> X#include <signal.h> X#include <errno.h> X#include <sys/types.h> X#include <sys/socket.h> X#include <sys/un.h> X Xint create_unix_socket(void); Xint make_unix_sockaddr(const char *pathname, struct sockaddr_un *sa); Xint listen_unix_socket(const char *path); Xint accept_unix_socket(int fd); Xint connect_unix_socket(const char *path); Xint fake_sendfile(int from, int to); Xint real_sendfile(int from, int to); X X END-of-util.h exit -- Nate Eldredge neldredge@math.ucsd.edu
On Mon, 6 Oct 2008, 15:22-0700, Nate Eldredge wrote: > On Mon, 6 Oct 2008, Maxim Konovalov wrote: > > > Hello, > > > > On Mon, 6 Oct 2008, 06:40-0000, Nate Eldredge wrote: > > > > [...] > > > Incidentally, to the initial reporter, what application do you have > > > that requires sendfile? In my experience, most things will fall > > > back to a read/write loop if sendfile fails, since sendfile isn't > > > available on all systems or under all circumstances. So if you > > > apply the quick fix so that sendfile always fails, it might at > > > least get your application working again. > > > > > As stated in the PR Andrey used nginx (ports/www/nginx). But I could > > easily reproduce the bug with the stock ftpd(8) with the ftproot on > > tmpfs. > > To simplify matters further, here is the testcase I used when > testing this, which uses sendfile to send some data over a unix > domain socket. Do: > > ./server /tmpfs/data mysocket & > ./client mysocket >data.out > cmp /tmpfs/data data.out > > If things work right, data and data.out should be identical. But if > data is a file on a tmpfs, data.out contains apparently random > kernel memory contents. > Hi Nate, It'd be really nice if you extend src/tools/regression/sockets/sendfile regression test for this bug. Now it doesn't detect this case. -- Maxim Konovalov
This is not just a bug, but a security hole. What you are seeing is the content of other VM pages. This has exactly the same symptoms as http://security.freebsd.org/advisories/FreeBSD-SA-05:02.sendfile.asc In short: do not use tmpfs unless you trust all users on the system with root access.
I've submitted the PR kern/138465 which contains a patch to tools/regression/sockets/sendfile which adds an MD5 field to the header structure; when all the data has been received the MD5 is recalculated and the test fails if they differ. As expected it fails when /tmp is a tmpfs filesystem. -- Bruce
[ resending it with changed subject. looks like it was ignored by bug-followup@ ] Try the patch attached, it fixes the bug for me. The same workaround is used by zfs.
Author: delphij Date: Wed Oct 7 23:17:15 2009 New Revision: 197850 URL: http://svn.freebsd.org/changeset/base/197850 Log: Add a special workaround to handle UIO_NOCOPY case. This fixes data corruption observed when sendfile() is being used. PR: kern/127213 Submitted by: gk MFC after: 2 weeks Modified: head/sys/fs/tmpfs/tmpfs_vnops.c Modified: head/sys/fs/tmpfs/tmpfs_vnops.c ============================================================================== --- head/sys/fs/tmpfs/tmpfs_vnops.c Wed Oct 7 23:01:31 2009 (r197849) +++ head/sys/fs/tmpfs/tmpfs_vnops.c Wed Oct 7 23:17:15 2009 (r197850) @@ -43,6 +43,8 @@ __FBSDID("$FreeBSD$"); #include <sys/priv.h> #include <sys/proc.h> #include <sys/resourcevar.h> +#include <sys/sched.h> +#include <sys/sf_buf.h> #include <sys/stat.h> #include <sys/systm.h> #include <sys/unistd.h> @@ -428,15 +430,72 @@ tmpfs_setattr(struct vop_setattr_args *v } /* --------------------------------------------------------------------- */ +static int +tmpfs_nocacheread(vm_object_t tobj, vm_pindex_t idx, + vm_offset_t offset, size_t tlen, struct uio *uio) +{ + vm_page_t m; + int error; + + VM_OBJECT_LOCK(tobj); + vm_object_pip_add(tobj, 1); + m = vm_page_grab(tobj, idx, VM_ALLOC_WIRED | + VM_ALLOC_ZERO | VM_ALLOC_NORMAL | VM_ALLOC_RETRY); + if (m->valid != VM_PAGE_BITS_ALL) { + if (vm_pager_has_page(tobj, idx, NULL, NULL)) { + error = vm_pager_get_pages(tobj, &m, 1, 0); + if (error != 0) { + printf("tmpfs get pages from pager error [read]\n"); + goto out; + } + } else + vm_page_zero_invalid(m, TRUE); + } + VM_OBJECT_UNLOCK(tobj); + error = uiomove_fromphys(&m, offset, tlen, uio); + VM_OBJECT_LOCK(tobj); +out: + vm_page_lock_queues(); + vm_page_unwire(m, TRUE); + vm_page_unlock_queues(); + vm_page_wakeup(m); + vm_object_pip_subtract(tobj, 1); + VM_OBJECT_UNLOCK(tobj); + + return (error); +} + +static __inline int +tmpfs_nocacheread_buf(vm_object_t tobj, vm_pindex_t idx, + vm_offset_t offset, size_t tlen, void *buf) +{ + struct uio uio; + struct iovec iov; + + uio.uio_iovcnt = 1; + uio.uio_iov = &iov; + iov.iov_base = buf; + iov.iov_len = tlen; + + uio.uio_offset = 0; + uio.uio_resid = tlen; + uio.uio_rw = UIO_READ; + uio.uio_segflg = UIO_SYSSPACE; + uio.uio_td = curthread; + + return (tmpfs_nocacheread(tobj, idx, offset, tlen, &uio)); +} static int tmpfs_mappedread(vm_object_t vobj, vm_object_t tobj, size_t len, struct uio *uio) { + struct sf_buf *sf; vm_pindex_t idx; vm_page_t m; vm_offset_t offset; off_t addr; size_t tlen; + char *ma; int error; addr = uio->uio_offset; @@ -461,33 +520,30 @@ lookupvpg: vm_page_wakeup(m); VM_OBJECT_UNLOCK(vobj); return (error); + } else if (m != NULL && uio->uio_segflg == UIO_NOCOPY) { + if (vm_page_sleep_if_busy(m, FALSE, "tmfsmr")) + goto lookupvpg; + vm_page_busy(m); + VM_OBJECT_UNLOCK(vobj); + sched_pin(); + sf = sf_buf_alloc(m, SFB_CPUPRIVATE); + ma = (char *)sf_buf_kva(sf); + error = tmpfs_nocacheread_buf(tobj, idx, offset, tlen, + ma + offset); + if (error == 0) { + uio->uio_offset += tlen; + uio->uio_resid -= tlen; + } + sf_buf_free(sf); + sched_unpin(); + VM_OBJECT_LOCK(vobj); + vm_page_wakeup(m); + VM_OBJECT_UNLOCK(vobj); + return (error); } VM_OBJECT_UNLOCK(vobj); nocache: - VM_OBJECT_LOCK(tobj); - vm_object_pip_add(tobj, 1); - m = vm_page_grab(tobj, idx, VM_ALLOC_WIRED | - VM_ALLOC_ZERO | VM_ALLOC_NORMAL | VM_ALLOC_RETRY); - if (m->valid != VM_PAGE_BITS_ALL) { - if (vm_pager_has_page(tobj, idx, NULL, NULL)) { - error = vm_pager_get_pages(tobj, &m, 1, 0); - if (error != 0) { - printf("tmpfs get pages from pager error [read]\n"); - goto out; - } - } else - vm_page_zero_invalid(m, TRUE); - } - VM_OBJECT_UNLOCK(tobj); - error = uiomove_fromphys(&m, offset, tlen, uio); - VM_OBJECT_LOCK(tobj); -out: - vm_page_lock_queues(); - vm_page_unwire(m, TRUE); - vm_page_unlock_queues(); - vm_page_wakeup(m); - vm_object_pip_subtract(tobj, 1); - VM_OBJECT_UNLOCK(tobj); + error = tmpfs_nocacheread(tobj, idx, offset, tlen, uio); return (error); } _______________________________________________ svn-src-all@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
State Changed From-To: open->patched Patch from gk@ has been applied against -HEAD, MFC reminder.
Responsible Changed From-To: freebsd-fs->delphij Take.
r197850 has never been MFC-ed?! -- Andriy Gapon
Author: delphij Date: Fri Sep 24 17:26:57 2010 New Revision: 213106 URL: http://svn.freebsd.org/changeset/base/213106 Log: MFC r197850: Add a special workaround to handle UIO_NOCOPY case. This fixes data corruption observed when sendfile() is being used. Requested by: avg PR: kern/127213 Submitted by: gk Modified: stable/8/sys/fs/tmpfs/tmpfs_vnops.c Directory Properties: stable/8/sys/ (props changed) stable/8/sys/amd64/include/xen/ (props changed) stable/8/sys/cddl/contrib/opensolaris/ (props changed) stable/8/sys/contrib/dev/acpica/ (props changed) stable/8/sys/contrib/pf/ (props changed) stable/8/sys/dev/xen/xenpci/ (props changed) Modified: stable/8/sys/fs/tmpfs/tmpfs_vnops.c ============================================================================== --- stable/8/sys/fs/tmpfs/tmpfs_vnops.c Fri Sep 24 16:40:46 2010 (r213105) +++ stable/8/sys/fs/tmpfs/tmpfs_vnops.c Fri Sep 24 17:26:57 2010 (r213106) @@ -43,6 +43,8 @@ __FBSDID("$FreeBSD$"); #include <sys/priv.h> #include <sys/proc.h> #include <sys/resourcevar.h> +#include <sys/sched.h> +#include <sys/sf_buf.h> #include <sys/stat.h> #include <sys/systm.h> #include <sys/unistd.h> @@ -433,15 +435,72 @@ tmpfs_setattr(struct vop_setattr_args *v } /* --------------------------------------------------------------------- */ +static int +tmpfs_nocacheread(vm_object_t tobj, vm_pindex_t idx, + vm_offset_t offset, size_t tlen, struct uio *uio) +{ + vm_page_t m; + int error; + + VM_OBJECT_LOCK(tobj); + vm_object_pip_add(tobj, 1); + m = vm_page_grab(tobj, idx, VM_ALLOC_WIRED | + VM_ALLOC_ZERO | VM_ALLOC_NORMAL | VM_ALLOC_RETRY); + if (m->valid != VM_PAGE_BITS_ALL) { + if (vm_pager_has_page(tobj, idx, NULL, NULL)) { + error = vm_pager_get_pages(tobj, &m, 1, 0); + if (error != 0) { + printf("tmpfs get pages from pager error [read]\n"); + goto out; + } + } else + vm_page_zero_invalid(m, TRUE); + } + VM_OBJECT_UNLOCK(tobj); + error = uiomove_fromphys(&m, offset, tlen, uio); + VM_OBJECT_LOCK(tobj); +out: + vm_page_lock_queues(); + vm_page_unwire(m, TRUE); + vm_page_unlock_queues(); + vm_page_wakeup(m); + vm_object_pip_subtract(tobj, 1); + VM_OBJECT_UNLOCK(tobj); + + return (error); +} + +static __inline int +tmpfs_nocacheread_buf(vm_object_t tobj, vm_pindex_t idx, + vm_offset_t offset, size_t tlen, void *buf) +{ + struct uio uio; + struct iovec iov; + + uio.uio_iovcnt = 1; + uio.uio_iov = &iov; + iov.iov_base = buf; + iov.iov_len = tlen; + + uio.uio_offset = 0; + uio.uio_resid = tlen; + uio.uio_rw = UIO_READ; + uio.uio_segflg = UIO_SYSSPACE; + uio.uio_td = curthread; + + return (tmpfs_nocacheread(tobj, idx, offset, tlen, &uio)); +} static int tmpfs_mappedread(vm_object_t vobj, vm_object_t tobj, size_t len, struct uio *uio) { + struct sf_buf *sf; vm_pindex_t idx; vm_page_t m; vm_offset_t offset; off_t addr; size_t tlen; + char *ma; int error; addr = uio->uio_offset; @@ -465,33 +524,30 @@ lookupvpg: vm_page_wakeup(m); VM_OBJECT_UNLOCK(vobj); return (error); + } else if (m != NULL && uio->uio_segflg == UIO_NOCOPY) { + if (vm_page_sleep_if_busy(m, FALSE, "tmfsmr")) + goto lookupvpg; + vm_page_busy(m); + VM_OBJECT_UNLOCK(vobj); + sched_pin(); + sf = sf_buf_alloc(m, SFB_CPUPRIVATE); + ma = (char *)sf_buf_kva(sf); + error = tmpfs_nocacheread_buf(tobj, idx, offset, tlen, + ma + offset); + if (error == 0) { + uio->uio_offset += tlen; + uio->uio_resid -= tlen; + } + sf_buf_free(sf); + sched_unpin(); + VM_OBJECT_LOCK(vobj); + vm_page_wakeup(m); + VM_OBJECT_UNLOCK(vobj); + return (error); } VM_OBJECT_UNLOCK(vobj); nocache: - VM_OBJECT_LOCK(tobj); - vm_object_pip_add(tobj, 1); - m = vm_page_grab(tobj, idx, VM_ALLOC_WIRED | - VM_ALLOC_ZERO | VM_ALLOC_NORMAL | VM_ALLOC_RETRY); - if (m->valid != VM_PAGE_BITS_ALL) { - if (vm_pager_has_page(tobj, idx, NULL, NULL)) { - error = vm_pager_get_pages(tobj, &m, 1, 0); - if (error != 0) { - printf("tmpfs get pages from pager error [read]\n"); - goto out; - } - } else - vm_page_zero_invalid(m, TRUE); - } - VM_OBJECT_UNLOCK(tobj); - error = uiomove_fromphys(&m, offset, tlen, uio); - VM_OBJECT_LOCK(tobj); -out: - vm_page_lock_queues(); - vm_page_unwire(m, TRUE); - vm_page_unlock_queues(); - vm_page_wakeup(m); - vm_object_pip_subtract(tobj, 1); - VM_OBJECT_UNLOCK(tobj); + error = tmpfs_nocacheread(tobj, idx, offset, tlen, uio); return (error); } _______________________________________________ svn-src-all@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
State Changed From-To: patched->closed Patch MFC'ed as 213106.