| Summary: | Panic running linux binary on ext2fs | ||
|---|---|---|---|
| Product: | Base System | Reporter: | krentel <krentel> |
| Component: | kern | Assignee: | freebsd-bugs (Nobody) <bugs> |
| Status: | Closed FIXED | ||
| Severity: | Affects Only Me | ||
| Priority: | Normal | ||
| Version: | 4.0-STABLE | ||
| Hardware: | Any | ||
| OS: | Any | ||
|
Description
krentel
2000-06-20 21:20:00 UTC
It was suggested on -emulation that this may be a branding issue. So, I copied the Linux binaries, branded them and ran the branded version (still on ext2fs). And I get the same panic.
--Mark
I've run some more experiments and I've narrowed the problem somewhat.
Using the Slackware 7 live file system, I tar-copied /cdrom/live/bin
onto ufs and ext2fs partitions. Then I ran Slackware's ls from ufs,
cdrom and ext2fs and listed directories on ufs, cdrom and ext2fs.
Sometimes it worked ok, sometimes the output of ls was corrupt (too
few files), and the pattern is quite clear.
                  directory listed on
binary on    ufs    cdrom      ext2fs
ufs          ok     corrupt    corrupt
cdrom        ok     corrupt    corrupt
ext2fs       ok     corrupt    corrupt
I also updated libncurses.so.5.0 and installed emacs's libexec and
share files and repeated the above test with dired from emacs. I got
the same results, except that the corrupt directory listings were
slightly different between ls and emacs. For example, in one
directory on ext2fs that actually has 77 files, ls reported 71 files,
but dired listed only 29. But they always either both worked or both
had too few files. And sometimes the bottom row panics, but not this
time.
For example, this is Slackware's ls (on ufs) listing a directory on
ext2fs that actually has 89 files.
% ./ls /mnt/bin
awk chmod cp gawk keys;^ mkdir mv sed touch
bash chown dd gawk-3.0.4 ln mknod rm sh
chgrp consolechars df igawk ls mktemp rmdir sync
And the same Linux ls listing a cdrom directory with 801 files.
It comes up 792 files short.
% ./ls /cdrom/live/usr/bin
00_TRANS.TBL a2p aafire aainfo aasavefont aatest aclocal addr addr2line
So, apparently the Linux ls is having trouble reading non-ufs file
systems. And I noticed that dired was unable to do path completion.
I typed /cdrom/li and hit tab, and emacs complained that there was no
completion, probably because there is no /compat/linux/cdrom/li*. But
there is /cdrom/live/bin and dired listed it, although incorrectly.
I'll take a wild guess and say that the Linuxulator opens a file or
directory and gets an error, but it doesn't notice the error and
proceeds blindly along. Maybe where it chooses between lookups in
/compat/linux or /. But that's a wild guess.
--Mark
I can repeat the above results with the simple readdir program (opendir followed by a loop of readdir). Under Linux emulation, readdir is prematurely returning NULL on non-ufs file systems.
--Mark
On Thu, 6 Jul 2000, Mark W. Krentel wrote:
> I've run some more experiments and I've narrowed the problem somewhat.
>
> Using the Slackware 7 live file system, I tar-copied /cdrom/live/bin
> onto ufs and ext2fs partitions. Then I ran Slackware's ls from ufs,
> cdrom and ext2fs and listed directories on ufs, cdrom and ext2fs.
> Sometimes it worked ok, sometimes the output of ls was corrupt (too
> few files), and the pattern is quite clear.
>
>                   directory listed on
> binary on    ufs    cdrom      ext2fs
> ufs          ok     corrupt    corrupt
> cdrom        ok     corrupt    corrupt
> ext2fs       ok     corrupt    corrupt
I found some of the problems using these hints. There were 2 serious bugs
in ext2_readdir(): writing far beyond the end of the cookie buffer, and
reading a little beyond the end of the directory buffer.
There don't seem to be any problems with the Linuxulator. It just asks
ext2_readdir() for cookies. Then cookie processing is usually fatal.
Similarly for readdir() on an nfs-mounted ext2fs filesystem.
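Both overruns described above can arise when a loop walks variable-length records while checking only one limit. The following is a minimal user-space sketch, not the kernel's ext2_readdir() code: `struct rec`, `fill_cookies()` and its parameters are hypothetical names chosen for illustration. It stops at whichever comes first, the end of the record buffer or the end of the cookie array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified record header: only the length field matters. */
struct rec {
	unsigned short rec_len;	/* total size of this record, in bytes */
};

/*
 * Store the end offset of each record found in buf[0..buflen) into
 * cookies[], starting the running offset at `start`.  The loop is bounded
 * by BOTH the record buffer and the cookie array, so it can neither read
 * past the former nor write past the latter.  Returns the number of
 * cookies written.
 */
static size_t
fill_cookies(const unsigned char *buf, size_t buflen, long start,
    long *cookies, size_t ncookies)
{
	size_t pos = 0, n = 0;
	long off = start;

	assert(buf != NULL && cookies != NULL);
	while (n < ncookies && pos + sizeof(struct rec) <= buflen) {
		struct rec r;

		/* Copy the header out to avoid unaligned access. */
		memcpy(&r, buf + pos, sizeof(r));
		/* Stop on a malformed or truncated record. */
		if (r.rec_len < sizeof(r) || r.rec_len > buflen - pos)
			break;
		off += r.rec_len;
		cookies[n++] = off;
		pos += r.rec_len;
	}
	return (n);
}
```

With three records of 12, 16 and 20 bytes and room for only two cookies, the function returns 2 and leaves the third record untouched, rather than writing a third cookie past the end of the array.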
Overrunning the directory buffer can cause panics and wrong results from
readdir(3) even for native binaries, but this problem doesn't usually occur
for native binaries because they use an adequate buffer size (4K). Linux
binaries trigger the bug by using a too-small buffer size (512 bytes).
This size makes Linux's ls (an old (1997) RedHat version) take about 4
times as much system time as FreeBSD's ls even on ufs filesystems.
getdirentries(2) claims that the correct size is given by stat(2), but
Linux's ls apparently doesn't know this, and in any case the correct
size is a little larger than the filesystem blocksize for ext2fs, since
ext2_readdir() expands some directory entries.
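As a rough illustration of the sizing rule mentioned above, a caller could take the block size reported by stat(2) and add some headroom. This is only a sketch: `dir_buf_size()` and its `slack` parameter are hypothetical, and the thread does not say how much extra room ext2fs needs, so that amount is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Suggest a buffer size for reading the directory at `path`:
 * the st_blksize value reported by stat(2) plus `slack` extra bytes.
 * `slack` is a caller-chosen allowance (an assumption of this sketch).
 * Returns 0 if the path cannot be stat'ed or reports no block size.
 */
static size_t
dir_buf_size(const char *path, size_t slack)
{
	struct stat st;

	assert(path != NULL);
	if (stat(path, &st) != 0 || st.st_blksize <= 0)
		return (0);
	return ((size_t)st.st_blksize + slack);
}
```

A caller would allocate a buffer of `dir_buf_size(dir, slack)` bytes before reading directory entries, instead of hard-coding 512.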
Try these fixes:
Index: ext2_lookup.c
===================================================================
RCS file: /home/ncvs/src/sys/gnu/ext2fs/ext2_lookup.c,v
retrieving revision 1.24
diff -c -2 -r1.24 ext2_lookup.c
*** ext2_lookup.c 2000/05/05 09:57:57 1.24
--- ext2_lookup.c 2000/07/24 02:09:03
***************
*** 153,166 ****
struct iovec aiov;
caddr_t dirbuf;
int readcnt;
! u_quad_t startoffset = uio->uio_offset;
! count = uio->uio_resid; /* legyenek boldogok akik akarnak ... */
! uio->uio_resid = count;
! uio->uio_iov->iov_len = count;
!
! #if 0
! printf("ext2_readdir called uio->uio_offset %d uio->uio_resid %d count %d \n",
! (int)uio->uio_offset, (int)uio->uio_resid, (int)count);
#endif
--- 152,175 ----
struct iovec aiov;
caddr_t dirbuf;
+ int DIRBLKSIZ = VTOI(ap->a_vp)->i_e2fs->s_blocksize;
int readcnt;
! off_t startoffset = uio->uio_offset;
! count = uio->uio_resid;
! /*
! * Avoid complications for partial directory entries by adjusting
! * the i/o to end at a block boundary. Don't give up (like ufs
! * does) if the initial adjustment gives a negative count, since
! * many callers don't supply a large enough buffer. The correct
! * size is a little larger than DIRBLKSIZ to allow for expansion
! * of directory entries, but some callers just use 512.
! */
! count -= (uio->uio_offset + count) & (DIRBLKSIZ -1);
! if (count <= 0)
! count += DIRBLKSIZ;
!
! #ifdef EXT2FS_DEBUG
! printf("ext2_readdir: uio_offset = %lld, uio_resid = %d, count = %d\n",
! uio->uio_offset, uio->uio_resid, count);
#endif
***************
*** 168,171 ****
--- 177,181 ----
auio.uio_iov = &aiov;
auio.uio_iovcnt = 1;
+ auio.uio_resid = count;
auio.uio_segflg = UIO_SYSSPACE;
aiov.iov_len = count;
***************
*** 226,231 ****
if (!error && ap->a_ncookies != NULL) {
! u_long *cookies;
! u_long *cookiep;
off_t off;
--- 236,240 ----
if (!error && ap->a_ncookies != NULL) {
! u_long *cookiep, *cookies, *ecookies;
off_t off;
***************
*** 235,240 ****
M_WAITOK);
off = startoffset;
! for (dp = (struct ext2_dir_entry_2 *)dirbuf, cookiep = cookies;
! dp < edp;
dp = (struct ext2_dir_entry_2 *)((caddr_t) dp + dp->rec_len)) {
off += dp->rec_len;
--- 244,250 ----
M_WAITOK);
off = startoffset;
! for (dp = (struct ext2_dir_entry_2 *)dirbuf,
! cookiep = cookies, ecookies = cookies + ncookies;
! cookiep < ecookies;
dp = (struct ext2_dir_entry_2 *)((caddr_t) dp + dp->rec_len)) {
off += dp->rec_len;
Bruce
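The count adjustment in the patch above can be tried out in isolation. The sketch below copies the two statements from the patch into a user-space function; the name `adjust_count()` and the plain integer types are assumptions made for illustration, and the block size is taken to be a power of two, as the mask arithmetic requires.

```c
#include <assert.h>

/*
 * Trim a read of `count` bytes starting at `offset` so that it ends on a
 * block boundary.  If trimming leaves nothing (the caller's buffer is
 * smaller than the distance to the next boundary), fall back to reading
 * up to the next boundary instead of giving up.
 * `blksize` must be a power of two.
 */
static long
adjust_count(long long offset, long count, long blksize)
{
	assert(blksize > 0 && (blksize & (blksize - 1)) == 0);

	/* Drop the bytes that would spill past the last block boundary. */
	count -= (offset + count) & (blksize - 1);
	/* Too-small request: extend to the next boundary. */
	if (count <= 0)
		count += blksize;
	return (count);
}
```

For a 1024-byte block, a 512-byte request at offset 0 becomes 1024 bytes, a 4096-byte request stays at 4096, and a 1500-byte request is trimmed to 1024. The first case shows that the adjusted count can exceed what the caller asked for.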
> I found some of the problems using these hints. There were 2 serious bugs
> in ext2_readdir(): writing far beyond the end of the cookie buffer, and
> reading a little beyond the end of the directory buffer.
Thanks for looking at the PR! I tried the patch, but unfortunately
it didn't make any difference.
Are you able to reproduce the bug? I can produce it with just the
simple readdir program (see below). Readdir prematurely returns NULL
on both ext2fs and cdrom partitions and thus lists too few files.
That is, I can produce the bug without even using an ext2fs partition.
> Overrunning the directory buffer can cause panics and wrong results from
> readdir(3) even for native binaries, but this problem doesn't usually occur
> for native binaries because they use an adequate buffer size (4K). Linux
> binaries trigger the bug by using a too-small buffer size (512 bytes).
What buffers? Are they something a user program has control over, or
are they buried within library routines?
I tried bypassing readdir by using open and read on the directory. I
wrote a simple hex dump program and compiled it in RH 6.1. But Linux
wouldn't run it; read on a directory returned EISDIR (Is a directory).
Ironically, the Linuxulator did run the program, and read returned the
entire directory. So, I guess that narrows the problem to something
in the readdir library between the levels of read and readdir.
When 4.1 is released, I plan to cvsup to 4.1-R and redo these tests
more thoroughly. Maybe your patch is enough to prevent the panic, and
maybe the readdir problem is a separate bug. I'll let you know.
--Mark
----------
/*
 * List directory contents with opendir and readdir.
 * Basically the same as "ls -1af".
 */
#include <sys/types.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>	/* exit() */
/* Print an error message and quit. */
void my_err(char *mesg)
{
	printf("Error: %s\n", mesg);
	exit(1);
}
int main(int argc, char **argv)
{
	DIR *dp;
	struct dirent *de;
	int n;
	if ( argc < 2 )
		my_err("missing directory");
	if ( (dp = opendir(argv[1])) == NULL )
		my_err("unable to open directory");
	/* Print and count every entry until readdir returns NULL. */
	n = 0;
	while ( (de = readdir(dp)) != NULL ) {
		printf("%s\n", de->d_name);
		n++;
	}
	printf("Total: %d files\n", n);
	return 0;
}
On Tue, 25 Jul 2000, Mark W. Krentel wrote:
> > I found some of the problems using these hints. There were 2 serious bugs
> > in ext2_readdir(): writing far beyond the end of the cookie buffer, and
> > reading a little beyond the end of the directory buffer.
>
> Thanks for looking at the PR! I tried the patch, but unfortunately
> it didn't make any difference.
>
> Are you able to reproduce the bug? I can produce it with just the
Only the panic.
> simple readdir program (see below). Readdir prematurely returns NULL
> on both ext2fs and cdrom partitions and thus lists too few files.
> That is, I can produce the bug without even using an ext2fs partition.
I didn't try the program, but linux-ls -R works right on a linux partition
and on a cdrom here.
> > Overrunning the directory buffer can cause panics and wrong results from
> > readdir(3) even for native binaries, but this problem doesn't usually occur
> > for native binaries because they use an adequate buffer size (4K). Linux
> > binaries trigger the bug by using a too-small buffer size (512 bytes).
>
> What buffers? Are they something a user program has control over, or
> are they buried within library routines?
Mostly user buffers in readdir(3), but the Linuxulator and nfs use
too-small buffers or a too-small rounding up in some cases.
> I tried bypassing readdir by using open and read on the directory. I
> wrote a simple hex dump program and compiled it in RH 6.1. But Linux
> wouldn't run it; read on a directory returned EISDIR (Is a directory).
> Ironically, the Linuxulator did run the program, and read returned the
> entire directory. So, I guess that narrows the problem to something
> in the readdir library between the levels of read and readdir.
readdir(3) doesn't use read(2) under either FreeBSD or Linux. It can't,
because not all file systems have read(2)'able directories (under Linux,
no file systems have read(2)'able directories). Under FreeBSD, readdir(3)
is a simple wrapper around getdirentries(2), and the bug is probably in
the latter.
Bruce
Ok, I've cvsup'd to 4.1-R, applied the patch, rebuilt world and kernel,
and done more tests. I beat on it with "ls -lR" and "find | xargs ls",
I ran emacs, xv, xsnow, xboard, all from the ext2 partition, and I've
been unable to induce a panic. Then, I removed the patch, reran the
tests and got a panic almost immediately. Finally, I put the patch
back in, beat on it some more and no panic. So, I'm satisfied that
you've identified the cause of the panic and that your patch fixes it.
Good job!
And remember that I'm running 4.1, so I have rev 1.21 of ext2_lookup.c
(your patch was for rev 1.24). From looking at the RCS diffs, I don't
think it's a problem, but you would know better than me.
But I'm still seeing the problem where ls or readdir returns too few
files. So this must be a separate problem. And it happens on both
ext2 and cdrom partitions, so maybe it's in the Linuxulator. I'll try
looking at the source code. I guess it's the Linux readdir(3) library
calling getdents(2) in the Linuxulator, is that right?
> I didn't try the program, but linux-ls -R works right on a linux partition
> and on a cdrom here.
Again, I can't figure out what I'm doing differently. Do you have a
machine with a local ext2 partition? What version of Linux do you
have? You're running -current with the linux_base-6.1 port?
I searched the open PR's and found a few more involving panics on ext2
partitions. One stands out as being very similar.
PR i386/15074 -- Two different panics when running Linux binaries on Athlon
Three more may be related, or maybe not.
PR kern/10581 -- Kernel panic while using find on an ext2 filesystem.
PR kern/10594 -- EXT2FS mount problems
PR gnu/15892 -- NFS-exported ext2 file system makes Linux crash
P.S. What does "legyenek boldogok akik akarnak" mean? It didn't rot13
to anything meaningful.
--Mark
Hi Bruce,
Given the feedback and the origin of the patch, can I assign the PR to
you? :-)
Ciao,
Sheldon.
State Changed
From-To: open->feedback
My patch has been applied to -current, RELENG_4 and RELENG_3, but
there is still a problem, possibly at the Linuxulator level.
Ok, I've upgraded to 4.2-RC (as of Nov 8) which includes both Bruce's
patch and Marcel's patch to src/sys/compat/linux/linux_file.c for the
getdents problem. I've tried the Linux ls, readdir and dired (emacs)
on ext2 and cdrom partitions and they all work. I also get identical
results with the Linux and FreeBSD ls -R run over large hierarchies.
So, I'm satisfied that Marcel's patch fixes the remaining problem and
that this PR should now be closed. Good job!
I think these patches may also fix PR i386/15074 and PR gnu/15892, if
someone wants to take another look at them.
--Mark
State Changed
From-To: feedback->closed
This PR described two problems. Both have been resolved. Thanks to Mark for his patience and contribution! |