| Summary: | mtree "no such file" message at job's end | ||
|---|---|---|---|
| Product: | Base System | Reporter: | Gerhard Sittig <Gerhard.Sittig> |
| Component: | bin | Assignee: | Sheldon Hearn <sheldonh> |
| Status: | Closed FIXED | ||
| Severity: | Affects Only Me | ||
| Priority: | Normal | ||
| Version: | 4.1-STABLE | ||
| Hardware: | Any | ||
| OS: | Any | ||
|
Description
Gerhard Sittig
2000-09-03 20:20:01 UTC
On Sun, 03 Sep 2000 21:11:43 +0200, Gerhard Sittig wrote:
> >Number: 21017
> >Category: bin
> >Synopsis: mtree "no such file" message at job's end
Let us know if you can come up with a simpler How-To-Repeat that
generates the error predictably.
Ciao,
Sheldon.
On Tue, Sep 05, 2000 at 12:18 +0200, Sheldon Hearn wrote:
>
> On Sun, 03 Sep 2000 21:11:43 +0200, Gerhard Sittig wrote:
>
> > >Number: 21017
> > >Category: bin
> > >Synopsis: mtree "no such file" message at job's end
>
> Let us know if you can come up with a simpler How-To-Repeat
> that generates the error predictably.

I tried truss(1)ing the mtree process. But that's somewhat
pointless with 550MB of /usr files and only 200MB of free disk
space (truss will produce a complete line of text for every 1K of
data read, and mtree seems to compute every checksum separately).
That's when script(1) or "truss mtree 2> logfile" won't work.

I guess I have to dig into setting up multilog from the
daemontools package for this particular purpose. I hope to have
the stderr tail of truss available, then. Maybe I can tell you
soon which syscall results in the ENOENT(?) error.

Am I really alone in seeing this erroneous message or am I one of
very few people using mtree for more than a "make hierarchy"? I
wouldn't think so. At least I hoped to have some users jumping
in saying "me too, preferably under _these_ conditions" ...

virtually yours 82D1 9B9C 01DC 4FB4 D7B4 61BE 3F49 4F77 72DE DA76
Gerhard Sittig true | mail -s "get gpg key" Gerhard.Sittig@gmx.net
--
If you don't understand or are scared by any of the above
ask your parents or an adult to help you.

On Tue, 05 Sep 2000 18:28:33 +0200, Gerhard Sittig wrote:
> Am I really alone in seeing this erroneous message or am I one of
> very few people using mtree for more than a "make hierarchy"? I
> wouldn't think so. At least I hoped to have some users jumping
> in saying "me too, preferably under _these_ conditions" ...
I'm not sure of that. You might try drawing people's attention to this
PR on the freebsd-questions mailing list. Perhaps they've had reports
of this there.
Ciao,
Sheldon.
On Tue, Sep 05, 2000 at 18:28 +0200, Gerhard Sittig wrote:
>
> [ ... ] I hope to have the stderr tail of truss available,
> then. Maybe I can tell you soon which syscall results in the
> ENOENT(?) error.

truss(1)ing mtree is what happened now. The result of

  KEYS=nlink,type,mode,flags,uid,gid,size,time,cksum,md5digest,sha1digest,ripemd160digest
  truss /usr/sbin/mtree -K $KEYS -p /usr -x -X $DBDIR/_usr.ex < $DBDIR/_usr.db 2>&1 >/dev/null | tail -500

looks like this

----- truss mtree | tail snippets -------------------------------
...
open("./compat/linux/usr/lib/libz.so.1",0,027757774510) = 3 (0x3)
read(0x3,0xbfbff548,0x400) = 1024 (0x400)
...
read(0x3,0xbfbff548,0x400) = 1024 (0x400)
read(0x3,0xbfbff548,0x400) = 421 (0x1a5)
read(0x3,0xbfbff548,0x400) = 0 (0x0)
close(3) = 0 (0x0)
open("./compat/linux/usr/lib/libz.so.1",0,027757774510) = 3 (0x3)
read(0x3,0xbfbff548,0x400) = 1024 (0x400)
...
read(0x3,0xbfbff548,0x400) = 1024 (0x400)
read(0x3,0xbfbff548,0x400) = 421 (0x1a5)
read(0x3,0xbfbff548,0x400) = 0 (0x0)
close(3) = 0 (0x0)
open("./compat/linux/usr/lib/libz.so.1.1.3",0,00) = 3 (0x3)
read(0x3,0xbfbfb948,0x4000) = 16384 (0x4000)
read(0x3,0xbfbfb948,0x4000) = 16384 (0x4000)
read(0x3,0xbfbfb948,0x4000) = 16384 (0x4000)
read(0x3,0xbfbfb948,0x4000) = 13733 (0x35a5)
read(0x3,0xbfbfb948,0x4000) = 0 (0x0)
close(3) = 0 (0x0)
open("./compat/linux/usr/lib/libz.so.1.1.3",0,027757774510) = 3 (0x3)
read(0x3,0xbfbff548,0x400) = 1024 (0x400)
...
read(0x3,0xbfbff548,0x400) = 1024 (0x400)
read(0x3,0xbfbff548,0x400) = 421 (0x1a5)
read(0x3,0xbfbff548,0x400) = 0 (0x0)
close(3) = 0 (0x0)
open("./compat/linux/usr/lib/libz.so.1.1.3",0,027757774510) = 3 (0x3)
read(0x3,0xbfbff548,0x400) = 1024 (0x400)
...
read(0x3,0xbfbff548,0x400) = 1024 (0x400)
read(0x3,0xbfbff548,0x400) = 421 (0x1a5)
read(0x3,0xbfbff548,0x400) = 0 (0x0)
close(3) = 0 (0x0)
...
open("./compat/linux/usr/lib/python1.5/site-packages/rpmmodule.so",0,027757774510) = 3 (0x3)
read(0x3,0xbfbff548,0x400) = 1024 (0x400)
...
read(0x3,0xbfbff548,0x400) = 1024 (0x400)
read(0x3,0xbfbff548,0x400) = 168 (0xa8)
read(0x3,0xbfbff548,0x400) = 0 (0x0)
close(3) = 0 (0x0)
readlink("X11",0x804ef00,1023) ERR#2 'No such file or directory'
mtree: write(2,0xbfbff1b4,7) = 7 (0x7)
line 1361949: X11write(2,0xbfbff1e4,17) = 17 (0x11)
: write(2,0xbfbff1a4,2) = 2 (0x2)
No such file or directory
write(2,0xbfbff1a4,26) = 26 (0x1a)
sigprocmask(0x1,0x280605a0,0xbfbff84c) = 0 (0x0)
sigprocmask(0x3,0x280605b0,0x0) = 0 (0x0)
write(1,0xd2da000,958) = 958 (0x3be)
exit(0x1) process exit, rval = 256
----- truss mtree | tail snippets -------------------------------

Whoops! Why is libz.so.1.1.3 being read in chunks of 16KB when
every other file is read in single KB buffers? This "finding"
was done by chance ... There's no (obvious) reference to read in
usr.sbin/mtree/*.[ch], so I would expect it to be called from
{MD5,SHA1_,RIPEMD160_}File(3).

The symlink /usr/compat/linux/usr/lib/X11 seems to cause the
error. The db description looks like this:

----- _usr.db snippet for the symlink ---------------------------
...
# ./compat/linux/usr/lib
/set type=file uid=0 gid=0 mode=0755 nlink=1
lib type=dir nlink=8 size=1536 time=964377066.0
    X11 type=link size=16 time=964377066.0 link=../X11R6/lib/X11
    libbfd-2.9.1.0.24.so \
...
----- _usr.db snippet for the symlink ---------------------------

The symlink is there but "broken". This should never hurt mtree,
neither when creating nor when comparing the database.

----- ls -l output ----------------------------------------------
lrwxr-xr-x 1 root wheel 16 Jul 23 20:31 /usr/compat/linux/usr/lib/X11@ -> ../X11R6/lib/X11
ls: /usr/compat/linux/usr/X11R6/lib/X11: No such file or directory
----- ls -l output ----------------------------------------------

So I ctag(1)ed /usr/src/usr.sbin/mtree, read manpages for
readlink(2), symlink(7) and fts(3) and tried building and
checking a database for the /usr/compat/linux path only -- this
time everything worked! Well that's a surprise.
(Now I can see why you wish for a better way to cause the
symptom ...)

The only thing I could see is that the above mentioned "X11" file
is the only broken symlink on the /usr filesystem (that's what I
get from "find /usr -xdev -type l -print0 | xargs -0 file | grep
broken", although symlinks aren't necessarily a problem -- perl
has a lot of these). And why does this broken symlink break
readlink(2) sometimes and sometimes it does not?

The only readlink(2) reference I can see in mtree is in the
rlink() function in compare.c -- but of course I miss all the
implicit invocations libc or fts(3) could bring with them. But
the code makes me quite sure: errors in readlink cause err(3) to
be called with the formerly mentioned "line %d: %s" message with
lineno and fname. Does it matter that lineno is always _behind_
the last db line? I'll dig into this place a little further ...

I'm really confused as to where to continue searching, but I'm
willing to help with whatever I can do ...

To summarize: It's not about the broken symlink in itself.
But readlink(2) fails at a broken symlink when something else
happened before -- but I don't know what this is. :< Could
bin/4961 (nonzero errno although there's no error) apply in
this case?

virtually yours 82D1 9B9C 01DC 4FB4 D7B4 61BE 3F49 4F77 72DE DA76
Gerhard Sittig true | mail -s "get gpg key" Gerhard.Sittig@gmx.net
--
If you don't understand or are scared by any of the above
ask your parents or an adult to help you.

On Wed, Sep 06, 2000 at 22:26 +0200, Gerhard Sittig wrote:
>
> [ ... a whole bunch trying to chase it down ... ]
>
> To summarize: It's not about the broken symlink in itself.
> But readlink(2) fails at a broken symlink when something else
> happened before -- but I don't know what this is. :< Could
> bin/4961 (nonzero errno although there's no error) apply in
> this case?

I hate to disappoint the gentle reader here. :( Just when I
thought I had a grip to the problem -- it turned out to not lead
any further. Hmmm ...
I expanded /usr/src/usr.sbin/mtree/compare.c with the following
printout, did a make and made sure by means of strings(1) that I
ran the newly compiled executable:

----- snip ------------------------------------------------------
# cvs diff -u compare.c
Index: compare.c
===================================================================
RCS file: /home/ncvs/src/usr.sbin/mtree/compare.c,v
retrieving revision 1.15.2.1
diff -u -r1.15.2.1 compare.c
--- compare.c 2000/06/28 02:33:17 1.15.2.1
+++ compare.c 2000/09/06 20:07:23
@@ -362,6 +362,9 @@
     static char lbuf[MAXPATHLEN];
     register int len;
 
+    if (errno) {
+        warn("nonzero errno before readlink(2) in compare:rlink(%s)", name);
+    }
     if ((len = readlink(name, lbuf, sizeof(lbuf) - 1)) == -1)
         err(1, "line %d: %s", lineno, name);
     lbuf[len] = '\0';
# strings /usr/obj/usr/src/usr.sbin/mtree/mtree | grep nonzero
nonzero errno before readlink(2) in compare:rlink(%s)
# truss /usr/obj/usr/src/usr.sbin/mtree/mtree \
    -K $KEYS -p /usr -x -X $DB/_usr.ex < $DB/_usr.db \
    2> ~/nonzero.err > ~/nonzero.std
# tail -15 ~/nonzero.err
read(0x3,0xbfbff520,0x400) = 1024 (0x400)
read(0x3,0xbfbff520,0x400) = 1024 (0x400)
read(0x3,0xbfbff520,0x400) = 168 (0xa8)
read(0x3,0xbfbff520,0x400) = 0 (0x0)
close(3) = 0 (0x0)
readlink("X11",0x804ef80,1023) ERR#2 'No such file or directory'
mtree: write(2,0xbfbff18c,7) = 7 (0x7)
line 1361949: X11write(2,0xbfbff1bc,17) = 17 (0x11)
: write(2,0xbfbff17c,2) = 2 (0x2)
No such file or directory
write(2,0xbfbff17c,26) = 26 (0x1a)
sigprocmask(0x1,0x280605a0,0xbfbff824) = 0 (0x0)
sigprocmask(0x3,0x280605b0,0x0) = 0 (0x0)
write(1,0xd2dd000,2466) = 2466 (0x9a2)
exit(0x1) process exit, rval = 256
----- snap ------------------------------------------------------

In the end I can only see two reasons for mtree's failure:
- readlink(2) barfs on broken symlinks under not known yet
  conditions (after "long" runs?)
  and does so unpredictably
- mtree fails to chdir(2) to the new directory and therefor fails
  to read _any_ file there with a relative pathname

The latter thought is new and comes from the fact, that "X11" on
organ as well as "30_qmail.sh" on the third machine are the first
files in their directories (and in the .db file's respective
section). But this doesn't hold any longer when looking at
"libtermcap_p.a" on stein which is "in the middle" of the listing
and the .db dir section.

Once I can find another spare moment I will try to dump the
relevant info of the readlink invocation's environment (current
dir, fts' root + path + filename, fts' already provided slink
and access failure info, etc). And I will try to determine the
"needed" depth of the tree somewhere between /usr and
/usr/compat/linux/usr/lib to make the mtree run fail.

virtually yours 82D1 9B9C 01DC 4FB4 D7B4 61BE 3F49 4F77 72DE DA76
Gerhard Sittig true | mail -s "get gpg key" Gerhard.Sittig@gmx.net
--
If you don't understand or are scared by any of the above
ask your parents or an adult to help you.

State Changed
From-To: open->feedback

Gerhard is still trying to get a handle on this. Anyone else
seeing the problem, please feel free to chirp up.

Hi Sheldon,
In compare.c, how about changing
if (s->flags & F_SLINK && strcmp(cp = rlink(name), s->slink)) {
to
if (s->flags & F_SLINK && strcmp(cp = rlink(p->fts_accpath), s->slink)) {
readlink() in rlink() fails because the cwd is /, and name is a bare filename.
p->fts_accpath is the relative pathname through which the file called name can
be reached. I saw the same thing with /etc/malloc.conf -> aj, and this fix
seems to work around the problem. Btw, I sent this to Gerhard Sittig as well.
Do tell me if I'm way off base here :)
Cheers,
--
Jos Backus _/ _/_/_/ "Modularity is not a hack."
_/ _/ _/ -- D. J. Bernstein
_/ _/_/_/
_/ _/ _/ _/
josb@cncdsl.com _/_/ _/_/_/ use Std::Disclaimer;
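The failure mode described in the message above can be reproduced outside mtree. The following is a minimal sketch, not mtree code: `read_target` and `rlink_demo` are hypothetical names, and the temporary directory layout only imitates the X11 link from this PR. It assumes the process stays in the top directory of the tree (as the truss output suggests), so a bare file name cannot be resolved while a path relative to that directory can.

```c
#define _DEFAULT_SOURCE 1   /* glibc: expose mkdtemp(3), symlink(2), readlink(2) */

#include <sys/types.h>
#include <sys/stat.h>

#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Read the target of symlink `path` into buf. Returns 0 or the errno value. */
static int
read_target(const char *path, char *buf, size_t buflen)
{
    ssize_t len;

    assert(buflen > 1);
    if ((len = readlink(path, buf, buflen - 1)) == -1)
        return (errno);
    buf[len] = '\0';
    return (0);
}

/*
 * Hypothetical demo: create <tmp>/sub/X11 -> ../X11R6/lib/X11 (dangling),
 * stay in <tmp>, then call readlink(2) with the bare name and with the
 * path relative to the current directory. Returns 0 if the setup worked.
 */
static int
rlink_demo(int *bare_err, int *accpath_err, char *target, size_t targetlen)
{
    char tmpl[] = "/tmp/rlinkdemo.XXXXXX";
    char oldcwd[PATH_MAX];
    int ok = -1;

    if (getcwd(oldcwd, sizeof(oldcwd)) == NULL || mkdtemp(tmpl) == NULL)
        return (-1);
    if (chdir(tmpl) != 0) {
        rmdir(tmpl);
        return (-1);
    }
    if (mkdir("sub", 0700) == 0) {
        if (symlink("../X11R6/lib/X11", "sub/X11") == 0) {
            /* Bare name: looked up in <tmp>, where no X11 exists. */
            *bare_err = read_target("X11", target, targetlen);
            /* Access path: resolvable from <tmp>, broken target or not. */
            *accpath_err = read_target("./sub/X11", target, targetlen);
            ok = 0;
            unlink("sub/X11");
        }
        rmdir("sub");
    }
    if (chdir(oldcwd) != 0)
        ok = -1;
    rmdir(tmpl);
    return (ok);
}
```

Calling `rlink_demo()` should report ENOENT for the bare name and success for the relative path, which matches the point made above: the dangling target is irrelevant to readlink(2); only the path used to reach the link matters.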
On Thu, Sep 07, 2000 at 21:08 +0200, Gerhard Sittig wrote:
>
> In the end I can only see two reasons for mtree's failure:
> - readlink(2) barfs on broken symlinks under not known yet
>   conditions (after "long" runs?) and does so unpredictably
> - mtree fails to chdir(2) to the new directory and therefor fails
>   to read _any_ file there with a relative pathname

Obviously I haven't seen the third opportunity:
- mtree(1) (or fts(3) used by mtree) should either chdir(2) and
  readlink(2) with a relative path or stay in the -p base and
  readlink(2) with a pathname relative to the basedir.
see below

> Once I can find another spare moment I will try to dump the
> relevant info of the readlink invocation's environment (current
> dir, fts' root + path + filename, fts' already provided slink
> and access failure info, etc).

After applying the patch cited below (inspired by Jos Backus'
hint) and running another session I got these results:

-----------------------------------------------------------------
$ cd /usr/src/usr.sbin/mtree
$ cvs diff -u
cvs diff: Diffing .
Index: compare.c
===================================================================
RCS file: /home/ncvs/src/usr.sbin/mtree/compare.c,v
retrieving revision 1.15.2.1
diff -u -r1.15.2.1 compare.c
--- compare.c 2000/06/28 02:33:17 1.15.2.1
+++ compare.c 2000/09/09 14:02:49
@@ -64,6 +64,7 @@
 extern int lineno;
 
 static char *ftype __P((u_int));
+static char *rlink_check __P((char *, FTSENT *));
 
 #define INDENTNAMELEN 8
 #define LABEL \
@@ -298,7 +299,7 @@
     }
 #endif /* RMD160 */
 
-    if (s->flags & F_SLINK && strcmp(cp = rlink(name), s->slink)) {
+    if (s->flags & F_SLINK && strcmp(cp = rlink_check(name, p), s->slink)) {
         LABEL;
         (void)printf("%slink ref (%s, %s)\n", tab, cp, s->slink);
     }
@@ -353,6 +354,44 @@
         return ("unknown");
     }
     /* NOTREACHED */
+}
+
+char *
+rlink_check(name, p)
+    char *name;
+    FTSENT *p;
+{
+#if 1
+    FILE *f;
+    char *buf;
+
+    if (errno) {
+        warn("nonzero errno before readlink(2)");
+    }
+
+    f = fopen("/var/tmp/mtree.log", "a");
+    if (f) {
+        buf = getcwd(NULL, 0);
+        fprintf(f, "cwd=%s\n", buf);
+        free(buf); buf = NULL;
+
+        fprintf(f, "name=%s\n", name);
+
+        fprintf(f, "fts_info=%hu\n", p->fts_info);
+        fprintf(f, "fts_accpath=%s\n", p->fts_accpath);
+        fprintf(f, "fts_path=%s\n", p->fts_path);
+        fprintf(f, "fts_name=%s\n", p->fts_name);
+        fprintf(f, "fts_level=%h\n", p->fts_level);
+        fprintf(f, "fts_errno=%d\n", p->fts_errno);
+
+        fprintf(f, "\n");
+        fclose(f);
+    } else {
+        warn("logfile problem in rlink_check()");
+    }
+#endif
+
+    return(rlink(name));
 }
 
 char *
$ strings /usr/obj/usr/src/usr.sbin/mtree/mtree | grep rlink
logfile problem in rlink_check()
$ truss /usr/obj/usr/src/usr.sbin/mtree/mtree \
    -K $KEYS -p /usr -x -X $DB/_usr.ex < $DB/_usr.db \
    2>&1 >/dev/null | tail -100
read(0x3,0xbfbff16c,0x400) = 0 (0x0)
close(3) = 0 (0x0)
open("./compat/linux/usr/lib/python1.5/site-packages/rpmmodule.so",0,027757772554) = 3 (0x3)
read(0x3,0xbfbff16c,0x400) = 1024 (0x400)
...
read(0x3,0xbfbff16c,0x400) = 1024 (0x400)
read(0x3,0xbfbff16c,0x400) = 168 (0xa8)
read(0x3,0xbfbff16c,0x400) = 0 (0x0)
close(3) = 0 (0x0)
open("./compat/linux/usr/lib/python1.5/site-packages/rpmmodule.so",0,027757772554) = 3 (0x3)
read(0x3,0xbfbff16c,0x400) = 1024 (0x400)
...
read(0x3,0xbfbff16c,0x400) = 1024 (0x400)
read(0x3,0xbfbff16c,0x400) = 168 (0xa8)
read(0x3,0xbfbff16c,0x400) = 0 (0x0)
close(3) = 0 (0x0)
open("/var/tmp/mtree.log",521,0666) = 3 (0x3)
lseek(3,0x0,2) = 0 (0x0)
sigaction(SIGSYS,0xbfbff4b8,0xbfbff4a0) = 0 (0x0)
__getcwd(0xd2da400,0x3fc) = 0 (0x0)
sigaction(SIGSYS,0xbfbff4a0,0x0) = 0 (0x0)
fstat(3,0xbfbff1b0) = 0 (0x0)
write(3,0xd2de000,142) = 142 (0x8e)
close(3) = 0 (0x0)
readlink("X11",0x804f160,1023) ERR#2 'No such file or directory'
mtree: write(2,0xbfbfeda8,7) = 7 (0x7)
line 1361949: X11write(2,0xbfbfedd8,17) = 17 (0x11)
: write(2,0xbfbfed98,2) = 2 (0x2)
No such file or directory
write(2,0xbfbfed98,26) = 26 (0x1a)
sigprocmask(0x1,0x280605a0,0xbfbff440) = 0 (0x0)
sigprocmask(0x3,0x280605b0,0x0) = 0 (0x0)
write(1,0xd2da000,357) = 357 (0x165)
exit(0x1) process exit, rval = 256
$ file /var/tmp/mtree.log
/var/tmp/mtree.log: ASCII text
$ tail -100 /var/tmp/mtree.log
cwd=/usr
name=X11
fts_info=13
fts_accpath=./compat/linux/usr/lib/X11
fts_path=./compat/linux/usr/lib/X11
fts_name=X11
fts_level=
fts_errno=0
-----------------------------------------------------------------

This means that by chance(?) the broken symlink is the first
symlink at all in the /usr fs. So I went and "fixed" mtree with
the following patch:

BTW: Labouring on the topic and using sysmouse for transferring
the reports and diffs between machines I'm *very* urged to go and
make scmouse work in the _expected_ way.
Has nobody ever noticed it's "broken" (I couldn't see a PR in up
to and including #21086)? At the very least it's inconsistent
with xterm and rather annoying. Prepare to see another PR on
this very subject soon ... :>

-----------------------------------------------------------------
--- compare.c 2000/06/28 02:33:17 1.15.2.1
+++ compare.c 2000/09/09 15:20:41
@@ -298,7 +298,7 @@
     }
 #endif /* RMD160 */
 
-    if (s->flags & F_SLINK && strcmp(cp = rlink(name), s->slink)) {
+    if (s->flags & F_SLINK && strcmp(cp = rlink(p->fts_accpath), s->slink)) {
         LABEL;
         (void)printf("%slink ref (%s, %s)\n", tab, cp, s->slink);
     }
-----------------------------------------------------------------

This patch makes mtree(1) work again. But I'm still not clear as
to whether fts(3) has chdir(2) problems (or if it should chdir(2)
at all) or if it's mtree(1)'s fault damaging the current
directory setting somehow.

Having a closer look at the compare() function everywhere
fts_accpath is used and the name parameter seems to be for
logging or relative pathname database creation only. So it all
could have been this simple and the problem is solved and closed?
But I still feel mtree should have failed for *any* symlink not
residing in the -p base before. Hmmm ...

virtually yours 82D1 9B9C 01DC 4FB4 D7B4 61BE 3F49 4F77 72DE DA76
Gerhard Sittig true | mail -s "get gpg key" Gerhard.Sittig@gmx.net
--
If you don't understand or are scared by any of the above
ask your parents or an adult to help you.
:r !ssh -l admin 192.168.11.142 cat /var/tmp/mtree.scr
Script started on Sat Sep 9 16:06:02 2000
organ# pwd
/usr/src/usr.sbin/mtree
organ# exit
Script done on Sat Sep 9 16:08:31 2000

On Sat, Sep 09, 2000 at 20:14 +0200, Gerhard Sittig wrote:
>
> -----------------------------------------------------------------
> --- compare.c 2000/06/28 02:33:17 1.15.2.1
> +++ compare.c 2000/09/09 15:20:41
> @@ -298,7 +298,7 @@
>      }
>  #endif /* RMD160 */
>
> -    if (s->flags & F_SLINK && strcmp(cp = rlink(name), s->slink)) {
> +    if (s->flags & F_SLINK && strcmp(cp = rlink(p->fts_accpath), s->slink)) {
>          LABEL;
>          (void)printf("%slink ref (%s, %s)\n", tab, cp, s->slink);
>      }
> -----------------------------------------------------------------
>
> This patch makes mtree(1) work again. [ ... ]

That's the stage where I thought this PR to be assigned to some
mtree programmer and closed real soon. :)

Looking at create.c (which I wish I had done so a little earlier)
will reveal that this is the real solution to the bug mtree has:
it fails to compare _any_ symlink not living in the -p base
directory (or cwd, however you invoke it). But maybe this patch
should not have been buried the way it was in my previous
posting.

That I thought mtree to fail sometimes only seems to be due to me
not realizing how few symlinks are on my filesystems ... And I'm
aware of the fact that I should teach myself some more basic
skills (how do I operate my editor? may I stop proofreading my
emails after recognizing my own sig or is this too much of a
hurry?) before telling others how to fix sbin executables. :>

virtually yours 82D1 9B9C 01DC 4FB4 D7B4 61BE 3F49 4F77 72DE DA76
Gerhard Sittig true | mail -s "get gpg key" Gerhard.Sittig@gmx.net
--
If you don't understand or are scared by any of the above
ask your parents or an adult to help you.

State Changed
From-To: feedback->open

We've got good feedback.

Responsible Changed
From-To: freebsd-bugs->sheldonh

I'll commit this.
State Changed
From-To: open->analyzed

Patch applied to HEAD as rev 1.17 of compare.c. This will be
merged onto the RELENG_4 branch later. Thanks, gents!

State Changed
From-To: analyzed->closed

Merged onto the RELENG_4 branch in time for 4.2-RELEASE. |
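For readers who want to see the two fts(3) fields side by side, here is a small sketch under stated assumptions. It is not mtree source: `first_dangling_link` and `fts_demo` are hypothetical names, and FTS_LOGICAL is an assumption chosen only because it yields FTS_SLNONE for a dangling link (the value 13 logged in this PR is FTS_SLNONE on FreeBSD) and because logical walks do not change directory. Which options mtree actually passes to fts_open() is not shown in the thread.

```c
#define _DEFAULT_SOURCE 1   /* glibc: expose fts(3), mkdtemp(3), symlink(2) */

#include <sys/types.h>
#include <sys/stat.h>

#include <assert.h>
#include <fts.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Walk `root` and copy fts_name and fts_accpath of the first dangling
 * symlink found. Returns 1 if one was found, 0 if none, -1 on error.
 */
static int
first_dangling_link(const char *root, char *name, char *accpath, size_t len)
{
    char *const argv[] = { (char *)root, NULL };
    FTS *t;
    FTSENT *p;
    int found = 0;

    assert(len > 0);
    if ((t = fts_open(argv, FTS_LOGICAL, NULL)) == NULL)
        return (-1);
    while (!found && (p = fts_read(t)) != NULL) {
        if (p->fts_info == FTS_SLNONE) {
            snprintf(name, len, "%s", p->fts_name);       /* bare name */
            snprintf(accpath, len, "%s", p->fts_accpath); /* usable path */
            found = 1;
        }
    }
    fts_close(t);
    return (found);
}

/*
 * Hypothetical demo: build <tmp>/lib/X11 -> ../X11R6/lib/X11, walk <tmp>,
 * and store the link's full path in `expect` for comparison.
 */
static int
fts_demo(char *name, char *accpath, char *expect, size_t len)
{
    char tmpl[] = "/tmp/ftsdemo.XXXXXX";
    char sub[PATH_MAX];
    int rc = -1;

    if (mkdtemp(tmpl) == NULL)
        return (-1);
    snprintf(sub, sizeof(sub), "%s/lib", tmpl);
    snprintf(expect, len, "%s/lib/X11", tmpl);
    if (mkdir(sub, 0700) == 0) {
        if (symlink("../X11R6/lib/X11", expect) == 0) {
            rc = first_dangling_link(tmpl, name, accpath, len);
            unlink(expect);
        }
        rmdir(sub);
    }
    rmdir(tmpl);
    return (rc);
}
```

In such a walk `fts_name` is only the last path component, so it is suitable for messages, while `fts_accpath` is the string to hand to system calls such as readlink(2). That is the distinction the committed one-line change relies on.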