| Summary: | Error in output of tcpdump(1) | ||
|---|---|---|---|
| Product: | Base System | Reporter: | Oliver Fromme <olli> |
| Component: | bin | Assignee: | freebsd-bugs (Nobody) <bugs> |
| Status: | Closed FIXED | ||
| Severity: | Affects Only Me | CC: | olli |
| Priority: | Normal | ||
| Version: | 6.2-PRERELEASE | ||
| Hardware: | Any | ||
| OS: | Any | ||
|
Description
Oliver Fromme
2006-11-09 12:30:31 UTC
Ricardo Nabinger Sanchez wrote: > Oliver Fromme wrote: > > > While trying to debug a problem with NFS mounts via TCP, > > I used the following tcpdump(1) command to watch traffic > > on lo0. Note that some (but not all) of the port numbers > > are wrong: > > > > 127.0.0.1.2714894848 > 127.0.0.1.2049 > > 127.0.0.1.2049 > 127.0.0.1.3251765760 > > Not sure if it makes sense, but 2714894848 & 65535 == 1024. Any > chances that this port was indeed used during your capture (perhaps > cross-checking with netstat)? No, the actual port was 982, which seems to be completely unrelated to the numbers printed by tcpdump. I converted the "suspicios" numbers to hex: 2714894848 == 0xa1d20200 3251765760 == 0xc1d20200 652476928 == 0x26e40200 1054278144 == 0x3ed70200 98828800 == 0x05e40200 The only common patter is 0x0200 (decimal 1024) in the lower 16 bits. Could this be a subtle compiler bug? I've compiled the system without any special compiler settings. make.conf is empty. > In contrib/tcpdump/addrtoname.c:621:638 (-CURRENT): > [...] The source you quoted looks OK to me. I can't see anything that could explain the symptom. > Are you running i386 binary on an AMD64/EMT64 (or another > 64 bits variant)? It's all i386 32bit (both kernel + userland). I don't even know if the machine supports 64bit, it's a Xeon 3.2 GHz. The dmesg says "AMD Features=0x20000000<LM>", does that mean "long mode", i.e. 64bit? Anyway, I'm running 32bit, and going to 64bit is not an option. Best regards Oliver -- Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing Dienstleistungen mit Schwerpunkt FreeBSD: http://www.secnetix.de/bsd Any opinions expressed in this message may be personal to the author and may not necessarily reflect the opinions of secnetix in any way. "anyone new to programming should be kept as far from C++ as possible; actually showing the stuff should be considered a criminal offence" -- Jacek Generowicz On Thu, 9 Nov 2006 18:35:44 +0100 (CET) Oliver Fromme <olli@lurza.secnetix.de> wrote: > Ricardo Nabinger Sanchez wrote: > > Oliver Fromme wrote: > > > > > While trying to debug a problem with NFS mounts via TCP, > > > I used the following tcpdump(1) command to watch traffic > > > on lo0. Note that some (but not all) of the port numbers > > > are wrong: > > > > > > 127.0.0.1.2714894848 > 127.0.0.1.2049 > > > 127.0.0.1.2049 > 127.0.0.1.3251765760 > > > > Not sure if it makes sense, but 2714894848 & 65535 == 1024. Any > > chances that this port was indeed used during your capture (perhaps > > cross-checking with netstat)? > > No, the actual port was 982, which seems to be completely > unrelated to the numbers printed by tcpdump. I converted > the "suspicios" numbers to hex: > > 2714894848 == 0xa1d20200 > 3251765760 == 0xc1d20200 > 652476928 == 0x26e40200 > 1054278144 == 0x3ed70200 > 98828800 == 0x05e40200 > > The only common patter is 0x0200 (decimal 1024) in the > lower 16 bits. > > Could this be a subtle compiler bug? I've compiled the > system without any special compiler settings. make.conf > is empty. It's possible, but I think this is the least expected cause. > > > In contrib/tcpdump/addrtoname.c:621:638 (-CURRENT): > > [...] > > The source you quoted looks OK to me. I can't see anything > that could explain the symptom. The printed number was a 16 bit integer cast to a 32 bit one, and it got mangled somehow. > > > Are you running i386 binary on an AMD64/EMT64 (or another > > 64 bits variant)? > > It's all i386 32bit (both kernel + userland). I don't even > know if the machine supports 64bit, it's a Xeon 3.2 GHz. > The dmesg says "AMD Features=0x20000000<LM>", does that > mean "long mode", i.e. 64bit? Anyway, I'm running 32bit, > and going to 64bit is not an option. I'd recommend you to capture a tcpdump trace, if possible, with -w option. Something like: # tcpdump -c 100 -s 1600 -w oddport.dump To save 100 packets and probably some that triggers this issue. With this (small) dump we can debug tcpdump and see what's going wrong. Regards. :) ps: my previous message was classified as spam (and discarded), but I'm subscribed to the list. Odd, too. -- Ricardo Nabinger Sanchez <rnsanchez@{gmail.com,wait4.org}> Powered by FreeBSD "Left to themselves, things tend to go from bad to worse." Ricardo Nabinger Sanchez wrote: > The printed number was a 16 bit integer cast to a 32 bit one, and it got > mangled somehow. True. > I'd recommend you to capture a tcpdump trace, if possible, with -w option. > Something like: > > # tcpdump -c 100 -s 1600 -w oddport.dump > > To save 100 packets and probably some that triggers this issue. With this > (small) dump we can debug tcpdump and see what's going wrong. Now it gets even more interesting. I did what you suggested, then tried to view the file with "tcpdump -r" (on the same machine), and it displayed "0" for the port number in all places where the big 32bit numbers where displayed previously. Then I copied the file to a different machine (running an older FreeBSD 4.x), and tcpdump there displayed all of the port numbers correctly (i.e. neither big 32bit numbers nor zero). So it is definitely purely a display problem on my RELENG_6 machine. The contents of the capture file are OK. If anyone wants to look at it anyway, here it is: http://www.secnetix.de/~olli/tmp/tcpdump.log.gz On that offending machine, the first few packets look like this (only IP:port, rest omitted): 127.0.0.1.0 > 127.0.0.1.2049 127.0.0.1.2049 > 127.0.0.1.0 127.0.0.1.911 > 127.0.0.1.2049 127.0.0.1.911 > 127.0.0.1.2049 127.0.0.1.2049 > 127.0.0.1.911 127.0.0.1.0 > 127.0.0.1.2049 127.0.0.1.2049 > 127.0.0.1.955 127.0.0.1.955 > 127.0.0.1.2049 127.0.0.1.0 > 127.0.0.1.2049 127.0.0.1.2049 > 127.0.0.1.0 On my 4.x machine, the same packets from that file are displayed correctly: 127.0.0.1.911 > 127.0.0.1.2049 127.0.0.1.2049 > 127.0.0.1.911 127.0.0.1.911 > 127.0.0.1.2049 127.0.0.1.911 > 127.0.0.1.2049 127.0.0.1.2049 > 127.0.0.1.911 127.0.0.1.955 > 127.0.0.1.2049 127.0.0.1.2049 > 127.0.0.1.955 127.0.0.1.955 > 127.0.0.1.2049 127.0.0.1.955 > 127.0.0.1.2049 127.0.0.1.2049 > 127.0.0.1.955 I've also put my binaries of tcpdump and libpcap online, in case someone wants to look for compiler errors (or just try to reproduce the problem with my binaries): http://www.secnetix.de/~olli/tmp/tcpdump http://www.secnetix.de/~olli/tmp/libpcap.so.4 They're build from RELENG_6 sources of 2006-11-08. Best regards Oliver -- Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing Dienstleistungen mit Schwerpunkt FreeBSD: http://www.secnetix.de/bsd Any opinions expressed in this message may be personal to the author and may not necessarily reflect the opinions of secnetix in any way. "The ITU has offered the IETF formal alignment with its corresponding technology, Penguins, but that won't fly." -- RFC 2549 Hi Oliver, On Thu, 9 Nov 2006 20:01:12 +0100 (CET) Oliver Fromme <olli@lurza.secnetix.de> wrote: > Now it gets even more interesting. I did what you suggested, > then tried to view the file with "tcpdump -r" (on the same > machine), and it displayed "0" for the port number in all > places where the big 32bit numbers where displayed previously. > > Then I copied the file to a different machine (running an > older FreeBSD 4.x), and tcpdump there displayed all of the > port numbers correctly (i.e. neither big 32bit numbers nor > zero). So it is definitely purely a display problem on my > RELENG_6 machine. The contents of the capture file are OK. I downloaded your files, tried with my tcpdump, yours, also with your libpcap, and got consistent and uniform results, verified together with md5 after checking the output data. There must be some odd problem somewhere on your libc. If you want to check what's going on, you can recompile tcpdump enabling debug info (yours doesn't have), set a breakpoint at "tcp_print" and "tcpport_string", in order to check when the port number is mangled. Hope that helps. Regards. :) -- Ricardo Nabinger Sanchez <rnsanchez@{gmail.com,wait4.org}> Powered by FreeBSD "Left to themselves, things tend to go from bad to worse." Hi Ricardo, Sorry for the long delay. I got distracted by other problems that were more important than this one. Ricardo Nabinger Sanchez wrote: > Oliver Fromme wrote: > > [...] > > Now it gets even more interesting. I did what you suggested, > > then tried to view the file with "tcpdump -r" (on the same > > machine), and it displayed "0" for the port number in all > > places where the big 32bit numbers where displayed previously. > > > > Then I copied the file to a different machine (running an > > older FreeBSD 4.x), and tcpdump there displayed all of the > > port numbers correctly (i.e. neither big 32bit numbers nor > > zero). So it is definitely purely a display problem on my > > RELENG_6 machine. The contents of the capture file are OK. > > I downloaded your files, tried with my tcpdump, yours, also with your > libpcap, and got consistent and uniform results, verified together with md5 > after checking the output data. Yes, my results are also consistent and uniform, byt they're still wrong. I have tried the same on a different box (completely different hardware with a RELENG_6 installation unrelated to the first machine). I can reproduce the problem easily 100%: - Export an arbitrary directory to localhost. I used this line in /etc/exports: /usr/bin -ro 127.0.0.1 - Start rpcbind, mountd, nfsd and nfsiod if necessary. - Mount the exported directory: mount -t nfs -o ro /mnt - Run tcpdump, e.g.: tcpdump -ni lo0 - Make some NFS traffic: ls -l /mnt tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on lo0, link-type NULL (BSD loopback), capture size 96 bytes 17:08:45.958451 IP 127.0.0.1.659314964 > 127.0.0.1.2049: 104 access [|nfs] 17:08:45.958502 IP 127.0.0.1.2049 > 127.0.0.1.659314964: reply ok 120 access c 274c5915 17:08:45.958550 IP 127.0.0.1.659314965 > 127.0.0.1.2049: 104 access [|nfs] 17:08:45.958570 IP 127.0.0.1.2049 > 127.0.0.1.659314965: reply ok 120 access c 274c5916 17:08:45.958596 IP 127.0.0.1.659314966 > 127.0.0.1.2049: 100 fsstat [|nfs] 17:08:45.958613 IP 127.0.0.1.2049 > 127.0.0.1.659314966: reply ok 168 fsstat [|nfs] ... and so on. As I said, I can reproduce the problem reliably on RELENG_6 on various hardware. I'm sure you will be able to reproducee it if you perform the same steps that I do. Make sure the NFS mount is UDP/v3 (should be the default). > There must be some odd problem somewhere on your libc. I've cvsupped the sources, rebuilt libc, libpcap and tcpdump, but no change. I even took the binaries and libs from the official 6.2-BETA3. No change. Best regards Oliver -- Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing Dienstleistungen mit Schwerpunkt FreeBSD: http://www.secnetix.de/bsd Any opinions expressed in this message may be personal to the author and may not necessarily reflect the opinions of secnetix in any way. > Can the denizens of this group enlighten me about what the > advantages of Python are, versus Perl ? "python" is more likely to pass unharmed through your spelling checker than "perl". -- An unknown poster and Fredrik Lundh Hi Oliver, On Mon, 27 Nov 2006 21:04:20 +0100 (CET) Oliver Fromme <olli@lurza.secnetix.de> wrote: > Then start the following daemons as root (in this order): > # rpcbind > # mountd > # nfsd > No options are required. Check /var/log/messages for errors. > If you already had a mountd daemon running, you need to send > it a SIGHUP so it wil re-read the /etc/exports file: > # kill -HUP $(cat /var/run/mountd.pid) > > Now mount the exported directory to /mnt: > # mount -t nfs -o ro 127.0.0.1:/usr/bin /mnt > > Use this command to watch NFS UDP traffic on localhost: > # tcpdump -ni lo0 udp port 2049 Thanks for these easy steps. I could reproduce the problem, and discovered that, oddly, what tcpdump prints is not the source port, it's an ID. Look: 18:20:07.299689 IP 127.0.0.1.1490240411 > 127.0.0.1.2049: 100 fsstat [|nfs] 18:20:07.299772 IP 127.0.0.1.2049 > 127.0.0.1.1490240411: reply ok 168 fsstat [|nfs] == src/contrib/contrib_tcpdump/print-nfs.c:420 void nfsreq_print(register const u_char *bp, u_int length, register const u_char *bp2) { ... rp = (const struct sunrpc_msg *)bp; if (!nflag) { snprintf(srcid, sizeof(srcid), "%u", EXTRACT_32BITS(&rp->rm_xid)); strlcpy(dstid, "nfs", sizeof(dstid)); } else { snprintf(srcid, sizeof(srcid), "%u", EXTRACT_32BITS(&rp->rm_xid)); snprintf(dstid, sizeof(dstid), "%u", NFS_PORT); } print_nfsaddr(bp2, srcid, dstid); ... == The "else" branch is taken (because -n was passed to tcpdump), and in the first snprintf you can see that "rp->rm_xid" is printed where one expects to see source port. A hack may workaround this, given that the header position is fixed and can be recomputed, and then the source port can be printed instead of rm_xid. I really don't know if this is what the author wanted, it doesn't make sense to me to have a ID of some kind printed along with the IP address where one is expecting the source port of the packet (POLA violation, perhaps?). Moreover, I don't know who should tackle this PR (FreeBSD or tcpdump developers?). :) Sorry for the long delay to reply (Real Life Problems (tm)). -- Ricardo Nabinger Sanchez <rnsanchez@{gmail.com,wait4.org}> Powered by FreeBSD "Left to themselves, things tend to go from bad to worse." State Changed From-To: open->feedback From the tcpdump man page: %%% Sun NFS (Network File System) requests and replies are printed as: src.xid > dst.nfs: len op args src.nfs > dst.xid: reply stat len op results sushi.6709 > wrl.nfs: 112 readlink fh 21,24/10.73165 wrl.nfs > sushi.6709: reply ok 40 readlink "../var" sushi.201b > wrl.nfs: 144 lookup fh 9,74/4096.6878 "xcolors" wrl.nfs > sushi.201b: reply ok 128 lookup fh 9,74/4134.3150 In the first line, host sushi sends a transaction with id 6709 to wrl (note that the number following the src host is a transaction id, not the source port). The request was 112 bytes, excluding the UDP and IP headers. The operation was a readlink (read symbolic link) on file handle (fh) 21,24/10.731657119. (If one is lucky, as in this case, the file handle can be interpreted as a major,minor device number pair, followed by the inode number and generation number.) Wrl replies `ok' with the contents of the link. In the third line, sushi asks wrl to lookup the name `xcolors' in directory file 9,74/4096.6878. Note that the data printed depends on the operation type. The format is intended to be self explanatory if read in conjunction with an NFS protocol spec. %%% So what you see is the transaction id and not a source port. If you want to see the port you can use tcpdump -q. Do you agree that this PR can be closed? State Changed From-To: feedback->closed Oliver agrees the PR can be closed. |