Running lsof without arguments gives an error.
lsof-4.96.4,8
# gdb lsof
Reading symbols from lsof...
(No debugging symbols found in lsof)
(gdb) run
Starting program: /usr/local/sbin/lsof
Program received signal SIGSEGV, Segmentation fault.
Invalid permissions for mapped object.
memcpy () at /home/ronald/dev/freebsd/src/contrib/arm-optimized-routines/string/aarch64/memcpy.S:175
175 stp D_l, D_h, [dst, 64]!
(gdb) bt
#0 memcpy () at /home/ronald/dev/freebsd/src/contrib/arm-optimized-routines/string/aarch64/memcpy.S:175
#1 0x0000000000218be4 in ?? ()
#2 0x0000000500000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)
# uname -a
FreeBSD rpi4 14.0-CURRENT FreeBSD 14.0-CURRENT #1 main-aba921bd9e-dirty: Sat Oct 29 00:31:53 CEST 2022 GENERIC-NODEBUG arm64
(In reply to Ronald Klop from comment #0)
I tried building (via poudriere), installing, and running lsof 4.96.4,8 on amd64 and it worked fine. I then did the same on aarch64 (HoneyComb) and it also worked fine. aarch64 text:
# pkg add -f /usr/local/poudriere/data/packages/main-CA72-bulk_a-default/All/lsof-4.96.4,8.pkg
pkg: Warning: Major OS version upgrade detected. Running "pkg bootstrap -f" recommended
Installing lsof-4.96.4,8...
. . .
# lsof
lsof: WARNING: device cache mismatch: /dev/vcio
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
kernel 0 root cwd VDIR 222756737,2677268003 65 34 / (zopt0/ROOT/main-CA72)
. . .
You may need to specify more context to better enable more folks to reproduce the failure. A possibility might be UFS vs. ZFS but there could be other context differences.
For reference (long output line split for readability):
# uname -apKU
FreeBSD CA72_16Gp_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #65 main-n259064-f83db6441a2f-dirty: Sun Nov 6 17:08:00 PST 2022
root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72
arm64 aarch64 1400073 1400073
# ~/fbsd-based-on-what-commit.sh -C /usr/main-src/
f83db6441a2f (HEAD -> main, freebsd/main, freebsd/HEAD) sctp: minor changes due to upstreaming of Glebs recent changes
branch: main
merge-base: f83db6441a2f4f925a169c7ddf844589cb73c9b5
merge-base: CommitDate: 2022-11-06 22:06:40 +0000
n259064 (--first-parent --count for merge-base)
# ~/fbsd-based-on-what-commit.sh -C /usr/ports/
ed69f5862e3f (HEAD -> main, freebsd/main, freebsd/HEAD) math/vtk9: Fix mismatch in MPI dependency
branch: main
merge-base: ed69f5862e3f322a32eda4fdb62100d2792419a4
merge-base: CommitDate: 2022-10-20 22:31:03 +0000
n598718 (--first-parent --count for merge-base)
FYI:
> pkg: Warning: Major OS version upgrade detected. Running "pkg bootstrap -f" recommended
is from my happening to have pkg from a 13.1-RELEASE build in place. Other than the rare use of lsof, I do not use ports that are tied to the kernel vintage details and so normally run 13.?-RELEASE ports on each of releng/13.?, stable/13, and main. lsof I build just for main:
# pkg info lsof
pkg: Warning: Major OS version upgrade detected. Running "pkg bootstrap -f" recommended
lsof-4.96.4,8
Name           : lsof
Version        : 4.96.4,8
Installed on   : Mon Nov 14 20:54:01 2022 PST
Origin         : sysutils/lsof
Architecture   : FreeBSD:14:aarch64
Prefix         : /usr/local
Categories     : sysutils
Licenses       : lsof
Maintainer     : ler@FreeBSD.org
WWW            : https://github.com/lsof-org/lsof
Comment        : Lists information about open files (similar to fstat(1))
Annotations    :
        FreeBSD_version: 1400073
        build_timestamp: 2022-11-15T04:53:41+0000
        built_by       : poudriere-git-3.3.99.20220831
        cpe            : cpe:2.3:a:lsof_project:lsof:4.96.4:::::freebsd14:aarch64
        port_checkout_unclean: no
        port_git_hash  : 7a902c8910c3
        ports_top_checkout_unclean: yes
        ports_top_git_hash: ed69f5862e3f
. . .
(In reply to Mark Millard from comment #2)
I tried installing and running lsof on a UFS system (an RPi4B) that has a slightly older FreeBSD vintage and it worked fine there as well:
# pkg add -f ~/lsof-4.96.4,8.pkg
Installing lsof-4.96.4,8...
Newer FreeBSD version for package lsof:
To ignore this error set IGNORE_OSVERSION=yes
- package: 1400073
- running kernel: 1400072
Ignore the mismatch and continue? [y/N]: y
Extracting lsof-4.96.4,8: 100%
. . .
# lsof
lsof: WARNING: device cache mismatch: /dev/input/event0
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
kernel 0 root cwd VDIR 0,105 1536 2 / (/dev/gpt/CA72USBufs)
. . .
For reference (long output line split for readability):
# uname -apKU
FreeBSD CA72_UFS 14.0-CURRENT FreeBSD 14.0-CURRENT #63 main-n258610-ba7319e9091b-dirty: Fri Oct 14 14:29:14 PDT 2022
root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72
arm64 aarch64 1400072 1400072
On the HoneyComb:
# git -C /usr/main-src/ diff contrib/arm-optimized-routines/string/aarch64/memcpy.S
#
(i.e., no changes.)
I'm unsure what difference(s) of context matter here.
This is weird. I cannot reproduce this today. I can scroll back in my terminal where the error happened yesterday. But running it again works.
Could that be related to bug 264094?
@emaste: Can you look into this?
I see this is now closed as not reproducible; if it happens again, please get a backtrace from the core file using lldb or gdb, or by running lsof directly under lldb and issuing the `bt` command. For example:
~ $ lldb lsof
(lldb) target create "lsof"
Current executable set to 'lsof' (x86_64).
(lldb) run
Process 28273 launched: '/usr/local/sbin/lsof' (x86_64)
lsof: WARNING: compiled for FreeBSD release 13.1-RELEASE-p5; this is 14.0-CURRENT.
lsof: kvm_open(execfile=/boot/wipbsd.20221006/kernel, corefile=/dev/mem): Invalid argument
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
sh 2010 emaste txt VREG 431860091,3334069762 169816 652702 ...
...
Process 28273 stopped
* thread #1, name = 'lsof', stop reason = signal SIGSTOP
    frame #0: 0x00000008237bdc0a libc.so.7`__sys_poll at _poll.S:4
   1 /* @generated by libc/sys/Makefile.inc */
   2 #include "compat.h"
   3 #include "SYS.h"
-> 4 PSEUDO(poll)
   5 .section .note.GNU-stack,"",%progbits
(lldb) bt
* thread #1, name = 'lsof', stop reason = signal SIGSTOP
  * frame #0: 0x00000008237bdc0a libc.so.7`__sys_poll at _poll.S:4
    frame #1: 0x000000082379681b libc.so.7`__res_nsend [inlined] ...
    frame #2: 0x0000000823796675 libc.so.7`__res_nsend...
    frame #3: 0x00000008237950bc libc.so.7`__res_nquery...
    frame #4: 0x0000000823763779 libc.so.7`_dns_gethostbyaddr...
(In this example I interrupted lsof while it was doing DNS resolution.)
Hello, I'm having a segmentation fault issue with lsof on 13.2-RELEASE as well, though I'm not sure if it's the same issue. For me the pkg version prints a warning on startup:
lsof: WARNING: compiled for FreeBSD release 13.1-RELEASE-p5; this is 13.2-RELEASE-p2
and then crashes immediately. The version from ports crashes too (but without a warning); here is a stack trace for it:
Process 93962 launched: '/usr/ports/sysutils/lsof/work/stage/usr/local/sbin/lsof' (x86_64)
Process 93962 stopped
* thread #1, name = 'lsof', stop reason = signal SIGSEGV: address access protected (fault address: 0x800c4fffa)
    frame #0: 0x00000008227d948f libc.so.7`memmove + 607
libc.so.7`memmove:
-> 0x8227d948f <+607>: rep movsq (%rsi), %es:(%rdi)
   0x8227d9492 <+610>: cld
   0x8227d9493 <+611>: movq %rdx, %rcx
   0x8227d9496 <+614>: andb $0x7, %cl
(lldb) bt
* thread #1, name = 'lsof', stop reason = signal SIGSEGV: address access protected (fault address: 0x800c4fffa)
  * frame #0: 0x00000008227d948f libc.so.7`memmove + 607
    frame #1: 0x00000820c4d7b000
    frame #2: 0x000000000020e374 lsof`process_kinfo_file(kf=0x0000000827924c30, xfile=0x00000008273170c0, pcbs=0x00000008254ed000, locks=0x0000000820c4d998) at dproc.c:224:6
    frame #3: 0x000000000020dd2b lsof`process_file_descriptors(p=0x0000000825505500, ckscko=0, xfiles=0x0000000827307740, n_xfiles=6339, pcbs=0x00000008254ed000, locks=0x0000000820c4d998) at dproc.c:315:3
    frame #4: 0x000000000020d444 lsof`gather_proc_info at dproc.c:520:2
    frame #5: 0x0000000000218748 lsof`main(argc=1, argv=0x0000000820c4dd58) at main.c:1322:6
    frame #6: 0x000000000020a570 lsof`_start(ap=<unavailable>, cleanup=<unavailable>) at crt1_c.c:75:7
I stumbled across this happening with FreeBSD-14. Hopefully this is the same issue you guys have seen, as I include the fix.
Basically, it's only a problem if you have nullfs-mounted filesystems. I noticed it when a lot of files were open on such a filesystem (doing a software build which used parallel compiles).
As seen in the attached patch, the problem code replaces the nullfs-mounted full pathname with the "real" full pathname:
    memmove(&vfs_path[strlen(vfs->fsname) + 1], &vfs_path[strlen(vfs->dir) + 1], strlen(vfs_path) - strlen(vfs->dir) + 1);
    memcpy(vfs_path, vfs->fsname, strlen(vfs->fsname));
However, sometimes this is called with a pathname of NULL. If your real mountpoint has a length shorter than the nullfs mountpoint, this causes an attempt to memmove a negative number of bytes. The attached fix doesn't attempt the memmove/memcpy if the size would be negative.
Also, please note that this fixes an "off by one" error in the same code: it wasn't actually producing the correct result even when it wasn't SIGSEGV'ing!
Created attachment 246486 [details] fix to nullsfs sigsegv
P.S. I could reproduce this on armv7 and amd64. In both cases, I built from source using ports. The patch works on both these architectures.
(In reply to Jamie Landeg-Jones from comment #9)
Hi, nice finding.
"If your real mountpoint has a length shorter than the nullfs mountpoint"
Do you have an example of this situation? I run some jails on this machine which use nullfs mounts, and I tend to use this structure of mounts:
zrpi4/jails/freebsd13 on /data/jails/freebsd13 (zfs, local, nfsv4acls)
zrpi4/jails/loghost on /data/jails/loghost (zfs, local, noatime, nfsv4acls)
/data/jails/freebsd13 on /data/jails/loghost/_root (nullfs, local, read-only, nfsv4acls)
/data/jails/loghost/root on /data/jails/loghost/_root/root (nullfs, local, noatime, nfsv4acls)
/data/jails/loghost/etc on /data/jails/loghost/_root/etc (nullfs, local, noatime, nfsv4acls)
/data/jails/loghost/var on /data/jails/loghost/_root/var (nullfs, local, noatime, nfsv4acls)
/data/jails/loghost/var/tmp on /data/jails/loghost/_root/tmp (nullfs, local, noatime, nfsv4acls)
/data/jails/loghost/local on /data/jails/loghost/_root/usr/local (nullfs, local, noatime, nfsv4acls)
devfs on /data/jails/loghost/_root/dev (devfs)
It nullfs-mounts the OS read-only on _root and then nullfs-mounts some writable stuff into it. Could this trigger the bug? Or do you have another example?
Thanks! And sorry, I should have provided more details. Yes, those scenarios would trigger the bug. Let me clarify a few points.
Based on my install history (using the same tmpfs setup in both cases), the issue only seemed to start sometime after FreeBSD-13.0-RC2 and at least before FreeBSD-14.0-RELEASE. This was on an armv7 box, but I managed to replicate it on an AMD64 box.
I don't know *why* some entries are now reported with a null pathname. It's not unique to tmpfs, but it's a VFS rabbit-hole I've been too scared to go down! However, it doesn't seem to cause any problems (apart from tripping up this lsof bug).
Here are some debug details: an amended patch to produce debugging information (showing SKIPPED where the program would normally coredump) is attached. Also attached is this debug information for one process of mine that trips this bug.
Note the "off by one" error I mentioned (and fixed) in the first post: in all cases, the "new vfs_path" doesn't copy over the "/" between the filesystem and the path, instead exposing whatever character was previously in the buffer ('s' in my case here).
Let me know if I can help further.
Cheers, Jamie
P.S. Are you able to reopen this PR?
Created attachment 246565 [details] debugging patch to highlight bug
Created attachment 246566 [details] Example results of the debugging patch Note, the debugging output is first showing the length of the field, followed by the field itself.
Reopening, as the PR is now reproducible and a patch has even been added. Thanks for looking into this.
Just to add, in case of compatibility worries, the changes I've made only affect tmpfs, and then only the situation where the program would SIGSEGV anyway - any other uses that normally work will remain unchanged.
Created attachment 247081 [details] patch to fix problem with lsof 4.99 Problem still exists with latest lsof in ports. Refactored patch attached.
Can the new fix be committed please?
It'll be done by EOD today (16/Dec/2023). (I'll upstream the fix)
A commit in branch main references this bug:
URL: https://cgit.FreeBSD.org/ports/commit/?id=ec96413b952b2aade44e1b499c3153c4772860b7
commit ec96413b952b2aade44e1b499c3153c4772860b7
Author: Larry Rosenman <ler@FreeBSD.org>
AuthorDate: 2023-12-16 18:02:56 +0000
Commit: Larry Rosenman <ler@FreeBSD.org>
CommitDate: 2023-12-16 18:02:56 +0000
    sysutils/lsof: update to 4.99.1
    Fix compilation error when HASIPv6 is not defined. (@chenrui333)
    Add configure option --disable-liblsof to disable installation of liblsof. (@subnut, #300)
    [freebsd] fix segfault from fs info (FreeBSD bug 267760)
    PR: 267760
    Reported by: Ronald Klop
 sysutils/lsof/Makefile | 3 +--
 sysutils/lsof/distinfo | 6 +++---
 2 files changed, 4 insertions(+), 5 deletions(-)
A new version was created upstream and released, and the port has been updated.
I just confirmed that the issue is fixed for me too. Before the upgrade I could reproduce the issue.
Dec 27 08:45:18 rpi4 pkg[23576]: lsof upgraded: 4.99.0_1,8 -> 4.99.3,8
After the upgrade lsof works fine.
Cheers for the feedback. Just in case someone is reading this thread in the future, please note that on at least 2 occasions I referred to "tmpfs", when I meant "nullfs" ! ... and I wasn't even drinking!