Bug 267760 - sysutils/lsof: Segmentation fault
Summary: sysutils/lsof: Segmentation fault
Status: Closed FIXED
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: arm64 Any
Assignee: Larry Rosenman
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-11-14 16:00 UTC by Ronald Klop
Modified: 2023-12-27 22:30 UTC
6 users

See Also:
bugzilla: maintainer-feedback? (ler)


Attachments
fix to nullsfs sigsegv (1020 bytes, patch)
2023-11-22 12:15 UTC, Jamie Landeg-Jones
debugging patch to highlight bug (1.98 KB, patch)
2023-11-25 16:07 UTC, Jamie Landeg-Jones
Example results of the debugging patch (5.18 KB, text/plain)
2023-11-25 16:09 UTC, Jamie Landeg-Jones
patch to fix problem with lsof 4.99 (1.10 KB, patch)
2023-12-16 17:16 UTC, Jamie Landeg-Jones

Description Ronald Klop freebsd_committer freebsd_triage 2022-11-14 16:00:41 UTC
Running lsof without arguments gives an error.

lsof-4.96.4,8

# gdb lsof
Reading symbols from lsof...
(No debugging symbols found in lsof)
(gdb) run
Starting program: /usr/local/sbin/lsof

Program received signal SIGSEGV, Segmentation fault.
Invalid permissions for mapped object.
memcpy () at /home/ronald/dev/freebsd/src/contrib/arm-optimized-routines/string/aarch64/memcpy.S:175
175             stp     D_l, D_h, [dst, 64]!
(gdb) bt
#0  memcpy () at /home/ronald/dev/freebsd/src/contrib/arm-optimized-routines/string/aarch64/memcpy.S:175
#1  0x0000000000218be4 in ?? ()
#2  0x0000000500000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) 


# uname -a
FreeBSD rpi4 14.0-CURRENT FreeBSD 14.0-CURRENT #1 main-aba921bd9e-dirty: Sat Oct 29 00:31:53 CEST 2022     GENERIC-NODEBUG arm64
Comment 1 Mark Millard 2022-11-15 05:03:21 UTC
(In reply to Ronald Klop from comment #0)

I tried building (via poudriere), installing, and running lsof 4.96.4,8 on amd64
and it worked fine.

I then did the same on aarch64 (HoneyComb) and it also worked fine. aarch64
text:

# pkg add -f /usr/local/poudriere/data/packages/main-CA72-bulk_a-default/All/lsof-4.96.4,8.pkg
pkg: Warning: Major OS version upgrade detected.  Running "pkg bootstrap -f" recommended
Installing lsof-4.96.4,8...
. . .

# lsof
lsof: WARNING: device cache mismatch: /dev/vcio
COMMAND    PID  USER   FD      TYPE                DEVICE SIZE/OFF    NODE NAME
kernel       0  root  cwd      VDIR  222756737,2677268003       65      34 / (zopt0/ROOT/main-CA72)
. . .

You may need to specify more context to better enable more folks to
reproduce the failure. A possibility might be UFS vs. ZFS but there
could be other context differences.



For reference (long output line split for readability):

# uname -apKU
FreeBSD CA72_16Gp_ZFS 14.0-CURRENT
FreeBSD 14.0-CURRENT #65 main-n259064-f83db6441a2f-dirty:
Sun Nov  6 17:08:00 PST 2022
root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72
arm64 aarch64 1400073 1400073

# ~/fbsd-based-on-what-commit.sh -C /usr/main-src/
f83db6441a2f (HEAD -> main, freebsd/main, freebsd/HEAD) sctp: minor changes due to upstreaming of Glebs recent changes
branch: main
merge-base: f83db6441a2f4f925a169c7ddf844589cb73c9b5
merge-base: CommitDate: 2022-11-06 22:06:40 +0000
n259064 (--first-parent --count for merge-base)

# ~/fbsd-based-on-what-commit.sh -C /usr/ports/
ed69f5862e3f (HEAD -> main, freebsd/main, freebsd/HEAD) math/vtk9: Fix mismatch in MPI dependency
branch: main
merge-base: ed69f5862e3f322a32eda4fdb62100d2792419a4
merge-base: CommitDate: 2022-10-20 22:31:03 +0000
n598718 (--first-parent --count for merge-base)
Comment 2 Mark Millard 2022-11-15 05:15:05 UTC
FYI:

> pkg: Warning: Major OS version upgrade detected.  Running "pkg bootstrap -f" recommended

is because I happen to have pkg from a 13.1-RELEASE build in place. Other
than the rare use of lsof, I do not use ports that are tied to the kernel
vintage details, so I normally run 13.?-RELEASE ports on each of
releng/13.?, stable/13, and main. I build lsof just for main:

# pkg info lsof
pkg: Warning: Major OS version upgrade detected.  Running "pkg bootstrap -f" recommended
lsof-4.96.4,8
Name           : lsof
Version        : 4.96.4,8
Installed on   : Mon Nov 14 20:54:01 2022 PST
Origin         : sysutils/lsof
Architecture   : FreeBSD:14:aarch64
Prefix         : /usr/local
Categories     : sysutils
Licenses       : lsof
Maintainer     : ler@FreeBSD.org
WWW            : https://github.com/lsof-org/lsof
Comment        : Lists information about open files (similar to fstat(1))
Annotations    :
        FreeBSD_version: 1400073
        build_timestamp: 2022-11-15T04:53:41+0000
        built_by       : poudriere-git-3.3.99.20220831
        cpe            : cpe:2.3:a:lsof_project:lsof:4.96.4:::::freebsd14:aarch64
        port_checkout_unclean: no
        port_git_hash  : 7a902c8910c3
        ports_top_checkout_unclean: yes
        ports_top_git_hash: ed69f5862e3f
. . .
Comment 3 Mark Millard 2022-11-15 05:39:21 UTC
(In reply to Mark Millard from comment #2)

I tried installing and running lsof on a UFS system (a
RPi4B) that has a little older FreeBSD vintage and it
worked fine there as well:

# pkg add -f ~/lsof-4.96.4,8.pkg 
Installing lsof-4.96.4,8...
Newer FreeBSD version for package lsof:
To ignore this error set IGNORE_OSVERSION=yes
- package: 1400073
- running kernel: 1400072
Ignore the mismatch and continue? [y/N]: y
Extracting lsof-4.96.4,8: 100%
. . .

# lsof
lsof: WARNING: device cache mismatch: /dev/input/event0
COMMAND   PID  USER   FD      TYPE             DEVICE SIZE/OFF      NODE NAME
kernel      0  root  cwd      VDIR              0,105     1536         2 / (/dev/gpt/CA72USBufs)
. . .


For reference (long output line split for readability):

# uname -apKU
FreeBSD CA72_UFS 14.0-CURRENT
FreeBSD 14.0-CURRENT #63 main-n258610-ba7319e9091b-dirty:
Fri Oct 14 14:29:14 PDT 2022
root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72
arm64 aarch64 1400072 1400072

On the HoneyComb:

# git -C /usr/main-src/ diff contrib/arm-optimized-routines/string/aarch64/memcpy.S
# 

(i.e., no changes.)


I'm unsure what difference(s) of context matter here.
Comment 4 Ronald Klop freebsd_committer freebsd_triage 2022-11-15 11:30:44 UTC
This is weird.
I cannot reproduce this today.
I can scroll back in my terminal where the error happened yesterday. But running it again works.
Comment 5 Marek Zarychta 2023-01-01 21:17:10 UTC
Could that be related to bug 264094?
Comment 6 Larry Rosenman freebsd_committer freebsd_triage 2023-01-02 15:28:54 UTC
@emaste:  Can you look into this?
Comment 7 Ed Maste freebsd_committer freebsd_triage 2023-01-09 18:43:25 UTC
I see this is now closed as not reproducible; if it happens again, please get a backtrace from the core file using lldb or gdb, or by running lsof directly under lldb and issuing the `bt` command.

For example:

~ $ lldb lsof
(lldb) target create "lsof"
Current executable set to 'lsof' (x86_64).
(lldb) run
Process 28273 launched: '/usr/local/sbin/lsof' (x86_64)
lsof: WARNING: compiled for FreeBSD release 13.1-RELEASE-p5; this is 14.0-CURRENT.
lsof: kvm_open(execfile=/boot/wipbsd.20221006/kernel, corefile=/dev/mem): Invalid argument
COMMAND     PID   USER   FD      TYPE               DEVICE  SIZE/OFF     NODE NAME
sh         2010 emaste  txt      VREG 431860091,3334069762    169816   652702 ...
...


Process 28273 stopped
* thread #1, name = 'lsof', stop reason = signal SIGSTOP
    frame #0: 0x00000008237bdc0a libc.so.7`__sys_poll at _poll.S:4
   1    /* @generated by libc/sys/Makefile.inc */
   2    #include "compat.h"
   3    #include "SYS.h"
-> 4    PSEUDO(poll)
   5            .section .note.GNU-stack,"",%progbits
(lldb) bt
* thread #1, name = 'lsof', stop reason = signal SIGSTOP
  * frame #0: 0x00000008237bdc0a libc.so.7`__sys_poll at _poll.S:4
    frame #1: 0x000000082379681b libc.so.7`__res_nsend [inlined] ...
    frame #2: 0x0000000823796675 libc.so.7`__res_nsend...
    frame #3: 0x00000008237950bc libc.so.7`__res_nquery...
    frame #4: 0x0000000823763779 libc.so.7`_dns_gethostbyaddr...

(In this example I interrupted lsof while it was doing DNS resolution)
Comment 8 Sergey Z. 2023-09-09 20:03:17 UTC
Hello,

I'm also getting a segmentation fault from lsof on 13.2-RELEASE, though I'm not sure it's the same issue. For me, the pkg version prints a warning on startup:

lsof: WARNING: compiled for FreeBSD release 13.1-RELEASE-p5; this is 13.2-RELEASE-p2

and then crashes immediately. 

The version from ports crashes too (but without a warning), here is a stack trace for it:

Process 93962 launched: '/usr/ports/sysutils/lsof/work/stage/usr/local/sbin/lsof' (x86_64)
Process 93962 stopped
* thread #1, name = 'lsof', stop reason = signal SIGSEGV: address access protected (fault address: 0x800c4fffa)
    frame #0: 0x00000008227d948f libc.so.7`memmove + 607
libc.so.7`memmove:
->  0x8227d948f <+607>: rep    movsq    (%rsi), %es:(%rdi)
    0x8227d9492 <+610>: cld    
    0x8227d9493 <+611>: movq   %rdx, %rcx
    0x8227d9496 <+614>: andb   $0x7, %cl
(lldb) bt
* thread #1, name = 'lsof', stop reason = signal SIGSEGV: address access protected (fault address: 0x800c4fffa)
  * frame #0: 0x00000008227d948f libc.so.7`memmove + 607
    frame #1: 0x00000820c4d7b000
    frame #2: 0x000000000020e374 lsof`process_kinfo_file(kf=0x0000000827924c30, xfile=0x00000008273170c0, pcbs=0x00000008254ed000, locks=0x0000000820c4d998) at dproc.c:224:6
    frame #3: 0x000000000020dd2b lsof`process_file_descriptors(p=0x0000000825505500, ckscko=0, xfiles=0x0000000827307740, n_xfiles=6339, pcbs=0x00000008254ed000, locks=0x0000000820c4d998) at dproc.c:315:3
    frame #4: 0x000000000020d444 lsof`gather_proc_info at dproc.c:520:2
    frame #5: 0x0000000000218748 lsof`main(argc=1, argv=0x0000000820c4dd58) at main.c:1322:6
    frame #6: 0x000000000020a570 lsof`_start(ap=<unavailable>, cleanup=<unavailable>) at crt1_c.c:75:7
Comment 9 Jamie Landeg-Jones 2023-11-22 12:12:30 UTC
I stumbled across this happening with FreeBSD-14.

Hopefully this is the same issue you guys have seen, as I include the fix.

Basically, it's only a problem if you have NULLFS mounted filesystems.

I noticed it when a lot of files were open on such a filesystem (during a software build that used parallel compiles).

As seen in the attached patch, the problem code replaces the nullfs-mounted full pathname with the "real" full pathname:

  memmove(&vfs_path[strlen(vfs->fsname) + 1], &vfs_path[strlen(vfs->dir) + 1],
       strlen(vfs_path) - strlen(vfs->dir) + 1);
  memcpy(vfs_path, vfs->fsname, strlen(vfs->fsname));


However, sometimes this is called with a pathname of NULL. If your real mountpoint has a length shorter than the nullfs mountpoint, this causes an attempt to memmove a negative number of bytes.
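Because strlen() returns an unsigned size_t, the "negative number of bytes" never actually goes negative: the subtraction wraps around to an enormous count, which memmove() then tries to copy until it runs into unmapped or protected memory (which would be consistent with the memmove() fault in comment 8's backtrace). The sketch below isolates just the byte-count expression from the quoted code. It assumes an empty (zero-length) vfs_path is what "a pathname of NULL" refers to, and quoted_move_count() is a hypothetical name, not an lsof function:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper that evaluates only the byte-count argument of the
 * quoted memmove() call. strlen() returns size_t, which is unsigned, so
 * when vfs_path is shorter than dir the result wraps around modulo
 * SIZE_MAX + 1 instead of going negative.
 */
static size_t
quoted_move_count(const char *vfs_path, const char *dir)
{
	/* strlen(NULL) is undefined behaviour, so insist on real strings. */
	assert(vfs_path != NULL && dir != NULL);
	return strlen(vfs_path) - strlen(dir) + 1;
}
```

For example, with an empty path and a 29-character mount point such as /data/jails/loghost/_root/etc, the count comes out as SIZE_MAX - 27 rather than -28.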

The attached fix doesn't attempt the memmove/memcpy if the size would be negative.

Please also note that this fixes an "off by one" error in the same code: it wasn't actually producing the correct result even when it wasn't SIGSEGV'ing!
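The two fixes described above (skip the move when the count would be negative, and keep the "/" separator) could look roughly like the sketch below. It is not the attached patch: rewrite_nullfs_path() is a hypothetical stand-alone helper, and it assumes vfs->dir and vfs->fsname are plain strings without trailing slashes and that the caller passes the total size of the path buffer.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper (not the lsof source or the attached patch):
 * rewrite `path`, which lies under the nullfs mount point `dir`, so that
 * it starts with the underlying path `fsname` instead.
 * `cap` is the total size of the buffer `path` points to.
 * Assumes `dir` has no trailing slash.
 * Returns 0 on success, -1 if the rewrite was skipped (path unchanged).
 */
static int
rewrite_nullfs_path(char *path, size_t cap, const char *dir,
    const char *fsname)
{
	size_t plen, dlen, flen;

	assert(path != NULL && dir != NULL && fsname != NULL);
	plen = strlen(path);
	dlen = strlen(dir);
	flen = strlen(fsname);

	/*
	 * Skip paths shorter than the mount point (including an empty path):
	 * plen - dlen would otherwise wrap around, since size_t is unsigned.
	 */
	if (plen < dlen || strncmp(path, dir, dlen) != 0)
		return -1;
	/* The tail must begin at a component boundary ("/..." or nothing). */
	if (path[dlen] != '/' && path[dlen] != '\0')
		return -1;
	/* Need room for fsname + tail + terminating NUL. */
	if (flen + (plen - dlen) + 1 > cap)
		return -1;

	/*
	 * Move the tail starting AT index dlen, so the '/' separator and the
	 * terminating NUL (plen - dlen + 1 bytes in total) are both kept.
	 * memmove() copes with the source and destination overlapping.
	 */
	memmove(&path[flen], &path[dlen], plen - dlen + 1);
	memcpy(path, fsname, flen);
	return 0;
}
```

For a nullfs mount of /data/jails/loghost/etc on /data/jails/loghost/_root/etc, calling rewrite_nullfs_path(buf, sizeof buf, "/data/jails/loghost/_root/etc", "/data/jails/loghost/etc") turns "/data/jails/loghost/_root/etc/rc.conf" into "/data/jails/loghost/etc/rc.conf", while an empty buf is left untouched and -1 is returned. The key design choice is checking plen < dlen before subtracting, so the unsigned arithmetic can never wrap.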
Comment 10 Jamie Landeg-Jones 2023-11-22 12:15:57 UTC
Created attachment 246486 [details]
fix to nullsfs sigsegv
Comment 11 Jamie Landeg-Jones 2023-11-22 12:19:01 UTC
P.S. I could reproduce this on armv7 and amd64. In both cases, I built from source using ports.

The patch works on both these architectures.
Comment 12 Ronald Klop freebsd_committer freebsd_triage 2023-11-22 13:15:38 UTC
(In reply to Jamie Landeg-Jones from comment #9)
Hi, nice find.

"If your real mountpoint has a length shorter than the nullfs mountpoint"

Do you have an example of this situation?

I run some jails on this machine which uses nullfs mounts and I tend to use this structure of mounts:

zrpi4/jails/freebsd13 on /data/jails/freebsd13 (zfs, local, nfsv4acls)
zrpi4/jails/loghost on /data/jails/loghost (zfs, local, noatime, nfsv4acls)

/data/jails/freebsd13 on /data/jails/loghost/_root (nullfs, local, read-only, nfsv4acls)
/data/jails/loghost/root on /data/jails/loghost/_root/root (nullfs, local, noatime, nfsv4acls)
/data/jails/loghost/etc on /data/jails/loghost/_root/etc (nullfs, local, noatime, nfsv4acls)
/data/jails/loghost/var on /data/jails/loghost/_root/var (nullfs, local, noatime, nfsv4acls)
/data/jails/loghost/var/tmp on /data/jails/loghost/_root/tmp (nullfs, local, noatime, nfsv4acls)
/data/jails/loghost/local on /data/jails/loghost/_root/usr/local (nullfs, local, noatime, nfsv4acls)
devfs on /data/jails/loghost/_root/dev (devfs)

It nullfs-mounts the OS read-only on _root and then nullfs-mounts some writable directories into it.
Could this trigger the bug?
Or do you have another example?
Comment 13 Jamie Landeg-Jones 2023-11-25 15:58:10 UTC
Thanks! And sorry, I should have provided more details.

Yes, those scenarios would trigger the bug. Let me clarify a few points.

Based on my install history (using the same tmpfs setup in both cases), the issue seems to have started sometime between FreeBSD-13.0-RC2 and FreeBSD-14.0-RELEASE.

This was on an armv7 box, but I managed to replicate it on an AMD64 box.

I don't know *why* some entries are now reported with a null pathname; it's not unique to tmpfs, but it's a VFS rabbit-hole I've been too scared to go down! However, it doesn't seem to cause any problems (apart from tripping up this lsof bug).

Here are some debug details:

An amended patch to produce debugging information (showing SKIPPED where the program would normally coredump) is attached.

Also attached is this debug information for one process of mine that trips this bug.

Note the "off by one" error I mentioned (and fixed) in the first post: in all cases, the "new vfs_path" doesn't copy over the "/" between the filesystem and the path, instead exposing whatever character was previously in the buffer ('s' in my case here).
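To see the off-by-one concretely, the two statements quoted in comment 9 can be replayed on an ordinary path. In this sketch quoted_rewrite() is a hypothetical wrapper around those statements, and the nullfs mount (/data/jails/loghost/etc on /data/jails/loghost/_root/etc, as in comment 12) is illustrative:

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical wrapper that replays the two quoted statements, with
 * vfs->dir and vfs->fsname passed in as plain strings. Demonstration
 * only: it assumes vfs_path starts with dir, that fsname is not longer
 * than dir, and that the buffer extends at least one byte past the
 * terminating NUL (the memmove() reads that extra byte).
 */
static void
quoted_rewrite(char *vfs_path, const char *dir, const char *fsname)
{
	assert(strncmp(vfs_path, dir, strlen(dir)) == 0);

	/* Source starts one byte AFTER the '/' that follows dir, and the
	 * destination starts one byte after the end of fsname... */
	memmove(&vfs_path[strlen(fsname) + 1], &vfs_path[strlen(dir) + 1],
	    strlen(vfs_path) - strlen(dir) + 1);
	/* ...while memcpy() fills only indices 0..strlen(fsname)-1, so the
	 * byte at index strlen(fsname) keeps its old value instead of '/'. */
	memcpy(vfs_path, fsname, strlen(fsname));
}
```

Replaying it on "/data/jails/loghost/_root/etc/rc.conf" yields "/data/jails/loghost/etcorc.conf": the 'o' at index 23 is the stale byte left between the copied prefix and the moved tail, exactly where the '/' should be.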

Let me know if I can help further.

Cheers, Jamie

P.S. Are you able to reopen this PR?
Comment 14 Jamie Landeg-Jones 2023-11-25 16:07:05 UTC
Created attachment 246565 [details]
debugging patch to highlight bug
Comment 15 Jamie Landeg-Jones 2023-11-25 16:09:30 UTC
Created attachment 246566 [details]
Example results of the debugging patch

Note, the debugging output is first showing the length of the field, followed by the field itself.
Comment 16 Ronald Klop freebsd_committer freebsd_triage 2023-11-26 21:24:00 UTC
Reopening, as the PR is now reproducible and a patch has even been added.
Thanks for looking into this.
Comment 17 Jamie Landeg-Jones 2023-11-27 12:49:06 UTC
Just to add, in case of compatibility worries, the changes I've made only affect tmpfs, and then only the situation where the program would SIGSEGV anyway - any other uses that normally work will remain unchanged.
Comment 18 Jamie Landeg-Jones 2023-12-16 17:16:05 UTC
Created attachment 247081 [details]
patch to fix problem with lsof 4.99

Problem still exists with latest lsof in ports.

Refactored patch attached.
Comment 19 Jamie Landeg-Jones 2023-12-16 17:18:39 UTC
Can the new fix be committed please?
Comment 20 Larry Rosenman freebsd_committer freebsd_triage 2023-12-16 17:21:39 UTC
It'll be done by EOD today (16/Dec/2023).
(I'll upstream the fix)
Comment 21 commit-hook freebsd_committer freebsd_triage 2023-12-16 18:04:14 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=ec96413b952b2aade44e1b499c3153c4772860b7

commit ec96413b952b2aade44e1b499c3153c4772860b7
Author:     Larry Rosenman <ler@FreeBSD.org>
AuthorDate: 2023-12-16 18:02:56 +0000
Commit:     Larry Rosenman <ler@FreeBSD.org>
CommitDate: 2023-12-16 18:02:56 +0000

    sysutils/lsof: update to 4.99.1

                    Fix compilation error when HASIPv6 is not defined. (@chenrui333)

                    Add configure option --disable-liblsof to disable installation
                    of liblsof. (@subnut, #300)

                    [freebsd] fix segfault from fs info (FreeBSD bug 267760)

    PR:     267760
    Reported by: Ronald Klop

 sysutils/lsof/Makefile | 3 +--
 sysutils/lsof/distinfo | 6 +++---
 2 files changed, 4 insertions(+), 5 deletions(-)
Comment 22 Larry Rosenman freebsd_committer freebsd_triage 2023-12-16 18:04:55 UTC
New version created and released upstream, and port updated.
Comment 23 Ronald Klop freebsd_committer freebsd_triage 2023-12-27 07:49:41 UTC
I just confirmed that the issue is fixed for me too.

Before upgrade I could reproduce the issue.
Dec 27 08:45:18 rpi4 pkg[23576]: lsof upgraded: 4.99.0_1,8 -> 4.99.3,8
After upgrade lsof works fine.
Comment 24 Jamie Landeg-Jones 2023-12-27 22:30:58 UTC
Cheers for the feedback.

Just in case someone is reading this thread in the future, please note that on at least 2 occasions I referred to "tmpfs" when I meant "nullfs"!

... and I wasn't even drinking!