Bug 224878 - devel/valgrind fails on i386
Summary: devel/valgrind fails on i386
Status: New
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: Niclas Zeising
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-01-03 21:49 UTC by Mikhail Teterin
Modified: 2020-04-12 07:10 UTC
CC List: 4 users

See Also:
bugzilla: maintainer-feedback? (bdrewery)


Description Mikhail Teterin freebsd_committer 2018-01-03 21:49:53 UTC
The freshly-built port (tried valgrind-devel too) fails to debug anything:

% valgrind sleep 1
valgrind: I failed to allocate space for the application's stack.
valgrind: This may be the result of a very large --main-stacksize=
valgrind: setting.  Cannot continue.  Sorry.

I'd like to think amd64 is still OK, but I'm trying to use it on i386 here (running 11.1-STABLE #1 r327490 on a PAE-enabled kernel).
Comment 1 Mikhail Teterin freebsd_committer 2018-02-28 22:59:11 UTC
The same breakage exists on 10.4-STABLE i386, without PAE.
Comment 2 Mikhail Teterin freebsd_committer 2018-03-01 14:59:06 UTC
Yes, the tool works fine for me on amd64. Unfortunately, I need to work with the Oracle client libraries, but databases/oracle8-client is only available for i386...
Comment 3 Ben Bullock 2018-09-04 01:21:29 UTC
Using printf-statement debugging, I got the following output:

m_aspacemgr/aspacemgr-linux.c:2836:2 0
m_initimg/initimg-freebsd.c:532:0
m_initimg/initimg-freebsd.c:541:0

I found that the error is caused by the following. It starts in aspacemgr-linux.c (the first file listed above):
---
   if (nsegments[startI].kind != SkFree) {
      VG_(printf)("%s:%d:%d %d\n", __FILE__, __LINE__, nsegments[startI].kind, SkFree);
      return False;
   }
---
It seems that nsegments[startI].kind is 2, but SkFree is 0.

This returns "False", causing the following to fail in initimg-freebsd.c:
---
     ok = VG_(am_create_reservation)(
             resvn_start,
             resvn_size -inner_HACK,
             SmUpper, 
             anon_size +inner_HACK
          );
VG_(printf)("%s:%d:%d\n", __FILE__, __LINE__, ok);
     if (ok) {
        /* allocate a stack - mmap enough space for the stack */
        res = VG_(am_mmap_anon_fixed_client)(
                 anon_start -inner_HACK,
                 anon_size +inner_HACK,
                 VKI_PROT_READ|VKI_PROT_WRITE|VKI_PROT_EXEC
              );
     }
VG_(printf)("%s:%d:%d\n", __FILE__, __LINE__, ok);
     if ((!ok) || sr_isError(res)) {
        /* Allocation of the stack failed.  We have to stop. */
        VG_(printf)("valgrind: "
                    "I failed to allocate space for the application's stack.\n");
---
SkFree seems to be a constant value, defined in an enum in "include/pub_tool_aspacemgr.h".

The underlying problem seems to be some unaddressed change in the system call used, which causes these failures. Is the maintainer still actively working on this project? I had to fiddle with the configure.ac script to get it to work, since it only refers to old compiler versions.
Comment 4 Walter Schwarzenfeld freebsd_triage 2019-03-09 08:40:07 UTC
Maintainership dropped in ports r495096.
Comment 5 Walter Schwarzenfeld freebsd_triage 2019-03-10 11:44:50 UTC
Assign to new maintainer.
Comment 6 Paul Floyd 2020-03-06 21:14:22 UTC
(In reply to Mikhail Teterin from comment #0)
The 'valgrind' executable is just a small stub that execs the tool executable (like memcheck-x86-freebsd or memcheck-amd64-freebsd). So x86 failing has little impact on x86.
Comment 7 Paul Floyd 2020-03-07 12:42:33 UTC
(In reply to Paul Floyd from comment #6)
I meant impact on amd64.
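Here is a minimal sketch of the launcher point, showing how a stub could pick the per-architecture tool binary to exec. It assumes only the <tool>-<arch>-<os> naming mentioned above. The function tool_exe_path and its directory argument are hypothetical placeholders, not Valgrind's actual launcher code.

```c
#include <stdio.h>

/* Hypothetical helper, not Valgrind's launcher code: build the path of the
 * tool executable the 'valgrind' stub would exec, using the
 * <tool>-<arch>-<os> names from this thread (memcheck-x86-freebsd,
 * memcheck-amd64-freebsd). Returns the path length, or -1 if the path
 * does not fit in buf. */
static int tool_exe_path(char *buf, size_t len, const char *libdir,
                         const char *tool, const char *arch, const char *os)
{
    int n = snprintf(buf, len, "%s/%s-%s-%s", libdir, tool, arch, os);
    /* snprintf reports the length it needed, so n >= len means truncation */
    if (n < 0 || (size_t)n >= len)
        return -1;
    return n;
}
```

With arch "amd64", the same call names memcheck-amd64-freebsd instead. That is a separate binary, so a broken x86 tool need not affect amd64 runs.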
Comment 8 Paul Floyd 2020-03-07 16:21:16 UTC
(In reply to Ben Bullock from comment #3)

What I'm seeing is that the nsegments[startI].kind found has the value SkAnonV (the anonymous memory of the Valgrind host), whilst, as noted, the code is looking for SkFree.
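The sketch below shows the failing check in simplified form. It assumes a segment table much smaller than the real one. A reservation is only allowed when the requested range lies inside a single SkFree segment, so finding SkAnonV there makes it fail. The types and functions (Segment, find_segment, can_reserve) are hypothetical. The enumerator values are an assumption, chosen to match the debug output in comment 3: SkFree printed as 0 and the offending kind as 2.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified segment kinds. The values are an assumption
 * chosen to match the debug output above: SkFree printed as 0 and the
 * offending segment as 2, which was identified as SkAnonV. */
typedef enum { SkFree = 0, SkAnonC = 1, SkAnonV = 2 } SegKind;

typedef struct {
    uintptr_t start; /* first address of the segment */
    uintptr_t end;   /* last address of the segment (inclusive) */
    SegKind kind;
} Segment;

/* Index of the segment containing addr, or -1 if no segment does. */
static int find_segment(const Segment *segs, size_t n, uintptr_t addr)
{
    for (size_t i = 0; i < n; i++)
        if (segs[i].start <= addr && addr <= segs[i].end)
            return (int)i;
    return -1;
}

/* A reservation [start, start + length) succeeds only if the whole range
 * lies inside one segment and that segment is SkFree. Overflow of
 * start + length is not checked in this sketch. */
static bool can_reserve(const Segment *segs, size_t n,
                        uintptr_t start, size_t length)
{
    if (length == 0)
        return false;
    int first = find_segment(segs, n, start);
    int last = find_segment(segs, n, start + length - 1);
    if (first < 0 || first != last)
        return false;
    return segs[first].kind == SkFree;
}
```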
Comment 9 Paul Floyd 2020-03-08 12:46:47 UTC
(In reply to Paul Floyd from comment #8)
In aspacemgr-linux.c, the client stack gets defined by

     aspacem_maxAddr = VG_PGROUNDDN( sp_at_startup ) - 1;

Presumably this is wrong. If I put in a hard-coded hex address (0xfbffdfff), I no longer get the "failed to allocate" message. However, I'm still not detecting any leaks, and I am getting a syscall failure.
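The arithmetic involved can be sketched as follows. This assumes 4 KiB pages and takes VG_PGROUNDDN to mean "round down to a page boundary"; pg_round_dn and max_addr_from are hypothetical names. Applied to 0xfbffe000, the start of the reserved region in the procstat listing posted in comment 10, it yields exactly the hard-coded 0xfbffdfff. A stack pointer inside the stack region above that yields a much higher limit.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr down to a page boundary, in the spirit of VG_PGROUNDDN.
 * page_size must be a power of two (4096 on x86). */
static uint32_t pg_round_dn(uint32_t addr, uint32_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return addr & ~(page_size - 1);
}

/* Highest usable address below addr, mirroring
 * aspacem_maxAddr = VG_PGROUNDDN(sp_at_startup) - 1. */
static uint32_t max_addr_from(uint32_t addr, uint32_t page_size)
{
    return pg_round_dn(addr, page_size) - 1;
}
```

For example, a made-up stack pointer of 0xffffd123 gives 0xffffcfff, which lies well above the reserved region.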
Comment 10 Paul Floyd 2020-03-22 07:51:27 UTC
Just for reference, here is the procstat output of a 32-bit app running on amd64 FreeBSD 12.1:

/usr/bin/procstat -v `pgrep slp`
  PID              START                END PRT  RES PRES REF SHD FLAG  TP PATH
22353           0x400000           0x401000 r--    1    6   3   1 CN--- vn /usr/home/paulf/scratch/valgrind/slp
22353           0x401000           0x402000 r-x    1    6   3   1 CN--- vn /usr/home/paulf/scratch/valgrind/slp
22353           0x402000           0x403000 rw-    1    0   1   0 C---- vn /usr/home/paulf/scratch/valgrind/slp
22353           0x403000           0x404000 r--    1    2   2   0 ----- df 
22353           0x404000           0x405000 rw-    1    2   2   0 ----- df 
22353         0x20402000         0x20407000 r--    5   31   3   1 CN--- vn /libexec/ld-elf32.so.1
22353         0x20407000         0x2041f000 r-x   24   31   3   1 CN--- vn /libexec/ld-elf32.so.1
22353         0x2041f000         0x20420000 rw-    1    0   1   0 C---- vn /libexec/ld-elf32.so.1
22353         0x20420000         0x20443000 rw-   20   20   1   0 ----- df 
22353         0x20443000         0x20482000 r--   63  317   3   1 CN--- vn /usr/lib32/libc.so.7
22353         0x20482000         0x205b7000 r-x  229  317   3   1 CN--- vn /usr/lib32/libc.so.7
22353         0x205b7000         0x205bb000 rw-    4    0   2   0 C---- vn /usr/lib32/libc.so.7
22353         0x205bb000         0x205c0000 r--    5    0   2   0 C---- vn /usr/lib32/libc.so.7
22353         0x205c0000         0x205e5000 rw-   16   22   2   0 ----- df 
22353         0x20600000         0x20800000 rw-    6   22   2   0 ----- df 
22353         0xfbffe000         0xfffde000 ---    0    0   0   0 ----- -- 
22353         0xfffde000         0xffffe000 rw-    3    3   1   0 ---D- df 
22353         0xffffe000         0xfffff000 r-x    1    1  94   0 ----- ph
Comment 11 Paul Floyd 2020-03-31 10:06:00 UTC
I think that I now understand the cause of the failure. Since FreeBSD 11.1, mmap has a MAP_GUARD flag, and there is a MAP_GUARD region below the user stack: memory that is reserved but unmapped.


22353         0x20600000         0x20800000 rw-    6   22   2   0 ----- df 
VVV MAP_GUARD VVV
22353         0xfbffe000         0xfffde000 ---    0    0   0   0 ----- -- 
^^^ MAP_GUARD ^^^
22353         0xfffde000         0xffffe000 rw-    3    3   1   0 ---D- df 
22353         0xffffe000         0xfffff000 r-x    1    1  94   0 ----- ph

This doesn't exist on Linux.
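On FreeBSD, user code can make a comparable reservation with the MAP_GUARD flag itself. MAP_GUARD reserves a range of address space without backing memory, and any access to it faults. The sketch below is only an illustration, not how the kernel builds the stack gap. reserve_guard is a hypothetical helper. On systems without MAP_GUARD, such as Linux, it falls back to an inaccessible PROT_NONE anonymous mapping as a rough stand-in.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <sys/mman.h>

/* Reserve len bytes of address space without backing it with memory.
 * Returns NULL on failure. */
static void *reserve_guard(size_t len)
{
#ifdef MAP_GUARD
    /* FreeBSD: MAP_GUARD requires PROT_NONE, fd -1 and offset 0 */
    void *p = mmap(NULL, len, PROT_NONE, MAP_GUARD, -1, 0);
#else
    /* Elsewhere: an inaccessible anonymous mapping as an approximation */
    void *p = mmap(NULL, len, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
#endif
    return p == MAP_FAILED ? NULL : p;
}
```

The region is released with munmap like any other mapping.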

I haven't yet seen how this size is calculated. Also, I was hoping not to have to deal with FreeBSD minor revisions.
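One way to examine the sizes is to parse the procstat -v rows quoted above. This is a minimal sketch that reads only the first four columns; VmRow, parse_vm_row and vm_row_size are hypothetical names. For the two rows above, the guard region is 0x03fe0000 bytes (63.875 MiB) and the stack region is 0x20000 bytes (128 KiB). Together they span exactly 0x4000000 bytes, or 64 MiB. Whether that round total comes from the stack size limit in effect is an untested guess.

```c
#include <inttypes.h>
#include <stdio.h>

/* One row of `procstat -v` output; only the first four columns are kept. */
typedef struct {
    int pid;
    uint64_t start; /* START column, inclusive */
    uint64_t end;   /* END column, exclusive: the next row begins here */
    char prot[4];   /* PRT column, e.g. "rw-" or "---" */
} VmRow;

/* Parse the PID, START, END and PRT columns of a row.
 * Returns 1 on success, 0 for the header line or anything else. */
static int parse_vm_row(const char *line, VmRow *row)
{
    /* the %x conversions accept an optional 0x prefix */
    return sscanf(line, "%d %" SCNx64 " %" SCNx64 " %3s",
                  &row->pid, &row->start, &row->end, row->prot) == 4;
}

/* Size of the region in bytes. */
static uint64_t vm_row_size(const VmRow *row)
{
    return row->end - row->start;
}
```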
Comment 12 Paul Floyd 2020-04-03 08:24:38 UTC
The next problem is with the syscall interfaces.

First, the function used for debug output to stderr is totally wrong: it uses the Linux-style syscall interface, so there is no debug output at all. I'll fix this next, since it will help in debugging the next problems. The code for Darwin seems quite similar to what we need.
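The comment does not spell out the exact failure. One well-known difference between the two conventions is how errors are reported. On Linux x86, a failing syscall returns -errno in eax. On FreeBSD x86, the kernel sets the carry flag and leaves a positive errno in eax. The decoders below are hypothetical and are not Valgrind's SysRes API. Read Linux-style, a FreeBSD error such as EBADF (9) in eax looks like a successful return of 9.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded syscall outcome: either a result value or an errno. */
typedef struct {
    bool is_error;
    uint32_t value; /* result on success, errno on failure */
} SysOutcome;

/* Linux x86: eax holds -errno on failure, i.e. [-4095, -1] as signed. */
static SysOutcome decode_linux_x86(uint32_t eax)
{
    SysOutcome r;
    if (eax >= (uint32_t)-4095) {
        r.is_error = true;
        r.value = 0u - eax; /* negate back to a positive errno */
    } else {
        r.is_error = false;
        r.value = eax;
    }
    return r;
}

/* FreeBSD x86: the carry flag signals failure, and eax then holds the
 * positive errno, so eax alone cannot tell success from failure. */
static SysOutcome decode_freebsd_x86(uint32_t eax, bool carry)
{
    SysOutcome r = { carry, eax };
    return r;
}
```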

Second, and last for the moment, I'm seeing a message that comes from this source:

if (sysctl(mib, len, psa, &size, NULL, 0) == -1) {
	_rtld_error("sysctl for hw.pagesize(s) failed");
	die();
}

which is part of the init_pagesizes function in librtld-elf.so.

This should pass two values in the mib integer array: 6 for hw and 7 for pagesizes. On the Valgrind receiving end, I am seeing the 6, but instead of 7 there's a very large value, not far from INT_MAX.
Comment 13 Paul Floyd 2020-04-04 09:23:05 UTC
My previous comment didn't give the full picture. The problem is that there are two syscalls: one to sysctlgetbyname(), and then one to sysctl() with the mib returned by the first. The sysctlgetbyname() call is the first sysctl done by the client, and it looks like there's a problem copying the data on the client stack. As a result, the returned mib has a bogus second element in the int array. I've put in a temporary hack to work around this problem.

After that, I'm having a problem with the preload of the 32-bit shared libs. I can hack around this by setting LD_32_PRELOAD. This shouldn't be too hard to fix.

And, with all that, tada!

paulf> cat leak.cpp
int main()
{
   int* pi = new int;
}

paulf> ./vg-in-place  --leak-check=full --track-origins=yes ./leakcpp32.clang++
==41888== Memcheck, a memory error detector
==41888== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==41888== Using Valgrind-3.16.0.GIT and LibVEX; rerun with -h for copyright info
==41888== Command: ./leakcpp32.clang++
==41888== 
==41888== 
==41888== HEAP SUMMARY:
==41888==     in use at exit: 4 bytes in 1 blocks
==41888==   total heap usage: 1 allocs, 0 frees, 4 bytes allocated
==41888== 
==41888== 4 bytes in 1 blocks are definitely lost in loss record 1 of 1
==41888==    at 0xC9DE3C7: operator new(unsigned int) (vg_replace_malloc.c:356)
==41888==    by 0x4012D1: main (leak.cpp:3)
==41888== 
==41888== LEAK SUMMARY:
==41888==    definitely lost: 4 bytes in 1 blocks
==41888==    indirectly lost: 0 bytes in 0 blocks
==41888==      possibly lost: 0 bytes in 0 blocks
==41888==    still reachable: 0 bytes in 0 blocks
==41888==         suppressed: 0 bytes in 0 blocks
==41888== 
==41888== For lists of detected and suppressed errors, rerun with: -s
==41888== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Comment 14 Nick Briggs 2020-04-11 17:57:49 UTC
(In reply to Paul Floyd from comment #13)

Do you have the patches you made to fix this available somewhere?  I've got a problem on an i386 system running FreeBSD 12.1-RELEASE-p3 that I could really use valgrind's help on.
Comment 15 Paul Floyd 2020-04-11 19:02:15 UTC
(In reply to Nick Briggs from comment #14)

All of the code is on github:
https://github.com/paulfloyd/freebsd_valgrind

Memcheck seems to work fairly well on amd64.

For x86 applications, I've just started to get things to work when running on an amd64 kernel, with a couple of ugly hacks. x86 on x86 should just about work, but I've done no real testing.

Versions of FreeBSD older than 12.0 don't work yet.
Comment 16 Nick Briggs 2020-04-11 20:49:22 UTC
Thanks. Since you advanced the Valgrind version from 3.10.0 to 3.16.0, I'm just recompiling exactly what's currently in your git repo.

This is on a "FreeBSD 12.1-RELEASE-p3 GENERIC  i386" system, with
# cc --version
FreeBSD clang version 8.0.1 (tags/RELEASE_801/final 366581) (based on LLVM 8.0.1)
Target: i386-unknown-freebsd12.1
Thread model: posix
InstalledDir: /usr/bin

and unfortunately it fails with:

[...]
../coregrind/link_tool_exe_freebsd 0x38000000 cc     -o memcheck-x86-freebsd  -m32 -O2 -g -Wall -Wmissing-prototypes -Wshadow -Wpointer-arith -Wstrict-prototypes -Wmissing-declarations -Wcast-align -Wcast-qual -Wwrite-strings -Wempty-body -Wformat -Wformat-security -Wignored-qualifiers -Wenum-conversion -finline-functions -fno-stack-protector -fno-strict-aliasing -fno-builtin -Wno-cast-align -Wno-self-assign -Wno-tautological-compare -fomit-frame-pointer -O2 -static -nodefaultlibs -nostartfiles -u _start   -m32 memcheck_x86_freebsd-mc_leakcheck.o memcheck_x86_freebsd-mc_malloc_wrappers.o memcheck_x86_freebsd-mc_main.o memcheck_x86_freebsd-mc_main_asm.o memcheck_x86_freebsd-mc_translate.o memcheck_x86_freebsd-mc_machine.o memcheck_x86_freebsd-mc_errors.o ../coregrind/libcoregrind-x86-freebsd.a ../VEX/libvex-x86-freebsd.a -lgcc 
cc: warning: argument unused during compilation: '-u _start' [-Wunused-command-line-argument]
ld: error: undefined symbol: memset
>>> referenced by host_generic_regs.c:88 (priv/host_generic_regs.c:88)
>>>               libvex_x86_freebsd_a-host_generic_regs.o:(RRegUniverse__init) in archive ../VEX/libvex-x86-freebsd.a
cc: error: linker command failed with exit code 1 (use -v to see invocation)
Comment 17 Nick Briggs 2020-04-11 21:23:39 UTC
... because clang helpfully turned

   *univ = (RRegUniverse){};

into a call to memset(). Given that the subsequent code in RRegUniverse__init() explicitly initializes every field, I think this line should just be omitted, as it is not only unnecessary but harmful.

The same issue shows up in exp-sgcheck/sg_main.c:1072, in outGlobals_init(), where it does:

   for (i = 0; i < VG_N_THREADS; i++) {
      shadowStacks[i] = NULL;
      siTrees[i] = NULL;
      qcaches[i] = (QCache){};    <<<===
   }

These are the only two places where that pattern shows up:

$ grep -r -E '= \([[:alpha:]]+\)\{\}' .
./VEX/priv/host_generic_regs.c:   *univ = (RRegUniverse){};
./exp-sgcheck/sg_main.c:      qcaches[i] = (QCache){};
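For the link error itself, one option is to make sure that some memset definition exists in the static, -nodefaultlibs link. This is a minimal sketch that assumes a plain byte loop is acceptable. It is not necessarily how Valgrind resolves it. It is named fallback_memset only so that it can coexist with libc here. A real replacement would have to be called memset and be compiled with options that stop the compiler from recognising the loop as a memset idiom, for example -fno-builtin with clang or -fno-tree-loop-distribute-patterns with GCC. Otherwise, the loop could itself be turned back into a call to memset.

```c
#include <stddef.h>

/* Byte-by-byte fill of n bytes at dst with the low byte of c.
 * Returns dst, matching the standard memset contract. */
static void *fallback_memset(void *dst, int c, size_t n)
{
    unsigned char *p = dst;
    while (n--)
        *p++ = (unsigned char)c;
    return dst;
}
```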
Comment 18 Paul Floyd 2020-04-11 21:25:51 UTC
I'd recommend sticking to building with GCC for the moment.

configure with CC=gcc CFLAGS="-g -O0"

I just pushed a change that should resolve your link error.

Running on an x86 kernel, there is an issue in coregrind/m_initimg/initimg-freebsd.c. Basically, the call to VG_(is32On64)() in setup_client_env() does not work, so I've temporarily hard-coded the variable 'ld_preload' to be "LD_PRELOAD" or "LD_32_PRELOAD" until I fix VG_(is32On64)(). These values are correct when running on an amd64 kernel, but wrong on an x86 kernel, where it should be just "LD_PRELOAD".
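The intended selection can be sketched as follows, assuming a working VG_(is32On64)(). Its result is represented here by the is32on64 parameter. preload_var_name and make_preload_entry are hypothetical helpers. On an amd64 kernel, the 32-bit runtime linker (ld-elf32.so.1 in the procstat listing earlier) reads LD_32_PRELOAD. A native x86 process uses LD_PRELOAD.

```c
#include <stdbool.h>
#include <stdio.h>

/* Preload variable understood by the client's runtime linker. */
static const char *preload_var_name(bool is32on64)
{
    return is32on64 ? "LD_32_PRELOAD" : "LD_PRELOAD";
}

/* Format a NAME=value environment entry for the client.
 * Returns the entry length, or -1 if it does not fit in buf. */
static int make_preload_entry(char *buf, size_t len, bool is32on64,
                              const char *libs)
{
    int n = snprintf(buf, len, "%s=%s", preload_var_name(is32on64), libs);
    if (n < 0 || (size_t)n >= len)
        return -1;
    return n;
}
```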
Comment 19 Paul Floyd 2020-04-11 21:37:44 UTC
(In reply to Nick Briggs from comment #17)

It's safe to ignore the code in exp-sgcheck. It is scheduled to be removed from Valgrind in the near future.

I haven't worked much on the VEX code (the machine code JITter). Setting everything to zero is inefficient if the fields are just going to be assigned soon after.
Comment 20 Nick Briggs 2020-04-11 23:18:30 UTC
OK, with your fix to add memset support, as pushed, and my changing ld_preload (and ld_32_preload just in case) to be "LD_PRELOAD"... memcheck, at least, works fine, even compiled with clang -O2.
Comment 21 Paul Floyd 2020-04-12 07:10:40 UTC
(In reply to Nick Briggs from comment #20)
Great. If (or rather when) you encounter any issues, could you please open an issue on GitHub?