Bug 215826 - C++ program signal handlers not called
Summary: C++ program signal handlers not called
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: misc
Version: CURRENT
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: freebsd-bugs (Nobody)
URL:
Keywords: regression
Depends on:
Blocks:
 
Reported: 2017-01-06 12:58 UTC by Dominic Fandrey
Modified: 2017-01-25 14:13 UTC (History)

Attachments
Minimal testcase (227 bytes, text/x-c++src)
2017-01-09 10:46 UTC, Dominic Fandrey
Show sigprocmask, before and after exception (487 bytes, text/x-c++src)
2017-01-09 14:54 UTC, Dominic Fandrey
Fix nested write locks for the compat rtld locking. (3.19 KB, patch)
2017-01-10 15:21 UTC, Konstantin Belousov

Description Dominic Fandrey freebsd_committer 2017-01-06 12:58:12 UTC
18 days ago I got a bug report for sysutils/powerdxx: https://github.com/lonkamikaze/powerdxx/issues/3

There is some distracting content in the report, e.g. the OP misidentified an unrelated ACPI problem as part of the problem he/she was reporting.

To sum up, the issue is that C++ programs (at least mine) don't get their signal handlers called. That includes default signal handlers that I never touched.

Affected are the `loadrec` and `powerd++` binaries. The `loadrec` binary does not touch signal handlers, so the default handlers stay in place.

A couple of people helped me debug the issue. As far as we could tell using truss and dtrace, the signals get sent but the signal handlers are never called. I don't know how to debug this further (how do you debug something that is not happening?), but we found a workaround: link with `-lpthread`. My assumption is that `pthread` replaces system functions with thread-safe versions and, as a side effect, replaces whatever recently broke on CURRENT.

I documented this workaround in the Makefile of my project:
https://github.com/lonkamikaze/powerdxx/commit/2d80d990121802b4402cf54bc9a328449ae8f326

I don't know exactly when it broke. I tested on head/r310361; the OP was running head/r310173.

The problem occurs whether I include `signal(3)` via `<csignal>` or `<signal.h>`, or not at all (as in `loadrec`).

To reproduce, get the last version without the workaround from the repo: https://github.com/lonkamikaze/powerdxx/tree/93a755fbc4d7ec36e5a9d4a35d5a33052cc0e678
Comment 1 Dominic Fandrey freebsd_committer 2017-01-09 10:46:29 UTC
Created attachment 178664
Minimal testcase

I managed to produce a minimal example. Apparently raising an exception causes the behaviour.

I reproduced the problem using the following VM image:
http://ftp.freebsd.org/pub/FreeBSD/snapshots/VM-IMAGES/12.0-CURRENT/amd64/20170105/

uname -a:
FreeBSD  12.0-CURRENT FreeBSD 12.0-CURRENT #0 r311461: Thu Jan  5 22:46:38 UTC 2017     root@releng3.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC  amd64

Produce erroneous behaviour:
c++ test.cpp -o test && ./test

Produce expected behaviour:
c++ test.cpp -o test -lpthread && ./test
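
The attached test case is not reproduced in this report. As a rough illustration only, a check along these lines would exercise the same sequence (throw and catch an exception, then raise a signal). The namespace, function names, and the choice of SIGUSR1 with a custom handler are assumptions made for this sketch and may differ from the attachment, which appears to rely on the default SIGINT action.

```cpp
#include <csignal>
#include <stdexcept>

namespace repro_sketch {  // hypothetical name, not from the attachment

// Written by the handler; volatile sig_atomic_t is safe to set from one.
volatile std::sig_atomic_t handled = 0;

void on_signal(int) { handled = 1; }

// Returns true if a signal raised after a caught exception reaches its handler.
bool signal_delivered_after_exception() {
    handled = 0;
    auto previous = std::signal(SIGUSR1, on_signal);

    try {
        throw std::runtime_error("force stack unwinding");
    } catch (const std::runtime_error&) {
        // Nothing to do: the point is only that the unwinder has run.
    }

    // In a single-threaded process an unblocked signal is delivered
    // before raise() returns; a blocked one merely stays pending.
    std::raise(SIGUSR1);

    std::signal(SIGUSR1, previous);
    return handled == 1;
}

}  // namespace repro_sketch
```

Called from `main()`, `repro_sketch::signal_delivered_after_exception()` would be expected to return true on an unaffected system. On an affected system built without `-lpthread`, the symptoms described here suggest it would return false.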
Comment 2 Dominic Fandrey freebsd_committer 2017-01-09 14:54:02 UTC
Created attachment 178670
Show sigprocmask, before and after exception

This is a version of the test case that outputs the sigprocmask before and after an exception.

Output on the same system as before:

# c++ mask.cpp -o mask && ./mask
main/try/sigprocmask: 00000000000000000000000000000000
main/catch/sigprocmask: 0xfffef007ffffffffffffffffffffffff
Raising signal: SIGINT
Returning despite signal!
# c++ mask.cpp -o mask -lpthread && ./mask
main/try/sigprocmask: 00000000000000000000000000000000
main/catch/sigprocmask: 00000000000000000000000000000000
Raising signal: SIGINT

#
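
For readers without access to the attachment, the comparison above can be approximated as follows. This is a sketch under assumptions, not the attached `mask.cpp`: the helper names are hypothetical, and it reports one character per signal instead of the hexadecimal dump shown in the output.

```cpp
#include <signal.h>
#include <stdexcept>
#include <string>

namespace mask_sketch {  // hypothetical name, not from the attachment

// One character per signal number: '1' if blocked, '0' otherwise.
std::string blocked_signals() {
    sigset_t mask;
    sigemptyset(&mask);
    // A null "set" argument makes sigprocmask() only report the current mask.
    sigprocmask(SIG_SETMASK, nullptr, &mask);

    std::string bits;
    for (int signo = 1; signo < NSIG; ++signo)
        bits += (sigismember(&mask, signo) == 1) ? '1' : '0';
    return bits;
}

// Compares the mask seen before the throw with the mask seen in the catch block.
bool mask_survives_exception() {
    const std::string before = blocked_signals();
    std::string after;
    try {
        throw std::runtime_error("force stack unwinding");
    } catch (const std::runtime_error&) {
        after = blocked_signals();
    }
    return before == after;
}

}  // namespace mask_sketch
```

Printing `mask_sketch::blocked_signals()` before the `try` and inside the `catch` gives the same kind of before/after view as the output above.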
Comment 3 yamagi 2017-01-10 13:34:32 UTC
This was broken in HEAD by SVN revision r310025[1] and in 11-STABLE by its MFC in r311202[2]. I don't think that revision is the culprit; it's more likely that it exposed another bug. An educated guess would be the "make libthr dlopen()able" work about 1 year ago: https://lists.freebsd.org/pipermail/freebsd-threads/2014-December/005636.html

1: https://svnweb.freebsd.org/base?view=revision&revision=310025
2: https://svnweb.freebsd.org/changeset/base/311202
Comment 4 Konstantin Belousov freebsd_committer 2017-01-10 15:21:38 UTC
Created attachment 178720
Fix nested write locks for the compat rtld locking.

The issue is that nested write locking for non-libthr locks is broken, and I knew it; see the patched comment. Due to the peculiarity of the problem, I forgot about it when I reviewed r310025.
Comment 5 yamagi 2017-01-10 16:31:48 UTC
I can confirm that the patch fixes the problem for me. Thank you!
Comment 6 commit-hook freebsd_committer 2017-01-10 19:27:12 UTC
A commit references this bug:

Author: kib
Date: Tue Jan 10 19:26:55 UTC 2017
New revision: 311886
URL: https://svnweb.freebsd.org/changeset/base/311886

Log:
  Fix acquisition of nested write compat rtld locks.

  Obtaining compat rtld lock in write mode sets process signal mask to
  block all signals.  Previous mask is stored in the global variable
  oldsigmask.  If a lock is write-locked while another lock is already
  write-locked, oldsigmask is overwritten by the total mask and on the
  last unlock, all signals except traps appear to be blocked.

  Fix this by counting the write-lock nested level, and only storing to
  oldsigmask/restoring from it at the outermost level.

  Masking signals disables involuntary preemption for libc_r, and there
  could be no voluntary context switches in the locked code
  (dl_iterate_phdr(3) keeps a lock around user callback, but it was
  added long after libc_r was renounced).  Due to this, remembering the
  level in the global variable after the lock is obtained should be
  safe, because no two libc_r threads can acquire different write locks
  in parallel.

  PR:	215826
  Reported by:	kami
  Tested by:	yamagi@yamagi.org (previous version)
  To be reviewed by:	kan
  Sponsored by:	The FreeBSD Foundation
  MFC after:	2 weeks

Changes:
  head/libexec/rtld-elf/rtld_lock.c
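
The nesting-counter idea from the commit log can be illustrated in isolation. The following is a minimal single-threaded sketch and not the code from `rtld_lock.c`: all names are hypothetical, it blocks every blockable signal instead of leaving traps unblocked, and it models only the signal-mask bookkeeping, not the lock itself.

```cpp
#include <cassert>
#include <signal.h>

namespace lock_sketch {  // hypothetical names throughout

sigset_t saved_mask;   // mask to restore at the outermost unlock
int write_depth = 0;   // number of write locks currently held

void write_lock() {
    sigset_t all, previous;
    sigfillset(&all);
    // Block everything first so no handler can run between the steps below.
    sigprocmask(SIG_BLOCK, &all, &previous);
    // Only the outermost acquisition sees the caller's real mask. A nested
    // one sees "everything blocked" and must not overwrite saved_mask.
    if (write_depth++ == 0)
        saved_mask = previous;
}

void write_unlock() {
    assert(write_depth > 0);
    // Restore the caller's mask only when the last nested lock is released.
    if (--write_depth == 0)
        sigprocmask(SIG_SETMASK, &saved_mask, nullptr);
}

// Helper for inspecting the result: is signo currently blocked?
bool is_blocked(int signo) {
    sigset_t mask;
    sigemptyset(&mask);
    sigprocmask(SIG_SETMASK, nullptr, &mask);
    return sigismember(&mask, signo) == 1;
}

}  // namespace lock_sketch
```

Without the depth counter, the second `write_lock()` would store the fully blocked mask into `saved_mask`, and the final `write_unlock()` would restore that instead of the original mask, which matches the failure described in the log.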
Comment 7 Dominic Fandrey freebsd_committer 2017-01-10 23:11:17 UTC
I managed to reproduce the problem on stable/11 r311880 and the patch fixes it there too. So I recommend/request an MFC ASAP.
Comment 8 Dominic Fandrey freebsd_committer 2017-01-17 08:42:04 UTC
Waiting for MFC.
Comment 9 commit-hook freebsd_committer 2017-01-24 11:14:20 UTC
A commit references this bug:

Author: kib
Date: Tue Jan 24 11:13:42 UTC 2017
New revision: 312693
URL: https://svnweb.freebsd.org/changeset/base/312693

Log:
  MFC r311886:
  Fix acquisition of nested write compat rtld locks.

  PR:	215826

Changes:
_U  stable/11/
  stable/11/libexec/rtld-elf/rtld_lock.c
Comment 10 commit-hook freebsd_committer 2017-01-24 17:30:44 UTC
A commit references this bug:

Author: kib
Date: Tue Jan 24 17:30:13 UTC 2017
New revision: 312701
URL: https://svnweb.freebsd.org/changeset/base/312701

Log:
  MFC r311886:
  Fix acquisition of nested write compat rtld locks.

  PR:	215826

Changes:
_U  stable/10/
  stable/10/libexec/rtld-elf/rtld_lock.c
Comment 11 Dominic Fandrey freebsd_committer 2017-01-25 14:13:03 UTC
I can confirm that stable/11 r312742 passes my tests.

Thank you for fixing!