Bug 239803 - Problem with concurrent transfer of file descriptors
Summary: Problem with concurrent transfer of file descriptors
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 12.0-STABLE
Hardware: Any Any
Importance: --- Affects Some People
Assignee: freebsd-bugs mailing list
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-08-12 16:07 UTC by knizhnik@garret.ru
Modified: 2019-09-11 14:06 UTC
CC List: 2 users

See Also:


Attachments
Reproduce the problem with file description corruption (3.75 KB, text/plain)
2019-08-12 16:07 UTC, knizhnik@garret.ru

Description knizhnik@garret.ru 2019-08-12 16:07:53 UTC
Created attachment 206476
Reproduce the problem with file description corruption

Transferring file descriptors to another process without waiting for confirmation leads to unexpected behavior: the descriptor is transferred successfully, but an attempt to read from it reports EOF (recv returns 0).
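
For context, a descriptor is handed to another process as `SCM_RIGHTS` ancillary data on a Unix domain socket. The following is a minimal single-threaded sketch, not the attached reproducer: it assumes `AF_UNIX` stream sockets created with `socketpair()`, and `send_fd`, `recv_fd`, and `roundtrip_demo` are hypothetical helper names. As in the description, the sender closes its copy right after `sendmsg()` without waiting for confirmation.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send fd_to_send over the Unix domain socket sock as SCM_RIGHTS ancillary
 * data, with one byte of real payload (ancillary data should not travel
 * alone on a stream socket). Returns 0 on success, -1 on error. */
int send_fd(int sock, int fd_to_send)
{
    char byte = 'F';
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {                       /* union forces cmsghdr alignment */
        struct cmsghdr hdr;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&msg, 0, sizeof(msg));
    memset(&ctrl, 0, sizeof(ctrl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor sent by send_fd(). Returns the new descriptor, or
 * -1 on error or if no SCM_RIGHTS control message was attached. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr hdr;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd = -1;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL; cmsg = CMSG_NXTHDR(&msg, cmsg))
        if (cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_RIGHTS)
            memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}

/* Pass one end of a data socketpair over a control socketpair, close the
 * sender's copy right after sendmsg() (no acknowledgement), then read through
 * the received descriptor. Returns the number of bytes read, or -1. */
int roundtrip_demo(void)
{
    int ctl[2], data[2], got;
    ssize_t n;
    char buf[8];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, ctl) != 0 ||
        socketpair(AF_UNIX, SOCK_STREAM, 0, data) != 0)
        return -1;
    if (send_fd(ctl[0], data[1]) != 0)
        return -1;
    close(data[1]);                  /* the in-flight message keeps it alive */
    if ((got = recv_fd(ctl[1])) < 0)
        return -1;
    if (write(data[0], "ping", 4) != 4)
        return -1;
    n = read(got, buf, sizeof(buf)); /* 4 here; the report gets 0 (EOF) only
                                        under concurrent transfers */
    close(got);
    close(data[0]);
    close(ctl[0]);
    close(ctl[1]);
    return (int)n;
}
```

On a correctly behaving kernel `roundtrip_demo()` should return 4. Per the report, the EOF appears only when several threads transfer descriptors concurrently, so this sketch alone is not expected to trigger it.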

The attached example reproduces the problem (build it with the -pthread option).
If the number of client threads is set to 1, the problem does not reproduce.
Another way to avoid the problem is to wait for confirmation that the descriptor was delivered and close the socket only after that.
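
The confirmation-based workaround can be sketched as follows, under stated assumptions: Unix domain stream sockets; `xfer_send`, `xfer_recv`, `send_fd_and_wait_ack`, and `ack_demo` are hypothetical names; and the one-byte acknowledgement is an arbitrary choice, not taken from the attachment. The sender keeps its copy of the descriptor open until the receiver writes back a byte confirming delivery, and only then calls `close()`. A forked child plays the receiving process.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <sys/wait.h>
#include <unistd.h>

/* Minimal SCM_RIGHTS send of one descriptor plus one byte of payload. */
static int xfer_send(int sock, int fd)
{
    char byte = 'F';
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr h; char b[CMSG_SPACE(sizeof(int))]; } ctrl;
    struct msghdr msg;
    struct cmsghdr *c;

    memset(&msg, 0, sizeof(msg));
    memset(&ctrl, 0, sizeof(ctrl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.b;
    msg.msg_controllen = sizeof(ctrl.b);
    c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;
    c->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Matching receive: returns the new descriptor, or -1. */
static int xfer_recv(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr h; char b[CMSG_SPACE(sizeof(int))]; } ctrl;
    struct msghdr msg;
    struct cmsghdr *c;
    int fd = -1;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.b;
    msg.msg_controllen = sizeof(ctrl.b);
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    c = CMSG_FIRSTHDR(&msg);
    if (c != NULL && c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_RIGHTS)
        memcpy(&fd, CMSG_DATA(c), sizeof(int));
    return fd;
}

/* Sender side of the workaround: keep our copy of fd open until the
 * receiver acknowledges delivery, and only then close it. */
int send_fd_and_wait_ack(int sock, int fd)
{
    char ack;

    if (xfer_send(sock, fd) != 0 || read(sock, &ack, 1) != 1)
        return -1;                    /* fd is left open on failure */
    return close(fd);
}

/* Demo: a forked child plays the receiving process. Returns the number of
 * bytes the child read through the received descriptor, or -1. */
int ack_demo(void)
{
    int ctl[2], data[2], status, ok;
    pid_t pid;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, ctl) != 0 ||
        socketpair(AF_UNIX, SOCK_STREAM, 0, data) != 0)
        return -1;
    if ((pid = fork()) < 0)
        return -1;
    if (pid == 0) {
        char buf[8];
        int got;

        close(ctl[0]); close(data[0]); close(data[1]);
        got = xfer_recv(ctl[1]);      /* receive the descriptor ... */
        if (got < 0 || write(ctl[1], "A", 1) != 1)   /* ... then ack */
            _exit(255);
        _exit((int)read(got, buf, sizeof(buf)) & 0xff);
    }
    close(ctl[1]);
    ok = send_fd_and_wait_ack(ctl[0], data[1]) == 0 &&
         write(data[0], "ping", 4) == 4;
    close(ctl[0]);                    /* unblocks the child on failure */
    close(data[0]);
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status) || !ok)
        return -1;
    return WEXITSTATUS(status);
}
```

`ack_demo()` returns the number of bytes the child read through the received descriptor (4 if everything works) or -1 on error. The price of this workaround is an extra round trip per transferred descriptor.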

It seems critical that the socket not be closed before the transfer is confirmed: if we wait for confirmation but close the socket before it arrives, the problem still persists.

The bug reproduces on several versions of FreeBSD, including 12.0-STABLE, and also on recent Mac OS X.

There was an earlier bug report with similar symptoms:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=131876

However, it looks like r343784 does not fix it.
Comment 1 Jason A. Harmening freebsd_committer 2019-09-10 19:15:17 UTC
Possibly the same root cause as https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=227285 ? Does the patch for cycle-aware GC posted there fix this issue?
Comment 2 Stas Kelvich 2019-09-11 14:06:05 UTC
I've applied the patch from bug #227285 on top of r352207 and it solves the problem. However, dmesg is full of messages like this:

```
uma_zalloc_arg: zone "16" with the following non-sleepable locks held:
shared rw unp_link_rwlock (unp_link_rwlock) r = 0 (0xffffffff81f673f0) locked @ /usr/src/sys/kern/uipc_usrreq.c:2610
stack backtrace:
#0 0xffffffff80c3f243 at witness_debugger+0x73
#1 0xffffffff80c40262 at witness_warn+0x442
#2 0xffffffff80f02afb at uma_zalloc_arg+0x3b
#3 0xffffffff80bab2db at malloc+0x9b
#4 0xffffffff80c7b23c at unp_gc+0x2dc
#5 0xffffffff80c31cec at taskqueue_run_locked+0x10c
#6 0xffffffff80c32c68 at taskqueue_thread_loop+0x88
#7 0xffffffff80b90ad4 at fork_exit+0x84
#8 0xffffffff8116c65e at fork_trampoline+0xe
```