Created attachment 206476
Reproduce the problem with file descriptor corruption
Transferring file descriptors to another process over a Unix-domain socket without waiting for confirmation leads to unexpected behavior: the descriptor is transferred successfully, but an attempt to read from it reports EOF (recv returns 0).
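For readers unfamiliar with the mechanism: passing a descriptor between processes is normally done with sendmsg()/recvmsg() and SCM_RIGHTS ancillary data on an AF_UNIX socket. The following is a minimal sketch of that mechanism, assuming the attached example works the same way. The helper names send_fd and recv_fd are hypothetical (not taken from the attachment), and error handling is reduced to returning -1.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send descriptor `fd` over the Unix-domain socket
 * `sock` as SCM_RIGHTS ancillary data, together with one dummy data byte.
 * Returns 0 on success, -1 on error. */
static int send_fd(int sock, int fd)
{
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {                      /* union keeps the buffer cmsghdr-aligned */
        struct cmsghdr hdr;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    assert(cmsg != NULL);        /* buffer is large enough for one header */
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one descriptor sent by send_fd().
 * Returns the new descriptor, or -1 if none arrived or on error. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr hdr;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd = -1;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_RIGHTS) {
            memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
            break;
        }
    }
    return fd;
}
```

The pattern the report identifies as problematic is a sender that calls close() on its own copy right after send_fd() returns, while the descriptor is still in flight and the receiver has not yet picked it up.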
The attached example reproduces the problem (build it with the -pthread option).
If the number of client threads is set to 1, the problem does not reproduce.
Another way to avoid the problem is to wait for confirmation that the descriptor was delivered and close the socket only after receiving it.
It seems critical that the socket not be closed before the transfer is confirmed: if we wait for confirmation but close the socket before it arrives, the problem still persists.
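The confirmation-based workaround described above can be sketched as follows. This assumes that "the socket" means the sender's own copy of the descriptor being passed, and that the confirmation is a single byte sent back over the same Unix-domain socket. The names send_fd_acked, recv_fd_and_ack and FD_ACK are hypothetical, not taken from the attached example.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical confirmation byte sent back by the receiver. */
#define FD_ACK 'A'

/* Sender: pass `fd` over `sock` with SCM_RIGHTS, block until the receiver
 * confirms delivery, and only then close the local copy of `fd`.
 * Returns 0 on success, -1 on error (fd is left open on error). */
static int send_fd_acked(int sock, int fd)
{
    char byte = 0, ack = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr hdr; char buf[CMSG_SPACE(sizeof(int))]; } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);
    cmsg = CMSG_FIRSTHDR(&msg);
    assert(cmsg != NULL);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    if (sendmsg(sock, &msg, 0) != 1)
        return -1;
    /* Wait for the confirmation BEFORE closing: per the report, closing
     * first still triggers the problem even if we wait afterwards. */
    if (read(sock, &ack, 1) != 1 || ack != FD_ACK)
        return -1;
    return close(fd);
}

/* Receiver: take one descriptor from `sock`, then confirm delivery.
 * Returns the received descriptor, or -1 on error. */
static int recv_fd_and_ack(int sock)
{
    char byte, ack = FD_ACK;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr hdr; char buf[CMSG_SPACE(sizeof(int))]; } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd = -1;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL; cmsg = CMSG_NXTHDR(&msg, cmsg))
        if (cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_RIGHTS)
            memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    if (fd < 0)
        return -1;
    if (write(sock, &ack, 1) != 1) {   /* confirm delivery to the sender */
        close(fd);
        return -1;
    }
    return fd;
}
```

The ordering is the whole point of the sketch: close(fd) runs only after the confirmation byte has been read. This only sidesteps the symptom in user space; it does not address whatever in the kernel loses the in-flight descriptor.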
The bug reproduces on several versions of FreeBSD, including 12.0-stable, and also on recent Mac OS X.
There was a bug report with similar symptoms:
But it looks like r343784 does not fix it.
Possibly the same root cause as https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=227285 ? Does the cycle-aware GC patch posted there fix this issue?
I applied the patch from bug #227285 on top of r352207, and it solves the problem. However, dmesg is full of messages like this:
uma_zalloc_arg: zone "16" with the following non-sleepable locks held:
shared rw unp_link_rwlock (unp_link_rwlock) r = 0 (0xffffffff81f673f0) locked @ /usr/src/sys/kern/uipc_usrreq.c:2610
#0 0xffffffff80c3f243 at witness_debugger+0x73
#1 0xffffffff80c40262 at witness_warn+0x442
#2 0xffffffff80f02afb at uma_zalloc_arg+0x3b
#3 0xffffffff80bab2db at malloc+0x9b
#4 0xffffffff80c7b23c at unp_gc+0x2dc
#5 0xffffffff80c31cec at taskqueue_run_locked+0x10c
#6 0xffffffff80c32c68 at taskqueue_thread_loop+0x88
#7 0xffffffff80b90ad4 at fork_exit+0x84
#8 0xffffffff8116c65e at fork_trampoline+0xe
*** This bug has been marked as a duplicate of bug 227285 ***