Summary: | when close(fd) on a fifo fails with EINTR, the file descriptor is not really closed | ||
---|---|---|---|
Product: | Base System | Reporter: | Victor Stinner <victor.stinner> |
Component: | kern | Assignee: | Konstantin Belousov <kib> |
Status: | Closed FIXED | ||
Severity: | Affects Only Me | CC: | bapt, cem, jilles, kib |
Priority: | --- | Keywords: | patch, patch-ready |
Version: | 10.1-RELEASE | Flags: | koobs:
mfc-stable10?
koobs: mfc-stable9? |
Hardware: | Any | ||
OS: | Any | ||
Attachments: |
Description
Victor Stinner
2015-09-16 22:34:14 UTC
FYI related Python bug report: https://bugs.python.org/issue25122 Created attachment 161127 [details]
Syscalls seen by "truss -f" when the bug occurs
Syscalls seen by "truss -f" when the bug occurs. Attached syscalls.txt is this command as a file, just if my comment is badly formatted in Bugzilla.
parent = 57517
child = 61173
57517: open("(null)",O_WRONLY,037777777777) ERR#4 'Interrupted system call'
57517: SIGNAL 14 (SIGALRM)
57517: sigreturn(0x7fffffffe080,0x10006,0x7fffffffe080,0x0,0xffff80000060386e,0x0) ERR#4 'Interrupted system call'
=> parent is blocked on open(fifo, O_WRONLY) because the child didn't open it
yet, but the open() is interrupted by signals every 10 ms.
61173: nanosleep({0.009149571 }) = 0 (0x0)
61173: SIGNAL 14 (SIGALRM)
61173: sigreturn(0x7fffffffe040,0x10006,0x7fffffffe040,0x1ff3019bb4b18,0xffffffff8093b550,0xfffff8001fcf9980) ERR#4 'Interrupted system call'
=> child is sleeping, nanosleep() is interrupted every 10 ms by a signal
57517: open("(null)",O_WRONLY,037777777777) = 0 (0x0)
=> strange, truss says that the parent succeeded first to open a FIFO to
writing, whereas the child didn't open it yet. It looks more like a bug or
race condition in truss. I prefer to think that the parent syscall succeeded
after the open() in the child process.
=> moreover, no, the parent failed to open the FIFO: see below, it still tries
to open it!?
=> And what is this strange file descriptor 0? It is supposed to be already
be used by stdin...
61173: open("fifo_57517_465",O_RDONLY,042716203) = 4 (0x4)
=> child process opened the FIFO for reading, it should unblock open() in
the parent.
61173: close(4) ERR#4 'Interrupted system call'
=> oh oh, closing the FIFO in the child process failed with EINTR...
Bad things happens since this point.
57517: SIGNAL 14 (SIGALRM)
57517: sigreturn(0x7fffffffe080,0x10006,0x7fffffffe080,0x0,0xffff80000060386e,0x0) = 0 (0x0)
=> open() succeeded or was interrupted by EINTR?
61173: fstat(4,0x7fffffffe660) ERR#9 'Bad file descriptor'
=> fstat() confirms that the FIFO file descriptor is really closed in the
child.
61173: setitimer(0,{0.000000, 0.000000 },0x0) = 0 (0x0)
61173: close(3) = 0 (0x0)
=> fd 3 is only used to workaround truss bugs, it's a copy of the stdout
(fd 1).
61173: sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0)
61173: sigprocmask(SIG_SETMASK,0x0,0x0) = 0 (0x0)
61173: sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0)
61173: sigprocmask(SIG_SETMASK,0x0,0x0) = 0 (0x0)
61173: sigprocmask(SIG_BLOCK,SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2,0x0) = 0 (0x0)
61173: sigprocmask(SIG_SETMASK,0x0,0x0) = 0 (0x0)
61173: process exit, rval = 0
ERR#4 'Interrupted system call'
57517: open("(null)",O_WRONLY,037777777777) ERR#4 'Interrupted system call'
57517: SIGNAL 14 (SIGALRM)
57517: sigreturn(0x7fffffffe080,0x10006,0x7fffffffe080,0x0,0xffff80000060386e,0x0) ERR#4 'Interrupted system call'
=> open() still fails in the parent, whereas the child process succeeded to
open it for reading *and* closed it. The parent open() is supposed to be
unblocked since the child open() succeeded, no?
(... open sequence is repeated forever, it never succeed ...)
(In reply to Victor Stinner from comment #1) I see the 'hang', where the parent is sleeping on open(O_WRONLY), and there is no child. The situation is not too mysterious, see the explanation below. Test sets up two processes, parent does open(fifo, O_WRONLY), child does open(fifo, O_RDONLY). Either process can proceed further if other process enters open(), or a signal is delivered interrupting the sleep waiting for the residual open. Assume that the parent sleeps in open waiting for child/reader, child enters open and since there is writer waiting, the open in child succeeded. Child wakes up the parent to notify that open(O_WRONLY) can succeed as well (there is reader after all). And, the crusial point, assume that the parent gets signal notification before it gets wakeup notice. In this setup, open() in parent returns with EINTR. On the other hand, child successfully opened fifo for read, and after some more work, it exits. But the parent retries the open(O_WRONLY). There is no child to open fifo, so open() cannot succeed, it only gets interrupted by signal periodically. And, parent would not start another child until this open succeed. Hopefully this can be followed. FWIW, do not use truss, use kdump -i. Created attachment 161136 [details]
A bit more fairness into the signal interruption of fifo open
The patch would prefer to return success over the EINTR when signal is delivered while other end of fifo becomes ready simultaneously.
Note that the patch still does not fix the fundamental underground issue of the troubles, which is in the test structure itself. If child reader manages to close fifo before the parent writer sleeping in open has a chance to run, parent would deadlock, since nothing else would open the fifo for read.
"The patch would prefer to return success over the EINTR when signal is delivered while other end of fifo becomes ready simultaneously." Do you mean that the signal will be delivered later to the application? Ignoring completly a signal doesn't sound like a good idea :-) "Note that the patch still does not fix the fundamental underground issue of the troubles, which is in the test structure itself." I'm not sure that I understand: do you mean that the test has a race condition? FYI the Python test suite is executed on a wide range of architectures and platforms: http://buildbot.python.org/all/waterfall?category=3.x.stable&category=3.x.unstable The test_eintr test is executed and pass on Linux, Mac OS X, Solaris (and OpenIndiana), OpenBSD, etc. For FreeBSD, test_eintr hangs 50% of the time (raw estimation). For OpenBSD, I'm not sure, the buildbot is currently disabled. For Mac OS X, I'm surprised, I thaught that Mac OS X stole a large part of FreeBSD code :-) Well... I had a lot of trouble to write a reliable C program to reproduce the issue, so the test may hangs on any platform, but I didn't notice hang (I don't check _each_ build of each platforms, it would take too much time). But at least, I can say that the bug fails frequently on FreeBSD, whereas I never saw it failing on Linux. "If child reader manages to close fifo before the parent writer sleeping in open has a chance to run, parent would deadlock, since nothing else would open the fifo for read." On Linux, the open(O_RDONLY) in the child blocks until the open(O_WRONLY) succeed. The Linux kernel looks to ensure that both open succeed "at the time same". Short test in Python. Parent: $ python3 Python 3.4.2 (default, Jul 9 2015, 17:24:30) [GCC 5.1.1 20150618 (Red Hat 5.1.1-4)] on linux >>> import os >>> os.mkfifo("fifo") >>> f=open("fifo", "wb") ^Z [1]+ Stoppé python3 -q Child: haypo@smithers$ python3 Python 3.4.2 (default, Jul 9 2015, 17:24:30) [GCC 5.1.1 20150618 (Red Hat 5.1.1-4)] on linux >>> f=open("fifo", "rb") I used CTRL+z to stop the parent process: the child process hangs until I restart the parent process (using "fg"). (In reply to Victor Stinner from comment #5) FreeBSD also ensures that both open(O_RDONLY) and open(O_WRONLY) succeed simultaneously, but there is a race. Assume that both threads doing the opens noticed each other presence, but right after that the thread doing open(O_RDONLY) continued execution, while the thread doing(O_WRONLY) was not put on the CPU immediately. It was a scheduler decision, for whatever reasons. Then, open(O_RDONLY) succeeds, the thread continues the execution and then it could even run as far as to execute close() on the fd returned by open. After that, when open(O_WRONLY) thread eventually gets CPU, it notices that there is no readers (there was one, but it is gone now). So it returns to the sleep waiting for a reader. Created attachment 161150 [details]
Ignore signal after a wakeup, if there was a complimentary open, even transient
The new version of the patch is even more aggressive with preferring a normal return when there is both signal delivered and parallel open. Now, I run several thousand iterations of your test without hang, while before it would reliably hang on first or second dozens.
Some clarification about signals. Patch does not cause the signal to be lost. Before the patch, you get EINTR error when signal was generated _and_ parallel complimentary type of open happen. Now, the syscall returns with success and file descriptor, but before the usermode continues, a signal frame is pushed on stack and signal handler run. In other words, things are changed so that it _looks_ like the signal was delivered slightly after the wakeup from open.
The patch "Ignore signal after a wakeup, if there was a complimentary open, even transient" looks good to me, except that fi_rgen and fi_wgen should have an unsigned type to avoid undefined behaviour due to overflow. Created attachment 161179 [details]
Ignore signal after a wakeup, if there was a complimentary open, even transient v2
Thank Jilles, patch is updated to use unsigned type. Also, I think there is no reason to use longs, int generation counter should be enough.
Probably, readers and writers counters should be also reduced to int, but this is for different change.
After applying the patch I can confirm that now the issue reported is not appearing anymore. On a box where the test did hang after few seconds I have been running the test 8h without any issue A commit references this bug: Author: kib Date: Sun Sep 20 21:18:34 UTC 2015 New revision: 288044 URL: https://svnweb.freebsd.org/changeset/base/288044 Log: Ensure that when a blockable open of fifo returns success, a valid file descriptor opened for complimentary access exists as well. The implementation of the guarantee is done by counting the generations of readers and writers opens. We return success and not EINTR or ERESTART error, when the sleep for complimentary opening is interrupted, but the generation was changed during the sleep. Longer explanation: assume there are two threads, A doing open("fifo", O_RDONLY) and B doing open("fifo", O_WRONLY), and no other threads either trying to open the fifo, nor there are any file descriptors referencing the fifo. Before the change, it was possible e.g. for for thread A to return a valid file descriptor, while thread B returned EINTR if a signal to B was delivered simultaneously with the wakeup from A. After the change, in this situation both A::open() and B::open() succeed and the signal is made "as if" it was noticed slightly later. Note that the signal actual delivery is not changed, it is done by ast on syscall return path, so signal handler is still executed before first instruction after syscall. See PR for the code demonstrating the issue. PR: 203162 Reported by: Victor Stinner victor.stinner@gmail.com Reviewed by: jilles Tested by: bapt, pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Changes: head/sys/fs/fifofs/fifo_vnops.c A commit references this bug: Author: pho Date: Mon Sep 21 08:44:24 UTC 2015 New revision: 288058 URL: https://svnweb.freebsd.org/changeset/base/288058 Log: Added a regression test. PR: 203162 In collaboration with: kib Sponsored by: EMC / Isilon storage division Changes: user/pho/stress2/misc/mkfifo3.sh Please set mfc-* flags to "+" once MFC'd to stable/10 and stable/9. If not applicable to a particular branch, set to "-" with comment Assign to committer that resolved. Pending MFC |