Bug 164947 - [patch] tee(1) loses data when writing to non-blocking file descriptors
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: unspecified
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: freebsd-bugs mailing list
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-02-10 07:10 UTC by Diomidis Spinellis
Modified: 2018-05-20 23:50 UTC
Attachments
file.txt (1.04 KB, text/plain)
2012-02-10 07:10 UTC, Diomidis Spinellis

Description Diomidis Spinellis 2012-02-10 07:10:09 UTC
When tee(1) tries to write to a file descriptor that has been set to non-blocking mode the write(2) call may fail with EAGAIN.  Instead of retrying the operation, tee will throw that chunk of data away.

Fix: I attach a patch that fixes the problem.

Patch attached with submission follows:
How-To-Repeat: Run the following:
#!/usr/local/bin/bash
# bash needed for the >(...) functionality
# ssh apparently sets O_NONBLOCK
# Remove the 2>/dev/null to see tee complaining
dd count=100000 if=/dev/zero | 
tee >(ssh localhost dd of=/dev/null) 2>/dev/null | 
(ssh localhost dd of=/dev/null)

100000+0 records in
100000+0 records out
51200000 bytes transferred in 9.224390 secs (5550503 bytes/sec)
100000+0 records in
100000+0 records out
51200000 bytes transferred in 9.061471 secs (5650297 bytes/sec)
92080+0 records in
92080+0 records out
47144960 bytes transferred in 9.101738 secs (5179776 bytes/sec)
Comment 1 Martin Cracauer 2012-02-10 19:17:36 UTC
I don't think it is ssh that is causing this. If you use a named pipe
explicitly and hook ssh up to that the error doesn't appear.  Seems to
be something that bash is doing there.

That doesn't mean I am opposed to handling EAGAIN.

The way I normally do it is a simple retry loop, not using select.
I'm aware of the tradeoffs, so far I was always better off not
investing a second system call into every retry.

Martin

-- 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Martin Cracauer <cracauer@cons.org>   http://www.cons.org/cracauer/
Comment 2 Diomidis Spinellis 2012-02-10 20:32:02 UTC
> I don't think it is ssh that is causing this. If you use a named pipe
> explicitly and hook ssh up to that the error doesn't appear.  Seems to
> be something that bash is doing there.

I think the named pipe isolates the write fd from the ssh end.  If you 
use cat or dd instead of ssh the problem goes away.
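
One possible mechanism, offered as an assumption rather than something established in this report: O_NONBLOCK is a file status flag kept in the open file description, so every descriptor that shares that description (through dup(2) or inheritance across fork(2)) sees the flag once any holder sets it. The sketch below shows only that sharing; the helper names are hypothetical.

```c
#include <fcntl.h>
#include <unistd.h>

/* Return 1 if fd has O_NONBLOCK set, 0 if not, -1 on error. */
static int
fd_is_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags == -1)
		return (-1);
	return ((flags & O_NONBLOCK) != 0);
}

/*
 * Set O_NONBLOCK through a duplicate of fd, then close the duplicate.
 * The flag remains visible on fd itself because both descriptors
 * refer to the same open file description.
 * Returns 0 on success, -1 on error.
 */
static int
set_nonblocking_via_dup(int fd)
{
	int copy, flags;

	copy = dup(fd);
	if (copy == -1)
		return (-1);
	flags = fcntl(copy, F_GETFL);
	if (flags == -1 || fcntl(copy, F_SETFL, flags | O_NONBLOCK) == -1) {
		close(copy);
		return (-1);
	}
	return (close(copy));
}
```

The two ends of a pipe are separate open file descriptions, so setting the flag on one end does not change the other.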

> That doesn't mean I am opposed to handling EAGAIN.
>
> The way I normally do it is a simple retry loop, not using select.
> I'm aware of the tradeoffs, so far I was always better off not
> investing a second system call into every retry.

I agree this can be cheaper for many cases, but it can become very 
expensive for long waits.
Comment 3 Martin Cracauer 2012-02-10 21:03:02 UTC
> I think the named pipe isolates the write fd from the ssh end.  If you 
> use cat or dd instead of ssh the problem goes away.

Do you happen to know what bash does there, exactly? I was assuming it
is creating a named pipe behind the user's back.

I noticed that if you do ssh on the "tee part" and something else on
the end of the regular pipe then things also fail.  On the other hand
if you put the "tee part" on something else and the regular pipe on
ssh things never seem to fail.

tee treats both fds the same, and obviously ssh is always setting up
its input the same way, so the difference must be in what bash is
doing there with that "pipe emulation".

> I agree this can be cheaper for many cases, but it can become very 
> expensive for long waits.

I'd like to understand what exactly is special about the way bash
implements that feature so that we can make a more educated decision
about the tradeoff of using select or not.

Martin
Comment 4 Diomidis Spinellis 2012-02-10 21:17:03 UTC
> Do you happen to know what bash does there, exactly? I was assuming it
> is creating a named pipe behind the user's back.

It is creating a normal pipe and providing it as an argument through 
/dev/fd.  Try

ls -l /dev/fd >(wc -l)

> I noticed that if you do ssh on the "tee part" and something else on
> the end of the regular pipe then things also fail.  On the other hand
> if you put the "tee part" on something else and the regular pipe on
> ssh things never seem to fail.

On 8.1 release I needed both ends to run ssh to see the problem.


BTW The problem also manifests itself on Mac OS X and Linux :-)
Comment 5 Martin Cracauer 2012-02-10 22:16:32 UTC
> It is creating a normal pipe and providing it as an argument through 
> /dev/fd.  Try
> 
> ls -l /dev/fd >(wc -l)

Hmmm, this is what I get in ps from this pipe:
28571  1  T    0:01.56 emacs -nw tee.c.rej
29598  1  T    0:00.00 cstream -n 10m -i- -v2
29599  1  T    0:00.00 -bash (bash)
29600  1  T    0:00.02 ssh localhost dd of=/dev/null
29603  1  T    0:00.00 tee /tmp/cracauer/sh-np-1328937382
29609  1  R+   0:00.00 ps
usr.bin/tee(wings)152% ls -l  /tmp/cracauer/sh-np-1328937382
prw-------  1 cracauer  wheel  0 Feb 10 16:38 /tmp/cracauer/sh-np-1328937382|

Either way, I tested your patch, it fixes the problem and it's
obviously correct (EAGAIN needs to be taken into account) so I'm gonna
commit it.

Comment 6 listlog2011 2012-02-11 03:21:38 UTC
> When tee(1) tries to write to a file descriptor that has been set to
> non-blocking mode the write(2) call may fail with EAGAIN.  Instead of
> retrying the operation, tee will throw that chunk of data away.

So tee should also work with non-blocking reads; your patch is incomplete.
Comment 7 Diomidis Spinellis 2012-02-11 07:25:34 UTC
>> When tee(1) tries to write to a file descriptor that has been set to
>> non-blocking mode the write(2) call may fail with EAGAIN. Instead of
>> retrying the operation, tee will throw that chunk of data away.
> So tee should also work with non-blocking reads; your patch is incomplete.

You're right.  By the same argument all other utilities should also be 
fixed.  However, this may create new bugs and instability. For the 
specific case of tee writing I offered a test case, demonstrating the 
problem.  This was distilled from an actual production use (scattering a 
dump to tape and disk).  I think it's best to fix each utility as the 
need arises.
Comment 9 Eitan Adler freebsd_committer freebsd_triage 2018-05-20 23:50:16 UTC
For bugs matching the following conditions:
- Status == In Progress
- Assignee == "bugs@FreeBSD.org"
- Last Modified Year <= 2017

Do
- Set Status to "Open"