Bug 148581 - [libc] fopen(3) fails with EMFILE if there are more than SHORT_MAX fds open
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 7.3-RELEASE
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-07-13 22:30 UTC by Manish Vachharajani
Modified: 2018-05-21 00:00 UTC

See Also:


Attachments

Description Manish Vachharajani 2010-07-13 22:30:07 UTC
fopen() fails with EMFILE if more than SHRT_MAX (32767) file descriptors are open in the process, because the descriptor number it obtains no longer fits in the short _file member of FILE.  It does not matter that these fds were not created by fopen.

To make matters worse, gcc's libstdc++ uses fopen and friends to implement ofstream, so those functions mysteriously fail if there are more than 32k fds open in the process.
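
The failure described above can be modelled in a few lines. This is only a simplified sketch, not the libc source: struct model_file, fd_fits_in_file() and model_fopen() are hypothetical names, and the only assumption taken from this report is that FILE stores its descriptor in a short.

```c
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <unistd.h>

/* Hypothetical stand-in for the one FILE member that matters here. */
struct model_file {
    short file;               /* descriptor kept in 16 bits */
};

/* Nonzero if fd can be stored in a short without loss. */
int fd_fits_in_file(int fd)
{
    return fd >= 0 && fd <= SHRT_MAX;
}

/*
 * Model of the failure: open() itself succeeds, but the descriptor
 * number is too large for the 16-bit member, so the stream cannot be
 * set up. Returns 0 on success, -1 with errno set on failure.
 */
int model_fopen(struct model_file *fp, const char *path)
{
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;            /* errno already set by open() */
    if (!fd_fits_in_file(fd)) {
        close(fd);
        errno = EMFILE;       /* the error the report observes */
        return -1;
    }
    fp->file = (short)fd;
    return 0;
}
```

In this model the count of streams is irrelevant; only the numeric value of the newly allocated descriptor matters, which is why descriptors opened with plain open() trigger it.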

Fix: 

The simple fix is to change the _file member of FILE (typedef struct __sFILE { ... } FILE; in stdio.h) from a short to an int.  However, this will break binary compatibility with anything compiled against an old libc.
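
To illustrate why widening the member is an ABI change, here is a small sketch comparing two abbreviated layouts. Both structs are hypothetical and much shorter than the real FILE; layout_changed() is an illustrative helper, not a libc function.

```c
#include <stddef.h>

/* Abbreviated layout with the current 16-bit descriptor (hypothetical). */
struct file_short {
    unsigned char *p;
    int r;
    int w;
    short flags;
    short file;
    void *cookie;
};

/* Same layout with the descriptor widened to int (hypothetical). */
struct file_int {
    unsigned char *p;
    int r;
    int w;
    short flags;
    int file;
    void *cookie;
};

/*
 * Nonzero if the descriptor member moved or changed size between the
 * two layouts. Code compiled against one layout that reads the member
 * directly would then misread a structure built with the other.
 */
int layout_changed(void)
{
    return offsetof(struct file_short, file) != offsetof(struct file_int, file)
        || sizeof(((struct file_short *)0)->file) !=
           sizeof(((struct file_int *)0)->file);
}
```

On common ABIs the member both grows and moves to satisfy int alignment, so any binary that accesses it through an inlined macro would be affected.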

A very dirty fix that would not break binary compatibility is, for each architecture, to use the spare space left by the padding and alignment requirements of FILE to stash the remaining bits of _file, and to make all users of FILE go through an accessor macro that reassembles the value.

A quick fix to double the threshold at which the problem occurs would be to make _file an unsigned short and use the all-ones value to indicate that the stream is not backed by a file descriptor.  I am not sure this would work either, though.
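
A minimal sketch of the unsigned short idea, assuming a 16-bit short. NOFILE_SENTINEL, encode_fd() and decode_fd() are hypothetical names introduced only for illustration.

```c
#include <limits.h>

/* Hypothetical: the all-ones value means "no descriptor attached". */
#define NOFILE_SENTINEL USHRT_MAX

/*
 * Store fd in an unsigned short. A negative fd means "no file" and is
 * stored as the sentinel. Returns 0 on success, -1 if fd is too large.
 */
int encode_fd(int fd, unsigned short *out)
{
    if (fd < 0) {
        *out = NOFILE_SENTINEL;
        return 0;
    }
    if (fd >= NOFILE_SENTINEL)
        return -1;            /* does not fit, or collides with sentinel */
    *out = (unsigned short)fd;
    return 0;
}

/* Recover the descriptor, mapping the sentinel back to -1. */
int decode_fd(unsigned short stored)
{
    return stored == NOFILE_SENTINEL ? -1 : (int)stored;
}
```

This raises the limit to 65534 rather than removing it. Code that still reads the same 16 bits as a signed short would see any value above 32767 as negative, which is one reason the approach may not be safe.
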
How-To-Repeat: To reproduce the problem compile and run:

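#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>   /* close() */

#define BIGNUM 50000  /* must exceed SHRT_MAX (32767) */

int main(int argc, char *argv[]) {
  int i, n;
  int fds[BIGNUM];

  /* Use up more than SHRT_MAX descriptors without going through stdio. */
  for(n=0; n < BIGNUM; ++n) {
    fds[n] = open("/dev/null", O_RDONLY);
    if(fds[n] == -1) {
      /* Per-process fd limit too low; the test would be inconclusive. */
      fprintf(stderr, "open failed after %d fds: %s\n", n, strerror(errno));
      break;
    }
  }

  /* The descriptor behind this stream would be above SHRT_MAX. */
  FILE *fil = fopen("/dev/null", "r");
  if(fil == NULL) {
    fprintf(stderr, "Could not open /dev/null: %s\n" , strerror(errno));
  } else {
    fclose(fil);
  }

  /* Close only the descriptors that were actually opened. */
  for(i=0; i < n; ++i) {
    close(fds[i]);
  }

  return 0;
}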
Comment 1 Jilles Tjoelker freebsd_committer 2010-07-17 13:06:00 UTC
Strictly speaking, your very dirty supposedly safe fix breaks binary
compatibility because fileno() (in non-threaded programs only) and
fileno_unlocked() are macros that hard-code the location and size of the
_file field into binaries. If you have code compiled before the change
in the same process as code compiled after the change, it might happen
that data is read/written from/to the wrong descriptor.

What may work is extending FILE (although I'm not entirely sure that
no one allocates their own FILE) with a 32-bit file
descriptor field. If the file descriptor exceeds 32767, the 16-bit field
then contains -1 and fileno() in old binaries will return that. This
will at least fail safely although fileno() is not defined to return
error conditions (but it has always returned -1 if the FILE is not
associated with a file descriptor).

-- 
Jilles Tjoelker
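
The extension proposed in comment 1 might look roughly like the sketch below. struct ext_file, the file32 member and the three helpers are hypothetical; the real FILE has many more members and the new field would have to be appended at the end.

```c
#include <limits.h>

/* Hypothetical, abbreviated stream structure. */
struct ext_file {
    short file;    /* legacy 16-bit field, still read by old binaries */
    int   file32;  /* assumed new field holding the full descriptor */
};

/* Record fd in both fields; the legacy field gets -1 if fd does not fit. */
void ext_set_fd(struct ext_file *fp, int fd)
{
    fp->file32 = fd;
    fp->file = (fd >= 0 && fd <= SHRT_MAX) ? (short)fd : -1;
}

/* What an old binary's inlined fileno() macro would effectively read. */
int ext_fileno_old(const struct ext_file *fp)
{
    return fp->file;
}

/* What newly compiled code would read. */
int ext_fileno_new(const struct ext_file *fp)
{
    return fp->file32;
}
```

Because the legacy field is assigned explicitly rather than by truncating the descriptor, every value above 32767 reads back as -1 in this sketch, including values above 65535.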
Comment 2 tobias.oberstein 2011-11-14 17:47:26 UTC
Using Manish's test, I could verify that the bug is still present on both i386 and amd64.

FreeBSD XXXXX 8.2-RELEASE FreeBSD 8.2-RELEASE #0: Thu Feb 17 02:41:51 UTC 2011     root@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  amd64

FreeBSD XXXXX 8.2-RELEASE-p3 FreeBSD 8.2-RELEASE-p3 #0: Tue Sep 27 18:07:27 UTC 2011     root@i386-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  i386

and even on FreeBSD 9 RC1 !!!!

FreeBSD autobahnhub2 9.0-RC1 FreeBSD 9.0-RC1 #0: Tue Oct 18 18:30:38 UTC 2011     root@obrian.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  i386

==

I'm building a kqueue-based network service using Python/Twisted which will happily
accept >50k TCP connections, but then bails out on Python open(<file>), since
Python uses fopen(), and

"It does not matter that these fds were not created by fopen."

Python can't be recompiled to use open() (POSIX) instead of fopen() (libc).

Only the new Python IO does not use fopen() ... but this leads to other problems (for me).

==

So this won't be fixed even for FreeBSD 9?

Please ...
Comment 3 Karl Young 2015-09-08 21:39:52 UTC
We've run into this on 9.2.  We're using open-source software that uses popen() to run external commands.  Since popen() returns a FILE *, it fails once we reach 32K open files.  This seems to be hit rarely (no comments since 2011), but it can be a show-stopper.

The only workaround I see in our case is to roll our own popen().
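
A hand-rolled replacement along those lines could avoid FILE entirely and hand back a plain descriptor. The following is a minimal read-only sketch; popen_fd() is a hypothetical name, and it assumes commands are run through /bin/sh as popen() does.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run cmd via /bin/sh and return a raw descriptor
 * for reading its standard output. No FILE is involved, so the 16-bit
 * descriptor field never comes into play. Returns -1 on error.
 * The caller reads with read(2), then calls close(2) and waitpid(2)
 * on the pid stored in *pid_out.
 */
int popen_fd(const char *cmd, pid_t *pid_out)
{
    int pfd[2];
    pid_t pid;

    if (pipe(pfd) == -1)
        return -1;

    pid = fork();
    if (pid == -1) {
        close(pfd[0]);
        close(pfd[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: make the pipe's write end stdout, then exec the shell. */
        close(pfd[0]);
        if (pfd[1] != STDOUT_FILENO) {
            if (dup2(pfd[1], STDOUT_FILENO) == -1)
                _exit(127);
            close(pfd[1]);
        }
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);           /* exec failed */
    }

    /* Parent: keep only the read end. */
    close(pfd[1]);
    *pid_out = pid;
    return pfd[0];
}
```

The caller loses stdio buffering and has to loop on read() itself, but the returned descriptor is an ordinary int and may be larger than 32767.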

BTW in comment 1:

> with a 32-bit file descriptor field. If the file descriptor exceeds 32767, the 16-bit
> field then contains -1 and fileno() in old binaries will return that.

I think this fails when FD gets to 64K and the short version starts counting up from 0 again.
Comment 4 Eitan Adler freebsd_committer freebsd_triage 2018-05-21 00:00:03 UTC
For bugs matching the following conditions:
- Status == In Progress
- Assignee == "bugs@FreeBSD.org"
- Last Modified Year <= 2017

Do
- Set Status to "Open"