Bug 212033 - fgetwln(3) fails to report most encoding errors
Summary: fgetwln(3) fails to report most encoding errors
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: bin (show other bugs)
Version: CURRENT
Hardware: Any Any
: --- Affects Some People
Assignee: Andrey A. Chernov
URL:
Keywords: patch
Depends on:
Blocks:
 
Reported: 2016-08-21 21:33 UTC by Ingo Schwarze
Modified: 2016-08-27 11:15 UTC (History)
1 user (show)

See Also:
ache: mfc-stable11+
ache: mfc-stable10+


Attachments
Patch to stdio/fgetwln.c to make fgetwln(3) fail on fgetwc(3) failure. (809 bytes, patch)
2016-08-21 21:33 UTC, Ingo Schwarze
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Ingo Schwarze 2016-08-21 21:33:24 UTC
Created attachment 173919 [details]
Patch to stdio/fgetwln.c to make fgetwln(3) fail on fgetwc(3) failure.

The fgetwln(3) manual is quite explicit that the "fgetwln() function
may also fail ... for any of the errors specified for ... mbrtowc(3)"
and that it must return NULL in case of failure.  That's sensible;
we shouldn't expect programmers to inspect ferror(3) or errno(2)
after getting a function return value indicating success.

However, after reading a single valid character, fgetwln(3) will
mistreat all subsequent encoding errors as newlines - returning
success when encountering an invalid encoding, but still setting
both errno(3) and the stdio error indicator.

I committed a fix to OpenBSD which is likely to apply mostly as-is
to FreeBSD as well, see the attachment.

original bug report: http://marc.info/?l=openbsd-tech&m=147178904527666
OpenBSD commit: http://marc.info/?l=openbsd-cvs&m=147181388632431
http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/lib/libc/stdio/fgetwln.c
Comment 1 Andrey A. Chernov freebsd_committer freebsd_triage 2016-08-22 13:05:19 UTC
The check miss one condition:

if (len == 0 || (wc == WEOF && (fp->_flags & __SERR)))
    goto error;
Comment 2 Andrey A. Chernov freebsd_committer freebsd_triage 2016-08-22 13:24:54 UTC
Moreover, the whole code is broken for empty input, it returns NULL instead of pointer to L'\0'
Comment 3 Ingo Schwarze 2016-08-22 14:12:27 UTC
(In reply to Andrey Chernov from comment #1)

> The check miss one condition:  [...]  wc == WEOF

Checking wc is needless; it's even safer to *not* check it.
When the error indicator is set, fgetwln(3) should better fail.

Regarding fgetwc(3), POSIX says: "If an error occurs, the resulting value of the file position indicator for the stream is unspecified."  The OpenBSD manual says: "The end-of-file condition is remembered, even on a terminal, and all subsequent attempts to read will return WEOF until the condition is cleared with clearerr(3)."

So, as soon as the error indicator goes up, all bets are off for reading from the stream.  In that case, fgetwc(3) is expected to always fail.  And even if it happened not too, that would just be weird, and fgetwln(3) should better fail anyway.
Comment 4 Ingo Schwarze 2016-08-22 14:24:51 UTC
(In reply to Andrey Chernov from comment #2)

> Moreover, the whole code is broken for empty input,
> it returns NULL instead of pointer to L'\0'

No, the code is correct in *that* respect.

Empty input means we are at EOF right away.  At EOF, the function is documented to return NULL without setting errno or the stdio error indicator.  Returning L"" means there is an empty line, and an empty line is not the same as empty input.

For example:

Empty line:  Reading from the pipe 'printf "\n" |' returns L"" to the first call of fgetwln(3) and NULL to the second.

Empty input:  Reading from the pipe 'printf "" |' returns NULL to the first call of fgetwln(3).
Comment 5 commit-hook freebsd_committer freebsd_triage 2016-08-22 15:45:41 UTC
A commit references this bug:

Author: ache
Date: Mon Aug 22 15:44:55 UTC 2016
New revision: 304607
URL: https://svnweb.freebsd.org/changeset/base/304607

Log:
  Fix error processing.
  1) Don't forget to set __SERR on __slbexpand() error.
  2) Check for __fgetwc() errors using errno. Don't check for __SERR
  as PR suggested, it user-visible flag which can stick from previous
  functions and stdio code don't check it for this purpose.

  PR:     212033
  MFC after:      3 days

Changes:
  head/lib/libc/stdio/fgetwln.c
Comment 6 Ingo Schwarze 2016-08-22 16:41:34 UTC
(In reply to commit-hook from comment #5)

I still consider the additional checks bogus.
Unless there is a very serious bug in fgetwc(3), __SERR will always be set after failure.  For the same reason, setting __SERR once again in fgetwc(3) is useless.

If __SERR is already set (for whatever reason) before fgetwln(3) is called, then POSIX requires that fgetwc(3) must fail, so fgetwln(3) should fail, too.

So the whole errno dance looks like needless and confusing detour.
Comment 7 Andrey A. Chernov freebsd_committer freebsd_triage 2016-08-22 16:48:48 UTC
(In reply to Ingo Schwarze from comment #6)
1) It is not set _again_, if you read commit comment. It is for __slbexpand(). See fgetln.c code.
2) There is no fgetwln() in the POSIX and all other stdio functions never check __SERR to detect error, just set it, see the commit comment again. It looks very strange, if f.e. fgetwc() works after setting __SERR (and it works) but fgetwln() does not.
Comment 8 Andrey A. Chernov freebsd_committer freebsd_triage 2016-08-22 17:33:53 UTC
(In reply to Ingo Schwarze from comment #6)
Of course, the code can use
	if (wc == WEOF && errno != 0)
 		goto eof;
instead of 
	if (wc == WEOF && errno != 0)
 		goto error;
to avoid additional _SERR set for fgetwc() errors, but it makes code intention more cryptic and unequal with __slbexpand() errors (with invisible assumption that fgetwc() does it for us). Since it is error case, there is no needs to optimize such case.

POSIX don't say that fgetwc() must fail if __SERR is already set.
Comment 9 commit-hook freebsd_committer freebsd_triage 2016-08-22 22:29:36 UTC
A commit references this bug:

Author: ache
Date: Mon Aug 22 22:28:41 UTC 2016
New revision: 304641
URL: https://svnweb.freebsd.org/changeset/base/304641

Log:
  1) Back out r304607 case 2). fgetwln() as its pair fgetln() supposed to
  return partial line on any errors. See the comment in fgetln.c.
  Add corresponding comment to fgetwln() too.
  2) Rewrite r304607 case 1).
  3) Remove "Fast path" from __fgetwc_mbs() since it can't detect encoding
  errors and ignores them all.

  PR:     212033
  MFC after:      7 days

Changes:
  head/lib/libc/stdio/fgetwc.c
  head/lib/libc/stdio/fgetwln.c
Comment 10 commit-hook freebsd_committer freebsd_triage 2016-08-25 21:15:08 UTC
A commit references this bug:

Author: ache
Date: Thu Aug 25 21:14:26 UTC 2016
New revision: 304819
URL: https://svnweb.freebsd.org/changeset/base/304819

Log:
  Original fgetln() from 44lite return sucess for line tail errors,
  i.e. partial line, but set __SERR and errno in the same time, which
  is inconsistent.
  Now both OpenBSD and NetBSD return failure, i.e. no line and set error
  indicators for such case, so make our fgetln() and fgetwln()
  (as its wide version) compatible with the rest of *BSD.

  PR:     212033
  MFC after:      7 days

Changes:
  head/lib/libc/stdio/fgetln.c
  head/lib/libc/stdio/fgetwln.c
Comment 11 commit-hook freebsd_committer freebsd_triage 2016-08-27 10:34:07 UTC
A commit references this bug:

Author: ache
Date: Sat Aug 27 10:34:02 UTC 2016
New revision: 304893
URL: https://svnweb.freebsd.org/changeset/base/304893

Log:
  MFC r304607,r304641,r304819,r304811

  1) Don't forget to set __SERR on __slbexpand() error.

  2) Remove "Fast path" from fgetwc()/fputwc() since it can't detect
  encoding errors and ignores them all.
  One of affected encoding example: US-ASCII

  3)  Original fgetln() from 44lite return success for line tail errors,
  i.e. partial line, but set __SERR and errno in the same time, which
  is inconsistent.
  Now both OpenBSD and NetBSD return failure, i.e. no line and set error
  indicators for such case, so make our fgetln() and fgetwln()
  (as its wide version) compatible with the rest of *BSD.

  PR:     212033

Changes:
_U  stable/10/
  stable/10/lib/libc/stdio/fgetln.c
  stable/10/lib/libc/stdio/fgetwc.c
  stable/10/lib/libc/stdio/fgetwln.c
  stable/10/lib/libc/stdio/fputwc.c
Comment 12 commit-hook freebsd_committer freebsd_triage 2016-08-27 11:08:12 UTC
A commit references this bug:

Author: ache
Date: Sat Aug 27 11:07:57 UTC 2016
New revision: 304896
URL: https://svnweb.freebsd.org/changeset/base/304896

Log:
  MFC r304607,r304641,r304819,r304811

  1) Don't forget to set __SERR on __slbexpand() error.

  2) Remove "Fast path" from fgetwc()/fputwc() since it can't detect
  encoding errors and ignores them all.
  One of affected encoding example: US-ASCII

  3)  Original fgetln() from 44lite return success for line tail errors,
  i.e. partial line, but set __SERR and errno in the same time, which
  is inconsistent.
  Now both OpenBSD and NetBSD return failure, i.e. no line and set error
  indicators for such case, so make our fgetln() and fgetwln()
  (as its wide version) compatible with the rest of *BSD.

  PR:     212033

Changes:
_U  stable/11/
  stable/11/lib/libc/stdio/fgetln.c
  stable/11/lib/libc/stdio/fgetwc.c
  stable/11/lib/libc/stdio/fgetwln.c
  stable/11/lib/libc/stdio/fputwc.c
Comment 13 Andrey A. Chernov freebsd_committer freebsd_triage 2016-08-27 11:14:02 UTC
Done. 
BTW, I see, both OpenBSD and NetBSD check __SERR in fgetwln() due to your patch instead of (wc == WEOF && !__sfeof(fp)) which is the bug.