Bug 272758 - c16rtomb and c32rtomb wrong return value (at least on aarch64)
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: misc
Version: 13.1-STABLE
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: freebsd-numerics (Nobody)
Reported: 2023-07-27 16:28 UTC by Philipp
Modified: 2023-09-12 17:25 UTC
Description Philipp 2023-07-27 16:28:58 UTC
When looking into an SDCC regression test failing for test-host, I found the following issue:

The c16rtomb and c32rtomb functions return a wrong value for my test case.
I compiled the following code on a Raspberry Pi 4 running FreeBSD 13 via "cc test.c"; when I execute the resulting binary, the last assertion fails.

#include <limits.h>
#include <assert.h>
#include <uchar.h>

int main(void)
{
  static mbstate_t ps;
  char16_t c16[3];
  char c[MB_LEN_MAX] = "C";
  assert(mbrtoc16(c16, c, 1, &ps) == 1);
  assert(mbrtoc16(c16 + 1, c + 1, 1, &ps) == 0); // Writes a null wide character and thus puts ps into the initial conversion state (C2X section 7.30.1.3)
  assert(c16[0] == (u"C")[0]);
  assert(c16rtomb(c, c16[0], &ps) == 1);
  return(0);
}

I do not have any non-aarch64 FreeBSD 13.1 systems to test on, but the test does not fail on Debian GNU/Linux on aarch64 or amd64.
Comment 1 John F. Carr 2023-07-27 16:47:05 UTC
Clearing the mbstate_t argument before calling c16rtomb causes the test to pass.

memset(&ps, 0, sizeof ps);
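
For illustration, here is a minimal sketch of the reporter's sequence with that reset applied just before the conversion direction changes. The helper name roundtrip_with_reset is hypothetical and not part of the original test; the sketch assumes the default "C" locale, in which "C" is a single byte.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>
#include <uchar.h>

/* Hypothetical helper (not from the original report): repeats the
 * reporter's calls, but clears the conversion state before switching
 * from mbrtoc16 to c16rtomb. Returns the result of c16rtomb. */
size_t roundtrip_with_reset(void)
{
  mbstate_t ps;
  char16_t c16[2];
  char c[MB_LEN_MAX] = "C";
  size_t n;

  memset(&ps, 0, sizeof ps);            /* start in the initial conversion state */

  n = mbrtoc16(&c16[0], c, 1, &ps);     /* converts 'C' */
  assert(n == 1);
  n = mbrtoc16(&c16[1], c + 1, 1, &ps); /* converts the terminating null */
  assert(n == 0);

  memset(&ps, 0, sizeof ps);            /* reset before the other direction */
  return c16rtomb(c, c16[0], &ps);      /* expected to store one byte */
}
```

Calling roundtrip_with_reset() from main and asserting that the result equals 1 corresponds to the last assertion of the original test.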
Comment 2 Dimitry Andric freebsd_committer freebsd_triage 2023-09-12 17:09:28 UTC
I think numerics@ is not the right assignee, since that is mostly for math-related problems (i.e. mostly lib/msun). mbrtowc, mbrtoc16, and mbrtoc32 are character conversion functions.
Comment 3 Dimitry Andric freebsd_committer freebsd_triage 2023-09-12 17:24:20 UTC
(In reply to John F. Carr from comment #1)
Yes, setting the mbstate_t to zero is what should be done. Quoting C11 7.29.6:
> The initial conversion state corresponds, for a conversion in either direction, to the beginning of a new multibyte character in the initial shift state. A zero-valued mbstate_t object is (at least) one way to describe an initial conversion state. A zero-valued mbstate_t object can be used to initiate conversion involving any multibyte character sequence, in any LC_CTYPE category setting. If an mbstate_t object has been altered by any of the functions described in this subclause, and is then used with a different multibyte character sequence, or in the other conversion direction, or with a different LC_CTYPE category setting than on earlier function calls, the behavior is undefined.
Comment 4 Dimitry Andric freebsd_committer freebsd_triage 2023-09-12 17:25:23 UTC
Ugh, to make that more readable:

> The initial conversion state corresponds, for a conversion in either
> direction, to the beginning of a new multibyte character in the
> initial shift state. A zero-valued mbstate_t object is (at least) one
> way to describe an initial conversion state. A zero-valued mbstate_t
> object can be used to initiate conversion involving any multibyte
> character sequence, in any LC_CTYPE category setting. If an mbstate_t
> object has been altered by any of the functions described in this
> subclause, and is then used with a different multibyte character
> sequence, or in the other conversion direction, or with a different
> LC_CTYPE category setting than on earlier function calls, the behavior
> is undefined.
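
One way to stay clear of the "other conversion direction" case in that paragraph is to keep a separate zero-valued mbstate_t per direction. The following is only a sketch under assumptions: the helper name roundtrip_separate_states is hypothetical, it uses the char32_t functions rather than the char16_t ones from the report, and it expects `in` to be a non-null single-byte character in the current locale.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>
#include <uchar.h>

/* Hypothetical helper (not from the original report): converts one
 * single-byte character to char32_t and back, using one state object
 * per direction so that a state altered by mbrtoc32 is never passed
 * to c32rtomb. Returns the number of bytes stored in out. */
size_t roundtrip_separate_states(char in, char out[MB_LEN_MAX])
{
  mbstate_t decode_state;  /* only ever passed to mbrtoc32 */
  mbstate_t encode_state;  /* only ever passed to c32rtomb */
  char32_t c32;
  size_t n;

  memset(&decode_state, 0, sizeof decode_state);
  memset(&encode_state, 0, sizeof encode_state);

  n = mbrtoc32(&c32, &in, 1, &decode_state);
  assert(n == 1);  /* assumption: in is a complete, non-null character */

  return c32rtomb(out, c32, &encode_state);
}
```

With this layout no reset is needed between the two calls, at the cost of carrying two state objects around.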