Description

/bin/sh exits when it reads non-ASCII characters in a ja_JP.eucJP environment. In other words, /bin/sh cannot be used in an EUC Japanese environment.

How-To-Repeat:

With 11.2-RELEASE (amd64 or i386):

1) Set /bin/sh as the login shell and log in, or run "exec sh" from another shell.
2) Press a key combination such as Alt-A (or Alt-B, Alt-C, ... Alt-1, Alt-4, ...), or input eucJP Kanji characters.

or

1) Open a window with "xterm /bin/sh" or "kterm".
2) Press a key combination such as Alt-A (or Alt-B, Alt-C, ... Alt-1, Alt-4, ...), or paste eucJP Kanji characters.

The cause of this problem lies in libedit (/usr/src/lib/libedit): the read_char() function in read.c cannot handle the eucJP character encoding.
There seem to be two problems behind the /bin/sh failure in the ja_JP.eucJP environment:

1) /usr/src/bin/sh
   The control characters (CTLENDVAR, CTLBACKQ, CTLARI, CTLENDARI, CTLQUOTEMARK, defined in parser.h) match the second byte of many EUC Kanji characters, so lexical analysis fails.

2) /usr/src/lib/libedit
   The following two functions do not take the ja_JP.eucJP environment into account:
   chartype.c: ct_conv_cbuff_resize()
   read.c: read_char()

Since /bin/sh is one of the basic programs of Unix, if it rejects every character encoding except UTF-8, it should print a warning such as "Can not be used in eucJP environment" at startup. At present it just exits normally on EUC Kanji input or on key input such as Alt-A, Alt-B, ...

Does the same problem occur in the ko_KR.eucKR or zh_CN.eucCN environments?
In my investigation, the main cause of this problem is that read_char() does not retry read(2) from STDIN when mbrtowc(3) returns (size_t)-2. In lib/libedit/read.c, the following code retries only when the CHARSET_IS_UTF8 flag is set.

```
	switch (ct_mbrtowc(cp, cbuf, cbp)) {
<snip>
	case (size_t)-2:
		/*
		 * We don't support other multibyte charsets.
		 * The second condition shouldn't happen
		 * and is here merely for additional safety.
		 */
		if ((el->el_flags & CHARSET_IS_UTF8) == 0 ||
		    cbp >= MB_LEN_MAX) {
			errno = EILSEQ;
			*cp = L'\0';
			return -1;
		}
		/* Incomplete sequence, read another byte. */
		goto again;
```

Of course, the CHARSET_IS_UTF8 flag is not set in an eucJP environment. When I removed the CHARSET_IS_UTF8 check, /bin/sh was able to read eucJP input.

I found another problem after removing the check: command history miscalculates the length of eucJP characters, because ct_enc_width() in chartype.c does not understand any charset other than UTF-8. After I rewrote ct_enc_width() to use wctomb(3), the command history problem was fixed.

With these two changes, the CHARSET_IS_UTF8 flag is no longer needed. CHARSET_IS_UTF8 controls the NARROW_HISTORY flag, and NARROW_HISTORY is used only in the definition of HIST_FUN.

```
#ifdef WIDECHAR
#define HIST_FUN(el, fn, arg) \
	(((el)->el_flags & NARROW_HISTORY) ? hist_convert(el, fn, arg) : \
	HIST_FUN_INTERNAL(el, fn, arg))
#else
#define HIST_FUN(el, fn, arg) HIST_FUN_INTERNAL(el, fn, arg)
#endif
```

In a WIDECHAR environment, hist_convert() should always be called, because it is a multibyte-aware function.

I have opened a new differential on Phabricator with all of my fixes:
https://reviews.freebsd.org/D17903

I believe my fix solves this problem and does not affect charsets other than eucJP. Please review my code.

Hirabayashi-san: Could you please try my patch from Phabricator and check whether it fixes this problem?

I don't think /bin/sh itself is at fault.
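
For readers who want to see the two ideas from this comment in isolation, here is a minimal sketch. It is not the code from D17903 and not libedit's implementation: decode_one(), count_chars() and enc_width() are hypothetical names made up for this illustration. The sketch also assumes a slightly different technique from read_char(): it feeds mbrtowc(3) one byte at a time and lets an mbstate_t carry the partial character, whereas libedit accumulates bytes in a buffer and re-parses it. The point it shares with the fix is that a (size_t)-2 result means "read another byte" in any multibyte locale, not only in UTF-8, and that wctomb(3) gives the encoded length for whatever locale is active.

```c
#include <limits.h>
#include <locale.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/*
 * Decode one character from src[0..len), feeding mbrtowc(3) a single byte
 * per call, as an interactive reader would receive it.
 * Returns the number of bytes consumed, or 0 if the input is invalid or
 * ends in the middle of a character.
 */
size_t
decode_one(const char *src, size_t len, wchar_t *out)
{
	mbstate_t st;
	size_t used = 0;

	memset(&st, 0, sizeof(st));	/* start in the initial shift state */
	while (used < len && used < MB_LEN_MAX) {
		size_t r = mbrtowc(out, src + used, 1, &st);

		used++;
		if (r == (size_t)-1)
			return 0;	/* invalid sequence (EILSEQ) */
		if (r == (size_t)-2)
			continue;	/* incomplete: fetch another byte */
		return used;		/* r is 0 or 1: character complete */
	}
	return 0;			/* ran out of bytes mid-character */
}

/*
 * Count the characters in src[0..len) using decode_one().
 * Returns (size_t)-1 if any character fails to decode.
 */
size_t
count_chars(const char *src, size_t len)
{
	size_t n = 0, off = 0;
	wchar_t wc;

	while (off < len) {
		size_t used = decode_one(src + off, len - off, &wc);

		if (used == 0)
			return (size_t)-1;
		off += used;
		n++;
	}
	return n;
}

/*
 * Number of bytes wc occupies in the current locale's encoding,
 * or -1 if it cannot be represented there.
 */
int
enc_width(wchar_t wc)
{
	char buf[MB_LEN_MAX];
	int n = wctomb(buf, wc);

	if (n < 0)
		(void)wctomb(NULL, L'\0');	/* reset any shift state */
	return n;
}
```

To try it, call setlocale(LC_CTYPE, "") (or name a locale such as ja_JP.eucJP explicitly, if it is installed) before using these functions; without that call the program stays in the "C" locale and only single-byte input decodes as expected.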
I confirmed that the above patch can solve this problem.
Let's not close this just yet, and see if we can actually fix this in the tree :)
This problem is fixed by r340933. https://svnweb.freebsd.org/base?view=revision&revision=340933 Could you please MFC it to stable/12 and stable/11? Thanks.
MFC done, sorry about the delay