Bug 246670

Summary: bsdtar: Fails to extract (UTF-8) under QEMU_EMULATING
Product: Base System
Reporter: Danilo G. Baio <dbaio>
Component: bin
Assignee: Port Management Team <portmgr>
Status: New
Severity: Affects Some People
CC: cem, chris, chris, jbeich, kevans, pi, portmgr, salvadore, vishwin
Priority: ---
Version: Unspecified
Hardware: arm64
OS: Any
See Also: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246618
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=262048
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=271052
Attachments:
Description: set C.UTF-8 as ports tree locale
Flags: vishwin: maintainer-approval? (portmgr)

Description Danilo G. Baio freebsd_committer freebsd_triage 2020-05-23 01:26:20 UTC
root@12-aarch64-default:~ # locale
LANG=
LC_CTYPE="C"
LC_COLLATE="C"
LC_TIME="C"
LC_NUMERIC="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_ALL=

root@12-aarch64-default:/tmp # tar -zxf /portdistfiles/python/Sphinx-3.0.3.tar.gz
tar: Pathname can't be converted from UTF-8 to current locale.
tar: Error exit delayed from previous errors.
root@12-aarch64-default:/tmp # echo $?
1

root@12-aarch64-default:/tmp # setenv LANG en_US.UTF-8
root@12-aarch64-default:/tmp # setenv LC_ALL en_US.UTF-8
root@12-aarch64-default:/tmp # setenv MM_CHARSET UTF-8
root@12-aarch64-default:/tmp # locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_ALL=en_US.UTF-8

root@12-aarch64-default:/tmp # tar -zxf /portdistfiles/python/Sphinx-3.0.3.tar.gz
root@12-aarch64-default:/tmp # echo $?
0


More details on bug #246618
Reported by jbeich@
Comment 1 Conrad Meyer freebsd_committer freebsd_triage 2020-05-23 01:40:05 UTC
What's the bug here?  In the C locale, some UTF-8 characters are not representable.  If you're actually using UTF-8, you should set your locale correctly.  In 13-CURRENT this is somewhat improved by making the default locale "C.UTF-8" instead of "C[.USASCII]".
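
Editorial illustration of the point above: a minimal sketch, assuming Python's built-in codecs as a stand-in for the locale conversion that tar attempts. It is not libarchive's code path; the helper name `representable` is hypothetical, and the file name is taken from the Sphinx tarball discussed in this report.

```python
# Illustration only: Python codecs stand in for the C vs. UTF-8 locale charsets.
name = "Sphinx-3.0.3/tests/roots/test-images/testim\u00e4ge.png"


def representable(path, charset):
    """Return True if every character of path can be encoded in charset."""
    try:
        path.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


print(representable(name, "ascii"))  # False: U+00E4 has no ASCII encoding
print(representable(name, "utf-8"))  # True

# A lossy conversion substitutes "?" for the unrepresentable character.
lossy = name.encode("ascii", errors="replace").decode("ascii")
print(lossy)
```

The strict conversion fails outright, while the lossy one yields a different path name, which is why an extractor cannot safely fall back to it silently.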
Comment 2 Danilo G. Baio freebsd_committer freebsd_triage 2020-05-23 01:55:01 UTC
(In reply to Conrad Meyer from comment #1)

It only happens under QEMU_EMULATING, on a default system.
Comment 3 Conrad Meyer freebsd_committer freebsd_triage 2020-05-23 01:57:02 UTC
Seems like a problem with QEMU_EMULATING.
Comment 4 Danilo G. Baio freebsd_committer freebsd_triage 2020-05-23 02:01:50 UTC
(In reply to Conrad Meyer from comment #3)

Right. Should I move it to ports (emulators/qemu-user-static) ?
Comment 5 Jan Beich freebsd_committer freebsd_triage 2020-05-23 02:44:42 UTC
(In reply to Conrad Meyer from comment #1)
> In the C locale, some UTF-8 characters are not representable.

Indeed. Why does bsdtar(1) only error out when built statically? QEMU_EMULATING builds use native-xtools, which are native binaries built statically in order to speed up emulated builds.

--- tar tf (en_US.UTF-8)
+++ tar tf (C)
@@ -1258,7 +1258,7 @@
 Sphinx-3.0.3/tests/roots/test-images/subdir/svgimg.pdf
 Sphinx-3.0.3/tests/roots/test-images/subdir/svgimg.svg
 Sphinx-3.0.3/tests/roots/test-images/subdir/svgimg.xx.svg
-Sphinx-3.0.3/tests/roots/test-images/testimäge.png
+Sphinx-3.0.3/tests/roots/test-images/testim?ge.png
 Sphinx-3.0.3/tests/roots/test-index_on_title/
 Sphinx-3.0.3/tests/roots/test-index_on_title/conf.py
 Sphinx-3.0.3/tests/roots/test-index_on_title/contents.rst
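
Editorial illustration: one way to find which members of a distfile would trip this error is to list the path names that contain non-ASCII characters. The sketch below uses Python's standard `tarfile` module; the helper name `non_ascii_members` and the tiny in-memory archive are assumptions made so the example runs without the real distfile.

```python
import io
import tarfile


def non_ascii_members(tar):
    """Return the member names that contain characters outside ASCII."""
    return [m.name for m in tar.getmembers() if not m.name.isascii()]


# Build a small in-memory archive (hypothetical contents) to scan.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz", format=tarfile.PAX_FORMAT) as tar:
    for member_name in ("demo/conf.py", "demo/testim\u00e4ge.png"):
        tar.addfile(tarfile.TarInfo(member_name))  # empty regular file

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r:gz") as tar:
    suspects = non_ascii_members(tar)

print(suspects)
```

For a real distfile, `tarfile.open(path)` would replace the in-memory buffer; any name it reports is a candidate for the conversion error under the C locale.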
Comment 6 commit-hook freebsd_committer freebsd_triage 2020-05-30 12:27:50 UTC
A commit references this bug:

Author: dbaio
Date: Sat May 30 12:27:38 UTC 2020
New revision: 537077
URL: https://svnweb.freebsd.org/changeset/ports/537077

Log:
  textproc/py-sphinx: Fix build (extract) with static bsdtar(1)

  ===>  Extracting for py37-sphinx-3.0.3,1
  => SHA256 Checksum OK for python/Sphinx-3.0.3.tar.gz.
  tar: Pathname can't be converted from UTF-8 to current locale.
  tar: Error exit delayed from previous errors.
  *** Error code 1

  Issue found at least on arm64.aarch64 and mips.mips64 builds using
  native-x-tools/poudriere.

  As /usr/bin/tar is replaced by the binary in /nxb-bin/, pointing EXTRACT_CMD
  to /usr/bin/bsdtar instead.

  root@12-mips64-default:/tmp # file /usr/bin/tar
  /usr/bin/tar: ELF 64-bit LSB executable, x86-64, version 1 (FreeBSD),
  statically linked, for FreeBSD 12.1, FreeBSD-style, stripped

  root@12-mips64-default:/tmp # file /usr/bin/bsdtar
  /usr/bin/bsdtar: ELF 64-bit MSB executable, MIPS, MIPS-III version 1 (FreeBSD),
  dynamically linked, interpreter /libexec/ld-elf.so.1, FreeBSD-style, for
  FreeBSD 12.1, stripped

  This patch bypass the issue here (all scenarios we have tested), but the
  problem still exists and it's being tracked in bug 246670.

  Please, see more details in bug 246618. Thanks to tijl, jbeich, kevans and
  all people who helped in testing.

  PR:		246618, 246670
  Submitted by:	tijl
  Reported by:	jbeich

Changes:
  head/textproc/py-sphinx/Makefile
Comment 7 Charlie Li freebsd_committer freebsd_triage 2021-04-29 21:51:48 UTC
This is now happening with devel/py-wheel.
Comment 8 Charlie Li freebsd_committer freebsd_triage 2021-04-29 21:58:05 UTC
I commented too soon. This happens when nearly everything in locale is suffixed UTF-8:

root@aarch64-13-0-default:/ # locale
LANG=C.UTF-8
LC_CTYPE="C.UTF-8"
LC_COLLATE="C.UTF-8"
LC_TIME="C.UTF-8"
LC_NUMERIC="C.UTF-8"
LC_MONETARY="C.UTF-8"
LC_MESSAGES="C.UTF-8"
LC_ALL=

May need to play with $LC_ALL a bit, probably in poudriere itself.
Comment 9 Charlie Li freebsd_committer freebsd_triage 2022-02-13 12:46:59 UTC
Created attachment 231791
set C.UTF-8 as ports tree locale

This follows bug 246618, comment 10, now that FreeBSD 11 is EOL. A number of ports are affected, at least these, which I use or otherwise know of:
* textproc/py-sphinx
* devel/py-wheel
* sysutils/py-ansible-core
Comment 10 Danilo G. Baio freebsd_committer freebsd_triage 2022-02-13 13:01:21 UTC
(In reply to Charlie Li from comment #9)

More examples can be found here (just check the build logs):
https://portsfallout.com/fallout?category=extract
Comment 11 Kyle Evans freebsd_committer freebsd_triage 2022-02-13 15:36:14 UTC
(In reply to Charlie Li from comment #9)

Does this actually work? IIRC the root cause of this problem was that we use statically linked tar in the jail, but now that I say that out loud it may have been trying to use iconv and failing dlopen.
Comment 12 Charlie Li freebsd_committer freebsd_triage 2022-02-13 15:41:05 UTC
(In reply to Kyle Evans from comment #11)
Yes. I was slimming down some of my port overlays, and both devel/py-wheel and sysutils/py-ansible-core, which have UTF-8 paths in their tarballs, build successfully under native-xtools without any further locale adjustments, i.e. without overriding EXTRACT_CMD. Other ports under all of native-xtools, QEMU_EMULATING and native architecture remain unaffected.
Comment 13 Kyle Evans freebsd_committer freebsd_triage 2022-02-13 19:49:04 UTC
(In reply to Charlie Li from comment #12)

Can you open up a new issue for C.UTF-8 in the ports framework and request an exp-run, please?
Comment 14 Christian Ullrich 2023-04-19 12:37:13 UTC
This just bit me with devel/aarch64-none-elf-gcc in poudriere, 13.2-RELEASE jail with native-xtools, building on amd64 for aarch64. The EXTRACT_CMD workaround worked there, too.

tar: Pathname can't be converted from UTF-8 to current locale.
x gcc-11.3.0/gcc/testsuite/go.test/test/fixedbugs/issue27836.dir/\303\204foo.go
tar: Pathname can't be converted from UTF-8 to current locale.
x gcc-11.3.0/gcc/testsuite/go.test/test/fixedbugs/issue27836.dir/\303\204main.go
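
Editorial illustration: the `\303\204` sequences in the output above are octal escapes for the raw bytes of the path name. A minimal sketch, assuming the escapes are always three octal digits as shown here; the helper name `decode_octal_escapes` is hypothetical.

```python
import re


def decode_octal_escapes(text):
    """Turn backslash-octal escapes (e.g. \\303\\204) back into UTF-8 text."""
    raw = re.sub(
        rb"\\([0-3][0-7]{2})",                  # one escaped byte, \000 to \377
        lambda m: bytes([int(m.group(1), 8)]),  # octal digits -> single byte
        text.encode("ascii"),
    )
    return raw.decode("utf-8")


decoded = decode_octal_escapes(r"issue27836.dir/\303\204foo.go")
print(decoded)
```

The bytes 0xC3 0x84 are the UTF-8 encoding of U+00C4 (Ä), a character the C locale cannot represent, which matches the conversion error printed before each file.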